Server Shedding in M/M/m
What Mr. Erlang Didn't See in His Own C Function

Neil Gunther

Created Aug 19, 2010

Abstract

The morphing model [Gunther 2005] corresponds to transforming from a set of parallel queues at low loads into a single fast queue at high loads. But how does the transformation take place? Consider that m parallel servers are equivalent to m sequential queues, each with a reduced service time S/m. Therefore, the morphing transformation can be regarded as a progressive "shedding" of m−1 fast queues until only a single fast queue remains to service high loads. We try to formalize that idea here.

Contents

1  Residence Times
    1.1  Exact Formula
    1.2  Approximate Formula
2  Numerical Comparisons
    2.1  Erlang-C Function
    2.2  Exact Residence Times
3  Morphing Models
    3.1  Morphing Multiserver
    3.2  Parallel Residence Time
    3.3  Tandem Residence Time
    3.4  Meaning of Morphing
A  Appendices
    A.1  Series
    A.2  Approximation Errors
    A.3  Revisit the Repairman
B  Bibliography

1  Residence Times

An M/M/m queue has m servers, so its maximum capacity is m. The arriving traffic intensity is A = λS, expressed in Erlangs.
To avoid unbounded queue growth, the traffic intensity cannot exceed that capacity, i.e., A < m. Therefore, the per-server utilization is
ρ = \frac{λS}{m} < 1    (1)
where λ is the aggregate arrival rate into the system.

1.1  Exact Formula

The exact equation for the residence time RE (time in the system) is given by
R_E = S + \frac{C(m,A)}{m(1 − ρ)} S    (2)
Here, S is the mean service time, C is the Erlang-C (Call waiting) function
C(m,A) = \frac{B(m,A)}{1 − [1 − B(m,A)] ρ}    (3)
and B is the Erlang-B (Blocked call) function
B(m,A) = \frac{\frac{A^m}{m!} e^{−A}}{\sum_{k=0}^{m} \frac{A^k}{k!} e^{−A}}    (4)
To give some idea of the complexity associated with the Erlang formulæ, Fig. 1 shows their expansion as power series in the server utilization, ρ.
Figure 1: Expansion of Erlang B and C as rational functions of ρ for m=1,2,…,6 servers
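
For readers who want to evaluate these formulas numerically, here is a minimal Python sketch of (3) and (4) (added for illustration, not part of the original text; the function names erlang_b and erlang_c are arbitrary). Note that the e^{−A} factors in (4) cancel between numerator and denominator, so they are omitted:

    from math import factorial

    def erlang_b(m, A):
        """Erlang-B (blocked call) function, eqn. (4).
        The exp(-A) factors cancel, so they are omitted."""
        terms = [A**k / factorial(k) for k in range(m + 1)]
        return terms[m] / sum(terms)

    def erlang_c(m, A):
        """Erlang-C (call waiting) function, eqn. (3), with rho = A/m."""
        rho = A / m
        b = erlang_b(m, A)
        return b / (1.0 - (1.0 - b) * rho)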

1.2  Approximate Formula

The approximate residence time RA, derived in [Gunther 2005, § 2.7], is given by:
R_A = \frac{S}{1 − ρ^m}    (5)
RA ≅ RE typically to within 10%. See tables in Appendix A.2.
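
Continuing the sketch above (again illustrative, not from the original text), the exact residence time (2) and the approximation (5) follow directly; S defaults to 1 so the results are normalized, as in Section 2:

    def residence_exact(m, rho, S=1.0):
        """Exact M/M/m residence time R_E, eqn. (2), with A = m*rho."""
        return S + erlang_c(m, m * rho) * S / (m * (1.0 - rho))

    def residence_approx(m, rho, S=1.0):
        """Approximate residence time R_A, eqn. (5)."""
        return S / (1.0 - rho**m)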

2  Numerical Comparisons

The following tables show numerical results for Erlang-C and RE/S as functions of ρ at selected values of m.

2.1  Erlang-C Function

Table 1 shows values of the Erlang-C function (3) for m = 1, 2, 3,  and 10.
Table 1: Selected values of the Erlang-C function
ρ m=1 m=2 m=3 m=10
0.1 0.1 0.0181818 0.0037037 1.12642 ×10−7
0.2 0.2 0.0666667 0.0246575 0.0000477373
0.25 0.25 0.1 0.0441176 0.000287631
0.5 0.5 0.333333 0.236842 0.0361054
0.7 0.7 0.576471 0.492344 0.221731
0.75 0.75 0.642857 0.567757 0.306611
0.8 0.8 0.711111 0.647191 0.40918
0.85 0.85 0.781081 0.730377 0.529861
0.9 0.9 0.852632 0.817061 0.668732
0.95 0.95 0.925641 0.907009 0.825586
0.98 0.98 0.970101 0.962451 0.928161
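
As a spot check, using the erlang_c sketch from Section 1.1, the m = 3, ρ = 0.5 entry of Table 1 is reproduced by:

    print(erlang_c(3, 3 * 0.5))   # 0.23684..., cf. 0.236842 in Table 1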

2.2  Exact Residence Times

Table 2 shows the normalized residence time RE/S from (2). For any particular service time S, multiply the appropriate entry in Table 2 by that value.
Table 2: Normalized residence times
ρ m=1 m=2 m=3 m=10
0.1 1.11111 1.0101 1.00137 1.
0.2 1.25 1.04167 1.01027 1.00001
0.25 1.33333 1.06667 1.01961 1.00004
0.5 2. 1.33333 1.15789 1.00722
0.7 3.33333 1.96078 1.54705 1.07391
0.75 4. 2.28571 1.75701 1.12264
0.8 5. 2.77778 2.07865 1.20459
0.85 6.66667 3.6036 2.62306 1.35324
0.9 10. 5.26316 3.72354 1.66873
0.95 20. 10.2564 7.04672 2.65117
0.98 50. 25.2525 17.0409 5.64081
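
Similarly, entries of Table 2 can be checked with the residence_exact sketch from Section 1:

    print(residence_exact(3, 0.5))    # 1.15789..., cf. Table 2
    print(residence_exact(10, 0.9))   # 1.66873..., cf. Table 2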

3  Morphing Models

The objective is to derive (2) in a more intuitive way than is found in standard textbooks or Erlang's original papers [Erlang 1909, Erlang 1917, Erlang 1920]. An exemplar is the following derivation of the M/M/1 residence time that does not resort to the use of Markov chains or other probabilistic formalism.
Starting with the arrival theorem (from Mean Value Analysis) [Gunther 1998]:
R = S + W
  = S + Q S
  = S + (λR) S    … from Q = λR
  = S + ρR    … from ρ = λS
On rearrangement, we find:
R = \frac{S}{1 − ρ}    (6)
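
The rearrangement can also be confirmed symbolically, e.g., with sympy (a small check added here for illustration):

    import sympy as sp

    R, S, rho = sp.symbols('R S rho', positive=True)
    sol = sp.solve(sp.Eq(R, S + rho * R), R)[0]
    print(sp.simplify(sol - S / (1 - rho)))   # 0, i.e., R = S/(1 - rho)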
If a similar derivation is not achievable for M/M/m, then I want to understand why not. What is it that makes the simple addition of servers to the same waiting line so much more complicated?

3.1  Morphing Multiserver

In [Gunther 2005, § 2.7], I show that (2) can be understood as arising from a kind of "morphing" process whereby the M/M/m multiserver queue behaves like a set of m parallel queues with little or no waiting at low loads and transforms itself into a single, high-speed, M/M/1 queue at high loads.
Such morphing can be demonstrated formally by generalizing (6) as:
R = φ(m, ρ) \frac{S}{1 − ρ}    (7)
where the morphing factor
φ(m, ρ) = \frac{1}{1 + ρ + ρ^2 + ... + ρ^{m−1}} = \frac{1 − ρ}{1 − ρ^m}    (8)
is the inverse of a finite geometric series with ρ defined in (1). See Section A.1.
This is an attractively simple result from a mathematical standpoint. However, substituting (8) into (7) yields the approximate formula (5), not the exact Erlang formula (2). Although morphing (in this sense) does not produce the exact result, it does suggest that the objective of a more intuitive derivation of the Erlang formula (2) is achievable.
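
In code, the morphing factor (8) and the generalized residence time (7) look as follows (an illustrative sketch); note that residence_morph reproduces residence_approx from Section 1.2, i.e., eqn. (5), not the exact eqn. (2):

    def phi(m, rho):
        """Morphing factor, eqn. (8): reciprocal of a finite geometric series."""
        return (1.0 - rho) / (1.0 - rho**m)

    def residence_morph(m, rho, S=1.0):
        """Eqn. (7): R = phi(m, rho) * S / (1 - rho), identical to eqn. (5)."""
        return phi(m, rho) * S / (1.0 - rho)

For example, residence_morph(3, 0.5) returns 1.14286, matching RA in Table 4, whereas the exact value is 1.15789.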
Remark 1 The motivation for the morphing function (8) comes from mathematical induction, starting with m = 1, 2, but why it works is never explained in [Gunther 2005]. How does this morphing occur and why is it associated with a geometric series? Some possible connections are:
  1. The population in an M/M/1 is geometrically distributed with mean p/q = ρ/(1 − ρ)
  2. Standard derivation of Erlang-C [Gunther 2010, § 5.1, eqns. 37a-g]
  3. Markov chains and product-form QNMs [Gunther 2005]
  4. Bubble diagrams [Gunther 1998, Chap. 14]
  5. A finite geometric series is associated with scalability in the IBM MP factor [Gunther 2002]
  6. Coxian stages [Allen 1990, § 3.2.11] and [Gunther 2002, § 3.2]
  7. Repairman model in Section A.3
Remark 2 In the limit m → ∞, (8) becomes:
φ(m → ∞, ρ) = \frac{1}{1 + ρ + ρ^2 + ...} → (1 − ρ)
and R → S in (7). See Section A.1. This limit is consistent with an infinite server or delay center.
Let's see if we can address Remark 1 about the dynamics of morphing.

3.2  Parallel Residence Time

The aggregate arrival rate λ in Figure 2 is split into m streams, each going to one of the m individual queues.
Figure 2: Parallel M/M/1 queues


Each queue sees only one m-th of the arrivals, i.e., an arrival rate λ/m. The corresponding residence time:
R_{para} = \frac{S}{1 − (λ/m) S}    (9)
is the same for each queue because they are operating in parallel.
Eqn. (9) is identical to an M/M/1 residence time but with the arrival rate reduced by a factor of m.

3.3  Tandem Residence Time

The aggregate arrival rate in Figure 3 is λ and is the same at each of the individual queues, but the service times are shorter by a factor of m. In other words, each server is running m times faster than a server in Figure 2.
Figure 3: Tandem M/M/1 queues


The corresponding residence time at each queue is therefore:
R_m = \frac{S/m}{1 − λ (S/m)}    (10)
Notice that (10) is identical to an M/M/1 residence time but with the service time reduced by a factor of m.
Remark 3 We also see (10) contained in the waiting time of (2).
Rewriting (2) more explicitly:
R_E = S + W = S + C \left( \frac{S/m}{1 − λS/m} \right)    (11)
the component of the W term in parentheses is identical to Rm in (10).
Theorem 1 [Parallelism] m parallel queues, each with service time S, are identical to m queues arranged sequentially with service times S/m.
Proof 1 The total time spent in Figure 3 is the sum of the residence times at each of the m queues:
R_{tand} = m × R_m = \frac{S}{1 − λ(S/m)}    (12)
Since (12) is identical to (9), Theorem 1 follows immediately.
Corollary 1 [Parallel is Serial] Fig. 2 is identical to Fig. 3, by Thm  1, so all parallel processing is just a form of fast sequential processing.
Indeed, this is precisely how parallel queues are implemented in PDQ [Gunther 2005, See, e.g., § 7.4.1].
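
A quick numerical illustration of Theorem 1 (the values of m, λ and S below are arbitrary examples):

    m, lam, S = 4, 0.6, 1.0                     # arbitrary values with lam*S/m < 1

    R_para = S / (1.0 - (lam / m) * S)          # eqn. (9): one of m parallel queues
    R_m    = (S / m) / (1.0 - lam * (S / m))    # eqn. (10): one tandem stage
    R_tand = m * R_m                            # eqn. (12): all m tandem stages

    print(R_para, R_tand)                       # both 1.17647..., as Theorem 1 asserts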

3.4  Meaning of Morphing

By defining
η_m(ρ) = 1 + ρ + ρ^2 + ... + ρ^{m−1}    (13)
the morphing factor (8) can be rewritten more simply as
φ = \frac{1}{η_m(ρ)}    (14)
and (7) becomes
R_η = \frac{m}{η_m(ρ)} \left( \frac{S/m}{1 − λ(S/m)} \right)    (15)
We write (13) with a subscript m to remind ourselves that, for a given number of servers, the morphing is controlled by the value of ρ.
Moreover, (15) is identical to RA in (5) but now expressed in terms of a fast M/M/1 queue. Numerical comparisons for m = 10 are shown in Table 3.
Table 3: Morphing residence times for fixed server speed S/10
ρ η 10/η Rη/S RA/S
0 1 10 1 1
0.1 1.11111 9. 1. 1.
0.2 1.25 8. 1. 1.
0.3 1.42856 7.00004 1.00001 1.00001
0.4 1.66649 6.00063 1.0001 1.0001
0.5 1.99805 5.00489 1.00098 1.00098
0.6 2.48488 4.02433 1.00608 1.00608
0.65 2.81868 3.54776 1.01365 1.01365
0.7 3.23917 3.08721 1.02907 1.02907
0.75 3.77475 2.64918 1.05967 1.05967
0.8 4.46313 2.24058 1.12029 1.12029
0.85 5.35417 1.8677 1.24514 1.24514
0.9 6.51322 1.53534 1.53534 1.53534
0.95 8.02526 1.24607 2.49213 2.49213
0.99 9.56179 1.04583 10.4583 10.4583
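
Table 3 can be regenerated from (13)-(15) with a few lines of Python (an illustrative sketch, reusing residence_approx from Section 1.2 for the RA column):

    def eta(m, rho):
        """Finite geometric series, eqn. (13)."""
        return sum(rho**k for k in range(m))

    m = 10
    for rho in (0.2, 0.65, 0.9):
        e = eta(m, rho)
        R_eta = (m / e) * (1.0 / m) / (1.0 - rho)   # eqn. (15) with S = 1, lam*(S/m) = rho
        print(rho, e, m / e, R_eta, residence_approx(m, rho))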
The ratio m/η in the third column of Table 3 provides a new interpretation of the morphing multiserver. What is m/η?
  1. Prob a server is busy: ρ = A/m.
  2. Prob all m servers are busy (on average) at offered load:
     0 ≤ \frac{η_m(ρ)}{m} < 1    (16)
Therefore, the average number of stages visited at load ρ is:
1 (high load) < \frac{m}{η_m(ρ)} ≤ m (low load)    (17)
Theorem 2 [Morphing] At very low loads, the number m/η of stages visited in Fig. 3 is m, since η = 1. This is logically equivalent to parallel processing by virtue of Theorem 1. As the load increases, the number of stages visited decreases until, at very high loads, only a single fast M/M/1 stage is visited since m/η → 1.
At first, Theorem 2 may seem counterintuitive. How can less be more? How can fewer servers produce more performance?
Consider Figure 4, which shows the relative residence times (R/S) for parallel and tandem queues. The top curve corresponds to the parallel queues in Figure 2. The bottom curve corresponds to Rm/S for an M/M/1 queue with a server that is 10 times faster than the parallel servers. The middle curves correspond to RE/S (solid) and Rη/S (dashed).
Figure 4: Comparison of parallel and sequential servers. The dots correspond to selected values of m/η from Table 3.


At zero load (ρ ≅ 0), Rη behaves like Rpara as discussed in Section 3.2, whereas at high load (ρ ≅ 1), Rη behaves like Rm as discussed in Section 3.3. The dots on the dashed middle curve correspond to the fraction of tandem stages visited at those load values. For example, from Table 3 there are only 8 stages out of the 10 total stages in operation at ρ = 0.2. At ρ = 0.9, only about 1.5 stages are active.
As the load increases, Rη interpolates between these two extremes by decreasing the total delay through the stages in Figure 3 as the waiting lines tend to grow. The total delay is reduced because requests end up visiting fewer stages than would have been the case in the original parallel configuration.
Figure 5: Visiting fewer tandem stages reduces total delay relative to parallel servers


The reduction in visited stages is shown by the downward arrows in Figure 5. The length of each arrow is given by
\left( 1 − \frac{1}{η} \right) R_{para}    (18)
What causes fewer tandem stages to be visited?
Conjecture 1 [Morphing Algorithm]
NJG on Sat Sep 11, 2010
Referring to Figure 3 (a simulation sketch of these rules appears at the end of this section):
  1. Min: A request must start service somewhere ⇒ visit at least one tandem stage.
  2. Max: The total accrued service time from visiting multiple stages should strive to reach S = m × S/m, but not exceed it.
  3. On completion of service S/m at a given stage, unless condition 2 is already met, the request looks for a new idle stage to visit. See Remark 4.
  4. Having satisfied 1, if no idle stage is available, the request exits the system.
  5. If all tandem stages are busy on arrival, the request joins the waiting line of a random stage.
Remark 4 We don't revisit the same M/M/1 stage to avoid any complications from feedback. All routing is feedforward.
Point 4 explains how the apparent stage-reduction to a single fast-M/M/1 queue occurs.
Remark 5 Is this a rendition of von Neumann's minimax principle?
Remark 6 It should be formally provable that algorithm 1 leads to the finite geometric series in (13). See Section A.3.
Remark 7 With enough statistical samples, algorithm 1 will produce a fractional number of stages, e.g., 3.54776 in Table 3 at ρ = 0.65.
Remark 8 Algorithm 1 may also explain why the morphing picture underestimates RE in (2) at high loads. Under heavy traffic, waiting lines can form at all m tandem stages whereas there is only a single waiting line in M/M/m.
Corollary 2 [Morphing is Asymmetric] By virtue of algorithm 1 the symmetry with the original parallel queues is lost. The reduction of visits to m/η stages means that the tandem configuration cannot be mapped back to a smaller subset of parallel queues. They cannot be equivalent because the service time remains fixed at S/m.
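
To make the conjectured dynamics concrete, the following is a rough discrete-event sketch of algorithm 1 (one reading of it, added for illustration; it is not code from any of the cited references). Tie-breaking choices that the conjecture leaves open are assumptions here: an arriving or migrating request picks a random idle stage, rule 2 is read as "visit at most m distinct stages", and a blocked arrival joins a random waiting line per rule 5. The observed mean number of stages visited can then be compared against m/η in Table 3, although the conjecture does not guarantee agreement:

    import heapq
    import random

    def simulate_morphing(m, lam, S, n_requests=100_000, seed=1):
        """Event-driven sketch of Conjecture 1: m tandem M/M/1 stages, each with
        mean service time S/m, feedforward routing, no revisits (Remark 4).
        Returns (mean residence time, mean number of stages visited)."""
        rng = random.Random(seed)
        mu = m / S                        # each stage runs m times faster
        busy = [None] * m                 # request in service at each stage
        queues = [[] for _ in range(m)]   # per-stage FIFO waiting lines
        events, seq, done = [], 0, []

        def push(t, kind, payload):
            nonlocal seq
            heapq.heappush(events, (t, seq, kind, payload))
            seq += 1

        def start_service(req, j, now):
            busy[j] = req
            req['visited'].add(j)
            push(now + rng.expovariate(mu), 'complete', j)

        t = 0.0
        for _ in range(n_requests):       # pre-schedule Poisson arrivals
            t += rng.expovariate(lam)
            push(t, 'arrive', None)

        while events:
            now, _, kind, j = heapq.heappop(events)
            if kind == 'arrive':
                req = {'arrival': now, 'visited': set()}
                idle = [k for k in range(m) if busy[k] is None]
                if idle:
                    start_service(req, rng.choice(idle), now)   # rule 1
                else:
                    queues[rng.randrange(m)].append(req)        # rule 5
            else:                                               # completion at stage j
                req, busy[j] = busy[j], None
                idle = [k for k in range(m)
                        if busy[k] is None and k not in req['visited']]
                if len(req['visited']) < m and idle:            # rules 2 and 3
                    start_service(req, rng.choice(idle), now)
                else:                                           # rule 4: exit the system
                    done.append((now - req['arrival'], len(req['visited'])))
                if queues[j]:                                   # freed stage takes next in line
                    start_service(queues[j].pop(0), j, now)

        return (sum(r for r, _ in done) / len(done),
                sum(v for _, v in done) / len(done))

For example, simulate_morphing(10, lam=9.0, S=1.0) corresponds to m = 10 and ρ = 0.9, the regime where Table 3 gives m/η ≈ 1.54.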

A  Appendices

A.1  Series


\frac{1}{1 − ρ} = 1 + ρ + ρ^2 + ρ^3 + ...    (19)

\frac{1}{1 − ρ^m} = 1 + ρ^m + ρ^{2m} + ρ^{3m} + ...    (20)

\frac{1 − ρ^m}{1 − ρ} = 1 + ρ + ρ^2 + ρ^3 + ... + ρ^{m−1}    (21)

\frac{1}{1 − ρ^m} = \left( \frac{1 − ρ}{1 − ρ^m} \right) \left( \frac{1}{1 − ρ} \right)    (22)

A.2  Approximation Errors

The negative sign means that RA in (5) underestimates RE in (2).
Table 4: Error range for m = 3
ρ RE RA % Error
0.1 1.00137 1.001 -0.0370233
0.2 1.01027 1.00806 -0.218699
0.3 1.03335 1.02775 -0.541718
0.4 1.07843 1.06838 -0.932401
0.5 1.15789 1.14286 -1.2987
0.6 1.29562 1.27551 -1.55217
0.65 1.40118 1.3786 -1.61175
0.7 1.54705 1.52207 -1.61465
0.75 1.75701 1.72973 -1.55262
0.8 2.07865 2.04918 -1.41781
0.85 2.62306 2.59151 -1.20265
0.9 3.72354 3.69004 -0.899678
0.95 7.04672 7.01139 -0.501367
0.99 33.7056 33.6689 -0.1089
Maximum error ≈ 1.6%.
Table 5: Error range for m = 10
ρ RE RA % Error
0 1 1 0
0.1 1. 1. -1.2416 ×10−6
0.2 1.00001 1. -0.000586472
0.3 1.00017 1.00001 -0.0159397
0.4 1.00147 1.0001 -0.136225
0.5 1.00722 1.00098 -0.619879
0.6 1.02532 1.00608 -1.87662
0.65 1.04392 1.01365 -2.89985
0.7 1.07391 1.02907 -4.17556
0.75 1.12264 1.05967 -5.60912
0.8 1.20459 1.12029 -6.99822
0.85 1.35324 1.24514 -7.98864
0.9 1.66873 1.53534 -7.99359
0.95 2.65117 2.49213 -5.99887
0.99 10.6374 10.4583 -1.68363
Maximum error ≈ 8%.
Table 6: Error range for m = 20
ρ RE RA % Error
0 1 1 0
0.1 1. 1. -3.5527 ×10−13
0.2 1. 1. -6.4666 ×10−8
0.3 1. 1. -0.0000380073
0.4 1.00002 1. -0.00220676
0.5 1.00037 1. -0.037202
0.6 1.00302 1.00004 -0.297128
0.65 1.00715 1.00018 -0.692049
0.7 1.01559 1.0008 -1.45678
0.75 1.03209 1.00318 -2.8006
0.8 1.06402 1.01166 -4.92056
0.85 1.12835 1.04032 -7.80161
0.9 1.27538 1.1384 -10.7404
0.95 1.7554 1.55881 -11.1991
0.99 5.73933 5.4917 -4.31472
Maximum error ≈ 11%.
Table 7: Error range for m = 64
ρ RE RA % Error
0 1 1 0
0.1 1. 1. 0.
0.2 1. 1. 0.
0.3 1. 1. 0.
0.4 1. 1. -3.4956 ×10−10
0.5 1. 1. -1.3324 ×10−6
0.6 1. 1. -0.000404175
0.65 1.00004 1. -0.00361036
0.7 1.00023 1. -0.0229767
0.75 1.00111 1. -0.111269
0.8 1.00438 1. -0.435968
0.85 1.01495 1.00003 -1.46965
0.9 1.04854 1.00118 -4.51682
0.95 1.18375 1.03899 -12.2294
0.99 2.41554 2.10791 -12.7355
Maximum error ≈ 13%.
Table 8: Error range for m = 128
ρ RE RA % Error
0 1 1 0
0.1 1. 1. 0.
0.2 1. 1. 0.
0.3 1. 1. 0.
0.4 1. 1. 0.
0.5 1. 1. -2.0206 ×10−12
0.6 1. 1. -1.1883 ×10−7
0.65 1. 1. -7.2607 ×10−6
0.7 1. 1. -0.000216284
0.75 1.00004 1. -0.00354116
0.8 1.00036 1. -0.035532
0.85 1.00244 1. -0.243851
0.9 1.01319 1. -1.30205
0.95 1.07153 1.00141 -6.54384
0.99 1.67795 1.3817 -17.6558
Maximum error ≈ 18%.
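
The error tables in this appendix can be regenerated from the sketches in Section 1, for example:

    def pct_error(m, rho):
        """Percentage error of R_A relative to R_E, as tabulated above."""
        r_exact = residence_exact(m, rho)
        r_approx = residence_approx(m, rho)
        return 100.0 * (r_approx - r_exact) / r_exact

    for rho in (0.5, 0.9, 0.99):
        print(rho, pct_error(20, rho))   # cf. Table 6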

A.3  Revisit the Repairman

Need to review the M/M/1//N queue more carefully [Allen 1990, p.270].

A = λS = \frac{λ}{μ}    (23)
and the steady-state probabilities:
p_k = p_0 A^k ,   k = 0, 1, 2, …, N    (24)
Sum of states:
\sum_{k=0}^{N} p_k = 1    (25)
The normalizing probability is a finite geometric series:
p_0 = \frac{1 − A}{1 − A^{N+1}}    (26)
which is the same expression as φ(m,ρ) in (8).
Proof 2
1 = p_0 + p_1 + … + p_N
  = p_0 \left( 1 + \frac{p_1}{p_0} + \frac{p_2}{p_0} + \frac{p_3}{p_0} + … + \frac{p_N}{p_0} \right)
  = p_0 \left( 1 + A + A^2 + A^3 + … + A^N \right)
  = p_0 \frac{1 − A^{N+1}}{1 − A}
Solving for p_0 yields (26).
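
A quick numerical sanity check of (24)-(26), with illustrative values of A and N:

    A, N = 0.75, 9                                   # arbitrary example values
    p0 = (1 - A) / (1 - A**(N + 1))                  # eqn. (26)
    print(sum(p0 * A**k for k in range(N + 1)))      # ~1.0: the p_k sum to one, eqn. (25)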
Not all traffic reaching the system enters it. The probability that an arriving request is not admitted, because there are already N requests in the system, is p_N. The mean (effective) arrival rate is then:
λ_A = λ (1 − p_N)    (27)
The probability that an arriving request finds k requests already in the system is:
q_k = \frac{p_k}{1 − p_N}    (28)
Also cf. the ratio in (3):
\frac{B}{1 − (1 − B) ρ}    (29)

B  Bibliography

[Allen 1990]
A. Allen, "Probability, Statistics, and Queueing Theory," Academic Press, 1990
[Erlang 1909]
A. Erlang, "The Theory of Probabilities and Telephone Conversations," Nyt Tidsskrift for Matematik B, v. 20, p. 33, 1909
[Erlang 1917]
A. Erlang, "Solution of Some Problems in the Theory of Probabilities of Significance in Automatic Telephone Exchanges," Electroteknikeren, v. 13, p. 5, 1917
[Erlang 1920]
A. Erlang, "Telephone Waiting Times," Nyt Tidsskrift for Matematik B, v. 31, p. 25, 1920
[Gunther 1998]
N. Gunther, The Practical Performance Analyst, McGraw-Hill, 1998
[Gunther 2002]
N. J. Gunther, "A New Interpretation of Amdahl's Law and Geometric Scalability," arxiv.org/abs/cs/0210017
[Gunther 2005]
N. J. Gunther, Analyzing Computer System Performance with Perl::PDQ, Springer 2005
[Gunther 2010]
N. J. Gunther, "Erlang Redux," Unpublished manuscript


