Server Shedding in M/M/m
What Mr. Erlang Didn't See in His Own C Function

Neil Gunther

Created Aug 19, 2010

Abstract

The morphing model [Gunther 2005] corresponds to transforming from a set of parallel queues at low loads into a single fast queue at high loads. But how does the transformation take place? Consider that m parallel servers are equivalent to m sequential queues, each with a reduced service time S/m. Therefore, the morphing transformation can be regarded as a progressive "shedding" of m−1 fast queues until only a single fast queue remains to service high loads. We try to formalize that idea here.

Contents

1  Residence Times
    1.1  Exact Formula
    1.2  Approximate Formula
2  Numerical Comparisons
    2.1  Erlang-C Function
    2.2  Exact Residence Times
3  Morphing Models
    3.1  Morphing Multiserver
    3.2  Parallel Residence Time
    3.3  Tandem Residence Time
    3.4  Meaning of Morphing
A  Appendices
    A.1  Series
    A.2  Approximation Errors
    A.3  Revisit the Repairman
B  Bibliography

1  Residence Times

An M/M/m queue has m servers, so its maximum capacity is m. The arriving traffic intensity is A = λS, expressed in Erlangs.
To avoid unbounded queue growth, the traffic intensity cannot exceed that capacity, i.e., A < m. Therefore, the per-server utilization is
ρ = \frac{λS}{m} < 1    (1)
where λ is the aggregate arrival rate into the system.

1.1  Exact Formula

The exact equation for the residence time RE (time in the system) is given by
R_E = S + \frac{C(m,A)}{m(1 − ρ)} S    (2)
Here, S is the mean service time, C is the Erlang-C (Call waiting) function
C(m,A) = \frac{B(m,A)}{1 − [1 − B(m,A)] ρ}    (3)
and B is the Erlang-B (Blocked call) function
B(m,A) = \frac{\frac{A^m}{m!} e^{−A}}{\sum_{k=0}^{m} \frac{A^k}{k!} e^{−A}}    (4)
To give some idea of the complexity associated with the Erlang formulæ, Fig. 1 shows their expansion as power series in the server utilization, ρ.
Figure 1: Expansion of Erlang B and C as rational functions of ρ for m=1,2,…,6 servers
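
For readers who want to evaluate these formulas numerically, here is a minimal Python sketch of (3) and (4) (added for illustration, not part of the original text; the function names erlang_b and erlang_c are arbitrary). Note that the e^{−A} factors in (4) cancel between numerator and denominator, so they are omitted:

    from math import factorial

    def erlang_b(m, A):
        """Erlang-B (blocked call) function, eqn. (4).
        The exp(-A) factors cancel, so they are omitted."""
        terms = [A**k / factorial(k) for k in range(m + 1)]
        return terms[m] / sum(terms)

    def erlang_c(m, A):
        """Erlang-C (call waiting) function, eqn. (3), with rho = A/m."""
        rho = A / m
        b = erlang_b(m, A)
        return b / (1.0 - (1.0 - b) * rho)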

1.2  Approximate Formula

The approximate residence time RA, derived in [Gunther 2005, § 2.7], is given by:
R_A = \frac{S}{1 − ρ^m}    (5)
RA ≅ RE typically to within 10%. See tables in Appendix A.2.
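
Continuing the sketch above (again illustrative, not from the original text), the exact residence time (2) and the approximation (5) follow directly; S defaults to 1 so the results are normalized, as in Section 2:

    def residence_exact(m, rho, S=1.0):
        """Exact M/M/m residence time R_E, eqn. (2), with A = m*rho."""
        return S + erlang_c(m, m * rho) * S / (m * (1.0 - rho))

    def residence_approx(m, rho, S=1.0):
        """Approximate residence time R_A, eqn. (5)."""
        return S / (1.0 - rho**m)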

2  Numerical Comparisons

The following tables show numerical results for Erlang-C and RE/S as functions of ρ at selected values of m.

2.1  Erlang-C Function

Table 1 shows values of the Erlang-C function (3) for m = 1, 2, 3,  and 10.
Table 1: Selected values of the Erlang-C function
ρ m=1 m=2 m=3 m=10
0.1 0.1 0.0181818 0.0037037 1.12642 ×10−7
0.2 0.2 0.0666667 0.0246575 0.0000477373
0.25 0.25 0.1 0.0441176 0.000287631
0.5 0.5 0.333333 0.236842 0.0361054
0.7 0.7 0.576471 0.492344 0.221731
0.75 0.75 0.642857 0.567757 0.306611
0.8 0.8 0.711111 0.647191 0.40918
0.85 0.85 0.781081 0.730377 0.529861
0.9 0.9 0.852632 0.817061 0.668732
0.95 0.95 0.925641 0.907009 0.825586
0.98 0.98 0.970101 0.962451 0.928161
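
As a spot check, using the erlang_c sketch from Section 1.1, the m = 3, ρ = 0.5 entry of Table 1 is reproduced by:

    print(erlang_c(3, 3 * 0.5))   # 0.23684..., cf. 0.236842 in Table 1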

2.2  Exact Residence Times

Table 2 shows the normalized residence time RE/S from (2). For any particular service time S, multiply the appropriate entry in Table 2 by that value.
Table 2: Normalized residence times
ρ m=1 m=2 m=3 m=10
0.1 1.11111 1.0101 1.00137 1.
0.2 1.25 1.04167 1.01027 1.00001
0.25 1.33333 1.06667 1.01961 1.00004
0.5 2. 1.33333 1.15789 1.00722
0.7 3.33333 1.96078 1.54705 1.07391
0.75 4. 2.28571 1.75701 1.12264
0.8 5. 2.77778 2.07865 1.20459
0.85 6.66667 3.6036 2.62306 1.35324
0.9 10. 5.26316 3.72354 1.66873
0.95 20. 10.2564 7.04672 2.65117
0.98 50. 25.2525 17.0409 5.64081
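
Similarly, entries of Table 2 can be checked with the residence_exact sketch from Section 1:

    print(residence_exact(3, 0.5))    # 1.15789..., cf. Table 2
    print(residence_exact(10, 0.9))   # 1.66873..., cf. Table 2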

3  Morphing Models

The objective is to derive (2) in a more intuitive way than is found in standard textbooks or Erlang's original papers [Erlang 1909, Erlang 1917, Erlang 1920]. An exemplar is the following derivation of the M/M/1 residence time that does not resort to the use of Markov chains or other probabilistic formalism.
Starting with the arrival theorem (from Mean Value Analysis) [Gunther 1998]:
R = S + W
  = S + Q S
  = S + (λR) S    … from Q = λR
  = S + ρR    … from ρ = λS
On rearrangement, we find:
R = \frac{S}{1 − ρ}    (6)
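
The rearrangement can also be confirmed symbolically, e.g., with sympy (a small check added here for illustration):

    import sympy as sp

    R, S, rho = sp.symbols('R S rho', positive=True)
    sol = sp.solve(sp.Eq(R, S + rho * R), R)[0]
    print(sp.simplify(sol - S / (1 - rho)))   # 0, i.e., R = S/(1 - rho)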
If a similar derivation is not achievable for M/M/m, then I want to understand why not. What is it that makes the simple addition of servers to the same waiting line so much more complicated?

3.1  Morphing Multiserver

In [Gunther 2005, § 2.7], I show that (2) can be understood as arising from a kind of "morphing" process whereby the M/M/m multiserver queue behaves like a set of m parallel queues with little or no waiting at low loads and transforms itself into a single, high-speed, M/M/1 queue at high loads.
Such morphing can be demonstrated formally by generalizing (6) as:
R = φ(m, ρ) \frac{S}{1 − ρ}    (7)
where the morphing factor
φ(m, ρ) = \frac{1}{1 + ρ + ρ^2 + ... + ρ^{m−1}} = \frac{1 − ρ}{1 − ρ^m}    (8)
is the inverse of a finite geometric series with ρ defined in (1). See Section A.1.
This is an attractively simple result from a mathematical standpoint. However, substituting (8) into (7) yields the approximate formula (5), not the exact Erlang formula (2). Although morphing (in this sense) does not produce the exact result, it does suggest that the objective of a more intuitive derivation of the Erlang formula (2) is achievable.
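
In code, the morphing factor (8) and the generalized residence time (7) look as follows (an illustrative sketch); note that residence_morph reproduces residence_approx from Section 1.2, i.e., eqn. (5), not the exact eqn. (2):

    def phi(m, rho):
        """Morphing factor, eqn. (8): reciprocal of a finite geometric series."""
        return (1.0 - rho) / (1.0 - rho**m)

    def residence_morph(m, rho, S=1.0):
        """Eqn. (7): R = phi(m, rho) * S / (1 - rho), identical to eqn. (5)."""
        return phi(m, rho) * S / (1.0 - rho)

For example, residence_morph(3, 0.5) returns 1.14286, matching RA in Table 4, whereas the exact value is 1.15789.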
Remark 1 The motivation for the morphing function (8) comes from mathematical induction, starting with m = 1, 2, but why it works is never explained in [Gunther 2005]. How does this morphing occur and why is it associated with a geometric series? Some possible connections are:
  1. The population in an M/M/1 is geometrically distributed with mean p/q = ρ/(1 − ρ)
  2. Standard derivation of Erlang-C [Gunther 2010, § 5.1, eqns. 37a-g]
  3. Markov chains and product-form QNMs [Gunther 2005]
  4. Bubble diagrams [Gunther 1998, Chap. 14]
  5. A finite geometric series is associated with scalability in the IBM MP factor [Gunther 2002]
  6. Coxian stages [Allen 1990, § 3.2.11] and [Gunther 2002, § 3.2]
  7. Repairman model in Section A.3
Remark 2 In the limit m → ∞, (8) becomes:
φ(m → ∞, ρ) = \frac{1}{1 + ρ + ρ^2 + ...} → (1 − ρ)
and R → S in (7). See Section A.1. This limit is consistent with an infinite server or delay center.
Let's see if we can address Remark 1 about the dynamics of morphing.

3.2  Parallel Residence Time

The aggregate arrival rate λ in Figure 2 is split into m streams, each going to one of the m individual queues.
Figure 2: Parallel M/M/1 queues


Each queue sees only one m-th of the arrivals, i.e., an arrival rate λ/m. The corresponding residence time:
R_{para} = \frac{S}{1 − (λ/m) S}    (9)
is the same for each queue because they are operating in parallel.
Eqn. (9) is identical to an M/M/1 residence time but with the arrival rate reduced by a factor of m.

3.3  Tandem Residence Time

The aggregate arrival rate in Figure 3 is λ and is the same at each of the individual queues, but the service times are shorter by a factor of m. In other words, each server is running m times faster than a server in Figure 2.
Figure 3: Tandem M/M/1 queues


The corresponding residence time at each queue is therefore:
R_m = \frac{S/m}{1 − λ (S/m)}    (10)
Notice that (10) is identical to an M/M/1 residence time but with the service time reduced by a factor of m.
Remark 3 We also see (10) contained in the waiting time of (2).
Rewriting (2) more explicitly:
R_E = S + W = S + C \left( \frac{S/m}{1 − λS/m} \right)    (11)
the component of the W term in parentheses is identical to Rm in (10).
Theorem 1 [Parallelism] m parallel queues, each with service time S, are identical to m queues arranged sequentially with service times S/m.
Proof 1 The total time spent in Figure 3 is the sum of the residence times at each of the m queues:
R_{tand} = m × R_m = \frac{S}{1 − λ(S/m)}    (12)
Since (12) is identical to (9), Theorem 1 follows immediately.
Corollary 1 [Parallel is Serial] Fig. 2 is identical to Fig. 3, by Thm  1, so all parallel processing is just a form of fast sequential processing.
Indeed, this is precisely how parallel queues are implemented in PDQ [Gunther 2005, See, e.g., § 7.4.1].
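
A quick numerical illustration of Theorem 1 (the values of m, λ and S below are arbitrary examples):

    m, lam, S = 4, 0.6, 1.0                     # arbitrary values with lam*S/m < 1

    R_para = S / (1.0 - (lam / m) * S)          # eqn. (9): one of m parallel queues
    R_m    = (S / m) / (1.0 - lam * (S / m))    # eqn. (10): one tandem stage
    R_tand = m * R_m                            # eqn. (12): all m tandem stages

    print(R_para, R_tand)                       # both 1.17647..., as Theorem 1 asserts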

3.4  Meaning of Morphing

By defining
η_m(ρ) = 1 + ρ + ρ^2 + ... + ρ^{m−1}    (13)
the morphing factor (8) can be rewritten more simply as
φ = \frac{1}{η_m(ρ)}    (14)
and (7) becomes
R_η = \frac{m}{η_m(ρ)} \left( \frac{S/m}{1 − λ(S/m)} \right)    (15)
We write (13) with a subscript m to remind ourselves that, for a given number of servers, the morphing is controlled by the value of ρ.
Moreover, (15) is identical to RA in (5) but now expressed in terms of a fast M/M/1 queue. Numerical comparisons for m = 10 are shown in Table 3.
Table 3: Morphing residence times for fixed server speed S/10
ρ η 10/η Rη/S RA/S
0 1 10 1 1
0.1 1.11111 9. 1. 1.
0.2 1.25 8. 1. 1.
0.3 1.42856 7.00004 1.00001 1.00001
0.4 1.66649 6.00063 1.0001 1.0001
0.5 1.99805 5.00489 1.00098 1.00098
0.6 2.48488 4.02433 1.00608 1.00608
0.65 2.81868 3.54776 1.01365 1.01365
0.7 3.23917 3.08721 1.02907 1.02907
0.75 3.77475 2.64918 1.05967 1.05967
0.8 4.46313 2.24058 1.12029 1.12029
0.85 5.35417 1.8677 1.24514 1.24514
0.9 6.51322 1.53534 1.53534 1.53534
0.95 8.02526 1.24607 2.49213 2.49213
0.99 9.56179 1.04583 10.4583 10.4583
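
Table 3 can be regenerated from (13)-(15) with a few lines of Python (an illustrative sketch, reusing residence_approx from Section 1.2 for the RA column):

    def eta(m, rho):
        """Finite geometric series, eqn. (13)."""
        return sum(rho**k for k in range(m))

    m = 10
    for rho in (0.2, 0.65, 0.9):
        e = eta(m, rho)
        R_eta = (m / e) * (1.0 / m) / (1.0 - rho)   # eqn. (15) with S = 1, lam*(S/m) = rho
        print(rho, e, m / e, R_eta, residence_approx(m, rho))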
The ratio m/η in the third column of Table 3 provides a new interpretation of the morphing multiserver. What is m/η?
  1. Prob a server is busy: ρ = A/m.
  2. Prob all m servers are busy (on average) at offered load:
     0 ≤ \frac{η_m(ρ)}{m} < 1    (16)
Therefore, the average number of stages visited at load ρ is:
1 (high load) < \frac{m}{η_m(ρ)} ≤ m (low load)    (17)
Theorem 2 [Morphing] At very low loads, the number m/η of stages visited in Fig. 3 is m, since η = 1. This is logically equivalent to parallel processing by virtue of Theorem 1. As the load increases, the number of stages visited decreases until, at very high loads, only a single fast M/M/1 stage is visited since m/η → 1.
At first, Theorem 2 may seem counterintuitive. How can less be more? How can fewer servers produce more performance?
Consider Figure 4, which shows the relative residence times (R/S) for parallel and tandem queues. The top curve corresponds to the parallel queues in Figure 2. The bottom curve corresponds to Rm/S for an M/M/1 queue with a server that is 10 times faster than the parallel servers. The middle curves correspond to RE/S (solid) and Rη/S (dashed).
Figure 4: Comparison of parallel and sequential servers. The dots correspond to selected values of m/η from Table 3.


At zero load (ρ ≅ 0), Rη behaves like Rpara as discussed in Section 3.2, whereas at high load (ρ ≅ 1), Rη behaves like Rm as discussed in Section 3.3. The dots on the dashed middle curve correspond to the fraction of tandem stages visited at those load values. For example, from Table 3 there are only 8 stages out of the 10 total stages in operation at ρ = 0.2. At ρ = 0.9, only about 1.5 stages are active.
As the load increases, Rη interpolates between these two extremes by decreasing the total delay through the stages in Figure 3 as the waiting lines tend to grow. The total delay is reduced because requests end up visiting fewer stages than would have been the case in the original parallel configuration.
Figure 5: Visiting fewer tandem stages reduces total delay relative to parallel servers


The reduction in visited stages is shown by the downward arrows in Figure 5. The length of each arrow is given by
\left( 1 − \frac{1}{η} \right) R_{para}    (18)
What causes fewer tandem stages to be visited?
Conjecture 1 [Morphing Algorithm]
NJG on Sat Sep 11, 2010
Referring to Figure 3 (a simulation sketch of these rules appears at the end of this section):
  1. Min: A request must start service somewhere ⇒ visit at least one tandem stage.
  2. Max: The total accrued service time from visiting multiple stages should strive to reach S = m × S/m, but not exceed it.
  3. On completion of service S/m at a given stage, unless condition 2 is already met, the request looks for a new idle stage to visit. See Remark 4.
  4. Having satisfied 1, if no idle stage is available, the request exits the system.
  5. If all tandem stages are busy on arrival, the request joins the waiting line of a random stage.
Remark 4 We don't revisit the same M/M/1 stage to avoid any complications from feedback. All routing is feedforward.
Point 4 explains how the apparent stage-reduction to a single fast-M/M/1 queue occurs.
Remark 5 Is this a rendition of von Neumann's minimax principle?
Remark 6 It should be formally provable that algorithm 1 leads to the finite geometric series in (13). See Section A.3.
Remark 7 With enough statistical samples, algorithm 1 will produce a fractional number of stages, e.g., 3.54776 in Table 3 at ρ = 0.65.
Remark 8 Algorithm 1 may also explain why the morphing picture underestimates RE in (2) at high loads. Under heavy traffic, waiting lines can form at all m tandem stages whereas there is only a single waiting line in M/M/m.
Corollary 2 [Morphing is Asymmetric] By virtue of algorithm 1 the symmetry with the original parallel queues is lost. The reduction of visits to m/η stages means that the tandem configuration cannot be mapped back to a smaller subset of parallel queues. They cannot be equivalent because the service time remains fixed at S/m.
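
To make the conjectured dynamics concrete, the following is a rough discrete-event sketch of algorithm 1 (one reading of it, added for illustration; it is not code from any of the cited references). Tie-breaking choices that the conjecture leaves open are assumptions here: an arriving or migrating request picks a random idle stage, rule 2 is read as "visit at most m distinct stages", and a blocked arrival joins a random waiting line per rule 5. The observed mean number of stages visited can then be compared against m/η in Table 3, although the conjecture does not guarantee agreement:

    import heapq
    import random

    def simulate_morphing(m, lam, S, n_requests=100_000, seed=1):
        """Event-driven sketch of Conjecture 1: m tandem M/M/1 stages, each with
        mean service time S/m, feedforward routing, no revisits (Remark 4).
        Returns (mean residence time, mean number of stages visited)."""
        rng = random.Random(seed)
        mu = m / S                        # each stage runs m times faster
        busy = [None] * m                 # request in service at each stage
        queues = [[] for _ in range(m)]   # per-stage FIFO waiting lines
        events, seq, done = [], 0, []

        def push(t, kind, payload):
            nonlocal seq
            heapq.heappush(events, (t, seq, kind, payload))
            seq += 1

        def start_service(req, j, now):
            busy[j] = req
            req['visited'].add(j)
            push(now + rng.expovariate(mu), 'complete', j)

        t = 0.0
        for _ in range(n_requests):       # pre-schedule Poisson arrivals
            t += rng.expovariate(lam)
            push(t, 'arrive', None)

        while events:
            now, _, kind, j = heapq.heappop(events)
            if kind == 'arrive':
                req = {'arrival': now, 'visited': set()}
                idle = [k for k in range(m) if busy[k] is None]
                if idle:
                    start_service(req, rng.choice(idle), now)   # rule 1
                else:
                    queues[rng.randrange(m)].append(req)        # rule 5
            else:                                               # completion at stage j
                req, busy[j] = busy[j], None
                idle = [k for k in range(m)
                        if busy[k] is None and k not in req['visited']]
                if len(req['visited']) < m and idle:            # rules 2 and 3
                    start_service(req, rng.choice(idle), now)
                else:                                           # rule 4: exit the system
                    done.append((now - req['arrival'], len(req['visited'])))
                if queues[j]:                                   # freed stage takes next in line
                    start_service(queues[j].pop(0), j, now)

        return (sum(r for r, _ in done) / len(done),
                sum(v for _, v in done) / len(done))

For example, simulate_morphing(10, lam=9.0, S=1.0) corresponds to m = 10 and ρ = 0.9, the regime where Table 3 gives m/η ≈ 1.54.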

A  Appendices

A.1  Series


\frac{1}{1 − ρ} = 1 + ρ + ρ^2 + ρ^3 + ...    (19)

\frac{1}{1 − ρ^m} = 1 + ρ^m + ρ^{2m} + ρ^{3m} + ...    (20)

\frac{1 − ρ^m}{1 − ρ} = 1 + ρ + ρ^2 + ρ^3 + ... + ρ^{m−1}    (21)

\frac{1}{1 − ρ^m} = \left( \frac{1 − ρ}{1 − ρ^m} \right) \left( \frac{1}{1 − ρ} \right)    (22)

A.2  Approximation Errors

The negative sign means that RA in (5) underestimates RE in (2).
Table 4: Error range for m = 3
ρ RE RA % Error
0.1 1.00137 1.001 -0.0370233
0.2 1.01027 1.00806 -0.218699
0.3 1.03335 1.02775 -0.541718
0.4 1.07843 1.06838 -0.932401
0.5 1.15789 1.14286 -1.2987
0.6 1.29562 1.27551 -1.55217
0.65 1.40118 1.3786 -1.61175
0.7 1.54705 1.52207 -1.61465
0.75 1.75701 1.72973 -1.55262
0.8 2.07865 2.04918 -1.41781
0.85 2.62306 2.59151 -1.20265
0.9 3.72354 3.69004 -0.899678
0.95 7.04672 7.01139 -0.501367
0.99 33.7056 33.6689 -0.1089
Maximum error ≈ 1.6%.
Table 5: Error range for m = 10
ρ RE RA % Error
0 1 1 0
0.1 1. 1. -1.2416 ×10−6
0.2 1.00001 1. -0.000586472
0.3 1.00017 1.00001 -0.0159397
0.4 1.00147 1.0001 -0.136225
0.5 1.00722 1.00098 -0.619879
0.6 1.02532 1.00608 -1.87662
0.65 1.04392 1.01365 -2.89985
0.7 1.07391 1.02907 -4.17556
0.75 1.12264 1.05967 -5.60912
0.8 1.20459 1.12029 -6.99822
0.85 1.35324 1.24514 -7.98864
0.9 1.66873 1.53534 -7.99359
0.95 2.65117 2.49213 -5.99887
0.99 10.6374 10.4583 -1.68363
Maximum error ≈ 8%.
Table 6: Error range for m = 20
ρ RE RA % Error
0 1 1 0
0.1 1. 1. -3.5527 ×10−13
0.2 1. 1. -6.4666 ×10−8
0.3 1. 1. -0.0000380073
0.4 1.00002 1. -0.00220676
0.5 1.00037 1. -0.037202
0.6 1.00302 1.00004 -0.297128
0.65 1.00715 1.00018 -0.692049
0.7 1.01559 1.0008 -1.45678
0.75 1.03209 1.00318 -2.8006
0.8 1.06402 1.01166 -4.92056
0.85 1.12835 1.04032 -7.80161
0.9 1.27538 1.1384 -10.7404
0.95 1.7554 1.55881 -11.1991
0.99 5.73933 5.4917 -4.31472
Maximum error ≈ 11%.
Table 7: Error range for m = 64
ρ RE RA % Error
0 1 1 0
0.1 1. 1. 0.
0.2 1. 1. 0.
0.3 1. 1. 0.
0.4 1. 1. -3.4956 ×10−10
0.5 1. 1. -1.3324 ×10−6
0.6 1. 1. -0.000404175
0.65 1.00004 1. -0.00361036
0.7 1.00023 1. -0.0229767
0.75 1.00111 1. -0.111269
0.8 1.00438 1. -0.435968
0.85 1.01495 1.00003 -1.46965
0.9 1.04854 1.00118 -4.51682
0.95 1.18375 1.03899 -12.2294
0.99 2.41554 2.10791 -12.7355
Maximum error ≈ 13%.
Table 8: Error range for m = 128
ρ RE RA % Error
0 1 1 0
0.1 1. 1. 0.
0.2 1. 1. 0.
0.3 1. 1. 0.
0.4 1. 1. 0.
0.5 1. 1. -2.0206 ×10−12
0.6 1. 1. -1.1883 ×10−7
0.65 1. 1. -7.2607 ×10−6
0.7 1. 1. -0.000216284
0.75 1.00004 1. -0.00354116
0.8 1.00036 1. -0.035532
0.85 1.00244 1. -0.243851
0.9 1.01319 1. -1.30205
0.95 1.07153 1.00141 -6.54384
0.99 1.67795 1.3817 -17.6558
Maximum error ≈ 18%.
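
The error tables in this appendix can be regenerated from the sketches in Section 1, for example:

    def pct_error(m, rho):
        """Percentage error of R_A relative to R_E, as tabulated above."""
        r_exact = residence_exact(m, rho)
        r_approx = residence_approx(m, rho)
        return 100.0 * (r_approx - r_exact) / r_exact

    for rho in (0.5, 0.9, 0.99):
        print(rho, pct_error(20, rho))   # cf. Table 6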

A.3  Revisit the Repairman

Need to review the M/M/1//N queue more carefully [Allen 1990, p.270].

A = λS = \frac{λ}{μ}    (23)
and the steady-state probabilities:
p_k = p_0 A^k ,   k = 0, 1, 2, …, N    (24)
Sum of states:
\sum_{k=0}^{N} p_k = 1    (25)
The normalizing probability is a finite geometric series:
p_0 = \frac{1 − A}{1 − A^{N+1}}    (26)
which is the same expression as φ(m,ρ) in (8).
Proof 2
1 = p_0 + p_1 + … + p_N
  = p_0 \left( 1 + \frac{p_1}{p_0} + \frac{p_2}{p_0} + \frac{p_3}{p_0} + … + \frac{p_N}{p_0} \right)
  = p_0 \left( 1 + A + A^2 + A^3 + … + A^N \right)
  = p_0 \frac{1 − A^{N+1}}{1 − A}
Solving for p_0 yields (26).
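
A quick numerical sanity check of (24)-(26), with illustrative values of A and N:

    A, N = 0.75, 9                                   # arbitrary example values
    p0 = (1 - A) / (1 - A**(N + 1))                  # eqn. (26)
    print(sum(p0 * A**k for k in range(N + 1)))      # ~1.0: the p_k sum to one, eqn. (25)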
Not all traffic reaching the system enters it. The probability that an arriving request is not admitted, because there are already N requests in the system, is p_N. The mean (effective) arrival rate is then:
λ_A = λ (1 − p_N)    (27)
The probability that an arriving request finds k requests already in the system is:
q_k = \frac{p_k}{1 − p_N}    (28)
Also cf. the ratio in (3):
\frac{B}{1 − (1 − B) ρ}    (29)

B  Bibliography

[Allen 1990]
A. Allen, "Probability, Statistics, and Queueing Theory," Academic Press, 1990
[Erlang 1909]
A. Erlang, "The Theory of Probabilities and Telephone Conversations," Nyt Tidsskrift for Matematik B, v. 20, p. 33, 1909
[Erlang 1917]
A. Erlang, "Solution of Some Problems in the Theory of Probabilities of Significance in Automatic Telephone Exchanges," Electroteknikeren, v. 13, p. 5, 1917
[Erlang 1920]
A. Erlang, "Telephone Waiting Times," Nyt Tidsskrift for Matematik B, v. 31, p. 25, 1920
[Gunther 1998]
N. Gunther, The Practical Performance Analyst, McGraw-Hill, 1998
[Gunther 2002]
N. J. Gunther, "A New Interpretation of Amdahl's Law and Geometric Scalability," arxiv.org/abs/cs/0210017
[Gunther 2005]
N. J. Gunther, Analyzing Computer System Performance with Perl::PDQ, Springer 2005
[Gunther 2010]
N. J. Gunther, "Erlang Redux," Unpublished manuscript


