How to Quantify Scalability

How to Quantify Scalability

The Universal Scalability Law

This page is a supplement to Chapters 4-6:
  1. Scalability—A Quantitative Approach
  2. Evaluating Scalability Parameters
  3. Software Scalability
in my Guerrilla Capacity Planning book [Gunther 2007], along with the section on the Universal Scalability Law (USL), originally entitled "Scalability on a Stick," in the printed lift-out Guerrilla Manual included inside the back jacket of the original book printing.
It's also an attempt to provide a quick overview of the USL methodology, including the latest developments since the book was published in 2007.
News: Application of the USL to multicore scalability will appear in a forthcoming book [Wiley 2012].

Contents

1  Universal Scalability Law (USL)
    1.1  USL Model
    1.2  Justification
    1.3  Applicability
2  How to Use It
    2.1  Virtual load testing
    2.2  Number of Measurements
    2.3  Detecting bad measurements
    2.4  Performance heuristics
    2.5  Performance diagnostics
    2.6  Production environments
    2.7  My Blog
3  Tools for Using USL
    3.1  Excel Spreadsheet
    3.2  OpenOffice Spreadsheet
    3.3  R script
4  Presentations
    4.1  Hotsos 2011
    4.2  Surge 2010
    4.3  Velocity 2010
    4.4  Ignite! 2009
5  Bibliography

1  Universal Scalability Law (USL)

The original derivation of the USL was presented at the 1993 CMG conference [Gunther 1993]. A brief account of its application to parallel processing performance (under the name super-serial model) was presented in Chaps. 6 and 14 of my first book [Gunther 1998]. A more complete derivation with example applications is presented in Chaps. 4-6 of my GCaP book [Gunther 2007]. The supporting mathematical theorems (see Section 1.2) are presented in papers cited in the Bibliography (see Section 5).
Some reasons why you should understand the USL include:
  1. A lot of people use the term "scalability" without clearly defining it, let alone defining it quantitatively. Computer system scalability must be quantified. If you can't quantify it, you can't guarantee it. The universal law of computational scaling provides that quantification.

  2. One of the greatest impediments to applying queueing-theory models (whether analytic or simulation) is the inscrutibility of service times within an application. Every queueing facility in a performance model requires a service time as an input parameter. Without the appropriate queues in the model, system performance metrics like throughtput and response time, cannot be predicted. The USL leapfrogs this entire problem by NOT requiring ANY low-level service time measurements as inputs.

1.1  USL Model

The universal scalability law (USL) combines the following independent effects into a single equation (1) expressed in terms of two parameters or coefficients: α and β.

A. Equal bang for the buck B. Cost of sharing resources C. Diminishing returns at higher loads D. Negative return on investment
sdet bench
sdet bench
sdet bench
sdet bench
α = 0, β = 0 α > 0, β = 0 α > 0, β = 0 α > 0, β > 0
The relative capacity C(N) is a normalized throughput given by:
C(N)  =   N

1 + α (N − 1) + β N (N − 1)
(1)
where N represents either:
  1. (Software Scalability) the number of users or load generators on a fixed hardware configuration. In this case, the number of users acts as the independent variable while the CPU configuration remains constant for the range of user load measurements.

  2. (Hardware Scalability) the number of physical processors or nodes in the hardware configuration. In this case, the number of user processes executing per CPU (say 10) is assumed to be the same for every added CPU. Therefore, on a 4 CPU platform you would run 40 virtual users.

with α (alpha) the contention parameter, and β (beta) a measure of the coherency delay. A non-zero value of β causes the throughput to go retrograde (i.e., decreases with increasing load).
NOTE: The objective of using eqn.(1) is NOT to produce a curve that passes through every data point. That's called curve fitting and that's what graphics artists do with splines. As von Neumann said, "Give me 4 parameters and I'll fit an elephant. Give me 5 and I'll make its trunk wiggle!" (At least I only have 2)
The fitted coefficients in eqn.(1) then provide an estimate of the user load at which the maximum scalability will occur in case D:
Nmax  = 

 

 (1  −  α) / β
 
(2)
and the corresponding throughput Xmax at load Nmax is given by:
Xmax  =  X(1) * C(Nmax)
(3)
Scalability profiles B and C (above) correspond to β = 0, which means that Nmax occurs at infinity. Put differently, the scalability curve simply approaches a ceiling at 1/α (the horizontal dashed line in the plots). See Chapter 6 "Software Scalability" in Guerrilla Capacity Planning.

1.2  Justification

The following theorem provides the justification for applying the same USL eqn.(1) to both hardware and software systems:
Theorem (Gunther 2008): The USL is equivalent to the synchronous queueing bound on throughput for a linear load-dependent machine repairman model of a multiprocessor.
The proof can be found in [Gunther 2008]. There is an analogy with Amdahl's law, which can also be interpreted queue-theoretically [Gunther 2002]:
Theorem (Gunther 2002): Amdahl's law for parallel speedup is equivalent to the synchronous queueing bound on throughput in the machine repairman model of a multiprocessor.
Amdahl's law is therefore subsumed by the USL because it corresponds to the case β = 0. Both Amdahl's law and the USL belong to a class of mathematical functions called Rational Functions. Moreover, these theorems have also been confirmed experimentally using event-driven simulations [HoltGun 2008]. In other words, the whole USL approach to quantifying scalability rests on a fundamentally sound physical footing.

1.3  Applicability

This model has wide-spread applicability, including: That's why it's called universal.

2  How to Use It

This section provides some suggestions for ways in which the USL might be applied.

2.1  Virtual load testing

The USL in eqn.(1) allows you take a sparse set of load measurements—at least half a dozen data points (see Section 2.2)—and from those data determine how your application will scale under larger user loads than you may be able to generate in your test lab. This can all be done in a spreadsheet tool like Excel or OpenOffice.

sdet excel

2.2  Number of Measurements

In my GCaP book, I say a minimum of 4 data points (i.e., load points for the independent random variable or regressor N or p) are needed to do the USL regression, but that was based on the principle of polynomial fitting where you need 1 more point than the degree of the polynomial you're fitting. Since the USL polynomial in Excel is a quadratic (degree 2), you would need 3 data points to fit a polynomial unambiguously. But we are not fitting polynomials, we're doing statistical regression analysis. So, I increased the minimum to 4 data points in order to be more statistical meaningful. Since then, I've tended to say: on the order of 6 data points, recognizing that the larger number of measurements can sometimes be a struggle for people to attain in some test environments. The ultimate question is, how meaningful do you want your scalability analysis to be?
Don't bitch about the minimum requirement of 6 data points in Section 2.1. Microsoft Photosynth, for example, requires that you take a minimum of 200 photographs for something that is ultimately completely useless. At least with the USL you learn something.
However, as I was reminded in our GDAT class, a 95% confidence level has an approximate error that is proportional to 1/√(n), where n is the sample size, i.e., the number of measurements. A 95% confidence level means that there is only a 5% chance that the sample results will differ from the true population average.
Table 1: Error bound for a sample size of n based on Student's t-test where SD is the sample std deviation
n Actual Error Bound
 100   t(0.975,99) × SD(n)//√(100)    > 10% 
 25   t(0.975,24) × SD(n)//√(25)    > 20% 
 10   t(0.975,9) × SD(n)//√(10)    > 30% 
 6   t(0.975,5) × SD(n)//√(6)    > 40% 
 4   t(0.975,3) × SD(n)//√(4)    > 50% 
According to Table 1, a sample size of n = 4 carries an error margin greater than 50%. That's why I've since upped the requirement to n = 6 data points in Section 2.1. Even then, the improvement is marginal.
But let's not get too carried away with technicalities. Table 1 assumes a sampling (i.e., re-measuring) the same configuration, viz., N or p in test rig or production environment. That almost never happens. In fact, we're lucky to get a single measurement per configuration. Therefore, when it comes to statistical regression using the USL model, at best we can only be referring to n samples of the total system throughput X where the random variables N or p are also changing! This is hardly the conventional statistical procedure, but that's life in the world of computer performance analysis. Sometimes we have to make do with what we get handed to us. Since this is not a limitation of the USL, the only way to improve the situation is to get better measurements.
In other words, let's not nitpick about how well we are hitting the side of the barn, when the measurements are so statistically uncertain that we're not even sure whose farm we're in.

2.3  Detecting bad measurements

Equation (1) is not a crystal ball. It cannot foretell the onset of broken measurements or intrinsic pathologies. When the data diverge from the model, that does not automatically make the model wrong. You need to stop measuring and find out why.

2.4  Performance heuristics

The relative sizes of the α and β parameters tell you respectively, whether contention effects or coherency effects are responsible for poor scalability.

2.5  Performance diagnostics

What makes (1) easy to apply, also limits its diagnostic capability. If the parameter values are poor, you cannot use it to tell you what to fix. All that information is in there alright, but it's compressed into the values of those two little parameters. However, other people e.g., application developers (the people who wrote the code), the systems architect, may easily identify the problem once the universal law has told them they need to look for one.

2.6  Production environments

Applying the USL to performance data collected from production environments with mixed workloads is a current area of research.
The main issue is determining the appropriate independent variable, e.g., N users or processes, not dependent variables like utilization ρ(N). Then you only need X(N) data as the dependent variable to regress against. Those X(N) values should be determined from data that are collected during a measurement window with as few large-transient effects as possible, i.e., close to steady state. From there, everything should go as per the usual procedure described in Section 2.

2.7  My Blog

Various subtle points about using the USL are covered in my blog postings and the associated commentaries.

3  Tools for Using USL

USL models for estimating both hardware (p) and software (N) scalability. These are the same tools as discussed in my Guerrilla Capacity Planning and Guerrilla Data Analysis Techniques classes.

3.1  Excel Spreadsheet

EXCEL spreadsheet USLcalc.xls.

3.2  OpenOffice Spreadsheet

Compliments of a GCaP guerrilla): USLcalc.ods (Compensates for the lack of a polynomial fitting routine).

3.3  R script

USLcalc.r and example data USLcalc-data.txt.
Download the R software if you don't already have it.

4  Presentations

Here are some recent and upcoming presentations that involve USL analysis.

4.1  Hotsos 2011

Hotsos ORACLE performance symposium
Irving, TX, March 6—10, 2011
Brooks, Cooks and Response Time Scalability
Successful database scaling to meet service level objectives (SLO) requires converting standard Oracle performance data into a form suitable for cost-benefit comparison, otherwise you are likely to be groping in the dark or simply relying on someone else's scalability recipes. Creating convenient cost-benefit comparisons is the purpose of the Universal Scalability Law (USL), which I have previously presented using transaction throughput measurements as the appropriate performance metric. However, Oracle AWR and OWI data are largely based on response time metrics rather than throughput metrics.
In this presentation, I will show you how the USL can be applied to response time data. A surprising result is that the USL contains Brooks' law, which is essentially a variant of the old too-many-cooks adage: hiring more cooks at the last minute to ensure the meal is prepared on time can cause the preparation to take longer. From the standpoint of the USL, this kind of delay inflation arises from two unique interactions in the kitchen: group conferences and tête-à-têtes between the experienced cooks and the rookie cooks. Replace cooks with Oracle average active sessions (AAS) and similar response-time inflation can impact your database SLOs. The USL reveals this effect in a numerical way for easier analysis.
Examples of applying the USL to Oracle performance data will be used to examine Brooks' law and underlying concepts such as delay, wait, latency, averages and response time percentiles.

4.2  Surge 2010

Surge 2010: The Scalability and Performance Conference
Baltimore, MD, Sep 30—Oct 1, 2010
Quantifying Scalability FTW
You probably already collect performance data, but data ain't information. Successfull scalability requires transforming your data so that you can quantify the cost-benefit of any architectural decisions. In other words:
measurement  +  models  ==  information
So, measurement alone is only half the story; you need a method to transform your data. In this presentation I will show you a method that I have developed and applied successfully to large-scale web sites and stack applications to quantify the benefits of proposed scaling strategies. To the degree that you don't quantify your scalability, you run the risk of ending up with WTF rather than FTW.
Speaker list

4.3  Velocity 2010

Accepted title and abstract:
Hidden Scalability Gotchas in Memcached and Friends*
Neil Gunther (Performance Dynamics), Shanti Subramanyam (Oracle USA), Stefan Parvu (Oracle Finland)
Most web deployments have standardized on horizontal scaleout in every tier: web, application, caching and database; using cheap, off-the-shelf, white boxes. In this approach, there are no real expectations for vertical scalability of server apps like memcached or the full LAMP stack. But with the potential for highly concurrent scalability offered by newer multicore processors, it is no longer cost-effective to ignore their underutilization due to poor, thread-level, scalability of the web stack. In this session we show you how to quantify scalability with the Universal Scalability Law (USL) by demonstrating its application to actual performance data collected from a memcached benchmark. As a side effect of our technique, you will see how the USL also identifies the most signficant performance tuning opportunities to improve web app scalability.
* This work was performed while two of us (S.S. and S.P.) were employed by Sun Microsystems, prior to its acquisition by Oracle Corporation.
Cite as:
N. J. Gunther, S. Subramanyam, and S. Parvu. "Hidden Scalability Gotchas in Memcached and Friends." In VELOCITY Web Performance and Operations Conference, Santa Clara, California, O'Reilly, June, 22-24 2010.
http://velocityconf.com/velocity2010/public/schedule/detail/13046
Presentation slides (PDF)
Blog post about being accepted for the Velocity 2010 conference Web Ops track.
My review of the conference.

4.4  Ignite! 2009

Ignite! Velocity (San Jose).
Scalability for QuantHeads: How to Do It Between Spreadsheets
Why are there no 30 ft giants like the one in " Jack and the Beanstalk"? Why are there no 2,000 ft(?) beanstalks, for that matter?
Engineering weight-strength models tell us their scalability reaches a critical point and they would BOTH collapse. How big can your web site scale before a performance collapse? How do you know? Can you prove it?
Measurement alone is NOT sufficient. How do you know those data are correct? You need both: Data + Models == Insight. I'll show you how to prove it using simple spreadsheets. This technique leads to the concept of scalability zones.
Here's what Websphere scalability looks like when it's quantified. And it's real.

5  Bibliography

[Gunther 1993]
N. J. Gunther, "A Simple Capacity Model of Massively Parallel Transaction Systems," CMG National Conference, 1993 (PDF)
[Gunther 1998]
N. J. Gunther, The Practical Performance Analyst, McGraw-Hill, 1998 (Second printing iUniverse 2000)
[Gunther 2002]
N. J. Gunther, "A New Interpretation of Amdahl's Law and Geometric Scalability," arxiv.org/abs/cs/0210017
[Gunther 2007]
N. J. Gunther, Guerrilla Capacity Planning, Springer 2007
[Gunther 2008]
N. J. Gunther, "A General Theory of Computational Scalability Based on Rational Functions," arxiv.org/abs/0808.1431
[HoltGun 2008]
J. Holtman and N. J. Gunther, "Getting in the Zone for Successful Scalability," arxiv.org/abs/0809.2541
[Wiley 2012]
N. J. Gunther and S. Subramanyam and S. Parvu, "A Methodology for Optimizing Multithreaded System Scalability on Multi-cores," in Programming Multi-core and Many-core Computing Systems, eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computing. in press



File translated from TEX by TTH, version 3.38.
On 8 Nov 2011, 07:53.