How to Quantify Scalability

The Universal Scalability Law (USL)

"The purpose of models is not to fit the data but to sharpen the questions." (Sam Karlin)
This page is intended to supplement Chapters 4, 5 and 6 in my Guerrilla Capacity Planning book [Gunther 2007]:
  1. Scalability—A Quantitative Approach
  2. Evaluating Scalability Parameters
  3. Software Scalability
It's also an attempt to provide a quick overview of the USL methodology, including the latest developments since the book was published in 2007.
News: Application of the USL to multicore scalability will appear in a forthcoming book [Wiley 2014].


1  Universal Scalability Law (USL)
    1.1  The Scalability Model
    1.2  Theoretical Justification
        1.2.1  Theorem (Gunther 2008)
        1.2.2  Theorem (Gunther 2002)
    1.3  Applicability
2  How to Use It
    2.1  Virtual load testing
    2.2  Number of Measurements
    2.3  Detecting bad measurements
    2.4  Performance heuristics
    2.5  Performance diagnostics
    2.6  Production environments
    2.7  Scalability Zones
    2.8  My Blog
3  Tools for Using USL
    3.1  Excel Spreadsheet
    3.2  OpenOffice Spreadsheet
    3.3  R Scripts
4  Presentations
    4.1  Hotsos 2011
    4.2  Surge 2010
    4.3  Velocity 2010
    4.4  Ignite! 2009
5  Bibliography

1  Universal Scalability Law (USL)

The original derivation of the Universal Scalability Law, or USL, was presented at the 1993 CMG conference [Gunther 1993]. A brief account of its application to parallel processing performance (under the name super-serial model) was presented in Chaps. 6 and 14 of my first book [Gunther 1998]. A more complete derivation with example applications is presented in Chaps. 4-6 of my GCaP book [Gunther 2007]. The supporting mathematical theorems (see Section 1.2) are presented in papers cited in the Bibliography (see Section 5).
Some reasons why you should understand the USL include:
  1. A lot of people use the term "scalability" without clearly defining it, let alone defining it quantitatively. Computer system scalability must be quantified. If you can't quantify it, you can't guarantee it. The universal law of computational scaling provides that quantification.

  2. One of the greatest impediments to applying queueing-theory models (whether analytic or simulation) is the inscrutability of service times within an application. Every queueing facility in a performance model requires a service time as an input parameter. Without the appropriate queues in the model, system performance metrics like throughput and response time cannot be predicted. The USL leapfrogs this entire problem by NOT requiring ANY low-level service time measurements as inputs.

1.1  The Scalability Model

The USL (universal scalability law) combines the following essential effects:

[Figure: four SDET benchmark scalability profiles]
A. Equal bang for the buck (α = 0, β = 0)
B. Cost of sharing resources (α > 0, β = 0)
C. Diminishing returns from contention (α >> 0, β = 0)
D. Negative returns from incoherency (α >> 0, β > 0)

into a single scalability model that defines the relative capacity C(N):

C(N) = N / [1 + α(N − 1) + β N(N − 1)]    (1)
The three terms in the denominator of eqn. (1) are associated respectively with the three Cs: the level of concurrency, a contention penalty (with strength α), and a coherency penalty (with strength β). The parameter values are defined in the range 0 ≤ α, β < 1. The independent variable N can represent either
Software Scalability:
Here, the number of users or load generators (N) is incremented on a fixed hardware configuration. In this case, the number of users acts as the independent variable while the processor configuration remains fixed over the range of user-load measurements. This is the most common situation found in load testing environments where tools like HP's LoadRunner or Apache JMeter are used.
Hardware Scalability:
Here, the number of physical processors (N) is incremented in the hardware configuration while keeping the user load per processor fixed. In this case, the number of users executing per processor (e.g., 100 users per processor) is assumed to remain the same for every added processor. For example, on a 32 processor platform you would apply a load of N = 3200 users to the test platform.
Equation (1) tells us that hardware and application scalability are two sides of the same coin: something not generally recognized. A non-zero value of β is associated with measured throughput that goes retrograde, i.e., decreases with increasing load or platform configuration.
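To make eqn. (1) concrete, here is a minimal Python sketch (the function name and the α, β values are my own illustrative choices, not taken from the text) showing how a non-zero β eventually drives capacity retrograde:

```python
def usl_capacity(N, alpha, beta):
    # Relative capacity C(N) from eqn. (1): concurrency, contention, coherency.
    return N / (1.0 + alpha * (N - 1) + beta * N * (N - 1))

# With beta > 0, relative capacity eventually *decreases* as the load N grows.
for N in (1, 8, 16, 32, 64):
    print(N, round(usl_capacity(N, alpha=0.05, beta=0.001), 2))
```

For these illustrative parameter values, capacity rises up to roughly N ≈ 31 and then declines, which is exactly the retrograde behavior described above; with β = 0 the decline never occurs.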
NOTE: The objective of using eqn. (1) is NOT to produce a curve that passes through every data point. That's called curve fitting, and it's what graphics artists do with splines. As von Neumann said, "Give me 4 parameters and I'll fit an elephant. Give me 5 and I'll make its trunk wiggle!" (At least I only have 2.)
The fitted coefficients in eqn.(1) then provide an estimate of the user load at which the maximum scalability will occur in case D:
Nmax = √[(1 − α) / β]
and the corresponding throughput Xmax at load Nmax is given by:
Xmax = X(1) × C(Nmax)
Scalability profiles B and C (above) correspond to β = 0, which means that Nmax occurs at infinity. Put differently, the scalability curve simply approaches a ceiling at 1/α (the horizontal dashed line in the plots). See Chapter 6 "Software Scalability" in Guerrilla Capacity Planning.
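As a quick numerical check (a Python sketch; the α, β coefficients and the X(1) value are illustrative assumptions, not fitted from real data), Nmax and Xmax follow directly from the two formulas above:

```python
import math

def usl_capacity(N, alpha, beta):
    # Relative capacity C(N) per eqn. (1)
    return N / (1.0 + alpha * (N - 1) + beta * N * (N - 1))

alpha, beta = 0.05, 0.001   # illustrative fitted coefficients
X1 = 100.0                  # illustrative single-user throughput X(1)

Nmax = math.sqrt((1 - alpha) / beta)          # load at which scalability peaks
Xmax = X1 * usl_capacity(Nmax, alpha, beta)   # corresponding peak throughput
print(round(Nmax, 1), round(Xmax, 1))
```

Note that Nmax is the true maximum of C(N): evaluating eqn. (1) on either side of it gives a lower capacity, and as β → 0 the peak recedes to infinity, consistent with profiles B and C.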

1.2  Theoretical Justification

The following theorem provides the justification for applying the same USL eqn.(1) to both hardware and software systems:

1.2.1  Theorem (Gunther 2008)

The USL is equivalent to the synchronous queueing bound on throughput for a linear load-dependent machine repairman model of a multiprocessor.
The proof can be found in [Gunther 2008].

1.2.2  Theorem (Gunther 2002)

There is an analogy with Amdahl's law, which can also be interpreted queue-theoretically [Gunther 2002]:
Amdahl's law for parallel speedup is equivalent to the synchronous queueing bound on throughput in the machine repairman model of a multiprocessor.
Amdahl's law is therefore subsumed by the USL because it corresponds to the case β = 0. Both Amdahl's law and the USL belong to a class of mathematical functions called Rational Functions. Moreover, these theorems have also been confirmed experimentally using event-driven simulations [HoltGun 2008]. In other words, the whole USL approach to quantifying scalability rests on a fundamentally sound physical footing.
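This reduction is easy to verify numerically. The following Python sketch (with an arbitrary illustrative α) confirms that setting β = 0 in eqn. (1) reproduces Amdahl's speedup N/(1 + α(N − 1)) at every load point:

```python
def usl_capacity(N, alpha, beta):
    # Relative capacity C(N) per eqn. (1)
    return N / (1.0 + alpha * (N - 1) + beta * N * (N - 1))

def amdahl_speedup(N, alpha):
    # Amdahl's law with serial fraction alpha
    return N / (1.0 + alpha * (N - 1))

# beta = 0 collapses the USL to Amdahl's law at every N.
for N in range(1, 65):
    assert abs(usl_capacity(N, 0.03, 0.0) - amdahl_speedup(N, 0.03)) < 1e-12
print("USL with beta = 0 reproduces Amdahl's law")
```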

1.3  Applicability

This model has widespread applicability to hardware and software systems alike, from multicore processors to databases and large-scale web sites. That's why it's called universal.

2  How to Use It

This section provides some suggestions for ways in which the USL might be applied.

2.1  Virtual load testing

The USL in eqn.(1) allows you to take a sparse set of load measurements—at least half a dozen data points (see Section 2.2)—and from those data determine how your application will scale under larger user loads than you may be able to generate in your test lab. This can all be done in a spreadsheet tool like Excel or OpenOffice.
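Outside a spreadsheet, the same regression can be sketched in a few lines of Python using the inversion transform mentioned in Section 3.1 (the load points and throughput numbers below are synthetic, purely for illustration):

```python
import numpy as np

# Half a dozen (N, throughput) load-test points -- synthetic, for illustration only.
N = np.array([1, 4, 8, 16, 24, 32], dtype=float)
X = np.array([100.0, 340.0, 560.0, 800.0, 880.0, 900.0])

C = X / X[0]                  # relative capacity data, so C(1) = 1
y = N / C - 1.0               # inversion transform: y = alpha*(N-1) + beta*N*(N-1)
A = np.column_stack([N - 1.0, N * (N - 1.0)])
(alpha, beta), *_ = np.linalg.lstsq(A, y, rcond=None)

def usl(n):
    # Fitted relative capacity per eqn. (1)
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))

# Project throughput beyond the measured range: the "virtual load test".
print(alpha, beta, X[0] * usl(64))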



2.2  Number of Measurements

In my GCaP book, I say a minimum of 4 data points (i.e., load points for the independent variable or regressor N or p) are needed to do the USL regression, but that was based on the principle of polynomial fitting, where you need one more point than the degree of the polynomial being fitted. Since the USL polynomial in Excel is a quadratic (degree 2), 3 data points would fit it unambiguously. But we are not fitting polynomials; we're doing statistical regression analysis. So, I increased the minimum to 4 data points in order to be more statistically meaningful. Since then, I've tended to say on the order of 6 data points, recognizing that a larger number of measurements can be a struggle to attain in some test environments. The ultimate question is: how meaningful do you want your scalability analysis to be?
Don't bitch about the minimum requirement of 6 data points in Section 2.1. Microsoft Photosynth, for example, requires that you take a minimum of 200 photographs for something that is ultimately completely useless. At least with the USL you learn something.
However, as I was reminded in our GDAT class, a 95% confidence level has an approximate error that is proportional to 1/√(n), where n is the sample size, i.e., the number of measurements. A 95% confidence level means that there is only a 5% chance that the sample results will differ from the true population average.
Table 1: Error bound for a sample size n, based on Student's t-test, where SD is the sample standard deviation

  n     Actual error                  Bound
  100   t(0.975, 99) × SD(n)/√100     > 10%
  25    t(0.975, 24) × SD(n)/√25      > 20%
  10    t(0.975, 9) × SD(n)/√10       > 30%
  6     t(0.975, 5) × SD(n)/√6        > 40%
  4     t(0.975, 3) × SD(n)/√4        > 50%
According to Table 1, a sample size of n = 4 carries an error margin greater than 50%. That's why I've since upped the requirement to n = 6 data points in Section 2.1. Even then, the improvement is marginal.
But let's not get too carried away with technicalities. Table 1 assumes sampling (i.e., re-measuring) the same configuration, viz., the same N or p, in a test rig or production environment. That almost never happens. In fact, we're lucky to get a single measurement per configuration. Therefore, when it comes to statistical regression using the USL model, at best we can only be referring to n samples of the total system throughput X where the random variables N or p are also changing! This is hardly the conventional statistical procedure, but that's life in the world of computer performance analysis. Sometimes we have to make do with what we're handed. Since this is not a limitation of the USL, the only way to improve the situation is to get better measurements.
In other words, let's not nitpick about how well we are hitting the side of the barn, when the measurements are so statistically uncertain that we're not even sure whose farm we're in.
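The 1/√n shrinkage behind Table 1 can be checked with a short Python sketch (the critical values below are standard two-sided 95% Student-t table entries; the error is expressed here as a multiple of the sample SD, since the SD itself depends on your data):

```python
import math

# Two-sided 95% Student-t critical values t(0.975, n-1), read from a standard t-table.
t_crit = {100: 1.984, 25: 2.064, 10: 2.262, 6: 2.571, 4: 3.182}

# Half-width of the 95% confidence interval, as a multiple of the sample SD:
# error ~ t(0.975, n-1) / sqrt(n), which shrinks roughly like 1/sqrt(n).
for n in sorted(t_crit, reverse=True):
    print(n, round(t_crit[n] / math.sqrt(n), 3))
```

The multiples shrink slowly: going from n = 4 to n = 100 samples cuts the error factor by nearly an order of magnitude, but quadrupling the sample size only roughly halves it, which is the practical point of Table 1.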

2.3  Detecting bad measurements

Equation (1) is not a crystal ball. It cannot foretell the onset of broken measurements or intrinsic pathologies. When the data diverge from the model, that does not automatically make the model wrong. You need to stop measuring and find out why.

2.4  Performance heuristics

The relative sizes of the α and β parameters tell you, respectively, whether contention effects or coherency effects are responsible for poor scalability.

2.5  Performance diagnostics

What makes eqn. (1) easy to apply also limits its diagnostic capability. If the parameter values are poor, it cannot tell you what to fix. All that information is in there alright, but it's compressed into the values of those two little parameters. However, other people, e.g., the application developers (the people who wrote the code) or the systems architect, may easily identify the problem once the universal law has told them they need to look for one.

2.6  Production environments

Applying the USL to performance data collected from production environments with mixed workloads is a current area of research.
The main issue is determining the appropriate independent variable, e.g., N users or processes, not dependent variables like utilization ρ(N). Then you only need X(N) data as the dependent variable to regress against. Those X(N) values should be determined from data that are collected during a measurement window with as few large-transient effects as possible, i.e., close to steady state. From there, everything should go as per the usual procedure described in Section 2.

2.7  Scalability Zones

In 2008, I introduced the concept of scalability zones [HoltGun 2008]. The idea is, instead of just considering a simple curve fit to the data points, we let the data fall across a number of regions whose boundaries are defined by a set of fitted scalability curves. The boundaries define zones as shown below. Moreover, these zones can be directly identified with queueing effects, and this can be immensely helpful to apps developers looking for ways to tweak their code to improve performance.
In this example, the plot shows the measured scalability (dots) of WebSphere MQ V6.0 nonpersistent 2KB messages (in Rnd Trips/sec) as a function of the increasing number of driving applications (Apps). The colored regions traversed by the data (dots) indicate the kind of synchronous queueing effects responsible for the apparent loss of scalability above about N=15 Apps.

2.8  My Blog

Various subtle points about using the USL are covered in my blog postings and the associated comments.

3  Tools for Using USL

Some tools are already available for applying the USL model to the estimation of both hardware (p) and software (N) scalability. These are the same tools as used in my classes on Guerrilla Capacity Planning and Guerrilla Data Analysis Techniques.
Most people will probably find it convenient to start with an Excel (see Section 3.1) or OpenOffice (see Section 3.2) spreadsheet. For serious capacity and performance analysis, however, beware potential numerical precision problems when using those tools.
Those with a more sophisticated statistical and programming background may find the R packages listed in Section 3.3 to be more appropriate.

3.1  Excel Spreadsheet

Excel spreadsheet USLcalc.xls.
Note that this Excel file contains the reorganized version of the USL equation as it appears in the Guerrilla Capacity Planning book:
  1. Current hardware equation (5.1) which uses the coefficients labeled σ and κ.

  2. Current software equation (6.7) which uses the coefficients labeled α and β.

The difference in the USL versions (old and current) has to do with the way the coefficients are included in the USL equation and therefore how they are calculated from the fitting parameters a and b in Excel. This only applies to the Excel version of the USL, where an inversion transformation is necessary because Excel cannot fit a rational function by default. Tools like R and Mathematica do not have this limitation.

3.2  OpenOffice Spreadsheet

An OpenOffice version of USLcalc.xls, called USLcalc.ods, has been provided compliments of a GCaP guerrilla. It compensates for the lack of a polynomial fitting routine in ODS.

3.3  R Scripts

  1. The R script USLcalc.r uses the nls() function to perform nonlinear regression with the USL model (2009).

  2. CRAN package USL: Analyze system scalability with the Universal Scalability Law by Stefan Moeding (2013).
    Install the package from the R-Console using the command install.packages("usl")

  3. RForge package SATK: Scalability Analysis Toolkit by Paul Puglia (2013).
    Install the package from the R-Console using the command install.packages("SATK", repos="")

  4. An example dataset is provided in USLcalc-data.txt.

Download the R software if you don't already have it.

4  Presentations

Here are some recent and upcoming presentations that involve USL analysis.

4.1  Hotsos 2011

Hotsos ORACLE performance symposium
Irving, TX, March 6—10, 2011
Brooks, Cooks and Response Time Scalability
Successful database scaling to meet service level objectives (SLO) requires converting standard Oracle performance data into a form suitable for cost-benefit comparison, otherwise you are likely to be groping in the dark or simply relying on someone else's scalability recipes. Creating convenient cost-benefit comparisons is the purpose of the Universal Scalability Law (USL), which I have previously presented using transaction throughput measurements as the appropriate performance metric. However, Oracle AWR and OWI data are largely based on response time metrics rather than throughput metrics.
In this presentation, I will show you how the USL can be applied to response time data. A surprising result is that the USL contains Brooks' law, which is essentially a variant of the old too-many-cooks adage: hiring more cooks at the last minute to ensure the meal is prepared on time can cause the preparation to take longer. From the standpoint of the USL, this kind of delay inflation arises from two unique interactions in the kitchen: group conferences and tête-à-têtes between the experienced cooks and the rookie cooks. Replace cooks with Oracle average active sessions (AAS) and similar response-time inflation can impact your database SLOs. The USL reveals this effect in a numerical way for easier analysis.
Examples of applying the USL to Oracle performance data will be used to examine Brooks' law and underlying concepts such as delay, wait, latency, averages and response time percentiles.

4.2  Surge 2010

Surge 2010: The Scalability and Performance Conference
Baltimore, MD, Sep 30—Oct 1, 2010
Quantifying Scalability FTW
You probably already collect performance data, but data ain't information. Successful scalability requires transforming your data so that you can quantify the cost-benefit of any architectural decisions. In other words:
measurement  +  models  ==  information
So, measurement alone is only half the story; you need a method to transform your data. In this presentation I will show you a method that I have developed and applied successfully to large-scale web sites and stack applications to quantify the benefits of proposed scaling strategies. To the degree that you don't quantify your scalability, you run the risk of ending up with WTF rather than FTW.
Speaker list

4.3  Velocity 2010

Accepted title and abstract:
Hidden Scalability Gotchas in Memcached and Friends*
Neil Gunther (Performance Dynamics), Shanti Subramanyam (Oracle USA), Stefan Parvu (Oracle Finland)
Most web deployments have standardized on horizontal scaleout in every tier: web, application, caching and database; using cheap, off-the-shelf, white boxes. In this approach, there are no real expectations for vertical scalability of server apps like memcached or the full LAMP stack. But with the potential for highly concurrent scalability offered by newer multicore processors, it is no longer cost-effective to ignore their underutilization due to poor, thread-level, scalability of the web stack. In this session we show you how to quantify scalability with the Universal Scalability Law (USL) by demonstrating its application to actual performance data collected from a memcached benchmark. As a side effect of our technique, you will see how the USL also identifies the most significant performance tuning opportunities to improve web app scalability.
* This work was performed while two of us (S.S. and S.P.) were employed by Sun Microsystems, prior to its acquisition by Oracle Corporation.
Cite as:
N. J. Gunther, S. Subramanyam, and S. Parvu. "Hidden Scalability Gotchas in Memcached and Friends." In VELOCITY Web Performance and Operations Conference, Santa Clara, California, O'Reilly, June, 22-24 2010.
Presentation slides (PDF)
Blog post about being accepted for the Velocity 2010 conference Web Ops track.
My review of the conference.

4.4  Ignite! 2009

Ignite! Velocity (San Jose).
Scalability for QuantHeads: How to Do It Between Spreadsheets
Why are there no 30 ft giants like the one in "Jack and the Beanstalk"? Why are there no 2,000 ft(?) beanstalks, for that matter?
Engineering weight-strength models tell us their scalability reaches a critical point and they would BOTH collapse. How big can your web site scale before a performance collapse? How do you know? Can you prove it?
Measurement alone is NOT sufficient. How do you know those data are correct? You need both: Data + Models == Insight. I'll show you how to prove it using simple spreadsheets. This technique leads to the concept of scalability zones.
Here's what Websphere scalability looks like when it's quantified. And it's real.

5  Bibliography

[Gunther 1993]
N. J. Gunther, "A Simple Capacity Model of Massively Parallel Transaction Systems," CMG National Conference, 1993 (PDF)
[Gunther 1998]
N. J. Gunther, The Practical Performance Analyst, McGraw-Hill, 1998 (Second printing iUniverse 2000)
[Gunther 2002]
N. J. Gunther, "A New Interpretation of Amdahl's Law and Geometric Scalability,"
[Gunther 2007]
N. J. Gunther, Guerrilla Capacity Planning, Springer 2007
[Gunther 2008]
N. J. Gunther, "A General Theory of Computational Scalability Based on Rational Functions,"
[HoltGun 2008]
J. Holtman and N. J. Gunther, "Getting in the Zone for Successful Scalability,"
[Wiley 2014]
N. J. Gunther and S. Subramanyam and S. Parvu, "A Methodology for Optimizing Multithreaded System Scalability on Multi-cores," in Programming Multi-core and Many-core Computing Systems, eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computing. in press.
Pre-order from Amazon.

File translated from TEX by TTH, version 3.38.
On 7 Apr 2014, 10:45.