How to Quantify Scalability

The Universal Scalability Law (USL)

The purpose of models is not to fit the data but to sharpen the questions. —Sam Karlin

This page is intended to supplement the following Chapters in my Guerrilla Capacity Planning book [Gunther 2007]:

Chap. 4: Scalability: A Quantitative Approach
Chap. 5: Evaluating Scalability Parameters
Chap. 6: Software Scalability

as well as the notes associated with my training class on Guerrilla Capacity and Performance: From Counters to Containers and Clouds (see Schedule page).

It's also an attempt to provide a quick overview of the USL methodology, including the latest developments since the book was published. This includes such developments as:

applying the USL to distributed systems
the three-parameter version of the USL
superlinear scaling in Hadoop

My original 1993 paper introduced the USL under the name "super serial" model because, at that time, I viewed it as an extension of the serial fraction concept contained in Amdahl's law. The idea of applying it as a statistical regression model and extending it to three parameters came much later.

1 Universal Scalability Law (USL)
    1.1 The Scalability Model
    1.2 The Three Cs: Concurrency, Contention and Coherency
    1.3 Theoretical Justification
        1.3.1 Theorem (Amdahl's Law)
        1.3.2 Theorem (Universal Scalability Law)
    1.4 Applicability
2 How to Use It
    2.1 Virtual load testing
    2.2 Detecting bad measurements
    2.3 Performance heuristics
    2.4 Performance diagnostics
    2.5 Production environments
    2.6 Scalability Zones
    2.7 My Blog
3 Tools for Using USL
    3.1 Excel Spreadsheet
    3.2 OpenOffice Spreadsheet
    3.3 R Packages and Scripts
4 Presentations
    4.1 DSconf 2019
    4.2 ACM 2018
    4.3 ACM 2015
    4.4 Usenix LISA 2014
    4.5 Hotsos 2011
    4.6 Surge 2010
    4.7 Velocity 2010
    4.8 Ignite! 2009
5 Bibliography

1 Universal Scalability Law (USL)

The original derivation of the Universal Scalability Law, or USL, was presented at the 1993 CMG conference [Gunther 1993]. A brief account of its application to parallel processing performance (under the name super-serial model) was presented in Chaps. 6 and 14 of my first book [Gunther 1998]. A more complete derivation with example applications is presented in Chaps. 4-6 of my GCaP book [Gunther 2007]. The supporting mathematical theorems (see Section 1.3) are presented in papers cited in the Bibliography (see Section 5).

Some reasons why you should understand the USL include:

A lot of people use the term "scalability" without clearly defining it, let alone defining it quantitatively. Computer system scalability must be quantified. If you can't quantify it, you can't guarantee it. The universal law of computational scaling provides that quantification.
One of the greatest impediments to applying queueing-theory models (whether analytic or simulation) is the inscrutibility of service times within an application. Every queueing facility in a performance model requires a service time as an input parameter. Without the appropriate queues in the model, system performance metrics like throughtput and response time, cannot be predicted. The USL leapfrogs this entire problem by NOT requiring ANY low-level service time measurements as inputs.

1.1 The Scalability Model

Defining the system throughput X(N) at a given load, N, the fundamental scaling effects contained in the USL (Universal Scalability Law) can be depicted schematically as

A. Equal bang for the buck	B. Cost of sharing resources	C. Diminishing returns from contention	D. Negative returns from incoherency

α = 0, β = 0	α > 0, β = 0	α >> 0, β = 0	α >> 0, β > 0

Each of these contributions can be combined into an equation that represents a scalability model, viz., the USL. The version of the USL that appears in my Guerrilla Capacity Planning book is

C(N) =

1 + α(N − 1) + βN(N − 1)

(1)

where C(N) = X(N) / X(1) is the relative capacity or the normalized throughput at each successive applied load, N. Conversely, any throughput value, including those that have not been measured, can be expressed in terms of C(N) as

X(N) = C(N) · X(1) .

(2)

One slight drawback of equation (1) is that it requires knowing the value of X(1) to do the normalization. Very often, however, we don't know the value of X(1) because it wasn't measured or it can't be measured, for some reason. In that case, the value of X(1) has to be estimated by interpolating the existing data.

A generalization of equation (1) introduces a third modeling parameter, γ, into the numerator of (1), so that

X(N) =

γ N

1 + α(N − 1) + βN(N − 1)

(3)

Note that the left side of equation (3) is now X(N) (the absolute throughput), not C(N) (the relative capacity). The reasoning behind this modification can be found in my blog post [Gunther 2018]. The virtue of this approach is that it makes it easier to apply the USL to the available data because the estimation of X(1) gets incorporated into the same statistical regression procedure that is already used to estimate the α, β parameters.

The applied load might be induced either by N virtual users e.g., generated by a load testing tool like LoadRunner, or real users on a production platform. In the parallel processing literature, the normalized execution time, S(N) = T(1)/T(N), known as the speedup, is often used in place of the relative capacity C(N), which is the dual representation.

1.2 The Three Cs: Concurrency, Contention and Coherency

The three coefficients, α, β, γ, in eqn. (3) can be identified repectively with the three Cs [Gunther 2018,SF ACM 2018]:

CONCURRENCY or ideal parallelism (with proportionality γ), which can also be interpreted as either:
1. the slope associated with linear-rising scalability, i.e., the line X(N) = γ N in Fig. A when α = β = 0
2. the maximum throughput attainable with a single load generator, i.e., X(1) = γ
CONTENTION (with proportionality α) due to waiting or queueing for shared resources
COHERENCY or data consistency (with proportionality β) due to the delay for data to become consistent, or cache coherent, by virtue of point-to-point exchange of data between resources that are distributed

NOTE: When β = 0 and γ = 1, eqn. (3) reduces to Amdahl's law. See Section 1.3.1.

The independent variable N can represent either

Software Scalability:: Here, the number of users or load generators (N) is incremented on a fixed hardware configuration. In this case, the number of users acts as the independent variable while the processor configuration remains fixed over the range of user-load measurements. This is the most common situation found in load testing environments where tools like LoadRunner or Apache JMeter are used.
Hardware Scalability:: Here, the number of physical processors (N) is incremented in the hardware configuration while keeping the user load per processor fixed. In this case, the number of users executing per processor (e.g., 100 users per processor) is assumed to remain the same for every added processor. For example, on a 32 processor platform you would apply a load of N = 3200 users to the test platform.

Equation (3) tells us that hardware and application scalability are two sides of the same coin: something not generally recognized. A non-zero value of β is associated with measured throughput that goes retrograde, i.e., decreases with increasing load or platform configuration.

NOTE: The objective of using eqn.(3) is NOT to produce a curve that passes through every data point. That's called curve fitting and that's what graphics artists do with splines. As von Neumann said, "Give me 4 parameters and I'll fit an elephant. Give me 5 and I'll make its trunk wiggle!" (At least I only have 2)

The fitted coefficients in eqn.(3) then provide an estimate of the user load at which the maximum scalability will occur in case D:

N_max =

√

(1 − α) / β

(4)

Notice that the γ parameter plays no role in eqn. (4). That's because it locates the position of the maximum in the USL curve on the x-axis, whereas γ scales the y-axis values. The corresponding maximum throughput X_max is found by substituting N_max into eqn. (3):

X_max = X(N_max)

(5)

Scalability profiles in Figures B and C (above) correspond to β = 0, which means that N_max occurs at infinity. Put differently, the scalability curve simply approaches a ceiling at 1/α (the horizontal dashed line in the plots). See Chapter 6 Software Scalability in the GCAP book.

1.3 Theoretical Justification

The following theorem provides the justification for applying the same USL eqn.(3) to both hardware and software systems:

1.3.1 Theorem (Amdahl's Law)

There is an analogy with Amdahl's law, which can also be interpreted queue-theoretically [Gunther 2002]:

Amdahl's law for parallel speedup is equivalent to the synchronous queueing bound on throughput in the machine repairman model of a multiprocessor.

1.3.2 Theorem (Universal Scalability Law)

The USL is equivalent to the synchronous queueing bound on throughput for a linear load-dependent machine repairman model of a multiprocessor.

The proof can be found in [Gunther 2008].

Amdahl's law is therefore subsumed by the USL because it corresponds to the case β = 0. Both Amdahl's law and the USL belong to a class of mathematical functions called Rational Functions. Moreover, these theorems have also been confirmed experimentally using event-driven simulations [HoltGun 2008]. In other words, the whole USL approach to quantifying scalability rests on a fundamentally sound physical footing.

1.4 Applicability

This model has wide-spread applicability, including:

Load testing tools like LoadRunner and JMeter. See section 2.1.
Modeling disk arrays, SANs, and multicore processors.
Modeling distributed application software, e.g., Hadoop. (See section 4.3)
Accounts for such effects as memory thrashing, and cache-miss latencies.

That's why it's called universal.

2 How to Use It

This section provides some suggestions for ways in which the USL might be applied.

2.1 Virtual load testing

The USL in eqn.(3) allows you take a sparse set of load measurements—at least half a dozen data points—and from those data determine how your application will scale under larger user loads than you may be able to generate in your test lab. This can all be done in a spreadsheet tool like Excel or OpenOffice.
sdet excel

2.2 Detecting bad measurements

Equation (3) is not a crystal ball. It cannot foretell the onset of broken measurements or intrinsic pathologies. When the data diverge from the model, that does not automatically make the model wrong. You need to stop measuring and find out why.

2.3 Performance heuristics

The relative sizes of the α and β parameters tell you respectively, whether contention effects or coherency effects are responsible for poor scalability.

2.4 Performance diagnostics

What makes (3) easy to apply, also limits its diagnostic capability. If the parameter values are poor, you cannot use it to tell you what to fix. All that information is in there alright, but it's compressed into the values of those two little parameters. However, other people e.g., application developers (the people who wrote the code), the systems architect, may easily identify the problem once the universal law has told them they need to look for one.

2.5 Production environments

Applying the USL to performance data collected from production environments with mixed workloads is a current area of research.

The main issue is determining the appropriate independent variable, e.g., N users or processes, not dependent variables like utilization ρ(N). Then you only need X(N) data as the dependent variable to regress against. Those X(N) values should be determined from data that are collected during a measurement window with as few large-transient effects as possible, i.e., close to steady state. From there, everything should go as per the usual procedure described in Section 2.

2.6 Scalability Zones

In 2008, I introduced the concept of scalability zones [HoltGun 2008]. The idea is, instead of just considering a simple curve fit to the data points, we let the data fall across a number of regions whose boundaries are defined by a set of fitted scalability curves. The boundaries define zones as shown below. Moreover, these zones can be directly indentified with queueing effects and this can be immensely helpful to apps developers looking for ways to tweak their code to improve performance.

In this example, the plot shows the measured scalability (dots) of WebSphere MQ V6.0 nonpersistent 2KB messages (in Rnd Trips/sec) as a function of the increasing number of driving applications (Apps). The colored regions traversed by the data (dots) indicate the kind of synchronous queueing effects responsible for the apparent loss of scalability above about N=15 Apps.

2.7 My Blog

Various subtle points about using the USL are covered in my blog postings and the associated comments.

3 Tools for Using USL

Some tools are already available for applying the USL model to the estimation of both hardware (p) and software (N) scalability. These are the same tools as used in my classes on Guerrilla Capacity Planning and Guerrilla Data Analysis Techniques.

Most people will probably find it convenient to start with an Excel (see Section 3.1) or OpenOffice (see Section 3.2) spreadsheet. For serious capacity and performance analysis, however, beware potential numerical precision problems when using those tools.

Those with a more sophisticated statistical and programming background may find the R packages listed in Section 3.3 to be more appropriate.

3.1 Excel Spreadsheet

Excel spreadsheet USLcalc.xls.

Note that this Excel file contains the reorganized version of the USL equation as it appears in the Guerrilla Capacity Planning book:

Current hardware equation (5.1) which uses the coefficients labeled σ and κ.

Current software equation (6.7) which uses the coefficients labeled α and β.

The difference in the USL versions (old and current) has to do with the way the coefficients are included in the USL equation and therefore how they are calculated from the fitting parameters a and b in Excel. This only applies to the Excel version of the USL, where an inversion transformation is necessary because Excel cannot fit a rational function by default. Tools like R and Mathematica do not have this limitation.

3.2 OpenOffice Spreadsheet

An OpenOffice version of USLcalc.xls, called USLcalc.ods, has been provided compliments of a GCaP guerrilla. It compensates for the lack of a polynomial fitting routine in ODS.

3.3 R Packages and Scripts

The R script USLcalc.r uses the nls() function to perform nonlinear regression with the USL model (2009).
CRAN package USL: Analyze system scalability with the Universal Scalability Law by Stefan Moeding (2013).
Install the package from the R-Console using the command install.packages("usl")
RForge package SATK: Scalability Analysis Toolkit by Paul Puglia (2013).
Install the package from the R-Console using the command install.packages("SATK", repos="http://R-Forge.R-project.org")
An example dataset is provided in USLcalc-data.txt.

Download the R software if you don't already have it.

4 Presentations

Here is a list of my presentations and publications demonstraing how the USL has been applied.

4.1 DSconf 2019

Distributed Systems Conference

Pune, India, February 16, 2019

Applying The Universal Scalability Law to Distributed Systems
When I originally developed the Universal Scalability Law (USL), it was in the context of tightly-coupled Unix multiprocessors [Gunther 1993], which led to an inherent dependency between the serial contention term and the data consistency term in the USL, i.e., no contention, no coherency penalty. A decade later, I realized that the USL would have broader applicability to distributed clusters if this dependency was removed [Gunther 2007]. In this talk I will show examples of how the most recent version of the USL, with three parameters α, β, γ, can be applied as a statistical regression model to a variety of large-scale distributed systems, such as Hadoop, Zookeeper, Sirius, AWS cloud, and Avalanche DLT, in order to quantify their scalability in terms of numerical concurrency, contention, and coherency values.

Video

4.2 ACM 2018

San Francisco Bay ACM Chapter, November 14, 2018

The Data Analytics of Application Scaling and Why There Are No Giants
Like the 30 ft. giant in "Jack and the Beanstalk," myths and fallacies abound regarding application scaling. Many blog posts show some performance data as time-series charts, but otherwise only offer a qualitative analysis. This talk is intended to remedy that by showing you how to QUANTIFY scalability. Time series are not sufficient to assess cost-benefit of cloud services and other scalability trade-offs. After reviewing the nonlinear constraints on the scalability of giants, we apply similar nonlinear data-analytics techniques to determine the universal scalability constraints on such well-known applications as: MySQL, Memcache, NGINX, Zookeeper, and Amazon AWS.

Video

4.3 ACM 2015

Comm. ACM, Vol. 58 No. 4, Pages 46-55, 2015

Hadoop Superlinear Scalability [Hadoop 2015]
"We often see more than 100% speedup efficiency!" came the rejoinder to the innocent reminder that you cannot have more than 100% of anything. This was just the first volley from software engineers during a presentation on how to quantify computer-system scalability in terms of the speedup metric. In different venues, on subsequent occasions, that retort seemed to grow into a veritable chorus that not only was superlinear speedup commonly observed, but also the model used to quantify scalability for the past 20 years—Universal Scalability Law (USL)—failed when applied to superlinear speedup data.
Indeed, superlinear speedup is a bona fide phenomenon that can be expected to appear more frequently in practice as new applications are deployed onto distributed architectures. As demonstrated here using Hadoop MapReduce, however, the USL is not only capable of accommodating superlinear speedup in a surprisingly simple way, it also reveals that superlinearity, although alluring, is as illusory as perpetual motion.

ACM Queue

Video at CACM

4.4 Usenix LISA 2014

Seattle, WA: November 9—14, 2014

Super Sizing Your Servers and the Payback Trap
As part of IT management, system administrators and ops managers need to size servers and clusters to meet application performance targets; whether it be for a private infrastructure or a public cloud. In this talk I will first establish an analytic framework that can quantify linear, sublinear and negative scalability. This framework can easily be incorporated into Google Docs or R. Several examples including PostgreSQL, Memcached, Varnish and Amazon EC2 scalability will then be presented in detail. The lesser known phenomenon of superlinearity will be examined using this same framework. Superlinear scaling means achieving more performance than the available capacity would be expected to support.

Slides and video

4.5 Hotsos 2011

Hotsos ORACLE performance symposium

Irving, TX: March 6—10, 2011

Brooks, Cooks and Response Time Scalability
Successful database scaling to meet service level objectives (SLO) requires converting standard Oracle performance data into a form suitable for cost-benefit comparison, otherwise you are likely to be groping in the dark or simply relying on someone else's scalability recipes. Creating convenient cost-benefit comparisons is the purpose of the Universal Scalability Law (USL), which I have previously presented using transaction throughput measurements as the appropriate performance metric. However, Oracle AWR and OWI data are largely based on response time metrics rather than throughput metrics.
In this presentation, I will show you how the USL can be applied to response time data. A surprising result is that the USL contains Brooks' law, which is essentially a variant of the old too-many-cooks adage: hiring more cooks at the last minute to ensure the meal is prepared on time can cause the preparation to take longer. From the standpoint of the USL, this kind of delay inflation arises from two unique interactions in the kitchen: group conferences and tête-à-têtes between the experienced cooks and the rookie cooks. Replace cooks with Oracle average active sessions (AAS) and similar response-time inflation can impact your database SLOs. The USL reveals this effect in a numerical way for easier analysis.
Examples of applying the USL to Oracle performance data will be used to examine Brooks' law and underlying concepts such as delay, wait, latency, averages and response time percentiles.

Presentation slides (PDF)

4.6 Surge 2010

Surge 2010: The Scalability and Performance Conference

Baltimore, MD, Sep 30—Oct 1, 2010

Quantifying Scalability FTW
You probably already collect performance data, but data ain't information. Successfull scalability requires transforming your data so that you can quantify the cost-benefit of any architectural decisions. In other words:

measurement + models == information

So, measurement alone is only half the story; you need a method to transform your data. In this presentation I will show you a method that I have developed and applied successfully to large-scale web sites and stack applications to quantify the benefits of proposed scaling strategies. To the degree that you don't quantify your scalability, you run the risk of ending up with WTF rather than FTW.

Speaker list

4.7 Velocity 2010

Accepted title and abstract:

Hidden Scalability Gotchas in Memcached and Friends^*
Neil Gunther (Performance Dynamics), Shanti Subramanyam (Oracle USA), Stefan Parvu (Oracle Finland)
Most web deployments have standardized on horizontal scaleout in every tier: web, application, caching and database; using cheap, off-the-shelf, white boxes. In this approach, there are no real expectations for vertical scalability of server apps like memcached or the full LAMP stack. But with the potential for highly concurrent scalability offered by newer multicore processors, it is no longer cost-effective to ignore their underutilization due to poor, thread-level, scalability of the web stack. In this session we show you how to quantify scalability with the Universal Scalability Law (USL) by demonstrating its application to actual performance data collected from a memcached benchmark. As a side effect of our technique, you will see how the USL also identifies the most signficant performance tuning opportunities to improve web app scalability.

* This work was performed while two of us (S.S. and S.P.) were employed by Sun Microsystems, prior to its acquisition by Oracle Corporation.

Cite as:

N. J. Gunther, S. Subramanyam, and S. Parvu. "Hidden Scalability Gotchas in Memcached and Friends." In VELOCITY Web Performance and Operations Conference, Santa Clara, California, O'Reilly, June, 22-24 2010.
http://velocityconf.com/velocity2010/public/schedule/detail/13046