Performance Ponderings

Contents

1  Exposing the Cost of Performance Hidden in the Cloud (2018)
2  WTF is Modeling, Anyway? (2017)
3  Morphing M/M/m: A New View of an Old Queue (2017)
4  How to Emulate Web Traffic Using Standard Load Testing Tools (2016)
5  Hadoop Superlinear Scalability (2015)
6  A Note on Disk Drag Dynamics (2012)
7  A Methodology for Optimizing Multithreaded System Scalability on Multicores (2011)
8  A Note on Parallel Algorithmic Speedup Bounds (2011)
9  Mind Your Knees and Queues. Responding to Hyperbole with Hyperbolæ (2009)
10  A General Theory of Computational Scalability Based on Rational Functions (2008)
11  Getting in the Zone for Successful Scalability (2008)
12  Object, Measure Thyself: Performance Monitoring and Data Collection (2008)
13  Multidimensional Visualization of Oracle Performance Using Barry007 (2008)
14  Eine Chance für Linux (2008)
15  Better Performance Management Through Better Visualization Tools (2008)
16  Seeing It All at Once with Barry (2007)
17  Leistungsdiagnostik (Performance Diagnostics) (2007)
18  Berechenbare Performance (Predicting Performance) (2007)
19  Moore's Law: More or Less? (2007)
20  Visualizing Virtualization (2007)
21  Guerrilla Scalability: How to Do Virtual Load Testing (2007)
22  The Virtualization Spectrum from Hyperthreads to GRIDs (2006)
23  Reconstructing the Future: Capacity Planning with Data That's Gone Troppo (2006)
24  Benchmarking Blunders and Things That Go Bump in the Night (2006)
25  Unification of Amdahl's Law, LogP and Other Performance Models for Message-Passing Architectures (2005)
26  (Numerical) Investigations into Physical Power-law Models of Internet Traffic Using the Renormalization Group (2005)
27  Millennium Performance Problem 1: Performance Visualization (2005)
28  Benchmarking Blunders and Things That Go Bump in the Night (2004)
29  How to Get Unbelievable Load Test Results (2004)
30  Performance Evaluation of Packet-to-Cell Segmentation Schemes in Input Buffered Packet Switches (2003)
31  Unix Load Average Metric (2003-2004)
32  Series on Guerrilla Capacity Planning (2003)
33  Characterization of the Burst Stabilization Protocol for the RR/CICQ Switch (2003)
34  A New Interpretation of Amdahl's Law and Geometric Scalability (2002)
35  Hit-and-Run Tactics Enable Guerrilla Capacity Planning (2002)
36  Hypernets: Good (G)news for Gnutella! (2002)
37  Of Buses and Bunching: Strangeness in the Queue (2001)
38  Quantifying Application and Server Scalability (2001)
39  How to Write Application Probes (Updated 2001)
40  Scalability Models for a Hypergrowth e-Commerce Website (2000)
41  BIRDS-I: A Benchmark for Image Retrieval on the Internet (2000)
42  Solaris Resource Manager: All I Ever Wanted Was My Unfair Advantage ... and Why You Can't Get It! (1999)
43  Microsoft Windows NT Scalability (1997-1998)
44  The MP Effect: Parallel Processing in Pictures (1996)
45  A Simple Capacity Model of Massively Parallel Transaction Systems (1993)
46  Performance Collapse of Networks (1988)

1  Exposing the Cost of Performance Hidden in the Cloud (2018)

Joint with M. Chawla. CMG cloudXchange Online Conference.
Abstract: Whilst offering lift-and-shift migration and versatile elastic capacity, the cloud also reintroduces an old mainframe concept - chargeback - which thereby rejuvenates the need for traditional performance and capacity management in the new cloud context. Combining production JMX data with an appropriate performance model, we show how to assess fee-based Amazon AWS configurations for a mobile-user application running on a Linux-hosted Tomcat cluster. The performance model also facilitates ongoing cost-benefit analysis of various EC2 Auto Scaling policies.
  1. Video presentation
  2. Enhanced slides

2  WTF is Modeling, Anyway? (2017)

A video conversation about the importance of modeling techniques with CMG performance and capacity management veteran Boris Zibitsker on his YouTube channel. The example presented (slides) shows you how to save multiple millions of dollars with a one-line performance model (video @ 21:50 minutes) that is accurate to within 5% error (better than a typical queueing model).

3  Morphing M/M/m: A New View of an Old Queue (2017)

Presentation at IFORS 2017 — the 21st Conference of the International Federation of Operations Research Societies, Quebec City, Canada.
Abstract: This year is the centenary of A. K. Erlang's 1917 paper on the determination of waiting times in an M/D/m queue with m telephone lines. Today, M/M/m queues are used to model such systems as call centers, multicore computers, and the Internet. Unfortunately, those who should be using M/M/m models often do not have sufficient background in applied probability theory. Our remedy defines a morphing approximation to the exact M/M/m queue that is accurate to within 10% for typical applications. The morphing formula for the residence time, R(m,ρ), is both simpler and more intuitive than the exact solution involving the Erlang-C function. We have also developed an animation of this morphing process. An outstanding challenge, however, has been to elucidate the nature of the corrections that transform the approximate morphing solutions into the exact Erlang solutions. In this presentation, we show:
Since there were no Proceedings for this conference, no paper was required.
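As a rough illustration of the comparison described in the abstract, the following Python sketch computes the exact M/M/m residence time via the Erlang-C function and sets it beside a simple approximation. The form R = S / (1 - ρ^m) is an assumption here (the abstract does not state the morphing formula), and the function names and the service time parameter s are hypothetical.

```python
from math import factorial


def erlang_c(m: int, rho: float) -> float:
    """Erlang-C: probability that an arrival must wait in an M/M/m queue.

    rho is the per-server utilization, with 0 <= rho < 1.
    """
    a = m * rho  # offered load in erlangs
    tail = a**m / (factorial(m) * (1.0 - rho))
    head = sum(a**k / factorial(k) for k in range(m))
    return tail / (head + tail)


def residence_exact(m: int, rho: float, s: float = 1.0) -> float:
    """Exact mean residence time: service time plus mean waiting time."""
    return s + erlang_c(m, rho) * s / (m * (1.0 - rho))


def residence_morph(m: int, rho: float, s: float = 1.0) -> float:
    """ASSUMED approximation R = S / (1 - rho^m); not quoted from the abstract."""
    return s / (1.0 - rho**m)


if __name__ == "__main__":
    for m in (1, 2, 4, 8, 16):
        exact = residence_exact(m, 0.75)
        approx = residence_morph(m, 0.75)
        print(f"m={m:2d}  exact={exact:.4f}  approx={approx:.4f}  "
              f"rel.err={(exact - approx) / exact:.2%}")
```

Under this assumed form, the two expressions agree exactly for m = 1 and m = 2, and differ only for larger m.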

4  How to Emulate Web Traffic Using Standard Load Testing Tools (2016)

Joint with J. Brady. Proceedings of CMG 2016, La Jolla, California.
Abstract: Conventional load-testing tools are based on a fifty-year-old time-share computer paradigm where a finite number of users submit requests and respond in a synchronized fashion. Conversely, modern web traffic is essentially asynchronous and driven by an unknown number of users. This difference presents a conundrum for testing the performance of modern web applications. Even when the difference is recognized, performance engineers often introduce modifications to their test scripts based on folklore or hearsay published in various Internet fora, much of which can lead to wrong results. We present a coherent methodology, based on two fundamental principles, for emulating web traffic using a standard load-test environment.
Available as an arXiv e-print.

5  Hadoop Superlinear Scalability (2015)

Joint with P. Puglia and K. Tomasette
Superlinearity (i.e., speedup > 100%) is a bona fide observable phenomenon that has been discussed qualitatively, but not understood in a rigorous way. We provide that insight.
Superlinear speedup can be expected to be observed more often as new applications move to distributed systems. Therefore, we believe it's important to understand what superlinear data actually represent and how to address the phenomenon when configuring distributed systems for performance and scalability.
Some of the points covered include:
Published in Communications of the ACM, Vol. 58, No. 4, pp. 46-55. The video presentation provides more background.
An unabridged edition of this article is freely available online at ACM Queue, Vol. 13, issue 5, June 4, 2015 and includes application of the USL to Varnish, Memcached, and Zookeeper in an Appendix.

6  A Note on Disk Drag Dynamics (2012)

Abstract: The electrical power consumed by typical magnetic hard disk drives (HDD) not only increases linearly with the number of spindles but, more significantly, it increases as very fast power-laws of speed (RPM) and diameter. Since the theoretical basis for this relationship is neither well-known nor readily accessible in the literature, we show how these exponents arise from aerodynamic disk drag and discuss their import for green storage capacity planning.
Available as an arXiv e-print.
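The scaling described in the abstract can be sketched as a one-line power law. The exponent values below (2.8 for RPM and 4.6 for platter diameter) are commonly quoted rule-of-thumb figures and are an assumption here, since the abstract does not state them; the function and parameter names are hypothetical.

```python
def relative_hdd_power(rpm_ratio: float,
                       diameter_ratio: float,
                       spindle_ratio: float = 1.0,
                       rpm_exp: float = 2.8,    # ASSUMED exponent, not from the abstract
                       diam_exp: float = 4.6):  # ASSUMED exponent, not from the abstract
    """Power relative to a baseline drive configuration.

    Linear in the number of spindles, power-law in rotational speed
    and platter diameter.
    """
    return spindle_ratio * rpm_ratio**rpm_exp * diameter_ratio**diam_exp


if __name__ == "__main__":
    # Going from 7200 to 15000 RPM at the same diameter and spindle count.
    print(relative_hdd_power(15000 / 7200, 1.0))
```

With exponents of this size, a modest increase in speed or diameter dominates the linear cost of adding spindles, which is the point the abstract makes for green storage capacity planning.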

7  A Methodology for Optimizing Multithreaded System Scalability on Multicores (2011)

Abstract: We show how to quantify scalability with the Universal Scalability Law (USL) by applying it to performance measurements of memcached, J2EE, and Weblogic on multi-core platforms. Since commercial multicores are essentially black-boxes, the accessible performance gains are primarily available at the application level. We also demonstrate how our methodology can identify the most significant performance tuning opportunities to optimize application scalability, as well as providing an easy means for exploring other aspects of the multi-core system design space.
Available as an arXiv e-print.

8  A Note on Parallel Algorithmic Speedup Bounds (2011)

Abstract: A parallel program can be represented as a directed acyclic graph. An important performance bound is the time to execute the critical path through the graph. We show how this performance metric is related to Amdahl speedup and the degree of average parallelism. These bounds formally exclude superlinear performance.
Available as an arXiv e-print.
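The quantities named in the abstract can be computed directly for a small task graph. This is a minimal sketch of the standard work and critical-path bounds, not necessarily the derivation used in the paper; the task names, durations and function names are made up for illustration.

```python
from functools import lru_cache


def work_and_span(durations, deps):
    """Return (total work T1, critical-path time Tinf) for a task DAG.

    durations: task -> execution time
    deps:      task -> iterable of prerequisite tasks (assumed acyclic)
    """
    @lru_cache(maxsize=None)
    def finish(task):
        # Earliest finish time: own duration after the slowest prerequisite.
        return durations[task] + max(
            (finish(d) for d in deps.get(task, ())), default=0.0)

    work = sum(durations.values())
    span = max(finish(t) for t in durations)
    return work, span


def speedup_bound(work, span, p):
    """Speedup on p processors cannot exceed p or the average parallelism."""
    return min(p, work / span)


if __name__ == "__main__":
    # Hypothetical diamond-shaped graph: a -> (b, c) -> d
    durations = {"a": 1.0, "b": 2.0, "c": 2.0, "d": 1.0}
    deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
    work, span = work_and_span(durations, deps)
    print(work, span, speedup_bound(work, span, p=4))
```

Because the bound never exceeds p, speedup greater than the processor count is excluded, consistent with the last sentence of the abstract.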

9  Mind Your Knees and Queues. Responding to Hyperbole with Hyperbolæ (2009)

Abstract: How do you determine where the response-time "knee" occurs? Calculating where the response time suddenly begins to climb dramatically is considered by many to be an important determinant for such things as load testing, scalability analysis, and setting application service targets. This question arose in a CMG MeasureIT article. I responded to it in an unconventional, but rigorous way, in this CMG MeasureIT article.

10  A General Theory of Computational Scalability Based on Rational Functions (2008)

Abstract: The universal scalability law of computational capacity is a rational function Cp = P(p)/Q(p), with P(p) a linear polynomial and Q(p) a second-degree polynomial in the number of physical processors p, that has long been used for statistical modeling and prediction of computer system performance. We prove that Cp is equivalent to the synchronous throughput bound for a machine-repairman with state-dependent service rate. Simpler rational functions, such as Amdahl's law and Gustafson speedup, are corollaries of this queue-theoretic bound. Cp is further shown to be both necessary and sufficient for modeling all practical characteristics of computational scalability.
Available as an arXiv e-print.
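For readers who want to experiment with the rational function described in the abstract, here is a minimal sketch using the usual two-parameter form, with a linear numerator and a quadratic denominator in p. The parameter names sigma (contention) and kappa (coherency) and the sample values are assumptions for illustration; they are not fitted to any data from the paper.

```python
from math import sqrt


def usl(p: float, sigma: float, kappa: float) -> float:
    """Relative capacity C(p) = p / (1 + sigma*(p - 1) + kappa*p*(p - 1))."""
    return p / (1.0 + sigma * (p - 1) + kappa * p * (p - 1))


def amdahl(p: float, sigma: float) -> float:
    """Amdahl's law is the special case kappa = 0."""
    return usl(p, sigma, 0.0)


def peak_processors(sigma: float, kappa: float) -> float:
    """Processor count at which C(p) is maximal (requires kappa > 0)."""
    return sqrt((1.0 - sigma) / kappa)


if __name__ == "__main__":
    sigma, kappa = 0.05, 0.001  # hypothetical values
    for p in (1, 8, 16, 32, 64):
        print(p, round(usl(p, sigma, kappa), 2), round(amdahl(p, sigma), 2))
    print("peak near p =", round(peak_processors(sigma, kappa), 1))
```

Setting both parameters to zero gives linear scaling, so the three curves (linear, Amdahl, USL) can be generated from the same function.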

11  Getting in the Zone for Successful Scalability (2008)

Joint with J. Holtman. Accepted for CMG 2008, Las Vegas, Nevada.
Abstract: The Universal Scalability Law (USL) is an analytic function used to quantify application scaling. It is universal because it subsumes Amdahl's law (AL) and linear scaling (LS) as special cases. Using simulation, we show (1) that USL is equivalent to synchronous queueing in a load-dependent machine repairman model, and (2) how LS, AL and USL can be regarded as boundaries defining three performance zones. Typical throughput measurements lie in all three zones. Simulation scenarios provide insight into which application features should be tuned to get into the optimal performance zone.
Available as an arXiv e-print.

12  Object, Measure Thyself: Performance Monitoring and Data Collection (2008)

Matthew O'Keefe, Michael Ducy, Greg Opaczewski, and Stephen Mullins, (Orbitz Worldwide). Presented at CMG 2008, Las Vegas, Nevada.
These guys were involved with the development and deployment of ERMA (Extremely Reusable Monitoring API) and Graphite, the popular open source time-series plotting tool (see Section 3.2 of the paper). Although I didn't write any code, I did help them write up the instrumentation work done at Orbitz Worldwide in this paper and present it at the 2008 CMG conference. Oh! And I came up with the title, which harks back to a previous attempt of mine at a CMG 2002 event to encourage IDE vendors to do something similar in their C++ and Java libraries but, due to a lack of obvious ROI, that never went anywhere.
  1. CMG 2008 paper
  2. CMG 2008 slides

13  Multidimensional Visualization of Oracle Performance Using Barry007 (2008)

Joint with T. Põder. Presented at CMG 2008, Las Vegas, Nevada.
Abstract: Most generic performance tools display only system-level performance data using two-dimensional plots or diagrams, and this limits the informational detail that can be displayed. A modern relational database system like Oracle, however, can concurrently serve thousands of client processes with different workload characteristics, and generic data displays inevitably hide important information. Drawing on our previous work, this paper demonstrates the application of Barry007 (see paper 16) multidimensional visualization for analyzing Oracle end-user session-level performance, showing both collective trends and individual performance anomalies.
Available on arXiv (PDF).

14  Eine Chance für Linux (2008)

Appears in the May 18 volume of Linux Technical Review. (in German)
Abstract (Translation): Linux could be in a position to expand its presence in the server market by looking to mainframe computer performance management as a role model and adapting its instrumentation accordingly.
  1. German version (PDF)
  2. English version (PDF)

15  Better Performance Management Through Better Visualization Tools (2008)

Invited presentation given at the annual Hotsos Symposium in Dallas, Texas, March 2-6 2008.

16  Seeing It All at Once with Barry (2007)

Joint with M. F. Jauvin. Presented at CMG 2007, San Diego, California.
  1. Paper (PDF),
  2. Slides (PDF),
  3. Animations (HTML),
  4. Tools (HTML).

17  Leistungsdiagnostik (Performance Diagnostics) (2007)

Load Average enträtselt und erweitert (Load Average Unraveled and Extended). July 2007 issue of Linux Magazin.
Understanding the load-average metrics in the Linux operating system.
  1. German version (PDF)
  2. English version (PDF)

18  Berechenbare Performance (Predicting Performance) (2007)

Invited paper published in the German monograph Linux Technical Review 02 - Monitoring.
  1. German version (PDF)
  2. English version (PDF)

19  Moore's Law: More or Less? (2007)

Published in the May issue of the CMG MeasureIT e-zine.

20  Visualizing Virtualization (2007)

Guest editorial for the March issue of the CMG e-zine called MeasureIT.

21  Guerrilla Scalability: How to Do Virtual Load Testing (2007)

Invited presentation at the Hotsos Symposium 2007, March, Dallas, Texas.

22  The Virtualization Spectrum from Hyperthreads to GRIDs (2006)

This paper, presented at CMG 2006, Reno, Nevada, is based on the following observations: The associated polling frequency (from GHz to μHz) positions each virtual machine implementation into a region of the VM-spectrum. Several case studies are analyzed to illustrate how this insight can make virtual machines more visible to performance management techniques.

23  Reconstructing the Future: Capacity Planning with Data That's Gone Troppo (2006)

Joint with S. Jenkin. Paper presented at CMG-A 2006, Sydney, Australia.

24  Benchmarking Blunders and Things That Go Bump in the Night (2006)

Published as Part I and Part II in the CMG MeasureIT online magazine.

25  Unification of Amdahl's Law, LogP and Other Performance Models for Message-Passing Architectures (2005)

This paper generalizes the theorem in paper 34 below and was presented at PDCS 2005, the VII International Conference on Parallel and Distributed Computing Systems.

26  (Numerical) Investigations into Physical Power-law Models of Internet Traffic Using the Renormalization Group (2005)

Paper presented at the Triennial Conference of the International Federation of Operations Research Societies, Honolulu, Hawaii, July 11-15, 2005.
Uses the real-space variant of the renormalization group to exclude certain models that have appeared in the literature to account for so-called self-similar Internet traffic and further suggests that the claimed ramifications for Internet capacity planning may have been over-emphasized.
Chapter 10 of Guerrilla Capacity Planning presents these conclusions in a less mathematical form.

27  Millennium Performance Problem 1: Performance Visualization (2005)

Published in the CMG MeasureIT online magazine.

28  Benchmarking Blunders and Things That Go Bump in the Night (2004)

Abstract: Benchmarking, by which I mean any computer system that is driven by a controlled workload, is the ultimate in performance testing and simulation. Aside from being a form of institutionalized cheating, it also offers countless opportunities for systematic mistakes in the way the workloads are applied and the resulting measurements interpreted. Right test, wrong conclusion is a ubiquitous mistake that happens because test engineers tend to treat data as divine. Such reverence is not only misplaced, it's also a sure ticket to production hell when the application finally goes live. I demonstrate how such mistakes can be avoided by means of two war stories that are real WOPRs: (a) How to resolve benchmark flaws over the psychic hotline and (b) How benchmarks can go flat with too much Java juice. In each case I present simple performance models and show how they can be applied to correctly assess benchmark data.
Presented at the WOPR2 Workshop, 2004.
Available as an arXiv e-print.

29  How to Get Unbelievable Load Test Results (2004)

Featured at TeamQuest Corporation as an online capacity planning column.

30  Performance Evaluation of Packet-to-Cell Segmentation Schemes in Input Buffered Packet Switches (2003)

Joint with K. J. Christensen, K. Yoshigoe and A. Roginsky
Presented at High-Speed Networks Symposium of the IEEE International Conference on Communications (ICC 2004).
Available from arXiv server.

31  Unix Load Average Metric (2003-2004)

Originally published as a series of online performance columns for TeamQuest Corp. NOTE: The hyperlinked version of the Linux kernel is release 2.6.xx
This information, like everything else, can be found in the online files along with the source code (as cited in my writings).
A more detailed discussion appears in Chapter 4 of my Perl::PDQ book.
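For orientation, the load average reported by Linux is an exponentially damped moving average of the number of active processes, sampled every 5 seconds over 1-, 5- and 15-minute windows. The sketch below is a floating-point illustration of that update rule only; the kernel itself uses fixed-point arithmetic, and the names here are hypothetical rather than taken from the kernel source.

```python
from math import exp

SAMPLE_INTERVAL = 5.0            # seconds between samples
WINDOWS = (60.0, 300.0, 900.0)   # 1-, 5- and 15-minute damping windows


def update_load(loads, n_active):
    """Apply one sampling step to the (1, 5, 15)-minute load averages."""
    updated = []
    for load, window in zip(loads, WINDOWS):
        decay = exp(-SAMPLE_INTERVAL / window)
        # Old value decays; the current active count fills the remainder.
        updated.append(load * decay + n_active * (1.0 - decay))
    return tuple(updated)


if __name__ == "__main__":
    loads = (0.0, 0.0, 0.0)
    for _ in range(12):          # one minute with 2 active processes
        loads = update_load(loads, n_active=2)
    print(tuple(round(x, 3) for x in loads))
```

After one minute of constant load, the 1-minute average has only reached a fraction 1 - 1/e of the true value, which is why the reported figures lag behind sudden changes.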

32  Series on Guerrilla Capacity Planning (2003)

These two articles were published in the CMG MeasureIT online magazine.

33  Characterization of the Burst Stabilization Protocol for the RR/CICQ Switch (2003)

Joint with K. J. Christensen and K. Yoshigoe
Accepted by IEEE Conference on Local Computer Networks
Download as a PDF from arXiv server.

34  A New Interpretation of Amdahl's Law and Geometric Scalability (2002)

Amongst other things, this paper presents the theorem:
Amdahl's law for parallel speedup is equivalent to the synchronous queueing
bound on throughput in the repairman model of a symmetric multiprocessor.
Download from the arXiv server.
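The theorem can be checked numerically with a small sketch. It assumes the usual machine-repairman notation (service time S at the repairman, think time Z at each processor) and the mapping sigma = S / (S + Z); these symbols and function names are an illustrative reading of the statement, not quoted from the paper.

```python
def amdahl_speedup(p: int, sigma: float) -> float:
    """Amdahl's law with serial fraction sigma."""
    return p / (1.0 + sigma * (p - 1))


def synchronous_throughput(p: int, service: float, think: float) -> float:
    """Worst-case repairman throughput: all p processors request service
    at the same instant, so a full cycle takes p*service + think."""
    return p / (p * service + think)


def relative_capacity(p: int, service: float, think: float) -> float:
    """Throughput on p processors relative to a single processor."""
    return (synchronous_throughput(p, service, think)
            / synchronous_throughput(1, service, think))


if __name__ == "__main__":
    S, Z = 1.0, 9.0              # hypothetical times
    sigma = S / (S + Z)          # ASSUMED mapping to the serial fraction
    for p in (1, 2, 4, 8, 16, 32):
        print(p, relative_capacity(p, S, Z), amdahl_speedup(p, sigma))
```

The two columns coincide for every p, because p(S + Z)/(pS + Z) reduces algebraically to p/(1 + sigma(p - 1)) under this mapping.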

35  Hit-and-Run Tactics Enable Guerrilla Capacity Planning (2002)

Published in IEEE IT Professional journal, pp. 40-46, July-August issue, 2002.

36  Hypernets: Good (G)news for Gnutella! (2002)

Online article responding to an earlier analysis of Gnutella written by Jordan Ritter in 2001.
Measurements of both Napster and Gnutella are also discussed in this 2003 paper.
I point out that hypernets like a 20-degree virtual hypertorus or hypercube are much more efficient than a tree.
It looks as though BitTorrent in fact does something like this.
This online article was slashdotted in February 2002.

37  Of Buses and Bunching: Strangeness in the Queue (2001)

TeamQuest online column.

38  Quantifying Application and Server Scalability (2001)

This series of three articles was published as TeamQuest online columns.

39  How to Write Application Probes (Updated 2001)

TeamQuest performance column online.

40  Scalability Models for a Hypergrowth e-Commerce Website (2000)

Published in the Springer Lecture Notes in Computer Science series.

41  BIRDS-I: A Benchmark for Image Retrieval on the Internet (2000)

Tech Report published by HP Labs.

42  Solaris Resource Manager: All I Ever Wanted Was My Unfair Advantage ... and Why You Can't Get It! (1999)

This paper is about virtualization implemented in SHARE II [1, 2] sitting on top of the Solaris kernel and rebadged by Sun as their System Resource Manager (SRM). Although this paper was written in 1999, today's ubiquitous virtualization platforms, whether implemented as hypervisors (e.g., VMware, XenServer, AWS) or Linux containers, all use some form of fair-share scheduler [3]. Astoundingly, most people deploying applications on either local hypervisors or remote clouds remain blissfully unaware of this fact and its potential impact on application performance and observed capacity. See also paper 22, "The Virtualization Spectrum from Hyperthreads to GRIDs" (2006).
Traditional UNIX time-share schedulers attempt to be fair to all users by employing a round-robin style algorithm for allocating CPU time. Unfortunately, a loophole exists whereby the scheduler can be biased in favor of a greedy user running many short CPU-bound processes. This loophole is not a defect but an intrinsic property of the round-robin scheduler that ensures responsiveness to the short CPU demands associated with multiple interactive users. A new generation of UNIX system resource management (SRM) software constrains the scheduler to be equitable to all users regardless of the number of processes each may be running. This fair-share scheduling draws on the concept of prorating resource "shares" across users and groups and then dynamically adjusting CPU usage to meet those share proportions. The simple notion of statically allocating these shares, however, belies the potential consequences for performance as measured by user response time and service level targets. We demonstrate this point by modeling several simple share allocation scenarios and analyzing the corresponding performance effects. A brief comparison of commercial SRM implementations from Hewlett-Packard, IBM, and Sun Microsystems is also presented.
Presented at the Computer Measurement Group Conference, Reno, NV, Dec. 5-10, 1999.
[1] A. Bettison, A. Gollan, C. Maltby, N. Russell, "SHARE II - A User Administration and Resource Control System for UNIX," LISA V, San Diego, CA, Sep. 20-Oct. 3, 1991.
[2] N. J. Gunther, "UNIX Resource Managers: Capacity Planning and Resource Issues," SAGE-AU Conference, Gold Coast, QLD, Australia, July 3-7, 2000.
[3] J. Kay and P. Lauder, "A Fair Share Scheduler," Communications of the ACM, 31(1), pp. 44-55, 1988.
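The loophole and its remedy described in the abstract above can be illustrated with a toy calculation. This sketch assumes every process is CPU-bound and always runnable, and it ignores the dynamic usage adjustment that a real fair-share scheduler performs; the user names, share values and function names are hypothetical.

```python
def round_robin_cpu(procs_per_user):
    """CPU fraction per user when every process gets an equal time slice."""
    total = sum(procs_per_user.values())
    return {user: n / total for user, n in procs_per_user.items()}


def fair_share_cpu(shares, procs_per_user):
    """CPU fraction per user when CPU is prorated by allocated shares,
    regardless of how many processes each active user runs."""
    active = {user: s for user, s in shares.items()
              if procs_per_user.get(user, 0) > 0}
    total = sum(active.values())
    return {user: s / total for user, s in active.items()}


if __name__ == "__main__":
    procs = {"greedy": 9, "modest": 1}     # processes per user
    shares = {"greedy": 50, "modest": 50}  # equal share allocation
    print("round robin:", round_robin_cpu(procs))
    print("fair share: ", fair_share_cpu(shares, procs))
```

Under round robin the user with more processes captures most of the CPU, whereas the share-based allocation holds each user to the configured proportion.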

43  Microsoft Windows NT Scalability (1997-1998)

The following articles on Windows NT performance were published as a series in the USENIX login magazine:
  1. NT to the Max (NoT!) — Nov (1997)
  2. The ABCs of TPCs — Feb (1998)
  3. How to Stack Cyberbrix — Jun (1998)
Additionally, AUUG (Australian Unix Users Group) apparently picked them up and published articles (1) and (2) in the February 1998 edition of their AUUGN journal, which is still online (as a scanned PDF)! Scroll down to the Table of Contents.

44  The MP Effect: Parallel Processing in Pictures (1996)

This paper presents a purely diagrammatic way of understanding computer system scalability, including a picture-based derivation of Amdahl's law.
It received the Best Paper award at CMG 1996.
These ideas are elaborated on in Chap. 14 of The Practical Performance Analyst book and Chap. 4 of the Guerrilla Capacity Planning book.

45  A Simple Capacity Model of Massively Parallel Transaction Systems (1993)

This is the original paper that presented the Universal Scalability Law or USL model under the name: "Super Serial" model, because I viewed it as an extension of the serial fraction concept contained in Amdahl's law. This version of the USL model was developed for assessing database scalability while I was working at Pyramid Technology.
The paper was given at CMG 1993 in San Diego, California, as well as Jim Gray's 4th HPTS (High Performance Transaction Systems) Workshop in Asilomar, California, 1993.
Later, I realized more generally that contention and coherency could act as independent effects. See paper 10 and Chaps. 4-6 of Guerrilla Capacity Planning for a more recent technical overview of the USL.

46  Performance Collapse of Networks (1988)

Background information.
Information Processing Letters publication (5.1 MB scanned PDF).
Copyright © 1996-2018 Performance Dynamics Company. All Rights Reserved.


