Performance Ponderings

Contents

1  How to Emulate Web Traffic Using Standard Load Testing Tools (2016)
2  Hadoop Superlinear Scalability (2015)
3  A Note on Disk Drag Dynamics (2012)
4  A Methodology for Optimizing Multithreaded System Scalability on Multicores (2011)
5  A Note on Parallel Algorithmic Speedup Bounds (2011)
6  Mind Your Knees and Queues. Responding to Hyperbole with Hyperbolæ (2009)
7  A General Theory of Computational Scalability Based on Rational Functions (2008)
8  Getting in the Zone for Successful Scalability (2008)
9  Object, Measure Thyself: Performance Monitoring and Data Collection (2008)
10  Multidimensional Visualization of Oracle Performance Using Barry007 (2008)
11  Eine Chance für Linux (2008)
12  Better Performance Management Through Better Visualization Tools (2008)
13  Seeing It All at Once with Barry (2007)
14  Leistungsdiagnostik (Load Averages and Stretch Factors) (2007)
15  Berechenbare Performance (Predicting Performance) (2007)
16  Moore's Law: More or Less? (2007)
17  Visualizing Virtualization (2007)
18  Guerrilla Scalability: How to Do Virtual Load Testing (2007)
19  The Virtualization Spectrum from Hyperthreads to GRIDs (2006)
20  Reconstructing the Future: Capacity Planning with Data That's Gone Troppo (2006)
21  Benchmarking Blunders and Things That Go Bump in the Night (2006)
22  Unification of Amdahl's Law, LogP and Other Performance Models for Message-Passing Architectures (2005)
23  (Numerical) Investigations into Physical Power-law Models of Internet Traffic Using the Renormalization Group (2005)
24  Millennium Performance Problem 1: Performance Visualization (2005)
25  Benchmarking Blunders and Things That Go Bump in the Night (2004)
26  How to Get Unbelievable Load Test Results (2004)
27  Performance Evaluation of Packet-to-Cell Segmentation Schemes in Input Buffered Packet Switches (2003)
28  Unix Load Average Metric (2003-2004)
29  Series on Guerrilla Capacity Planning (2003)
30  Characterization of the Burst Stabilization Protocol for the RR/CICQ Switch (2003)
31  A New Interpretation of Amdahl's Law and Geometric Scalability (2002)
32  Hit-and-Run Tactics Enable Guerrilla Capacity Planning (2002)
33  Hypernets: Good (G)news for Gnutella! (2002)
34  Of Buses and Bunching: Strangeness in the Queue (2001)
35  Quantifying Application and Server Scalability (2001)
36  How to Write Application Probes (Updated 2001)
37  Scalability Models for a Hypergrowth e-Commerce Website (2000)
38  BIRDS-I: A Benchmark for Image Retrieval on the Internet (2000)
39  Solaris Resource Manager: All I Ever Wanted Was My Unfair Advantage ... and Why You Can't Get It! (1999)
40  Windows NT Scalability (1997-1998)
41  The MP Effect: Parallel Processing in Pictures (1996)
42  A Simple Capacity Model of Massively Parallel Transaction Systems (1993)
43  The Collapse of Internet Performance (1988)

1  How to Emulate Web Traffic Using Standard Load Testing Tools (2016)

Joint with J. Brady. Proceedings of CMG 2016, La Jolla, California.
Abstract: Conventional load-testing tools are based on a fifty-year-old time-share computer paradigm where a finite number of users submit requests and respond in a synchronized fashion. Conversely, modern web traffic is essentially asynchronous and driven by an unknown number of users. This difference presents a conundrum for testing the performance of modern web applications. Even when the difference is recognized, performance engineers often introduce modifications to their test scripts based on folklore or hearsay published in various Internet fora, much of which can lead to wrong results. We present a coherent methodology, based on two fundamental principles, for emulating web traffic using a standard load-test environment.
Available as an arXiv e-print.

2  Hadoop Superlinear Scalability (2015)

Joint with P. Puglia and K. Tomasette.
Superlinearity (i.e., speedup > 100%) is a bona fide observable phenomenon that has been discussed qualitatively, but not understood in a rigorous way. We provide that insight.
Superlinear speedup can be expected to be observed more often as new applications move to distributed systems. Therefore, we believe it's important to understand what superlinear data actually represent and how to address the phenomenon when configuring distributed systems for performance and scalability.
Published in Communications of the ACM, Vol. 58, No. 4, pp. 46-55. A video presentation provides more background.
An unabridged edition of this article is freely available online at ACM Queue, Vol. 13, Issue 5, June 4, 2015, and includes, in an appendix, application of the USL to Varnish, Memcached, and ZooKeeper.
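In the spirit of the article, one way to see how measured speedup can exceed 100% is to let the USL's contention coefficient go negative, representing a capacity "economy of scale" that eventually gets paid back at larger configurations. A minimal sketch with invented coefficients (not values fitted in the paper):

```python
def usl_speedup(p, sigma, kappa):
    """Universal Scalability Law: relative capacity at p processes."""
    return p / (1 + sigma * (p - 1) + kappa * p * (p - 1))

# Invented coefficients: a small negative sigma yields superlinear speedup
# at low p, before the kappa (coherency) term claws the gain back.
sigma, kappa = -0.05, 0.002
for p in (1, 2, 4, 8, 16, 32):
    s = usl_speedup(p, sigma, kappa)
    print(f"p={p:3d}  speedup={s:7.2f}  {'superlinear' if s > p else ''}")
```

Running this shows speedup above the linear bound through p = 16 and below it again by p = 32: the payback region.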

3  A Note on Disk Drag Dynamics (2012)

Abstract: The electrical power consumed by typical magnetic hard disk drives (HDD) not only increases linearly with the number of spindles but, more significantly, it increases as very fast power-laws of speed (RPM) and diameter. Since the theoretical basis for this relationship is neither well-known nor readily accessible in the literature, we show how these exponents arise from aerodynamic disk drag and discuss their import for green storage capacity planning.
Available as an arXiv e-print.
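The practical consequence of those fast power-laws is easy to tabulate. The exponents below (roughly 2.8 in RPM and 4.6 in diameter) are the commonly quoted turbulent-drag values and are used here purely for illustration; the paper derives the actual exponents:

```python
def drag_power_ratio(rpm1, rpm2, d1, d2, a=2.8, b=4.6):
    """Ratio of aerodynamic drag power between two drives, assuming
    P proportional to RPM**a * D**b (a, b illustrative exponents)."""
    return (rpm1 / rpm2) ** a * (d1 / d2) ** b

# Same platter diameter, 15k vs 7.2k RPM: the faster spindle dissipates
# roughly 8x the drag power, which is why green storage favors slow, small disks.
print(round(drag_power_ratio(15000, 7200, 2.5, 2.5), 1))
```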

4  A Methodology for Optimizing Multithreaded System Scalability on Multicores (2011)

Abstract: We show how to quantify scalability with the Universal Scalability Law (USL) by applying it to performance measurements of memcached, J2EE, and Weblogic on multi-core platforms. Since commercial multicores are essentially black-boxes, the accessible performance gains are primarily available at the application level. We also demonstrate how our methodology can identify the most significant performance tuning opportunities to optimize application scalability, as well as providing an easy means for exploring other aspects of the multi-core system design space.
Available as an arXiv e-print.

5  A Note on Parallel Algorithmic Speedup Bounds (2011)

Abstract: A parallel program can be represented as a directed acyclic graph. An important performance bound is the time to execute the critical path through the graph. We show how this performance metric is related to Amdahl speedup and the degree of average parallelism. These bounds formally exclude superlinear performance.
Available as an arXiv e-print.
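The relationship in the abstract can be stated with the standard work-span quantities: total work T1, critical-path time Tinf, and average parallelism T1/Tinf. A sketch of the resulting bound (function names are mine):

```python
def speedup_bounds(work, span, p):
    """Upper bound on DAG speedup on p processors:
    S(p) <= min(p, T1/Tinf), where T1/Tinf is the average parallelism."""
    avg_parallelism = work / span
    return min(p, avg_parallelism)

T1, Tinf = 100.0, 10.0   # toy DAG: 100 units of work, critical path of 10
for p in (2, 8, 16, 64):
    print(p, speedup_bounds(T1, Tinf, p))
# The bound never exceeds p, so superlinear speedup is formally excluded.
```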

6  Mind Your Knees and Queues. Responding to Hyperbole with Hyperbolæ (2009)

Abstract: How do you determine where the response-time "knee" occurs? Calculating where the response time suddenly begins to climb dramatically is considered by many to be an important determinant for such things as load testing, scalability analysis, and setting application service targets. The question arose in a CMG MeasureIT article, and I responded to it, in an unconventional but rigorous way, in a follow-up MeasureIT article.

7  A General Theory of Computational Scalability Based on Rational Functions (2008)

Abstract: The universal scalability law of computational capacity is a rational function Cp = P(p)/Q(p), with P(p) a linear polynomial and Q(p) a second-degree polynomial in the number of physical processors p, that has long been used for statistical modeling and prediction of computer system performance. We prove that Cp is equivalent to the synchronous throughput bound for a machine-repairman with state-dependent service rate. Simpler rational functions, such as Amdahl's law and Gustafson speedup, are corollaries of this queue-theoretic bound. Cp is further shown to be both necessary and sufficient for modeling all practical characteristics of computational scalability.
Available as an arXiv e-print.
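In normalized form the rational function is commonly written Cp = p / (1 + sigma*(p-1) + kappa*p*(p-1)). A sketch showing how Amdahl's law and linear scaling fall out as special cases (coefficient values invented for illustration):

```python
def usl(p, sigma, kappa):
    """USL relative capacity C(p) = p / (1 + sigma*(p-1) + kappa*p*(p-1))."""
    return p / (1 + sigma * (p - 1) + kappa * p * (p - 1))

def amdahl(p, sigma):
    """Amdahl's law is the kappa = 0 corollary of the USL."""
    return p / (1 + sigma * (p - 1))

p, sigma, kappa = 16, 0.05, 0.001
assert usl(p, sigma, 0.0) == amdahl(p, sigma)   # AL as a special case
assert usl(p, 0.0, 0.0) == p                    # linear scaling as a special case
print(usl(p, sigma, kappa), amdahl(p, sigma))   # coherency term costs capacity
```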

8  Getting in the Zone for Successful Scalability (2008)

Joint with J. Holtman. Accepted for CMG 2008, Las Vegas, Nevada.
Abstract: The Universal Scalability Law (USL) is an analytic function used to quantify application scaling. It is universal because it subsumes Amdahl's law (AL) and linear scaling (LS) as special cases. Using simulation, we show (1) that USL is equivalent to synchronous queueing in a load-dependent machine repairman model, and (2) how LS, AL and USL can be regarded as boundaries defining three performance zones. Typical throughput measurements lie in all three zones. Simulation scenarios provide insight into which application features should be tuned to get into the optimal performance zone.
Available as an arXiv e-print.

9  Object, Measure Thyself: Performance Monitoring and Data Collection (2008)

Matthew O'Keefe, Michael Ducy, Greg Opaczewski, and Stephen Mullins, (Orbitz Worldwide). Presented at CMG 2008, Las Vegas, Nevada.
These guys were involved with the development and deployment of ERMA (Extremely Reusable Monitoring API) and Graphite, the popular open source time-series plotting tool (see Section 3.2 of the paper). Although I didn't write any code, I did help them write up the instrumentation work done at Orbitz Worldwide in this paper and have them present it at the 2008 CMG conference. Oh! And I came up with the title, which harks back to a previous attempt of mine, at a CMG 2002 event, to encourage IDE vendors to do something similar in their C++ and Java libraries; due to a lack of obvious ROI, that never went anywhere.
  1. CMG 2008 paper
  2. CMG 2008 slides

10  Multidimensional Visualization of Oracle Performance Using Barry007 (2008)

Joint with T. Põder. Accepted for CMG 2008, Las Vegas, Nevada.
Abstract: Most generic performance tools display only system-level performance data, using 2-dimensional plots or diagrams, and this limits the informational detail that can be displayed. A modern relational database system like Oracle, however, can concurrently serve thousands of client processes with different workload characteristics, and generic data displays inevitably hide important information. Drawing on our previous work, this paper demonstrates the application of Barry007 multidimensional visualization (see paper 13) for analyzing Oracle end-user, session-level performance, showing both collective trends and individual performance anomalies.

11  Eine Chance für Linux (2008)

Appears in the May 18 volume of Linux Technical Review (in German).
Abstract (Translation): Linux could be in a position to expand its presence in the server market by looking to mainframe computer performance management as a role model and adapting its instrumentation accordingly.
  1. German version (PDF)
  2. English version (PDF)

12  Better Performance Management Through Better Visualization Tools (2008)

Invited presentation at the Hotsos Symposium in Dallas, Texas, March 2-6 2008.

13  Seeing It All at Once with Barry (2007)

Joint with M. F. Jauvin. Presented at CMG 2007, San Diego, California.
Paper (PDF), Slides (PDF), Animations (HTML), Tools (HTML).

14  Leistungsdiagnostik (Load Averages and Stretch Factors) (2007)

July 2007 issue of Linux Magazin.
  1. English version (PDF)

15  Berechenbare Performance (Predicting Performance) (2007)

Invited paper published in the German monograph Linux Technical Review 02 - Monitoring.
  1. German version (PDF)
  2. English version (PDF)

16  Moore's Law: More or Less? (2007)

Published in the May issue of the CMG MeasureIT e-zine.

17  Visualizing Virtualization (2007)

Guest editorial for the March issue of the CMG e-zine called MeasureIT.

18  Guerrilla Scalability: How to Do Virtual Load Testing (2007)

Invited presentation at the Hotsos Symposium 2007, March, Dallas, Texas.

19  The Virtualization Spectrum from Hyperthreads to GRIDs (2006)

This paper, presented at CMG 2006, Reno, Nevada, makes the observation that the associated polling frequency (from GHz to μHz) positions each virtual machine implementation into a region of the VM-spectrum. Several case studies are analyzed to illustrate how this insight can make virtual machines more visible to performance management techniques.

20  Reconstructing the Future: Capacity Planning with Data That's Gone Troppo (2006)

Joint with S. Jenkin. Paper presented at CMG-A 2006, Sydney, Australia.

21  Benchmarking Blunders and Things That Go Bump in the Night (2006)

Published as Part I and Part II in the CMG MeasureIT online magazine.

22  Unification of Amdahl's Law, LogP and Other Performance Models for Message-Passing Architectures (2005)

This paper generalizes the theorem in paper 31 below and was presented at PDCS 2005, the VII International Conference on Parallel and Distributed Computing Systems.

23  (Numerical) Investigations into Physical Power-law Models of Internet Traffic Using the Renormalization Group (2005)

Paper presented at the Triennial Conference of the International Federation of Operations Research Societies, Honolulu, Hawaii, July 11-15, 2005.
Uses the real-space variant of the renormalization group to exclude certain models that have appeared in the literature to account for so-called self-similar Internet traffic and further suggests that the claimed ramifications for Internet capacity planning may have been over-emphasized.
Chapter 10 of Guerrilla Capacity Planning presents these conclusions in a less mathematical form.

24  Millennium Performance Problem 1: Performance Visualization (2005)

Published in the CMG MeasureIT online magazine.

25  Benchmarking Blunders and Things That Go Bump in the Night (2004)

Abstract: Benchmarking, by which I mean any computer system that is driven by a controlled workload, is the ultimate in performance testing and simulation. Aside from being a form of institutionalized cheating, it also offers countless opportunities for systematic mistakes in the way the workloads are applied and the resulting measurements interpreted. Right test, wrong conclusion is a ubiquitous mistake that happens because test engineers tend to treat data as divine. Such reverence is not only misplaced, it's also a sure ticket to production hell when the application finally goes live. I demonstrate how such mistakes can be avoided by means of two war stories that are real WOPRs: (a) how to resolve benchmark flaws over the psychic hotline, and (b) how benchmarks can go flat with too much Java juice. In each case I present simple performance models and show how they can be applied to correctly assess benchmark data.
Presented at WORP2 Workshop 2004
Available as an arXiv e-print.

26  How to Get Unbelievable Load Test Results (2004)

Featured at TeamQuest Corporation as an online capacity planning column.

27  Performance Evaluation of Packet-to-Cell Segmentation Schemes in Input Buffered Packet Switches (2003)

Joint with K. J. Christensen, K. Yoshigoe, and A. Roginsky.
Presented at the High-Speed Networks Symposium of the IEEE International Conference on Communications (ICC 2004).
Available from the arXiv server.

28  Unix Load Average Metric (2003-2004)

Originally published as a series of online performance columns for TeamQuest Corp. NOTE: The hyperlinked version of the Linux kernel is release 2.6.xx.
This information, like everything else, can be found in the online files along with the source code (as cited in my writings).
A more detailed discussion appears in Chapter 4 of my Perl::PDQ book.
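As the columns explain, the kernel computes the 1-, 5-, and 15-minute load averages as exponentially damped moving averages of the run-queue length, sampled every 5 seconds. A floating-point sketch of that recurrence (the kernel itself uses fixed-point constants):

```python
import math

SAMPLE_SECS = 5.0

def damp_factor(window_mins):
    """exp(-sample/window): damping constant for one averaging window."""
    return math.exp(-SAMPLE_SECS / (window_mins * 60.0))

def update_load(load, nrun, window_mins):
    """One 5-second tick: load := load*e + nrun*(1 - e)."""
    e = damp_factor(window_mins)
    return load * e + nrun * (1.0 - e)

# Hold the run queue at 2 for ten minutes: the 1-minute average converges
# toward 2 much faster than the 15-minute average does.
l1 = l15 = 0.0
for _ in range(int(600 / SAMPLE_SECS)):
    l1 = update_load(l1, 2, 1)
    l15 = update_load(l15, 2, 15)
print(round(l1, 2), round(l15, 2))
```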

29  Series on Guerrilla Capacity Planning (2003)

These two articles were published in the CMG MeasureIT online magazine.

30  Characterization of the Burst Stabilization Protocol for the RR/CICQ Switch (2003)

Joint with K. J. Christensen and K. Yoshigoe.
Accepted by the IEEE Conference on Local Computer Networks.
Download as a PDF from the arXiv server.

31  A New Interpretation of Amdahl's Law and Geometric Scalability (2002)

Amongst other things, this paper presents the theorem:
Amdahl's law for parallel speedup is equivalent to the synchronous queueing
bound on throughput in the repairman model of a symmetric multiprocessor.
Download from the arXiv server.
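For context on the repairman model named in the theorem, here is a standard exact mean-value-analysis recursion (a textbook sketch of N clients with think time Z and a single server with service time S, not code from the paper):

```python
def repairman_mva(N, S, Z):
    """Exact MVA for the machine-repairman model: one repair server with
    service time S; 'up' machines think for time Z. Returns throughput X(N)."""
    X, Q = 0.0, 0.0
    for n in range(1, N + 1):
        R = S * (1.0 + Q)        # residence time at the repair station
        X = n / (R + Z)          # throughput via Little's law
        Q = X * R                # mean queue length at the repair station
    return X

# Throughput saturates at the asymptotic bound 1/S as N grows:
S, Z = 1.0, 9.0
print([round(repairman_mva(N, S, Z), 3) for N in (1, 5, 10, 20)])
```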

32  Hit-and-Run Tactics Enable Guerrilla Capacity Planning (2002)

Published in IEEE IT Professional journal, pp. 40-46, July-August issue, 2002.

33  Hypernets: Good (G)news for Gnutella! (2002)

Online article responding to an earlier analysis of Gnutella written by Jordan Ritter in 2001.
Measurements of both Napster and Gnutella are also discussed in this 2003 paper.
I point out that hypernets like a 20-degree virtual hypertorus or hypercube are much more efficient than a tree.
It looks as though BitTorrent in fact does something like this.
This online article was slashdotted in February 2002.

34  Of Buses and Bunching: Strangeness in the Queue (2001)

TeamQuest online column.

35  Quantifying Application and Server Scalability (2001)

The following series of three articles was published as TeamQuest online columns.

36  How to Write Application Probes (Updated 2001)

TeamQuest performance column online.

37  Scalability Models for a Hypergrowth e-Commerce Website (2000)

Published in the Springer Lecture Notes in Computer Science series.

38  BIRDS-I: A Benchmark for Image Retrieval on the Internet (2000)

Tech Report published by HP Labs.

39  Solaris Resource Manager: All I Ever Wanted Was My Unfair Advantage ... and Why You Can't Get It! (1999)

This paper is about virtualization implemented in SHARE II [1, 2], sitting on top of the Solaris kernel and rebadged by Sun as their System Resource Manager (SRM). Although it was written in 1999, virtualization is now ubiquitous and, whether implemented as hypervisors (e.g., VMware, XenServer, AWS) or Linux Containers, all implementations use some form of fair-share scheduler [3]. Astoundingly, most people deploying applications, either on local hypervisors or remote clouds, remain blissfully unaware of this fact and its potential impact on application performance and observed capacity. See also paper 19, "The Virtualization Spectrum from Hyperthreads to GRIDs" (2006).
Traditional UNIX time-share schedulers attempt to be fair to all users by employing a round-robin style algorithm for allocating CPU time. Unfortunately, a loophole exists whereby the scheduler can be biased in favor of a greedy user running many short CPU-bound processes. This loophole is not a defect but an intrinsic property of the round-robin scheduler that ensures responsiveness to the short CPU demands associated with multiple interactive users. A new generation of UNIX system resource management (SRM) software constrains the scheduler to be equitable to all users regardless of the number of processes each may be running. This fair-share scheduling draws on the concept of prorating resource "shares" across users and groups and then dynamically adjusting CPU usage to meet those share proportions. The simple notion of statically allocating these shares, however, belies the potential consequences for performance as measured by user response time and service level targets. We demonstrate this point by modeling several simple share allocation scenarios and analyzing the corresponding performance effects. A brief comparison of commercial SRM implementations from Hewlett-Packard, IBM, and Sun Microsystems is also presented.
Presented at the Computer Measurement Group Conference, Reno, NV, Dec. 5-10, 1999.
[1] A. Bettison, A. Gollan, C. Maltby, and N. Russell, "SHARE II—A User Administration and Resource Control System for UNIX," LISA V, San Diego, CA, Sep. 20-Oct. 3, 1991.
[2] N. J. Gunther, "UNIX Resource Managers: Capacity Planning and Resource Issues," SAGE-AU Conference, Gold Coast, QLD, Australia, July 3-7, 2000.
[3] J. Kay, and P. Lauder, "A Fair Share Scheduler," Communications of the ACM, 31(1), pp. 44-55, 1988.
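Fair-share scheduling, as the abstract describes, prorates CPU across users in proportion to their allocated shares rather than their process counts. A toy sketch of that prorating (user names and share values invented):

```python
def cpu_entitlements(shares):
    """Per-user CPU fraction proportional to allocated shares,
    independent of how many processes each user runs."""
    total = sum(shares.values())
    return {user: s / total for user, s in shares.items()}

# A greedy user running many short CPU-bound processes still gets only
# the fraction implied by her shares:
ent = cpu_entitlements({"alice": 50, "bob": 25, "carol": 25})
print(ent)   # alice is entitled to 0.5 of the CPU regardless of process count
```

The abstract's performance point is that choosing these static share values also fixes response-time behavior, which is what the paper's models explore.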

40  Windows NT Scalability (1997-1998)

The following series of three papers was published in the USENIX journal.

41  The MP Effect: Parallel Processing in Pictures (1996)

Received CMG 1996 Best Paper award.

42  A Simple Capacity Model of Massively Parallel Transaction Systems (1993)

This is the original paper that forms the basis of the Universal Law of Computational Scaling, and was presented at CMG 1993, San Diego, California.

43  The Collapse of Internet Performance (1988)

Background information.
Information Processing Letters publication (5.1 MB scanned PDF).
Copyright © 1996-2018 Performance Dynamics Company. All Rights Reserved.


