Annotated Bibliography

This bibliography is based on books that I have in my possession which I have either used or intend to use. It is not intended to be definitive or exhaustive. Within each subject area, authors are listed alphabetically by surname. My comments appear indented after each title in a fixed-pitch font. Both books and technical papers are cited, together with the corresponding URL when appropriate.

1 Benchmarking
2 Network Capacity
3 Queueing Models
4 Server Architectures
5 Simulation Models
6 Software Development
7 Statistical Methods
8 Vendor Specific
9 Web Technologies

1 Benchmarking

This section includes a listing of books and web sites that contain information related to the benchmarking section of my class notes.

References

[1]: CMG Computer Measurement Group

Formed some 30 years ago by IBM 390 mainframe folks, this is arguably the only comprehensive annual conference on performance analysis and capacity planning that now also includes UNIX, NT, Linux, Java, networking, storage, etc. The U.S. national conference is held every December while U.S. regional meetings are held throughout the year. International conferences are also held in Australia, U.K., Europe and South Africa. A CD containing a collection of the U.S. papers presented over the past 25 years is available for a nominal charge. Their website is http://www.cmg.org/
[2]: Gray, J. (Ed.) Benchmark Handbook: For Database and Transaction Processing Systems, Morgan Kaufmann 1992.

Dated but worthwhile background information on the historical development of the TPC, SPEC, and other industrial strength benchmarks.
[3]: SPEC The Standard Performance Evaluation Corporation

Defines benchmarks for the performance comparison of CPU and other computer subsystems. Benchmark codes are purchased and provided on an appropriate medium, e.g., tape or CD. Benchmark specifications, developments, and results are available from their website http://www.spec.org/
[4]: TPC The Transaction Processing Performance Council

Defines benchmarks for the performance comparison of database management systems. Only benchmark specifications are provided; not code. Benchmark specification, developments, and results are available from their website http://www.tpc.org/

2 Network Capacity

This section includes books related to the network capacity section of my class notes. The corresponding section on Web technologies (Section 9) should also be reviewed.

References

[1]: Buchanan, R. W. The Art of Testing Network Systems, Wiley 1996.

Although the focus is ostensibly on the network infrastructure, it remains the only book that I know of to present any kind of methodological process for measuring system-level performance by means of synthetic workload generation. As I mention in the class, this is the most complex form of simulation (usually being one step away from actual deployment into production) hence, there are many opportunities to produce erroneous results. Having a formal process helps to minimize those errors. Employing operational laws and bounds analysis to cross-check the data would improve the situation even further. Unfortunately, the book is entirely lacking in the use of such analytic methods.
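As a rough illustration of the kind of cross-check meant here, the sketch below tests load-test measurements against Little's law (in its interactive form, N = X(R + Z)) and against the standard throughput upper bound. The function names, the tolerance, and all the numbers are hypothetical and chosen only for illustration; they do not come from Buchanan's book.

```python
def little_check(n_users, think_time, throughput, response_time, tol=0.05):
    """Check measurements against the interactive form of Little's law.

    N = X * (R + Z) must hold in steady state, so a large mismatch
    suggests a measurement or test-harness error.
    Returns (is_consistent, implied_number_of_users).
    """
    implied = throughput * (response_time + think_time)
    rel_error = abs(implied - n_users) / n_users
    return rel_error <= tol, implied


def throughput_bound(n_users, think_time, demands):
    """Upper bound on throughput: X <= min(N / (D + Z), 1 / Dmax).

    demands holds the service demand (seconds) at each resource.
    """
    total_demand = sum(demands)
    return min(n_users / (total_demand + think_time), 1.0 / max(demands))


if __name__ == "__main__":
    # Hypothetical measurements: 100 virtual users, 10 s think time,
    # 9.5 transactions/s measured throughput, 0.5 s response time.
    ok, implied = little_check(100, 10.0, 9.5, 0.5)
    print("Little's law consistent:", ok, "implied N =", implied)

    # Hypothetical service demands at three resources (seconds).
    bound = throughput_bound(100, 10.0, [0.05, 0.08, 0.02])
    print("Measured X within bound:", 9.5 <= bound)
```

A reported throughput above the bound, or an implied N far from the number of configured virtual users, would be a signal to re-examine the test rig before trusting the results.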
[2]: Gunther, N. J. The Practical Performance Analyst, iUniverse 2000.

PART III of my book constitutes the technical background for the topic of large network transients that I present on the last day of the course. My approach is an adaptation of Feynman's formulation of quantum mechanics, which is built on his integral over sample-paths. Feynman's formalism provides both an intuitive insight into the nature of spontaneous network congestion ¹ and a calculation tool to estimate the mean time to such events. In this sense, the integral over sample-paths falls between the visualization of Nelson's Catastrophe Theory approach [3] and the rigor of the Large Deviations calculations due to Weiss [4] et al.
[3]: Nelson, R. Stochastic Catastrophe Theory in Computer Performance Modeling, Journal of the ACM, 34:661, 1987.

A fascinating application of catastrophe theory to the same problems as those discussed in PART III [2] of my book. The main limitation with this approach is the inability to calculate the mean time to transition into the congested or thrashing state.
[4]: Shwartz, A. and Weiss, A. Large Deviations for Performance Analysis, Chapman & Hall 1995.

A horrendously difficult book to read for those with less than a Ph.D. in applied probability theory. The emphasis is on mathematical rigor rather than practical application. A lot of deep insights are buried under tons of Greek notation. A comparison with the work of yours truly [2] is cited in item H. on p.28 of their book.
[5]: Stallings, W. High Speed Networks: TCP/IP and ATM Design Principles, Prentice-Hall, 1998.

A beautifully written book that includes a discussion of the Bellcore internet traffic results.
[6]: Walrand, J. and Varaiya, P. High Performance Communication Networks, 1996.

A more readable treatment of the method of Large Deviations than that presented by Shwartz and Weiss [4], but it still moves very quickly for those readers unfamiliar with the concepts. Particular attention is given to its application in the context of QoS requirements and admission policies for ATM networks.

3 Queueing Models

This section includes books related to the section of my class notes entitled, ''Queueing Theory for Those Who Can't Wait'' held on the first day.

References

[1]: Allen, A. O. Probability, Statistics, and Queueing Theory with Computer Science Applications, (2nd edition) Academic Press 1990.

Very mathematical treatment combined with a broad range of computer performance examples. Covers similar topics to Jain's book. Almost worth buying for the amusing quotes alone. Unfortunately, a lot of the coded examples are written in APL. Since this language is not familiar to most performance analysts, it is next to impossible to translate the examples into a more modern language. Allen, himself, has tended toward Mathematica in recent years.
[2]: Highleyman, W. H. Performance Analysis of Transaction Systems, Prentice-Hall, 1989.

Recommended for those readers specializing in the performance analysis of OLTP database systems. The depth of discussion is excellent throughout and is grounded in real-world analysis. The main distraction comes from the obscure notational conventions adopted throughout. It is recommended that you first make a translation table based on Appendix 1 in the book. Another potential limitation arises from the use of purely open queueing circuits to model OLTP systems. There is no accompanying software, but with a medium amount of effort (mostly in translating the symbols) you should find it a straightforward matter to recast Highleyman's models into PDQ.
[3]: Kleinrock, L. Queueing Systems, (Vols. I and II) Wiley 1975.

One of the earliest comprehensive books on queueing theory, and it still holds its own 25 years later. Although a very mathematical treatment, it is extremely well written from an engineering viewpoint and is very readable. This is mostly a treatment of single queues. There is no discussion of queueing circuits (necessary for representing modern computer systems and networks) and no discussion of newer algorithms, e.g., MVA [4], since it was written prior to their development. Volume II contains a detailed discussion of the ALOHA network [2] that I refer to on the last day of my class.
[4]: Lazowska, E. D., Zahorjan, J., Graham, G. S., and Sevcik, K. C. Quantitative System Performance: Computer System Analysis Using Queueing Network Models, Prentice-Hall 1984.

The original inspiration for The Practical Performance Analyst and PDQ. It avoids formal queueing theory and emphasizes the use of Mean Value algorithms. The authors are pioneers in developing and applying the MVA (Mean Value Analysis) technique to mainframe computer systems. They also discuss important methodological aspects of performance measurement and prediction. Unfortunately, the content was allowed to age significantly and the book is now OOP. It was never revised to include the newer distributed computing technologies. Nearly all the examples and methodologies are drawn from IBM mainframes. These examples are still highly relevant if you operate an IBM mainframe but less so if you are using computer systems based on UNIX, Client/Server, SCSI, WindowsNT, etc. A sophisticated MVA analyzer package (called MAP) written in FORTRAN is still available for separate purchase. The full text of the 1984 edition is now available online as a set of PDF files.
[5]: Leung, C. H. C. Quantitative Analysis of Computer Systems, Wiley 1988.

Rigorous but readable discussion of priority queues. Now OOP-sed ².
[6]: Menasce, D. A., Almeida, V. A. F., and Dowdy, L. W. Capacity Planning and Performance Modeling, Prentice-Hall 1994.

The approach to queueing models is very much in the style of MVA [4] [2] but brought up to date for client/server technologies. Comes with a binary MVA calculator on diskette.
[7]: Tanner, M. Practical Queueing Analysis, McGraw-Hill 1994.

An excellent introduction to the theory of single queues from the standpoint of practical application that includes computer and telecommunication performance analysis. Intuition is emphasized over mathematical derivation. An accompanying diskette contains PASCAL programs for solving the examples of everyday queueing phenomena discussed in the book. The style and material are well suited to the practical performance analyst.

4 Server Architectures

This section covers books on server and processor architectures.

References

[1]

Buyya, R. (Ed.) High Performance Cluster Computing. Vol. 1, Architectures and Systems, Prentice-Hall 1999.

A very comprehensive review of the state of the art circa the mid-1990s, with each chapter written by experts in their respective fields.

[2]

Flynn, M. J. Computer Architecture: Pipelined and Parallel Processor Design Jones and Bartlett 1995.

A very thorough book on processor design. I prefer this book to Patterson and Hennessy [3]. Chapter 6 applies queueing theory to represent processor-memory performance.

[3]

Hennessy, J. L., and Patterson, D. A. Computer Architecture: A Quantitative Approach, (2nd Edition) Morgan Kaufmann 1996.

For a lot of people this is the ''bible'' written by two of the high-priests of modern CPU design (aka RISC processors). Personally, I don't like the style or the content. I find it too qualitative for my tastes. Even when the authors attempt to get ''quantitative,'' it looks qualitative. For example, here is how they present Amdahl's Law:

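$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

which turns out to be equivalent to:

$$\text{Speedup}(p) = \frac{1}{(1 - (1 - s)) + \dfrac{1 - s}{p}} = \frac{p}{1 + s\,(p - 1)}$$

in the notation I use throughout the class.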
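A quick numerical check of that equivalence can be sketched as follows. It assumes, as the substitution in the formula suggests, that s is the serial fraction of the workload (so Fraction_enhanced = 1 - s) and p is the number of processors (so Speedup_enhanced = p). The function names are illustrative only.

```python
def speedup_hennessy_patterson(fraction_enhanced, speedup_enhanced):
    """Amdahl's law in the Hennessy and Patterson form."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


def speedup_serial_fraction(p, s):
    """Amdahl's law written as p / (1 + s (p - 1)).

    p: number of processors (assumed meaning of the symbol)
    s: serial fraction of the workload, 0 <= s <= 1 (assumed meaning)
    """
    return p / (1.0 + s * (p - 1))


if __name__ == "__main__":
    # Compare the two forms over a few processor counts and serial fractions.
    for p in (1, 2, 8, 64):
        for s in (0.0, 0.05, 0.5, 1.0):
            a = speedup_hennessy_patterson(1.0 - s, p)
            b = speedup_serial_fraction(p, s)
            print(f"p={p:3d}  s={s:4.2f}  {a:8.4f}  {b:8.4f}")
```

The two columns agree to within floating-point rounding, and the compact form makes the limiting behavior easy to read off: as p grows, the speedup approaches 1/s.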

[4]

Pfister, G. F. In Search of Clusters, Prentice-Hall 1998.

A well-written book that provides a broad overview of cluster concepts (e.g., ccNUMA, and SCI) very quickly (e.g., I read it during a coast-to-coast flight). Figure 118 on p.467 presents the same cross-over scalability curves for SMP vs. clusters that I present in my class notes. The difference is, this is a ''marketing'' graph whereas I present the ''super-serial'' equations that are responsible for these curves.

[5]

Sportak, M. A. Windows NT Clustering, SAMS 1997.

TBC

5 Simulation Models

This section includes books related to the topic of event-based simulators.

References

[1]: Law, A. M. and Kelton, W. D. Simulation Modeling and Analysis, McGraw-Hill 1982.

Regarded by many as the simulationist's bible. Very mathematical presentation.
[2]: MacDougall, M. H. Simulating Computer Systems: Techniques and Tools, MIT Press 1990.

A good introduction to event-based simulation techniques and theory. Not mathematical. Also includes the C source code for the SMPL simulator as Appendices. Now OOP-sed.

6 Software Development

This section has books that pertain to performance analysis and scalability as applied to software development (a still burgeoning area).

References

[1]: Smith, C. U. Performance Engineering of Software Systems, Addison-Wesley 1991.

A highly underrated book on software performance engineering (SPE). Philosophically, there is no question that the espoused performance methodology for software design is meritorious, but the book's message generally seems to have fallen on deaf ears. Why? Smith argues that SPE is cost-effective and lowers risk but, as I explain in this course, new business approaches have tended to de-emphasize performance analysis in preference to rapid software development. Moreover, software engineers (unlike hardware engineers) are not trained to build models (let alone performance models); modeling is a very foreign concept to them. Unfortunately, no software accompanied the book's release to aid the SPE cause. Now, however, a new modeling package called SPE-ED(tm) can be purchased from http://www.perfeng.com. These criticisms notwithstanding, the book is recommended reading for every software engineer. Unfortunately, it is now OOP-sed but see the next entry.
[2]: Smith, C. U. and Williams, L. G. Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software, Addison-Wesley 2001.

A welcome new arrival but I have not had time to review it yet.
[3]: Dumke, R. et al. (Eds.) Performance Engineering: State of the Art and Current Trends, Springer 2001.

A collection of essays about various aspects of SPE including a chapter by Connie Smith and another by yours truly.

7 Statistical Methods

This section covers books and papers cited in the Guerrilla Data Analysis class.

References

[1]: Jain, R. The Art of Computer System Performance Analysis: Techniques for Experimental Design, Measurement, Simulation and Modeling, Wiley 1991.

Still one of the best general-purpose performance texts after almost a decade. It covers a lot of practical techniques, with factorial design [2] of performance measurements covered in several chapters. It also contains a good introduction to the Mean Value Analysis [4] techniques used in PDQ. Because of the book's breadth, several important topics such as parallel application metrics and client/server capacity planning are not treated explicitly. There is no accompanying software.
[2]: Box, G. E. P., Hunter, W. G., and Hunter, J. S. Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, Wiley 1978.

A formal, yet highly readable, book on a very difficult subject. It covers Design of Experiments and Regression Models in great detail; including the standard pitfalls.
[3]: Joshua J. Yi, David J. Lilja, and Douglas M. Hawkins, ``A Statistically Rigorous Approach for Improving Simulation Methodology,'' International Symposium on High-Performance Computer Architecture (HPCA), February, 2003.
[4]: R. Plackett and J. Burman, ``The Design of Optimum Multifactorial Experiments,'' Biometrika, Vol. 33, Issue 4, June, 1946, pp. 305-325.
[5]: D. C. Montgomery, Design and Analysis of Experiments, (5th edition), Wiley and Sons, 2000.
[6]: AJ KleinOsowski and David J. Lilja, ``MinneSPEC: A New SPEC Workload for Simulation-Based Computer Architecture Research,'' Computer Architecture Letters, Vol. 1, June, 2002, pp. 10-13. See also MinneSPEC
[7]: L. Eeckhout et al, ``Designing Computer Architecture Workloads,'' IEEE Computer, Feb., 2003, pp. 65-71.
[8]: J. Haskins and K. Skadron, ``Minimal Subset Evaluation: Rapid Warm-up for Simulated Hardware State,'' Intl. Conf. Computer Design, 2001.
[9]: R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe, ``SMARTS: Accelerating Microarchitecture Simulation via Rigorous Statistical Sampling,'' Intl. Symp. Computer Architecture, 2003, pp. 84-95.
[10]: T. Sherwood, E. Perelman, G. Hamerly, and B. Calder, ``Automatically Characterizing Large Scale Program Behavior,'' Intl. Conf. Architectural Support for Programming Languages and Operating Systems, 2002.
[11]: Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. Time Series Analysis: Forecasting and Control, (3rd edition) Prentice-Hall 1994.

Very well written but highly mathematical text which covers the complete theory of time series analysis.

The following links provide more information about individual books on statistics and statistical software packages.

8 Vendor Specific

A number of good books are couched in a vendor-specific context, while the methodologies they describe might be quite generally applicable. I've tried to list some of those books here.

References

[1]: Cockcroft, A. and Pettit, R. Sun Performance and Tuning, (2nd edition) Prentice-Hall 1998.

The 1st edition set the standard for performance tuning of Solaris. The layout makes it particularly readable. This expansion of the 1st edition includes such new topics as TCP/IP performance, sizing Java clients, and the SEToolkit.
[2]: Cockcroft, A. and Walker, J. Capacity Planning for Internet Services Prentice-Hall 2001.

I only just received my copy and have not had time to review it yet.
[3]: Corrigan, P. and Gurry, M. ORACLE Performance Tuning, O'Reilly 1993.

Tuning and sizing methodologies up through ORACLE 7. Very comprehensive. A 2nd edition was published in November 1996 but is still lacking any discussion of the ORACLE 8i or 9i releases.
[4]: Loukides, M. System Performance Tuning O'Reilly 1992.

An oldie but a goodie for those not so familiar with the performance ''knobs'' of UNIX ³ in all its various incarnations. A more recent and complementary book in the same vein is UNIX Power Tools by ...
[5]: McDougall, R. et al. Resource Management Prentice-Hall 1999.

Discusses Solaris Resource Manager (SRM) and how it uses the fair share scheduler, from the standpoint of setup and behavior. Also discusses network Bandwidth Manager, which uses class-based queueing, and Load Sharing Facility for HPC applications. What is lacking throughout is a more integrated and comprehensive set of tools to help a sysadmin use these automated facilities to meet service level objectives.
[6]: Mauro, J. and McDougall, R. Solaris Internals: Core Kernel Components Prentice-Hall 2001.

If you're really into Solaris (the kernel, that is), this is the book! If you still can't get enough, come and get it from the horse's mouth.
[7]: Samson, S. L. MVS Performance Management: OS/390 Edition McGraw-Hill 1997.

A comprehensive treatment of performance tuning for IBM mainframes by an MVS expert and CMG [1] regular. I have drawn on Steve's discussions of DASD modeling and WLM in Goal Mode for some of my class material.
[8]: Wong, B. Configuration and Capacity Planning for Solaris Servers Prentice-Hall 1997.

Heavy focus on disk sizing and performance.

9 Web Technologies

These books will be useful to those of you involved in planning and procuring internet or intranet services. The corresponding section on network capacity (Section 2) should also be reviewed.

References

[1]

Bulka, D. Java Performance and Scalability Volume 1: Server-Side Programming Techniques Addison-Wesley 2000.

TBD

[2]

Halter, S. L. and Munroe, S. J. Enterprise Java Performance, Prentice-Hall 2001.

TBD

[3]

Killelea, P. Web Performance Tuning, O'Reilly 1998.

A nicely written book with lots of tips mostly centered on HTTPd performance tuning. Heavy on empirical insights, rather weaker on formal analysis.

[4]

Menasce, D. A. and Almeida, V. A. F. Capacity Planning for Web Performance, Prentice-Hall 1998.

Presents a combination of formal analysis along with suggestions for data collection and parameterization of formal models. The approach to queueing models is very much in the style of MVA [4] [2] but brought up to date for web technologies.

[5]

Menasce, D. A. and Almeida, V. A. F. Scaling for E-Business, Prentice-Hall 2000.

Similar content to [4] but with an emphasis on database back-end configurations. Narrowly escapes getting into the complexities of modeling such things as thread allocation.

[6]

Neward, T. Server-based Java Programming, Manning Publications 2000.

TBD

[7]

Wilson, S. and Kesselman, J. Java Platform Performance: Strategies and Tactics, Addison-Wesley Longman 2000.

Presents methods for measuring client-side Java performance, mostly through the use of benchmarks. Unfortunately, the authors get off to a bad start in Chap. 1, p. 6, where they discuss the important subject of application scalability. They define scalability rather narrowly as ''the study of how systems perform under heavy loads.'' I would prefer they had said, under increasing user load. After all, contention may set in with just a few active users.
Referring next to their Fig. 1-2, which depicts a generic hockey-stick shaped response-time characteristic, the authors claim this as evidence of an application ''that isn't scaling well.'' This conclusion is apparently keyed off the incorrect statement that the response time is increasing ''exponentially'' with increasing user load. No other evidence is provided to support this claim. Not only is the response time not rising exponentially, the application may be scaling as well as it can on that platform. Queueing theory tells us that the response time rises linearly up the hockey-stick handle, not exponentially. Moreover, such behavior does not, by itself, imply poor scalability. The response time curve may rise super-linearly in the presence of thrashing effects, but this special case is not discussed by the authors.
To make matters worse, the authors next refer to their Figure 1-3 on p. 7, which depicts a response time curve that tends to flatten out as user load increases. (It actually looks more like a throughput characteristic.) This is presented as an example of an application that ''scales in a more desirable manner'' because the response time degradation is more gradual. Assuming the authors have not mislabeled the plot (and the text indicates that they have not), they have failed to comprehend that the flattening effect is most likely caused by throttling due to a limit on the number of threads that the client can execute, the inability of the server to keep up with requests, or something similar. Whatever the precise cause, this sublinear over-saturation effect needs to be investigated as an undesirable measurement effect rather than a desirable scaling feature.
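The linear rise along the hockey-stick handle can be illustrated with a minimal sketch of exact Mean Value Analysis for the simplest closed model: N users with think time Z sharing a single queueing center with service demand D. The model, the function name, and the parameter values are assumptions chosen for illustration; they are not taken from the book under review.

```python
def mva_response_times(demand, think_time, max_users):
    """Exact MVA for a closed system with one queueing center.

    demand:     service demand D at the center (seconds)
    think_time: user think time Z (seconds)
    Returns a list whose entry n-1 is the response time R with n users.
    """
    queue_length = 0.0
    response_times = []
    for n in range(1, max_users + 1):
        # Arrival theorem: an arriving request sees the queue length
        # of the same system with one fewer user.
        r = demand * (1.0 + queue_length)
        # Response time law gives the throughput X = N / (R + Z).
        x = n / (r + think_time)
        # Little's law gives the new mean queue length Q = X * R.
        queue_length = x * r
        response_times.append(r)
    return response_times


if __name__ == "__main__":
    D, Z = 0.1, 1.0                      # hypothetical values
    r = mva_response_times(D, Z, 100)
    for n in (1, 5, 11, 20, 50, 100):
        # Above saturation, R approaches the straight line N * D - Z.
        print(f"N={n:3d}  R={r[n - 1]:7.3f}  N*D - Z={n * D - Z:7.3f}")
```

Beyond the saturation point N* = (D + Z)/D, each additional user adds very nearly D seconds to the response time, which is a straight line with slope D rather than an exponential.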

[8]

Yeager, N. J. and McGrath, R. E. Web Server Technology, Morgan Kaufmann, 1996.

Qualitative treatment with an emphasis on the server-side of web technology. In many ways, a nice complement to Killelea's book [3]. Becoming a bit dated now.

Footnotes:

¹ This kind of spontaneous congestion is analogous to quantum tunneling.

² OOP == Out of Print

³ There is no ''UNIX''. It's an experiment that escaped from the lab circa 1973 and has been mutating ever since!

File translated from TeX by TtH, version 2.25.
On 12 Aug 2005, 22:19.
