Guerrilla Data Analysis Techniques (GDAT)
5-Day class with an emphasis on using R and PDQ-R
with Prof. David Lilja, Mr. Jim Holtman, and Dr. Neil Gunther
Contents
1 Why You Need This Course
2 Certification
3 Course Goals
4 Dates
5 Course Structure
5.1 GDAT Content Day 1
5.2 GDAT Content Day 2
5.3 GDAT Content Day 3
5.4 GDAT Content Day 4
5.5 GDAT Content Day 5
6 Instructors
6.1 David Lilja
6.2 Jim Holtman
6.3 Neil Gunther
7 Terms and Conditions
8 Textbooks
1 Why You Need This Course
Many Guerrilla alumni have asked for this class. Why?
Well, they've collected cubic light years of performance data, but then they
realize that anyone could have pushed the same buttons they did to
collect that data. No job security there. They want to set themselves apart by
transforming that raw performance data into performance information by
applying Guerrilla techniques.
That's exactly what we teach you in this class.
Moreover, the data analysis techniques we present are general purpose, and therefore
not tied to any particular computing platform or data collection tools.
2 Certification
This class corresponds to Guerrilla Capacity Planner: Level III certification.
The levels are defined as:
- Level I: Entry level, e.g., Guerrilla Boot Camp.
- Level II: Exposure to a wide variety of computer systems capacity planning concepts, methods, and
tools that can be adapted opportunistically to support the needs of
enterprise-level platform-independent performance management.
An example class is Guerrilla Capacity Planning.
- Level III: Detailed study of a particular capacity planning technique or performance analysis tool.
A printed certificate reflecting the level of achievement is awarded to each attendee who completes the course.
Official Purpose
This new 5-day course falls naturally into two parts:
- An easy introduction to both simple and sophisticated statistical concepts.
We begin with a comparison of the three
primary techniques used to measure and evaluate the performance of computer
systems, an in-depth look at the metrics used to characterize performance,
and a survey of the strategies used in the fundamental measurement tools
and techniques. The focus then shifts to provide a gentle introduction to
some of the key statistical tools and techniques needed to interpret noisy
performance measurements and to understand complex simulation results. We
will also examine techniques that can be used to appropriately design
experiments to obtain the maximum amount of information for a given level
of experimental effort. The course then concludes with a discussion of the
key issues related to system simulation.
- Demonstrations of how to apply those concepts. We use tools like
Excel, R, SIMUL8, SimPy and Mathematica applied to actual computer performance data.
3 Course Goals
After completing this course, the participants will be able to:
- Rigorously compare the performance of computer systems in the
presence of measurement noise.
- Determine whether a change made to a system has a statistically
significant impact on performance.
- Use statistical tools to reduce the number of simulations of a
computer system that need to be performed.
- Design a set of experiments to obtain the most information for a
given level of effort.
- Understand the inherent trade-offs involved in using simulation tools, e.g., SIMUL8,
and analytical modeling tools, e.g.,
PDQ-R.
- Apply tools, like R and Excel, to the analysis of large volumes of
computer performance data.
- Discern which visualization techniques are best suited
to assist in converting performance data into information.
- Participate in ongoing Performance Dynamics-sponsored email discussions about using R, PDQ-R and
other tools on the job in their shop.
4 Dates
Check the schedule page for the latest information.
Online registration is available. Additional registration details are provided at the end of this page.
Who Should Attend
This class is intended for application scientists and engineers,
computer architects, compiler writers, and software engineers who use or
design high-performance computer systems. The level of the presentation is
appropriate for both practitioners and students. Experts from any
scientific discipline will find this class useful in helping to
understand how to appropriately measure and statistically analyze the
performance of their systems and applications.
Content level: 20% beginner, 60% intermediate, 20% advanced.
5 Course Structure
Class begins at 9am and ends at 5pm each day.
A serviced morning break of half an hour occurs around 10:30am.
Seated lunch service is provided from noon until 1pm.
A serviced afternoon break of half an hour occurs around 3:00pm.
A number of practical exercises will be given and discussed throughout
the five days. You are encouraged to bring a laptop computer.
5.1 GDAT Content Day 1
- Introduction
  - Measurement
  - Simulation
  - Analytical modeling
- Performance Metrics
  - Characteristics of good metrics
  - Processor and system metrics
  - Speedup and relative change
- Measurement Tools and Techniques
  - Strategies
  - Interval timers
  - Program profiling
  - Tracing
  - Indirect measurement
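As a small illustration of the "Speedup and relative change" topic, the sketch below computes both quantities from two execution times. It uses one common convention (speedup as old time divided by new time, relative change as speedup minus one); the function names and the timing values are hypothetical, and Python is used here only as neutral notation rather than as the course's tool.

```python
def speedup(time_old: float, time_new: float) -> float:
    """Speedup of the new system over the old one for the same workload."""
    if time_old <= 0 or time_new <= 0:
        raise ValueError("execution times must be positive")
    return time_old / time_new


def relative_change(time_old: float, time_new: float) -> float:
    """Relative change (speedup - 1); positive means the new system is faster."""
    return speedup(time_old, time_new) - 1.0


# Hypothetical execution times, in seconds, for one benchmark on two systems.
t_old, t_new = 480.0, 360.0
print(f"speedup         = {speedup(t_old, t_new):.3f}")
print(f"relative change = {relative_change(t_old, t_new):+.1%}")
```

A speedup below 1 (or a negative relative change) indicates that the new system is slower under this convention.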
5.2 GDAT Content Day 2
- Statistical Interpretation of Data
  - What do all of these means mean?
  - Sources of measurement errors
  - Confidence intervals
  - Statistical comparison alternatives
- Design of Experiments: Part 1
  - Terminology
  - One-factor ANOVA (Analysis of Variance)
  - Two-factor ANOVA
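To give a flavor of the "Confidence intervals" topic, here is a minimal sketch of a before-and-after comparison using a confidence interval on paired differences. The measurements are invented for illustration, the helper name is hypothetical, and the Student-t critical value is hard-coded from a standard table (two-sided 95%, 7 degrees of freedom) instead of being computed.

```python
import math
import statistics


def mean_confidence_interval(samples, t_crit):
    """Return (mean, lower, upper) for the mean of `samples`.

    t_crit is the Student-t critical value for the chosen confidence level
    with len(samples) - 1 degrees of freedom, looked up in a table.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Hypothetical response times (seconds) for the same 8 workloads,
# measured before and after a tuning change.
before = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]
after = [9.6, 9.5, 9.9, 9.7, 9.4, 9.8, 9.6, 9.9]
diffs = [b - a for b, a in zip(before, after)]

T_CRIT_95_DF7 = 2.365  # assumption: table value, two-sided 95%, 7 d.o.f.
mean, lo, hi = mean_confidence_interval(diffs, T_CRIT_95_DF7)
print(f"mean difference = {mean:.3f} s, 95% CI = ({lo:.3f}, {hi:.3f})")
if lo > 0 or hi < 0:
    print("Interval excludes zero: the change is statistically significant.")
else:
    print("Interval includes zero: no significant difference detected.")
```

The decision rule is the key point: if the interval for the mean difference contains zero, the measured difference cannot be distinguished from measurement noise at that confidence level.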
5.3 GDAT Content Day 3
- Design of Experiments: Part 2
  - Generalized m-factor experiments
  - Fractional factorial designs
  - Multifactorial designs
  - Plackett-Burman design matrix
- Application to Simulations
  - Types of simulations: event-based, workload simulation
  - Random number generation
  - Verification and validation
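The "Plackett-Burman design matrix" topic can be sketched as follows: an 8-run design for up to 7 two-level factors, built from cyclic shifts of a standard generator row plus a final row of all low levels. This is only an illustration under stated assumptions; the function names are hypothetical and no particular course material is reproduced.

```python
# Standard generator row for the 8-run Plackett-Burman design (+1 = high, -1 = low).
PB8_GENERATOR = [+1, +1, +1, -1, +1, -1, -1]


def plackett_burman_8():
    """Return the 8 x 7 design matrix as a list of rows."""
    n = len(PB8_GENERATOR)
    # Rows 1..7 are successive cyclic right-shifts of the generator row.
    rows = [PB8_GENERATOR[-shift:] + PB8_GENERATOR[:-shift] for shift in range(n)]
    # The last row sets every factor to its low level.
    rows.append([-1] * n)
    return rows


def main_effects(design, responses):
    """Estimate each factor's effect: mean response at +1 minus mean at -1."""
    half = len(design) / 2
    n_factors = len(design[0])
    return [
        sum(row[j] * y for row, y in zip(design, responses)) / half
        for j in range(n_factors)
    ]


design = plackett_burman_8()
for row in design:
    print(" ".join(f"{level:+d}" for level in row))
```

With this layout, 8 experiments (for example, 8 simulation runs) screen 7 factors for their main effects, compared with 128 runs for a full two-level factorial design.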
5.4 GDAT Content Day 4
- Introduction to Statistical Analysis Tools
  - Comparison of Excel, R, SIMUL8, SimPy, Mathematica
  - Demonstration of doing statistical analysis with R
  - Handling millions of data items quickly
  - Computing statistics, graphing the results, confidence intervals
- Guided Tour of Techniques
  - ANOVA calculations
  - Plackett-Burman designs in R
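For the "ANOVA calculations" item, the following is a minimal sketch of a one-factor ANOVA computed from first principles. The measurement values and the function name are hypothetical; in practice a statistics package would normally do this, and the resulting F statistic would be compared with a tabulated critical value for (k - 1, N - k) degrees of freedom.

```python
def one_factor_anova(groups):
    """One-factor ANOVA for a list of groups (one list of measurements per alternative).

    Returns the between-group sum of squares (ssa), the within-group
    sum of squares (sse), their degrees of freedom, and the F statistic.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation explained by the differences between alternatives.
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation due to measurement noise within each alternative.
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    df_a, df_e = k - 1, n_total - k
    f_stat = (ssa / df_a) / (sse / df_e)
    return {"ssa": ssa, "sse": sse, "df_a": df_a, "df_e": df_e, "f": f_stat}


# Hypothetical execution times (seconds) for three system configurations.
timings = [
    [12.1, 11.8, 12.4, 12.0],
    [11.2, 11.5, 11.0, 11.4],
    [12.9, 13.1, 12.7, 13.0],
]
result = one_factor_anova(timings)
print(f"SSA = {result['ssa']:.3f}, SSE = {result['sse']:.3f}, F = {result['f']:.2f}")
```

A large F value relative to the critical value suggests that the differences between alternatives exceed what measurement noise alone would explain.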
5.5 GDAT Content Day 5
- Using R to Analyze Performance Data
  - Detailed examples and case studies
  - Interfaces to SQL databases
  - Advanced R techniques for analyzing data by partitioning and processing subsets
  - Debugging your R scripts
- Advanced Techniques
  - Multivariate analysis case study
  - Data visualization techniques for performance analysis
  - Open discussion and student-specific examples
6 Instructors
6.1 David Lilja
David received the Ph.D. and M.S. degrees, both in Electrical Engineering,
from the University of Illinois at Urbana-Champaign, and a B.S. in Computer
Engineering from Iowa State University in Ames. He is currently a
Professor of Electrical and Computer Engineering,
and a Fellow of the
Minnesota Supercomputing Institute, at the University of Minnesota in
Minneapolis. He also serves as a member of the graduate faculties in
Computer Science and Scientific Computation, and was the founding Director
of Graduate Studies for Computer Engineering. He has been a visiting
senior engineer in the Hardware Performance Analysis group at IBM in
Rochester, Minnesota, and a visiting professor at the University of Western
Australia in Perth supported by a Fulbright award.
Previously, he worked as a research assistant at the Center for
Supercomputing Research and Development at the University of Illinois, and
as a development engineer at Tandem Computers Incorporated (now a division
of HP/Compaq) in Cupertino, California. He has served on the program
committees of numerous conferences; was a distinguished visitor of the IEEE
Computer Society; is a Senior member of the IEEE and a member of the ACM;
and is a registered Professional Engineer. His primary research interests
are in high-performance computer architecture, parallel computing,
nanocomputing, hardware-software interactions, and performance analysis.
6.2 Jim Holtman
Jim has a BSEE from New Mexico State University and an MSEE/Comp Sci from
the University of California at Berkeley. He worked at Bell Labs
developing a real-time operating system for the Safeguard Anti-ballistic
Missile system, which was one of the first multiprocessor systems in the
late 1960s. He worked on the development of operation support systems for
the Bell System and was named a Bell Labs Fellow for his establishment of
the architecture review process at Bell Labs.
He then worked for Convergys consulting with various groups developing
real-time billing systems for mobile carriers on their architecture and
performance issues.
He is currently retired, but is still interested in the analysis and
visualization of computer performance data. He is an advocate of the
R-language for analyzing data and has taught courses on R and on systems
architecture/performance. In particular, he has presented on this subject at CMG.
6.3 Neil Gunther
Neil holds an M.Sc. in Applied Mathematics and a Ph.D. in theoretical
physics. He is an internationally recognized researcher and consultant in Information Processing who founded
Performance Dynamics Company in 1994. Prior to that, Dr. Gunther held research and management
positions at San Jose State University, JPL/NASA (Voyager and Galileo
missions), Xerox PARC and Pyramid/Siemens Technology. His classes have been given at both
corporate and academic institutions including AOL, Boeing, FedEx, Melbourne University, Motorola, Nokia,
Stanford University, Sun Microsystems (both USA and EU) and Thales Group (Holland).
Dr. Gunther is the author of numerous papers as well as several books,
and in 2008 he was the recipient of the A.A. Michelson Award from CMG, the industry's highest
honor for computer performance analysis and capacity planning. He was also recently
elected to the rank of Senior Member of the IEEE.
7 Terms and Conditions
Tuition Fees
Please consult the
Class Schedule
page for current pricing and conditions.
Transportation
Information will be sent upon receipt of enrollment. A packet will include
airport and transportation options.
Reservations
All confirmed reservations must be
accompanied by a purchase order number, a check for the tuition, or credit card
information for billing. Courtesy Reservations will be held for up to 30 days in
order for paperwork to be processed so long as there is sufficient time and
adequate space in the course.
8 Textbooks
A copy of the textbooks
Measuring Computer Performance
(Cambridge University Press, 2000),
and
Analyzing Computer System Performance with Perl::PDQ
(Springer-Verlag, 2005),
are included in the price of admission.
Location
Please consult the
Class Schedule
for hotel location details.
The city of Pleasanton is right next door to Castro Valley.
Meals
Breakfast, lunch, and morning and afternoon breaks will be catered by the hotel each day. See the
Mini Survival Guide,
which explains how to get to the hotel and lists local restaurants to eat at once you do.
File translated from TeX by TTH, version 3.38, on 10 Jun 2009, 15:07.