Guerrilla Data Analysis Techniques (GDAT)

5-Day class with an emphasis on using R and PDQ-R

with
Prof. David Lilja, Mr. Jim Holtman, and Dr. Neil Gunther

Contents

1  Why You Need This Course
2  Certification
3  Course Goals
4  Dates
5  Course Structure
    5.1  GDAT Content Day 1
    5.2  GDAT Content Day 2
    5.3  GDAT Content Day 3
    5.4  GDAT Content Day 4
    5.5  GDAT Content Day 5
6  Instructors
    6.1  David Lilja
    6.2  Jim Holtman
    6.3  Neil Gunther
7  Terms and Conditions
8  Textbooks

1  Why You Need This Course

Many Guerrilla alumni have asked for this class. Why? Well, they've collected cubic light years of performance data, but then they realize that anyone could have pushed the same buttons they did to collect that data. No job security there. They want to set themselves apart by transforming that raw performance data into performance information by applying Guerrilla techniques. That's exactly what we teach you in this class.
Moreover, the data analysis techniques we present are general purpose, and therefore not tied to any particular computing platform or data collection tools.

2  Certification

This class corresponds to Guerrilla Capacity Planner: Level III certification. The levels are defined as:
  1. Entry level, e.g., Guerrilla Boot Camp.
  2. Exposure to a wide variety of computer systems capacity planning concepts, methods, and tools that can be adapted opportunistically to support the needs of enterprise-level platform-independent performance management. An example class is Guerrilla Capacity Planning.
  3. Detailed study of a particular capacity planning technique or performance analysis tool.
A printed certificate reflecting the level of achievement is awarded to each attendee who completes the course.

    Official Purpose

    This new 5-day course falls naturally into two parts:
    1. An easy introduction to both simple and sophisticated statistical concepts. We begin with a comparison of the three primary techniques used to measure and evaluate the performance of computer systems, an in-depth look at the metrics used to characterize performance, and a survey of the strategies used in the fundamental measurement tools and techniques. The focus then shifts to provide a gentle introduction to some of the key statistical tools and techniques needed to interpret noisy performance measurements and to understand complex simulation results. We also will examine techniques that can be used to appropriately design experiments to obtain the maximum amount of information for a given level of experimental effort. The course then concludes with a discussion of the key issues related to system simulation.

    2. Demonstrations of how to apply those concepts. We use tools like Excel, R, SIMUL8, SimPy and Mathematica applied to actual computer performance data.

    3  Course Goals

    After completing this course, the participants will be able to:

    4  Dates

    Check the schedule page for the latest information.
    Online registration is available. Additional registration details are provided at the end of this page.

    Who Should Attend

    This class is intended for application scientists and engineers, computer architects, compiler writers, and software engineers who use or design high-performance computer systems. The level of the presentation is appropriate for both practitioners and students. Experts from any scientific discipline will find this class useful in helping to understand how to appropriately measure and statistically analyze the performance of their systems and applications.
    Content level: 20% beginner, 60% intermediate, 20% advanced.

    5  Course Structure

    Class begins at 9am and ends at 5pm each day.
    A serviced morning break of half an hour is held around 10:30am.
    Seated lunch service is provided from noon until 1pm.
    A serviced afternoon break of half an hour is held around 3:00pm.
    A number of practical exercises will be given and discussed throughout the five days. You are encouraged to bring a laptop computer.

    5.1  GDAT Content Day 1

    Introduction
    • Measurement
    • Simulation
    • Analytical modeling
    Performance Metrics
    • Characteristics of good metrics
    • Processor and system metrics
    • Speedup and relative change
    Measurement Tools and Techniques
    • Strategies
    • Interval timers
    • Program profiling
    • Tracing
    • Indirect measurement
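
As a rough illustration of the interval-timer and speedup topics listed above, the following minimal sketch times two alternatives and compares them. It is written in Python purely for illustration. The helper names time_call, speedup, and relative_change are hypothetical, and the definition of relative change used here (speedup minus one) is one common convention, assumed rather than taken from the course material.

```python
import time


def time_call(func, *args, repeats=5):
    """Return the shortest wall-clock time, in seconds, over several runs.

    time_call is a hypothetical helper name, not part of any library.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()            # read the timer before the event
        func(*args)
        elapsed = time.perf_counter() - start  # and again after it
        best = min(best, elapsed)
    return best


def speedup(t_old, t_new):
    """Speedup of the new alternative over the old one (> 1 means faster)."""
    return t_old / t_new


def relative_change(t_old, t_new):
    """Relative change in speed, assumed here to be speedup minus one."""
    return speedup(t_old, t_new) - 1.0


if __name__ == "__main__":
    # Two ways of summing squares, used only as stand-in workloads.
    t_list = time_call(lambda: sum([i * i for i in range(200_000)]))
    t_gen = time_call(lambda: sum(i * i for i in range(200_000)))
    print("speedup:", speedup(t_list, t_gen))
    print("relative change:", relative_change(t_list, t_gen))
```

Taking the minimum over repeated runs is one simple way to reduce the influence of timer overhead and background activity; averaging with a confidence interval is another.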

    5.2  GDAT Content Day 2

    Statistical Interpretation of Data
    • What do all of these means mean?
    • Sources of measurement errors
    • Confidence intervals
    • Statistical comparison alternatives
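
To make the confidence-interval topic above concrete, here is a minimal Python sketch that computes an interval for the mean of a set of noisy measurements. It assumes a normal approximation, which is reasonable only for fairly large samples; for small samples a Student t quantile would normally replace the normal one. The function name mean_confidence_interval is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.95):
    """Return (low, high) bounds for the mean of the samples.

    Assumption: normal approximation, so this is suited to large n only.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    m = mean(samples)
    s = stdev(samples)                    # sample standard deviation
    # Two-sided quantile, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * s / sqrt(n)
    return m - half_width, m + half_width


if __name__ == "__main__":
    # Made-up response times in milliseconds, for illustration only.
    times_ms = [8.0, 12.0] * 50
    low, high = mean_confidence_interval(times_ms)
    print(f"95% CI for the mean: [{low:.3f}, {high:.3f}] ms")
```

If the intervals for two alternatives overlap heavily, the measurements alone do not support the claim that one is faster than the other.
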
    Design of Experiments: Part 1
    • Terminology
    • One-factor ANOVA (Analysis of Variance)
    • Two-factor ANOVA
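
The one-factor ANOVA topic above can be sketched in a few lines of Python. The sketch splits the total variation into a part due to the alternatives (SSA) and a part due to measurement error (SSE), then forms the F statistic from their mean squares. The function name one_factor_anova and the sample data are hypothetical, and the resulting F value would still need to be compared against a tabulated critical value.

```python
def one_factor_anova(groups):
    """Compute SSA, SSE, and the F statistic for a one-factor experiment.

    groups: list of lists, one list of measurements per alternative.
    Assumes at least two groups and nonzero within-group variation.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation between alternatives.
    ssa = sum(len(g) * (gm - grand_mean) ** 2
              for g, gm in zip(groups, group_means))
    # Variation within alternatives (measurement error).
    sse = sum((y - gm) ** 2
              for g, gm in zip(groups, group_means) for y in g)

    df_a = k - 1            # degrees of freedom for alternatives
    df_e = n_total - k      # degrees of freedom for error
    f_stat = (ssa / df_a) / (sse / df_e)
    return {"ssa": ssa, "sse": sse, "df_a": df_a, "df_e": df_e, "f": f_stat}


if __name__ == "__main__":
    # Made-up measurements for three alternatives, for illustration only.
    result = one_factor_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
    print(result)
```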

    5.3  GDAT Content Day 3

    Design of Experiments: Part 2
    • Generalized m-factor experiments
    • Fractional factorial designs
    • Multifactorial designs
    • Plackett-Burman design matrix
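
As an illustration of the Plackett-Burman topic above, this minimal Python sketch builds an 8-run design matrix for up to seven two-level factors. It assumes the commonly tabulated generator row (+ + + - + - -), cyclically shifted, plus a final row with every factor at its low level. The function name plackett_burman_8 is hypothetical.

```python
def plackett_burman_8():
    """Return an 8-run, 7-factor Plackett-Burman design as rows of +1/-1."""
    # Assumed generator row for the 8-run design.
    generator = [+1, +1, +1, -1, +1, -1, -1]
    rows = []
    for shift in range(len(generator)):
        # Rotate the generator right by `shift` positions.
        rows.append(generator[-shift:] + generator[:-shift])
    # Final run: every factor at its low level.
    rows.append([-1] * len(generator))
    return rows


if __name__ == "__main__":
    for run in plackett_burman_8():
        print(" ".join("+" if level > 0 else "-" for level in run))
```

Each column is balanced (four high and four low settings) and every pair of columns is orthogonal, which is what allows seven main effects to be estimated from only eight runs.
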
    Application to Simulations
    • Types of simulations: event-based, workload simulation
    • Random number generation
    • Verification and validation

    5.4  GDAT Content Day 4

    Introduction to Statistical Analysis Tools
    • Comparison of Excel, R, SIMUL8, SimPy, Mathematica
    • Demonstration of doing statistical analysis with R
    • Handling millions of data items quickly
    • Computing statistics, graphing the results, confidence intervals
    Guided Tour of Techniques
    • ANOVA calculations
    • Plackett-Burman designs in R

    5.5  GDAT Content Day 5

    Using R to Analyze Performance Data
    • Detailed examples and case studies
    • Interfaces to SQL databases
    • Advanced R techniques for analyzing data by partitioning and processing subsets
    • Debugging your R scripts
    Advanced Techniques
    • Multivariate analysis case study
    • Data visualization techniques for performance analysis
    • Open discussion and student-specific examples

    6  Instructors

    6.1  David Lilja

    David received the Ph.D. and M.S. degrees, both in Electrical Engineering, from the University of Illinois at Urbana-Champaign, and a B.S. in Computer Engineering from Iowa State University in Ames. He is currently a Professor of Electrical and Computer Engineering, and a Fellow of the Minnesota Supercomputing Institute, at the University of Minnesota in Minneapolis. He also serves as a member of the graduate faculties in Computer Science and Scientific Computation, and was the founding Director of Graduate Studies for Computer Engineering. He has been a visiting senior engineer in the Hardware Performance Analysis group at IBM in Rochester, Minnesota, and a visiting professor at the University of Western Australia in Perth supported by a Fulbright award.
    Previously, he worked as a research assistant at the Center for Supercomputing Research and Development at the University of Illinois, and as a development engineer at Tandem Computers Incorporated (now a division of HP/Compaq) in Cupertino, California. He has served on the program committees of numerous conferences; was a distinguished visitor of the IEEE Computer Society; is a Senior member of the IEEE and a member of the ACM; and is a registered Professional Engineer. His primary research interests are in high-performance computer architecture, parallel computing, nanocomputing, hardware-software interactions, and performance analysis.

    6.2  Jim Holtman

    Jim has a BSEE from New Mexico State University and an MSEE/Comp Sci from the University of California at Berkeley. In the late 1960s he worked at Bell Labs developing a real-time operating system for the Safeguard Anti-ballistic Missile system, which was one of the first multiprocessor systems. He worked on the development of operation support systems for the Bell System and was named a Bell Labs Fellow for his establishment of the architecture review process at Bell Labs.
    He then worked for Convergys consulting with various groups developing real-time billing systems for mobile carriers on their architecture and performance issues.
    He is currently retired, but is still interested in the analysis and visualization of computer performance data. He is an advocate of the R language for analyzing data and has taught courses on R and on systems architecture/performance. In particular, he has presented on this subject at CMG.

    6.3  Neil Gunther

    Neil holds an M.Sc. in Applied Mathematics and a Ph.D. in theoretical physics. He is an internationally recognized researcher and consultant in Information Processing who founded Performance Dynamics Company in 1994. Prior to that, Dr. Gunther held research and management positions at San Jose State University, JPL/NASA (Voyager and Galileo missions), Xerox PARC and Pyramid/Siemens Technology. His classes have been given at both corporate and academic institutions including AOL, Boeing, FedEx, Melbourne University, Motorola, Nokia, Stanford University, Sun Microsystems (both USA and EU) and Thales Group (Holland).
    Dr. Gunther is the author of numerous papers as well as several books, and in 2008 he was the recipient of the A.A. Michelson Award from CMG, the industry's highest honor for computer performance analysis and capacity planning. He was also recently elected to the rank of Senior Member of the IEEE.

    7  Terms and Conditions

    Tuition Fees

    Please consult the Class Schedule page for current pricing and conditions.

    Transportation

    Information will be sent upon receipt of enrollment. A packet will include airport and transportation options.

    Reservations

    All confirmed reservations must be accompanied by a purchase order number, a check for the tuition, or credit card information for billing. Courtesy Reservations will be held for up to 30 days in order for paperwork to be processed, so long as there is sufficient time and adequate space in the course.

    8  Textbooks

    Copies of the textbooks Measuring Computer Performance (Cambridge University Press, 2000) and Analyzing Computer System Performance with Perl::PDQ (Springer-Verlag, 2005) are included in the price of admission.

    Location

    Please consult the Class Schedule for hotel location details. The city of Pleasanton is right next door to Castro Valley.

    Meals

    Breakfast, lunch, and the morning and afternoon breaks are catered by the hotel each day. See the Mini Survival Guide, which explains how to get to the hotel and lists local restaurants to eat at once you do.



    File translated from TEX by TTH, version 3.38.
    On 10 Jun 2009, 15:07.