Guerrilla Data Analytics (GDAT)
How to Get Beyond Monitoring

From Linear Regression to Machine Learning
Complete with a review of the R programming language and application of R statistical tools

Instructor: Dr. Neil J. Gunther
Performance Dynamics Educational Services

Data comes from the Devil. Therefore, you need to know how to waterboard it with the right statistical tools to get it to tell you the truth.


1  Purpose
2  Certification
3  Course Goals
4  Dates and Registration
5  Who Should Attend?
6  Course Outline
    6.1  GDAT Day 1
    6.2  GDAT Day 2
    6.3  GDAT Day 3
    6.4  GDAT Day 4
    6.5  GDAT Day 5
7  Registration and Materials
    7.1  Registration
    7.2  Textbook
    7.3  Location
    7.4  Meals

1  Purpose

You already understand the essential concepts of computer system capacity planning (e.g., Level II certification) and you've collected cubic light years of performance data. But now you realize that's not sufficient. Why? Because raw performance data is not the same thing as performance information. To extract the pertinent information, you need to transform your data. And that's precisely what this class teaches you.
Moreover, the data analysis techniques we present are general purpose, and therefore not tied to any particular computing platform or data collection tools.
Although there are no prerequisites, it is strongly recommended that you take the Level II GCaP class before embarking on this Level III GDAT class.

2  Certification

This class (GDAT) corresponds to Guerrilla Capacity Planner: Level III certification, where the levels are defined as:
  1. Entry level for newbies, e.g., Guerrilla Boot Camp (GBOOT), which is usually offered on demand only. Please contact Performance Dynamics if you would like to take this Level I class.
  2. Exposure to a wide variety of computer systems capacity planning concepts, methods, and tools that can be adapted opportunistically to support the needs of enterprise-level platform-independent performance management, e.g., the Level II GCaP class.
  3. Detailed study of a particular capacity planning technique or performance analysis tool, e.g., Guerrilla Data Analysis Techniques (GDAT).
A printed certificate reflecting the level of achievement is awarded to each attendee at the completion of the respective course.

3  Course Goals

After completing this course, students will know how to:

4  Dates and Registration

See the Guerrilla Training Schedule for dates and online registration.

5  Who Should Attend?

Computer system administrators, mainframe system operators, network system administrators, performance engineers, test engineers, IT consultants, data center managers, DevOps engineers, IT technical managers and software development engineers. This course does not assume any prior experience with performance analysis methods, but a working knowledge of computer systems and high school algebra is helpful.

6  Course Outline

Class typically begins at 9am and the instructor is generally available until 9pm each day.
Many class discussions have been known to continue over dinner.
A serviced half-hour morning break is held around 10:30am.
Seated lunch service is provided from noon until 1pm.
A serviced half-hour afternoon break is held around 3:00pm.
A large number of practical exercises (with solutions in R) will be given and discussed throughout the five days. You are encouraged to bring a laptop computer.

6.1  GDAT Day 1

How to Detect Bad Data
  • All data is wrong by definition
  • Broken performance tools
  • The power of good statistical models
Introduction to R
  • Why R is de RigueuR on Wall St and elsewhere
  • My special 911.r script
  • R commands
  • R language
  • R graphics
  • Installing R
Expressing Measurement Error
  • Measurement is a process not a number
  • Confidence intervals and sigma levels
  • Confidence bands and QQ plots
  • How to express errors

6.2  GDAT Day 2

Review of Elementary Statistics
  • Descriptive statistics
  • Measures of central tendency: mean, median and mode
  • Meaning of the means: arithmetic, geometric, harmonic
  • Measures of dispersion: stdev, variance, stderr, percentiles
  • Summarizing data and its statistics
Histograms and Distributions
  • Review of Uniform, Normal, Poisson, Exponential distributions
  • How to determine normal distributions
  • How to determine exponential/Poisson distributions
  • Weighted multi-class workloads

6.3  GDAT Day 3

Regression Analysis
  • Linear regression done right
  • Hubble's bubble & the most famous scatter plot
  • Fitting and projecting
  • Examples
Multivariate and Nonlinear Regression
  • Multivariate regression
  • ANOVA: Analysis of Variance
  • Nonlinear regression
  • Moving averages

6.4  GDAT Day 4

Application Scalability Analysis
  • Load test data and QA analysis
  • Universal scalability law (USL)
  • Applying USL to production data
  • Analyzing data for scalability zones
Applying Regression Analysis to Web Traffic
  • Web server scalability
  • Web traffic profiles and time zones

6.5  GDAT Day 5

Taming the Data Torrent
  • Principal component analysis
  • Reducing the number of monitored metrics
  • Case studies: PerfViz, Apdex, Barry
Machine Learning for CaP
  • Machine learning algorithms
  • Support Vector Machines
  • Supervised learning
  • The SVM package in R
  • Detecting performance patterns and defining exceptions
Wild (Not Mild) Distributions
  • Power law data and distributions
  • Case studies: SQL access patterns, web traffic, data recovery
  • Data validation using QQ plots, log-linear plots and log-log plots
Review and Class Discussion

7  Registration and Materials

7.1  Registration

All registration is now done online. Please consult the Guerrilla Training Schedule for current pricing and conditions.

7.2  Textbook

A copy of Dr. Gunther's performance analysis textbook, Guerrilla Capacity Planning (Springer-Verlag, 2007), is included in the price of admission.
Sorry, no refunds or exchanges can be given if you already have a copy of the book.

7.3  Location

See the Guerrilla Training Schedule for details about the hotel location and room reservations. The city of Pleasanton is right next door to Castro Valley.

7.4  Meals

Lunch is provided each day.

File translated from TeX by TTH, version 3.81, on 23 Jul 2018, 07:17.