Guerrilla Manual Online
Hit-and-run mantras you can use on your boss or in a tiger team meeting
Updated on Jan 14, 2012
| The Guerrilla Manifesto |
| Management resists, the guerrilla planner retreats. |
| Management dithers, the guerrilla planner proposes. |
| Management relents, the guerrilla planner promotes. |
| Management retreats, the guerrilla planner pursues. |
Booklet: This web page contains updates to The Guerrilla Manual, provided as a pull-out booklet in the rear jacket of
the Guerrilla Capacity Planning book.
Twitter: Many of these aphorisms are now also encapsulated as shortened Guerrilla mantras, or GMantras,
that are automatically updated daily on Twitter.
Contents
1 WEAPONS OF MASS INSTRUCTION
1.1 Why Go Guerrilla?
1.2 Best Practices
1.3 Virtualization
1.4 An Ounce of Prevention
1.5 Why are CaP and Perf Hard?
1.6 Brisk Risk Management
1.7 Failing On Time
1.8 Performance Homunculus
1.9 Self Tuning Applications
1.10 Squeezing Capacity
1.11 When Wrong is Right
1.12 Throw More Hardware at It
1.13 Network Performance
1.14 Not even wrong!
1.15 Checked your measurements lately?
1.16 Data Are Not Divine
1.17 Busy work
1.18 Little's Law
1.19 Bigger is Not Always Better
1.20 Bottlenecks
1.21 Benchmarks
1.22 Failure to Communicate
1.23 Consolidation
1.24 Control Freaks Unite!
1.25 Productivity
1.26 Art vs. Science
1.27 ITIL for Guerrillas
1.28 Vanishing Paradox
1.29 Dumb Question
1.30 Quantum Leap
1.31 Don't Be a Sheeple
1.32 Performance Gatekeepers
1.33 Performance Analysis is a Money Sink
1.34 Tyranny of the 9s
2 PERFORMANCE MODELING RULES OF THUMB
2.1 What is Performance Modeling?
2.2 Monitoring vs. Modeling
2.3 Keep It Simple
2.4 More Like The Map Than The Metro
2.5 The Big Picture
2.6 Point of Principle
2.7 Guilt is Golden
2.8 What is a Queue?
2.9 Where to Start?
2.10 Inputs and Outputs
2.11 No Service, No Queues
2.12 Estimating Service Times
2.13 Change the Data
2.14 Closed or Open Queue?
2.15 Opening a Closed Queue
2.16 Steady-State Measurements
2.17 Transcribing Data
2.18 Workloads Come in Threes
2.19 Better Than a Crystal Ball
2.20 Patterns and Anti-Patterns
2.21 Interpreting Data
2.22 Intuition and Modeling
2.23 Load Average
2.24 VAMOOS Your Data Analysis Hesitations
2.25 Measurement Errors
2.26 Modeling Errors
2.27 Data Ain't Information
2.28 Data Science
3 UNIVERSAL SCALABILITY LAW (USL)
1 WEAPONS OF MASS INSTRUCTION
1.1 Why Go Guerrilla?
The planning horizon is now 3 months, thanks to the gnomes on Wall Street.
Only Guerrilla-style tactical planning is crazy enough to be compatible with that kind of insanity.
1.2 Best Practices
Best practices are an admission of failure.
Failure to understand the particular requirements for CaP and Perf in a given context.
If you don't understand, you're living on borrowed time.
Copying someone else's apparent success is like cheating on a test.
You may make the grade this time, but how far will the bluff take you
into the future?
Best practices are often derived from financial accounting attempts to cut costs by removing (ignoring)
variance in the relevant processes and prescribing a one-size-fits-all solution, even though it's
not described that way. It's always interesting to compare the typical multiplicity of bests in best practices.
A recent NPR program discussed how this approach has backfired in the arena of medicine and healthcare.
The analogy is appropriate because many aspects of performance analysis are not too different from the
science of medical diagnosis.
See the section on
The Limits Of Best Practices.
1.3 Virtualization
All virtualization is about illusions and
Voltaire said:
"Illusion is the first of all pleasures."
However, when it comes to IT, even if it provides a more pleasurable experience to
perpetrate illusions onto a user, it is not ok to foist them on the performance analyst or capacity planner.
Translation: We performance weenies need more whistles and less bells.
In other words, virtualization vendors need to make sure they provide us with backdoors and peepholes so we
can measure how resources are actually being consumed in a virtualized environment.
Corollary: Can you say, transparency?
It's better for the IT support of business if we can manage it properly.
To manage it, it can't be illusory.
1.4 An Ounce of Prevention
Capacity management is about prevention. But someone once told me "You
can't sell prevention!"; the implication being that an ounce of
prevention is worthless.
Then, explain the multi-billion dollar dietary-supplements industry?
It's not what you sell, but how you sell it.
1.5 Why are CaP and Perf Hard?
Both performance analysis and capacity planning are complicated by your brain thinking linearly
about a computer system that operates nonlinearly.
Looked at another way, collecting and analyzing performance metrics is very important, but understanding
the relationship between those metrics is vital. And, of course, those relationships are
nonlinear. That's why we rely on tools like
queueing models
and the universal scalability law,
because they encode the correct nonlinearities for us.
1.6 Brisk Risk Management
BRisk management, isn't.
Perceived risk (psychology) and managed risk (analysis) are not the same thing.
Here's an actual example of (mis)perceived risk:
"I can understand people being worked up about safety and quality with
the welds," said Steve Heminger, executive director ... "But we're
concerned about being on schedule because we are racing against the next
earthquake."
This is a direct quote from a
Caltrans executive manager for the
new Bay Bridge
construction between Oakland and San Francisco. He is saying that Caltrans management decided to ignore
the independent assessment of the welding quality in order to stay on schedule. Yikes! See mantra 1.7.
Although he is not an IT manager, the point about BRisk management is the same.
You can read more background on this topic on
my blog.
1.7 Failing On Time
Management will often let a project fail—as long as it fails on time!
Until you read and heed this statement, you will probably have
a very frustrating time getting your performance analysis conclusions
across to management.
See mantra 1.6. There,
a section of the upper deck of the current Bay Bridge collapsed during the Loma Prieta earthquake of 1989.
Now, the Caltrans manager is watching the clock and
concluding that it's better to increase the risk that the new bridge will fail (by being brisk about
weld inspections), in order to beat the much lower risk that the current
bridge might fail again in some unpredictable future quake. Substitute your favorite IT project,
product or application for the word bridge and you get the idea.
1.8 Performance Homunculus
A list of system management activities might include such things as:
- Security management
- Software distribution
- Backup management
- Chargeback management
- Cap and Perf management
Of these, all but capacity management have some kind of shrink-wrap or COTS solution.
Capacity and performance management cannot be treated as just another bullet item on a list of things to do.
Cap and Perf management is to systems management as the
homunculus
(sensory proportion) is to the human body (geometric proportion).
Cap and Perf management can rightly be regarded as just a subset of
systems management, but the infrastructure requirements for successful
capacity planning (both the tools and knowledgeable humans to use them)
are necessarily out of proportion with the requirements for simpler systems
management tasks like software distribution, security, backup, etc. It's
self-defeating to try doing capacity planning on the cheap.
1.9 Self Tuning Applications
Self-tuning applications are not ready for prime time.
How can they be when human performance experts get it wrong all the time!?
Think about it.
Performance analysis is a lot like a medical examination, and
medical Expert Systems were heavily touted in the mid-1980s.
You don't hear about them anymore. And you know that if it worked,
HMOs would be all over it.
It's a laudable goal but if you lose your job, it won't be because of
some expert performance robot.
1.10 Squeezing Capacity
Capacity planning is not just about the future anymore.
Today, there
is a serious need to squeeze more out of your current capital equipment.
1.11 When Wrong is Right
Capacity planning is about setting expectations.
Even wrong expectations are better than no expectations!
The planning part of capacity planning requires making predictions. Even a wrong prediction is
useful because it can serve as a warning that either:
- the understanding that underlies your prediction is wrong
- the measurement process is broken and is producing wrong data
Either way, something needs to be corrected, but you wouldn't realize that without making a prediction in the first place.
If you aren't iteratively correcting predictions throughout a project
life-cycle, you will only know things are amiss when it's too late! GCaP says you can do better than that.
1.12 Throw More Hardware at It
The classic over-engineering gotcha. Hardware is certainly cheaper today, but a boat-load of cheap PCs from China
won't help one iota if the application runs single-threaded.
Single-threadedness can wreck you
This is now my canonical response to the oft-heard platitude: "We don't
need no stinkin' capacity planning, we'll just throw more hardware at it."
The capacity part is easy. It's the planning part that requires
brain power.
1.13 Network Performance
It's never the network! [ Although, it might be the network admin. :-) ]
If the network is out of bandwidth, has interminable latencies or is otherwise glitching, fix
it! Then we'll talk about the performance of your application.
1.14 Not even wrong!
Here is a plot of benchmarked round-trip times
(RTT) for a set of applications as a function of increasing user load (clients). Take a good, long look.
If your application has concave response times like these... SHIP IT!
In case you're wondering, those are REAL data and yes, the axes are
correctly labeled. I'll let you ponder what is wrong with these measurements.
Here's a hint: They're so broken, they're not even wrong!
Only if you don't understand basic
queueing theory,
would you accept measurements like these (which the original performance engineer did).
1.15 Checked your measurements lately?
When I'm asked, "But, how accurate are your performance models?" my
canonical response is, "Well, how accurate are your performance DATA!?"
Most people remain blissfully unaware of the fact that ALL
measurements come with errors; both systematic and random. An important
capacity planning task is to determine and track the magnitude of the
errors in your performance data. Every datum should come with a `±'
attached (which will then force you to put a number after it).
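As a minimal sketch of putting a number after the ±, the R snippet below reports the mean of repeated measurements together with its standard error. The sample values and the helper name pm are hypothetical, and the standard error only captures random error; systematic error has to be assessed separately.

```r
# Hypothetical repeated measurements of a response time (seconds)
rt <- c(1.21, 1.35, 1.18, 1.42, 1.27, 1.30)

# Mean and standard error of the mean (random error only)
pm <- function(x) {
  m  <- mean(x)
  se <- sd(x) / sqrt(length(x))
  c(mean = m, se = se)
}

res <- pm(rt)
sprintf("%.3f +/- %.3f seconds", res[["mean"]], res[["se"]])
```

With more samples the standard error shrinks, which is one practical reason to track it alongside every reported average.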
1.16 Data Are Not Divine
Treating performance data as something divine is a sin.
Data comes from the Devil, only models come from God.
Corollary:
That means it's helpful to be able to talk to God. But God, she does babel, sometimes. :)
1.17 Busy work
Busy work does not the truth maketh.
Western culture too often glorifies hours clocked as productive work. If
you don't take time off to come up for air and reflect on what you're
doing, how are you going to know when you're wrong?
1.18 Little's Law
Little's law means a lot!
However, I must say I don't really like the notation on that
Wikipedia page
but, more importantly, it also fails to point out that there are really two versions of Little's Law:
Q = X R    (1)
which relates the average queue length (Q) to the throughput (X) and the residence time R = W + S.
Here, W is the average time you spend waiting in line to get your groceries rung up, for example,
and S is the average service time it takes to ring up your groceries once you get to the cashier.
The other version of Little's Law is:
ρ = X S    (2)
and often goes by the name Utilization law.
It relates the average utilization (ρ), e.g., of the cashier, to the service time (S).
Equation (2) is derived from (1) by simply setting W = 0 on the right-hand side of the equation.
In both equations, the left-hand side is a pure number, i.e., it has no formal units (% is not a unit).
It is important to realize that eqns.(1) and (2) are really variants of the same law: Little's law.
Here's why:
- Eqn.(1) tells us the average number of customers or requests in residence.
- Eqn.(2) tells us the average number of customers or requests in service.
That second interpretation of utilization can be very important for performance analysis but
is often missed in textbooks and elsewhere (including Wikipedia pages).
You should learn Little's law (both versions) by heart.
I use it almost daily to cross-check that throughput and
delay data are consistent, no matter whether those data come from
measurements or models. More details about Little's law can be found in
Analyzing Computer System Performance with Perl::PDQ.
Another use of Little's law is calculating service times, which are
notoriously difficult to measure directly. See also Mantras 2.10 and 2.12.
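The cross-check mentioned above can be sketched in a few lines of R, using Little's law in the form Q = X R. The measurement values, the function name littles_check and the 10% tolerance are all assumptions chosen for illustration.

```r
# Hypothetical measurements taken over the same steady-state interval
X <- 50      # throughput (requests per second)
R <- 0.2     # average residence time (seconds)
Q <- 10.4    # average number of requests in residence, as reported

# Little's law predicts Q = X * R; flag data that disagree by more than tol
littles_check <- function(Q, X, R, tol = 0.10) {
  Qpred  <- X * R
  relerr <- abs(Q - Qpred) / Qpred
  list(predicted = Qpred, relative_error = relerr, consistent = relerr <= tol)
}

chk <- littles_check(Q, X, R)
chk
```

If consistent comes back FALSE, either one of the three measurements is broken or they were not collected over the same interval.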
And here's the lore behind Little's Law.
1.19 Bigger is Not Always Better
Beware the
SMP wall!
The bigger the symmetric multiprocessor (SMP) configuration
you purchase, the busier you need to run it. But only to the
point where the average run-queue begins to grow. Any busier and the
user's response time will rapidly start to climb through the roof.
1.20 Bottlenecks
You never remove a bottleneck, you just move it.
1.21 Benchmarks
All competitive benchmarking is institutionalized cheating.
The purpose of competitive benchmarking a computer system
is to beat everyone else on performance, so you can say "mine is bigger than yours." It's the IT equivalent of war!
Benchmark run-rules were made to be broken or at least bent; just don't get caught.
This must be true because industrial benchmark organizations like SPEC.org
and TPC.org have technical review committees that look for
cheating in submitted results.
For capacity planning and system sizing, you need to be aware of this fact of life
and look for the loopholes in published benchmark results.
Here, competitive refers to industrial benchmarks,
as opposed to benchmarking that you might do for purely internal comparisons
or diagnostic purposes.
1.22 Failure to Communicate
You should spend as much time developing the presentation of your capacity planning results as you did reaching them.
If your audience is missing the point or you don't really have one
because you didn't spend enough time on developing it, you just wasted a
lot more than your presentation time-slot.
1.23 Consolidation
Gunther's law of consolidation: Remove it and they will come!
1.24 Control Freaks Unite!
Your own applications are the last refuge of performance engineering.
Control over the performance of hardware resources e.g., CPUs and
disks, is progressively being eroded as these things simply become
commodity black boxes viz., multicore processors and disk arrays. This
situation will only be exacerbated with the advent of Internet-based
application services. Software developers will therefore have to
understand more about the performance and capacity planning implications
of their designs running on these black boxes.
(See Sect. 3)
1.25 Productivity
If you want to be more productive, go to sleep.
Thanks to the Puritans presumably, American corporate culture is
obsessed with the false notion that being busy is being productive. Wrong!
Europeans (especially the Mediterraneans) understand the
power of the cat nap. After nearly 400 years, it's time for America to get over it.
1.26 Art vs. Science
When it comes to the art of performance analysis and capacity planning, the art is in the science.
A number of recent books and presentations
on performance analysis and capacity planning have appeared with "The Art of..."
in the title.
In itself, this is not new. The title of Raj Jain's excellent 1991 book is
The Art of Computer Systems Performance Analysis.
Nonetheless, they all resort to various scientific techniques to analyze performance data.
The application of science inevitably involves some art.
1.27 ITIL for Guerrillas
Q: What goes in the ITIL box:
Business Metrics → Service Delivery → Service Level Management?
A: Guerrilla capacity planning (GCaP).
The ITIL framework is all about defining IT processes to satisfy
business needs, not their implementation.
That's what makes
GCaP training
an excellent IT-business solution.
1.28 Vanishing Paradox
Almost by definition, the activity of performance analysis contains a hidden paradox:
If you do performance management perfectly, you run the risk of becoming
invisible and therefore expendable to your management.
In other words, having successfully supported performance management,
a manager could eventually feel justified in asking:
"Why is my budget paying for performance management when everything is performing perfectly?"
AKA a career-limiter.
Compare this with software development, for example. If a developer does their job
perfectly, they risk being overburdened with more work than they can handle
(AKA job security). In contrast to performance management, a manager might be heard to
say: "We must have this new functionality in the next release!"
1.29 Dumb Question
The only dumb question is the one never asked.
1.30 Quantum Leap
A quantum leap is neither. It can't be both quantal (the correct adjective) and a leap. So, it's an oxymoron.
If it were quantal, it would be infinitesimal (on the order of 10^-10 meters)
and therefore not observable by us. If the leap were of the regular observable variety,
it could not have a quantum magnitude.
Quantum transitions in energy are only associated with the discrete spectrum of
atomic or molecular
bound states.
Try to avoid communication nonsensica (GMantra 1.31).
1.31 Don't Be a Sheeple
Quantum leap (GMantra 1.30) is right up there with other moronic phrases like "sea change" (what IS that?) and
"moving forward"—who draws attention to moving backwards?
That last one was tweeted and ended up as entry #136 in David Pogue's
Twitter Book:
My most recent favorite is this one.
WAYNE SWAN (politician):
"I will not rule anything in or rule anything out."
Ruling out, I get: take a ruler and draw a line through it. But how do you rule something in!?
And Mr. Swan didn't invent it. He's just mindlessly repeating it because he heard other boffins say it,
and I'm quoting him because it was captured in a transcript. (Double jeopardy)
Good communication, which is vital for good performance analysis and capacity planning, requires that
you be a
shepherd, not a
sheeple.
Don't use nonsensical phrases just because everyone else does.
Besides, it makes you sound like an idiot ... or worse: a politician.
1.32 Performance Gatekeepers
Performance analysis is too complex and important to be left to enthusiastic individuals.
Performance specialists should act as gatekeepers.
A common situation in big organizations is for various groups to be responsible for the performance evaluation of
the software or hardware widgets they create. In principle, this is a good thing, but there is a downside.
Over-zealous performance optimization of any subsystem can deoptimize the entire system.
To avoid this side-effect, a separate central group should be responsible for oversight of total system performance.
They should act as both reviewers and gatekeepers for the performance analyses produced by all the other groups in the
organization.
1.33 Performance Analysis is a Money Sink
There is a serious misconception that precautions like
security management are part of the cost of doing business, but performance analysis actually costs business.
In other words, performance anything is perceived as a cost center, or money down the drain.
Remember the
performance homunculus in Section 1.8.
Unfortunately, there is some justification for this view. Performance activities like:
- Performance evaluation
- Performance by design
- Performance engineering
- Performance testing
can be seen as inflating schedules and therefore delaying expected revenue. See section 1.7.
Moreover, there can be an incentive to charge for the "performance upgrade" further down the line.
Better to be aware of these perceptions than be left wondering why your performance initiatives are
not being well received by management.
1.34 Tyranny of the 9s
You've no doubt heard of the
Tyranny of the 9s,
but how about subjugation to the sigmas?
| Nines | Percent | Downtime/Year | σ Level |
| 4 | 99.99% | 52.596 minutes | 4σ |
| 5 | 99.999% | 5.2596 minutes | - |
| 6 | 99.9999% | 31.5576 seconds | 5σ |
| 7 | 99.99999% | 3.15576 seconds | - |
| 8 | 99.999999% | 315.6 milliseconds | 6σ |
The following R function will do the calculations for you.
downt <- function(nines, tunit = c('s','m','h')) {
  tunit <- match.arg(tunit)                # picks 's' when tunit is omitted
  ds <- 10^(-nines) * 365.25*24*60*60      # downtime per year in seconds
  if(tunit == 's') { ts <- 1;    tu <- "seconds" }
  if(tunit == 'm') { ts <- 60;   tu <- "minutes" }
  if(tunit == 'h') { ts <- 3600; tu <- "hours" }
  return(sprintf("Downtime per year at %d nines: %g %s", nines, ds/ts, tu))
}
> downt(5,'m')
[1] "Downtime per year at 5 nines: 5.2596 minutes"
> downt(8,'s')
[1] "Downtime per year at 8 nines: 0.315576 seconds"
6σ is the "black belt" level that many companies
aspire to. The associated σ levels correspond to the area contained under the standard normal (or "bell shaped") curve
within that σ interval about the mean. It can be computed using the following R function:
library(NORMT3)
sigp <- function(sigma) {
sigma <- as.integer(sigma)
apc <- erf(sigma/sqrt(2))
return(sprintf("%d-sigma bell area: %10.8f%%; Prob(chance): %e",
sigma, apc*100, 1-apc))
}
> sigp(2)
[1] "2-sigma bell area: 95.44997361%; Prob(chance): 4.550026e-02"
> sigp(5)
[1] "5-sigma bell area: 99.99994267%; Prob(chance): 5.733031e-07"
So, 5σ corresponds to slightly more than 99.9999% of the area under the bell curve; the total area being 100%.
It also corresponds closely to six 9s availability.
The last number is the probability that you happened to achieve that availability by random luck or pure chance.
A reasonable mnemonic for some of these values is:
- 3σ corresponds roughly to a probability of 1 in 1,000 that four 9s availability occurred by chance.
- 5σ is roughly a 1 in a million chance, which is the same as flipping a fair coin and getting 20 heads in a row.
- 6σ is roughly a 1 in a billion chance.
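If the NORMT3 package is not installed, the same areas can be sketched in base R, because erf(x/sqrt(2)) equals 2*pnorm(x) - 1, where pnorm is the standard normal distribution function. The name sigp_base is hypothetical and is not part of the original manual.

```r
# Base-R variant of sigp(): erf(sigma/sqrt(2)) equals 2*pnorm(sigma) - 1
sigp_base <- function(sigma) {
  sigma <- as.integer(sigma)
  apc <- 2 * pnorm(sigma) - 1                   # area within +/- sigma of the mean
  pch <- 2 * pnorm(sigma, lower.tail = FALSE)   # both tails, computed directly
  sprintf("%d-sigma bell area: %10.8f%%; Prob(chance): %e",
          sigma, apc * 100, pch)
}

sigp_base(2)
sigp_base(5)
```

Computing the tail area directly with lower.tail = FALSE avoids subtracting two nearly equal numbers, which matters at the larger sigma levels.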
2 PERFORMANCE MODELING RULES OF THUMB
Here are some ideas that might be of help when you're trying to
construct your capacity planning or performance analysis models.
2.1 What is Performance Modeling?
All modeling is programming and all programming is debugging.
Similarly seen on Twitter:
"90% of coding is debugging. The other 10% is writing bugs."
2.2 Monitoring vs. Modeling
The difference between performance modeling and performance monitoring
is like the difference between weather prediction and simply watching a weather-vane twist in the wind.
2.3 Keep It Simple
Nothing like jumping into the pool at the deep end. Just don't forget your swimming togs in the excitement.
To paraphrase Einstein:
A performance model should be as simple as possible, but no simpler!
Someone else said:
"A designer knows that he has achieved perfection not when there is nothing left to add, but when
there is nothing left to take away." (Antoine de Saint-Exupéry)
I now tell people in my
Guerrilla classes,
despite the fact that I repeat this rule of thumb several times, you will
throw the kitchen sink into your performance models; at least, early on as you
first learn how to create them. It's
almost axiomatic: the more you know about the system architecture, the
more detail you will try to throw into the model. The goal, in fact, is the
opposite.
2.4 More Like The Map Than The Metro
A performance model is to a computer system as the
London Tube map
is to the London underground railway.
The Tube map is pure abstraction that has very little to do with the
physical railway system. It encodes only sufficient detail to enable
transit on the underground from point A to point B. It does not include
a lot of irrelevant details such as altitude of the stations, or even
their actual geographical proximity. A performance model is a similar
kind of abstraction.
Despite several attempts, the original Tube map has hardly been
improved upon since its conception in 1933. Apparently, it already met
the requirement of being as simple as possible, but no simpler. The fact
that it was designed by an electrical draughtsman probably helped.
2.5 The Big Picture
Unlike most aspects of computer technology, performance modeling is
about deciding how much detail can be ignored!
2.6 Point of Principle
When trying to construct the performance representation of a computer system
(which may or may not be a queueing model), look for the principle of
operation. If you can't describe the principle of operation in 25 words or
less, you probably don't understand it yet.
As an example, consider a time-share computer
system. Its principle of operation can be stated as follows. Time-share scheduling gives every user the illusion
that they are the only user on the system. All the thousands of
lines of code in the operating system that implement time-slicing,
priority queues, and so forth, are there merely to support that illusion.
2.7 Guilt is Golden
Performance modeling is also about spreading the guilt around.
You, as the performance analyst or planner, only have to shine the
light in the right place and then stand back while others flock to fix it.
2.8 What is a Queue?
- A queue is a line of customers waiting to be severed, as in "Off with their heads!" (*)
- Hardware version: Think of a queue as a register,
e.g., a memory register.
- Software version: Think of a queue as a list. In some languages it is a data type, e.g., Lisp, Mathematica, Perl, etc.
(*) I did mistakenly say "severed" while discussing queues during a Guerrilla class in November, 2002.
In Chapter 2 Getting the Jump on Queueing and Appendix B A Short History of Buffers
of my Perl::PDQ book,
I point out that a queue is a very appropriate paradigm for understanding the performance of
computer systems because it corresponds to a
data buffer.
Since all digital computer and network systems can be considered as a collection of buffers,
their performance can be modeled as a collection of queues, aka: queueing network models, where the word
"network" here means circuit; like an electric circuit.
PDQ (Pretty Damn Quick)
helps you to sever computer systems with queues.
Queueing theory is a relatively young science, having just
turned 100 in 2009.
2.9 Where to Start?
Why not have fun with blocks—functional blocks!
One place to start constructing a PDQ model is by drawing a
functional block diagram. The objective is to identify where time
is spent at each stage in processing the workload of interest.
Ultimately, each functional block is converted to a queueing subsystem
like those shown above. This includes the ability to distinguish
sequential and parallel processing. Other diagrammatic techniques e.g.,
UML diagrams, may also be useful but I don't understand that stuff and never tried it.
See Chap. 6
"Pretty Damn Quick(PDQ) - A Slow Introduction" of
Analyzing Computer System Performance with Perl::PDQ.
2.10 Inputs and Outputs
When defining performance models (especially queueing models), it helps
to write down a list of INPUTS (measurements or estimates that are used
to parameterize the model) and OUTPUTS (numbers that are generated by
calculating the model).
Take Little's law Q = X R for example. It is a performance model;
albeit a simple equation or operational law, but a model nonetheless. All the
variables on the RIGHT side of the equation (X and R) are INPUTS, and
the single variable on the LEFT is the OUTPUT.
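Written out as a tiny R script, the INPUTS and OUTPUTS split for Little's law might look like the sketch below. The numeric values are hypothetical.

```r
# INPUTS: measurements or estimates used to parameterize the model
X <- 120     # throughput (requests per second)
R <- 0.025   # residence time (seconds)

# OUTPUT: generated by calculating the model, never typed in by hand
Q <- X * R   # average number of requests in residence
Q
```

Keeping the two groups visibly separate makes it harder to feed a model output back in as if it were a measurement.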
A more detailed discussion of this point is presented in Chap. 6
"Pretty Damn Quick(PDQ) - A Slow Introduction" of
Analyzing Computer System Performance with Perl::PDQ.
2.11 No Service, No Queues
You know the restaurant rule: "No shoes, no service!"
Well, this is the PDQ rule: no service (time), no queues.
In your
PDQ models,
there is no point creating more queueing nodes than you have measured
service times for.
If the measurements of the real system do not include the
service time for a queueing node that you think ought to be in your PDQ
model, then that PDQ node cannot be defined.
2.12 Estimating Service Times
Service times are notoriously difficult to measure directly. Often,
however, the service time can be calculated from other performance
metrics that are easier to measure.
Suppose, for example, you had requests coming into an HTTP server
and you could measure its CPU utilization with some UNIX tool like
vmstat, and you would like to know the service time of the HTTP Gets.
UNIX won't tell you, but you can use Little's law (U = X S) to figure
it out. If you can measure the arrival rate of requests in Gets/sec
(X) and the CPU %utilization (U), then the average service time
(S) for a Get is easily calculated from the quotient U/X.
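A minimal R sketch of that calculation follows. The measured values are hypothetical, and it assumes a single CPU whose busy time is due entirely to serving Gets. The reported %utilization has to be converted to a fraction before dividing.

```r
# Hypothetical measurements from the same sampling interval
X     <- 250    # arrival rate (HTTP Gets per second)
U_pct <- 40     # CPU utilization as reported by a tool like vmstat (percent)

# Utilization law: U = X * S, so S = U / X with U expressed as a fraction
service_time <- function(U_pct, X) (U_pct / 100) / X

S <- service_time(U_pct, X)
S            # seconds per Get
S * 1000     # milliseconds per Get
```

Here 40% busy at 250 Gets/sec works out to 0.0016 seconds, or 1.6 milliseconds, of CPU time per Get.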
2.13 Change the Data
If the measurements don't support your PDQ performance model, change
the measurements.
2.14 Closed or Open Queue?
When trying to figure out which queueing model to apply, ask
yourself if you have a finite number of requests to service or not. If
the answer is yes (as it would be for a load-test platform), then it's a
closed queueing model. Otherwise use an open queueing model.
2.15 Opening a Closed Queue
How do I determine when a closed queueing model can be
replaced by an open model?
This important question arises, for example, when you want
to extrapolate performance predictions for an Internet application (open)
that are based on measurements from a load-test platform (closed).
An open queueing model
assumes an infinite population of requesters initiating requests at
an arrival rate λ (lambda). In a closed model, λ (lambda)
is approximated by the ratio N/Z.
Treat the thinktime Z as a free parameter, and choose a value (by
trial and error) that keeps N/Z constant as you make N larger in
your PDQ model. Eventually, at some value of N, the OUTPUTS of both the
closed and open models will agree to some reasonable approximation.
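The procedure can be sketched without PDQ by hand-coding the two models for a single queueing node: exact mean value analysis (MVA) for the closed case and the M/M/1 residence time for the open case. The function names and parameter values below are hypothetical, and a real PDQ model would normally have more than one node.

```r
# Closed model: exact MVA for one queueing node, N users, think time Z
closed_R <- function(N, Z, S) {
  Q <- 0
  for (n in 1:N) {
    R <- S * (1 + Q)      # an arrival sees the queue left by the other n-1 users
    X <- n / (R + Z)      # throughput around the closed loop
    Q <- X * R            # Little's law
  }
  R
}

# Open model: single-server (M/M/1) residence time at arrival rate lambda
open_R <- function(lambda, S) S / (1 - lambda * S)

S      <- 1.0     # service time (hypothetical)
lambda <- 0.5     # target arrival rate, held fixed as N grows
for (N in c(10, 100, 1000)) {
  Z <- N / lambda                     # think time chosen so that N/Z = lambda
  cat(sprintf("N = %4d  Z = %6.0f  closed R = %.4f  open R = %.4f\n",
              N, Z, closed_R(N, Z, S), open_R(lambda, S)))
}
```

The closed residence time creeps up toward the open value as N grows with N/Z held fixed; the N at which the gap is small enough for your purposes is where the open model becomes a reasonable stand-in.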
2.16 Steady-State Measurements
The steady-state measurement period should be on the order of 100 times
larger than the largest service time.
2.17 Transcribing Data
Use the timebase of your measurement tools. If it reports in seconds,
use seconds, if it reports in microseconds, use microseconds. The point
being, it's easier to check the digits directly for any transcription
errors. Of course, the units of ALL numbers should be normalized before
doing any arithmetic.
2.18 Workloads Come in Threes
In a mixed workload model (multi-class streams in PDQ), avoid using more
than 3 concurrent workstreams whenever possible.
Apart from making an unwieldy PDQ report to read, generally you are
only interested in the interaction of 2 workloads (pairwise comparison).
Everything else goes in the third (AKA "the background"). If you can't
see how to do this, you're probably not ready to create the PDQ model.
2.19 Better Than a Crystal Ball
A performance model is not clairvoyant, but it's better than a crystal ball; which is just a worthless piece of glass.
Predicting the future is not the same thing as "seeing" the future.
A performance model is just a means for evaluating the data that are provided to it. The model
transforms those data into information. The information that can be
extracted is intimately dependent on the values of those data—change
the data and you change the information. For example:
- Garbage in, garbage out
- Unexpected behaviour
In the second case (unexpected behaviour), the new data are not meeting the expectations set by previous measurements.
But it's the system that is "wrong," not the model, because the system has failed to follow the initial trend
contained in the earlier measurements. The performance model forces you
to ask why that happened and can anything be done to improve system
performance. On the other hand, without a performance model, you
don't have any context for such questions.
2.20 Patterns and Anti-Patterns
All meaning has a pattern, but not all patterns have a meaning.
- Visual example: [figure]
- Textual example: "Colorless green ideas sleep furiously." N. Chomsky (1957)
New research indicates that if a person is not in control of the situation,
they are more likely to see patterns where none exist, suffer illusions and believe
in conspiracy theories.
In the context of computer performance analysis, the same conclusion might well apply when
looking at data collected from a system that you don't understand.
2.21 Interpreting Data
Performance modeling can often be more important for interpreting data than predicting it.
The conventional view of performance models is that they are useful for:
- Predicting the future performance of an extant system
- Exploring what-if scenarios that may or may not be realistic
A performance model can also be used for interpreting performance measurements.
The measurements and the model must be consistent with each other; otherwise, something is wrong and needs to be explained.
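A very small model is often enough for this kind of consistency check. The sketch below uses Little's law (N = X × R) as the model and compares it against three independently measured quantities. The function names, the 10% tolerance and the measurements are hypothetical; pick a tolerance that reflects the errors in your own tools.

```python
def littles_law_gap(n_measured, throughput, residence_time):
    """Relative gap between the measured N and the N implied by X * R."""
    n_model = throughput * residence_time
    return (n_measured - n_model) / n_model


def is_consistent(n_measured, throughput, residence_time, tolerance=0.10):
    """True if the measurements agree with Little's law within the tolerance."""
    return abs(littles_law_gap(n_measured, throughput, residence_time)) <= tolerance


# Hypothetical measurements: 50 requests/s with a 0.2 s residence time
# imply about 10 requests in the system on average.
print(is_consistent(10.4, throughput=50.0, residence_time=0.2))
print(is_consistent(18.0, throughput=50.0, residence_time=0.2))
```

When the check fails, either a measurement is off (wrong interval, wrong units, dropped samples) or the system is not doing what you think it is. Either way, there is something to explain.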
2.22 Intuition and Modeling
Intuition is a seductive siren who will let you crash on the rocks of
misunderstanding, so better to tie yourself to the mast of math.
Ulysses is the Greek hero in Homer's Odyssey.
On his way back from the Trojan War, Ulysses orders his men to tie him to the
mast of his ship
and to plug their own ears so that they will not succumb to the beautiful
song of the sirens and be diverted to their deaths. Ulysses, being a typical manager, chooses to be
bound and to keep his ears unplugged because he cannot bear the idea of not
hearing the sirens' music.
2.23 Load Average
The load average in UNIX and Linux is not your average kind of average.
It's actually an exponentially damped moving average of the type commonly used in data smoothing.
It's also not particularly useful as a performance or capacity metric because it's an absolute number
(see item 3).
For a more complete discussion, see:
- See Chap. 4 of my Perl::PDQ book
- Read the original online articles
- How to convert the absolute load average to the relative stretch factor metric
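To make "exponentially damped moving average" concrete, here is a minimal floating-point sketch of the update rule. It assumes the run-queue length is sampled roughly every 5 seconds and damped with a 1-minute time constant, as Linux does for its 1-minute load average. The real kernel uses fixed-point arithmetic, and the function name and sample data here are hypothetical.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between run-queue samples (assumed)


def damped_load(run_queue_samples, window_seconds, start=0.0):
    """Return the damped load value after each sample."""
    decay = math.exp(-SAMPLE_INTERVAL / window_seconds)
    load = start
    history = []
    for n in run_queue_samples:
        # The old value decays; the new sample is blended in.
        load = load * decay + n * (1.0 - decay)
        history.append(load)
    return history


# Two busy processes for one minute (12 samples) on a previously idle system.
history = damped_load([2] * 12, window_seconds=60.0)
print(f"1-minute load after 60 s: {history[-1]:.2f}")
```

Even after a full minute of two runnable processes, the reported value is still well short of 2, which is one reason the number is easy to misread.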
2.24 VAMOOS Your Data Analysis Hesitations
It's easy to get carried away and jump the gun trying to model your Perf or CaP data ... and get it wrong. :/
Instead, try to follow these basic steps:
- Visualize:
Make a plot of your data without making any assumptions about how it should look.
This is where tools like scatter plots come in.
- Analyze:
Look for patterns or other significant features in the data and possibly quantify them, e.g.,
trends in the distribution of data points or periodically repeating features, such as spikes or peaks.
- Modelize: Consider different types of models:
SWAG,
statistical regression, queue-theoretic, simulation, etc. If you are using the
Chart>Add Trendline feature in Excel, this is where you choose your model from the Excel dialog box.
Don't fret over whether or not it's the "right" choice: there's no such thing, at this point.
Whatever your choice, try to make it consistent with steps 1 and 2. If it doesn't work out, it doesn't matter because,
based on the next step, you're about to come around again. :)
- Over and Over: None of this is likely to converge in a single pass.
- Satisfied: Iterate your way to success. Repeat until you are satisfied that all your assumptions are consistent.
In summary, VAMOOS (Visualize, Analyze, Modelize, Over and Over until Satisfied) your data analysis hesitations.
2.25 Measurement Errors
All measurements are wrong by definition.
Unlike mathematics, measurement is a process (an experimental procedure) and therefore cannot
produce pure numbers like the prime numbers. The measurement process always
produces errors. The real question is, how big are those errors and can you tolerate them? It's a
bit of an indictment that most of our performance analysis tools do not display errors, and that
leads people to think that 23 percent CPU busy is the same as 23, the 9th prime.
It should read something like 23 ± 5% to remind us that there is an associated error range.
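If your tool won't show the error, you can estimate one yourself from repeated samples. The sketch below reports the sample mean with its standard error, which is just one common choice of error measure and an assumption on my part. The function name and the CPU samples are hypothetical.

```python
import statistics


def mean_with_error(samples):
    """Sample mean and its standard error."""
    mean = statistics.mean(samples)
    stderr = statistics.stdev(samples) / len(samples) ** 0.5
    return mean, stderr


# Hypothetical repeated samples of percent CPU busy.
cpu_busy = [21.0, 26.0, 19.0, 25.0, 24.0]

mean, err = mean_with_error(cpu_busy)
print(f"CPU busy: {mean:.1f} ± {err:.1f}%")
```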
2.26 Modeling Errors
All performance models are wrong, but some are wronger than others.
Predicting the future is not the same thing as "seeing" the future, in the sense of seeing what things
might look like.
See GMantra 2.19 for more on this point.
All predictions are merely estimates based on (input) data (or other estimates)
and therefore predictions come with errors. The only real question is, how much error can you tolerate?
2.27 Data Ain't Information
Data is not (are not) the same thing as information.
A Google search returns a lot of data, not information. Google admits that:
Are you feeling lucky?
It should be a certainty, but it's not.
Sneakily, Google also knows that your brain will quickly decide what is information in all those links,
such that you don't even realize you're doing most of the work which you unconsciously attribute to Google.
They know your brain craves patterns.
Collecting performance data is only half the story, rather like a Google search. You still have to decide
what information (if any) is contained in all those data.
Unlike a Google search, however, performance data are not simple text (simple for your brain, that is).
Performance data usually come in the form of a torrent of numbers which, unlike text, are not simple for your brain to comprehend.
Even worse, without doing the proper analysis, the data can be deceptive and lead you to the wrong conclusion.
That's where performance analysis tools and models come in. They act as
transformers on the data to help your brain decide what is information.
2.28 Data Science
Data science. Pfft!
Anything that has to call itself a "science," usually isn't. (Social Science?)
How about
Information Science?
Pretty soon we'll be talking about Information Technology (IT). Oh, wait!...
3 UNIVERSAL SCALABILITY LAW (USL)
This section has grown to the point where it now has its own page,
updated with all the most recent developments.
File translated from TeX by TTH, version 3.38, on 14 Jan 2012, 11:01.