Notes and Topics on Statistics

Faraz_1984 · #1 Monday, August 04, 2008

Definitions

Statistics
Collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions.
Variable
Characteristic or attribute that can assume different values
Random Variable
A variable whose values are determined by chance.
Population
All subjects possessing a common characteristic that is being studied.
Sample
A subgroup or subset of the population.
Parameter
Characteristic or measure obtained from a population.
Statistic (not to be confused with Statistics)
Characteristic or measure obtained from a sample.
Descriptive Statistics
Collection, organization, summarization, and presentation of data.
Inferential Statistics
Generalizing from samples to populations using probabilities. Performing hypothesis testing, determining relationships between variables, and making predictions.
Qualitative Variables
Variables which assume non-numerical values.
Quantitative Variables
Variables which assume numerical values.
Discrete Variables
Variables which assume a finite or countable number of possible values. Usually obtained by counting.
Continuous Variables
Variables which assume an infinite number of possible values. Usually obtained by measurement.
Nominal Level
Level of measurement which classifies data into mutually exclusive, all inclusive categories in which no order or ranking can be imposed on the data.
Ordinal Level
Level of measurement which classifies data into categories that can be ranked. However, precise differences between the ranks are not meaningful.
Interval Level
Level of measurement which classifies data that can be ranked and differences are meaningful. However, there is no meaningful zero, so ratios are meaningless.
Ratio Level
Level of measurement which classifies data that can be ranked, differences are meaningful, and there is a true zero. True ratios exist between the different units of measure.
Random Sampling
Sampling in which the data is collected using chance methods or random numbers.
Systematic Sampling
Sampling in which data is obtained by selecting every kth object.
Convenience Sampling
Sampling in which data that is readily available is used.
Stratified Sampling
Sampling in which the population is divided into groups (called strata) according to some characteristic. Each of these strata is then sampled using one of the other sampling techniques.
Cluster Sampling
Sampling in which the population is divided into groups (usually geographically). Some of these groups are randomly selected, and then all of the elements in those groups are selected.

Faraz_1984 · #2 Monday, August 04, 2008

Population vs Sample
The population includes all objects of interest whereas the sample is only a portion of the population. Parameters are associated with populations and statistics with samples. Parameters are usually denoted using Greek letters (mu, sigma) while statistics are usually denoted using Roman letters (x, s).
There are several reasons why we don't work with populations. They are usually large, and it is often impossible to get data for every object we're studying. Sampling does not usually occur without cost, and the more items surveyed, the larger the cost.
We compute statistics and use them to estimate parameters. The computation is the first part of the statistics course (Descriptive Statistics) and the estimation is the second part (Inferential Statistics).
Discrete vs Continuous
Discrete variables are usually obtained by counting. There are a finite or countable number of choices available with discrete data. You can't have 2.63 people in the room.
Continuous variables are usually obtained by measuring. Length, weight, and time are all examples of continuous variables. Since continuous variables are real numbers, we usually round them. This implies a boundary depending on the number of decimal places. For example: 64 is really anything 63.5 <= x < 64.5. Likewise, if there are two decimal places, then 64.03 is really anything 64.025 <= x < 64.035. Boundaries always have one more decimal place than the data and end in a 5.
Levels of Measurement
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These go from lowest level to highest level. Data is classified according to the highest level which it fits. Each additional level adds something the previous level didn't have.
• Nominal is the lowest level. Only names are meaningful here.
• Ordinal adds an order to the names.
• Interval adds meaningful differences.
• Ratio adds a zero so that ratios are meaningful.
Types of Sampling
There are five types of sampling: Random, Systematic, Convenience, Cluster, and Stratified.
• Random sampling is analogous to putting everyone's name into a hat and drawing out several names. Each element in the population has an equal chance of occurring. While this is the preferred way of sampling, it is often difficult to do. It requires that a complete list of every element in the population be obtained. Computer generated lists are often used with random sampling. You can generate random numbers using the TI82 calculator.
• Systematic sampling is easier to do than random sampling. In systematic sampling, the list of elements is "counted off". That is, every kth element is taken. This is similar to lining everyone up and numbering off "1,2,3,4; 1,2,3,4; etc". When done numbering, all people numbered 4 would be used.
• Convenience sampling is very easy to do, but it's probably the worst technique to use. In convenience sampling, readily available data is used. That is, the first people the surveyor runs into.
• Cluster sampling is accomplished by dividing the population into groups -- usually geographically. These groups are called clusters or blocks. Some of the clusters are randomly selected, and every element in the selected clusters is used.
• Stratified sampling also divides the population into groups called strata. However, this time it is by some characteristic, not geographically. For instance, the population might be separated into males and females. A sample is taken from each of these strata using either random, systematic, or convenience sampling.
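The four chance-based techniques above can be sketched in a few lines of Python. This is only an illustration: the function names, the `seed` parameter, and the dictionary layout used for clusters are assumptions made for the example, not part of the notes. Convenience sampling is left out because it involves no selection rule at all.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n elements so every element has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, start=None, seed=None):
    """Take every kth element, beginning at a random offset in [0, k)."""
    rng = random.Random(seed)
    if start is None:
        start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, key, n_per_stratum, seed=None):
    """Split into strata by key(element), then randomly sample each stratum."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every element in them.

    clusters is assumed to be a dict mapping a cluster name to a list.
    """
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]

# Numbering people 1..20 off by fours and keeping everyone numbered 4:
people = list(range(1, 21))
print(systematic_sample(people, 4, start=3))  # [4, 8, 12, 16, 20]
```

Passing `start=3` reproduces the "all people numbered 4" example; leaving it out picks the starting position by chance.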

Faraz_1984 · #3 Monday, August 04, 2008

Frequency Distributions & Graphs
________________________________________

Definitions
Raw Data
Data collected in original form.
Frequency
The number of times a certain value or class of values occurs.
Frequency Distribution
The organization of raw data in table form with classes and frequencies.
Categorical Frequency Distribution
A frequency distribution in which the data is only nominal or ordinal.
Ungrouped Frequency Distribution
A frequency distribution of numerical data. The raw data is not grouped.
Grouped Frequency Distribution
A frequency distribution where several numbers are grouped into one class.
Class Limits
Separate one class in a grouped frequency distribution from another. The limits could actually appear in the data and have gaps between the upper limit of one class and the lower limit of the next.
Class Boundaries
Separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting 0.5 units from the lower class limit and the upper class boundary is found by adding 0.5 units to the upper class limit.
Class Width
The difference between the upper and lower boundaries of any class. The class width is also the difference between the lower limits of two consecutive classes or the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.
Class Mark (Midpoint)
The number in the middle of the class. It is found by adding the upper and lower limits and dividing by two. It can also be found by adding the upper and lower boundaries and dividing by two.
Cumulative Frequency
The number of values less than the upper class boundary for the current class. This is a running total of the frequencies.
Relative Frequency
The frequency divided by the total frequency. This gives the percent of values falling in that class.
Cumulative Relative Frequency (Relative Cumulative Frequency)
The running total of the relative frequencies or the cumulative frequency divided by the total frequency. Gives the percent of the values which are less than the upper class boundary.
Histogram
A graph which displays the data by using vertical bars of various heights to represent frequencies. The horizontal axis can be either the class boundaries, the class marks, or the class limits.
Frequency Polygon
A line graph. The frequency is placed along the vertical axis and the class midpoints are placed along the horizontal axis. These points are connected with lines.
Ogive
A frequency polygon of the cumulative frequency or the relative cumulative frequency. The vertical axis is the cumulative frequency or relative cumulative frequency. The horizontal axis is the class boundaries. The graph always starts at zero at the lowest class boundary and will end up at the total frequency (for a cumulative frequency) or 1.00 (for a relative cumulative frequency).
Pareto Chart
A bar graph for qualitative data with the bars arranged according to frequency.
Pie Chart
Graphical depiction of data as slices of a pie. The frequency determines the size of the slice. The number of degrees in any slice is the relative frequency times 360 degrees.
Pictograph
A graph that uses pictures to represent data.
Stem and Leaf Plot
A data plot which uses part of the data value as the stem and the rest of the data value (the leaf) to form groups or classes. This is very useful for sorting data quickly.

Faraz_1984 · #4 Monday, August 04, 2008

Grouped Frequency Distributions
________________________________________
Guidelines for classes
1. There should be between 5 and 20 classes.
2. The class width should be an odd number. This will guarantee that the class midpoints are integers instead of decimals.
3. The classes must be mutually exclusive. This means that no data value can fall into two different classes
4. The classes must be all inclusive or exhaustive. This means that all data values must be included.
5. The classes must be continuous. There are no gaps in a frequency distribution. Classes that have no values in them must be included (unless it's the first or last class which are dropped).
6. The classes must be equal in width. The exception here is the first or last class. It is possible to have a "below ..." or "... and above" class. This is often used with ages.
Creating a Grouped Frequency Distribution
1. Find the largest and smallest values
2. Compute the Range = Maximum - Minimum
3. Select the number of classes desired. This is usually between 5 and 20.
4. Find the class width by dividing the range by the number of classes and rounding up. There are two things to be careful of here. You must round up, not off. Normally 3.2 would round to be 3, but in rounding up, it becomes 4. If the range divided by the number of classes gives an integer value (no remainder), then you can either add one to the number of classes or add one to the class width. Sometimes you're locked into a certain number of classes because of the instructions. The Bluman text fails to mention the case when there is no remainder.
5. Pick a suitable starting point less than or equal to the minimum value. You will be able to cover: "the class width times the number of classes" values. You need to cover one more value than the range. Follow this rule and you'll be okay: The starting point plus the number of classes times the class width must be greater than the maximum value. Your starting point is the lower limit of the first class. Continue to add the class width to this lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract one from the lower limit of the second class. Then continue to add the class width to this upper limit to find the rest of the upper limits.
7. Find the boundaries by subtracting 0.5 units from the lower limits and adding 0.5 units to the upper limits. The boundaries are also half-way between the upper limit of one class and the lower limit of the next class. Depending on what you're trying to accomplish, it may not be necessary to find the boundaries.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies.
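The steps above can be sketched in Python for whole-number data. This is a rough illustration under stated assumptions: the starting point is taken to be the minimum value, the "no remainder" case is handled by adding one to the class width, and the function name and the `ages` list are made up for the example.

```python
import math

def grouped_frequency_distribution(data, num_classes):
    """Build classes for integer data following the numbered steps above."""
    lo, hi = min(data), max(data)
    data_range = hi - lo                           # step 2
    width = math.ceil(data_range / num_classes)    # step 4: round up, not off
    if data_range % num_classes == 0:
        width += 1          # no remainder: widen the classes by one
    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = lo + i * width                     # step 5: lower limits
        upper = lower + width - 1                  # step 6: upper limits
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),   # step 7
            "midpoint": (lower + upper) / 2,
            "frequency": freq,                          # steps 8 and 9
            "cumulative": cumulative,                   # step 10
            "relative": freq / len(data),               # step 11
        })
    return rows

# Hypothetical data: range = 26, and 26 / 5 = 5.2 rounds up to a width of 6.
ages = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31, 34, 38]
for row in grouped_frequency_distribution(ages, 5):
    print(row["limits"], row["frequency"], row["cumulative"])
```

With these values the classes come out as 12-17, 18-23, 24-29, 30-35, and 36-41, so the starting point plus five class widths (42) is greater than the maximum (38), as the rule in step 5 requires.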

Faraz_1984 · #5 Monday, August 04, 2008

Data Description

•Definitions
• Statistic
• Characteristic or measure obtained from a sample
• Parameter
• Characteristic or measure obtained from a population
• Mean
• Sum of all the values divided by the number of values. This can either be a population mean (denoted by mu) or a sample mean (denoted by x bar)
• Median
• The midpoint of the data after being ranked (sorted in ascending order). There are as many numbers below the median as above the median.
• Mode
• The most frequent number
• Skewed Distribution
• The majority of the values lie together on one side with a very few values (the tail) to the other side. In a positively skewed distribution, the tail is to the right and the mean is larger than the median. In a negatively skewed distribution, the tail is to the left and the mean is smaller than the median.
• Symmetric Distribution
• The data values are evenly distributed on both sides of the mean. In a symmetric distribution, the mean is the median.
• Weighted Mean
• The mean when each value is multiplied by its weight and summed. This sum is divided by the total of the weights.
• Midrange
• The mean of the highest and lowest values. (Max + Min) / 2
• Range
• The difference between the highest and lowest values. Max - Min
• Population Variance
• The average of the squares of the distances from the population mean. It is the sum of the squares of the deviations from the mean divided by the population size. The units on the variance are the units of the population squared.
• Sample Variance
• Unbiased estimator of a population variance. Instead of dividing by the population size, the sum of the squares of the deviations from the sample mean is divided by one less than the sample size. The units on the variance are the units of the population squared.
• Standard Deviation
• The square root of the variance. The population standard deviation is the square root of the population variance and the sample standard deviation is the square root of the sample variance. The sample standard deviation is not the unbiased estimator for the population standard deviation. The units on the standard deviation are the same as the units of the population/sample.
• Coefficient of Variation
• Standard deviation divided by the mean, expressed as a percentage. We won't work with the Coefficient of Variation in this course.
• Chebyshev's Theorem
• The proportion of the values that fall within k standard deviations of the mean is at least 1 - 1/k^2, where k > 1. Chebyshev's theorem can be applied to any distribution regardless of its shape.
• Empirical or Normal Rule
• Only valid when a distribution is bell-shaped (normal). Approximately 68% lies within 1 standard deviation of the mean; 95% within 2 standard deviations; and 99.7% within 3 standard deviations of the mean.
• Standard Score or Z-Score
• The value obtained by subtracting the mean and dividing by the standard deviation. When all values are transformed to their standard scores, the new mean (for Z) will be zero and the standard deviation will be one.
• Percentile
• The percent of the population which lies below that value. The data must be ranked to find percentiles.
• Quartile
• Either the 25th, 50th, or 75th percentiles. The 50th percentile is also called the median.
• Decile
• Either the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, or 90th percentiles.

• InterQuartile Range (IQR)
• The difference between the 3rd and 1st Quartiles.
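Two of the definitions above are easy to check numerically. The sketch below standardizes a small data set and evaluates the Chebyshev lower bound, taken here in its standard form 1 - 1/k^2. The function names and the sample values are assumptions for illustration only.

```python
import statistics

def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # divides by n - 1
    return [(x - mean) / sd for x in values]

def chebyshev_bound(k):
    """Minimum proportion within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2

data = [4, 5, 3, 6, 5]
z = z_scores(data)
# The standardized values have mean approximately 0 and standard deviation
# approximately 1 (up to floating-point rounding).
print(statistics.mean(z), statistics.stdev(z))
print(chebyshev_bound(2))   # 0.75
```

For k = 2, Chebyshev guarantees at least 75% of the values, while the Empirical Rule gives about 95% for a bell-shaped distribution. The first is a guarantee for any shape; the second is an approximation for one particular shape.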

Faraz_1984 · #6 Monday, August 04, 2008

Measures of Central Tendency

The term "Average" is vague
Average could mean one of four things: the arithmetic mean, the median, the midrange, or the mode. For this reason, it is better to specify which average you're talking about.
Mean
This is what people usually intend when they say "average"
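Population Mean: mu = (sum of x) / N, where N is the population size
Sample Mean: x bar = (sum of x) / n, where n is the sample size
Frequency Distribution: x bar = (sum of x * f) / (sum of f), where f is the frequency of each value x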
The mean of a frequency distribution is also the weighted mean.
Median
The data must be ranked (sorted in ascending order) first. The median is the number in the middle.
To find the depth of the median, there are several formulas that could be used, the one that we will use is:
Depth of median = 0.5 * (n + 1)
Raw Data
The median is the number in the "depth of the median" position. If the sample size is even, the depth of the median will be a decimal -- you need to find the midpoint between the numbers on either side of the depth of the median.
Ungrouped Frequency Distribution
Find the cumulative frequencies for the data. The first value with a cumulative frequency greater than or equal to the depth of the median is the median. If the depth of the median is exactly 0.5 more than the cumulative frequency of the previous value, then the median is the midpoint between the two values.
Grouped Frequency Distribution
Since the data is grouped, you have lost all original information. Some textbooks have you simply take the midpoint of the class. This is an over-simplification which isn't the true value (but much easier to do). The correct process is to interpolate.
Find out what proportion of the distance into the median class the median lies by dividing the sample size by 2, subtracting the cumulative frequency of the previous class, and then dividing all that by the frequency of the median class.
Multiply this proportion by the class width and add it to the lower boundary of the median class.
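The interpolation just described can be written as a short function. This is a sketch only: the function name and the three made-up classes are assumptions, and the classes are assumed to be given in order as (lower boundary, upper boundary, frequency).

```python
def grouped_median(classes):
    """Interpolate the median of a grouped frequency distribution.

    classes: list of (lower_boundary, upper_boundary, frequency), in order.
    """
    n = sum(freq for _, _, freq in classes)
    cumulative_before = 0
    for lower, upper, freq in classes:
        if cumulative_before + freq >= n / 2:
            # Proportion of the way into the median class.
            proportion = (n / 2 - cumulative_before) / freq
            # Class width is the difference between the boundaries.
            return lower + proportion * (upper - lower)
        cumulative_before += freq
    raise ValueError("no classes given")

# Hypothetical classes: n = 20, so n / 2 = 10 falls in the second class.
classes = [(0.5, 10.5, 4), (10.5, 20.5, 10), (20.5, 30.5, 6)]
print(grouped_median(classes))   # about 16.5
```

Here the proportion is (10 - 4) / 10 = 0.6, and 0.6 times the class width of 10 added to the lower boundary 10.5 gives roughly 16.5, rather than the class midpoint of 15.5.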
Mode
The mode is the most frequent data value. There may be no mode if no one value appears more than any other. There may also be two modes (bimodal), three modes (trimodal), or more than three modes (multi-modal).
For grouped frequency distributions, the modal class is the class with the largest frequency.
Midrange (Mid-point)
The midrange is simply the midpoint between the highest and lowest values.
Summary
The Mean is used in computing other statistics (such as the variance) and does not exist for open ended grouped frequency distributions (1). It is often not appropriate for skewed distributions such as salary information.
The Median is the center number and is good for skewed distributions because it is resistant to change.
The Mode is used to describe the most typical case. The mode can be used with nominal data whereas the others can't. The mode may or may not exist and there may be more than one value for the mode (2).
The Midrange is not used very often. It is a very rough estimate of the average and is greatly affected by extreme values (even more so than the mean).
Property                     Mean     Median   Mode     Midrange
Always exists                No (1)   Yes      No (2)   Yes
Uses all data values         Yes      No       No       No
Affected by extreme values   Yes      No       No       Yes

Faraz_1984 · #7 Monday, August 04, 2008

Stats: Measures of Variation
________________________________________
Range
The range is the simplest measure of variation to find. It is simply the highest value minus the lowest value.
RANGE = MAXIMUM - MINIMUM
Since the range only uses the largest and smallest values, it is greatly affected by extreme values, that is - it is not resistant to change.

Variance
"Average Deviation"
The range only involves the smallest and largest numbers, and it would be desirable to have a statistic which involved all of the data values.
A first attempt at this might be called the average deviation from the mean, defined as:
average deviation = sum of (x - mu) / N
The problem is that this summation is always zero. So, the average deviation will always be zero. That is why the average deviation is never used.
Population Variance
So, to keep it from being zero, the deviation from the mean is squared and called the "squared deviation from the mean". This "average squared deviation from the mean" is called the variance.
sigma^2 = sum of (x - mu)^2 / N
Unbiased Estimate of the Population Variance
One would expect the sample variance to simply be the population variance with the population mean replaced by the sample mean. However, one of the major uses of statistics is to estimate the corresponding parameter. This formula has the problem that, on average, the estimated value isn't the same as the parameter. To counteract this, the sum of the squares of the deviations is divided by one less than the sample size.
s^2 = sum of (x - x bar)^2 / (n - 1)
Standard Deviation
There is a problem with variances. Recall that the deviations were squared. That means that the units were also squared. To get the units back the same as the original data values, the square root must be taken.
sigma = sqrt(sigma^2) for a population, and s = sqrt(s^2) for a sample
The sample standard deviation is not the unbiased estimator for the population standard deviation.
The calculator does not have a variance key on it. It does have a standard deviation key. You will have to square the standard deviation to find the variance.
Sum of Squares (shortcuts)
The sum of the squares of the deviations from the mean is given a shortcut notation and several alternative formulas.
SS(x) = sum of (x - x bar)^2
A little algebraic simplification returns:
SS(x) = sum of x^2 - (sum of x)^2 / n
What's wrong with the first formula, you ask? Consider the following example. The last row of the table contains the totals for the columns.
1. Total the data values: 23
2. Divide by the number of values to get the mean: 23/5 = 4.6
3. Subtract the mean from each value to get the numbers in the second column.
4. Square each number in the second column to get the values in the third column.
5. Total the numbers in the third column: 5.2
6. Divide this total by one less than the sample size to get the variance: 5.2 / 4 = 1.3
x     x - x bar           (x - x bar)^2
4     4 - 4.6 = -0.6      (-0.6)^2 = 0.36
5     5 - 4.6 =  0.4      ( 0.4)^2 = 0.16
3     3 - 4.6 = -1.6      (-1.6)^2 = 2.56
6     6 - 4.6 =  1.4      ( 1.4)^2 = 1.96
5     5 - 4.6 =  0.4      ( 0.4)^2 = 0.16
23    0.00 (Always)       5.2
Not too bad, you think. But this can get pretty bad if the sample mean doesn't happen to be a "nice" rational number. Think about having a mean of 19/7 = 2.714285714285... Those subtractions get nasty, and when you square them, they're really bad. Another problem with the first formula is that it requires you to know the mean ahead of time. For a calculator, this would mean that you have to save all of the numbers that were entered. The TI-82 does this, but most scientific calculators don't.
Now, let's consider the shortcut formula. The only things that you need to find are the sum of the values and the sum of the values squared. There is no subtraction and no decimals or fractions until the end. The last row contains the sums of the columns, just like before.
1. Record each number in the first column and the square of each number in the second column.
2. Total the first column: 23
3. Total the second column: 111
4. Compute the sum of squares: 111 - 23*23/5 = 111 - 105.8 = 5.2
5. Divide the sum of squares by one less than the sample size to get the variance = 5.2 / 4 = 1.3
x x^2
4 16
5 25
3 9
6 36
5 25
23 111
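Both routes to the sum of squares can be checked against the worked example with a few lines of Python. The function names are made up for this sketch. The first function follows the definition (subtract the mean, square, total), and the second uses the shortcut (total of the squares minus the squared total divided by n).

```python
import math

def sum_of_squares_definition(values):
    """Sum of squared deviations from the mean; needs the mean first."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def sum_of_squares_shortcut(values):
    """Sum of x^2 minus (sum of x)^2 / n; needs only two running totals."""
    n = len(values)
    return sum(x * x for x in values) - sum(values) ** 2 / n

def sample_variance(values):
    """Sum of squares divided by one less than the sample size."""
    return sum_of_squares_shortcut(values) / (len(values) - 1)

data = [4, 5, 3, 6, 5]
ss_a = sum_of_squares_definition(data)
ss_b = sum_of_squares_shortcut(data)
# Both agree with the tables (5.2), up to floating-point rounding.
print(math.isclose(ss_a, 5.2), math.isclose(ss_b, 5.2))
print(round(sample_variance(data), 4))   # 1.3
```

The shortcut needs only the running totals 23 and 111, which is why a calculator can use it without storing every value entered.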

Faraz_1984 · #8 Monday, August 04, 2008

Stats: Counting Techniques
________________________________________
Definitions
Factorial
A positive integer factorial is the product of each natural number up to and including the integer.
Permutation
An arrangement of objects in a specific order.
Combination
A selection of objects without regard to order.
Tree Diagram
A graphical device used to list all possibilities of a sequence of events in a systematic way.

Fundamental Theorems
Arithmetic
Every integer greater than one is either prime or can be expressed as a unique product of prime numbers.
Algebra
Every polynomial function in one variable of degree n > 0 has at least one real or complex zero.
Linear Programming
If there is a solution to a linear programming problem, then it will occur at a corner point or on a boundary between two or more corner points
Fundamental Counting Principle
In a sequence of events, the total possible number of ways all events can be performed is the product of the possible number of ways each individual event can be performed.
The Bluman text calls this multiplication principle 2.
Factorials
If n is a positive integer, then
n! = n (n-1) (n-2) ... (3)(2)(1)
n! = n (n-1)!
A special case is 0!
0! = 1
Permutations
A permutation is an arrangement of objects without repetition where order is important.
Permutations using all the objects
A permutation of n objects, arranged into one group of size n, without repetition, and order being important is:
nPn = P(n,n) = n!
Example: Find all permutations of the letters "ABC"
ABC ACB BAC BCA CAB CBA
Permutations of some of the objects
A permutation of n objects, arranged in groups of size r, without repetition, and order being important is:
nPr = P(n,r) = n! / (n-r)!
Example: Find all two-letter permutations of the letters "ABC"
AB AC BA BC CA CB
Shortcut formula for finding a permutation
Assuming that you start at n and count down to 1 in your factorials ...
P(n,r) = first r factors of n factorial
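The shortcut can be sketched in Python and compared with a brute-force listing. The function name `count_permutations` is an assumption for this example; `itertools.permutations` and `math.factorial` come from the standard library.

```python
import math
from itertools import permutations

def count_permutations(n, r):
    """P(n, r) as the product of the first r factors of n!."""
    result = 1
    for factor in range(n, n - r, -1):
        result *= factor
    return result

letters = "ABC"
two_letter = ["".join(p) for p in permutations(letters, 2)]
print(two_letter)                 # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']
print(count_permutations(3, 2))   # 6
print(count_permutations(3, 3) == math.factorial(3))  # True
```

Multiplying only the first r factors avoids computing two large factorials just to cancel most of them.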
Distinguishable Permutations
Sometimes letters are repeated and all of the permutations aren't distinguishable from each other.
Example: Find all permutations of the letters "BOB"
To help you distinguish, I'll write the second "B" as "b"
BOb BbO OBb ObB bBO bOB
If you just write "B" as "B", however ...
BOB BBO OBB OBB BBO BOB
There are really only three distinguishable permutations here.
BOB BBO OBB
If a word has N letters, k of which are unique, and you let n1, n2, n3, ..., nk be the frequencies of the k letters, then the total number of distinguishable permutations is given by:
N! / (n1! * n2! * n3! * ... * nk!)
Consider the word "STATISTICS":
Here is the frequency of each letter: S=3, T=3, A=1, I=2, C=1. There are 10 letters total.
Permutations = 10! / (3! 3! 1! 2! 1!) = (10*9*8*7*6*5*4*3*2*1) / (6 * 6 * 1 * 2 * 1) = 3628800 / 72 = 50400
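The same count can be computed for any word by tallying the letter frequencies. This is a minimal sketch; the function name is an assumption, and it simply divides the factorial of the word length by the factorial of each letter's frequency.

```python
from collections import Counter
from math import factorial

def distinguishable_permutations(word):
    """N! divided by the factorial of each letter's frequency."""
    counts = Counter(word)
    total = factorial(len(word))
    for frequency in counts.values():
        total //= factorial(frequency)   # each division is exact
    return total

print(distinguishable_permutations("BOB"))         # 3
print(distinguishable_permutations("STATISTICS"))  # 50400
```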
Combinations
A combination is an arrangement of objects without repetition where order is not important.
Note: The difference between a permutation and a combination is not whether there is repetition or not -- there must not be repetition with either, and if there is repetition, you can not use the formulas for permutations or combinations. The only difference in the definition of a permutation and a combination is whether order is important.
A combination of n objects, arranged in groups of size r, without repetition, and order not being important is:
nCr = C(n,r) = n! / ( (n-r)! * r! )
Another way to write a combination of n things, r at a time, is the binomial notation, in which n is written above r inside a single pair of parentheses.
Example: Find all two-letter combinations of the letters "ABC"
AB = BA AC = CA BC = CB
There are only three two-letter combinations.
Shortcut formula for finding a combination
Assuming that you start at n and count down to 1 in your factorials ...
C(n,r) = first r factors of n factorial divided by the last r factors of n factorial
Pascal's Triangle
Combinations are used in the binomial expansion theorem from algebra to give the coefficients of the expansion (a+b)^n. They also form a pattern known as Pascal's Triangle.
                 1
               1   1
             1   2   1
           1   3   3   1
         1   4   6   4   1
       1   5  10  10   5   1
     1   6  15  20  15   6   1
   1   7  21  35  35  21   7   1
Each element in the table is the sum of the two elements directly above it. Each element is also a combination. The n value is the number of the row (start counting at zero) and the r value is the element in the row (start counting at zero). That would make the 20 in the next to last row C(6,3) -- it's in the row #6 (7th row) and position #3 (4th element).
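The "sum of the two elements directly above" rule is enough to generate the triangle. The sketch below builds the rows that way; the function name `pascal_rows` is made up for this example.

```python
def pascal_rows(num_rows):
    """Return the first num_rows rows of Pascal's Triangle (row 0 first)."""
    rows = [[1]]
    for _ in range(num_rows - 1):
        prev = rows[-1]
        # Interior entries are sums of adjacent entries in the row above.
        middle = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + middle + [1])
    return rows

rows = pascal_rows(8)
print(rows[6][3])   # 20, which is C(6,3)
print(rows[7])      # [1, 7, 21, 35, 35, 21, 7, 1]
```

Indexing the result as `rows[n][r]` matches the convention of counting both the row and the position from zero.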
Symmetry
Pascal's Triangle illustrates the symmetric nature of a combination. C(n,r) = C(n,n-r)
Example: C(10,4) = C(10,6) or C(100,99) = C(100,1)
Shortcut formula for finding a combination
Since combinations are symmetric, if n-r is smaller than r, then switch the combination to its alternative form and then use the shortcut given above.
C(n,r) = first r factors of n factorial divided by the last r factors of n factorial
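Here is a sketch of the shortcut with the symmetry switch applied first. The function name `count_combinations` is an assumption; `itertools.combinations` is the standard-library tool for listing the selections themselves.

```python
from itertools import combinations

def count_combinations(n, r):
    """C(n, r) using the smaller of r and n - r."""
    r = min(r, n - r)          # symmetry: C(n, r) = C(n, n - r)
    numerator = 1
    denominator = 1
    for i in range(r):
        numerator *= n - i     # first r factors of n!
        denominator *= i + 1   # last r factors of n!, which is r!
    return numerator // denominator

print(["".join(c) for c in combinations("ABC", 2)])  # ['AB', 'AC', 'BC']
print(count_combinations(10, 4), count_combinations(10, 6))  # 210 210
print(count_combinations(100, 99))                   # 100
```

Switching C(100,99) to C(100,1) means the loop runs once instead of 99 times, which is the point of using the symmetric form.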
Tree Diagrams
Tree diagrams are a graphical way of listing all the possible outcomes. The outcomes are listed in an orderly fashion, so listing all of the possible outcomes is easier than just trying to make sure that you have them all listed. It is called a tree diagram because of the way it looks.

The first event appears on the left, and then each sequential event is represented as branches off of the first event.
A tree diagram for flipping two coins has two branches (H and T) for the first coin, and each of those splits into two branches (H and T) for the second coin. The final outcomes are obtained by following each branch to its conclusion. They are, from top to bottom:
HH HT TH TT
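Following every branch of a tree diagram amounts to taking one choice from each stage in turn, which `itertools.product` does directly. The helper name `outcomes` is an assumption for this sketch.

```python
from itertools import product

def outcomes(*stages):
    """List every path through a tree whose levels are the given stages."""
    return ["".join(path) for path in product(*stages)]

coin = "HT"
print(outcomes(coin, coin))             # ['HH', 'HT', 'TH', 'TT']
print(len(outcomes(coin, coin, coin)))  # 8
```

The number of paths is the product of the number of branches at each stage (2 * 2 = 4 here), which is the Fundamental Counting Principle at work.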

Faraz_1984 · #9 Monday, August 04, 2008

Probability
________________________________________

Definitions
Probability Experiment
Process which leads to well-defined results called outcomes
Outcome
The result of a single trial of a probability experiment
Sample Space
Set of all possible outcomes of a probability experiment
Event
One or more outcomes of a probability experiment
Classical Probability
Uses the sample space to determine the numerical probability that an event will happen. Also called theoretical probability.
Equally Likely Events
Events which have the same probability of occurring.
Complement of an Event
All the outcomes in the sample space except those in the given event.
Empirical Probability
Uses a frequency distribution to determine the numerical probability. An empirical probability is a relative frequency.
Subjective Probability
Uses probability values based on an educated guess or estimate. It employs opinions and inexact information.
Mutually Exclusive Events
Two events which cannot happen at the same time.
Disjoint Events
Another name for mutually exclusive events.
Independent Events
Two events are independent if the occurrence of one does not affect the probability of the other occurring.
Dependent Events
Two events are dependent if the first event affects the outcome or occurrence of the second event in a way the probability is changed.
Conditional Probability
The probability of an event occurring given that another event has already occurred.
Bayes' Theorem
A formula which allows one to find the probability that an event occurred as the result of a particular previous event.

Faraz_1984 · #10 Monday, August 04, 2008

Introduction to Probability
________________________________________
Sample Spaces
A sample space is the set of all possible outcomes. However, some sample spaces are better than others.
Consider the experiment of flipping two coins. It is possible to get 0 heads, 1 head, or 2 heads. Thus, the sample space could be {0, 1, 2}. Another way to look at it is to list the flips themselves: { HH, HT, TH, TT }. The second way is better because each outcome is as likely to occur as any other.
When writing the sample space, it is highly desirable to have events which are equally likely.
Another example is rolling two dice. The sums are { 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }. However, these sums aren't equally likely. The only way to get a sum of 2 is to roll a 1 on both dice, but you can get a sum of 4 by rolling a 1-3, 2-2, or 3-1. The following table illustrates a better sample space for the sum obtained when rolling two dice.
             Second Die
First Die    1    2    3    4    5    6
    1        2    3    4    5    6    7
    2        3    4    5    6    7    8
    3        4    5    6    7    8    9
    4        5    6    7    8    9   10
    5        6    7    8    9   10   11
    6        7    8    9   10   11   12
Classical Probability
The above table lends itself to describing data another way -- using a probability distribution. Let's consider the frequency distribution for the above sums.
Sum Frequency Relative Frequency
2 1 1/36
3 2 2/36
4 3 3/36
5 4 4/36
6 5 5/36
7 6 6/36
8 5 5/36
9 4 4/36
10 3 3/36
11 2 2/36
12 1 1/36
If just the first and last columns were written, we would have a probability distribution. The relative frequency of a frequency distribution is the probability of the event occurring. This is only true, however, if the events are equally likely.
This gives us the formula for classical probability. The probability of an event occurring is the number in the event divided by the number in the sample space. Again, this is only true when the events are equally likely. A classical probability is the relative frequency of each event in the sample space when each event is equally likely.
P(E) = n(E) / n(S)
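The formula can be sketched in a few lines of Python. This is only an illustration under the equally likely assumption; the helper name classical_probability is made up for this example.

```python
from fractions import Fraction
from itertools import product

def classical_probability(event, sample_space):
    """P(E) = n(E) / n(S); only valid when all outcomes are equally likely."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

# The 36 equally likely (first die, second die) pairs
sample_space = list(product(range(1, 7), repeat=2))

p_sum_7 = classical_probability(lambda roll: sum(roll) == 7, sample_space)
p_sum_2 = classical_probability(lambda roll: sum(roll) == 2, sample_space)

print(p_sum_7)  # 1/6 (that is, 6/36)
print(p_sum_2)  # 1/36
```

Fractions are used instead of floats so the results match the relative frequency column exactly.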
Empirical Probability
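Empirical probability is based on observation. The empirical probability of an event is the relative frequency of a frequency distribution based upon observation. In the formula below, f is the observed frequency of the event and n is the total number of observations.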
P(E) = f / n
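As a rough illustration, the "observation" can be simulated. The sketch below assumes fair dice and uses Python's random module with a fixed seed so the run is repeatable; the function name and the number of trials are arbitrary choices of mine.

```python
import random
from collections import Counter

def empirical_distribution(trials, seed=0):
    """Roll two simulated dice `trials` times and return f / n for each sum."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(trials))
    return {total: counts[total] / trials for total in sorted(counts)}

estimates = empirical_distribution(100_000)

# Compare the empirical estimate for a sum of 7 with the classical value 6/36
print(round(estimates[7], 3), "vs classical", round(6 / 36, 3))
```

With more trials the empirical values should settle closer to the classical ones, but they will rarely match exactly.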
Probability Rules
There are two rules which are very important.
1. All probabilities are between 0 and 1 inclusive: 0 <= P(E) <= 1
2. The sum of all the probabilities in the sample space is 1.
There are some other rules which are also important.
The probability of an event which cannot occur is 0.
The probability of any event which is not in the sample space is zero.
The probability of an event which must occur is 1.
The probability of the sample space is 1.
The probability of an event not occurring is one minus the probability of it occurring.
P(E') = 1 - P(E)
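The complement rule is handy when "not E" is easier to count than E. A small sketch, again on the two-dice sample space (the event "at least one six" is my own choice of example, not from the notes):

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely (first die, second die) pairs
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """Classical probability of an event on the two-dice sample space."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

p_no_six = probability(lambda roll: 6 not in roll)   # 25 of the 36 pairs
p_at_least_one_six = 1 - p_no_six                    # complement rule
p_direct = probability(lambda roll: 6 in roll)       # counted directly

p_impossible = probability(lambda roll: sum(roll) == 13)    # cannot occur
p_certain = probability(lambda roll: 2 <= sum(roll) <= 12)  # must occur

print(p_at_least_one_six, p_direct)  # 11/36 11/36
print(p_impossible, p_certain)       # 0 1
```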

"OR" or Unions
Mutually Exclusive Events
Two events are mutually exclusive if they cannot occur at the same time. Another word that means mutually exclusive is disjoint.
If two events are disjoint, then the probability of them both occurring at the same time is 0.
Disjoint: P(A and B) = 0
If two events are mutually exclusive, then the probability of either occurring is the sum of the probabilities of each occurring.
Specific Addition Rule
Only valid when the events are mutually exclusive.
P(A or B) = P(A) + P(B)
Example 1:
Given: P(A) = 0.20, P(B) = 0.70, A and B are disjoint
I like to use what's called a joint probability distribution. Since disjoint means nothing in common, joint is what the events have in common, so the values on the inside portion of the table are the intersections or "and"s of each pair of events. "Marginal" is another word for totals; they are called marginal because they appear in the margins.
            B       B'      Marginal
A           0.00    0.20    0.20
A'          0.70    0.10    0.80
Marginal    0.70    0.30    1.00
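The values given in the problem are the marginals P(A) = 0.20 and P(B) = 0.70 and the intersection P(A and B) = 0.00, which is zero because A and B are disjoint. The grand total is always 1.00. The rest of the values are obtained by addition and subtraction. Here P(A or B) = 0.20 + 0.70 = 0.90.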
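The "addition and subtraction" step can be written out as a short Python sketch. The function joint_table and its dictionary layout are assumptions of mine for illustration; Decimal is used so that 0.20, 0.70 and so on stay exact.

```python
from decimal import Decimal

def joint_table(p_a, p_b, p_a_and_b):
    """Fill the four inner cells of a 2x2 joint table from P(A), P(B), P(A and B)."""
    p_a, p_b, p_a_and_b = Decimal(p_a), Decimal(p_b), Decimal(p_a_and_b)
    a_and_not_b = p_a - p_a_and_b            # row A must total P(A)
    not_a_and_b = p_b - p_a_and_b            # column B must total P(B)
    not_a_and_not_b = 1 - p_a - not_a_and_b  # grand total must be 1
    return {
        ("A", "B"): p_a_and_b,
        ("A", "B'"): a_and_not_b,
        ("A'", "B"): not_a_and_b,
        ("A'", "B'"): not_a_and_not_b,
    }

# Example 1: A and B are disjoint, so P(A and B) = 0
table = joint_table("0.20", "0.70", "0.00")

# Specific addition rule (valid here because the events are disjoint)
p_a_or_b = Decimal("0.20") + Decimal("0.70")

for cell, value in table.items():
    print(cell, value)
print("P(A or B) =", p_a_or_b)
```

Note that P(A or B) is also 1 minus the A', B' cell, which gives a quick check on the table.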
Non-Mutually Exclusive Events
In events which aren't mutually exclusive, there is some overlap. When P(A) and P(B) are added, the probability of the intersection (and) is added twice. To compensate for that double addition, the intersection needs to be subtracted.
General Addition Rule
Always valid.
P(A or B) = P(A) + P(B) - P(A and B)
Example 2:
Given P(A) = 0.20, P(B) = 0.70, P(A and B) = 0.15
            B       B'      Marginal
A           0.15    0.05    0.20
A'          0.55    0.25    0.80
Marginal    0.70    0.30    1.00
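Working Example 2 through the general addition rule gives P(A or B) = 0.20 + 0.70 - 0.15 = 0.75. The sketch below checks that value and then confirms the rule by direct counting on two dice. The helper name p_or and the dice events are my own illustration.

```python
from fractions import Fraction
from itertools import product

def p_or(p_a, p_b, p_a_and_b):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

# Example 2 from the notes
example_2 = p_or(Fraction("0.20"), Fraction("0.70"), Fraction("0.15"))
print(float(example_2))  # 0.75

# Check by counting: A = first die shows 6, B = second die shows 6
sample_space = list(product(range(1, 7), repeat=2))
counted = Fraction(
    sum(1 for first, second in sample_space if first == 6 or second == 6),
    len(sample_space),
)
by_rule = p_or(Fraction(1, 6), Fraction(1, 6), Fraction(1, 36))
print(counted, by_rule)  # 11/36 11/36
```

Without subtracting the intersection, the 6-6 outcome would be counted twice and the answer would come out as 12/36.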
Interpreting the table
Certain things can be determined from the joint probability distribution. Mutually exclusive events will have a zero in the intersection cell (A and B). All inclusive events will have a zero in the cell diagonally opposite the intersection (A' and B'). All inclusive means that there is nothing outside of those two events: P(A or B) = 1.
            B                                       B'                                  Marginal
A           0 if A and B are Mutually Exclusive     .                                   .
A'          .                                       0 if A and B are All Inclusive      .
Marginal    .                                       .                                   1.00
"AND" or Intersections
Independent Events
Two events are independent if the occurrence of one does not change the probability of the other occurring.
An example would be rolling a 2 on a die and flipping a head on a coin. Rolling the 2 does not affect the probability of flipping the head.
If events are independent, then the probability of them both occurring is the product of the probabilities of each occurring.
Specific Multiplication Rule
Only valid for independent events
P(A and B) = P(A) * P(B)
Example 3:
P(A) = 0.20, P(B) = 0.70, A and B are independent.
            B       B'      Marginal
A           0.14    0.06    0.20
A'          0.56    0.24    0.80
Marginal    0.70    0.30    1.00
The 0.14 is because the probability of A and B is the probability of A times the probability of B or 0.20 * 0.70 = 0.14.
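The die-and-coin example from earlier in this section can be checked by listing all 12 equally likely outcomes. This is just a sketch with names of my own choosing.

```python
from fractions import Fraction
from itertools import product

# 12 equally likely outcomes: (die value, coin face)
sample_space = list(product(range(1, 7), "HT"))

def probability(event):
    """Classical probability of an event on the die-and-coin sample space."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

p_two = probability(lambda outcome: outcome[0] == 2)             # 1/6
p_head = probability(lambda outcome: outcome[1] == "H")          # 1/2
p_two_and_head = probability(lambda outcome: outcome == (2, "H"))

print(p_two_and_head, p_two * p_head)  # 1/12 1/12
```

The counted intersection equals the product of the two separate probabilities, which is exactly what the specific multiplication rule says for independent events.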
Dependent Events
If the occurrence of one event does affect the probability of the other occurring, then the events are dependent.
Conditional Probability
The probability of event B occurring given that event A has already occurred is read "the probability of B given A" and is written: P(B|A)
General Multiplication Rule
Always works.
P(A and B) = P(A) * P(B|A)
Example 4:
P(A) = 0.20, P(B) = 0.70, P(B|A) = 0.40
A good way to think of P(B|A) is that 40% of A is B. 40% of the 20% which was in event A is 8%, thus the intersection is 0.08.
            B       B'      Marginal
A           0.08    0.12    0.20
A'          0.62    0.18    0.80
Marginal    0.70    0.30    1.00
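Here is the arithmetic of Example 4 as a small Python sketch. The variable names are mine; the last step, dividing the intersection by P(B) to get P(A|B), is the same general multiplication rule rearranged and is not worked in the notes themselves.

```python
from fractions import Fraction

p_a = Fraction("0.20")
p_b = Fraction("0.70")
p_b_given_a = Fraction("0.40")

# General multiplication rule: P(A and B) = P(A) * P(B|A)
p_a_and_b = p_a * p_b_given_a

# Remaining cells of the joint table, by subtraction
p_a_and_not_b = p_a - p_a_and_b
p_not_a_and_b = p_b - p_a_and_b
p_not_a_and_not_b = 1 - p_a - p_not_a_and_b

# Rearranged rule: P(A|B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(float(p_a_and_b), float(p_a_and_not_b))          # 0.08 0.12
print(float(p_not_a_and_b), float(p_not_a_and_not_b))  # 0.62 0.18
print(p_a_given_b)                                     # 4/35
```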
Independence Revisited
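The following four statements are equivalent (assuming P(A) and P(B) are both greater than zero, so that the conditional probabilities are defined):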
1. A and B are independent events
2. P(A and B) = P(A) * P(B)
3. P(A|B) = P(A)
4. P(B|A) = P(B)
The last two are because if two events are independent, the occurrence of one doesn't change the probability of the occurrence of the other. This means that the probability of B occurring, whether A has happened or not, is simply the probability of B occurring.
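The equivalent statements give a simple test for independence. The sketch below applies it to the numbers from Example 3 and Example 4; the function names are assumptions of mine, and exact fractions are used so the equality test is not thrown off by floating-point rounding.

```python
from fractions import Fraction

def is_independent(p_a, p_b, p_a_and_b):
    """Statement 2: A and B are independent when P(A and B) = P(A) * P(B)."""
    return p_a_and_b == p_a * p_b

def conditional(p_a_and_b, p_a):
    """P(B|A) = P(A and B) / P(A); requires P(A) > 0."""
    return p_a_and_b / p_a

p_a, p_b = Fraction("0.20"), Fraction("0.70")

# Example 3: intersection 0.14, so P(B|A) = 0.70 = P(B)
print(is_independent(p_a, p_b, Fraction("0.14")))   # True
print(float(conditional(Fraction("0.14"), p_a)))    # 0.7

# Example 4: intersection 0.08, so P(B|A) = 0.40, which differs from P(B)
print(is_independent(p_a, p_b, Fraction("0.08")))   # False
print(float(conditional(Fraction("0.08"), p_a)))    # 0.4
```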
