#1
Notes and Topics on Statistics
Definitions
The Following 2 Users Say Thank You to Faraz_1984 For This Useful Post:
Bilal Salim (Wednesday, February 02, 2011), MOEEN AKHTAR (Friday, October 19, 2012)
#2
Notes and Topics on Statistics:
#3
Frequency Distributions & Graphs
________________________________________
Last edited by Faraz_1984; Monday, August 04, 2008 at 12:02 PM. Reason: Mistake |
#4
Grouped Frequency Distributions
________________________________________
Guidelines for classes
1. There should be between 5 and 20 classes.
2. The class width should be an odd number. For integer data, this guarantees that the class midpoints are integers instead of decimals.
3. The classes must be mutually exclusive. This means that no data value can fall into two different classes.
4. The classes must be all inclusive or exhaustive. This means that all data values must be included.
5. The classes must be continuous. There are no gaps in a frequency distribution. Classes that have no values in them must be included (unless it is the first or last class, which is dropped).
6. The classes must be equal in width. The exception here is the first or last class. It is possible to have a "below ..." or "... and above" class. This is often used with ages.

Creating a Grouped Frequency Distribution
1. Find the largest and smallest values.
2. Compute the Range = Maximum - Minimum.
3. Select the number of classes desired. This is usually between 5 and 20.
4. Find the class width by dividing the range by the number of classes and rounding up. There are two things to be careful of here. You must round up, not off. Normally 3.2 would round to 3, but in rounding up, it becomes 4. If the range divided by the number of classes gives an integer value (no remainder), then you can either add one to the number of classes or add one to the class width. Sometimes you're locked into a certain number of classes because of the instructions. The Bluman text fails to mention the case when there is no remainder.
5. Pick a suitable starting point less than or equal to the minimum value. You will be able to cover "the class width times the number of classes" values. You need to cover one more value than the range. Follow this rule and you'll be okay: the starting point plus the number of classes times the class width must be greater than the maximum value. Your starting point is the lower limit of the first class. Continue to add the class width to this lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract one from the lower limit of the second class. Then continue to add the class width to this upper limit to find the rest of the upper limits.
7. Find the boundaries by subtracting 0.5 units from the lower limits and adding 0.5 units to the upper limits. The boundaries are also half-way between the upper limit of one class and the lower limit of the next class. Depending on what you're trying to accomplish, it may not be necessary to find the boundaries.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies.
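The steps for creating a grouped frequency distribution can be sketched in Python. This is only a minimal illustration, assuming integer data and a starting point equal to the minimum value; the function name grouped_frequency_table and the sample data are made up for the example and do not come from the notes.

```python
import math

def grouped_frequency_table(data, num_classes):
    """Build class limits, boundaries and frequencies for integer data."""
    lowest, highest = min(data), max(data)
    value_range = highest - lowest

    # Step 4: round UP, never off. If the division is exact, add one to the
    # width so the classes cover one more value than the range.
    width = math.ceil(value_range / num_classes)
    if value_range % num_classes == 0:
        width += 1

    # Step 5: assumed starting point (any value <= the minimum would do).
    start = lowest

    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = start + i * width       # lower limits go up by the width
        upper = lower + width - 1       # one less than the next lower limit
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq
        rows.append({
            "lower": lower,
            "upper": upper,
            "boundaries": (lower - 0.5, upper + 0.5),
            "freq": freq,
            "cumulative": cumulative,
            "relative": freq / len(data),
        })
    return rows

# Hypothetical sample data, 15 integer values.
sample = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31, 33, 34, 36, 40, 41]
rows = grouped_frequency_table(sample, 5)
for row in rows:
    print(row["lower"], "-", row["upper"], row["freq"], row["cumulative"])
```

With this sample the range is 29, so 29 / 5 = 5.8 rounds up to a width of 6, and the check from step 5 holds: 12 + 5 * 6 = 42 is greater than the maximum of 41.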
#5
Data Description
Definitions

Statistic
Characteristic or measure obtained from a sample.

Parameter
Characteristic or measure obtained from a population.

Mean
Sum of all the values divided by the number of values. This can either be a population mean (denoted by mu) or a sample mean (denoted by x bar).

Median
The midpoint of the data after being ranked (sorted in ascending order). There are as many numbers below the median as above the median.

Mode
The most frequent number.

Skewed Distribution
The majority of the values lie together on one side with a very few values (the tail) to the other side. In a positively skewed distribution, the tail is to the right and the mean is larger than the median. In a negatively skewed distribution, the tail is to the left and the mean is smaller than the median.

Symmetric Distribution
The data values are evenly distributed on both sides of the mean. In a symmetric distribution, the mean equals the median.

Weighted Mean
The mean when each value is multiplied by its weight and summed. This sum is divided by the total of the weights.

Midrange
The mean of the highest and lowest values. (Max + Min) / 2

Range
The difference between the highest and lowest values. Max - Min

Population Variance
The average of the squares of the distances from the population mean. It is the sum of the squares of the deviations from the mean divided by the population size. The units on the variance are the units of the population squared.

Sample Variance
Unbiased estimator of a population variance. Instead of dividing by the population size, the sum of the squares of the deviations from the sample mean is divided by one less than the sample size. The units on the variance are the units of the population squared.

Standard Deviation
The square root of the variance. The population standard deviation is the square root of the population variance and the sample standard deviation is the square root of the sample variance. The sample standard deviation is not the unbiased estimator for the population standard deviation. The units on the standard deviation are the same as the units of the population/sample.

Coefficient of Variation
Standard deviation divided by the mean, expressed as a percentage. We won't work with the Coefficient of Variation in this course.

Chebyshev's Theorem
The proportion of the values that fall within k standard deviations of the mean is at least 1 - 1/k^2, where k > 1. Chebyshev's theorem can be applied to any distribution regardless of its shape.

Empirical or Normal Rule
Only valid when a distribution is bell-shaped (normal). Approximately 68% lies within 1 standard deviation of the mean; 95% within 2 standard deviations; and 99.7% within 3 standard deviations of the mean.

Standard Score or Z-Score
The value obtained by subtracting the mean and dividing by the standard deviation. When all values are transformed to their standard scores, the new mean (for Z) will be zero and the standard deviation will be one.

Percentile
The percent of the population which lies below that value. The data must be ranked to find percentiles.

Quartile
Either the 25th, 50th, or 75th percentiles. The 50th percentile is also called the median.

Decile
Either the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, or 90th percentiles.

InterQuartile Range (IQR)
The difference between the 3rd and 1st Quartiles.
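Several of these definitions can be checked with Python's standard statistics module. The sketch below is only an illustration: the five data values and the helper name chebyshev_bound are assumptions made for the example.

```python
import statistics

# Hypothetical sample of five values.
data = [4, 5, 3, 6, 5]

mean = statistics.mean(data)               # sum of the values / number of values
median = statistics.median(data)           # middle value after ranking
mode = statistics.mode(data)               # most frequent value
midrange = (max(data) + min(data)) / 2     # (Max + Min) / 2
value_range = max(data) - min(data)        # Max - Min

pop_var = statistics.pvariance(data)       # divides by N
sample_var = statistics.variance(data)     # divides by n - 1
sample_sd = statistics.stdev(data)         # square root of the sample variance

def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    return 1 - 1 / k**2

# Standard scores: subtract the mean, divide by the standard deviation.
z_scores = [(x - mean) / sample_sd for x in data]

print(mean, median, mode, midrange, value_range)
print(pop_var, sample_var, sample_sd)
print(chebyshev_bound(2), chebyshev_bound(3))
```

The z-scores computed this way have mean zero and standard deviation one, as the Standard Score definition says.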
#6
Measures of Central Tendency
#7
Measures of Variation
Stats: Measures of Variation
________________________________________
Range
The range is the simplest measure of variation to find. It is simply the highest value minus the lowest value.

RANGE = MAXIMUM - MINIMUM

Since the range only uses the largest and smallest values, it is greatly affected by extreme values, that is - it is not resistant to change.

Variance

"Average Deviation"
The range only involves the smallest and largest numbers, and it would be desirable to have a statistic which involved all of the data values. The first attempt one might make at this is something they might call the average deviation from the mean and define it as:

average deviation = SUM(x - mu) / N

The problem is that this summation is always zero, because the positive and negative deviations cancel. So, the average deviation will always be zero. That is why the average deviation is never used.

Population Variance
So, to keep it from being zero, the deviation from the mean is squared and called the "squared deviation from the mean". This "average squared deviation from the mean" is called the variance.

sigma^2 = SUM(x - mu)^2 / N

Unbiased Estimate of the Population Variance
One would expect the sample variance to simply be the population variance with the population mean replaced by the sample mean. However, one of the major uses of statistics is to estimate the corresponding parameter. Dividing by the sample size has the problem that the estimate tends to come out smaller than the parameter. To counteract this, the sum of the squares of the deviations is divided by one less than the sample size.

s^2 = SUM(x - x bar)^2 / (n - 1)

Standard Deviation
There is a problem with variances. Recall that the deviations were squared. That means that the units were also squared. To get the units back the same as the original data values, the square root must be taken.

sigma = sqrt(sigma^2)
s = sqrt(s^2)

The sample standard deviation is not the unbiased estimator for the population standard deviation.
The calculator does not have a variance key on it. It does have a standard deviation key. You will have to square the standard deviation to find the variance.

Sum of Squares (shortcuts)
The sum of the squares of the deviations from the mean is given a shortcut notation and several alternative formulas.

SS(x) = SUM(x - x bar)^2

A little algebraic simplification returns:

SS(x) = SUM(x^2) - (SUM x)^2 / n

What's wrong with the first formula, you ask? Consider the following example - the last row is the totals for the columns.
1. Total the data values: 23
2. Divide by the number of values to get the mean: 23/5 = 4.6
3. Subtract the mean from each value to get the numbers in the second column.
4. Square each number in the second column to get the values in the third column.
5. Total the numbers in the third column: 5.2
6. Divide this total by one less than the sample size to get the variance: 5.2 / 4 = 1.3

x     x - mean           (x - mean)^2
4     4 - 4.6 = -0.6     (-0.6)^2 = 0.36
5     5 - 4.6 =  0.4     ( 0.4)^2 = 0.16
3     3 - 4.6 = -1.6     (-1.6)^2 = 2.56
6     6 - 4.6 =  1.4     ( 1.4)^2 = 1.96
5     5 - 4.6 =  0.4     ( 0.4)^2 = 0.16
23    0.00 (Always)      5.2

Not too bad, you think. But this can get pretty bad if the sample mean doesn't happen to be a "nice" rational number. Think about having a mean of 19/7 = 2.714285714285... Those subtractions get nasty, and when you square them, they're really bad.
Another problem with the first formula is that it requires you to know the mean ahead of time. For a calculator, this would mean that you have to save all of the numbers that were entered. The TI-82 does this, but most scientific calculators don't.
Now, let's consider the shortcut formula. The only things that you need to find are the sum of the values and the sum of the values squared. There is no subtraction and no decimals or fractions until the end. The last row contains the sums of the columns, just like before.
1. Record each number in the first column and the square of each number in the second column.
2. Total the first column: 23
3. Total the second column: 111
4. Compute the sum of squares: 111 - 23*23/5 = 111 - 105.8 = 5.2
5. Divide the sum of squares by one less than the sample size to get the variance = 5.2 / 4 = 1.3

x     x^2
4     16
5     25
3     9
6     36
5     25
23    111
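Both ways of getting the sum of squares can be compared in a few lines of Python. This is a sketch only; the function names are invented for the illustration, and Fraction is used so that nothing is rounded until the very end.

```python
from fractions import Fraction

def sum_of_squares_definition(values):
    """SS(x) = SUM(x - x bar)^2, which needs the mean first."""
    mean = Fraction(sum(values), len(values))
    return sum((x - mean) ** 2 for x in values)

def sum_of_squares_shortcut(values):
    """SS(x) = SUM(x^2) - (SUM x)^2 / n, which needs only two running totals."""
    n = len(values)
    total = sum(values)                        # 23 for the example data
    total_squares = sum(x * x for x in values) # 111 for the example data
    return total_squares - Fraction(total * total, n)

data = [4, 5, 3, 6, 5]  # the five values from the worked example

ss_definition = sum_of_squares_definition(data)
ss_shortcut = sum_of_squares_shortcut(data)
sample_variance = ss_shortcut / (len(data) - 1)

# The deviations from the mean always total zero.
deviation_total = sum(x - Fraction(sum(data), len(data)) for x in data)

print(float(ss_definition), float(ss_shortcut), float(sample_variance))
```

The shortcut version never looks at a value twice, which is why a calculator that does not store the entered data can still use it.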
#8
Counting Techniques
Stats: Counting Techniques
________________________________________
Definitions

Factorial
The factorial of a positive integer is the product of each natural number up to and including the integer.

Permutation
An arrangement of objects in a specific order.

Combination
A selection of objects without regard to order.

Tree Diagram
A graphical device used to list all possibilities of a sequence of events in a systematic way.

Fundamental Theorems

Arithmetic
Every integer greater than one is either prime or can be expressed as a unique product of prime numbers.

Algebra
Every polynomial function in one variable of degree n > 0 has at least one real or complex zero.

Linear Programming
If there is a solution to a linear programming problem, then it will occur at a corner point or on a boundary between two or more corner points.

Fundamental Counting Principle
In a sequence of events, the total possible number of ways all events can be performed is the product of the possible number of ways each individual event can be performed.
The Bluman text calls this multiplication principle 2.

Factorials
If n is a positive integer, then
n! = n (n-1) (n-2) ... (3)(2)(1)
n! = n (n-1)!
A special case is 0!
0! = 1

Permutations
A permutation is an arrangement of objects without repetition where order is important.

Permutations using all the objects
A permutation of n objects, arranged into one group of size n, without repetition, and order being important is:
nPn = P(n,n) = n!
Example: Find all permutations of the letters "ABC"
ABC ACB BAC BCA CAB CBA

Permutations of some of the objects
A permutation of n objects, arranged in groups of size r, without repetition, and order being important is:
nPr = P(n,r) = n! / (n-r)!
Example: Find all two-letter permutations of the letters "ABC"
AB AC BA BC CA CB

Shortcut formula for finding a permutation
Assuming that you start at n and count down to 1 in your factorials ...
P(n,r) = first r factors of n factorial
For example, P(5,2) = 5 * 4 = 20.

Distinguishable Permutations
Sometimes letters are repeated and all of the permutations aren't distinguishable from each other.
Example: Find all permutations of the letters "BOB"
To help you distinguish, I'll write the second "B" as "b"
BOb BbO OBb ObB bBO bOB
If you just write "b" as "B", however ...
BOB BBO OBB OBB BBO BOB
There are really only three distinguishable permutations here.
BOB BBO OBB
If a word has N letters, k of which are unique, and you let n1, n2, n3, ..., nk be the frequencies of each of the k letters, then the total number of distinguishable permutations is given by:
N! / (n1! n2! n3! ... nk!)
Consider the word "STATISTICS":
Here are the frequencies of each letter: S=3, T=3, A=1, I=2, C=1, there are 10 letters total

                     10!            10*9*8*7*6*5*4*3*2*1
Permutations = ---------------- = ---------------------- = 50400
                3! 3! 1! 2! 1!      6 * 6 * 1 * 2 * 1

Combinations
A combination is an arrangement of objects without repetition where order is not important.
Note: The difference between a permutation and a combination is not whether there is repetition or not -- there must not be repetition with either, and if there is repetition, you can not use the formulas for permutations or combinations. The only difference in the definition of a permutation and a combination is whether order is important.
A combination of n objects, arranged in groups of size r, without repetition, and order not being important is:
nCr = C(n,r) = n! / ( (n-r)! * r! )
Another way to write a combination of n things, r at a time is using the binomial notation, read "n choose r", with n written above r inside a single pair of parentheses.
Example: Find all two-letter combinations of the letters "ABC"
AB = BA
AC = CA
BC = CB
There are only three two-letter combinations.

Shortcut formula for finding a combination
Assuming that you start at n and count down to 1 in your factorials ...
C(n,r) = first r factors of n factorial divided by the last r factors of n factorial
For example, C(5,2) = (5 * 4) / (2 * 1) = 10.

Pascal's Triangle
Combinations are used in the binomial expansion theorem from algebra to give the coefficients of the expansion (a+b)^n. They also form a pattern known as Pascal's Triangle.

                      1
                    1   1
                  1   2   1
                1   3   3   1
              1   4   6   4   1
            1   5  10  10   5   1
          1   6  15  20  15   6   1
        1   7  21  35  35  21   7   1

Each element in the table is the sum of the two elements directly above it. Each element is also a combination. The n value is the number of the row (start counting at zero) and the r value is the element in the row (start counting at zero). That would make the 20 in the next to last row C(6,3) -- it's in the row #6 (7th row) and position #3 (4th element).

Symmetry
Pascal's Triangle illustrates the symmetric nature of a combination.
C(n,r) = C(n,n-r)
Example: C(10,4) = C(10,6) or C(100,99) = C(100,1)

Shortcut formula for finding a combination
Since combinations are symmetric, if n-r is smaller than r, then switch the combination to its alternative form and then use the shortcut given above.
C(n,r) = first r factors of n factorial divided by the last r factors of n factorial

Tree Diagrams
Tree diagrams are a graphical way of listing all the possible outcomes. The outcomes are listed in an orderly fashion, so listing all of the possible outcomes is easier than just trying to make sure that you have them all listed. It is called a tree diagram because of the way it looks.
The first event appears on the left, and then each sequential event is represented as branches off of the first event.
A tree diagram for flipping two coins has two branches (H and T) for the first coin, and each of those splits into two branches (H and T) for the second coin. The final outcomes are obtained by following each branch to its conclusion. They are, from top to bottom:
HH HT TH TT
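The counting formulas above are easy to check with Python's standard library (math.perm and math.comb need Python 3.8 or later). This is just a sketch; distinguishable_permutations is a helper name made up for the illustration.

```python
import math
from collections import Counter
from itertools import combinations, permutations, product

def distinguishable_permutations(word):
    """N! divided by the factorial of each letter's frequency."""
    result = math.factorial(len(word))
    for count in Counter(word).values():
        result //= math.factorial(count)
    return result

# Permutations: order matters, no repetition.
all_abc = ["".join(p) for p in permutations("ABC")]      # uses all the letters
two_letter = ["".join(p) for p in permutations("ABC", 2)]

# Combinations: order does not matter, no repetition.
two_letter_combos = ["".join(c) for c in combinations("ABC", 2)]

# Repeated letters: a set keeps only the distinguishable arrangements.
bob = sorted(set("".join(p) for p in permutations("BOB")))

# Row #6 of Pascal's Triangle (rows and positions both count from zero).
pascal_row_6 = [math.comb(6, r) for r in range(7)]

# Tree diagram for two coins: every path from the first flip to the second.
two_coins = ["".join(path) for path in product("HT", repeat=2)]

print(all_abc, two_letter, two_letter_combos)
print(bob, distinguishable_permutations("STATISTICS"))
print(pascal_row_6, math.perm(5, 2), math.comb(10, 4), math.comb(10, 6))
print(two_coins)
```

Listing the arrangements with itertools is only practical for small cases, while the formulas give the counts directly for any size.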
#9
Probability
________________________________________
#10
Introduction to Probability
________________________________________
Sample Spaces
A sample space is the set of all possible outcomes. However, some sample spaces are better than others.
Consider the experiment of flipping two coins. It is possible to get 0 heads, 1 head, or 2 heads. Thus, the sample space could be {0, 1, 2}. Another way to look at it is { HH, HT, TH, TT }. The second way is better because each event is as equally likely to occur as any other.
When writing the sample space, it is highly desirable to have events which are equally likely.
Another example is rolling two dice. The sums are { 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }. However, these sums are not all equally likely. The only way to get a sum of 2 is to roll a 1 on both dice, but you can get a sum of 4 by rolling a 1-3, 2-2, or 3-1. The following table illustrates a better sample space for the sum obtained when rolling two dice.

                   Second Die
First Die     1    2    3    4    5    6
    1         2    3    4    5    6    7
    2         3    4    5    6    7    8
    3         4    5    6    7    8    9
    4         5    6    7    8    9   10
    5         6    7    8    9   10   11
    6         7    8    9   10   11   12

Classical Probability
The above table lends itself to describing data another way -- using a probability distribution. Let's consider the frequency distribution for the above sums.

Sum    Frequency    Relative Frequency
 2         1             1/36
 3         2             2/36
 4         3             3/36
 5         4             4/36
 6         5             5/36
 7         6             6/36
 8         5             5/36
 9         4             4/36
10         3             3/36
11         2             2/36
12         1             1/36

If just the first and last columns were written, we would have a probability distribution. The relative frequency of a frequency distribution is the probability of the event occurring. This is only true, however, if the events are equally likely.
This gives us the formula for classical probability. The probability of an event occurring is the number in the event divided by the number in the sample space. Again, this is only true when the events are equally likely. A classical probability is the relative frequency of each event in the sample space when each event is equally likely.
P(E) = n(E) / n(S)

Empirical Probability
Empirical probability is based on observation. The empirical probability of an event is the relative frequency of a frequency distribution based upon observation.
P(E) = f / n

Probability Rules
There are two rules which are very important.
All probabilities are between 0 and 1 inclusive
0 <= P(E) <= 1
The sum of all the probabilities in the sample space is 1
There are some other rules which are also important.
The probability of an event which cannot occur is 0.
The probability of any event which is not in the sample space is zero.
The probability of an event which must occur is 1.
The probability of the sample space is 1.
The probability of an event not occurring is one minus the probability of it occurring.
P(E') = 1 - P(E)

"OR" or Unions

Mutually Exclusive Events
Two events are mutually exclusive if they cannot occur at the same time. Another word that means mutually exclusive is disjoint.
If two events are disjoint, then the probability of them both occurring at the same time is 0.
Disjoint: P(A and B) = 0
If two events are mutually exclusive, then the probability of either occurring is the sum of the probabilities of each occurring.

Specific Addition Rule
Only valid when the events are mutually exclusive.
P(A or B) = P(A) + P(B)

Example 1:
Given: P(A) = 0.20, P(B) = 0.70, A and B are disjoint
I like to use what's called a joint probability distribution. (Since disjoint means nothing in common, joint is what they have in common -- so the values that go on the inside portion of the table are the intersections or "and"s of each pair of events). "Marginal" is another word for totals -- it's called marginal because they appear in the margins.

              B       B'      Marginal
A             0.00    0.20    0.20
A'            0.70    0.10    0.80
Marginal      0.70    0.30    1.00

The values given in the problem are the marginals 0.20 and 0.70 and the intersection 0.00. The grand total is always 1.00. The rest of the values are obtained by addition and subtraction.

Non-Mutually Exclusive Events
In events which aren't mutually exclusive, there is some overlap. When P(A) and P(B) are added, the probability of the intersection (and) is added twice. To compensate for that double addition, the intersection needs to be subtracted.

General Addition Rule
Always valid.
P(A or B) = P(A) + P(B) - P(A and B)

Example 2:
Given P(A) = 0.20, P(B) = 0.70, P(A and B) = 0.15

              B       B'      Marginal
A             0.15    0.05    0.20
A'            0.55    0.25    0.80
Marginal      0.70    0.30    1.00

Interpreting the table
Certain things can be determined from the joint probability distribution. Mutually exclusive events will have a probability of zero. All inclusive events will have a zero opposite the intersection. All inclusive means that there is nothing outside of those two events: P(A or B) = 1.

              B                                      B'                                   Marginal
A             A and B are Mutually Exclusive         .                                    .
              if this value is 0
A'            .                                      A and B are All Inclusive            .
                                                     if this value is 0
Marginal      .                                      .                                    1.00

"AND" or Intersections

Independent Events
Two events are independent if the occurrence of one does not change the probability of the other occurring.
An example would be rolling a 2 on a die and flipping a head on a coin. Rolling the 2 does not affect the probability of flipping the head.
If events are independent, then the probability of them both occurring is the product of the probabilities of each occurring.

Specific Multiplication Rule
Only valid for independent events
P(A and B) = P(A) * P(B)

Example 3:
P(A) = 0.20, P(B) = 0.70, A and B are independent.

              B       B'      Marginal
A             0.14    0.06    0.20
A'            0.56    0.24    0.80
Marginal      0.70    0.30    1.00

The 0.14 is because the probability of A and B is the probability of A times the probability of B or 0.20 * 0.70 = 0.14.

Dependent Events
If the occurrence of one event does affect the probability of the other occurring, then the events are dependent.

Conditional Probability
The probability of event B occurring given that event A has already occurred is read "the probability of B given A" and is written: P(B|A)

General Multiplication Rule
Always works.
P(A and B) = P(A) * P(B|A)

Example 4:
P(A) = 0.20, P(B) = 0.70, P(B|A) = 0.40
A good way to think of P(B|A) is that 40% of A is B. 40% of the 20% which was in event A is 8%, thus the intersection is 0.08.

              B       B'      Marginal
A             0.08    0.12    0.20
A'            0.62    0.18    0.80
Marginal      0.70    0.30    1.00

Independence Revisited
The following four statements are equivalent (assuming P(A) and P(B) are not zero)
1. A and B are independent events
2. P(A and B) = P(A) * P(B)
3. P(A|B) = P(A)
4. P(B|A) = P(B)
The last two are because if two events are independent, the occurrence of one doesn't change the probability of the occurrence of the other. This means that the probability of B occurring, whether A has happened or not, is simply the probability of B occurring.
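The dice distribution and the joint probability tables from the examples can be reproduced with a short Python sketch. The function name joint_table and the dictionary layout are assumptions made for this illustration; Fraction keeps the arithmetic exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Classical probability: 36 equally likely outcomes for two dice.
outcomes = list(product(range(1, 7), repeat=2))
sum_counts = Counter(first + second for first, second in outcomes)
prob_sum = {s: Fraction(count, len(outcomes)) for s, count in sum_counts.items()}

def joint_table(p_a, p_b, p_a_and_b):
    """Fill in a 2x2 joint probability table by addition and subtraction."""
    return {
        ("A", "B"): p_a_and_b,
        ("A", "B'"): p_a - p_a_and_b,
        ("A'", "B"): p_b - p_a_and_b,
        ("A'", "B'"): 1 - p_a - p_b + p_a_and_b,
    }

p_a, p_b = Fraction("0.20"), Fraction("0.70")

table_disjoint = joint_table(p_a, p_b, Fraction(0))             # Example 1
table_overlap = joint_table(p_a, p_b, Fraction("0.15"))         # Example 2
table_independent = joint_table(p_a, p_b, p_a * p_b)            # Example 3
table_conditional = joint_table(p_a, p_b, p_a * Fraction("0.40"))  # Example 4

# General addition rule applied to Example 2.
p_a_or_b = p_a + p_b - table_overlap[("A", "B")]

for cell, value in table_conditional.items():
    print(cell, float(value))
```

In each table the four inside cells add up to 1, matching the grand total in the margin.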