Notes and Topics on Statistics - Page 2

Faraz_1984 · #11 Tuesday, August 05, 2008

Stats: Conditional Probability
________________________________________
Conditional Probability
Recall that the probability of an event occurring given that another event has already occurred is called a conditional probability.
The probability that event B occurs, given that event A has already occurred is
P(B|A) = P(A and B) / P(A)
This formula comes from the general multiplication principle and a little bit of algebra.
Since we are given that event A has occurred, we have a reduced sample space. Instead of the entire sample space S, we now have a sample space of A since we know A has occurred. So the old rule about being the number in the event divided by the number in the sample space still applies. It is the number in A and B (must be in A since A has occurred) divided by the number in A. If you then divided numerator and denominator of the right hand side by the number in the sample space S, then you have the probability of A and B divided by the probability of A.
Examples
Example 1:
The question, "Do you smoke?" was asked of 100 people. Results are shown in the table.
Smokes?   Yes   No   Total
Male       19   41      60
Female     12   28      40
Total      31   69     100
• What is the probability of a randomly selected individual being a male who smokes? This is just a joint probability. The number of "Male and Smoke" divided by the total = 19/100 = 0.19
• What is the probability of a randomly selected individual being a male? This is the total for male divided by the total = 60/100 = 0.60. Since no mention is made of smoking or not smoking, it includes all the cases.
• What is the probability of a randomly selected individual smoking? Again, since no mention is made of gender, this is a marginal probability, the total who smoke divided by the total = 31/100 = 0.31.
• What is the probability of a randomly selected male smoking? This time, you're told that you have a male - think of stratified sampling. What is the probability that the male smokes? Well, 19 males smoke out of 60 males, so 19/60 = 0.31666...
• What is the probability that a randomly selected smoker is male? This time, you're told that you have a smoker and asked to find the probability that the smoker is also male. There are 19 male smokers out of 31 total smokers, so 19/31 = 0.6129 (approx)
After that last part, you have just worked a Bayes' Theorem problem. I know you didn't realize it - that's the beauty of it. A Bayes' problem can be set up so it appears to be just another conditional probability. In this class we will treat Bayes' problems as another conditional probability and not involve the large messy formula given in the text (and every other text).
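Here is a minimal sketch of the five calculations above in Python, assuming the table is stored as a dictionary of counts. The names counts, joint, marginal_gender, and marginal_smokes are made up for this illustration and are not part of the original notes.

```python
from fractions import Fraction

# Counts from the "Do you smoke?" table in Example 1.
counts = {
    ("Male", "Yes"): 19, ("Male", "No"): 41,
    ("Female", "Yes"): 12, ("Female", "No"): 28,
}
total = sum(counts.values())  # 100 people

def joint(gender, smokes):
    """P(gender and smokes): one cell divided by the grand total."""
    return Fraction(counts[(gender, smokes)], total)

def marginal_gender(gender):
    """P(gender): the row total divided by the grand total."""
    return sum(joint(gender, s) for s in ("Yes", "No"))

def marginal_smokes(smokes):
    """P(smokes): the column total divided by the grand total."""
    return sum(joint(g, smokes) for g in ("Male", "Female"))

# Conditional probability = joint probability / marginal probability.
p_smoker_given_male = joint("Male", "Yes") / marginal_gender("Male")
p_male_given_smoker = joint("Male", "Yes") / marginal_smokes("Yes")

print(p_smoker_given_male)  # 19/60
print(p_male_given_smoker)  # 19/31
```

Using Fraction keeps the answers exact, so nothing is rounded until you choose to convert to a decimal.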
Example 2:
There are three major manufacturing companies that make a product: Aberations, Brochmailians, and Chompieliens. Aberations has a 50% market share, and Brochmailians has a 30% market share. 5% of Aberations' product is defective, 7% of Brochmailians' product is defective, and 10% of Chompieliens' product is defective.
This information can be placed into a joint probability distribution
Company          Good                   Defective              Total
Aberations       0.50-0.025 = 0.475     0.05(0.50) = 0.025     0.50
Brochmailians    0.30-0.021 = 0.279     0.07(0.30) = 0.021     0.30
Chompieliens     0.20-0.020 = 0.180     0.10(0.20) = 0.020     0.20
Total            0.934                  0.066                  1.00
The percent of the market share for Chompieliens wasn't given, but since the marginals must add to be 1.00, they have a 20% market share.
Notice that the 5%, 7%, and 10% defective rates don't go into the table directly. This is because they are conditional probabilities and the table is a joint probability table. These defective probabilities are conditional upon which company was given. That is, the 7% is not P(Defective), but P(Defective|Brochmailians). The joint probability P(Defective and Brochmailians) = P(Defective|Brochmailians) * P(Brochmailians).
The "good" probabilities can be found by subtraction as shown above, or by multiplication using conditional probabilities. If 7% of Brochmailians' product is defective, then 93% is good. 0.93(0.30)=0.279.
• What is the probability a randomly selected product is defective? P(Defective) = 0.066
• What is the probability that a defective product came from Brochmailians? P(Brochmailians|Defective) = P(Brochmailians and Defective) / P(Defective) = 0.021/0.066 = 7/22 = 0.318 (approx).
• Are these events independent? No. If they were, then P(Brochmailians|Defective)=0.318 would have to equal P(Brochmailians)=0.30, but it doesn't. Also, P(Aberations and Defective)=0.025 would have to equal P(Aberations)*P(Defective) = 0.50*0.066=0.033, and it doesn't.
The second question asked above is a Bayes' problem. Again, my point is, you don't have to know Bayes' formula just to work a Bayes' problem.
Bayes' Theorem
However, just for the sake of argument, let's say that you want to know what Bayes' formula is.
Let's use the same example, but shorten each event to its one-letter initial, i.e., A, B, C, and D instead of Aberations, Brochmailians, Chompieliens, and Defective.
P(D|B) is not a Bayes problem. This is given in the problem. Bayes' formula finds the reverse conditional probability P(B|D).
It is based on the fact that the given event (D) is made up of three parts: the part of D in A, the part of D in B, and the part of D in C.
P(B|D) = P(B and D) / [ P(A and D) + P(B and D) + P(C and D) ]
Inserting the multiplication rule for each of these joint probabilities gives
P(B|D) = P(D|B)*P(B) / [ P(D|A)*P(A) + P(D|B)*P(B) + P(D|C)*P(C) ]
However, and I hope you agree, it is much easier to take the joint probability divided by the marginal probability. The table does the adding for you and makes the problems doable without having to memorize the formulas.
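If you would rather let a computer do the adding, here is a small sketch of the same Bayes' calculation in Python. The dictionary names share, defect_rate, and joint_defective are my own labels for this illustration.

```python
# Market shares P(company) and defect rates P(Defective | company) from Example 2.
share = {"A": 0.50, "B": 0.30, "C": 0.20}
defect_rate = {"A": 0.05, "B": 0.07, "C": 0.10}

# Joint probabilities P(company and Defective) by the multiplication rule.
joint_defective = {c: defect_rate[c] * share[c] for c in share}

# Marginal P(Defective): the total of the "Defective" column.
p_defective = sum(joint_defective.values())

# Bayes' formula: the joint probability divided by the marginal probability.
p_b_given_d = joint_defective["B"] / p_defective

print(round(p_defective, 3))  # 0.066
print(round(p_b_given_d, 3))  # 0.318
```

The sum in p_defective is exactly the denominator of Bayes' formula, which is why the table approach and the formula give the same answer.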

Faraz_1984 · #12 Tuesday, August 05, 2008

Probability Distributions
________________________________________
Definitions
Random Variable
Variable whose values are determined by chance
Probability Distribution
The values a random variable can assume and the corresponding probabilities of each.
Expected Value
The theoretical mean of the variable.
Binomial Experiment
An experiment with a fixed number of independent trials. Each trial can only have two outcomes, or outcomes which can be reduced to two outcomes. The probability of each outcome must remain constant from trial to trial.
Binomial Distribution
The outcomes of a binomial experiment with their corresponding probabilities.
Multinomial Distribution
A probability distribution resulting from an experiment with a fixed number of independent trials. Each trial has two or more mutually exclusive outcomes. The probability of each outcome must remain constant from trial to trial.
Poisson Distribution
A probability distribution used when a density of items is distributed over a period of time. The sample size needs to be large and the probability of success to be small.
Hypergeometric Distribution
A probability distribution of a variable with two outcomes when sampling is done without replacement.

Stats: Probability Distributions
________________________________________
Probability Functions
A probability function is a function which assigns probabilities to the values of a random variable.
• All the probabilities must be between 0 and 1 inclusive
• The sum of the probabilities of the outcomes must be 1.
If these two conditions aren't met, then the function isn't a probability function. There is no requirement that the values of the random variable only be between 0 and 1, only that the probabilities be between 0 and 1.
Probability Distributions
A listing of all the values the random variable can assume with their corresponding probabilities make a probability distribution.
A note about random variables. A random variable does not mean that the values can be anything (a random number). Random variables have a well defined set of outcomes and well defined probabilities for the occurrence of each outcome. The random refers to the fact that the outcomes happen by chance -- that is, you don't know which outcome will occur next.
Here's an example probability distribution that results from the rolling of a single fair die.
x 1 2 3 4 5 6 sum
p(x) 1/6 1/6 1/6 1/6 1/6 1/6 6/6=1
Mean, Variance, and Standard Deviation
Consider the following.
The definitions for population mean and variance used with an ungrouped frequency distribution were:
mu = sum( x * f ) / N
sigma^2 = sum( (x - mu)^2 * f ) / N
Some of you might be confused by dividing by N only. Recall that this is the population variance. The sample variance, which is the unbiased estimator of the population variance, divides by n-1 instead.
Using algebra, this is equivalent to:
mu = sum( x * (f/N) )
sigma^2 = sum( x^2 * (f/N) ) - [ sum( x * (f/N) ) ]^2
Recall that a probability is a long term relative frequency. So every f/N can be replaced by p(x). This simplifies to be:
mu = sum( x * p(x) )
sigma^2 = sum( x^2 * p(x) ) - [ sum( x * p(x) ) ]^2
What's even better is that the last portion of the variance is the mean squared. So, the two formulas that we will be using are:
mu = sum( x * p(x) )
sigma^2 = sum( x^2 * p(x) ) - mu^2

Here's the example we were working on earlier.
x 1 2 3 4 5 6 sum
p(x) 1/6 1/6 1/6 1/6 1/6 1/6 6/6 = 1
x p(x) 1/6 2/6 3/6 4/6 5/6 6/6 21/6 = 3.5
x^2 p(x) 1/6 4/6 9/6 16/6 25/6 36/6 91/6 = 15.1667
The mean is 7/2 or 3.5
The variance is 91/6 - (7/2)^2 = 35/12 = 2.916666...
The standard deviation is the square root of the variance = 1.7078
Do not use rounded off values in the intermediate calculations. Only round off the final answer.
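The die example can be checked with a few lines of Python. This is only a sketch: the name dist is my own, and exact fractions are used so that no rounding happens until the final answer.

```python
from fractions import Fraction
from math import sqrt

# Probability distribution for a single fair die: p(x) = 1/6 for x = 1..6.
dist = {x: Fraction(1, 6) for x in range(1, 7)}

# A probability function must have probabilities that sum to 1.
assert sum(dist.values()) == 1

mean = sum(x * p for x, p in dist.items())                    # sum of x * p(x)
variance = sum(x**2 * p for x, p in dist.items()) - mean**2   # sum of x^2 * p(x), minus mu^2
std_dev = sqrt(variance)                                      # converted to a decimal only here

print(mean, variance, round(std_dev, 4))  # 7/2 35/12 1.7078
```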

Faraz_1984 · #13 Tuesday, August 05, 2008

Binomial Probabilities
________________________________________

Binomial Experiment
A binomial experiment is an experiment which satisfies these four conditions
• A fixed number of trials
• Each trial is independent of the others
• There are only two outcomes
• The probability of each outcome remains constant from trial to trial.
These can be summarized as: An experiment with a fixed number of independent trials, each of which can only have two possible outcomes.
The fact that each trial is independent actually means that the probabilities remain constant.
Examples of binomial experiments
• Tossing a coin 20 times to see how many tails occur.
• Asking 200 people if they watch ABC news.
• Rolling a die to see if a 5 appears.
Examples which aren't binomial experiments
• Rolling a die until a 6 appears (not a fixed number of trials)
• Asking 20 people how old they are (not two outcomes)
• Drawing 5 cards from a deck for a poker hand (done without replacement, so not independent)
Binomial Probability Function
Example:
What is the probability of rolling exactly two sixes in 6 rolls of a die?
There are five things you need to do to work a binomial story problem.
1. Define Success first. Success must be for a single trial. Success = "Rolling a 6 on a single die"
2. Define the probability of success (p): p = 1/6
3. Find the probability of failure: q = 5/6
4. Define the number of trials: n = 6
5. Define the number of successes out of those trials: x = 2
Anytime a six appears, it is a success (denoted S) and anytime something else appears, it is a failure (denoted F). The ways you can get exactly 2 successes in 6 trials are given below. The probability of each is written to the right of the way it could occur. Because the trials are independent, the probability of the event (all six dice) is the product of each probability of each outcome (die)
1 FFFFSS 5/6 * 5/6 * 5/6 * 5/6 * 1/6 * 1/6 = (1/6)^2 * (5/6)^4
2 FFFSFS 5/6 * 5/6 * 5/6 * 1/6 * 5/6 * 1/6 = (1/6)^2 * (5/6)^4
3 FFFSSF 5/6 * 5/6 * 5/6 * 1/6 * 1/6 * 5/6 = (1/6)^2 * (5/6)^4
4 FFSFFS 5/6 * 5/6 * 1/6 * 5/6 * 5/6 * 1/6 = (1/6)^2 * (5/6)^4
5 FFSFSF 5/6 * 5/6 * 1/6 * 5/6 * 1/6 * 5/6 = (1/6)^2 * (5/6)^4
6 FFSSFF 5/6 * 5/6 * 1/6 * 1/6 * 5/6 * 5/6 = (1/6)^2 * (5/6)^4
7 FSFFFS 5/6 * 1/6 * 5/6 * 5/6 * 5/6 * 1/6 = (1/6)^2 * (5/6)^4
8 FSFFSF 5/6 * 1/6 * 5/6 * 5/6 * 1/6 * 5/6 = (1/6)^2 * (5/6)^4
9 FSFSFF 5/6 * 1/6 * 5/6 * 1/6 * 5/6 * 5/6 = (1/6)^2 * (5/6)^4
10 FSSFFF 5/6 * 1/6 * 1/6 * 5/6 * 5/6 * 5/6 = (1/6)^2 * (5/6)^4
11 SFFFFS 1/6 * 5/6 * 5/6 * 5/6 * 5/6 * 1/6 = (1/6)^2 * (5/6)^4
12 SFFFSF 1/6 * 5/6 * 5/6 * 5/6 * 1/6 * 5/6 = (1/6)^2 * (5/6)^4
13 SFFSFF 1/6 * 5/6 * 5/6 * 1/6 * 5/6 * 5/6 = (1/6)^2 * (5/6)^4
14 SFSFFF 1/6 * 5/6 * 1/6 * 5/6 * 5/6 * 5/6 = (1/6)^2 * (5/6)^4
15 SSFFFF 1/6 * 1/6 * 5/6 * 5/6 * 5/6 * 5/6 = (1/6)^2 * (5/6)^4
Notice that each of the 15 probabilities is exactly the same: (1/6)^2 * (5/6)^4.
Also, note that the 1/6 is the probability of success and you needed 2 successes. The 5/6 is the probability of failure, and if 2 of the 6 trials were successes, then 4 of the 6 must be failures. Note that 2 is the value of x and 4 is the value of n-x.
Further note that there are fifteen ways this can occur. This is the number of ways 2 successes can occur in 6 trials without repetition and with order not being important, or a combination of 6 things, 2 at a time.
The probability of getting exactly x success in n trials, with the probability of success on a single trial being p is:
P(X=x) = nCx * p^x * q^(n-x)
Example:
A coin is tossed 10 times. What is the probability that exactly 6 heads will occur?
1. Success = "A head is flipped on a single coin"
2. p = 0.5
3. q = 0.5
4. n = 10
5. x = 6
P(X=6) = 10C6 * 0.5^6 * 0.5^4 = 210 * 0.015625 * 0.0625 = 0.205078125
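Both binomial examples can be reproduced with a short function. This is a sketch using Python's built-in math.comb for nCx; the function name binomial_pmf is my own.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n-x), where q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# The number of ways to place 2 successes among 6 trials.
print(comb(6, 2))                          # 15

# Exactly two sixes in 6 rolls of a die.
print(round(binomial_pmf(2, 6, 1/6), 4))   # 0.2009

# Exactly 6 heads in 10 tosses of a coin.
print(binomial_pmf(6, 10, 0.5))            # 0.205078125
```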
Mean, Variance, and Standard Deviation
The mean, variance, and standard deviation of a binomial distribution are extremely easy to find.
mu = np
sigma^2 = npq
sigma = sqrt(npq)
Another way to remember the variance is mu*q (since the np is mu).
Example:
Find the mean, variance, and standard deviation for the number of sixes that appear when rolling 30 dice.
Success = "a six is rolled on a single die". p = 1/6, q = 5/6.
The mean is 30 * (1/6) = 5. The variance is 30 * (1/6) * (5/6) = 25/6. The standard deviation is the square root of the variance = 2.041241452 (approx)
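As a quick check of the 30 dice example, here is a sketch that computes all three values. The function name binomial_summary is hypothetical, and the variance is computed as mu*q.

```python
from fractions import Fraction
from math import sqrt

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of a binomial distribution."""
    q = 1 - p
    mean = n * p          # mu = np
    variance = mean * q   # sigma^2 = npq, which is the same as mu * q
    return mean, variance, sqrt(variance)

mean, variance, std_dev = binomial_summary(30, Fraction(1, 6))
print(mean, variance, std_dev)
```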

Faraz_1984 · #14 Tuesday, August 05, 2008

Stats: Normal Distribution
________________________________________
Definitions
Central Limit Theorem
Theorem which states that as the sample size increases, the sampling distribution of the sample means will become approximately normally distributed.
Correction for Continuity
A correction applied to convert a discrete distribution to a continuous distribution.
Finite Population Correction Factor
A correction applied to the standard error of the means when the sample size is more than 5% of the population size and the sampling is done without replacement.
Sampling Distribution of the Sample Means
Distribution obtained by using the means computed from random samples of a specific size.
Sampling Error
Difference which occurs between the sample statistic and the population parameter due to the fact that the sample isn't a perfect representation of the population.
Standard Error of the Mean
The standard deviation of the sampling distribution of the sample means. It is equal to the standard deviation of the population divided by the square root of the sample size.
Standard Normal Distribution
A normal distribution in which the mean is 0 and the standard deviation is 1. It is denoted by z.
Z-score
Also known as z-value. A standardized score in which the mean is zero and the standard deviation is 1. The Z score is used to represent the standard normal distribution.

Stats - Normal Distributions
________________________________________
Any Normal Distribution
• Bell-shaped
• Symmetric about mean
• Continuous
• Never touches the x-axis
• Total area under curve is 1.00
• Approximately 68% lies within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations of the mean. This is the Empirical Rule mentioned earlier.
• Data values represented by x which has mean mu and standard deviation sigma.
• Probability Function given by f(x) = [ 1 / (sigma * sqrt(2*pi)) ] * e^( -(x - mu)^2 / (2*sigma^2) )
Standard Normal Distribution
Same as a normal distribution, but also ...
• Mean is zero
• Variance is one
• Standard Deviation is one
• Data values represented by z.
• Probability Function given by f(z) = [ 1 / sqrt(2*pi) ] * e^( -z^2 / 2 )
Normal Probabilities
There is a table which must be used to look up standard normal probabilities. The z-score is broken into two parts, the whole number and tenth are looked up along the left side and the hundredth is looked up across the top. The value in the intersection of the row and column is the area under the curve between zero and the z-score looked up.
Because of the symmetry of the normal distribution, look up the absolute value of any z-score.
Computing Normal Probabilities
There are several different situations that can arise when asked to find normal probabilities.
Situation: Instructions
• Between zero and any number: Look up the area in the table
• Between two positives, or between two negatives: Look up both areas in the table and subtract the smaller from the larger
• Between a negative and a positive: Look up both areas in the table and add them together
• Less than a negative, or greater than a positive: Look up the area in the table and subtract from 0.5000
• Greater than a negative, or less than a positive: Look up the area in the table and add to 0.5000
This can be shortened into two rules.
1. If there is only one z-score given, use 0.5000 for the second area, otherwise look up both z-scores in the table
2. If the two numbers are the same sign, then subtract; if they are different signs, then add. If there is only one z-score, then use the inequality to determine the second sign (< is negative, and > is positive).
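If you want to check your table lookups, the following sketch uses NormalDist from Python's standard statistics module (Python 3.8 or later). The helper names table_area and area_between are my own. Keep in mind that the table is rounded to four places, so answers built by adding two table values can differ from the computed value in the last digit.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def table_area(z):
    """Area between 0 and z, which is the value printed in the z-table."""
    return Z.cdf(abs(z)) - 0.5

def area_between(z1, z2):
    """Area under the curve between two z-scores, whatever their signs."""
    low, high = min(z1, z2), max(z1, z2)
    return Z.cdf(high) - Z.cdf(low)

print(round(table_area(1.96), 4))         # 0.475, matching the table entry for z = 1.96
print(round(area_between(-1.0, 2.0), 4))  # different signs: the two table areas are added
print(round(1 - Z.cdf(1.5), 4))           # greater than a positive: 0.5000 minus the table area
```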
Finding z-scores from probabilities
This is more difficult, and requires you to use the table inversely. You must look up the area between zero and the value on the inside part of the table, and then read the z-score from the outside. Finally, decide if the z-score should be positive or negative, based on whether it was on the left side or the right side of the mean. Remember, z-scores can be negative, but areas or probabilities cannot be.
Situation: Instructions
• Area between 0 and a value: Look up the area in the table. Make negative if on the left side.
• Area in one tail: Subtract the area from 0.5000. Look up the difference in the table. Make negative if in the left tail.
• Area including one complete half (less than a positive or greater than a negative): Subtract 0.5000 from the area. Look up the difference in the table. Make negative if on the left side.
• Within z units of the mean: Divide the area by 2. Look up the quotient in the table. Use both the positive and negative z-scores.
• Two tails with equal area (more than z units from the mean): Subtract the area from 1.000. Divide the area by 2. Look up the quotient in the table. Use both the positive and negative z-scores.
You become proficient with the table through practice, so work lots of the normal probability problems!
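Reading the table inversely can also be checked by computer. This sketch assumes Python 3.8 or later and uses NormalDist.inv_cdf from the standard library. The three helper names are hypothetical and follow the situations listed above.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def z_from_table_area(area, left_side=False):
    """z-score whose area between 0 and z equals `area`."""
    z = Z.inv_cdf(0.5 + area)
    return -z if left_side else z

def z_from_tail_area(area, left_tail=False):
    """z-score that cuts off `area` in one tail (subtract from 0.5000 first)."""
    return z_from_table_area(0.5 - area, left_side=left_tail)

def z_within(area):
    """Positive z such that `area` lies within z units of the mean (divide by 2 first)."""
    return z_from_table_area(area / 2)

print(round(z_from_table_area(0.4750), 2))  # 1.96
print(round(z_from_tail_area(0.05), 3))     # 1.645
print(round(z_within(0.99), 3))             # 2.576
```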

Standard Normal Probabilities
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
The values in the table are the areas between zero and the z-score. That is, P(0<Z<z-score)

Faraz_1984 · #15 Tuesday, August 05, 2008

Central Limit Theorem
________________________________________
Sampling Distribution of the Sample Means
Instead of working with individual scores, statisticians often work with means. What happens is that several samples are taken, the mean is computed for each sample, and then the means are used as the data, rather than individual scores being used. The result is a sampling distribution of the sample means.
When all of the possible sample means are computed, then the following properties are true:
• The mean of the sample means will be the mean of the population
• The variance of the sample means will be the variance of the population divided by the sample size.
• The standard deviation of the sample means (known as the standard error of the mean) will be smaller than the population standard deviation and will be equal to the standard deviation of the population divided by the square root of the sample size.
• If the population has a normal distribution, then the sample means will have a normal distribution.
• If the population is not normally distributed, but the sample size is sufficiently large, then the sample means will have an approximately normal distribution. Some books define sufficiently large as at least 30 and others as at least 31.
The formula for a z-score when working with the sample means is:
z = (x bar - mu) / (sigma / sqrt(n))
Finite Population Correction Factor
If the sample size is more than 5% of the population size and the sampling is done without replacement, then a correction needs to be made to the standard error of the means.
In the following, N is the population size and n is the sample size. The adjustment is to multiply the standard error by the square root of the quotient of the difference between the population and sample sizes and one less than the population size.
standard error = (sigma / sqrt(n)) * sqrt( (N - n) / (N - 1) )
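Here is a sketch of the standard error and the z-score for a sample mean, including the finite population correction. The function names and the numbers in the example (sigma = 15, n = 36, N = 400) are made up for illustration and do not come from these notes.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the mean: sigma / sqrt(n).

    If a population size N is given and the sample is more than 5% of it,
    the finite population correction sqrt((N - n) / (N - 1)) is applied
    (this assumes sampling without replacement).
    """
    se = sigma / sqrt(n)
    if N is not None and n > 0.05 * N:
        se *= sqrt((N - n) / (N - 1))
    return se

def z_for_sample_mean(x_bar, mu, sigma, n, N=None):
    """z-score for a sample mean: (x bar - mu) divided by the standard error."""
    return (x_bar - mu) / standard_error(sigma, n, N)

print(standard_error(15, 36))               # 2.5 (no correction)
print(standard_error(15, 36, N=400))        # smaller, since 36 is more than 5% of 400
print(z_for_sample_mean(105, 100, 15, 36))  # 2.0
```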

Faraz_1984 · #16 Tuesday, August 05, 2008

Normal Approximation to Binomial
________________________________________

Recall that according to the Central Limit Theorem, the sample mean of any distribution will become approximately normal if the sample size is sufficiently large.
It turns out that the binomial distribution can be approximated using the normal distribution if np and nq are both at least 5. Furthermore, recall that the mean of a binomial distribution is np and the variance of the binomial distribution is npq.
Continuity Correction Factor
There is a problem with approximating the binomial with the normal. That problem arises because the binomial distribution is a discrete distribution while the normal distribution is a continuous distribution. The basic difference here is that with discrete values, we are talking about heights but no widths, and with the continuous distribution we are talking about both heights and widths.
The correction is to either add or subtract 0.5 of a unit from each discrete x-value. This fills in the gaps to make it continuous. This is very similar to the expanding of limits to form boundaries that we did with grouped frequency distributions.
Examples
Discrete Continuous
x = 6 5.5 < x < 6.5
x > 6 x > 6.5
x >= 6 x > 5.5
x < 6 x < 5.5
x <= 6 x < 6.5
As you can see, whether or not the equal to is included makes a big difference in the discrete distribution and the way the conversion is performed. However, for a continuous distribution, equality makes no difference.
Steps to working a normal approximation to the binomial distribution
1. Identify success, the probability of success, the number of trials, and the desired number of successes. Since this is a binomial problem, these are the same things which were identified when working a binomial problem.
2. Convert the discrete x to a continuous x. Some people would argue that step 3 should be done before this step, but go ahead and convert the x before you forget about it and miss the problem.
3. Find the smaller of np or nq. If the smaller one is at least five, then the larger must also be, so the approximation will be considered good. When you find np, you're actually finding the mean, mu, so denote it as such.
4. Find the standard deviation, sigma = sqrt (npq). It might be easier to find the variance and just stick the square root in the final calculation - that way you don't have to work with all of the decimal places.
5. Compute the z-score using the standard formula for an individual score (not the one for a sample mean).
6. Calculate the probability desired.
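The six steps can be sketched in Python as follows. The example (at least 60 heads in 100 tosses of a fair coin) and the function names are my own, chosen only to illustrate the procedure. The exact binomial sum is included so you can see how close the approximation is.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx_at_least(x, n, p):
    """Approximate P(X >= x) for a binomial X using the normal distribution."""
    q = 1 - p
    boundary = x - 0.5               # step 2: discrete x >= 6 becomes continuous x > 5.5
    if min(n * p, n * q) < 5:        # step 3: check the smaller of np and nq
        raise ValueError("np and nq must both be at least 5")
    mu = n * p                       # step 3: np is the mean
    sigma = sqrt(n * p * q)          # step 4: standard deviation
    z = (boundary - mu) / sigma      # step 5: z-score for an individual score
    return 1 - NormalDist().cdf(z)   # step 6: area to the right of z

def exact_at_least(x, n, p):
    """Exact P(X >= x) from the binomial probability function."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

approx = normal_approx_at_least(60, 100, 0.5)   # z = (59.5 - 50) / 5 = 1.9
exact = exact_at_least(60, 100, 0.5)
print(round(approx, 4), round(exact, 4))
```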

Faraz_1984 · #17 Tuesday, August 05, 2008

Stats: Estimation
________________________________________
Definitions
Confidence Interval
An interval estimate with a specific level of confidence
Confidence Level
The percent of the time the true mean will lie in the interval estimate given.
Consistent Estimator
An estimator which gets closer to the value of the parameter as the sample size increases.
Degrees of Freedom
The number of data values which are allowed to vary once a statistic has been determined.
Estimator
A sample statistic which is used to estimate a population parameter. It must be unbiased, consistent, and relatively efficient.
Interval Estimate
A range of values used to estimate a parameter.
Maximum Error of the Estimate
The maximum difference between the point estimate and the actual parameter. The Maximum Error of the Estimate is 0.5 the width of the confidence interval for means and proportions.
Point Estimate
A single value used to estimate a parameter.
Relatively Efficient Estimator
The estimator for a parameter with the smallest variance.
T distribution
A distribution used when the population variance is unknown.
Unbiased Estimator
An estimator whose expected value is equal to the parameter being estimated.

Stats: Introduction to Estimation
________________________________________
One area of concern in inferential statistics is the estimation of the population parameter from the sample statistic. It is important to realize the order here. The sample statistic is calculated from the sample data and the population parameter is inferred (or estimated) from this sample statistic. Let me say that again: Statistics are calculated, parameters are estimated.
We talked about problems of obtaining the value of the parameter earlier in the course when we talked about sampling techniques.
Another area of inferential statistics is sample size determination. That is, how large of a sample should be taken to make an accurate estimation. In these cases, the statistics can't be used since the sample hasn't been taken yet.
Point Estimates
There are two types of estimates we will find: Point Estimates and Interval Estimates. The point estimate is the single best value.
A good estimator must satisfy three conditions:
• Unbiased: The expected value of the estimator must be equal to the parameter being estimated
• Consistent: The value of the estimator approaches the value of the parameter as the sample size increases
• Relatively Efficient: The estimator has the smallest variance of all estimators which could be used
Confidence Intervals
The point estimate is going to be different from the population parameter because of sampling error, and there is no way to know how close it is to the actual parameter. For this reason, statisticians like to give an interval estimate, which is a range of values used to estimate the parameter.
A confidence interval is an interval estimate with a specific level of confidence. A level of confidence is the probability that the interval estimate will contain the parameter. The level of confidence is 1 - alpha. 1-alpha area lies within the confidence interval.
Maximum Error of the Estimate
The maximum error of the estimate is denoted by E and is one-half the width of the confidence interval. The basic confidence interval for a symmetric distribution is set up so that the true population parameter lies between the point estimate minus E and the point estimate plus E:
point estimate - E < parameter < point estimate + E
This formula will work for means and proportions because they will use the Z or T distributions, which are symmetric. Later, we will talk about variances, which don't use a symmetric distribution, and the formula will be different.
Area in Tails
Since the level of confidence is 1-alpha, the amount in the tails is alpha. There is a notation in statistics which means the score which has the specified area in the right tail.
Examples:
• Z(0.05) = 1.645 (the Z-score which has 0.05 to the right, and 0.4500 between 0 and it)
• Z(0.10) = 1.282 (the Z-score which has 0.10 to the right, and 0.4000 between 0 and it).
As a shorthand notation, the () are usually dropped, and the probability is written as a subscript. The Greek letter alpha is used to represent the area in both tails for a confidence interval, and so alpha/2 will be the area in one tail.
Here are some common values
Confidence Level   Area between 0 and z-score   Area in one tail (alpha/2)   z-score
50%                0.2500                       0.2500                       0.674
80%                0.4000                       0.1000                       1.282
90%                0.4500                       0.0500                       1.645
95%                0.4750                       0.0250                       1.960
98%                0.4900                       0.0100                       2.326
99%                0.4950                       0.0050                       2.576
Notice in the above table, that the area between 0 and the z-score is simply one-half of the confidence level. So, if there is a confidence level which isn't given above, all you need to do to find it is divide the confidence level by two, and then look up the area in the inside part of the Z-table and look up the z-score on the outside.
Also notice - if you look at the Student's t table, the top row is a level of confidence, and the bottom row is the z-score. In fact, this is where I got the extra digit of accuracy from.

Stats: Estimating the Mean
________________________________________
You are estimating the population mean, mu, not the sample mean, x bar.
Population Standard Deviation Known
If the population standard deviation, sigma, is known, then the mean has a normal (Z) distribution.
E = Z(alpha/2) * sigma / sqrt(n)

The maximum error of the estimate is given by the formula for E shown. The Z here is the z-score obtained from the normal table, or the bottom of the t-table as explained in the introduction to estimation. The z-score is a factor of the level of confidence, so you may get in the habit of writing it next to the level of confidence.
Once you have computed E, I suggest you save it to the memory on your calculator. On the TI-82, a good choice would be the letter E. The reason for this is that the limits for the confidence interval are now found by subtracting and adding the maximum error of the estimate from/to the sample mean.
x bar - E < mu < x bar + E
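For those who would rather use a computer than a calculator, here is a sketch of the interval when sigma is known, with E = z * sigma / sqrt(n) and limits x bar - E and x bar + E. The function name z_interval and the sample numbers (x bar = 50, sigma = 10, n = 25) are invented for this illustration. The z-score comes from NormalDist.inv_cdf instead of the table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for mu when the population sigma is known."""
    alpha = 1 - confidence                    # total area in both tails
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z-score with alpha/2 in the right tail
    E = z * sigma / sqrt(n)                   # maximum error of the estimate
    return x_bar - E, x_bar + E

low, high = z_interval(50, 10, 25, 0.95)
print(round(low, 2), round(high, 2))  # 46.08 53.92
```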

Student's t Distribution
When the population standard deviation is unknown, the mean has a Student's t distribution. The Student's t distribution was created by William S. Gosset, who worked for the Guinness brewery in Ireland. The brewery wouldn't allow him to publish his work under his name, so he used the pseudonym "Student".
The Student's t distribution is very similar to the standard normal distribution.
• It is symmetric about its mean
• It has a mean of zero
• It has a standard deviation and variance greater than 1.
• There are actually many t distributions, one for each degree of freedom
• As the sample size increases, the t distribution approaches the normal distribution.
• It is bell shaped.
• The t-scores can be negative or positive, but the probabilities are always positive.
Degrees of Freedom
A degree of freedom occurs for every data value which is allowed to vary once a statistic has been fixed. For a single mean, there are n-1 degrees of freedom. This value will change depending on the statistic being used.
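A tiny Python sketch may make the idea concrete. Suppose the mean of n = 4 values has been fixed at 6: three of the values can be anything, but the fourth is then forced. The function name and the numbers are made up for illustration.

```python
def last_value(free_values, fixed_mean):
    """Value the final observation must take once the mean has been fixed."""
    n = len(free_values) + 1              # total number of data values
    return n * fixed_mean - sum(free_values)

free = [4, 7, 9]                          # n - 1 = 3 values chosen freely
forced = last_value(free, fixed_mean=6)
print(forced)                             # the fourth value has no freedom left
```

Here the fourth value must be 4, since 4 + 7 + 9 + 4 = 24 = 4 * 6. Only n - 1 = 3 values were free to vary.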
Population Standard Deviation Unknown
If the population standard deviation, sigma, is unknown, then the mean has a Student's t (t) distribution and the sample standard deviation, s, is used instead of the population standard deviation.
E = t * s / sqrt(n)
The maximum error of the estimate is given by the formula for E shown. The t here is the t-score obtained from the Student's t table. The t-score is a function of the level of confidence and the sample size (through the n-1 degrees of freedom).
Once you have computed E, I suggest you save it to the memory on your calculator. On the TI-82, a good choice would be the letter E. The reason for this is that the limits for the confidence interval are now found by subtracting the maximum error of the estimate from the sample mean and adding it to the sample mean.
x bar - E < mu < x bar + E
Notice the formula for the confidence interval is the same as for a population mean when the population standard deviation is known. The only thing that has changed is the formula for the maximum error of the estimate.
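As a sketch of the t-based version in Python, assuming the usual formula E = t * s / sqrt(n): the small dictionary below holds a few 95% critical values copied from the Student's t table in these notes, and the function name and sample figures (mean 100, s = 15, n = 25) are invented for illustration.

```python
from math import sqrt

# A few 95% (two-tail 0.050) critical values from the Student's t table: df -> t.
T_95 = {9: 2.262, 24: 2.064, 30: 2.042}

def mean_interval_sigma_unknown(x_bar, s, n, t_table=T_95):
    """Confidence interval for mu when only the sample standard deviation is known."""
    df = n - 1                 # degrees of freedom for a single mean
    t = t_table[df]            # raises KeyError if df is not in this short excerpt
    e = t * s / sqrt(n)        # maximum error of the estimate
    return x_bar - e, x_bar + e, e

# Hypothetical sample: mean 100, standard deviation 15, size 25 (so df = 24).
low, high, e = mean_interval_sigma_unknown(100.0, 15.0, 25)
print(round(e, 3), round(low, 3), round(high, 3))
```

With these numbers E = 2.064 * 15 / 5 = 6.192, giving an interval from 93.808 to 106.192. Only the way E is computed differs from the sigma-known case.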

Faraz_1984 · #18 Tuesday, August 05, 2008

Student's T Probabilities
________________________________________

Conf. Level 50% 80% 90% 95% 98% 99%
One Tail 0.250 0.100 0.050 0.025 0.010 0.005
Two Tail 0.500 0.200 0.100 0.050 0.020 0.010
df . . . . . .
1 1.000 3.078 6.314 12.706 31.821 63.657
2 0.816 1.886 2.920 4.303 6.965 9.925
3 0.765 1.638 2.353 3.182 4.541 5.841
4 0.741 1.533 2.132 2.776 3.747 4.604
5 0.727 1.476 2.015 2.571 3.365 4.032
6 0.718 1.440 1.943 2.447 3.143 3.707
7 0.711 1.415 1.895 2.365 2.998 3.499
8 0.706 1.397 1.860 2.306 2.896 3.355
9 0.703 1.383 1.833 2.262 2.821 3.250
10 0.700 1.372 1.812 2.228 2.764 3.169
11 0.697 1.363 1.796 2.201 2.718 3.106
12 0.695 1.356 1.782 2.179 2.681 3.055
13 0.694 1.350 1.771 2.160 2.650 3.012
14 0.692 1.345 1.761 2.145 2.624 2.977
15 0.691 1.341 1.753 2.131 2.602 2.947
16 0.690 1.337 1.746 2.120 2.583 2.921
17 0.689 1.333 1.740 2.110 2.567 2.898
18 0.688 1.330 1.734 2.101 2.552 2.878
19 0.688 1.328 1.729 2.093 2.539 2.861
20 0.687 1.325 1.725 2.086 2.528 2.845
21 0.686 1.323 1.721 2.080 2.518 2.831
22 0.686 1.321 1.717 2.074 2.508 2.819
23 0.685 1.319 1.714 2.069 2.500 2.807
24 0.685 1.318 1.711 2.064 2.492 2.797
25 0.684 1.316 1.708 2.060 2.485 2.787
26 0.684 1.315 1.706 2.056 2.479 2.779
27 0.684 1.314 1.703 2.052 2.473 2.771
28 0.683 1.313 1.701 2.048 2.467 2.763
29 0.683 1.311 1.699 2.045 2.462 2.756
30 0.683 1.310 1.697 2.042 2.457 2.750
40 0.681 1.303 1.684 2.021 2.423 2.704
50 0.679 1.299 1.676 2.009 2.403 2.678
60 0.679 1.296 1.671 2.000 2.390 2.660
70 0.678 1.294 1.667 1.994 2.381 2.648
80 0.678 1.292 1.664 1.990 2.374 2.639
90 0.677 1.291 1.662 1.987 2.368 2.632
100 0.677 1.290 1.660 1.984 2.364 2.626
z 0.674 1.282 1.645 1.960 2.326 2.576
The values in the table are the critical values for the given areas in the right tail or in both tails.

Faraz_1984 · #19 Tuesday, August 05, 2008

Estimating the Proportion
________________________________________
You are estimating the population proportion, p.
All estimation done here is based on the fact that the normal distribution can be used to approximate the binomial distribution when np and nq are both at least 5. Thus, the p that we're talking about is the probability of success on a single trial of the binomial experiment.
Recall:
z = (x - np) / sqrt(npq)
The best point estimate for p is p hat, the sample proportion:
p hat = x / n
If the formula for z is divided by n in both the numerator and the denominator, then the formula for z becomes:
z = (p hat - p) / sqrt(pq / n)
Solving this for p to come up with a confidence interval gives the maximum error of the estimate as:
E = z * sqrt(pq / n)
This is not, however, the formula that we will use. The problem with estimation is that you don't know the value of the parameter (in this case p), so you can't use it to estimate itself - if you knew it, then there would be no problem to work out. So we will replace the parameter by the statistic in the formula for the maximum error of the estimate.

E = z * sqrt(p hat * q hat / n)
The maximum error of the estimate is given by the formula for E shown. The z here is the z-score obtained from the normal table, or the bottom of the t-table as explained in the introduction to estimation. The z-score is a function of the level of confidence, so you may get in the habit of writing it next to the level of confidence.
When you're computing E, I suggest that you find the sample proportion, p hat, and save it to P on the calculator. This way, you can find q hat as (1 - p hat). Do NOT round the value for p hat and use the rounded value in the calculations. This will lead to error. Once you have computed E, I suggest you save it to the memory on your calculator. On the TI-82, a good choice would be the letter E. The reason for this is that the limits for the confidence interval are now found by subtracting the maximum error of the estimate from the sample proportion and adding it to the sample proportion.
p hat - E < p < p hat + E
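The same steps can be sketched in Python, assuming the usual formula E = z * sqrt(p hat * q hat / n). The function name and the survey numbers (60 successes out of 200) are made up, and checking the "at least 5" rule with p hat in place of the unknown p is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence):
    """Confidence interval for p from x successes in n trials."""
    p_hat = x / n                # keep full precision; do not round here
    q_hat = 1 - p_hat
    # Normal approximation check, using p hat since p itself is unknown.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation needs np and nq of at least 5")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    e = z * sqrt(p_hat * q_hat / n)   # maximum error of the estimate
    return p_hat - e, p_hat + e, e

# Hypothetical survey: 60 successes out of 200, 95% confidence.
low, high, e = proportion_interval(60, 200, 0.95)
print(round(e, 4), round(low, 4), round(high, 4))
```

For this example p hat is 0.30 and E comes out near 0.0635, so the interval is roughly 0.2365 to 0.3635.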

Faraz_1984 · #20 Tuesday, August 05, 2008

Stats: Sample Size Determination
________________________________________
The sample size determination formulas come from the formulas for the maximum error of the estimates. The formula is solved for n. Be sure to round the answer obtained up to the next whole number, not off to the nearest whole number. If you round off, then you will exceed your maximum error of the estimate in some cases. By rounding up, you will have a smaller maximum error of the estimate than allowed, but this is better than having a larger one than desired.
Population Mean
Here is the formula for the sample size which is obtained by solving the maximum error of the estimate formula for the population mean for n.
n = (z * sigma / E)^2
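A minimal Python sketch of this calculation, assuming the formula n = (z * sigma / E)^2 and rounding up with ceil as described above. The function name and the example values (sigma = 15, E = 2) are invented for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, max_error, confidence):
    """Smallest n that keeps the maximum error of the estimate within max_error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n_exact = (z * sigma / max_error) ** 2
    return ceil(n_exact)       # always round UP, never to the nearest whole number

# Hypothetical requirement: sigma = 15, error of at most 2, 95% confidence.
print(sample_size_for_mean(15.0, 2.0, 0.95))
```

The exact value here is about 216.08. Rounding to the nearest whole number would give 216 and slightly exceed the allowed error, so the answer is 217.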

Population Proportion
Here is the formula for the sample size which is obtained by solving the maximum error of the estimate formula for the population proportion for n. Some texts use p hat and q hat, but since the sample hasn't been taken, there is no value for the sample proportion. p and q are taken from a previous study, if one is available. If there is no previous study or estimate available, then use 0.5 for p and q, as these are the values which will give the largest sample size. It is better to have too large a sample size and come under the maximum error of the estimate than to have too small a sample size and exceed it.
n = p * q * (z / E)^2
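Here is a short Python sketch, assuming the formula n = p * q * (z / E)^2 and defaulting to p = 0.5 when no previous estimate exists. The function name and the example values (E = 0.03, prior p = 0.3) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(max_error, confidence, p=0.5):
    """Smallest n for estimating a proportion; p = 0.5 is the no-prior-study default."""
    q = 1 - p
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n_exact = p * q * (z / max_error) ** 2
    return ceil(n_exact)       # round up so the error limit is not exceeded

# No previous study: use p = q = 0.5, error of at most 0.03, 95% confidence.
print(sample_size_for_proportion(0.03, 0.95))
# Previous study suggested p = 0.3 (hypothetical value).
print(sample_size_for_proportion(0.03, 0.95, p=0.3))
```

With p = 0.5 the required sample size is 1068, and with the prior estimate p = 0.3 it drops to 897, which shows why 0.5 is the safe choice when nothing is known about p.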
