CSS Forums - View Single Post

obaidkhan · #3 Tuesday, April 22, 2008

Dear All

This is the second post about statistics.

FREQUENCY POLYGON:
A frequency polygon is obtained by plotting the class frequencies against the mid-points of the classes, and connecting the points so obtained by straight line segments.
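
As a rough illustration of this definition, here is a small Python sketch. The class boundaries and frequencies are assumptions, reconstructed from the figures quoted later in this post (2+4 gives 6, 6+14 gives 20, 6 cars below 35.95 and 28 cars below 41.95). The original table may contain further classes that are not covered here.

```python
# Assumed class boundaries and frequencies (illustrative, see the note above).
boundaries = [29.95, 32.95, 35.95, 38.95, 41.95]
frequencies = [2, 4, 14, 8]

# Mid-point of each class = (lower boundary + upper boundary) / 2
midpoints = [round((lo + hi) / 2, 2) for lo, hi in zip(boundaries, boundaries[1:])]

# The frequency polygon is the list of (mid-point, frequency) points,
# joined in order by straight line segments.
polygon_points = list(zip(midpoints, frequencies))

for x, f in polygon_points:
    print(f"mid-point {x:5.2f}   frequency {f}")
```

Any plotting tool can then join these points with straight lines to draw the polygon.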
In our example of the EPA mileage ratings, the classes were:

And our frequency polygon came out to be:

Also, it was mentioned that, when the frequency polygon is smoothed, we obtain what may be called the FREQUENCY CURVE.
In our example:

In the above figure, the dotted line represents the frequency curve. Note that the frequency curve need not pass through every plotted point. Its purpose is simply to display the overall pattern of the distribution, so we draw it by the free-hand method. It should be realized that the frequency curve is actually a theoretical concept.
If the class interval of a histogram is made very small, and the number of classes is very large, the rectangles of the histogram will be narrow as shown below:

The smaller the class interval and the larger the number of classes, the narrower the rectangles will be. In this way, the histogram approaches a smooth curve as shown below:
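
The effect of shrinking the class interval can be tried out with a short Python sketch. The data here are synthetic (10,000 values drawn from a bell-shaped distribution), which is purely an assumption for illustration and is not the EPA data.

```python
import random

random.seed(0)  # fixed seed so that the run is repeatable
# Synthetic data (an assumption): 10,000 values from a bell-shaped distribution.
data = [random.gauss(37.0, 3.0) for _ in range(10_000)]

def histogram(values, width):
    """Count how many values fall in each class of the given width."""
    start = min(values)
    n_classes = int((max(values) - start) / width) + 1
    counts = [0] * n_classes
    for v in values:
        counts[int((v - start) / width)] += 1
    return counts

# Smaller class interval -> more classes -> narrower rectangles.
for width in (4.0, 1.0, 0.25):
    counts = histogram(data, width)
    print(f"class interval {width:4.2f}: {len(counts)} classes")
```

With many narrow classes, the tops of the rectangles trace out an outline that looks more and more like a smooth curve.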

In spite of the fact that the frequency curve is a theoretical concept, it is useful in analyzing real-world problems. The reason is that very close approximations to theoretical curves are often generated in the real world, so close that it is quite valid to use the properties of various types of mathematical curves to aid the analysis of the real-world problem at hand.

VARIOUS TYPES OF FREQUENCY CURVES:
• the symmetrical frequency curve
• the moderately skewed frequency curve
• the extremely skewed frequency curve
• the U-shaped frequency curve
Let us discuss them one by one.

First of all, the symmetrical frequency curve is of the following shape:

If we place a vertical mirror in the centre of this graph, the left-hand side will be the mirror image of the right-hand side.

Next, we consider the moderately skewed frequency curve. We have the positively skewed curve and the negatively skewed curve. The positively skewed curve is the one whose right tail is longer than its left tail, as shown below:

On the other hand, the negatively skewed frequency curve is the one for which the left tail is longer than the right tail.

Both of the curves that we have just considered are moderately skewed, one positively and the other negatively.
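
The direction of the longer tail can also be checked numerically. The sketch below uses the moment coefficient of skewness, a standard measure that is not part of this post, and three small made-up data sets. Both are assumptions for illustration only. A positive value indicates a longer right tail, and a negative value a longer left tail.

```python
def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation / (std dev)^3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Made-up data sets (assumptions for illustration only).
right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 6, 9]   # long right tail
left_tailed = [10 - v for v in right_tailed]     # mirror image: long left tail
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

print("right-tailed data positive:", skewness(right_tailed) > 0)
print("left-tailed data negative: ", skewness(left_tailed) < 0)
print("symmetric data near zero:  ", abs(skewness(symmetric)) < 1e-12)
```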
Sometimes, we have the extreme case when we obtain the EXTREMELY skewed frequency curve. An extremely negatively skewed curve is of the type shown below:

This is the case when the maximum frequency occurs at the end of the frequency table.
For example, if we think of the death rates of adult males of various age groups starting from age 20 and going up to age 79 years, we might obtain something like this:

This will result in a J-shaped distribution similar to the one shown above.
Similarly, the extremely positively skewed distribution is known as the REVERSE J-shaped distribution.

A relatively LESS frequently encountered frequency distribution is the U-shaped distribution.

If we consider the example of the death rates not only for the adult population but for the population of ALL the age groups, we will obtain the U-shaped distribution.
Out of all these curves, the MOST frequently encountered frequency distribution is the moderately skewed frequency distribution. There are thousands of natural and social phenomena which yield the moderately skewed frequency distribution. Suppose that we walk into a school and collect data of the weights, heights, marks, shoulder-lengths, finger-lengths or any other such variable pertaining to the children of any one class.
If we construct a frequency distribution of this data, and draw its histogram and its frequency curve, we will find that our data will generate a moderately skewed distribution.
Until now, we have discussed the various possible shapes of the frequency distribution of a continuous variable. Similar shapes are possible for the frequency distribution of a discrete variable.

Let us now consider another aspect of the frequency distribution, i.e. the CUMULATIVE FREQUENCY DISTRIBUTION.
As in the case of the frequency distribution of a discrete variable, if we start adding the frequencies of our frequency table column-wise, we obtain the column of cumulative frequencies.
In our example, we obtain the cumulative frequencies shown below:

In the above table, 2+4 gives 6, 6+14 gives 20, and so on.
The question arises: “What is the purpose of making this column?”
You will recall that, when we were discussing the frequency distribution of a discrete variable, any particular cumulative frequency was the number of observations counted from the very first value of X up to THAT particular value of X against which the cumulative frequency was recorded.
In the case of the distribution of a continuous variable, each of these cumulative frequencies represents the total frequency from the lower class boundary of the lowest class to the UPPER class boundary of THAT class whose cumulative frequency we are considering.
In the above table, the total number of cars showing mileage less than 35.95 miles per gallon is 6, the total number of cars showing mileage less than 41.95 miles per gallon is 28, etc.
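
The running totals can be reproduced with a few lines of Python. The frequencies 2, 4 and 14 follow from the sums quoted above, and the fourth frequency (8) follows from 28 minus 20. The upper class boundaries assume a class width of 3, which is consistent with 35.95 and 41.95, and any further classes of the original table are left out.

```python
from itertools import accumulate

# Assumed upper class boundaries and class frequencies (see the note above).
upper_boundaries = [32.95, 35.95, 38.95, 41.95]
frequencies = [2, 4, 14, 8]

# Each cumulative frequency is the sum of all frequencies up to that class.
cumulative = list(accumulate(frequencies))

for ub, cf in zip(upper_boundaries, cumulative):
    print(f"less than {ub}: {cf} cars")
```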

Such a cumulative frequency distribution is called a “less than” type of a cumulative frequency distribution. The graph of a cumulative frequency distribution is called a CUMULATIVE FREQUENCY POLYGON or OGIVE.
A “less than” type ogive is obtained by marking off the upper class boundaries of the various classes along the X-axis and the cumulative frequencies along the Y-axis, as shown below:

The cumulative frequencies are plotted on the graph paper against the upper class boundaries, and the points so obtained are joined by means of straight line segments.
Hence we obtain the cumulative frequency polygon shown below:

It should be noted that this graph is touching the X-axis on the left-hand side. This is achieved by ADDING a class having zero frequency at the beginning of our frequency distribution, as shown below:

Since the frequency of the first class is zero, the cumulative frequency of the first class will also be zero, and so, automatically, the cumulative frequency polygon will touch the X-axis on the left-hand side. If we want our cumulative frequency polygon to be closed on the right-hand side also, we can achieve this by connecting the last point on our graph paper with the X-axis by means of a vertical line, as shown below:
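
A minimal Python sketch of these two steps is given here, listing the points of the ogive rather than drawing it. The boundaries and frequencies are assumed values consistent with the cumulative figures quoted in this post (class width 3, frequencies 2, 4, 14, 8). The original table may have more classes.

```python
from itertools import accumulate

# Assumed class boundaries and frequencies (illustrative only).
boundaries = [29.95, 32.95, 35.95, 38.95, 41.95]
frequencies = [2, 4, 14, 8]

# The added zero-frequency class ends at the lowest class boundary (29.95),
# so the ogive starts on the X-axis at the point (29.95, 0).
ogive_x = boundaries
ogive_y = [0] + list(accumulate(frequencies))
ogive_points = list(zip(ogive_x, ogive_y))

# Closing the polygon on the right: drop a vertical line from the
# last point down to the X-axis.
closed_polygon = ogive_points + [(boundaries[-1], 0)]

for x, y in closed_polygon:
    print(f"({x}, {y})")
```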

In the example of EPA mileage ratings, all the data-values were correct to one decimal place.
Let us now consider another example:

Source: “Pizza,” Copyright 1997 by Consumers Union of United States, Inc., Yonkers, N.Y. 10703.

In order to construct the frequency distribution of the above data, the first thing to note is that, in this example, all our data values are correct to two decimal places. As such, we should construct the class limits correct to TWO decimal places, and the class boundaries correct to three decimal places.
As in the last example, first of all, let us find the maximum and the minimum values in our data, and compute the RANGE.
Minimum value X0 = 0.52
Maximum value Xm = 1.90
Hence:
Range = 1.90 - 0.52
= 1.38

Lower limit of the first class = 0.51
Hence, our successive class limits come out to be:

Stretching the class limits to the left and to the right, we obtain class boundaries as shown below:
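
A minimal Python sketch of the range, class limits and class boundaries follows. The post gives only the lower limit of the first class (0.51), so the class width of 0.20 used here is an assumption, and a different width would give different classes. Decimal arithmetic is used so that values such as 0.505 come out exactly.

```python
from decimal import Decimal

x_min, x_max = Decimal("0.52"), Decimal("1.90")
data_range = x_max - x_min            # 1.90 - 0.52 = 1.38

first_lower = Decimal("0.51")         # given in the post
width = Decimal("0.20")               # ASSUMED class width (not stated in the post)
unit = Decimal("0.01")                # data are correct to two decimal places
half_unit = unit / 2                  # 0.005, used to stretch limits into boundaries

# Successive class limits, correct to two decimal places.
limits = []
lower = first_lower
while lower <= x_max:
    limits.append((lower, lower + width - unit))
    lower += width

# Class boundaries, correct to three decimal places.
boundaries = [(lo - half_unit, hi + half_unit) for lo, hi in limits]

print("Range =", data_range)
for (lo, hi), (b_lo, b_hi) in zip(limits, boundaries):
    print(f"limits {lo} - {hi}    boundaries {b_lo} - {b_hi}")
```

Under this assumed width, the upper boundary of each class coincides with the lower boundary of the next, so no value can fall between two classes.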

By tallying the data-values in the appropriate classes, we will obtain a frequency distribution similar to the one that we obtained in the example of the EPA mileage ratings.
By constructing the histogram of this data-set, we will be able to decide whether our distribution is symmetric, positively skewed or negatively skewed. Please attempt this as an exercise.

Keep Praying..............

Regards.

Obaid Khan