Characteristics of a good test
There are a number of characteristics which are desirable for all tests. Standardized tests, which are developed by experts in both subject matter and evaluation, are more likely to live up to these standards. These characteristics, which are discussed in more detail below, include:

a) Test objectivity
b) Discrimination
c) Comprehensiveness
d) Validity
e) Reliability
f) Specification of the conditions of administration
g) Directions for scoring and interpretation

Test Objectivity

Test objectivity means that an individual’s score is the same, or essentially the same, regardless of who is doing the scoring. A test is objective when instructor opinion, bias, or individual judgment is not a major factor in scoring. Tests may be scored by more than one person, at the same time or at different times, and we ask to what extent the scores awarded by these different scorers agree. If the possible difference between people scoring the same test is high, that test is low in objectivity. Though individuals may naturally differ in the way they perceive information, we assume that the more objective a test is, the closer it comes to the high-quality evaluation we envision in education. This does not mean that tests without a high degree of objectivity (such as subjective tests) lack quality. Even in subjective tests, which are designed to measure information that can be looked at from different angles, a certain level of objectivity is necessary. The individual designing the test must have something in mind that constitutes good performance on it. There may be more than one acceptable answer, but criteria must be developed to make sure the scoring is fair enough to discriminate among students on that basis. Objectivity is a relative term.

Discrimination

The test should be constructed in such a manner that it will detect or measure small differences in achievement or attainment. This is essential if the test is to be used for ranking students on the basis of individual achievement or for assigning grades. It is not an important consideration if the test is used to measure the level of the entire class or as an instructional quiz where the primary purpose is instruction rather than measurement. As is true of validity, reliability, and objectivity, the discriminating power of a test is increased by concentrating on and improving each individual test item. After the test has been administered, an item analysis can be made that will show the relative difficulty of each item and the extent to which each discriminates between good and poor students. Often, as in obtaining reliability, it is necessary to increase the length of the test to get clear-cut discrimination. A discriminating test: (1) produces a wide range of scores when administered to students whose achievement differs significantly; (2) includes items at all levels of difficulty. Some items will be answered correctly only by the best students; others will be relatively easy and will be answered correctly by most students. If all students answer an item correctly, it lacks discrimination.
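
The item analysis mentioned above can be sketched quite simply. The following snippet is only an illustration under assumed data: it scores a made-up 0/1 response matrix, takes upper and lower groups by total score, and reports each item's difficulty (proportion correct) and a discrimination index (upper-group minus lower-group proportion correct). The data, the 27% group split, and the function name are hypothetical, not taken from the text.

```python
# Illustrative item analysis on made-up data: rows are students, columns
# are items scored 0/1. The 27% upper/lower split is a common convention,
# assumed here rather than taken from the text.

def item_analysis(responses, group_fraction=0.27):
    """Return (item, difficulty, discrimination) for each item."""
    n_students = len(responses)
    totals = [sum(row) for row in responses]

    # Rank students by total score and take the lower and upper groups.
    order = sorted(range(n_students), key=lambda i: totals[i])
    k = max(1, round(group_fraction * n_students))
    lower, upper = order[:k], order[-k:]

    results = []
    for j in range(len(responses[0])):
        difficulty = sum(row[j] for row in responses) / n_students
        p_upper = sum(responses[i][j] for i in upper) / len(upper)
        p_lower = sum(responses[i][j] for i in lower) / len(lower)
        results.append((j, round(difficulty, 2), round(p_upper - p_lower, 2)))
    return results

# Toy data: item 0 is answered correctly by everyone, so its discrimination
# index is zero, as the text notes.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
for item, difficulty, discrimination in item_analysis(scores):
    print(f"item {item}: difficulty={difficulty}, discrimination={discrimination}")
```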

Comprehensiveness

For a test to be comprehensive, it should sample major lesson objectives. It is neither necessary nor practical to test every objective that is taught in a course, but a sufficient number of objectives should be included to provide a valid measure of student achievement in the complete course.

Validity

The most important characteristic of a good examination is validity; that is, the extent to which a test measures what it is intended to measure. It is vital for a test to be valid in order for the results to be accurately applied and interpreted. Validity isn’t determined by a single statistic, but by a body of research that demonstrates the relationship between the test and the behaviour it is intended to measure. There are three types of validity:

a) Content validity. When a test has content validity, the items on the test represent the entire range of possible items the test should cover. Individual test questions may be drawn from a large pool of items that cover a broad range of topics.
In some instances where a test measures a trait that is difficult to define, an expert judge may rate each item’s relevance. Because each judge is basing their rating on opinion, two independent judges rate the test separately. Items that are rated as strongly relevant by both judges will be included in the final test.

b) Criterion-related Validity. A test is said to have criterion-related validity when it is demonstrated to be effective in predicting a criterion or indicators of a construct. There are two different types of criterion validity:

• Concurrent Validity occurs when the criterion measures are obtained at the same time as the test scores. This indicates the extent to which the test scores accurately estimate an individual’s current state with regard to the criterion. For example, a test that measures levels of depression would be said to have concurrent validity if it measured the current levels of depression experienced by the test taker.

• Predictive Validity occurs when the criterion measures are obtained at a time after the test. Examples of tests with predictive validity are career or aptitude tests, which are helpful in determining who is likely to succeed or fail in certain subjects or occupations.

c) Construct Validity. A test has construct validity if it demonstrates an association between the test scores and the theoretical trait it is intended to measure. Intelligence tests are one example of measurement instruments that should have construct validity. The instructor can help ensure that his or her test items are valid by following accepted test construction procedures, which include:

(1) Use of the lesson objectives as a basis for the test requirements. An examination so constructed will tend to measure what has been taught.

(2) Review of the test items and the completed examination by other instructors.

(3) Selection of the most appropriate form of test and type of test item. Thus, if the instructor desires to measure “ability to do,” he must select the form of test that will require the student to demonstrate his “ability to do.” If another less desirable form is used, it must be recognized that the validity of the measurement has been reduced.

(4) Presentation of test requirements in a clear and unambiguous manner. If the test material cannot be interpreted accurately by the student, he or she will not realize what is being covered; hence, he or she will be unable to respond as anticipated. Such a test cannot be valid.

(5) Elimination, so far as is possible, of those factors that are not related to the measurement of the teaching points. A test that is not within the capabilities of the students as to time or educational level may fail to measure their actual learning in the course.

Reliability

Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly. For example, if a test is designed to measure a trait (such as introversion), then each time the test is administered to a subject, the results should be approximately the same. Unfortunately, it is impossible to calculate reliability exactly, but there are several different ways to estimate it.

a) Test-Retest Reliability. To gauge test-retest reliability, the test is administered twice at two different points in time. This kind of reliability is used to assess the consistency of a test across time. It assumes that there will be no change in the quality or construct being measured. Test-retest reliability is best used for things that are stable over time, such as intelligence. Generally, reliability will be higher when little time has passed between tests.
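
As a minimal sketch of how test-retest reliability might be estimated, assume the same students take the test twice and the two sets of scores are correlated with Pearson's r; the scores below are made-up, and the standard-library function requires Python 3.10 or later.

```python
import statistics

# Illustrative test-retest check: the same (made-up) students sit the same
# test twice, and the two score lists are correlated with Pearson's r.
# statistics.correlation requires Python 3.10+.

first_sitting = [72, 85, 60, 90, 78]    # scores at time 1
second_sitting = [70, 88, 62, 87, 80]   # same students, some weeks later

r = statistics.correlation(first_sitting, second_sitting)
print(f"test-retest reliability r = {r:.2f}")
```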

b) Inter-rater Reliability. This type of reliability is assessed by having two or more independent judges score the test. The scores are then compared to determine the consistency of the raters’ estimates. One way to test inter-rater reliability is to have each rater assign each test item a score. For example, each rater might score items on a scale from 1 to 10. Next, you would calculate the correlation between the two sets of ratings to determine the level of inter-rater reliability. Another means of testing inter-rater reliability is to have raters determine which category each observation falls into and then calculate the percentage of agreement between the raters. So, if the raters agree 8 out of 10 times, the test has 80% inter-rater agreement.
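
Both approaches just described can be sketched as follows; the rating values, the 1–10 scale, and the pass/fail categories are illustrative assumptions rather than data from the text.

```python
import statistics

# Illustrative inter-rater checks with made-up data.

# (1) Two raters score the same ten items on a 1-10 scale; correlate them.
rater_a = [7, 8, 5, 9, 6, 7, 8, 4, 6, 7]
rater_b = [6, 8, 5, 9, 7, 7, 9, 4, 5, 7]
print(f"score correlation: {statistics.correlation(rater_a, rater_b):.2f}")

# (2) Two raters assign each observation to a category; percentage agreement.
# Here they agree on 8 of 10 observations, i.e. 80%, as in the text's example.
cats_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
cats_b = ["pass", "pass", "fail", "pass", "pass", "pass", "pass", "fail", "fail", "pass"]
agreement = sum(a == b for a, b in zip(cats_a, cats_b)) / len(cats_a)
print(f"percentage agreement: {agreement:.0%}")
```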

c) Parallel-Forms Reliability. Parallel-forms reliability is gauged by comparing two different tests that were created using the same content. This is accomplished by creating a large pool of test items that measure the same quality and then randomly dividing the items into two separate tests. The two tests should then be administered to the same subjects at the same time.
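
A sketch of that procedure under assumed data: a hypothetical item pool is split at random into two forms, the same students are scored on both, and the two totals are correlated (again using statistics.correlation from Python 3.10+).

```python
import random
import statistics

# Illustrative parallel-forms procedure on made-up data: split an item pool
# at random into two forms of equal length, score the same students on both,
# and correlate the two totals.

random.seed(0)

responses = [                 # rows = students, columns = items in the pool
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1, 1, 0],
]

item_ids = list(range(len(responses[0])))
random.shuffle(item_ids)
form_a, form_b = item_ids[:4], item_ids[4:]     # two forms of four items each

score_a = [sum(row[i] for i in form_a) for row in responses]
score_b = [sum(row[i] for i in form_b) for row in responses]
print(f"parallel-forms reliability r = {statistics.correlation(score_a, score_b):.2f}")
```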

d) Internal Consistency Reliability. This form of reliability is used to judge the consistency of results across items on the same test. Essentially, you are comparing test items that measure the same construct to determine the test’s internal consistency. When you see a question that seems very similar to another test question, it may indicate that the two questions are being used to gauge reliability. Because the two questions are similar and designed to measure the same thing, the test taker should answer both questions the same way, which would indicate that the test has internal consistency.
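
The text does not name a specific statistic, but one widely used measure of internal consistency is Cronbach's alpha; the sketch below computes it for a made-up 0/1 response matrix and should be read as an illustration, not as the method the text prescribes.

```python
import statistics

# Cronbach's alpha on made-up 0/1 data: alpha = k/(k-1) * (1 - sum of item
# variances / variance of total scores). Higher values indicate that the
# items vary together, i.e. greater internal consistency.

def cronbach_alpha(responses):
    """responses: one row of item scores per student."""
    k = len(responses[0])                       # number of items
    item_vars = [statistics.pvariance([row[j] for row in responses])
                 for j in range(k)]
    total_var = statistics.pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

scores = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0],
]
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")
```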

The following factors will influence the reliability of a test:

(1) Administration. It is essential that each student have the same time, equipment, instructions, assistance, and examination environment. Test directions should be strictly enforced.
(2) Scoring. Objectivity in scoring contributes to reliability. Every effort should be made to obtain uniformity of scoring standards and practices.
(3) Standards. The standards of performance that are established for one class should be consistent with those used in other classes. A change in grading policies not based upon facts, uniform standards, and experience factors gained from other classes will affect the reliability of test results.
(4) Instruction. The reliability of test results will be affected if the instruction presented to a class tends to overemphasize the teaching points included in the examination. This is often known as “teaching the test” and is undesirable. When the instructor gives students obvious clues as to the test requirements, he not only affects the reliability of the test, but he insults the intelligence of his class.
(5) Length. The more responses required of students, the more reliable will be the test or measuring device.


Specification of conditions of administration


A good test must specify the conditions under which it is to be conducted. The conditions must give all students a fair chance to show their performance, which will improve the reliability of the test. Standardized tests must come with a specification of the conditions under which students must perform. Administering the test under highly varied conditions will interfere with the results. In general, when administering a test, the following must be kept in mind:
• Physical conditions
  o Light, ventilation, quiet, etc.
• Psychological conditions
  o Avoid inducing test anxiety
  o Try to reduce test anxiety
  o Don’t give the test when other events will distract
• Suggestions
  o Don’t talk unnecessarily before the test
  o Minimize interruptions
  o Don’t give hints to individuals who ask about items
  o Discourage cheating
  o Give students equal time to take the test.

Direction for scoring and interpreting test results

A good test must come with directions on how to score and interpret its results. This is especially important for standardized tests, which are used by individuals other than those who developed the test in the first place. Such directions are also important for locally developed tests, since other individuals may become involved in scoring and interpreting the test due to unforeseen circumstances.
The following guidelines contribute to the development of clear directions for tests.
1. Provide clear descriptions of detailed procedures for administering tests in a standardized manner.
2. Provide information to test takers or test users on test question formats and procedures for answering test questions, including information on the use of any needed materials and equipment.
3. Establish and implement procedures to ensure the security of testing materials during all phases of test development, administration, scoring, and reporting.
4. Provide procedures, materials and guidelines for scoring the tests, and for monitoring the accuracy of the scoring process. If scoring the test is the responsibility of the test developer, provide adequate training for scorers.
5. Correct errors that affect the interpretation of the scores and communicate the corrected results promptly.
6. Develop and implement procedures for ensuring the confidentiality of scores.