PSYCH-105 Industrial Psychology
Chapter 16: Psychological Tests
Essentials of Good Psychological Tests
Test reliability and validity are two technical properties of a test that indicate the quality and usefulness of the test. These are the two most important features of a test. You should examine these features when evaluating the suitability of the test for your use.
Reliability is the extent to which a test is repeatable and yields consistent scores. A test is considered reliable if we get the same result repeatedly. The main characteristics of an objective or reliable test are as follows:
- A candidate receives the same marks no matter which examiner scores the papers.
- Candidates obtain much the same marks when they are retested with the same or a similar test.
- The purpose of the test is clearly defined so that another person working independently would arrive at the same conclusion with respect to the candidates.
A number of factors can cause a test to have low reliability.
- If the test is not administered under standardized conditions, the reliability will tend to be low.
- People vary from time to time in their emotional state, degree of attention, attitude, health, fatigue and so on.
Reliability is usually measured by means of the correlation coefficient.
Types of Reliability
1. Internal Consistency
Degree to which responses (e.g., items on a test) are correlated with one another. Internal consistency reliability indicates the extent to which items on a test measure the same thing. Test items that measure the same construct are compared to determine the test's internal consistency. The greater the number of similar items, the greater the internal consistency.
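A common index of internal consistency (not named above, but standard for this purpose) is Cronbach's alpha, which compares the item variances to the variance of the total scores. A minimal sketch, assuming numerically scored items and made-up data:

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
# Rows = test-takers, columns = items (illustrative scores, not real data).
scores = [
    [2, 2, 3],
    [4, 5, 5],
    [3, 4, 4],
    [5, 5, 6],
]

def pvar(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(rows):
    k = len(rows[0])                                  # number of items
    item_vars = [pvar([r[i] for r in rows]) for i in range(k)]
    total_var = pvar([sum(r) for r in rows])          # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

alpha = cronbach_alpha(scores)
print(round(alpha, 2))  # values near 1 indicate high internal consistency
```

Here the items rise and fall together across test-takers, so alpha comes out close to 1.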
2. Test-Retest Reliability
The test-retest method involves administering the test twice to the same group of people and correlating the two sets of scores. Test-retest reliability indicates the repeatability of test scores with the passage of time. The correlation ranges between 0 (low reliability) and 1 (high reliability).
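Correlating the two administrations can be sketched as follows, using the standard Pearson correlation coefficient and invented scores:

```python
# Test-retest reliability: correlate scores from two administrations of the
# same test to the same group of people (illustrative data).
first = [12, 15, 11, 18, 14]
second = [13, 16, 10, 19, 15]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson(first, second)
print(round(r, 2))  # close to 1, so scores are stable across the two sittings
```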
3. Parallel Form Reliability
The equivalent-forms method for determining reliability also uses a test-retest approach, but instead of using the same test a second time, a similar form of the test is administered. Alternate or parallel form reliability indicates how consistent test scores are likely to be if a person takes two or more forms of a test. The two forms should then be administered to the same subjects at the same time.
4. Inter-Rater Reliability
Degree of agreement in the ratings provided by two observers of the same behaviour. Inter-rater reliability indicates how consistent test scores are likely to be if the test is scored by two or more raters. On some tests, raters evaluate responses to questions and determine the score. Inter-rater reliability is useful because human observers will not necessarily interpret answers the same way; raters may disagree as to how well certain responses or material demonstrate knowledge of the construct or skill being assessed.
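One widely used index of inter-rater agreement (not named above) is Cohen's kappa, which corrects raw percent agreement for the agreement two raters would reach by chance. A sketch with hypothetical pass/fail decisions:

```python
# Cohen's kappa for two raters assigning categorical ratings
# (illustrative pass/fail decisions, not real data).
rater_a = ["pass", "pass", "fail", "pass", "fail",
           "fail", "pass", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail",
           "pass", "pass", "pass", "fail", "pass"]

def cohens_kappa(r1, r2):
    n = len(r1)
    # Observed agreement: proportion of cases where the raters match.
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: product of each rater's marginal proportions.
    expected = sum((r1.count(c) / n) * (r2.count(c) / n)
                   for c in set(r1) | set(r2))
    return (observed - expected) / (1 - expected)

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))  # 1 = perfect agreement, 0 = chance-level agreement
```

Raw agreement here is 8/10, but kappa is noticeably lower because a good share of that agreement would occur by chance.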
5. Split-Half Reliability
It involves splitting the items of one test in half and comparing the scores of the two halves. These half-tests are selected so as to be as equivalent as possible, although often the test is simply divided into two halves by putting all the odd-numbered items into one half and all the even-numbered items into the other. This is called the odd-even version of the split-half technique. Since this method involves only one test administration, there is no problem with item recall or practice effects.
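The odd-even split can be sketched as below. One detail beyond the text above: the correlation between two half-tests understates the reliability of the full-length test, so it is customarily stepped up with the Spearman-Brown formula, 2r / (1 + r). Item scores are invented:

```python
# Split-half reliability via the odd-even split, stepped up with the
# Spearman-Brown formula (illustrative 0/1 item scores; rows = test-takers).
items = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

odd_half = [sum(row[0::2]) for row in items]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in items]  # items 2, 4, 6
r_half = pearson(odd_half, even_half)
r_full = 2 * r_half / (1 + r_half)             # Spearman-Brown step-up
print(round(r_full, 2))
```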
In an employment situation, a valid test is one that accurately predicts the criterion of job success. Validity is the extent to which a test measures what it is supposed to measure: it refers to what characteristic the test measures and how well the test measures that characteristic. Validity is a subjective judgement made on the basis of experience and empirical indicators. The validity of a test is expressed as a coefficient of correlation, in which the test score is correlated with some performance criterion. Validity gives meaning to the test scores. Validity evidence indicates that there is a linkage between test performance and job performance; it can tell you what you may conclude or predict about someone from his or her score on the test. If a test has been demonstrated to be a valid predictor of performance on a specific job, you can conclude that persons scoring high on the test are more likely to perform well on the job than persons who score low, all else being equal. In other words, validity indicates the usefulness of the test. A number of statistical procedures are available for determining the correlation.
Types of Validity
1. Predictive Validity
It is also called follow-up validity. Predictive validity is a measure of how well a test predicts abilities: it involves testing a group of subjects on a certain construct and then comparing them with results obtained at some point in the future. It is the most important type of validity for personnel selection. It measures the extent to which a future level of a variable can be predicted from a current measurement. In practice, it involves using a selection test during the selection process and then, at a later date, identifying the successful candidates so that their test scores can be compared with subsequent job success.
2. Concurrent Validity
This is the degree to which a test corresponds to an external criterion that is known concurrently (i.e., occurring at the same time): if the new test is validated by comparison with a currently existing criterion, we have concurrent validity. It also indicates how well a test measures up to its claims; a test designed to measure depression, for instance, must measure that particular construct and not closely related constructs such as anxiety or stress. In practice, this involves determining the factors that are characteristic of successful employees and then using these factors as yardsticks.
3. Content Validity
An estimate of how well a measure represents every single element of a construct. A test is said to have reasonable content validity if it includes tasks/questions that are representative of possible job assignments. Content validity is a logical process in which connections between the test items and the job-related tasks are established. It is typically estimated by gathering a group of subject matter experts (SMEs) to review the test items.
4. Construct Validity
Construct validity is the most important kind of validity, and establishing it is a long and complex process. A test has construct validity if it demonstrates an association between the test scores and the theoretical trait (construct) it is intended to measure.
5. Synthetic Validity
A basic assumption of synthetic validation is that different jobs involving the same kind of behaviour should also require the same knowledge, skills, abilities and other characteristics. Synthetic validity therefore assumes that if a test is valid for a particular job element, it will be valid for use with any job involving the same element.
6. Face Validity
A measure of how representative a research project is ‘at face value,’ and whether it appears to be a good project. This is the least sophisticated measure of validity. Face validity is simply whether the test appears (at face value) to measure what it claims to.
Standardization refers to the consistency or uniformity of the conditions and procedures for administering a psychological test. If we expect to compare the performance of several job applicants on the same test, then they must all take that test under identical circumstances. This means that every student or job applicant taking the test reads or listens to the same set of instructions, is allowed the same amount of time in which to respond and is situated in a similar physical environment.
A psychological test can be described as a standardized measure. Anastasi (1982) notes that standardization implies uniformity of procedure in administering the test. Consistency in the conditions and procedures for administering the test attempts to ensure that every test-taker is given the "same" test. With standardization, it is possible to compare the performance of a number of test-takers on the same test, since they have taken the test under near-identical circumstances. Every detail of the testing situation, including instructions, time limits, materials used and the test environment, needs to be kept consistent.
The concept of objectivity is related to that of standardization. While standardization refers to uniformity in the test procedure and administration, objectivity refers to consistency in test interpretation and scoring. Thus, an objective test is free from subjective judgment or bias. In order for a test to be considered objective, any person scoring the test should obtain the same results as another person scoring the same test, since the scorer has no subjective input (e.g., bias) into the interpretation or scoring of test results.