Psychology: Tests and Measurements

April 26, 2017

Many fields of psychology use tests and measurement devices. The best-known psychological tool is intelligence testing. Since the early 1900s psychologists have been measuring intelligence—or, more accurately, the ability to succeed in schoolwork. Such tests have proved useful in classifying students, assigning people to training programs, and predicting success in many kinds of schooling. Special tests have been developed to predict success in different occupations and to assess how much knowledge people have about different kinds of specialties. In addition, psychologists have constructed tests for measuring aspects of personality, interests, and attitudes. Thousands of tests have been devised for measuring different human traits.

A key problem in test construction, however, is the development of a criterion, that is, some standard to which the test is to be related. For intelligence tests, for example, the usual criterion has been success in school, but intelligence tests have frequently been attacked on the basis of cultural bias (that is, the test results may reflect a child’s background as much as they do learning ability). For vocational-interest tests, the standard generally has been persistence in an occupation. One general difficulty with personality tests is the lack of agreement among psychologists as to what standards should be used. Many criteria have been proposed, but most are only indirectly related to the aspect of personality that is being measured.

Very sophisticated statistical models have been developed for tests, and a detailed technology underlies most successful testing. Many psychologists have become adept at constructing testing devices for special purposes and at devising measurements, once agreement is reached as to what should be measured.

Types of Tests

Currently, a wide range of testing procedures is used in the U.S. and elsewhere. Each type of procedure is designed to carry out specific functions.

Achievement Tests. These tests are designed to assess current performance in an academic area. Because achievement is viewed as an indicator of previous learning, it is often used to predict future academic success. An achievement test administered in a public school setting would typically include separate measures of vocabulary, language skills and reading comprehension, arithmetic computation and problem solving, science, and social studies. Individual achievement is determined by comparison of results with average scores derived from large representative national or local samples. Scores may be expressed in terms of “grade-level equivalents”; for example, an advanced third-grade pupil may be reading on a level equivalent to that of the average fourth-grade student.

Aptitude Tests. These tests predict future performance in an area in which the individual is not currently trained. Schools, businesses, and government agencies often use aptitude tests when assigning individuals to specific positions. Vocational guidance counseling may involve aptitude testing to help clarify individual career goals. If a person’s score is similar to scores of others already working in a given occupation, likelihood of success in that field is predicted. Some aptitude tests cover a broad range of skills pertinent to many different occupations. The General Aptitude Test Battery, for example, not only measures general reasoning ability but also includes form perception, clerical perception, motor coordination, and finger and manual dexterity. Other tests may focus on a single area, such as art, engineering, or modern languages.

Intelligence Tests. In contrast to tests of specific proficiencies or aptitudes, intelligence tests measure the global capacity of an individual to cope with the environment. Test scores are generally known as intelligence quotients, or IQs, although the various tests are constructed quite differently. The Stanford-Binet is heavily weighted with items involving verbal abilities; the Wechsler scales consist of two separate verbal and performance subscales, each with its own IQ. There are also specialized infant intelligence tests, tests that do not require the use of language, and tests that are designed for group administration.

The early intelligence scales yielded a mental-age score, expressing the child’s ability to do as well as average children who were older, younger, or equivalent in chronological age. The deviation IQ used today expresses the individual’s position in comparison to a representative group of people of the same age. The average IQ is set at 100; about half of those who take the test achieve scores between 90 and 110. IQ scores may vary according to testing conditions, and, thus, it is advisable to understand results of the tests as falling within a certain range, such as average or superior.
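The deviation IQ described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are treated as normally distributed, and the standard deviation of 15 is an assumed value (the text gives only the mean of 100). The name `deviation_iq` is hypothetical.

```python
from statistics import NormalDist

# Assumption: IQ scores follow a normal distribution with mean 100.
# The standard deviation of 15 is an illustrative choice, not taken from the text.
ASSUMED_MEAN = 100.0
ASSUMED_SD = 15.0

iq_distribution = NormalDist(mu=ASSUMED_MEAN, sigma=ASSUMED_SD)


def deviation_iq(z_score, mean=ASSUMED_MEAN, sd=ASSUMED_SD):
    """Convert a position relative to same-age peers (in SD units) to an IQ score."""
    return mean + sd * z_score


# Share of test takers expected to score between 90 and 110 under these assumptions.
share_90_to_110 = iq_distribution.cdf(110) - iq_distribution.cdf(90)
print(f"Share scoring 90 to 110: {share_90_to_110:.3f}")
```

Under these assumptions the computed share comes out close to one half, which agrees with the statement that about half of test takers score between 90 and 110.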

Interest Inventories. Self-report questionnaires on which the subject indicates personal preferences among activities are called interest inventories. Because interests may predict satisfaction with some area of employment or education, these inventories are used primarily in guidance counseling. They are not intended to predict success, but only to offer a framework for narrowing career possibilities. For example, one frequently used interest inventory, the Kuder Preference Record, includes ten clusters of occupational interests: outdoors, mechanical, computational, scientific, persuasive, artistic, literary, musical, social service, and clerical. For each item, the subject indicates which of three activities is most liked and which is least liked. The total score indicates the occupational clusters that include preferred activities.

Objective Personality Tests. These tests measure social and emotional adjustment and are used to identify the need for psychological counseling. Items that briefly describe feelings, attitudes, and behaviors are grouped into subscales, each representing a separate personality trait or style, such as social extroversion or depression. Taken together, the subscales provide a profile of the personality as a whole. One of the most popular psychological tests is the Minnesota Multiphasic Personality Inventory (MMPI), constructed to aid in diagnosing psychiatric patients. Research has shown that the MMPI may also be used to describe differences among normal personality types.

Projective Techniques. Some personality tests are based on the phenomenon of projection, a mental process described by Sigmund Freud as the tendency to attribute to others personal feelings or characteristics that are too painful to acknowledge. Because projective techniques are relatively unstructured and offer minimal cues to aid in defining responses, they tend to elicit concerns that are highly personal and significant. The best-known projective tests are the Rorschach test, popularly known as the inkblot test, and the Thematic Apperception Test; others include word-association techniques, sentence-completion tests, and various drawing procedures. The psychologist’s past experience provides the framework for evaluating individual responses. Although the subjective nature of interpretation makes these tests particularly vulnerable to criticism, in clinical settings they are part of the standard battery of psychological tests.

Interpretation of Results

The most important aspect of psychological testing involves the interpretation of test results.

Scoring. The raw score is the simple numerical count of responses, such as the number of correct answers on an intelligence test. The usefulness of the raw score is limited, however, because it does not convey how well someone does in comparison with others taking the same test. Percentile scores, standard scores, and norms are all devices for making this comparison.

Percentile scoring expresses the rank order of the scores in percentages. The percentile level of a person’s score indicates the proportion of the group that scored above and below that individual. When a score falls at the 50th percentile, for example, half of the group scored higher and half scored lower; a score at the 80th percentile indicates that 20 percent scored higher and 80 percent scored lower than the person being evaluated.
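As a rough illustration of percentile scoring, the sketch below computes the percentage of a reference group that scored below a given person. The function name `percentile_rank`, the sample scores, and the choice to count only strictly lower scores (ignoring ties) are assumptions made for this example, not part of any particular test manual.

```python
def percentile_rank(group_scores, score):
    """Return the percentage of the reference group that scored strictly below `score`.

    Assumption: ties are not counted as "below"; other conventions exist.
    """
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


# Hypothetical raw scores for a reference group of ten people.
reference_group = [12, 15, 18, 20, 21, 23, 25, 27, 31, 34]

# Eight of the ten reference scores are lower than 30 and two are higher.
rank = percentile_rank(reference_group, 30)
print(rank)  # 80.0
```

Here a raw score of 30 falls at the 80th percentile of the hypothetical group: 80 percent scored lower and 20 percent scored higher.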

Standard scores are derived from a comparison of the individual raw score with the mean and standard deviation of the group scores. The mean, or arithmetic average, is determined by adding the scores and dividing by the total number of scores obtained. The standard deviation measures the variation of the scores around the mean. Standard scores are obtained by subtracting the mean from the raw score and then dividing by the standard deviation.
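The standard-score calculation described above can be sketched as follows. The function name `standard_score` and the sample data are hypothetical, and the sketch assumes the population form of the standard deviation (dividing by the number of scores); a test manual might use the sample form instead.

```python
from statistics import mean, pstdev


def standard_score(raw_score, group_scores):
    """Subtract the group mean from the raw score, then divide by the standard deviation.

    Assumption: the population standard deviation (pstdev) is used.
    """
    group_mean = mean(group_scores)
    group_sd = pstdev(group_scores)
    return (raw_score - group_mean) / group_sd


# Hypothetical raw scores: the mean is 5 and the population standard deviation is 2.
group = [2, 4, 4, 4, 5, 5, 7, 9]

z = standard_score(9, group)
print(z)  # 2.0, that is, two standard deviations above the mean
```

A standard score of 0 corresponds to a raw score exactly at the mean; positive and negative values show how many standard deviations a score lies above or below it.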

Tables of norms are included in test manuals to indicate the expected range of raw scores. Normative data are derived from studies in which the test has been administered to a large, representative group of people. The test manual should include a description of the sample of people used to establish norms, including age, sex, geographical location, and occupation. Norms based on a group of people whose major characteristics are markedly dissimilar from those of the person being tested do not provide a fair standard of comparison.

Validity. Interpretation of test scores ultimately involves predictions about a subject’s behavior in a specified situation. If a test is an accurate predictor, it is said to have good validity. Before validity can be demonstrated, a test must first yield consistent, reliable measurements. In addition to reliability, psychologists recognize three main types of validity.

A test has content validity if the sample of items in the test is representative of all the relevant items that might have been used. Words included in a spelling test, for example, should cover a wide range of difficulty.

Criterion-related validity refers to a test’s accuracy in specifying a future or concurrent outcome. For example, an art-aptitude test has predictive validity if high scores are achieved by those who later do well in art school. The concurrent validity of a new intelligence test may be demonstrated if its scores correlate closely with those of an already well-established test.
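One common way to quantify how closely two sets of scores agree is the Pearson correlation coefficient. The sketch below is a minimal illustration of a concurrent-validity check under that assumption; the text does not specify a statistic, and the function name `pearson_r` and both score lists are invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical scores for the same six people on two intelligence tests.
new_test_scores = [95, 102, 110, 88, 120, 105]
established_test_scores = [98, 100, 112, 90, 118, 103]

r = pearson_r(new_test_scores, established_test_scores)
print(f"Correlation between the two tests: {r:.2f}")
```

A coefficient near 1 would suggest that the new test ranks people much as the established test does. The same calculation could serve for predictive validity by pairing test scores with a later outcome, such as grades in art school.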

Construct validity is generally determined by investigating what psychological traits or qualities a test measures; that is, by demonstrating that certain patterns of human behavior account to some degree for performance on the test. A test measuring the trait “need for achievement,” for instance, might be shown to predict that high scorers work more independently, persist longer on problem-solving tasks, and do better in competitive situations than low scorers.

Controversies. The major psychological testing controversies stem from two interrelated issues: technical shortcomings in test design and ethical problems in interpretation and application of results. Some technical weaknesses exist in all tests. Because of this, it is crucial that results be viewed as only one kind of information about any individual. Most criticisms of testing arise from the overvaluation of and inappropriate reliance on test results in making major life decisions. These criticisms have been particularly relevant in the case of intelligence testing. Psychologists generally agree that using tests to bar youngsters from educational opportunities, without careful consideration of past and present resources or motivation, is unethical. Because tests tend to draw on those skills associated with white, middle-class functioning, they may discriminate against disadvantaged and minority groups. As long as unequal learning opportunities exist, they will continue to be reflected in test results. In the U.S., therefore, some states have established laws that carefully define the use of tests in public schools and agencies. The American Psychological Association, meanwhile, continues to work actively to monitor and refine ethical standards and public policy recommendations regarding the use of psychological testing.
