6.4 Measurement Quality
- Define reliability.
- Define validity.
Once we’ve managed to define our terms and specify the operations for measuring them, how do we know that our measures are any good? Without some assurance of the quality of our measures, we cannot be certain that our findings have any meaning or, at the least, that our findings mean what we think they mean. When social scientists measure concepts, they aim to achieve reliability and validity in their measures. Reliability exists when the same measure, applied consistently to the same person, yields the same result each time. Validity exists when there is a shared understanding of the meaning of whatever concept is being measured. These two aspects of measurement quality are the focus of this section. We’ll consider reliability first and then take a look at validity. For both aspects of measurement quality, let’s say our interest is in measuring the concepts of alcoholism and alcohol intake. What are some potential problems that could arise when attempting to measure these concepts, and how might we work to overcome those problems?
First, let’s say we’ve decided to measure alcoholism by asking people to respond to the following question: Have you ever had a problem with alcohol? If we measure alcoholism in this way, it seems likely that anyone who identifies as an alcoholic would respond with a yes to the question. So this must be a good way to identify our group of interest, right? Well, maybe. Think about how you or others you know would respond to this question. Would responses differ after a wild night out from what they would have been the day before? Might a teetotaler’s current headache from the single glass of wine he had last night influence how he answers the question this morning? How would that same person respond to the question before consuming the wine? In each of these cases, if the same person would respond differently to the same question at different points, it is possible that our measure of alcoholism has a reliability problem. Reliability in measurement is about consistency. If a measure is reliable, it means that if the same measure is applied consistently to the same person, the result will be the same each time.
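To make the idea of consistency concrete, here is a minimal sketch in Python. It assumes we asked the same five people our question on two occasions; the answers are invented for illustration, and the function name `consistency_rate` is hypothetical. A simple share of matching answers is only one rough way to gauge consistency, not a procedure this section prescribes.

```python
def consistency_rate(first, second):
    """Return the share of respondents who gave the same answer both times."""
    if len(first) != len(second):
        raise ValueError("both administrations must cover the same respondents")
    if not first:
        raise ValueError("need at least one respondent")
    # Count the respondents whose two answers match.
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


# Hypothetical answers to "Have you ever had a problem with alcohol?"
# from the same five respondents, listed in the same order each time.
day_before = ["no", "no", "yes", "no", "yes"]
morning_after = ["no", "yes", "yes", "yes", "yes"]

rate = consistency_rate(day_before, morning_after)
print(f"{rate:.0%} of respondents answered the same way both times")
```

If the measure were perfectly reliable in this sense, every respondent would give the same answer on both occasions and the rate would be 100 percent. The lower the rate, the more reason we have to suspect a reliability problem.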
One common problem of reliability with social scientific measures is memory. If we ask research participants to recall some aspect of their own past behavior, we should try to make the recollection process as simple and straightforward for them as possible. Sticking with the topic of alcohol intake, if we ask respondents how much wine, beer, and liquor they’ve consumed each day over the course of the past 3 months, how likely are we to get accurate responses? Unless a person keeps a journal documenting their intake, there will very likely be some inaccuracies in their responses. If, on the other hand, we ask a person how many drinks of any kind he or she has consumed in the past week, we might get a more accurate set of responses.
Reliability is like a scale: the data you collect is only as dependable as the instrument doing the measuring.
Reliability can be an issue even when we’re not reliant on others to accurately report their behaviors. Perhaps a field researcher is interested in observing how alcohol intake influences interactions in public locations. She may decide to conduct observations at a local pub, noting how many drinks patrons consume and how their behavior changes as their intake changes. But what if the researcher has to use the restroom and misses the three shots of tequila that the person next to her downs during the brief period she is away? The reliability of this researcher’s measure of alcohol intake, counting the number of drinks she observes patrons consume, depends on her ability to actually observe every instance of patrons consuming drinks. If she is unlikely to be able to observe every such instance, then perhaps her mechanism for measuring this concept is not reliable.
While reliability is about consistency, validity is about shared understanding. What image comes to mind for you when you hear the word alcoholic? Are you certain that the image you conjure up is similar to the image others have in mind? If not, then we may be facing a problem of validity.
For our measures to be valid, we must be certain that they accurately get at the meaning of our concepts. Think back to the first possible measure of alcoholism we considered in the subsection “Reliability.” There, we initially considered measuring alcoholism by asking research participants the following question: Have you ever had a problem with alcohol? We realized that this might not be the most reliable way of measuring alcoholism because the same person’s response might vary dramatically depending on how he or she is feeling that day. Likewise, this measure of alcoholism is not particularly valid. What is “a problem” with alcohol? For some, it might be having had a single regrettable or embarrassing moment that resulted from consuming too much. For others, the threshold for “problem” might be different; perhaps a person has had numerous embarrassing drunken moments but still gets out of bed for work every day so doesn’t perceive himself or herself to have a problem. Because what each respondent considers to be problematic could vary so dramatically, our measure of alcoholism isn’t likely to yield any useful or meaningful results if our aim is to objectively understand, say, how many of our research participants are alcoholics. Of course, if our interest is in how many research participants perceive themselves to have a problem, then our measure may be just fine.
Let’s consider another example. Perhaps we’re interested in learning about a person’s dedication to healthy living. Most of us would probably agree that engaging in regular exercise is a sign of healthy living, so we could measure healthy living by counting the number of times per week that a person visits his or her local gym. At first this might seem like a reasonable measure, but if this respondent’s gym is anything like some of the gyms I’ve seen, there exists the distinct possibility that the gym visits include activities that are decidedly not fitness related. Perhaps the person visits the gym to use the tanning beds, to flirt with potential dates, or to sit in the sauna. These activities, while potentially relaxing, are probably not the best indicators of healthy living. Therefore, recording the number of times a person visits the gym may not be the most valid way to measure his or her dedication to healthy living. The count alone cannot tell us what the person does there, so we wouldn’t really be measuring what we intended to measure.
Validity is like a portrait. No measure is exact; what’s important is how closely your measure approximates your concept.
At its core, validity is about social agreement. One quick and easy way to help ensure that your measures are valid is to discuss them with others. One way to think of validity is to think of it as you would a portrait. Some portraits of people look just like the actual person they are intended to represent. But other representations of people’s images, such as caricatures and stick drawings, are not nearly as accurate. While a portrait may not be an exact representation of how a person looks, what’s important is the extent to which it approximates the look of the person it is intended to represent. The same goes for validity in measures. No measure is exact, but some measures are more accurate than others.
- Reliability is a matter of consistency.
- Validity is a matter of social agreement.
- Operationalize a concept that is of interest to you. What are some possible problems of reliability or validity that you could run into given your operationalization? How could you tweak your operationalization and overcome those problems?
- Sticking with the same concept you identified in exercise 1, find out how other sociologists have operationalized this concept. You can do this by revisiting readings from other sociology courses you’ve taken or by looking up a few articles using Sociological Abstracts. How does your plan for operationalization differ from that used in previous research? What potential problems of reliability or validity do you see? How do the researchers address those problems?