# Lesson 5: Analysis of data in case control studies: the odds ratio

## Overview:

Students learn about the odds ratio, the statistic used by epidemiologists to analyze data in a case control study. They apply their knowledge by calculating the odds ratio for a case control study example. They also learn about the error measurement tool used for odds ratios, the 95% confidence interval. The class visits the database to see how to calculate OR and 95% CI using the “Graphical Data for Q 19” step. They also learn about criteria for causality to determine whether an association may be causal. In an optional activity students learn about sources of errors in case control studies.

Class time: 100 minutes (2 days)

| Learning Objectives | Evidence |
| --- | --- |
| Learn about odds, odds ratios, and the 2×2 table as they apply to case control studies. | As the class discusses a scenario about the effect of car passengers on accident rates, students are able to identify the outcome and exposure, populate a 2×2 table, and calculate odds and odds ratios. |
| Use Step 1.2: Graphical Data for Q 19 in the database to learn how to define exposed and not exposed for the questions they are studying. | As the class is analyzing question 19, students are able to discuss what should be entered for exposed and not exposed. |
| Understand the purpose of the 95% confidence interval. | After analyzing the results from question 19, students are able to discuss whether the result occurred by chance. |
| Understand the difference between association and causation. | Students are able to apply the criteria for causation to question 19. |
| Learn about sources of errors in case control studies (optional activity). | Students determine what error is present in four example studies in Student Sheet 5.6. |

## Instruction:

Optional Section A. Interview with an Epidemiologist- Further introduction to epidemiology and causality

1. Hand out Blank_Student Sheet 5.1

2. Show students the epidemiologist video clip with Jeffrey Stanaway (video 3/3)

3. Hold a brief discussion on Student Sheet 5.2 to compare student responses. Students should understand that epidemiologists frequently use databases to try to connect exposures and outcomes, that is, to find associations. Scientists must go beyond the results of a case control study to infer causality.

Section B. Reviewing the case control study and odds ratio

1. Ask students to review what information they currently have about the smoking behavior study, and make a list on the board. Here are some things they should include:

• That it is a case control study (compares case and control subjects retrospectively)
• Cases are regular smokers and controls are nonsmokers who tried smoking
• The database has both environmental and genetic information about the subjects based on the questionnaire and genotyping from the blood samples of subjects.

2. Access the database (http://gsoutreach.gs.washington.edu/database2), and click on “Step 1.2: Graphical Data for Q 19: Did you believe that smoking cigarettes could be harmful to your health?” At this stage you are using Step 1.2 so that you can show students the “raw data” for Question 19 and elicit their ideas about how to describe the data mathematically.

 Video – How to go through Step 1.2 in the database, discussion of odds and probability

 Discussion Question and Evidence of Student Understanding

 Ask the students how they would interpret the data in Question 19.

 • What do the data tell us? The data show that many of the cases (smokers) did not believe that smoking was harmful to their health, while very few controls (nonsmokers) held that belief. A large number of cases and most of the controls did believe that smoking was harmful to their health.
 • How do we know? Looking at the data, many more smokers than nonsmokers answered “no” to this question, and more nonsmokers than smokers answered “yes.” If students are having trouble noticing these differences, you can point them to the graph, which shows the difference visually.
 • How can we express the results in numerical terms? Students should suggest ways to interpret the data mathematically, such as ratios or probabilities.

3. Tell students that they are going to learn the calculation used by epidemiologists to analyze data in a case control study, called the odds ratio. It may be helpful to go back to this question after learning about the odds ratio and ask students how they would calculate the odds ratio from the given data.

Section C. What are odds? (Discussion or hands-on activity)

 Sections C-G can be completed using the Lesson 5 Odds, OR, 95CI PowerPoint. See the notes below the PowerPoint to help facilitate the lecture on the odds ratio, 95% confidence interval, and criteria for causality.

1. Odds can be explained actively, by having students flip coins or roll dice, or through the following discussion.

 Video- Performing the coin flip activity example

2. Ask students: Do you know what “odds” means?

3. Tell students that odds compare the likelihood (not probability; see Comments on comparison of odds and probability for detail) of something occurring to the likelihood of it not occurring, and are written as:

(number of times an event occurs) / (number of times that event does not occur)

4. Ask students: If you flip a coin, what are the odds that you will get heads?

Explain as follows:

• There are 2 possible outcomes: heads and tails
• 1 possible outcome is heads: 1 possible outcome is not heads (tails)

The odds are read as a ratio. In this situation, the odds are said to be “1 to 1” or 1:1 that you will flip a heads.

5. Ask students: What are the odds of rolling a three on a die?

Explain as follows:

• 6 possible outcomes when rolling a die: 1, 2, 3, 4, 5, 6
• 1 possible outcome is 3, and 5 possible outcomes are not 3.
• Odds are said to be 1:5 (“1 to 5”) or 1/5 that you will roll a three. Note that this is different from saying “1 in 5”; the probability of rolling a three is 1 in 6.

Ask class some other simple odds questions.

What are the odds that you will roll an odd number on the die?

• 3 possible outcomes are odd (1, 3, 5): 3 possible outcomes are not odd (2, 4, 6)
• Odds are 3:3, which simplifies to 1:1 (or 1), that you will roll an odd number.

You have a sack of 10 Tootsie Rolls, 20 Jolly Ranchers, and 10 gumballs. What are the odds that you will stick your hand in the bag and pull out a Tootsie Roll?

• 10 possible outcomes are Tootsie Rolls: 30 possible outcomes are not Tootsie Rolls

Odds are 10:30, or 1:3, that you will choose a Tootsie Roll.
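The examples in this section can be checked with a short script. The following is a minimal sketch, not part of the lesson materials; the function names `odds` and `odds_to_probability` are chosen here for illustration. It also shows why odds of 1:5 correspond to a probability of 1 in 6.

```python
from fractions import Fraction

def odds(event_count, non_event_count):
    """Odds = (ways the event occurs) / (ways the event does not occur)."""
    return Fraction(event_count, non_event_count)

def odds_to_probability(o):
    """Convert odds to probability: p = odds / (1 + odds)."""
    return o / (1 + o)

coin_heads = odds(1, 1)   # heads vs. tails
die_three = odds(1, 5)    # a three vs. the five other faces
die_odd = odds(3, 3)      # 1, 3, 5 vs. 2, 4, 6
tootsie = odds(10, 30)    # 10 Tootsie Rolls vs. 30 other candies

examples = [
    ("flip heads", coin_heads),
    ("roll a three", die_three),
    ("roll an odd number", die_odd),
    ("pull a Tootsie Roll", tootsie),
]
for label, o in examples:
    # Fraction reduces automatically, so 3:3 prints as 1:1 and 10:30 as 1:3.
    print(f"{label}: odds {o.numerator}:{o.denominator}, "
          f"probability {odds_to_probability(o)}")
```

Using `Fraction` keeps the results exact, which makes it easy to compare them with the ratios written on the board.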

Section D. What is an Odds Ratio?

1. Pass out Blank_Student Sheet 5.1-2×2 Table and the Car Passenger Case Control Study. Read out loud the two paragraphs describing the Car Passenger study.

2. Tell students that in epidemiological research, a table like the one on Student Sheet 5.1 is called a 2 X 2 table and is used to organize the data obtained in a case control study. It is called a 2 X 2 table because there are 4 squares arranged in a 2 X 2 array. All of the data gathered will fall into one of the 4 squares. Point out the 4 squares on Figure 5.1- 2×2 table (outline them in red, for example).

3. Ask students what the outcome is, and who the cases and controls in the study are. On the Figure label the Case and Control columns “drivers who got in an accident” and “drivers who did not get in an accident.”

4. Ask students what the exposure is. Explain that an exposure can be any factor (genetic, environmental, policy, or social factor) that you believe causes the outcome. Label the Exposed and Not Exposed rows “had 1 or more passengers” and “did not have passengers.”

The exposure is 1 or more passengers (rather than no passengers) and is entered on the top line of the 2 X 2 table.

5. Give students a few minutes to fill out the table with the numbers from the Car Passenger Study and to answer questions 1-6 on Student Sheet 5.1.

6. Go over the answers and make sure students understand how to fill out the table.

7. Explain to students that in order to analyze their data, they need to make a calculation called the odds ratio. This requires that they calculate the odds for cases and for controls. Help them to answer questions 7 and 8 in Student Sheet 5.1.

8. Use the following explanation as well as Figure 5.1 to help students answer questions 9-12.

Odds Ratio Explanation

To determine whether cases are more likely to have been exposed than controls, epidemiologists perform a calculation called the odds ratio. For example, in the Car Passenger study, they would compare the odds that cases (drivers in car accidents) were exposed (had 1 or more passengers) to the odds that controls (drivers not in car accidents) were exposed. They compare the two odds by making a ratio of them.

If cases and controls had equal odds of being exposed, then the odds ratio would be 1.

In other words, if having 1 or more passengers was not associated with getting in a car accident, then:

(odds of a case having one or more passengers) / (odds of a control having one or more passengers) = 1

If the odds ratio turned out to be quite a bit more than 1, then you would have demonstrated an association between having 1 or more passengers and car accidents. The larger the odds ratio is, the greater the strength of association. Whether the association is statistically significant or could be due to chance alone is a question that students will look at more closely a little later.

Conversely, if the odds ratio turned out to be quite a bit less than 1, then you would have demonstrated that drivers in car accidents are less likely to have 1 or more passengers than drivers not in car accidents.

Significance: If the sample size is fairly big and the odds ratio is 2 or 3 or 4, then the association is very likely significant (this is not a hard and fast rule). But if the odds ratio is only 1.4, can we conclude it isn’t statistically significant? How different is 1.4 from 1 (no association)? Could an odds ratio of 1.4 be due just to chance? If the sample size were very large, an odds ratio of 1.4 could very well be a significant association. But if the sample size is small, an odds ratio of 1.4 would be much more likely to be due to chance alone. The coin flip activity can also be used to talk about random chance.

Try this odds ratio explanation with your students

Students may need to have odds ratio explained to them in several ways. Try this explanation for additional clarity. You can use this explanation when presenting Figure 5.1- 2×2 table.

• If the odds ratio is 1, then cases and controls are equally likely to have been exposed. This shows there is no association between the outcome and exposure.
• If the odds ratio is 6, then cases are 6 times more likely to have been exposed than controls. This demonstrates there is an association between outcome and exposure.

If you calculate the odds ratio and find it greater than 1, you can always make the statement:

“If the odds ratio is X, then cases are X times more likely to have been exposed than controls.”

The bigger your sample size is, the more likely it is that the association you have detected with your odds ratio calculation is real, that is, statistically significant and not due to chance alone.

It is also possible to have an odds ratio that is less than 1. When this occurs, it shows that cases are less likely to have been exposed than controls.

How to use Figure 5.1- 2×2 table

You may want to use this figure to show students that the 4 squares in the 2 X 2 table can be labeled a, b, c, and d to help keep track of the numbers in their calculations. Epidemiologists do this too. That means that the odds of a case being exposed would be a/c and the odds of a control being exposed would be b/d. The odds ratio is nothing more than the ratio between two odds. Therefore, the odds ratio would be (a/c) / (b/d), which simplifies to (a × d) / (b × c). This is shown on the bottom of the overhead transparency. What this odds ratio means can be stated in sentence form, “Cases are (a/c) / (b/d) times more likely to have been exposed than controls.”
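The a, b, c, d calculation can be sketched in a few lines of code. The counts below are hypothetical and chosen only to give a round answer; they are not the numbers from the Car Passenger study, and the function name `odds_ratio` is an illustrative choice.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2 x 2 table.

    a: exposed cases       b: exposed controls
    c: unexposed cases     d: unexposed controls
    """
    odds_cases = a / c      # odds that a case was exposed
    odds_controls = b / d   # odds that a control was exposed
    return odds_cases / odds_controls

# Hypothetical counts for illustration only.
a, b, c, d = 30, 10, 20, 40

print("odds for cases:", a / c)        # 30/20
print("odds for controls:", b / d)     # 10/40
print("odds ratio:", odds_ratio(a, b, c, d))
print("cross-product form:", (a * d) / (b * c))
```

With these made-up counts the odds ratio is 6, and the cross-product form (a × d) / (b × c) gives the same value, which students can verify by hand.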

How to use Figure 5.2- Sampling a population

This figure shows how a sample is not necessarily representative of a whole population. Even though researchers try very hard to select a sample that is representative, this is very hard to do. Consider the following points as you discuss the figure with the students:

• Is the sample representative of the whole population? Students should notice that there are relatively more red stars and yellow circles in the sample than in the whole population.
• Will the odds ratio calculated for the sample be the same as the “true odds ratio” for the whole population? Students should recognize that the sample OR is an estimate of the population OR and may be different.
• Challenge the students to consider what to do about the difficulty of getting a sample that is representative of a whole population. Students may suggest doing additional sampling or trying to select cases and controls that are more similar for the characteristics that are being used for matching. They may also suggest that you need a way to estimate the error.
• Use this conversation to lead into a discussion of the 95% confidence interval. The 95% confidence interval is a tool to estimate the range of the true population odds ratio.
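To make the sampling point concrete, the following sketch simulates drawing several samples from a population whose “true odds ratio” is known. Everything here is an assumption made for illustration: the exposure proportions (50% of cases and 25% of controls), the sample sizes, the random seed, and the function name `sample_odds_ratio` are not taken from the lesson or the database.

```python
import random

def sample_odds_ratio(n_cases, n_controls, p_exposed_cases, p_exposed_controls, rng):
    """Draw one random sample of cases and controls and return its odds ratio."""
    a = sum(rng.random() < p_exposed_cases for _ in range(n_cases))        # exposed cases
    b = sum(rng.random() < p_exposed_controls for _ in range(n_controls))  # exposed controls
    c = n_cases - a       # unexposed cases
    d = n_controls - b    # unexposed controls
    return (a * d) / (b * c)

# Assumed population values (hypothetical).
p_cases, p_controls = 0.5, 0.25
true_or = (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))

rng = random.Random(5)  # fixed seed so the run is repeatable
estimates = [sample_odds_ratio(100, 100, p_cases, p_controls, rng) for _ in range(5)]

print(f"true OR: {true_or:.2f}")
for i, est in enumerate(estimates, start=1):
    print(f"sample {i}: OR = {est:.2f}")
```

Each sample gives a somewhat different odds ratio even though the population value never changes, which is the reason an error estimate is needed.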

Section E. Error bars for odds ratio: 95% confidence interval

1. Use Figure 5.3- 95% Confidence interval to discuss the 95% confidence interval. You may want to hand out Blank_Student Sheet 5.5- Reading for students to review the odds ratio and 95% confidence interval.

2. Part A of Figure 5.3 compares traditional error bars used in many types of measurements with 95% confidence intervals.

3. Explain to students that the confidence interval is the range of values that is believed to contain the true OR with 95% confidence. By “True OR” we mean the OR for the entire population, not just our study sample. A 95% confidence interval means that if the study were repeated 100 times, with a new interval calculated each time, about 95 of those intervals would be expected to contain the “True OR” and about 5 would not. Unfortunately, we cannot know whether the interval from our one study contains the “True OR”, so we say that “we are 95% confident that our confidence interval contains the True OR”.

4. The formula for the 95% confidence interval is given. Tell students that they don’t need to remember it because the database calculates it automatically.

5. Use the number line in Part B to point out that the 95% CI tells about the statistical significance of the odds ratio: if the 95% CI contains the value of 1, then there is no statistically significant association, but if it does not contain the value of 1, then the OR is statistically significant, meaning that it is unlikely to have occurred as a result of chance alone. The 95% CI can lie entirely above 1, lie entirely below 1, or include 1.
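For teachers who want to see a confidence interval calculation worked through, here is a minimal sketch. It uses one common textbook approach (the log-based, or Woolf, method); the lesson does not state which formula the database or Figure 5.3 uses, so treat this as an assumption. The counts are hypothetical and were chosen so that both tables have an odds ratio of 1.4, differing only in sample size.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI (log-based method, assumed here).

    a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    or_value = (a * d) / (b * c)
    # Standard error of ln(OR) under the log-based method.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(or_value) - z * se_log_or)
    high = math.exp(math.log(or_value) + z * se_log_or)
    return or_value, low, high

def is_significant(low, high):
    """The association is statistically significant if the CI excludes 1."""
    return not (low <= 1 <= high)

# Hypothetical tables with the same odds ratio (1.4) but different sample sizes.
small = odds_ratio_ci(14, 10, 10, 10)
large = odds_ratio_ci(1400, 1000, 1000, 1000)

for label, (or_value, low, high) in [("small sample", small), ("large sample", large)]:
    print(f"{label}: OR = {or_value:.2f}, 95% CI = [{low:.2f}, {high:.2f}], "
          f"significant: {is_significant(low, high)}")
```

In this sketch the small sample produces a wide interval that includes 1, while the large sample produces a narrow interval that excludes 1, even though the odds ratio is 1.4 in both cases.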

Section F. Calculating odds ratio and 95% confidence interval using the database

1. Project the database on the class screen and select Hypothesis Testing. Select Question 19, “During your experimental smoking phase, did you believe that smoking cigarettes could be harmful to your health?”

2. Ask students to suggest a specific hypothesis that could be tested using this question, and type it into the appropriate text box.

Possible specific hypothesis: “Smokers are more likely than nonsmokers to have thought smoking wasn’t harmful to their health.”

3. Ask students how they would define “exposed” and “not exposed” using responses to this question, and enter their responses into the text boxes for exposed and not exposed. Exposed: believing smoking is not harmful to your health (response b); Not exposed: believing smoking is harmful to your health (response a)

4. Ask students whether they think response c should be used to define exposed or not exposed. They will probably say that it should not be used. Make sure that students understand that the response, “Don’t know/not sure” should not be used to define exposed or not exposed.

5. Show students how to select “exposed” for response b and “not exposed” for response a. They should leave response c as “neither.”

6. Show students that they can select one of three populations: everyone, males only, or females only. Tell students that unless their hypothesis is specific to either males or females, they should select “everyone” to obtain the largest sample size, which gives the most reliable statistical conclusion.

7. Hit “Get odds ratio” to calculate the OR for this question.

8. The screen will show a section called, “Report your results and interpretation.” Be sure to point out the following features:

• The original questionnaire question and the responses for exposed and not exposed are given.
• Everything they typed into the text boxes is also given. This allows students to double check that they calculated the odds ratio they intended to do.
• The 2 x 2 table is given, and the sample size, odds ratio, and 95% CI are calculated.

9. Guide students in providing responses to Task a) and b). They should be able to state the following:

Task a) The odds ratio was 9.03. This means that smokers were 9.03 times as likely as nonsmokers to have not believed that smoking was harmful to their health during their experimental smoking period.

Task b) The 95% confidence interval is [4.38, 18.62]. It does not include the value 1, so the association is significant (likely did not occur by chance).
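The reported result can be double checked with a few lines of code. This sketch uses only the three numbers given above. The second check rests on an assumption: if the database uses a log-based interval (the lesson does not say), the odds ratio should sit at the geometric mean of the two interval bounds.

```python
import math

reported_or = 9.03
ci_low, ci_high = 4.38, 18.62

# Significance check: does the interval exclude 1?
excludes_one = not (ci_low <= 1 <= ci_high)
print("CI excludes 1:", excludes_one)

# Consistency check (assumes a log-based interval): the OR should be
# close to the geometric mean of the lower and upper bounds.
geometric_mean = math.sqrt(ci_low * ci_high)
print(f"geometric mean of bounds: {geometric_mean:.2f}")
print("matches reported OR:", abs(geometric_mean - reported_or) < 0.01)
```

The interval excludes 1, and the geometric mean of the bounds agrees with the reported odds ratio to two decimal places, which is consistent with (but does not prove) a log-based calculation.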

Section G. Applying the Criteria for Causality (What do we mean by Causality?)

 Video- Epidemiology causality explanation by Noel Weiss, UW Professor of Epidemiology

1. Task c) asks about causality. Remind students that just because there is an association between believing that smoking is not harmful to one’s health and becoming a smoker, we can’t infer that this causes smoking. Epidemiologists apply several criteria to determine whether the exposure might increase the risk of the outcome. Review the Criteria for Causality found below task c) or on Figure 5.4- Criteria for Causality with your class. Then work with the class to apply the criteria to the question, “Does not believing smoking is harmful to your health during your experimental smoking phase cause smoking?” The following are some possible responses:

Strength of association: The odds ratio of 9.03 is very high, indicating a strong association, and the 95% confidence interval does not contain 1.00.

Dose-response relationship: This is not applicable to this question.

Temporal sequence: People had their beliefs about whether or not smoking was harmful to their health during their experimental smoking stage, which occurred before they became regular smokers.

Consistent with other studies: Students can use Google Scholar (http://scholar.google.com/) to look for other studies that addressed this issue. Students should only read the abstract of research papers.

Biological plausibility: It makes sense that if a person does not believe smoking can be harmful to their health they will be more likely to continue smoking.

Lack of confounder or significant bias: Information about the health risks of smoking became more prevalent in the 1970s and later, so people who were in their experimental smoking phase during the 1960s may not have been aware of the health risks. The age distribution of cases in this study is older than controls, so it is possible that many cases became regular smokers at a time when the harmful health effects of smoking were not well publicized. The different age distribution for the case and control groups is an example of selection bias. The incomplete matching by age may partly contribute to the high OR for this question.

Optional Section H. Sources of error in case control studies and other extensions

1. Review other optional case control studies. These case control studies can be used as assessment tools:

Mr. Limon’s History class study- Blank_Student Sheet 5.3

Smoking and Lung Cancer. A case control study by Doll & Hill (1950)- Blank_Student Sheet 5.4

2. Finding case control studies. Students may also look for case control studies online by doing a simple Google search on case control study and a disease or condition of interest. For example, a Google search on “case control study leprosy” turns up studies examining how effective leprosy vaccines are. Allow students to follow their interests by choosing their own topics. Seeing the many studies they can find in this simple way will help demonstrate to students how widely used and valuable the case control study really is.

3. Sources of Error. Blank_Student Sheet 5.6 is a reading containing further information on distinguishing association from causality. It also includes information on types of study errors that can bias results and lead to apparent associations. This information may be even more useful when students are preparing to analyze data in the database in Lesson 6.