Overview of Curriculum

Exploring Databases - STEM Learning and Authentic Research in the High School Classroom: A Teacher’s Guide

Exploring Databases was developed by the University of Washington’s Department of Genome Sciences and the Institute for Science and Math Education in the College of Education. The two- to three-week module engages students in conducting authentic research that addresses the question, “What environmental and genetic factors contribute to some people becoming addicted smokers?” Using a database from an actual epidemiological study of smoking behavior, students test their own hypotheses about how genetic or environmental factors influence whether people become smokers after trying cigarettes. The module includes learning activities and homework assignments designed to teach students the skills and knowledge necessary to conduct their research, such as the brain’s reward pathway and the basics of epidemiology and statistics.

In Exploring Databases, student research teams develop a hypothesis related to genetic or environmental influences on smoking behavior by reviewing profiles of smokers, past research, and their personal observations. Then they identify questions in the database that address their hypothesis and use the database to find evidence supporting or refuting it. The database provides scaffolding to guide students as they submit queries, interpret their results, and decide whether an association is present between the factor they are testing and becoming a regular smoker. Students then apply the criteria for causality to infer whether an association is causal in nature.

Because of statistical constraints, students can only conduct a few queries as part of hypothesis testing. To explore the full potential of the database, they also conduct a process called hypothesis generation. In this activity, they can pose unlimited queries of the database to generate a new hypothesis, which they integrate into a new hypothetical research study. For this process, students may look at the raw data (in tabular and graphical form) to learn what happens to the statistics as the research parameters are changed and to recognize patterns in the data.
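
The curriculum does not spell out the statistical constraint here, but it lists the multiple comparison problem among its statistical concepts, so one plausible reading is that each additional query raises the chance of a spurious "significant" result. The following is a minimal Python sketch of that idea, not part of the curriculum materials. It assumes every query is an independent test at the 0.05 level (consistent with the 95% confidence interval used in the module); the function name familywise_error_rate is illustrative.

```python
def familywise_error_rate(num_queries, alpha=0.05):
    """Chance of at least one false positive across num_queries tests.

    Assumption (not from the curriculum): the tests are independent
    and each uses the same significance level alpha.
    """
    if num_queries < 1:
        raise ValueError("num_queries must be at least 1")
    return 1 - (1 - alpha) ** num_queries


if __name__ == "__main__":
    # Show how the risk of a spurious finding grows with the number of queries.
    for k in (1, 3, 10, 20):
        rate = familywise_error_rate(k)
        print(f"{k:2d} queries: {rate:.1%} chance of at least one false positive")
```

Under these assumptions, a handful of queries keeps the overall false-positive risk modest, while unlimited querying is better treated as hypothesis generation than as hypothesis testing.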

To conclude the module, students create a final presentation (for example, PowerPoint or poster) that shows their results and analyses from their hypothesis testing and hypothesis generation experiments. During classroom presentations, students participate in scientific argumentation by critiquing the claims of their peers and responding to the questions and comments of others.

Exploring Databases Curriculum Summary

Lesson 1. Why and how do people do science? Through several interviews with scientists, students learn about the many approaches scientists take to conduct their research.
Lesson 2. Why do some people become smokers and others do not? Students explore the wide range of smoking behavior and, through profiles of actual smokers and nonsmokers, discover factors that influence people’s smoking.
Lesson 3. How do genes influence smoking behavior? Students learn how nicotine interacts with the body and discuss which genes might influence variation in smoking behavior.
Lesson 4. How can we study genetic and environmental influences on smoking behavior? Students learn the characteristics of one commonly used epidemiological study design, the case-control study, and the details of the smoking behavior case-control study.
Lesson 5. Analysis of data in case-control studies: the odds ratio. Students learn how to calculate the odds ratio and determine its statistical significance using the 95% confidence interval. They learn the difference between association and causality and how to apply the criteria for causality in their research.
Lesson 6. Database research: What can we learn from the smoking behavior data?* Students develop and test specific hypotheses using the smoking behavior database. They analyze results and draw conclusions about their overarching hypotheses. Students learn the difference between hypothesis testing and hypothesis generation.
Lesson 7. Hypothesis generation: Exploring the database to propose new studies. Students analyze many questions and exposures to generate new hypotheses. Students analyze all query results to propose future studies.
Final Project
This section includes suggestions for the final project format, a template for the final project, and samples of a grading rubric and a self- and peer-evaluation tool.

* Note that hypothesis testing in the curriculum has two components: statistical tests for association (estimating the odds ratio and its confidence interval) and reasoning with the criteria for causality.
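
The statistical component can be illustrated with a short Python sketch. It is not taken from the curriculum or its database tool: the counts are made up for illustration, the function name odds_ratio_ci is hypothetical, and the confidence interval uses the common log-based (Woolf) approximation, which may differ from the method the database applies.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Return (odds_ratio, lower, upper) for a 2x2 case-control table.

    The interval is the log-based (Woolf) approximation; z=1.96
    corresponds to a 95% confidence interval.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")

    # Odds of exposure among cases divided by odds of exposure among controls.
    odds_ratio = (a / b) / (c / d)

    # Standard error of ln(odds ratio).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: cases are regular smokers, controls are not.
    or_value, lower, upper = odds_ratio_ci(40, 60, 20, 80)
    print(f"Odds ratio: {or_value:.2f}, 95% CI: {lower:.2f} to {upper:.2f}")
    # An interval that excludes 1 suggests a statistically significant association.
    print("Significant association:", lower > 1 or upper < 1)
```

A significant odds ratio shows association only; students still need the criteria for causality to judge whether the association is causal.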

Learning Objectives

Overarching Concepts

  • multifactorial traits
  • more than one way to do science
  • understanding the nature of science and research

Scientific concepts

  • causation
  • genetics
  • stages of smoking
  • what is evidence (argumentation)
  • basic neuroscience
  • reward pathway
  • hypothesis testing vs hypothesis generation

Epidemiology concepts

  • exposed, not-exposed, outcome, retrospective study
  • differences between observational and experimental research
  • sources of error (optional)

Statistical concepts

  • population vs sample
  • odds ratio
  • 95% CI
  • statistically significant
  • multiple comparison problem


This curriculum is typically used with high school general biology students. It has also been used in International Baccalaureate, Biotechnology, pre-AP, and AP biology courses, and in community college introductory biology courses. Students taught this curriculum should have a basic understanding of:

  1. Mendelian genetics
  2. Gene expression: DNA codes for RNA, which codes for protein
  3. What a gene is
  4. What a protein is
  5. How changes in a gene can result in a change in a protein, which can affect its function
  6. Genetic variation (diversity in the inherited traits within a species)

Teacher Testimonial

“One of the strengths of the Exploring Databases Curriculum is in the way it allows students to see science as multidisciplinary involving aspects that they may not have thought about before such as epidemiology, bioethics, and database analysis. It is a curriculum that clearly shows the nature of the science process in refining hypotheses and looking at evidence. Many curricula do not emphasize this aspect of science, or help students see the field away from the ‘man in white lab coat’. Not only that, but it is in a topic – addiction and genetics – that interests and pulls in students. Every year I have a number of students who name the ‘smoking unit’ as their favorite unit of the year because they related to it and liked that they were contributing to ‘real science’ (vs. cookbook labs). It allows a ‘window’ or ‘hook’ into learning neurobiology, gene control, basic epidemiology, experimental designs, and ethics in a way that keeps students interested and makes them want to focus, and it allows us to meet the society in science standards. I would without a doubt recommend this curriculum to any teacher, since it is flexible, gets into issues students enjoy, and can be adapted to meet a wide variety of needs. It is also well aligned with state and national standards.”

-Renee Agatsuma, former Garfield High School teacher, current Genetic Epidemiology student at University of Washington