Summer Research Highlight: Genomic and Computational Approaches to Gene Regulation with Dr. John Quackenbush P’24
 Each cell in our body contains a set of instructions, encoded in a molecule called DNA, that helps determine both our traits—including things like height, weight, hair color—and our individual risk of developing diseases such as diabetes or cancer. DNA is a long-chain polymer, and these instructions are encoded in the DNA “sequence” (the series of polymer subunits) and grouped in “genes,” each of which carries the instructions for creating one of the proteins that make up our cells.
The sequencing of the human genome at the start of the twenty-first century gave us a catalog of human genes and provided us with tools to explore how differences in our DNA make us unique. More than twenty years later, we are still exploring how genes, and the genetic instructions that control them, ultimately determine these traits.
BUA parent Dr. John Quackenbush P’24, Professor of Computational Biology and Bioinformatics and Chair of the Department of Biostatistics at the Harvard T.H. Chan School of Public Health, is one of the world’s leading researchers in studying the links between genes and traits. His research team has developed methods that model how genes work together in networks that alter cells as they grow and develop or as they move from health to disease. This summer, seven students—four rising BUA seniors working on their senior thesis projects; one recent BUA graduate who did her senior thesis work in the lab in the summer of 2021; one rising BUA junior; and a junior from the nearby Winsor School—are working with Dr. Quackenbush and his research team to explore what one can learn about gene regulation.
BUA graduate Mia Shapoval ’22, who will attend Boston University as a freshman this fall, is following up on her senior thesis project, in which she explored how gene regulation differs between tumors in males and females with lung cancer. She is trying to understand how changes in regulatory networks might help explain sex differences in lung cancer risk, the risk posed by smoking, the rate at which lung cancer develops, and how well people respond to therapy. What drives sex differences in human health and disease is one of the most understudied problems in biomedical research, and answering this question promises to provide insight into how sex can be used to select the treatments most likely to be effective for each person.
Audrey Xiao ’23 and Christian Asdourian ’23 are looking at how changes in gene regulation over the course of a lifetime alter the state of our health. These “epigenetic clocks” run at different rates in different people (or even in different tissues in the same person) and contribute to many diseases. For example, decreases in lung function are normal during one’s lifetime, but the rapid decrease in function seen in emphysema appears to be a disease of dramatically accelerated aging. Audrey is looking at different ways of assessing epigenetic age, trying to understand why epigenetic age and chronological age differ between people, and testing the hypothesis that lung cancer and normal tissue may show different epigenetic clocks in the same person, differences that may help shed light on how to better treat lung cancers. Christian is trying to understand whether the rate of epigenetic aging affects some genes differently than others, leading to differences in every person’s disease risk. Specifically, he is looking at how lung cancer characteristics, such as tumor stage, and demographic variables, such as sex or chronological age, are related to the epigenetic age of cancer tissues. If cancer is a disease involving alteration of epigenetic age, this analysis will provide insight into how and why lung tumors develop and progress in ways that are unique to each individual.
Rohan Biju ’23 is tackling the problem of how one finds interpretable patterns in gene regulatory networks that contain more than 25,000 genes and the elements that control them. Using a technique called “network graph embedding” to summarize and visualize the complex regulatory landscapes active in each cell, Rohan’s goal is to find groups of genes that are differently regulated between health and disease conditions in ways that explain what we know of the disease process. Aditya Venkatesh ’23 is exploring ways of inferring gene regulatory networks using data derived from thousands of individual cells (rather than from “bulk” tissue samples). By looking at the regulatory processes that are active in the collection of individual cells, Aditya is trying to understand which cell types make up organs and tissues, and how that composition shifts as disease develops.
Adam Quackenbush ’24 and Jaya Kolluri, a rising junior at the Winsor School, are exploring ways in which we can accelerate applications of the scientific method by finding new, completely unexpected hypotheses to test. Using large-scale datasets that measure the activity of all 25,000 human genes in each of 40 tissues in nearly 1,000 individuals for whom there is extensive clinical data (including age, height, weight, sex, race, smoking status, and so on), Adam and Jaya are looking for all possible correlations and associations between the measured variables. They are using these to create an online “serendipity engine,” SEAHORSE, that allows users to ask and answer open-ended questions about individual variables or about groups of research subjects by selecting multiple demographic and other parameters. For example, one could ask, “What clinical variables correlate with subject age?” Or, “What gene expression levels in which tissues are correlated with subject weight?” Or, “What gene regulatory network edges are correlated with smoking status?” The hope is that SEAHORSE will allow these open-ended questions to be quickly explored and so will spur the development of new hypotheses that can ultimately be tested and validated or rejected.
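For readers curious what “looking for all possible correlations” can mean in practice, here is a minimal sketch. It is not the SEAHORSE code: the students work in R, while this illustration uses Python, and the function names (`pearson`, `rank_correlations`) and the toy numbers are hypothetical, invented only to show the idea. It computes a Pearson correlation for every pair of measured variables and ranks the pairs by strength.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_correlations(table):
    """Correlate every pair of variables and sort by absolute strength."""
    results = [
        (a, b, pearson(table[a], table[b]))
        for a, b in combinations(sorted(table), 2)
    ]
    results.sort(key=lambda r: abs(r[2]), reverse=True)
    return results


# Hypothetical toy data: one value per research subject (not real measurements).
toy = {
    "age": [30, 40, 50, 60, 70],
    "weight": [70, 72, 71, 73, 69],
    "gene_x_lung": [1.0, 2.0, 3.0, 4.0, 5.0],
}

ranked = rank_correlations(toy)
for name_a, name_b, r in ranked:
    print(f"{name_a} vs {name_b}: r = {r:+.2f}")
```

With tens of thousands of genes across dozens of tissues, the number of pairs grows very quickly, so a real analysis would also need to account for the many comparisons being made before treating any single correlation as a hypothesis worth testing.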
Each of these students is working directly with one or more graduate students or postdoctoral fellows and participating as an active member of Dr. Quackenbush’s research laboratory. They are developing and applying new methods for data analysis in the R statistical programming language, troubleshooting their software, and discussing their results and challenges with all the senior scientists in the team’s weekly group meetings. Students will be expected to present their summer’s work before returning to school and to work with their mentors on developing a scientific publication based on their work. Adam and Jaya’s project has already been selected for presentation at the 2022 Women in Statistics and Data Science conference, and they have started working on a paper about their online tool.