Columbia University in the City of New York

Genevera I. Allen, PhD

Professor of Statistics, Department of Statistics; Principal Investigator at Columbia's Zuckerman Institute

I develop and use new machine learning and statistical techniques to make discoveries. That's my goal in neuroscience.

Genevera Allen mines ever-expanding and increasingly diverse neuroscience databases for rare and subtle signals that will lead to discoveries about the brain that would otherwise remain hidden.

About Genevera Allen

The Data Whisperer

Using mathematical, statistical and machine-learning innovations to eke out scientific insights hidden in vast neuroscience datasets

Looking at today’s vast datasets of brain imagery, cellular activity and genetic expression can be daunting and deceiving. But if you devise the right tools for re-envisioning and recasting the raw data, you just might find in it that next big neuroscience discovery. It’s this tension between the ways data can enlighten – or fool – that inspires statistician Genevera Allen, PhD, to head to Columbia’s Zuckerman Institute every day.

“I came to neuroscience because I think the data problems are fascinating and challenging,” said Dr. Allen, who is also a member of Columbia’s Herbert and Florence Irving Institute for Cancer Dynamics and the Center for Theoretical Neuroscience, and a professor in Columbia’s Department of Statistics.

Like a treasure hunter sweeping a metal detector over an enormous beach, Dr. Allen uses mathematical and statistical tools to search for scientific treasures in huge and complex databases. The knowledge she covets is the brain's secrets, which are otherwise too well hidden to discern in the ever-increasing and diverse expanses of data generated by neuroscientists today.

“We need tools that can detect in these datasets not the big signals, which are known and well studied, but the rarest and smallest of signals,” said Dr. Allen. “Then we can explore whether these signals are just noise generated from a dataset or true scientific discoveries.”
One data-intensive challenge Dr. Allen has taken on is brain connectivity; her research explores how cells form microscopic connections and how brain regions form macroscopic connections. A payoff in this context might be a theoretical model that depicts how synapse patterns and cell-to-cell circuit structures give rise to broader functional interactions in the brain, ultimately enabling whole-brain cognitive feats like retrieving memories and making decisions.
Doing this requires managing, melding and recasting several kinds of data — for example, microscope imagery, fMRI scans, brain-cell recordings and even genomic and clinical data — and developing machine-learning techniques that can ferret out patterns in multifaceted datasets.
“Current machine-learning tools are good at analyzing one kind of data, but not many at once. The latter is what it will take to answer many of the most interesting neuroscience questions,” Dr. Allen noted.
Among Dr. Allen’s most challenging projects is training computers to uncover patterns in data without using labels, an approach called unsupervised discovery in which researchers provide precious little guidance to the computational search. This involves using machine learning to analyze data that lacks key identifying information: the task or behavior the animal is performing, for instance, or the cell type in the brain.
Stripping out this information helps neuroscientists to avoid being deceived by their own preconceived ideas about what messages they think the data might contain. And, as one of her papers documents, it can spot otherwise hidden patterns shared by different types of patient data, such as ones that center on genetic variations and gene expression in people with Alzheimer’s disease. It’s the kind of data-anchored discovery, Dr. Allen said, that can point toward promising hypotheses for follow-up experiments in biomedical research.
“The goal here is to enable my machine learning tool to guide us to the discovery that's hidden in the data,” Dr. Allen said. “To do that, we need to mine through these massive amounts of data, put together disparate types of information and bring them together into a coherent and confirmable picture.”
Dr. Allen’s pathway into statistics, math and ultimately neuroscience began with a shoulder injury during her freshman year at Rice University. That put a damper on her plans to major in music as a violist. With an eye on a STEM field, she turned to statistics and ended up liking it so much that she went on from Rice to earn her PhD in statistics at Stanford University.
A month after she got her doctorate, Dr. Allen was back at Rice for her first academic appointment. In her 14 years there until joining Columbia’s Zuckerman Institute this year, she gathered associate professor appointments in the departments of electrical and computer engineering, statistics, and computer science. Along the way, she also became an investigator at the Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital and Baylor College of Medicine. One of her lasting marks at Rice is the Center for Transforming Data to Knowledge, or the D2K Lab, a hub for data-science education that she founded in 2018.
“At the Zuckerman Institute, I feel like a kid in a candy shop,” she said. “There are so many different scientists here, and they all have their own datasets. They're developing new neurotechnologies that are producing new and complex types of data, and they’re going to need new mathematical, statistical and machine-learning methods to eke out the most discovery from their troves of data.”