Research

The Center for Statistics in Biomedical Big Data (CSBBD) engages in both methodological and collaborative research in big data. The topics shown below represent our current research areas. Our focus changes as new types of data appear.

INTEGRATIVE ANALYSIS OF GENOMIC DATA

Methods for characterizing the extent of genetic control of a disease phenotype and for identifying specific genetic factors and perturbed pathways that influence disease risk, by integrating heterogeneous genomic, epigenomic and metagenomic data. The main application areas include integrative cancer genomics and omics studies of cardiovascular diseases, diabetes and renal diseases. The initial focus of this area is leveraging eQTL data across multiple tissues in the genetic analysis of complex diseases, within the framework of causal mediation analysis.

METHODS FOR MICROBIOME AND METAGENOMICS

Methods for measuring, annotating, and prioritizing microbial taxa and microbial genes and their association with various phenotypes. Methods for studying microbial community dynamics and how they are perturbed by environments and treatments. Methods for elucidating the role of gut microbial metabolism in influencing the predisposition to and treatment of heart disease, cancer and autoimmune disease by integrating metagenomic data with metabolomics data.

METHODS FOR ANALYSIS OF WEARABLE BIOMEDICAL DEVICES DATA

Wearable biomedical devices record and report information such as physical activity, sleep patterns, environmental factors, physiological measurements and patients’ health status. Such wearable systems allow clinicians to monitor individuals over extended periods of time at an unprecedented scale. A key question is how to use these densely collected data, together with patients’ electronic health records (EHR), to predict a patient’s current health state and future health trajectory and to choose the optimal intervention for that patient. Methods will be developed for learning richer, data-driven descriptions of diseases to obtain digital phenotypes. Novel study designs are needed to better answer the clinical questions. Functional data analysis and deep learning methods are expected to contribute greatly to the analysis of such data. Cloud computing and data storage are necessary for any online prediction of patient health state.

HIGH DIMENSIONAL CAUSAL INFERENCE IN GENOMICS

Developing machine learning and high-dimensional data methods for estimating heterogeneous causal effects in genomics and diseases. Developing new methods to estimate the effect of multiple simultaneous interventions (e.g., multiple gene knockouts or gene editing by CRISPR-Cas9), under the assumption that the observational data come from an unknown linear structural equation model with independent errors. Graphical models in high-dimensional settings and formal statistical theory of causal inference by data fusion will be our initial focus in this area.
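
As a rough illustration of the joint-intervention question described above, the following Python sketch computes the mean of each variable in a small linear structural equation model under a simultaneous intervention on several variables. It assumes the edge weights are already known, the graph is acyclic and the errors have mean zero. This sidesteps the actual research problem, in which the model is unknown and must be learned from observational data. The function name `interventional_means` and the three-variable example are hypothetical and are not taken from the Center's methods.

```python
def interventional_means(weights, interventions):
    """Mean of each variable under do(X_i = v_i) for every i in `interventions`.

    weights[j][k] is the (assumed known) direct linear effect of X_j on X_k,
    i.e. X_k = sum_j weights[j][k] * X_j + e_k with mean-zero errors e_k.
    The graph is assumed to be acyclic.
    """
    n = len(weights)
    means = [0.0] * n
    # In an acyclic graph with n nodes, n sweeps are enough for every
    # node to receive the final values of all of its ancestors.
    for _ in range(n):
        for k in range(n):
            if k in interventions:
                # An intervened variable ignores its parents entirely.
                means[k] = float(interventions[k])
            else:
                means[k] = sum(weights[j][k] * means[j] for j in range(n))
    return means


# Hypothetical example: X0 -> X1 (weight 2), X1 -> X2 (weight 3), X0 -> X2 (weight 1).
example_weights = [
    [0, 2, 1],
    [0, 0, 3],
    [0, 0, 0],
]

single = interventional_means(example_weights, {0: 1})       # do(X0 = 1)
joint = interventional_means(example_weights, {0: 1, 1: 0})  # do(X0 = 1, X1 = 0)
```

In this toy example, setting X0 to 1 alone gives a mean of 7 for X2, while additionally knocking out X1 (setting it to 0) leaves only the direct path and gives a mean of 1, which shows why the effect of a joint intervention cannot be read off from single-variable effects without knowing the structure.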

 

About Us

We are interested in statistical inference methods for big data in health science research.

PUBLICATIONS

We publish both in top statistical journals such as JASA, JRSS-B, Biometrika, and Annals of Applied Statistics, and in top subject-area journals such as Science, Nature, and Nature Genetics.

  

Contact Us

Assistant: Janine M. Pritchard

Tel:   (215) 573-4045

Email: jpritcha@pennmedicine.upenn.edu