**Integrating Heterogeneous Microarray Datasets to Increase Experimental Power**

## Using Pre-Existing Microarray Datasets to Increase Experimental Power: Application to Insulin Resistance. In: PLoS Comput Biol, 6 (3), pp. e1000718, 2010.

Although DNA microarrays have become a widely used experimental technique for identifying differentially expressed (DE) genes, they are notorious for generating noisy data. A common strategy for mitigating the effects of noise is to perform many experimental replicates. This approach is often costly and sometimes impossible given limited resources; thus, analytical methods are needed that increase accuracy at no additional cost.

One inexpensive source of microarray replicates comes from prior work: to date, data from over one million hybridizations are in the public domain. To leverage this information, we have created the SVD Augmented Gene expression Analysis Tool (SAGAT), a mathematically principled and data-driven approach for identifying DE genes. SAGAT increases the power of a microarray experiment by integrating coexpression module information, which it extracts from the vast number of publicly available microarray datasets. In specific applications, SAGAT increases effective sample sizes by as many as 2.72 arrays and leads to a 50 percent increase in the number of DE genes detected.
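To give a feel for how a singular value decomposition (SVD) can pull coexpression modules out of a compendium of prior arrays, here is a minimal sketch on synthetic data. It is not the SAGAT algorithm: the function names, the synthetic compendium, and the simple projection step are all assumptions made for illustration.

```python
import numpy as np

def coexpression_modules(compendium, n_modules):
    """Return the top gene-space singular vectors of a genes x arrays matrix."""
    # Center each gene (row) so modules capture co-variation, not mean level.
    centered = compendium - compendium.mean(axis=1, keepdims=True)
    # Thin SVD: the columns of u are orthonormal gene-space patterns.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_modules], s[:n_modules]

def project_onto_modules(fold_changes, modules):
    """Least-squares reconstruction of per-gene fold changes within module space."""
    weights = modules.T @ fold_changes
    return modules @ weights

# Hypothetical stand-in for public data: 200 genes x 50 arrays, 3 hidden modules.
rng = np.random.default_rng(0)
loadings = rng.normal(size=(200, 3))
activity = rng.normal(size=(3, 50))
compendium = loadings @ activity + 0.1 * rng.normal(size=(200, 50))

modules, singular_values = coexpression_modules(compendium, n_modules=3)

# A new, noisy experiment whose true signal lies in the module space.
true_signal = loadings @ np.array([1.0, -0.5, 0.0])
observed = true_signal + rng.normal(size=200)
denoised = project_onto_modules(observed, modules)

error_before = np.linalg.norm(observed - true_signal)
error_after = np.linalg.norm(denoised - true_signal)
```

In this toy setting the projection discards most of the measurement noise because the noise is spread over all 200 genes while the signal occupies only three module directions. Real compendia are far less clean, so the gain would depend on how well the modules describe the new experiment.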

**Efficient Biochemical Parameter Estimation using Rare Event Simulation Techniques**

## Accelerated Maximum Likelihood Parameter Estimation for Stochastic Biochemical Systems. In: BMC Bioinformatics, 13 (1), pp. 68, 2012, ISSN: 1471-2105.

Despite their widespread use in systems biology, cell population-level models of biological processes (e.g. using ordinary differential equations) are unable to capture and characterize the cell-to-cell variability exhibited even by genetically identical cells. This limitation inhibits our ability to answer detailed questions about single-cell biological mechanisms, where low copy numbers of DNA, mRNA, and protein species often lead to large stochastic fluctuations. In order to make sense of these fluctuations, which are often linked to important cellular behaviors and disease states, methods for modeling, simulation, and analysis of stochastic biological systems are required.

We have developed one such method, Monte Carlo Expectation-Maximization with Modified Cross-Entropy Method (MCEM²), for estimating unknown kinetic parameters (e.g. transcription and translation rates) from single-cell time series data. By using techniques previously developed for rare event characterization, MCEM² provides a computationally efficient stochastic simulation-based approach for parameter estimation that requires no prior knowledge of unknown parameter values. When applied to stochastic models of varying complexity, MCEM² accurately identifies parameter values in cases where recently proposed methods are ineffective.
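The sketch below illustrates the simplest version of the estimation problem, under assumptions that are ours rather than the paper's: a birth-death model of mRNA (constant production, first-order degradation) whose trajectory is observed completely. In that fully observed case the maximum likelihood estimate of each rate constant is the number of times the reaction fired divided by its integrated propensity factor. The method described above targets the harder setting of time series data, where the individual reaction events are not observed; this example does not implement it. All names and parameter values are hypothetical.

```python
import random

def simulate_birth_death(k_prod, k_deg, x0, t_end, rng):
    """Gillespie simulation of 0 -> mRNA (rate k_prod) and mRNA -> 0 (rate k_deg * x).

    Returns the reaction counts and the time integral of the copy number,
    which are the sufficient statistics for the complete-data likelihood.
    """
    t, x = 0.0, x0
    n_prod = 0
    n_deg = 0
    integral_x = 0.0
    while True:
        a_prod = k_prod
        a_deg = k_deg * x
        a_total = a_prod + a_deg
        dt = rng.expovariate(a_total)  # time to the next reaction
        if t + dt > t_end:
            integral_x += x * (t_end - t)
            break
        integral_x += x * dt
        t += dt
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
            n_prod += 1
        else:
            x -= 1
            n_deg += 1
    return n_prod, n_deg, integral_x

# Assumed "true" parameters for the synthetic cell.
TRUE_K_PROD, TRUE_K_DEG, T_END = 10.0, 0.5, 2000.0
rng = random.Random(1)
n_prod, n_deg, integral_x = simulate_birth_death(TRUE_K_PROD, TRUE_K_DEG, 20, T_END, rng)

# Complete-data maximum likelihood estimates.
k_prod_hat = n_prod / T_END
k_deg_hat = n_deg / integral_x
print(f"k_prod ~ {k_prod_hat:.3f}, k_deg ~ {k_deg_hat:.3f}")
```

With a long, fully observed trajectory both estimates land close to the values used for simulation. When only snapshots at discrete times are available, the reaction counts and integrals are unknown and must be inferred, which is where simulation-based expectation-maximization comes in.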

**Inferring Single-Cell Biological Mechanisms using Stochastic Simulation**

## Inferring Single-Cell Gene Expression Mechanisms Using Stochastic Simulation. In: Bioinformatics, 31 (9), pp. 1428–1435, 2015, ISSN: 1367-4803, 1460-2059.

Although widely used, the two-state model of promoter activity (active/inactive) represents an oversimplification of the architecture of most promoters, as simultaneous regulation by multiple transcription factors as well as epigenetic interactions increase the effective number of promoter states. Given the essential role of gene expression in all cellular functions, efficient computational techniques for characterizing promoter architectures are critically needed.

We have developed a novel model reduction for promoters with arbitrary numbers of active and inactive states, allowing us to approximate complex promoter switching behavior with a small set of parameters. Using this model reduction, we created bursty MCEM², an efficient parameter estimation and model selection technique for inferring the number and configuration of promoter states from single-cell gene expression data. Application of bursty MCEM² to data from the endogenous mouse glutaminase promoter suggests that, rather than occupying just two states, the glutaminase promoter may traverse 10 or more states before initiation of transcription. Each of these states represents a putative regulatory interaction to be further characterized.
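One way to see why the number of promoter states leaves a footprint in single-cell data is to look at the waiting time before transcription starts. The sketch below assumes a deliberately simple picture that is not taken from the paper: the promoter passes irreversibly through a linear chain of inactive states with equal rates. Under that assumption the waiting time is a sum of exponentials, and its coefficient of variation shrinks as 1/sqrt(n) with the number of states n. The function names and parameter values are hypothetical.

```python
import random
import statistics

def waiting_time(n_states, mean_total, rng):
    """Time to traverse n_states sequential inactive states with equal rates."""
    rate = n_states / mean_total  # keeps the mean total waiting time fixed
    return sum(rng.expovariate(rate) for _ in range(n_states))

def coefficient_of_variation(n_states, n_samples=20000, seed=0):
    """Monte Carlo estimate of std/mean of the waiting time."""
    rng = random.Random(seed)
    samples = [waiting_time(n_states, 1.0, rng) for _ in range(n_samples)]
    return statistics.pstdev(samples) / statistics.fmean(samples)

cv_single_state = coefficient_of_variation(1)   # one inactive state: exponential wait
cv_ten_states = coefficient_of_variation(10)    # ten sequential inactive states
print(f"CV with 1 state:   {cv_single_state:.2f}")
print(f"CV with 10 states: {cv_ten_states:.2f}")
```

A single inactive state gives a memoryless wait with a coefficient of variation near 1, while ten sequential states give a much more regular wait. Differences of this kind in timing statistics are what make it possible, in principle, to distinguish promoter architectures from expression data.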