Selected Publications

P213. Single-cell profiling of the human primary motor cortex in ALS and FTLD (pdf)

    Pineda, Lee, Fitzwalter, Mohammadi, Pregent, Gardashli, Mantero, Engelberg-Cook, DeJesus-Hernandez, vanBlitterswijk, Pottier, Rademakers, Oskarsson, Shah, Petersen, Graff-Radford, Boeve, Knopman, Josephs, DeTure, Murray, Dickson, Heiman, Belzil, Kellis

    Amyotrophic lateral sclerosis (ALS) and frontotemporal lobar degeneration (FTLD) are two devastating and fatal neurodegenerative conditions. While distinct, they share many clinical, genetic, and pathological characteristics [1], and both show selective vulnerability of layer 5b extratelencephalic-projecting cortical populations, including Betz cells in ALS [2,3] and von Economo neurons (VENs) in FTLD [4,5]. Here, we report the first high resolution single-cell atlas of the human primary motor cortex (MCX) and its transcriptional alterations in ALS and FTLD across ~380,000 nuclei from 64 individuals, including 17 control samples and 47 sporadic and C9orf72-associated ALS and FTLD patient samples. We identify 46 transcriptionally distinct cellular subtypes including two Betz-cell subtypes, and we observe a previously unappreciated molecular similarity between Betz cells and VENs of the prefrontal cortex (PFC) and frontal insula. Many of the dysregulated genes and pathways are shared across excitatory neurons, including stress response, ribosome function, oxidative phosphorylation, synaptic vesicle cycle, endoplasmic reticulum protein processing, and autophagy. Betz cells and SCN4B+ long-range projecting L3/L5 cells are the most transcriptionally affected in both ALS and FTLD. Lastly, we find that the VEN/Betz cell-enriched transcription factor, POU3F1, has altered subcellular localization, co-localizes with TDP-43 aggregates, and may represent a cell type-specific vulnerability factor in the Betz cells of ALS and FTLD patient tissues.

    bioRxiv 2021.07.07.451374; November 9, 2020; doi.org/10.1101/2021.07.07.451374

P212. Single-cell anatomical analysis of human hippocampus and entorhinal cortex uncovers early-stage molecular pathology in Alzheimer's disease (pdf)

    Jose Davila-Velderrain, Hansruedi Mathys, Shahin Mohammadi, Brad Ruzicka, Xueqiao Jiang, Ayesha Ng, David A. Bennett, Li-Huei Tsai, Manolis Kellis

    The human hippocampal formation plays a central role in Alzheimer's disease (AD) progression, cognitive traits, and the onset of dementia; yet its molecular states in AD remain uncharacterized. Here, we report a comprehensive single-cell transcriptomic dissection of the human hippocampus and entorhinal cortex across 489,558 cells from 65 individuals with varying stages of AD pathology. We transcriptionally characterize major brain cell types and neuronal classes, including 17 glutamatergic and 8 GABAergic neuron subpopulations. Combining evidence from human and mouse tissue-microdissection, neuronal cell isolation and spatial transcriptomics, we show that single-cell expression patterns capture fine-resolution neuronal anatomical topography. By stratifying subjects into early and late pathology groups, we uncover stage-dependent and cell-type specific transcriptional modules altered during AD progression. These include early-stage cell-type specific dysregulation of cellular and cholesterol metabolism, late-stage neuron-glia alterations in neurotransmission, and late-stage signatures of cellular stress, apoptosis, and DNA damage broadly shared across cell types. Late-stage signatures show signs of convergence in hippocampal and cortical cells, while early changes diverge; highlighting the relevance of characterizing molecular pathology across brain regions and AD progression. Finally, we characterize neuron subregion-specific responses to AD pathology and show that CA1 pyramidal neurons are the most transcriptionally altered while CA3 and dentate gyrus granule neurons the least. Our study provides a valuable resource to extend cell type-specific studies of AD to clinically relevant brain regions affected early by pathology in disease progression.

    bioRxiv 2021.07.01.450715; November 9, 2020; doi.org/10.1101/2021.07.01.450715

P210. Single-cell dissection of live human hearts in ischemic heart disease and heart failure reveals cell-type-specific driver genes and pathways (pdf)

    Linna-Kuosmanen, Schmauch, Galani, Boix, Hou, Örd, Toropainen, Stolze, Meibalan, Mantero, Renfro, Ojanen, Agudelo, Hollmen, Jalkanen, Gunn, Tavi, Romanoski, MacRae, Kaikkonen, Garcia-Cardena, Kiviniemi, Kellis

    Ischemic heart disease is the single most common cause of death worldwide with an annual death rate of over 9 million people. Genome-wide association studies have uncovered over 200 genetic loci underlying the disease, providing a deeper understanding of the causal mechanisms leading to it. However, in order to understand ischemic heart disease at the cellular and molecular level, it is necessary to identify the cell-type-specific circuits enabling dissection of driver variants, genes, and signaling pathways in normal and diseased tissues. Here, we provide the first detailed single-cell dissection of the cell types and disease-associated gene expression changes in the living human heart, using cardiac biopsies collected during open-heart surgery from control, ischemic heart disease, and ischemic and non-ischemic heart failure patients. We identify 84 cell types/states, grouped in 12 major cell types. We define markers for each cell type, providing the first extensive reference set for the live human heart. These major cell types include cardiovascular cells (cardiomyocytes, endothelial cells, fibroblasts), rarer cell types (B lymphocytes, neurons, Schwann cells), and rich populations of previously understudied layer-specific epicardial and endocardial cells. In addition, we reveal substantial differences in disease-associated gene expression at the cell subtype level, revealing arterial pericytes as having a central role in the pathogenesis of ischemic heart disease and heart failure. Our results demonstrate the importance of high-resolution cellular subtype mapping in gaining mechanistic insight into human cardiovascular disease.

    bioRxiv 2021.06.23.449672; November 9, 2020; doi.org/10.1101/2021.06.23.449672

P207. Single-cell dissection of the human cerebrovasculature in health and disease (pdf)

    Garcia, Sun, Lee, Godlewski, Galani, Mantero, Bennett, Sahin, Kellis, Heiman

    Despite the importance of the blood-brain barrier in maintaining normal brain physiology and in understanding neurodegeneration and CNS drug delivery, human cerebrovascular cells remain poorly characterized due to their sparsity and dispersion. Here, we perform the first single-cell characterization of the human cerebrovasculature using both ex vivo fresh-tissue experimental enrichment and post mortem in silico sorting of human cortical tissue samples. We capture 31,812 cerebrovascular cells across 17 subtypes, including three distinct subtypes of perivascular fibroblasts as well as vasculature-coupled neurons and glia. We uncover human-specific expression patterns along the arteriovenous axis and determine previously uncharacterized cell type-specific markers. We use our newly discovered human-specific signatures to study changes in 3,945 cerebrovascular cells of Huntington's disease patients, which reveal an activation of innate immune signaling in vascular and vasculature-coupled cell types and the concomitant reduction of proteins critical for maintenance of BBB integrity. Finally, our study provides a comprehensive molecular atlas of the human cerebrovasculature to guide future biological and therapeutic studies.

    bioRxiv 2021.04.26.440975; November 9, 2020; doi.org/10.1101/2021.04.26.440975

P203. Single-cell dissection of schizophrenia reveals neurodevelopmental-synaptic axis and transcriptional resilience (pdf)

    Ruzicka, Mohammadi, Davila-Velderrain, Subburaju, Tso, Hourihan, Kellis

    Schizophrenia is a devastating mental disorder with a high societal burden, complex pathophysiology, and diverse genetic and environmental risk factors. Its complexity, polygenicity, and small-effect-size and cell-type-specific contributors have hindered mechanistic elucidation and the search for new therapeutics. Here, we present the first single-cell dissection of schizophrenia, across 500,000+ cells from 48 postmortem human prefrontal cortex samples, including 24 schizophrenia cases and 24 controls. We annotate 20 cell types/states, providing a high-resolution atlas of schizophrenia-altered genes and pathways in each. We find neurons are the most affected cell type, with deep-layer cortico-cortical projection neurons and parvalbumin-expressing inhibitory neurons showing significant transcriptional changes converging on genetically-implicated regions. We discover a novel excitatory-neuron cell-state indicative of transcriptional resilience and enriched in schizophrenia subjects with less-perturbed transcriptional signatures. We identify key trans-acting factors as candidate drivers of observed transcriptional perturbations, including MEF2C, TCF4, SOX5, and SATB2, and map their binding patterns in postmortem human neurons. These factors regulate distinct gene sets underlying fetal neurodevelopment and adult synaptic function, bridging two leading models of schizophrenia pathogenesis. Our results provide the most detailed map to date for mechanistic understanding and therapeutic development in neuropsychiatric disorders.

P205. Single-cell deconvolution of 3,000 post-mortem brain samples for eQTL and GWAS dissection in mental disorders (pdf)

    Park, He, Davila-Velderrain, Hou, Mohammadi, Mathys, Peng, Bennett, Tsai, Kellis

    Thousands of genetic variants acting in multiple cell types underlie complex disorders, yet most gene expression studies profile only bulk tissues, making it hard to resolve where genetic and non-genetic contributors act. This is particularly important for psychiatric and neurodegenerative disorders that impact multiple brain cell types with highly-distinct gene expression patterns and proportions. To address this challenge, we develop a new framework, SPLITR, that integrates single-nucleus and bulk RNA-seq data, enabling phenotype-aware deconvolution and correcting for systematic discrepancies between bulk and single-cell data. We deconvolved 3,387 post-mortem brain samples across 1,127 individuals and in multiple brain regions. We find that cell proportion varies across brain regions, individuals, disease status, and genotype, including genetic variants in TMEM106B that impact inhibitory neuron fraction and 4,757 cell-type-specific eQTLs. Our results demonstrate the power of jointly analyzing bulk and single-cell RNA-seq to provide insights into cell-type-specific mechanisms for complex brain disorders.

    bioRxiv 2021.01.21.426000; January 21, 2021; doi.org/10.1101/2021.01.21.426000

231. Regulatory genomic circuitry of human disease loci by integrative epigenomics (pdf)

    Boix, James, Park, Meuleman, Kellis

    Annotating the molecular basis of human disease remains an unsolved challenge, as 93% of disease loci are non-coding and gene-regulatory annotations are highly incomplete. Here we present EpiMap, a compendium comprising 10,000 epigenomic maps across 800 samples, which we used to define chromatin states, high-resolution enhancers, enhancer modules, upstream regulators and downstream target genes. We used this resource to annotate 30,000 genetic loci that were associated with 540 traits, predicting trait-relevant tissues, putative causal nucleotide variants in enriched tissue enhancers and candidate tissue-specific target genes for each. We partitioned multifactorial traits into tissue-specific contributing factors with distinct functional enrichments and disease comorbidity patterns, and revealed both single-factor monotropic and multifactor pleiotropic loci. Top-scoring loci frequently had multiple predicted driver variants, converging through multiple enhancers with a common target gene, multiple genes in common tissues, or multiple genes and multiple tissues, indicating extensive pleiotropy. Our results demonstrate the importance of dense, rich, high-resolution epigenomic annotations for the investigation of complex traits.

    Nature 590:300-307. Feb 3, 2021. doi: 10.1038/s41586-020-03145-z. PMID 33536621

P214. Metabolic resilience is encoded in genome plasticity (pdf)

    Agudelo, Tuyeras, Llinares, Morcuende, Park, Sun, Linna-Kuosmanen, Atabaki-Pasdar, Ho, Galani, Franks, Kutlu, Grove, Femenia, Kellis

    Metabolism plays a central role in evolution, as resource conservation is a selective pressure for fitness and survival. Resource-driven adaptations offer a good model to study evolutionary innovation more broadly. It remains unknown how resource-driven optimization of genome function integrates chromatin architecture with transcriptional phase transitions. Here we show that tuning of genome architecture and heterotypic transcriptional condensates mediate resilience to nutrient limitation. Network genomic integration of phenotypic, structural, and functional relationships reveals that fat tissue promotes organismal adaptations through metabolic acceleration chromatin domains and heterotypic PGC1A condensates. We find evolutionary adaptations in several dimensions; low conservation of amino acid residues within protein disorder regions, nonrandom chromatin location of metabolic acceleration domains, condensate-chromatin stability through cis-regulatory anchoring and encoding of genome plasticity in radial chromatin organization. We show that environmental tuning of these adaptations leads to fasting endurance, through efficient nuclear compartmentalization of lipid metabolic regions, and, locally, human-specific burst kinetics of lipid cycling genes. This process reduces oxidative stress, and fatty-acid mediated cellular acidification, enabling endurance of condensate chromatin conformations. Comparative genomics of genetic and diet perturbations reveal mammalian convergence of phenotype and structural relationships, along with loss of transcriptional control by diet-induced obesity. Further, we find that radial transcriptional organization is encoded in functional divergence of metabolic disease variant-hubs, heterotypic condensate composition, and protein residues sensing metabolic variation. 
    During fuel restriction, these features license the formation of large heterotypic condensates that buffer proton excess, and shift viscoelasticity for condensate endurance. This mechanism maintains physiological pH, reduces pH-resilient inflammatory gene programs, and enables genome plasticity through transcriptionally driven cell-specific chromatin contacts. In vivo manipulation of this circuit promotes fasting-like adaptations with heterotypic nuclear compartments, metabolic and cell-specific homeostasis. In sum, we uncover here a general principle by which transcription uses environmental fluctuations for genome function, and demonstrate how resource conservation optimizes transcriptional self-organization through robust feedback integrators, highlighting obesity as an inhibitor of genome plasticity relevant for many diseases.

    bioRxiv 2021.06.25.449953; July 13, 2021; doi.org/10.1101/2021.06.25.449953

226. Plasma-derived extracellular vesicle analysis and deconvolution enable prediction and tracking of melanoma checkpoint blockade outcome (pdf)

    Shi, Kasumova, Michaud, Cintolo-Gonzalez, Díaz-Martínez, Ohmura, Mehta, Chien, Frederick, Cohen, Plana, Johnson, Flaherty, Sullivan, Kellis, Boland

    Immune checkpoint inhibitors (ICIs) show promise, but most patients do not respond. We identify and validate biomarkers from extracellular vesicles (EVs), allowing non-invasive monitoring of tumor-intrinsic and host immune status, as well as a prediction of ICI response. We undertook transcriptomic profiling of plasma-derived EVs and tumors from 50 patients with metastatic melanoma receiving ICI, and validated with an independent EV-only cohort of 30 patients. Plasma-derived EV and tumor transcriptomes correlate. EV profiles reveal drivers of ICI resistance and melanoma progression, exhibit differentially expressed genes/pathways, and correlate with clinical response to ICI. We created a Bayesian probabilistic deconvolution model to estimate contributions from tumor and non-tumor sources, enabling interpretation of differentially expressed genes/pathways. EV RNA-seq mutations also segregated ICI response. EVs serve as a non-invasive biomarker to jointly probe tumor-intrinsic and immune changes to ICI, function as predictive markers of ICI responsiveness, and monitor tumor persistence and immune activation.

    Science Advances 6(46):eabb3461. Nov 13, 2020. doi: 10.1126/sciadv.abb3461. PMID 33188016

P211. Cellular intelligence: dynamic specialization through non-equilibrium multi-scale compartmentalization (pdf)

    Tuyeras, Agudelo, Ram, Loon, Kutlu, Grove, Kellis

    Intelligence is usually associated with the ability to perceive, retain and use information to adapt to changes in one's environment. In this context, systems of living cells can be thought of as intelligent entities. Here, we show that the concepts of non-equilibrium tuning and compartmentalization are sufficient to model manifestations of cellular intelligence such as specialization, division, fusion and communication using the language of operads. We implement our framework as an unsupervised learning algorithm, IntCyt, which we show is able to memorize, organize and abstract reference machine-learning datasets through generative and self-supervised tasks. Overall, our learning framework captures emergent properties programmed in living systems, and provides a powerful new approach for data mining. Although intelligence has been given many definitions, we can associate it with the ability to perceive, retain, and use information to adapt to changes in one's environment. In this context, systems of living cells can be thought of as intelligent entities. While one can reasonably describe their adaptive abilities within the realm of homeostatic mechanisms, it is challenging to comprehend the principles governing their metabolic intelligence. In each organism, cells have indeed developed as many ways to adapt as there are cell types, and elucidating the impetus of their evolutionary behaviors could be the key to understanding life processes and likely diseases. The goal of this article is to propose principles for understanding cellular intelligence. Specifically, we show that the concepts of non-equilibrium tuning and compartmentalization are enough to recover cellular adaptive behaviors such as specialization, division, fusion, and communication. Our model has the advantage to encompass all scales of life, from organelles to organisms through systems of organs and cell assemblies. 
    We achieve this flexibility using the language of operads, which provides an elegant framework for reasoning about nested systems and, as an emergent behavior, non-equilibrium compartmentalization. To demonstrate the validity and the practical utility of our model, we implement it in the form of an unsupervised learning algorithm, IntCyt, and apply it to reference machine learning datasets through generative and self-supervised tasks. We find that IntCyt's interpretability, plasticity and accuracy surpass those of a wide range of machine learning algorithms, thus providing a powerful approach for data mining. Our results indicate that the nested hierarchical language of operads captures the emergent properties of programmed cellular metabolism in the development of living systems, and provides a new biologically-inspired, yet practical and lightweight, computational paradigm for memorizing, organizing and abstracting datasets.

    bioRxiv 2021.06.25.449951; November 9, 2020; doi.org/10.1101/2021.06.25.449951

225. A multiresolution framework to characterize single-cell state landscapes (pdf)

    Mohammadi, Davila-Velderrain, Kellis

    Dissecting the cellular heterogeneity embedded in single-cell transcriptomic data is challenging. Although many methods and approaches exist, identifying cell states and their underlying topology is still a major challenge. Here, we introduce the concept of multiresolution cell-state decomposition as a practical approach to simultaneously capture both fine- and coarse-grain patterns of variability. We implement this concept in ACTIONet, a comprehensive framework that combines archetypal analysis and manifold learning to provide a ready-to-use analytical approach for multiresolution single-cell state characterization. ACTIONet provides a robust, reproducible, and highly interpretable single-cell analysis platform that couples dominant pattern discovery with a corresponding structural representation of the cell state landscape. Using multiple synthetic and real data sets, we demonstrate ACTIONet's superior performance relative to existing alternatives. We use ACTIONet to integrate and annotate cells across three human cortex data sets. Through integrative comparative analysis, we define a consensus vocabulary and a consistent set of gene signatures discriminating against the transcriptomic cell types and subtypes of the human prefrontal cortex.

    Nat Commun. 2020 Oct 26;11(1):5399. doi: 10.1038/s41467-020-18416-6.

223. SARS-CoV-2 Gene Content and COVID-19 Mutation Impact by Comparing 44 Sarbecovirus Genomes (pdf)

    Jungreis, Sealfon, Kellis

    Despite its overwhelming clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. Here, we use comparative genomics to provide a high-confidence protein-coding gene set, characterize protein-level and nucleotide-level evolutionary constraint, and prioritize functional mutations from the ongoing COVID-19 pandemic. We select 44 complete Sarbecovirus genomes at evolutionary distances ideally-suited for protein-coding and non-coding element identification, create whole-genome alignments, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for all named genes and for 3a, 6, 7a, 7b, 8, 9b, and also ORF3c, a novel alternate-frame gene. By contrast, ORF10, and overlapping-ORFs 9c, 3b, and 3d lack protein-coding signatures or convincing experimental evidence and are not protein-coding. Furthermore, we show no other protein-coding genes remain to be discovered. Cross-strain and within-strain evolutionary pressures largely agree at the gene, amino-acid, and nucleotide levels, with some notable exceptions, including fewer-than-expected mutations in nsp3 and Spike subunit S1, and more-than-expected mutations in Nucleocapsid. The latter also shows a cluster of amino-acid-changing variants in otherwise-conserved residues in a predicted B-cell epitope, which may indicate positive selection for immune avoidance. Several Spike-protein mutations, including D614G, which has been associated with increased transmission, disrupt otherwise-perfectly-conserved amino acids, and could be novel adaptations to human hosts. The resulting high-confidence gene set and evolutionary-history annotations provide valuable resources and insights on COVID-19 biology, mutations, and evolution.

    NatureInReview rs.3.rs-80345. Oct 1, 2020. doi: 10.21203/rs.3.rs-80345/v1. Preprint. PMID 33024961 PMC7536840

222. Mapping the Epigenomic and Transcriptomic Interplay During Memory Formation and Recall in the Hippocampal Engram Ensemble (pdf)

    Marco, Meharena, Dileep, Raju, Davila-Velderrain, Zhang, Adaikkan, Young, Gao, Kellis, Tsai

    The epigenome and three-dimensional (3D) genomic architecture are emerging as key factors in the dynamic regulation of different transcriptional programs required for neuronal functions. In this study, we used an activity-dependent tagging system in mice to determine the epigenetic state, 3D genome architecture and transcriptional landscape of engram cells over the lifespan of memory formation and recall. Our findings reveal that memory encoding leads to an epigenetic priming event, marked by increased accessibility of enhancers without the corresponding transcriptional changes. Memory consolidation subsequently results in spatial reorganization of large chromatin segments and promoter-enhancer interactions. Finally, with reactivation, engram neurons use a subset of de novo long-range interactions, where primed enhancers are brought in contact with their respective promoters to upregulate genes involved in local protein translation in synaptic compartments. Collectively, our work elucidates the comprehensive transcriptional and epigenomic landscape across the lifespan of memory formation and recall in the hippocampal engram ensemble.

    Nature Neuroscience. Oct 5, 2020. doi: 10.1038/s41593-020-00717-0. Online ahead of print. PMID 33020654

210. Reconstruction of the Human Blood-Brain Barrier in Vitro Reveals a Pathogenic Mechanism of APOE4 in Pericytes (pdf)

    Blanchard, Bula, Davila-Velderrain, Akay, Zhu, Frank, Victor, Bonner, Mathys, Lin, Ko, Bennett, Cam, Kellis, Tsai

    In Alzheimer's disease, amyloid deposits along the brain vasculature lead to a condition known as cerebral amyloid angiopathy (CAA), which impairs blood-brain barrier (BBB) function and accelerates cognitive degeneration. Apolipoprotein E4 (APOE4) is the strongest risk factor for CAA, yet the mechanisms underlying this genetic susceptibility are unknown. Here we developed an induced pluripotent stem cell-based three-dimensional model that recapitulates anatomical and physiological properties of the human BBB in vitro. Similarly to CAA, our in vitro BBB displayed significantly more amyloid accumulation in APOE4 compared to APOE3. Combinatorial experiments revealed that dysregulation of calcineurin-nuclear factor of activated T cells (NFAT) signaling and APOE in pericyte-like mural cells induces APOE4-associated CAA pathology. In the human brain, APOE and NFAT are selectively dysregulated in pericytes of APOE4 carriers, and inhibition of calcineurin-NFAT signaling reduces APOE4-associated CAA pathology in vitro and in vivo. Our study reveals the role of pericytes in APOE4-mediated CAA and highlights calcineurin-NFAT signaling as a therapeutic target in CAA and Alzheimer's disease.

    Nature Medicine 26(6):952-963. Jun 2020. doi: 10.1038/s41591-020-0886-4. Epub June 8, 2020. PMID 32514169

206. Inferring Multimodal Latent Topics From Electronic Health Records (pdf)

    Li, Nair, Lu, Wen, Wang, Dehaghi, Miao, Liu, Ordog, Biernacka, Ryu, Olson, Frye, Liu, Guo, Marelli, Ahuja, Davila-Velderrain, Kellis

    Electronic health records (EHRs) are rich heterogeneous collections of patient health information, whose broad adoption provides clinicians and researchers unprecedented opportunities for health informatics, disease-risk prediction, actionable clinical recommendations, and precision medicine. However, EHRs present several modeling challenges, including highly sparse data matrices, noisy irregular clinical notes, arbitrary biases in billing code assignment, diagnosis-driven lab tests, and heterogeneous data types. To address these challenges, we present MixEHR, a multi-view Bayesian topic model. We demonstrate MixEHR on MIMIC-III, Mayo Clinic Bipolar Disorder, and Quebec Congenital Heart Disease EHR datasets. Qualitatively, MixEHR disease topics reveal meaningful combinations of clinical features across heterogeneous data types. Quantitatively, we observe superior prediction accuracy of diagnostic codes and lab test imputations compared to the state-of-the-art methods. We leverage the inferred patient topic mixtures to classify target diseases and predict mortality of patients in critical conditions. In all comparisons, MixEHR confers competitive performance and reveals meaningful disease-related topics.

    Nature Communications 11(1):2536. May 21, 2020. doi: 10.1038/s41467-020-16378-3. PMID 324398697 PMC7242436

205. Evidence for a Novel Overlapping Coding Sequence in POLG Initiated at a CUG Start Codon (pdf)

    Khan, Jungreis, Wright, Mudge, Choudhary, Firth, Kellis

    Background: POLG, located on nuclear chromosome 15, encodes the DNA polymerase gamma (Pol gamma). Pol gamma is responsible for the replication and repair of mitochondrial DNA (mtDNA). Pol gamma is the only DNA polymerase found in mitochondria for most animal cells. Mutations in POLG are the most common single-gene cause of diseases of mitochondria and have been mapped over the coding region of the POLG ORF. Results: Using PhyloCSF to survey alternative reading frames, we found a conserved coding signature in an alternative frame in exons 2 and 3 of POLG, herein referred to as ORF-Y, that arose de novo in placental mammals. Using the synplot2 program, synonymous site conservation was found among mammals in the region of the POLG ORF that is overlapped by ORF-Y. Ribosome profiling data revealed that ORF-Y is translated and that initiation likely occurs at a CUG codon. Inspection of an alignment of mammalian sequences containing ORF-Y revealed that the CUG codon has a strong initiation context and that a well-conserved predicted RNA stem-loop begins 14 nucleotides downstream. Such features are associated with enhanced initiation at near-cognate non-AUG codons. Reanalysis of the Kim et al. (2014) draft human proteome dataset yielded two unique peptides that map unambiguously to ORF-Y. An additional conserved uORF, herein referred to as ORF-Z, was also found in exon 2 of POLG. Lastly, we surveyed ClinVar variants that are synonymous with respect to the POLG ORF and found that most of these variants cause amino acid changes in ORF-Y or ORF-Z. Conclusions: We provide evidence for a novel coding sequence, ORF-Y, that overlaps the POLG ORF. Ribosome profiling and mass spectrometry data show that ORF-Y is expressed. PhyloCSF and synplot2 analyses show that ORF-Y is subject to strong purifying selection. An abundance of disease-correlated mutations that map to exons 2 and 3 of POLG but also affect ORF-Y provides potential clinical significance to this finding.

    BMC Genetics 21(1):25. Mar 6, 2020. doi: 10.1186/s12863-020-0828-7. PMID 32138667 PMC7059407

195. Reconstruction of Cell-Type-Specific Interactomes at Single-Cell Resolution (pdf)

    Mohammadi, Davila-Velderrain, Kellis

    The human interactome is instrumental in the systems-level study of the cell and the contextualization of disease-associated gene perturbations. However, reference organismal interactomes do not capture the cell-type-specific context in which proteins and modules preferentially act. Here, we introduce SCINET, a computational framework that reconstructs an ensemble of cell-type-specific interactomes by integrating a global, context-independent reference interactome with a single-cell gene-expression profile. SCINET addresses technical challenges of single-cell data by robustly imputing, transforming, and normalizing the initially noisy and sparse expression data. Inferred cell-level gene interaction probabilities and group-level interaction strengths define cell-type-specific interactomes. We use SCINET to reconstruct and analyze interactomes of the major human brain and immune cell types, revealing specificity and modularity of perturbations associated with neurodegenerative, neuropsychiatric, and autoimmune disorders. We report cell-type interactomes for brain and immune cell types, together with the SCINET package.

    Cell Systems 9(6):559-568.e4. Dec 18, 2019. doi: 10.1016/j.cels.2019.10.007. Epub Nov 27, 2019. PMID 31786210 PMC6943823 (available on 12-18-2020)

201P. Causal Mediation Analysis Leveraging Multiple Types of Summary Statistics Data (pdf)

    Park, Sarkar, Nguyen, Kellis

    Summary statistics of genome-wide association studies (GWAS) inform causal relationships between millions of genetic markers and tens of thousands of phenotypes. However, the underlying biological mechanisms are yet to be elucidated. We can achieve the necessary interpretation of GWAS in a causal mediation framework, looking to establish a sparse set of mediators between genetic and downstream variables, but there are several challenges. Unlike existing methods, which rely on strong and unrealistic assumptions, we tackle practical challenges within a principled summary-based causal inference framework. We analyzed the proposed methods in extensive simulations generated from real-world genetic data. We demonstrated that only our approach can accurately recover causal genes, even without access to individual-level data, despite the presence of competing non-causal trails.

    arXiv:1901.08540. Jan 24, 2019.

199P. A latent topic model for mining heterogenous non-randomly missing electronic health records data (pdf)

    Li, Kellis

    Electronic health records (EHR) are a rich heterogeneous collection of patient health information, whose broad adoption provides great opportunities for systematic health data mining. However, heterogeneous EHR data types and biased ascertainment impose computational challenges. Here, we present mixEHR, an unsupervised generative model integrating collaborative filtering and latent topic models, which jointly models the discrete distributions of data observation bias and actual data using latent disease-topic distributions. We apply mixEHR to 12.8 million phenotypic observations from the MIMIC dataset, and use it to reveal latent disease topics, interpret EHR results, impute missing data, and predict mortality in intensive care units. Using both simulation and real data, we show that mixEHR outperforms previous methods and reveals meaningful multi-disease insights.

    arXiv:1811.00464. Nov 1, 2018

194P. A Bayesian approach to mediation analysis predicts 206 causal target genes in Alzheimer's disease (pdf)

    Park, Sarkar, He, Davila-Velderrain, De Jager, Kellis

    Characterizing the intermediate phenotypes, such as gene expression, that mediate genetic effects on complex diseases is a fundamental problem in human genetics. Existing methods utilize genotypic data and summary statistics to identify putative disease genes, but cannot distinguish pleiotropy from causal mediation and are limited by overly strong assumptions about the data. To overcome these limitations, we develop Causal Multivariate Mediation within Extended Linkage disequilibrium (CaMMEL), a novel Bayesian inference framework to jointly model multiple mediated and unmediated effects relying only on summary statistics. We show in simulation that CaMMEL accurately distinguishes between mediating and pleiotropic genes unlike existing methods. We applied CaMMEL to Alzheimer's disease (AD) and found 206 causal genes in sub-threshold loci (p < 1e-4). We prioritized 21 genes which mediate at least 5% of local genetic variance, disrupting innate immune pathways in AD.

    bioRxiv 219428. Dec 1, 2017. doi.org/10.1101/219428

194. Single-cell transcriptomic atlas of the human retina identifies cell types associated with age-related macular degeneration (pdf)

    Menon, Mohammadi, Davila-Velderrain, Goods, Cadwell, Xing, Stemmer-Rachamimov, Shalek, Love, Kellis, Hafler

    Genome-wide association studies (GWAS) have identified genetic variants associated with age-related macular degeneration (AMD), one of the leading causes of blindness in the elderly. However, it has been challenging to identify the cell types associated with AMD given the genetic complexity of the disease. Here we perform massively parallel single-cell RNA sequencing (scRNA-seq) of human retinas using two independent platforms, and report the first single-cell transcriptomic atlas of the human retina. Using a multi-resolution network-based analysis, we identify all major retinal cell types, and their corresponding gene expression signatures. Heterogeneity is observed within macroglia, suggesting that human retinal glia are more diverse than previously thought. Finally, GWAS-based enrichment analysis identifies glia, vascular cells, and cone photoreceptors to be associated with the risk of AMD. These data provide a detailed analysis of the human retina, and show how scRNA-seq can provide insight into cell types involved in complex, inflammatory genetic diseases

    Nature Communications 10(1):4902, Oct 25 2019. doi: 10.1038/s41467-019-12780-8

193. Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci (pdf)

    Mudge, Jungreis, Hunt, Gonzalez, Wright, Kay, Davidson, Fitzgerald, Seal, Tweedie, He, Waterhouse, Li, Bruford, Choudhary, Frankish, Kellis

    The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely-used tool to identify evolutionary signatures of protein-coding regions using multi-species genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine-learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyse over 1000 high-scoring human PhyloCSF regions, and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously-annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously-undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic datasets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions, by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein-altering. Altogether, our PhyloCSF datasets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterisation.

    Genome Research, Sep 19, 2019, gr.246462.118

192. Joint profiling of DNA methylation and chromatin architecture in single cells (pdf)

    Li, Liu, Zhang, Kubo, Yu, Fang, Kellis, Ren

    We report a molecular assay, Methyl-HiC, that can simultaneously capture the chromosome conformation and DNA methylome in a cell. Methyl-HiC reveals coordinated DNA methylation status between distal genomic segments that are in spatial proximity in the nucleus, and delineates heterogeneity of both the chromatin architecture and DNA methylome in a mixed population. It enables simultaneous characterization of cell-type-specific chromatin organization and epigenome in complex tissues.

191. Integrative construction of regulatory region networks in 127 human reference epigenomes by matrix factorization (pdf)

    Liu, Davila-Velderrain, Zhang, Kellis

    Despite large experimental and computational efforts aiming to dissect the mechanisms underlying disease risk, mapping cis-regulatory elements to target genes remains a challenge. Here, we introduce a matrix factorization framework to integrate physical and functional interaction data of genomic segments. The framework was used to predict a regulatory network of chromatin interaction edges linking more than 20 000 promoters and 1.8 million enhancers across 127 human reference epigenomes, including edges that are present in any of the input datasets. Our network integrates functional evidence of correlated activity patterns from epigenomic data and physical evidence of chromatin interactions. An important contribution of this work is the representation of heterogeneous data with different qualities as networks. We show that the unbiased integration of independent data sources suggestive of regulatory interactions produces meaningful associations supported by existing functional and physical evidence, correlating with expected independent biological features.

    Nucleic Acids Research 47(14):7235-7246, Aug 22 2019. doi: 10.1093/nar/gkz538

189. Elucidation of Codon Usage Signatures across the Domains of Life (pdf)

    Novoa, Jungreis, Jaillon, Kellis

    Due to the degeneracy of the genetic code, multiple codons are translated into the same amino acid. Despite being 'synonymous', these codons are not equally used. Selective pressures are thought to drive the choice among synonymous codons within a genome, while GC content, which is typically attributed to mutational drift, is the major determinant of variation across species. Here we find that in addition to GC content, inter-species codon usage signatures can also be detected. More specifically, we show that a single amino acid, arginine, is the major contributor to codon usage bias differences across domains of life. We then exploit this finding, and show that domain-specific codon bias signatures can be used to classify a given sequence into its corresponding domain of life with high accuracy. We then wondered whether the inclusion of codon autocorrelation patterns, which reflect the non-random distribution of codon occurrences throughout a transcript, might improve the classification performance of our algorithm. However, we find that autocorrelation patterns are not domain-specific, and surprisingly, are unrelated to tRNA reusage, in contrast to previous reports. Instead, our results suggest that codon autocorrelation patterns are a by-product of codon optimality throughout a sequence, where highly expressed genes display autocorrelated 'optimal' codons, whereas lowly expressed genes display autocorrelated 'non-optimal' codons.

    Molecular Biology and Evolution. May 20, 2019. doi: 10.1093/molbev/msz124

188. Rate of brain aging and APOE e4 are synergistic risk factors for Alzheimer's disease (pdf)

    Glorioso, Pfenning, Lee, Bennett, Sibille, Kellis, Guarente

    Advanced age and the APOE e4 allele are the two biggest risk factors for Alzheimer's disease (AD) and declining cognitive function. We describe a universal gauge to measure molecular brain age using transcriptome analysis of four human postmortem cohorts (n = 673, ages 25-97) free of neurological disease. In a fifth cohort of older subjects with or without neurological disease (n = 438, ages 67-108), we show that subjects with brains deviating in the older direction from what would be expected based on chronological age show an increase in AD, Parkinson's disease, and cognitive decline. Strikingly, a younger molecular age (-5 yr than chronological age) protects against AD even in the presence of APOE e4. An established DNA methylation gauge for age correlates well with the transcriptome gauge for determination of molecular age and assigning deviations from the expected. Our results suggest that rapid brain aging and APOE e4 are synergistic risk factors, and interventions that slow aging may substantially reduce risk of neurological disease and decline even in the presence of APOE e4.

    Life Sci Alliance 2(3). May 27, 2019. pii: e201900303. doi: 10.26508/lsa.201900303

186. Single-cell transcriptomic analysis of Alzheimer's disease (pdf)

    Mathys*, Davila-Velderrain*, Peng, Gao, Mohammadi, Young, Menon, He, Abdurrob, Jiang, Martorell, Ransohoff, Hafler, Bennett, Kellis*, Tsai*

    Alzheimer's disease is a pervasive neurodegenerative disorder, the molecular complexity of which remains poorly understood. Here, we analysed 80,660 single-nucleus transcriptomes from the prefrontal cortex of 48 individuals with varying degrees of Alzheimer's disease pathology. Across six major brain cell types, we identified transcriptionally distinct subpopulations, including those associated with pathology and characterized by regulators of myelination, inflammation, and neuron survival. The strongest disease-associated changes appeared early in pathological progression and were highly cell-type specific, whereas genes upregulated at late stages were common across cell types and primarily involved in the global stress response. Notably, we found that female cells were overrepresented in disease-associated subpopulations, and that transcriptional responses were substantially different between sexes in several cell types, including oligodendrocytes. Overall, myelination-related processes were recurrently perturbed in multiple cell types, suggesting that myelination has a key role in Alzheimer's disease pathophysiology. Our single-cell transcriptomic resource provides a blueprint for interrogating the molecular and cellular basis of Alzheimer's disease

    Nature. May 1, 2019. doi: 10.1038/s41586-019-1195-2

182. High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human (pdf)

    Wang, He, Goggin, Saadat, Wang, Sinnott-Armstrong, Claussnitzer*, Kellis*

    Genome-wide epigenomic maps have revealed millions of putative enhancers and promoters, but experimental validation of their function and high-resolution dissection of their driver nucleotides remain limited. Here, we present HiDRA (High-resolution Dissection of Regulatory Activity), a combined experimental and computational method for high-resolution genome-wide testing and dissection of putative regulatory regions. We test ~7 million accessible DNA fragments in a single experiment, by coupling accessible chromatin extraction with self-transcribing episomal reporters (ATAC-STARR-seq). By design, fragments are highly overlapping in densely-sampled accessible regions, enabling us to pinpoint driver regulatory nucleotides by exploiting differences in activity between partially-overlapping fragments using a machine learning model (SHARPR-RE). In GM12878 lymphoblastoid cells, we find ~65,000 regions showing enhancer function, and pinpoint ~13,000 high-resolution driver elements. These are enriched for regulatory motifs, evolutionarily-conserved nucleotides, and disease-associated genetic variants from genome-wide association studies. Overall, HiDRA provides a high-throughput, high-resolution approach for dissecting regulatory regions and driver nucleotides

    Nature Communications 9(1):5380. Dec 19, 2018. doi: 10.1038/s41467-018-07746-1

176. Allele-specific epigenome maps reveal sequence-dependent stochastic switching at regulatory loci (pdf)

    Onuchic, Lurie, Carrero, Pawliczek, Patel, Rozowsky, Galeev, Huang, Altshuler, Zhang, Harris, Coarfa, Ashmore, Bertol, Fakhouri, Yu, Kellis, Gerstein, Milosavljevic

    To assess the impact of genetic variation in regulatory loci on human health, we constructed a high-resolution map of allelic imbalances in DNA methylation, histone marks, and gene transcription in 71 epigenomes from 36 distinct cell and tissue types from 13 donors. Deep whole-genome bisulfite sequencing of 49 methylomes revealed sequence-dependent CpG methylation imbalances at thousands of heterozygous regulatory loci. Such loci are enriched for stochastic switching, which is defined as random transitions between fully methylated and unmethylated states of DNA. The methylation imbalances at thousands of loci are explainable by different relative frequencies of the methylated and unmethylated states for the two alleles. Further analyses provided a unifying model that links sequence-dependent allelic imbalances of the epigenome, stochastic switching at gene regulatory loci, and disease-associated genetic variation

    Science 361(6409). Sep 28, 2018. pii: eaar3146. doi: 10.1126/science.aar3146. Epub 2018 Aug 23

175. Analyses of mRNA structure dynamics identify embryonic gene regulatory programs (pdf)

    Beaudoin, Novoa, Vejnar, Yartseva, Takacs, Kellis, Giraldez

    RNA folding plays a crucial role in RNA function. However, knowledge of the global structure of the transcriptome is limited to cellular systems at steady state, thus hindering the understanding of RNA structure dynamics during biological transitions and how it influences gene function. Here, we characterized mRNA structure dynamics during zebrafish development. We observed that on a global level, translation guides structure rather than structure guiding translation. We detected a decrease in structure in translated regions and identified the ribosome as a major remodeler of RNA structure in vivo. In contrast, we found that 3' untranslated regions (UTRs) form highly folded structures in vivo, which can affect gene expression by modulating microRNA activity. Furthermore, dynamic 3'-UTR structures contain RNA-decay elements, such as the regulatory elements in nanog and ccna1, two genes encoding key maternal factors orchestrating the maternal-to-zygotic transition. These results reveal a central role of RNA structure dynamics in gene regulatory programs.

    Nature Structural & Molecular Biology 25(8):677-686. Aug 2018. doi: 10.1038/s41594-018-0091-z. Epub 2018 Jul 30.

171. ncdDetect2: Improved models of the site-specific mutation rate in cancer and driver detection with robust significance evaluation (pdf)

    Juul, Madsen, Guo, Bertl, Hobolth, Kellis, Pedersen

    Understanding the mutational processes that act during cancer development is a key topic of cancer biology. Nevertheless, much remains to be learned, as a complex interplay of processes with dependencies on a range of genomic features creates highly heterogeneous cancer genomes. Accurate driver detection relies on unbiased models of the mutation rate that also capture rate variation from uncharacterised sources. Here, we analyze patterns of observed-to-expected mutation counts across 505 whole cancer genomes, and find that genomic features missing from our mutation-rate model likely operate on a megabase length scale. We extend our site-specific model of the mutation rate to include the additional variance from these sources, which leads to robust significance evaluation of candidate cancer drivers. We thus present ncdDetect v.2, with greatly improved cancer driver detection specificity. Finally, we show that ranking candidates by the posterior mean value of their effect sizes offers an equivalent and more computationally efficient alternative to ranking by their p-values. ncdDetect v.2 is implemented as an R package and is freely available at http://github.com/TobiasMadsen/ncdDetect2

    Bioinformatics. 2018 Jun 26. doi: 10.1093/bioinformatics/bty511

169. Stop codon readthrough generates a C-terminally extended variant of the human vitamin D receptor with reduced calcitriol response (pdf)

    Loughran, Jungreis, Tzani, Power, Dmitriev, Ivanov, Kellis, Atkins

    Although stop codon readthrough is used extensively by viruses to expand their gene expression, verified instances of mammalian readthrough have only recently been uncovered by systems biology and comparative genomics approaches. Previously our analysis of conserved protein coding signatures that extend beyond annotated stop codons predicted stop codon readthrough of several mammalian genes, all of which have been validated experimentally. Four mRNAs display highly efficient stop codon readthrough, and these mRNAs have a UGA stop codon immediately followed by CUAG (UGA_CUAG) that is conserved throughout vertebrates. Extending on the identification of this readthrough motif, we here investigated stop codon readthrough, using tissue culture reporter assays, for all previously untested human genes containing UGA_CUAG. The readthrough efficiency of the annotated stop codon for the sequence encoding vitamin D receptor (VDR) was 6.7%. It was the highest of those tested but all showed notable levels of readthrough. The VDR is a member of the nuclear receptor superfamily of ligand-inducible transcription factors and binds its major ligand, calcitriol, via its C-terminal ligand-binding domain. Readthrough of the annotated VDR mRNA results in a 67 amino-acid-long C-terminal extension that generates a VDR proteoform named VDRx. VDRx may form homodimers and heterodimers with VDR but, compared to VDR, VDRx displayed a reduced transcriptional response to calcitriol even in the presence of its partner retinoid X receptor

166. Chromatin-state discovery and genome annotation with ChromHMM (pdf)

    Ernst, Kellis

    Noncoding DNA regions have central roles in human biology, evolution, and disease. ChromHMM helps to annotate the noncoding genome using epigenomic information across one or multiple cell types. It combines multiple genome-wide epigenomic maps, and uses combinatorial and spatial mark patterns to infer a complete annotation for each cell type. ChromHMM learns chromatin-state signatures using a multivariate hidden Markov model (HMM) that explicitly models the combinatorial presence or absence of each mark. ChromHMM uses these signatures to generate a genome-wide annotation for each cell type by calculating the most probable state for each genomic segment. ChromHMM provides an automated enrichment analysis of the resulting annotations to facilitate the functional interpretations of each chromatin state. ChromHMM is distinguished by its modeling emphasis on combinations of marks, its tight integration with downstream functional enrichment analyses, its speed, and its ease of use. Chromatin states are learned, annotations are produced, and enrichments are computed within 1 day.

    Nature Protocols 12(12):2478-2492, Dec 2017. doi: 10.1038/nprot.2017.124

165. Evidence of reduced recombination rate in human regulatory domains (pdf)

    Liu, Sarkar, Kheradpour, Ernst, Kellis

    Recombination rate is non-uniformly distributed across the human genome. The variation of recombination rate at both fine and large scales cannot be fully explained by DNA sequences alone. Epigenetic factors, particularly DNA methylation, have recently been proposed to influence the variation in recombination rate. We study the relationship between recombination rate and gene regulatory domains, defined by a gene and its linked control elements. We define these links using expression quantitative trait loci (eQTLs), methylation quantitative trait loci (meQTLs), chromatin conformation from publicly available datasets (Hi-C and ChIA-PET), and correlated activity links that we infer across cell types. Each link type shows a "recombination rate valley" of significantly reduced recombination rate compared to matched control regions. This recombination rate valley is most pronounced for gene regulatory domains of early embryonic development genes, housekeeping genes, and constitutive regulatory elements, which are known to show increased evolutionary constraint across species. Recombination rate valleys show increased DNA methylation, reduced double-stranded break initiation, and increased repair efficiency, specifically in the lineage leading to the germ line. Moreover, by using only the overlap of functional links and DNA methylation in germ cells, we are able to predict the recombination rate with high accuracy. Our results suggest the existence of a recombination rate valley at regulatory domains and provide a potential molecular mechanism to interpret the interplay between genetic and epigenetic variations.

    Genome Biology 18(1):193, Oct 20, 2017. doi: 10.1186/s13059-017-1308-x

158. Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease (pdf)

    eGTEx Consortium; Stranger, Brigham, Hasz, Hunter, Johns, Johnson, Kopen, Leinweber, Lonsdale, McDonald, Mestichelli, Myer, Roe, Salvatore, Shad, Thomas, Walters, Washington, Wheeler, Bridge, Foster, Gillard, Karasik, Kumar, Miklos, Moser, Jewell, Montroy, Rohrer, Valley, Davis, Mash, Gould, Guan, Koester, Little, Martin, Moore, Rao, Struewing, Volpi, Hansen, Hickey, Rizzardi, Hou, Liu, Molinie, Park, Rinaldi, Wang, Van, Claussnitzer, Gelfand, Li, Linder, Zhang, Smith, Tsang, Chen, Demanelis, Doherty, Jasmine, Kibriya, Jiang, Lin, Wang, Jian, Li, Chan, Bates, Diegel, Halow, Haugen, Johnson, Kaul, Lee, Maurano, Nelson, Neri, Sandstrom, Fernando, Linke, Oliva, Skol, Wu, Akey, Feinberg, Li, Pierce, Stamatoyannopoulos, Tang, Ardlie, Kellis, Snyder, Montgomery

    Genetic variants have been associated with myriad molecular phenotypes that provide new insight into the range of mechanisms underlying genetic traits and diseases. Identifying any particular genetic variant's cascade of effects, from molecule to individual, requires assaying multiple layers of molecular complexity. We introduce the Enhancing GTEx (eGTEx) project that extends the GTEx project to combine gene expression with additional intermediate molecular measurements on the same tissues to provide a resource for studying how genetic differences cascade through molecular phenotypes to impact human health.

    Nature Genetics, Oct 11, 2017. doi: 10.1038/ng.3969

153. Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions (pdf)

    Ernst, Melnikov, Zhang, Wang, Rogov, Mikkelsen, Kellis

    Massively parallel reporter assays (MPRAs) enable nucleotide-resolution dissection of transcriptional regulatory regions, such as enhancers, but only a few regions at a time. Here we present a combined experimental and computational approach, Systematic high-resolution activation and repression profiling with reporter tiling using MPRA (Sharpr-MPRA), that allows high-resolution analysis of thousands of regions simultaneously. Sharpr-MPRA combines dense tiling of overlapping MPRA constructs with a probabilistic graphical model to recognize functional regulatory nucleotides, and to distinguish activating and repressive nucleotides, using their inferred contribution to reporter gene expression. We used Sharpr-MPRA to test 4.6 million nucleotides spanning 15,000 putative regulatory regions tiled at 5-nucleotide resolution in two human cell types. Our results recovered known cell-type-specific regulatory motifs and evolutionarily conserved nucleotides, and distinguished known activating and repressive motifs. Our results also showed that endogenous chromatin state and DNA accessibility are both predictive of regulatory function in reporter assays, identified retroviral elements with activating roles, and uncovered 'attenuator' motifs with repressive roles in active chromatin.

    Nature Biotechnology, AOP, Oct 3, 2016

151. Evolutionary dynamics of abundant stop codon readthrough (pdf)

    Jungreis, Chan, Waterhouse, Fields, Lin, Kellis

    Translational stop codon readthrough emerged as a major regulatory mechanism affecting hundreds of genes in animal genomes, based on recent comparative genomics and ribosomal profiling evidence, but its evolutionary properties remain unknown. Here, we leverage comparative genomic evidence across 21 Anopheles mosquitoes to systematically annotate readthrough genes in the malaria vector Anopheles gambiae, and to provide the first study of abundant readthrough evolution, by comparison with 20 Drosophila species. Using improved comparative genomics methods for detecting readthrough, we identify evolutionary signatures of conserved, functional readthrough of 353 stop codons in the malaria vector, Anopheles gambiae, and of 51 additional Drosophila melanogaster stop codons, including several cases of double and triple readthrough and of readthrough of two adjacent stop codons. We find that most differences between the readthrough repertoires of the two species arose from readthrough gain or loss in existing genes, rather than birth of new genes or gene death; that readthrough-associated RNA structures are sometimes gained or lost while readthrough persists; that readthrough is more likely to be lost at TAA and TAG stop codons; and that readthrough is under continued purifying evolutionary selection in mosquito, based on population genetic evidence. We also determine readthrough-associated gene properties that predate readthrough, and identify differences in the characteristic properties of readthrough genes between clades. We estimate more than 600 functional readthrough stop codons in mosquito and 900 in fruit fly, provide evidence of readthrough control of peroxisomal targeting, and refine the phylogenetic extent of abundant readthrough as following divergence from centipede.

    Molecular Biology and Evolution, Sep 7, 2016

150. Joint Bayesian inference of risk variants and tissue-specific epigenomic enrichments across multiple complex human diseases (pdf)

    Li, Kellis

    Genome-wide association studies (GWAS) provide a powerful approach for uncovering disease-associated variants in humans, but fine-mapping the causal variants remains a challenge. This is partly remedied by prioritization of disease-associated variants that overlap GWAS-enriched epigenomic annotations. Here, we introduce a new Bayesian model, RiVIERA (Risk Variant Inference using Epigenomic Reference Annotations), for inference of driver variants from summary statistics across multiple traits using hundreds of epigenomic annotations. In simulation, RiVIERA showed promising power in detecting causal variants and causal annotations, and multi-trait joint inference further improved the detection power. We applied RiVIERA to model the existing GWAS summary statistics of 9 autoimmune diseases and schizophrenia by jointly harnessing the potential causal enrichments among 848 tissue-specific epigenomic annotations from the ENCODE/Roadmap consortia covering 127 cell/tissue types and 8 major epigenomic marks. RiVIERA identified meaningful tissue-specific enrichments for enhancer regions defined by H3K4me1 and H3K27ac for Blood T-Cell specifically in the nine autoimmune diseases and Brain-specific enhancer activities exclusively in schizophrenia. Moreover, the variants from the 95% credible sets exhibited high conservation and enrichments for GTEx whole-blood eQTLs located within transcription-factor-binding sites and DNA-hypersensitive sites. Furthermore, jointly modeling the nine immune traits by simultaneously inferring and exploiting the underlying epigenomic correlation between traits further improved the functional enrichments compared to single-trait models.

    Nucleic Acids Research gkw627, Jul 12, 2016

148. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures (pdf)

    Wang, Tucker, Rizki, Mills, Krijger, de Wit, Subramanian, Bartell, Nguyen, Ye, Leyton-Mange, Dolmatova, van der Harst, de Laat, Ellinor, Newton-Cheh, Milan, Kellis, Boyer

    Genetic variants identified by genome-wide association studies explain only a modest proportion of heritability, suggesting that meaningful associations lie 'hidden' below current thresholds. Here, we integrate information from association studies with epigenomic maps to demonstrate that enhancers significantly overlap known loci associated with the cardiac QT interval and QRS duration. We apply functional criteria to identify loci associated with QT interval that do not meet genome-wide significance and are missed by existing studies. We demonstrate that these 'sub-threshold' signals represent novel loci, and that epigenomic maps are effective at discriminating true biological signals from noise. We experimentally validate the molecular, gene-regulatory, cellular and organismal phenotypes of these sub-threshold loci, demonstrating that most sub-threshold loci have regulatory consequences and that genetic perturbation of nearby genes causes cardiac phenotypes in mouse. Our work provides a general approach for improving the detection of novel loci associated with complex human traits.

    eLife 5:e10557. May 10, 2016. pii: e10557. doi: 10.7554/eLife.10557

142. HaploReg v4: systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease. (scholar)

    Ward, Kellis

    More than 90% of common variants associated with complex traits do not affect proteins directly, but instead the circuits that control gene expression. This has increased the urgency of understanding the regulatory genome as a key component for translating genetic results into mechanistic insights and ultimately therapeutics. To address this challenge, we developed HaploReg (http://compbio.mit.edu/HaploReg) to aid the functional dissection of genome-wide association study (GWAS) results, the prediction of putative causal variants in haplotype blocks, the prediction of likely cell types of action, and the prediction of candidate target genes by systematic mining of comparative, epigenomic and regulatory annotations. Since first launching the website in 2011, we have greatly expanded HaploReg, increasing the number of chromatin state maps to 127 reference epigenomes from ENCODE 2012 and Roadmap Epigenomics, incorporating regulator binding data, expanding regulatory motif disruption annotations, and integrating expression quantitative trait locus (eQTL) variants and their tissue-specific target genes from GTEx, Geuvadis, and other recent studies. We present these updates as HaploReg v4, and illustrate a use case of HaploReg for attention deficit hyperactivity disorder (ADHD)-associated SNPs with putative brain regulatory mechanisms.

    Nucleic Acids Res. 2015 Dec 10. pii: gkv1340.

139. FTO Obesity Variant Circuitry and Adipocyte Browning in Humans (pdf) (scholar)

    Claussnitzer, Dankel, Kim, Quon, Meuleman, Haugen, Glunk, Sousa, Beaudry, Puviindran, Abdennur, Liu, Svensson, Hsu, Drucker, Mellgren, Hui, Hauner, Kellis

    Genome-wide association studies can be used to identify disease-relevant genomic regions, but interpretation of the data is challenging. The FTO region harbors the strongest genetic association with obesity, yet the mechanistic basis of this association remains elusive. We examined epigenomic data, allelic activity, motif conservation, regulator expression, and gene coexpression patterns, with the aim of dissecting the regulatory circuitry and mechanistic basis of the association between the FTO region and obesity. We validated our predictions with the use of directed perturbations in samples from patients and from mice and with endogenous CRISPR-Cas9 genome editing in samples from patients. Our data indicate that the FTO allele associated with obesity represses mitochondrial thermogenesis in adipocyte precursor cells in a tissue-autonomous manner. The rs1421085 T-to-C single-nucleotide variant disrupts a conserved motif for the ARID5B repressor, which leads to derepression of a potent preadipocyte enhancer and a doubling of IRX3 and IRX5 expression during early adipocyte differentiation. This results in a cell-autonomous developmental shift from energy-dissipating beige (brite) adipocytes to energy-storing white adipocytes, with a reduction in mitochondrial thermogenesis by a factor of 5, as well as an increase in lipid storage. Inhibition of Irx3 in adipose tissue in mice reduced body weight and increased energy dissipation without a change in physical activity or appetite. Knockdown of IRX3 or IRX5 in primary adipocytes from participants with the risk allele restored thermogenesis, increasing it by a factor of 7, and overexpression of these genes had the opposite effect in adipocytes from nonrisk-allele carriers. Repair of the ARID5B motif by CRISPR-Cas9 editing of rs1421085 in primary adipocytes from a patient with the risk allele restored IRX3 and IRX5 repression, activated browning expression programs, and restored thermogenesis, increasing it by a factor of 7. Our results point to a pathway for adipocyte thermogenesis regulation involving ARID5B, rs1421085, IRX3, and IRX5, which, when manipulated, had pronounced pro-obesity and anti-obesity effects.

    New England Journal of Medicine 373(10):895-907. Sep 3, 2015

137. Deep learning for regulatory genomics (pdf) (scholar)

    Park, Kellis

    A fundamental unit of gene-regulatory control is the contact between a regulatory protein and its target DNA or RNA molecule. Biophysical models that directly predict these interactions are incomplete and confined to specific types of structures, but computational analysis of large-scale experimental datasets allows regulatory motifs to be identified by their over-representation in target sequences. In this issue, Alipanahi et al. describe the use of a deep learning strategy to calculate protein-nucleic acid interactions from diverse experimental data sets. They show that their algorithm, called DeepBind, is broadly applicable and results in increased predictive power compared to traditional single-domain methods, and they use its predictions to discover regulatory motifs, to predict RNA editing and alternative splicing, and to interpret genetic variants. Looking beyond regulatory motifs, the current results illustrate the power of deep learning for biological data analysis in general. The approach can increase predictive power for specific tasks, integrate diverse datasets across data types, and provide greater generalization given the focus on representation learning and not simply classification accuracy. Systematic visualization and exploration of internal representations at each layer can yield mechanistic insights and guide new experiments and research directions. More broadly, deep learning can serve as a guiding principle to organize both hypothesis-driven research and exploratory investigation. For this potential to be realized, statistical and biological tasks must be integrated at all levels, including study design, experiment planning, model building and refinement, and data interpretation.

    Nature Biotechnology 33(8):825-6. Aug 7, 2015

134. Activity-Induced DNA Breaks Govern the Expression of Neuronal Early-Response Genes (pdf) (scholar)

    Madabhushi, Gao, Pfenning, Pan, Yamakawa, Seo, Rueda, Phan, Yamakawa, Pao, Stott, Gjoneska, Nott, Cho, Kellis, Tsai

    Neuronal activity causes the rapid expression of immediate early genes that are crucial for experience-driven changes to synapses, learning, and memory. Here, using both molecular and genome-wide next-generation sequencing methods, we report that neuronal activity stimulation triggers the formation of DNA double strand breaks (DSBs) in the promoters of a subset of early-response genes, including Fos, Npas4, and Egr1. Generation of targeted DNA DSBs within Fos and Npas4 promoters is sufficient to induce their expression even in the absence of an external stimulus. Activity-dependent DSB formation is likely mediated by the type II topoisomerase, Topoisomerase IIbeta (Topo IIbeta), and knockdown of Topo IIbeta attenuates both DSB formation and early-response gene expression following neuronal stimulation. Our results suggest that DSB formation is a physiological event that rapidly resolves topological constraints to early-response gene expression in neurons.

    Cell 161(7):1592-605. Jun 18, 2015

131. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans (pdf) (scholar)

    GTEx Consortium; Ardlie, Deluca, Segrè, Sullivan, Young, Gelfand, Trowbridge, Maller, Tukiainen, Lek, Ward, Kheradpour, Iriarte, Meng, Palmer, Esko, Winckler, Hirschhorn, Kellis, MacArthur, Getz, Shabalin, Li, Zhou, Nobel, Rusyn, Wright, Lappalainen, Ferreira, Ongen, Rivas, Battle, Mostafavi, Monlong, Sammeth, Melé, Reverter, Goldmann, Koller, Guigó, McCarthy, Dermitzakis, Gamazon, Im, Konkashbaev, Nicolae, Cox, Flutre, Wen, Stephens, Pritchard, Tu, Zhang, Huang, Long, Lin, Yang, Zhu, Liu, Brown, Mestichelli, Tidwell, Lo, Salvatore, Shad, Thomas, Lonsdale, Moser, Gillard, Karasik, Ramsey, Choi, Foster, Syron, Fleming, Magazine, Hasz, Walters, Bridge, Miklos, Sullivan, Barker, Traino, Mosavel, Siminoff, Valley, Rohrer, Jewell, Branton, Sobin, Barcus, Qi, McLean, Hariharan, Um, Wu, Tabor, Shive, Smith, Buia, Undale, Robinson, Roche, Valentino, Britton, Burges, Bradbury, Hambright, Seleski, Korzeniewski, Erickson, Marcus, Tejada, Taherian, Lu, Basile, Mash, Volpi, Struewing, Temple, Boyer, Colantuoni, Little, Koester, Carithers, Moore, Guan, Compton, Sawyer, Demchok, Vaught, Rabiner, Lockhart, Ardlie, Getz, Wright, Kellis, Volpi, Dermitzakis

    Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysis of RNA sequencing data from 1641 samples across 43 tissues from 175 individuals, generated as part of the pilot phase of the Genotype-Tissue Expression (GTEx) project. We describe the landscape of gene expression across tissues, catalog thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants, describe complex network relationships, and identify signals from genome-wide association studies explained by eQTLs. These findings provide a systematic understanding of the cellular and biological consequences of human genetic variation and of the heterogeneity of such effects among a diverse set of human tissues.

    Science 348(6235):648-60. May 8, 2015

127. Integrative analysis of 111 reference human epigenomes (pdf) (scholar)

    Roadmap Epigenomics Consortium, Kundaje, Meuleman, Ernst, Bilenky, Yen, Heravi-Moussavi, Kheradpour, Zhang, Wang, Ziller, Amin, Whitaker, Schultz, Ward, Sarkar, Quon, Sandstrom, Eaton, Wu, Pfenning, Wang, Claussnitzer, Liu, Coarfa, Harris, Shoresh, Epstein, Gjoneska, Leung, Xie, Hawkins, Lister, Hong, Gascard, Mungall, Moore, Chuah, Tam, Canfield, Hansen, Kaul, Sabo, Bansal, Carles, Dixon, Farh, Feizi, Karlic, Kim, Kulkarni, Li, Lowdon, Elliott, Mercer, Neph, Onuchic, Polak, Rajagopal, Ray, Sallari, Siebenthall, Sinnott-Armstrong, Stevens, Thurman, Wu, Zhang, Zhou, Beaudet, Boyer, De Jager, Farnham, Fisher, Haussler, Jones, Li, Marra, McManus, Sunyaev, Thomson, Tlsty, Tsai, Wang, Waterland, Zhang, Chadwick, Bernstein, Costello, Ecker, Hirst, Meissner, Milosavljevic, Ren, Stamatoyannopoulos, Wang, Kellis

    The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

    Nature 518:317-30. Feb 19, 2015 doi:10.1038/nature14248. PMID 25693563

126. Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer's disease (pdf)

    Gjoneska, Pfenning, Mathys, Quon, Kundaje, Tsai, Kellis

    Alzheimer's disease (AD) is a severe age-related neurodegenerative disorder characterized by accumulation of amyloid-beta plaques and neurofibrillary tangles, synaptic and neuronal loss, and cognitive decline. Several genes have been implicated in AD, but chromatin state alterations during neurodegeneration remain uncharacterized. Here we profile transcriptional and chromatin state dynamics across early and late pathology in the hippocampus of an inducible mouse model of AD-like neurodegeneration. We find a coordinated downregulation of synaptic plasticity genes and regulatory regions, and upregulation of immune response genes and regulatory regions, which are targeted by factors that belong to the ETS family of transcriptional regulators, including PU.1. Human regions orthologous to increasing-level enhancers show immune-cell-specific enhancer signatures as well as immune cell expression quantitative trait loci, while decreasing-level enhancer orthologues show fetal-brain-specific enhancer activity. Notably, AD-associated genetic variants are specifically enriched in increasing-level enhancer orthologues, implicating immune processes in AD predisposition. Indeed, increasing enhancers overlap known AD loci lacking protein-altering variants, and implicate additional loci that do not reach genome-wide significance. Our results reveal new insights into the mechanisms of neurodegeneration and establish the mouse as a useful model for functional studies of AD regulatory regions.

    Nature 518:365-9. Feb 19, 2015 doi: 10.1038/nature14252. PMID 25693568

124. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues (pdf) (scholar)

    Ernst, Kellis

    With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.

    Nature Biotechnology Feb 18, 2015 doi 10.1038/nbt.3157 PMID 25690853

117. Comparative analysis of regulatory information and circuits across distant species (scholar)

    Boyle, Araya, Brdlik, Cayting, Cheng, Cheng, Gardner, Hillier, Janette, Jiang, Kasper, Kawli, Kheradpour, Kundaje, Li, Ma, Niu, Rehm, Rozowsky, Slattery, Spokony, Terrell, Vafeados, Wang, Weisdepp, Wu, Xie, Yan, Feingold, Good, Pazin, Huang, Bickel, Brenner, Reinke, Waterston, Gerstein, White, Kellis, Snyder

    Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.

    Nature 512(7515):453-6, Aug 28, 2014

108. Defining functional DNA elements in the human genome (scholar)

    Kellis, Wold, Snyder, Bernstein, Kundaje, Marinov, Ward, Birney, Crawford, Dekker, Dunham, Elnitski, Farnham, Feingold, Gerstein, Giddings, Gilbert, Gingeras, Green, Guigo, Hubbard, Kent, Lieb, Myers, Pazin, Ren, Stamatoyannopoulos, Weng, White, Hardison

    With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.

    PNAS Apr 23, 2014

107. Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals (pdf) (scholar)

    Washietl, Kellis*, Garber*

    Long intergenic noncoding RNAs (lincRNAs) play diverse regulatory roles in human development and disease, but little is known about their evolutionary history and constraint. Here, we characterize human lincRNA expression patterns in nine tissues across six mammalian species and multiple individuals. Of the 1898 human lincRNAs expressed in these tissues, we find orthologous transcripts for 80% in chimpanzee, 63% in rhesus, 39% in cow, 38% in mouse, and 35% in rat. Mammalian-expressed lincRNAs show remarkably strong conservation of tissue specificity, suggesting that it is selectively maintained. In contrast, abundant splice-site turnover suggests that exact splice sites are not critical. Relative to evolutionarily young lincRNAs, mammalian-expressed lincRNAs show higher primary sequence conservation in their promoters and exons, increased proximity to protein-coding genes enriched for tissue-specific functions, fewer repeat elements, and more frequent single-exon transcripts. Remarkably, we find that ~20% of human lincRNAs are not expressed beyond chimpanzee and are undetectable even in rhesus. These hominid-specific lincRNAs are more tissue specific, enriched for testis, and faster evolving within the human lineage.

    Genome Research 24(4):616-28, Jan 15, 2014

104. Energy-based RNA consensus secondary structure prediction in multiple sequence alignments (pdf) (scholar)

    Washietl, Bernhart, Kellis

    Many biologically important RNA structures are conserved in evolution, leading to characteristic mutational patterns. RNAalifold is a widely used program to predict consensus secondary structures in multiple alignments by combining evolutionary information with traditional energy-based RNA folding algorithms. Here we describe the theory and applications of the RNAalifold algorithm. Consensus secondary structure prediction not only leads to significantly more accurate structure models, but also allows the study of structural conservation of functional RNAs.

    Methods in Molecular Biology 1097:125-41, 2014

101. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo (pdf) (scholar)

    Rouskin, Zubradt, Washietl, Kellis, Weissman

    RNA has a dual role as an informational molecule and a direct effector of biological tasks. The latter function is enabled by RNA's ability to adopt complex secondary and tertiary folds and thus has motivated extensive computational and experimental efforts for determining RNA structures. Existing approaches for evaluating RNA structure have been largely limited to in vitro systems, yet the thermodynamic forces which drive RNA folding in vitro may not be sufficient to predict stable RNA structures in vivo. Indeed, the presence of RNA-binding proteins and ATP-dependent helicases can influence which structures are present inside cells. Here we present an approach for globally monitoring RNA structure in native conditions in vivo with single-nucleotide precision. This method is based on in vivo modification with dimethyl sulphate (DMS), which reacts with unpaired adenine and cytosine residues, followed by deep sequencing to monitor modifications. Our data from yeast and mammalian cells are in excellent agreement with known messenger RNA structures and with the high-resolution crystal structure of the Saccharomyces cerevisiae ribosome. Comparison between in vivo and in vitro data reveals that in rapidly dividing cells there are vastly fewer structured mRNA regions in vivo than in vitro. Even thermostable RNA structures are often denatured in cells, highlighting the importance of cellular processes in regulating RNA structure. Indeed, analysis of mRNA structure under ATP-depleted conditions in yeast shows that energy-dependent processes strongly contribute to the predominantly unfolded state of mRNAs inside cells. Our studies broadly enable the functional analysis of physiological RNA structures and reveal that, in contrast to the Anfinsen view of protein folding whereby the structure formed is the most thermodynamically favourable, thermodynamics have an incomplete role in determining mRNA structure in vivo.

    Nature 505:701-705, Dec 15, 2013

100. Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments (pdf) (scholar)

    Kheradpour, Kellis

    Recent advances in technology have led to a dramatic increase in the number of available transcription factor ChIP-seq and ChIP-chip data sets. Understanding the motif content of these data sets is an important step in understanding the underlying mechanisms of regulation. Here we provide a systematic motif analysis for 427 human ChIP-seq data sets using motifs curated from the literature and also discovered de novo using five established motif discovery tools. We use a systematic pipeline for calculating motif enrichment in each data set, providing a principled way for choosing between motif variants found in the literature and for flagging potentially problematic data sets. Our analysis confirms the known specificity of 41 of the 56 analyzed factor groups and reveals motifs of potential cofactors. We also use cell type-specific binding to find factors active in specific conditions. The resource we provide is accessible both for browsing a small number of factors and for performing large-scale systematic analyses. We provide motif matrices, instances and enrichments in each of the ENCODE data sets. The motifs discovered here have been used in parallel studies to validate the specificity of antibodies, understand cooperativity between data sets and measure the variation of motif binding across individuals and species.

    Nucleic Acids Res. 2013 Dec 13

99. Most parsimonious reconciliation in the presence of gene duplication, loss, and deep coalescence using labeled coalescent trees (pdf) (scholar)

    Wu, Rasmussen, Bansal, Kellis

    Accurate gene tree-species tree reconciliation is fundamental to inferring the evolutionary history of a gene family. However, although it has long been appreciated that population-related effects such as incomplete lineage sorting (ILS) can dramatically affect the gene tree, many of the most popular reconciliation methods consider discordance only due to gene duplication and loss (and sometimes horizontal gene transfer). Methods that do model ILS are either highly parameterized or consider a restricted set of histories, thus limiting their applicability and accuracy. To address these challenges, we present a novel algorithm DLCpar for inferring a most parsimonious (MP) history of a gene family in the presence of duplications, losses, and ILS. Our algorithm relies on a new reconciliation structure, the labeled coalescent tree (LCT), that simultaneously describes coalescent and duplication-loss history. We show that the LCT representation enables an exhaustive and efficient search over the space of reconciliations, and, for most gene families, the least common ancestor (LCA) mapping is an optimal solution for the species mapping between the gene tree and species tree in a MP LCT. Applying our algorithm to a variety of clades, including flies, fungi, and primates, as well as to simulated phylogenies, we achieve high accuracy, comparable to sophisticated probabilistic reconciliation methods, at reduced runtime and with far fewer parameters. These properties enable inference of complex evolution of gene families across a broad range of species and large data sets.

    Genome Research 24(3):475-86, Dec 5, 2013.
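
    The least common ancestor (LCA) mapping mentioned in the abstract above can be illustrated with a minimal sketch. This is the classic LCA species mapping on a toy example, not the DLCpar algorithm or the labeled coalescent tree itself; the species tree, gene names, and function names are all hypothetical and chosen only for illustration.

```python
# Toy species tree stored as child -> parent (hypothetical example data).
SPECIES_PARENT = {
    "human": "primates",
    "chimp": "primates",
    "primates": "mammals",
    "mouse": "mammals",
    "mammals": None,
}


def ancestors(node, parent):
    """Return the path from a species node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def lca(a, b, parent):
    """Least common ancestor of two species-tree nodes."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes do not share an ancestor")


def lca_mapping(gene_tree, leaf_species, parent, mapping=None):
    """Map every gene-tree node to a species-tree node.

    gene_tree is a nested 2-tuple of leaf names (strings). Leaves map to
    their own species; internal nodes map to the LCA of their children.
    """
    if mapping is None:
        mapping = {}
    if isinstance(gene_tree, str):
        mapping[gene_tree] = leaf_species[gene_tree]
    else:
        left, right = gene_tree
        lca_mapping(left, leaf_species, parent, mapping)
        lca_mapping(right, leaf_species, parent, mapping)
        mapping[gene_tree] = lca(mapping[left], mapping[right], parent)
    return mapping


leaf_species = {"h1": "human", "m1": "mouse", "h2": "human", "c2": "chimp"}
gene_tree = (("h1", "m1"), ("h2", "c2"))
mapping = lca_mapping(gene_tree, leaf_species, SPECIES_PARENT)
for node, species in mapping.items():
    print(node, "->", species)
```

    In this toy case the root of the gene tree maps to the same species node as one of its children, which a duplication-loss reading would interpret as a duplication; handling incomplete lineage sorting, as the abstract describes, requires more than this mapping alone.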

97. Extensive Variation in Chromatin States Across Humans (pdf) (scholar)

    Kasowski, Kyriazopoulou-Panagiotopoulou, Grubert, Zaugg, Kundaje, Liu, Boyle, Zhang, Zakharia, Spacek, Li, Xie, Olarerin-George, Steinmetz, Hogenesch, Kellis, Batzoglou, Snyder

    The majority of disease-associated variants lie outside protein-coding regions, suggesting a link between variation in regulatory regions and disease predisposition. We studied differences in chromatin states using five histone modifications, cohesin, and CTCF in lymphoblastoid lines from 19 individuals of diverse ancestry. We found extensive signal variation in regulatory regions, which often switch between active and repressed states across individuals. Enhancer activity is particularly diverse among individuals, whereas gene expression remains relatively stable. Chromatin variability shows genetic inheritance in trios, correlates with genetic variation and population divergence, and is associated with disruptions of transcription factor binding motifs. Overall, our results provide insights into chromatin variation among humans.

    Science. Oct 17, 2013

96. Reconciliation revisited: handling multiple optima when reconciling with duplication, transfer, and loss (pdf) (scholar)

    Bansal, Alm, Kellis

    Phylogenetic tree reconciliation is a powerful approach for inferring evolutionary events like gene duplication, horizontal gene transfer, and gene loss, which are fundamental to our understanding of molecular evolution. While duplication-loss (DL) reconciliation leads to a unique maximum-parsimony solution, duplication-transfer-loss (DTL) reconciliation yields a multitude of optimal solutions, making it difficult to infer the true evolutionary history of the gene family. This problem is further exacerbated by the fact that different event cost assignments yield different sets of optimal reconciliations. Here, we present an effective, efficient, and scalable method for dealing with these fundamental problems in DTL reconciliation. Our approach works by sampling the space of optimal reconciliations uniformly at random and aggregating the results. We show that even gene trees with only a few dozen genes often have millions of optimal reconciliations and present an algorithm to efficiently sample the space of optimal reconciliations uniformly at random in O(mn^2) time per sample, where m and n denote the number of genes and species, respectively. We use these samples to understand how different optimal reconciliations vary in their node mappings and event assignments and to investigate the impact of varying event costs. We apply our method to a biological dataset of approximately 4700 gene trees from 100 taxa and observe that 93% of event assignments and 73% of mappings remain consistent across different multiple optima. Our analysis represents the first systematic investigation of the space of optimal DTL reconciliations and has many important implications for the study of gene family evolution.

    RECOMB 2013 and Journal of Computational Biology 20:738-54, Sept 14, 2013.
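
    As an illustration of the sampling idea in the abstract above (not the authors' DTL algorithm), the sketch below counts co-optimal solutions in a dynamic program and then draws one of them uniformly at random. A small weighted DAG stands in for the reconciliation search space; the graph, node names, and function names are all hypothetical.

```python
import random

# Hypothetical toy search space: node -> list of (next_node, integer cost).
TOY_GRAPH = {
    "s": [("a", 1), ("b", 1), ("t", 5)],
    "a": [("t", 2)],
    "b": [("c", 1), ("d", 1)],
    "c": [("t", 1)],
    "d": [("t", 1)],
}

def count_optimal(graph, target):
    """Return (best, count): the minimum cost from each node to `target`
    and the number of distinct minimum-cost paths that achieve it."""
    best, count = {}, {}

    def visit(node):
        if node in best:
            return
        if node == target:
            best[node], count[node] = 0, 1
            return
        node_best, node_count = float("inf"), 0
        for nxt, cost in graph.get(node, []):
            visit(nxt)
            total = cost + best[nxt]
            if total < node_best:
                node_best, node_count = total, count[nxt]
            elif total == node_best:
                node_count += count[nxt]
        best[node], count[node] = node_best, node_count

    for node in list(graph):
        visit(node)
    visit(target)
    return best, count

def sample_optimal_path(graph, source, target, best, count, rng):
    """Draw one minimum-cost path uniformly at random. Each step picks an
    optimal successor with probability proportional to the number of
    optimal completions below it. Assumes `source` can reach `target`."""
    path, node = [source], source
    while node != target:
        options = [(nxt, count[nxt]) for nxt, cost in graph[node]
                   if cost + best[nxt] == best[node]]
        nodes, weights = zip(*options)
        node = rng.choices(nodes, weights=weights)[0]
        path.append(node)
    return path

best, count = count_optimal(TOY_GRAPH, "t")
print(best["s"], count["s"])
print(sample_optimal_path(TOY_GRAPH, "s", "t", best, count, random.Random(0)))
```

    Weighting each choice by the count of optimal completions is what makes the draw uniform over whole solutions rather than over local choices; aggregating many such samples then shows which parts of a solution are shared by all optima.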

94. Network deconvolution as a general method to distinguish direct dependencies in networks (pdf) (scholar)

    Feizi, Marbach, Medard, Kellis

    Recognizing direct relationships between variables connected in a network is a pervasive problem in biological, social and information sciences as correlation-based networks contain numerous indirect relationships. Here we present a general method for inferring direct effects from an observed correlation matrix containing both direct and indirect effects. We formulate the problem as the inverse of network convolution, and introduce an algorithm that removes the combined effect of all indirect paths of arbitrary length in a closed-form solution by exploiting eigen-decomposition and infinite-series sums. We demonstrate the effectiveness of our approach in several network applications: distinguishing direct targets in gene expression regulatory networks; recognizing directly interacting amino-acid residues for protein structure prediction from sequence alignments; and distinguishing strong collaborations in co-authorship social networks using connectivity information alone. In addition to its theoretical impact as a foundational graph theoretic tool, our results suggest network deconvolution is widely applicable for computing direct dependencies in network science across diverse disciplines.

    Nature Biotechnology, Jul 14, 2013
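
    The closed-form inversion described in the abstract above can be sketched as follows, assuming the observed matrix is the sum of all powers of the direct matrix, G_obs = G_dir + G_dir^2 + ..., so that G_dir = G_obs (I + G_obs)^-1. This sketch inverts the matrix directly with Gauss-Jordan elimination instead of using the eigen-decomposition the abstract mentions, and it omits any scaling or preprocessing the published method may apply. All function names and the example matrix are illustrative.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    """Element-wise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    aug = [list(row) + identity(n)[i] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def observe(g_dir, terms=200):
    """Simulate convolution: G_dir + G_dir^2 + ... (truncated series).
    Converges when the spectral radius of G_dir is below 1."""
    total = [row[:] for row in g_dir]
    power = [row[:] for row in g_dir]
    for _ in range(terms - 1):
        power = matmul(power, g_dir)
        total = add(total, power)
    return total

def deconvolve(g_obs):
    """Recover direct effects: G_dir = G_obs (I + G_obs)^-1."""
    n = len(g_obs)
    return matmul(g_obs, inverse(add(identity(n), g_obs)))

# Hypothetical chain network 0 - 1 - 2 with no direct 0 - 2 edge.
g_dir = [[0.0, 0.4, 0.0],
         [0.4, 0.0, 0.3],
         [0.0, 0.3, 0.0]]
g_obs = observe(g_dir)
print("observed 0-2 weight:", round(g_obs[0][2], 3))
print("deconvolved 0-2 weight:", round(deconvolve(g_obs)[0][2], 3))
```

    In the toy chain, nodes 0 and 2 pick up a nonzero observed weight purely through node 1; the deconvolved matrix returns that entry to approximately zero while keeping the two direct edges.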

89. Systematic dissection of regulatory motifs in 2,000 predicted human enhancers using a massively parallel reporter assay (pdf) (scholar)

    Kheradpour, Ernst, Melnikov, Rogov, Wang, Zhang, Alston, Mikkelsen, Kellis

    Genome-wide chromatin maps have permitted the systematic mapping of putative regulatory elements across multiple human cell types, revealing tens of thousands of candidate distal enhancer regions. However, until recently, their experimental dissection by directed regulatory motif disruption has remained unfeasible at the genome scale, due to the technological lag in large-scale DNA synthesis. Here, we employ a massively parallel reporter assay (MPRA) to measure the transcriptional levels induced by 145bp DNA segments centered on evolutionarily-conserved regulatory motif instances and found in enhancer chromatin states. We select five predicted activators (HNF1, HNF4, FOXA, GATA, NFE2L2) and two predicted repressors (GFI1, ZFP161) and measure reporter expression in erythroleukemia (K562) and liver carcinoma (HepG2) cell lines. We test 2,104 wild-type sequences and an additional 3,314 engineered enhancer variants containing targeted motif disruptions, each using 10 barcode tags in two cell lines and 2 replicates. The resulting data strongly confirm the enhancer activity and cell type specificity of enhancer chromatin states, the ability of 145bp segments to recapitulate both, the necessary role of regulatory motifs in enhancer function, and the complementary roles of activator and repressor motifs. We find statistically robust evidence that (1) scrambling, removing, or disrupting the predicted activator motifs abolishes enhancer function, while silent or motif-improving changes maintain enhancer activity; (2) evolutionary conservation, nucleosome exclusion, binding of other factors, and strength of the motif match are all associated with wild-type enhancer activity; (3) scrambling repressor motifs leads to aberrant reporter expression in cell lines where the enhancers are usually not active. 
    Our results suggest a general strategy for deciphering cis-regulatory elements by systematic large-scale experimental manipulation, and provide quantitative enhancer activity measurements across thousands of constructs that can be mined to generate and test predictive models of gene expression.

    Genome Research doi:10.1101/gr.144899.112, March 19, 2013

81. Interpreting noncoding genetic variation in complex traits and human disease (pdf) (scholar)

    Ward, Kellis

    Association studies provide genome-wide information about the genetic basis of complex disease, but medical research has focused primarily on protein-coding variants, owing to the difficulty of interpreting noncoding mutations. This picture has changed with advances in the systematic annotation of functional noncoding elements. Evolutionary conservation, functional genomics, chromatin state, sequence motifs and molecular quantitative trait loci all provide complementary information about the function of noncoding sequences. These functional maps can help with prioritizing variants on risk haplotypes, filtering mutations encountered in the clinic and performing systems-level analyses to reveal processes underlying disease associations. Advances in predictive modeling can enable data-set integration to reveal pathways shared across loci and alleles, and richer regulatory models can guide the search for epistatic interactions. Lastly, new massively parallel reporter experiments can systematically validate regulatory predictions. Ultimately, advances in regulatory and systems genomics can help unleash the value of whole-genome sequencing for personalized genomic risk assessment, diagnosis and treatment.

    Nature Biotechnology 30:1095-1106, Nov 2012

77. Evidence of Abundant Purifying Selection in Humans for Recently Acquired Regulatory Functions (pdf) (scholar)

    Ward, Kellis

    Although only 5% of the human genome is conserved across mammals, a substantially larger portion is biochemically active, raising the question of whether the additional elements evolve neutrally or confer a lineage-specific fitness advantage. To address this question, we integrate human variation information from the 1000 Genomes Project and activity data from the ENCODE Project. A broad range of transcribed and regulatory nonconserved elements show decreased human diversity, suggesting lineage-specific purifying selection. Conversely, conserved elements lacking activity show increased human diversity, suggesting that some recently became nonfunctional. Regulatory elements under human constraint in nonconserved regions were found near color vision and nerve-growth genes, consistent with purifying selection for recently evolved functions. Our results suggest continued turnover in regulatory regions, with at least an additional 4% of the human genome subject to lineage-specific constraint.

    Science 337:1675-8, Sep 5, 2012

74. An integrated encyclopedia of DNA elements in the human genome (pdf) (scholar)

    ENCODE Project Consortium

    The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

    Nature 489:57-74. Sep 6, 2012

61. A high-resolution map of human evolutionary constraint using 29 mammals (pdf) (scholar)

    Lindblad-Toh, Garber, Zuk, Lin, Parker, Washietl, Kheradpour, Ernst, Jordan, Mauceli, Ward, Lowe, Holloway, Clamp, Gnerre, Alfoldi, Beal, Chang, Clawson, Palma, Fitzgerald, Flicek, Guttman, Hubisz, Jaffe, Jungreis, Kostka, Lara, Martins, Massingham, Moltke, Raney, Rasmussen, Stark, Vilella, Wen, Xie, Zody, Worley, Kovar, Muzny, Gibbs, Warren, Mardis, Weinstock, Wilson, Birney, Margulies, Herrero, Green, Haussler, Siepel, Goldman, Pollard, Pedersen, Lander, Kellis

    The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering 4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for 60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.

    Nature 478:476-82, Oct 12 2011

49. Mapping and analysis of chromatin state dynamics in nine human cell types (pdf) (scholar)

    Ernst, Kheradpour, Mikkelsen, Shoresh, Ward, Epstein, Zhang, Wang, Issner, Coyne, Ku, Durham, Kellis*, Bernstein*

    Chromatin profiling has emerged as a powerful means for annotating genomic elements and detecting regulatory activity. Here we generate and analyze a compendium of epigenomic maps for nine chromatin marks across nine cell types, in order to systematically characterize cis-regulatory elements, their cell type-specificities, and their functional interactions. We first identify recurrent combinations of histone modifications and use them to annotate diverse regulatory elements including promoters, enhancers, transcripts and insulators in each cell type. We next characterize the dynamics of these elements, revealing meaningful patterns of activity for promoter states and exquisite cell type-selectivity for enhancer states. We define multi-cell activity profiles that reflect the patterns of enhancer state activity across cell types, as well as analogous profiles for gene expression, regulatory motif enrichments, and expression of the corresponding regulators. We use correlations between these profiles to link candidate enhancers to putative target genes, to infer cell type-specific activators and repressors, and to predict and validate functional regulator binding motifs in specific chromatin states. These functional annotations and regulatory predictions enable us to revisit intergenic single-nucleotide polymorphisms (SNPs) associated with human disease in genome-wide association studies (GWAS). We find that for several diseases, top-scoring SNPs are precisely positioned within enhancer elements specifically active in relevant cell types. In several cases a disease variant affects a motif instance for one of the predicted causal regulators, thus providing a potential mechanistic explanation for the disease association. Our study presents a general framework for applying multi-cell chromatin state analysis to decipher cis-regulatory connections and their role in health and disease.

    Nature, doi:10.1038/nature09906, Epub ahead of print: March 23, 2011
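
    The profile-correlation step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes plain Pearson correlation between one enhancer's activity profile and each candidate gene's expression profile across nine cell types; the published analysis combines several kinds of profiles, so this shows the idea only. The gene names and numbers are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def link_enhancer(enhancer_profile, gene_profiles):
    """Return the best-correlated gene and the full score table."""
    scores = {gene: pearson(enhancer_profile, profile)
              for gene, profile in gene_profiles.items()}
    return max(scores, key=scores.get), scores

# Hypothetical activity of one enhancer across nine cell types.
enhancer = [0.9, 0.1, 0.1, 0.8, 0.0, 0.1, 0.0, 0.9, 0.1]

# Hypothetical expression of three nearby genes in the same cell types.
genes = {
    "GENE_A": [8.0, 1.0, 1.5, 7.0, 0.5, 1.0, 0.8, 9.0, 1.2],  # tracks the enhancer
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 2.0, 1.8, 2.1, 2.0],  # roughly flat
    "GENE_C": [1.0, 7.0, 8.0, 1.5, 9.0, 7.0, 8.0, 0.5, 7.5],  # opposite pattern
}

target, scores = link_enhancer(enhancer, genes)
print("putative target:", target)
for gene, r in sorted(scores.items()):
    print(gene, round(r, 2))
```

    A strongly negative score, as for the hypothetical GENE_C, is the kind of signal that could point to a repressive rather than an activating relationship.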

48. A Cis-Regulatory Map of the Drosophila Genome (pdf) (scholar)

    Negre, Brown, Ma, Bristow, Miller, Kheradpour, Loriaux, Sealfon, Li, Ishii, Spokony, Chen, Hwang, Wagner, Auburn, Domanus, Shah, Morrison, Zieba, Suchy, Senderowicz, Victorsen, Bild, Grundstad, Hanley, Mannervik, Venken, Bellen, White, Russell, Grossman, Ren, Posakony, Kellis, White

    Following the sequencing of human and model organism genomes, genome-wide annotation of regulatory information has emerged as a major challenge. Here we describe an initial map of the Drosophila melanogaster regulatory genome based on the developmental dynamics of chromatin modifications and chromatin modifying enzymes, on polymerase occupancy of promoters, on the dynamic binding of enhancer-associated proteins such as the transcriptional co-factor CBP, and on the localization of forty-one site-specific transcription factors at different stages of development. The entire dataset provides protein modification and binding annotations across 94% of the genome along with prediction and validation of 4 classes of regulatory elements: insulators, promoters, silencers and enhancers. This regulatory map reveals several newly discovered properties of genome regulation, including the lack of epigenetic marks at promoters of transiently expressed genes, the association of specific Histone Deacetylases (HDACs) with Polycomb Response Elements, the early role of CBP as a marker of enhancers and the occurrence of high-occupancy transcription factor binding sites that correlate with gene expression. Using these data we also generated a combinatorial analysis of transcription factors and DNA sequence motifs that are associated with different sets of developmentally co-expressed genes, providing a database for discovering the sets of regulatory inputs that control regulatory element function. Together, these cis-regulatory annotations serve as a foundation for further detailed analyses of the genomic regulatory code in Drosophila.

    Nature 471:527-531, March 23, 2011.

46. Identification of functional elements and regulatory circuits in Drosophila by large-scale data integration (pdf) (scholar)

    The modENCODE Consortium, Roy, Ernst, Kharchenko, Kheradpour, Negre, Eaton, Landolin, Bristow, Ma, Lin, Washietl, Arshinoff, Ay, Meyer, Robine, Washington, Di Stefano, Berezikov, Brown, Brown, Candeias, Carlson, Carr, Jungreis, Marbach, Sealfon, Tolstorukov, Alekseyenko, Artieri, Boley, Booth, Brooks, Dai, Davis, Duff, Feng, Gorchakov, Gu, Henikoff, Kapranov, Li, Li, MacAlpine, Malone, Minoda, Nordman, Okamura, Perry, Powell, Riddle, Sakai, Samsonova, Sandler, Schwartz, Sher, Spokony, Sturgill, van Baren, Will, Wan, Yang, Yu, Feingold, Good, Guyer, Lowdon, Ahmad, Andrews, Berger, Bickel, Brenner, Brent, Cherbas, Elgin, Gingeras, Grossman, Hoskins, Kaufman, Kent, Kuroda, Orr-Weaver, Perrimon, Pirrotta, Posakony, Ren, Russell, Cherbas, Graveley, Lewis, Micklem, Oliver, Park, Celniker, Henikoff, Karpen, Lai, MacAlpine, Stein, White, Kellis

    Several years after the initial sequencing of the genomes from human and other organisms, the vast majority of each genome remains unannotated, and it is still unclear how to translate genomic information into a functional map of cellular and developmental programs. To address this question, the Drosophila modENCODE project has undertaken a large-scale effort to comprehensively map transcription, regulator binding, chromatin state, replication, and nucleosome properties across a developmental time-course and in multiple cell lines. Here, we report our initial integrative analysis of the first phase of the project, encompassing more than 1000 datasets generated over four years across six production centers. Our integrated annotation enabled the discovery of new protein-coding, non-coding, RNA regulatory, replication, and chromatin elements that more than triple the annotated portion of the genome. We study correlated activity patterns of these elements to infer a functional regulatory network, which we use to predict putative functions for new genes, reveal stage-specific and tissue-specific regulators, and infer predictive models of gene expression. Our results provide a reference annotation that can inform directed experimental and computational studies in Drosophila and related species, and provide a model for systematic data integration towards the comprehensive genomic and functional annotation of any genome, including the human.

    Science, Dec 24, 2010.

42. Discovery and characterization of chromatin states for systematic annotation of the human genome (pdf) (scholar)

    Ernst, Kellis

    A plethora of epigenetic modifications have been described in the human genome and shown to play diverse roles in gene regulation, cellular differentiation and the onset of disease. Although individual modifications have been linked to the activity levels of various genetic functional elements, their combinatorial patterns are still unresolved and their potential for systematic de novo genome annotation remains untapped. Here, we use a multivariate Hidden Markov Model to reveal 'chromatin states' in human T cells, based on recurrent and spatially coherent combinations of chromatin marks. We define 51 distinct chromatin states, including promoter-associated, transcription-associated, active intergenic, large-scale repressed and repeat-associated states. Each chromatin state shows specific enrichments in functional annotations, sequence motifs and specific experimentally observed characteristics, suggesting distinct biological roles. This approach provides a complementary functional annotation of the human genome that reveals the genome-wide locations of diverse classes of epigenetic function.

    Nature Biotechnology 2010 Aug;28(8):817-25. Epub 2010 Jul 25. PMCID: PMC2919626 PMID: 20657582
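
    To make the idea of hidden chromatin states in the abstract above concrete, here is a minimal sketch of decoding states from binary mark calls with a multivariate HMM. It assumes each mark is an independent present/absent (Bernoulli) emission given the state, and it uses two hand-set states where the paper learns 51 from data. Every parameter value, state label, and function name below is hypothetical.

```python
import math

def emission_logp(obs, probs):
    """Log-probability of one genomic bin's mark vector (tuple of 0/1)
    given per-mark presence probabilities for a state."""
    return sum(math.log(p if o else 1.0 - p) for o, p in zip(obs, probs))

def viterbi(observations, start, trans, emit):
    """Most likely state path for a sequence of binary mark vectors."""
    n_states = len(start)
    score = [math.log(start[s]) + emission_logp(observations[0], emit[s])
             for s in range(n_states)]
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            # Best predecessor for state s at this bin.
            r = max(range(n_states),
                    key=lambda q: prev[q] + math.log(trans[q][s]))
            pointers.append(r)
            score.append(prev[r] + math.log(trans[r][s])
                         + emission_logp(obs, emit[s]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical two-state model: 0 = "active-like", 1 = "quiescent-like".
START = [0.5, 0.5]
TRANS = [[0.8, 0.2],
         [0.2, 0.8]]
# Rows: states; columns: P(mark present) for two illustrative marks.
EMIT = [[0.9, 0.8],
        [0.1, 0.1]]

# Mark calls along six consecutive genomic bins.
bins = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 0)]
print(viterbi(bins, START, TRANS, EMIT))
```

    The transition matrix is what makes the decoded states spatially coherent: the third bin carries only one of the two marks, yet it stays in the active-like state because its neighbors support it.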

34. Evolution of pathogenicity and sexual reproduction in eight Candida genomes (pdf) (scholar)

    Butler, Rasmussen, Lin, Santos, Sakthikumar, Munro, Rheinbay, Grabherr, Forche, Reedy, Agrafioti, Arnaud, Bates, Brown, Brunke, Costanzo, Fitzpatrick, de, Harris, Hoyer, Hube, Klis, Kodira, Lennard, Logue, Martin, Neiman, Nikolaou, Quail, Quinn, Santos, Schmitzberger, Sherlock, Shah, Silverstein, Skrzypek, Soll, Staggs, Stansfield, Stumpf, Sudbery, Srikantha, Zeng, Berman, Berriman, Heitman, Gow, Lorenz, Birren, Kellis, Cuomo

    Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/α2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.

    Nature. 2009 Jun 4;459(7247):657-62. PMCID: PMC2834264 PMID: 19465905

33. Histone modifications at human enhancers reflect global cell-type-specific gene expression (pdf) (scholar)

    Heintzman, Hon, Hawkins, Kheradpour, Stark, Harp, Ye, Lee, Stuart, Ching, Ching, Antosiewicz-Bourget, Liu, Zhang, Green, Lobanenkov, Stewart, Thomson, Crawford, Kellis, Ren

    The human body is composed of diverse cell types with distinct functions. Although it is known that lineage specification depends on cell-specific gene expression, which in turn is driven by promoters, enhancers, insulators and other cis-regulatory DNA sequences for each gene, the relative roles of these regulatory elements in this process are not clear. We have previously developed a chromatin-immunoprecipitation-based microarray method (ChIP-chip) to locate promoters, enhancers and insulators in the human genome. Here we use the same approach to identify these elements in multiple cell types and investigate their roles in cell-type-specific gene expression. We observed that the chromatin state at promoters and CTCF-binding at insulators is largely invariant across diverse cell types. In contrast, enhancers are marked with highly cell-type-specific histone modification patterns, strongly correlate to cell-type-specific gene expression programs on a global scale, and are functionally active in a cell-type-specific manner. Our results define over 55,000 potential transcriptional enhancers in the human genome, significantly expanding the current catalogue of human enhancers and highlighting the role of these elements in cell-type-specific gene expression.

    Nature. 2009 May 7;459(7243):108-12. Epub 2009 Mar 18. PMCID: PMC2910248 PMID: 19295514

32. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals (pdf) (scholar)

    Guttman, Amit, Garber, French, Lin, Feldser, Huarte, Zuk, Carey, Cassady, Cabili, Jaenisch, Mikkelsen, Jacks, Hacohen, Bernstein, Kellis, Regev, Rinn, Lander

    There is growing recognition that mammalian cells produce many thousands of large intergenic transcripts. However, the functional significance of these transcripts has been particularly controversial. Although there are some well-characterized examples, most (>95%) show little evidence of evolutionary conservation and have been suggested to represent transcriptional noise. Here we report a new approach to identifying large non-coding RNAs using chromatin-state maps to discover discrete transcriptional units intervening known protein-coding loci. Our approach identified approximately 1,600 large multi-exonic RNAs across four mouse cell types. In sharp contrast to previous collections, these large intervening non-coding RNAs (lincRNAs) show strong purifying selection in their genomic loci, exonic sequences and promoter regions, with greater than 95% showing clear evolutionary conservation. We also developed a functional genomics approach that assigns putative functions to each lincRNA, demonstrating a diverse range of roles for lincRNAs in processes from embryonic stem cell pluripotency to cell proliferation. We obtained independent functional validation for the predictions for over 100 lincRNAs, using cell-based assays. In particular, we demonstrate that specific lincRNAs are transcriptionally regulated by key transcription factors in these processes such as p53, NFkappaB, Sox2, Oct4 (also known as Pou5f1) and Nanog. Together, these results define a unique collection of functional lincRNAs that are highly conserved and implicated in diverse biological processes.

    Nature. 2009 Mar 12;458(7235):223-7. Epub 2009 Feb 1. PMCID: PMC2754849 PMID: 19182780

28. Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes (pdf) (scholar)

    Lin, Deoras, Rasmussen, Kellis

    Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human.

    PLoS Comput Biol. 2008 Apr 18;4(4):e1000067. PMCID: PMC2291194 PMID: 18421375

17. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes (pdf) (scholar)

    Lin, Carlson, Crosby, Matthews, Yu, Park, Wan, Schroeder, Gramates, St, Roark, Wiley, Kulathinal, Zhang, Myrick, Antone, Celniker, Gelbart, Kellis

    The availability of sequenced genomes from 12 Drosophila species has enabled the use of comparative genomics for the systematic discovery of functional elements conserved within this genus. We have developed quantitative metrics for the evolutionary signatures specific to protein-coding regions and applied them genome-wide, resulting in 1193 candidate new protein-coding exons in the D. melanogaster genome. We have reviewed these predictions by manual curation and validated a subset by directed cDNA screening and sequencing, revealing both new genes and new alternative splice forms of known genes. We also used these evolutionary signatures to evaluate existing gene annotations, resulting in the validation of 87% of genes lacking descriptive names and identifying 414 poorly conserved genes that are likely to be spurious predictions, noncoding, or species-specific genes. Furthermore, our methods suggest a variety of refinements to hundreds of existing gene models, such as modifications to translation start codons and exon splice boundaries. Finally, we performed directed genome-wide searches for unusual protein-coding structures, discovering 149 possible examples of stop codon readthrough, 125 new candidate ORFs of polycistronic mRNAs, and several candidate translational frameshifts. These results affect >10% of annotated fly genes and demonstrate the power of comparative genomics to enhance our understanding of genome organization, even in a model organism as intensively studied as Drosophila melanogaster.

    Genome Res. 2007 Dec;17(12):1823-36. Epub 2007 Nov 7. PMCID: PMC2099591 PMID: 17989253

6. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae (pdf) (scholar)

    Kellis, Birren, Lander

    Whole-genome duplication followed by massive gene loss and specialization has long been postulated as a powerful mechanism of evolutionary innovation. Recently, it has become possible to test this notion by searching complete genome sequence for signs of ancient duplication. Here, we show that the yeast Saccharomyces cerevisiae arose from ancient whole-genome duplication, by sequencing and analysing Kluyveromyces waltii, a related yeast species that diverged before the duplication. The two genomes are related by a 1:2 mapping, with each region of K. waltii corresponding to two regions of S. cerevisiae, as expected for whole-genome duplication. This resolves the long-standing controversy on the ancestry of the yeast genome, and makes it possible to study the fate of duplicated genes directly. Strikingly, 95% of cases of accelerated evolution involve only one member of a gene pair, providing strong support for a specific model of evolution, and allowing us to distinguish ancestral and derived functions.

    Nature. 2004 Apr 8;428(6983):617-24. Epub 2004 Mar 7. PMID: 15004568

2. Sequencing and comparison of yeast species to identify genes and regulatory elements (pdf) (scholar)

    Kellis, Patterson, Endrizzi, Birren, Lander

    Identifying the functional elements encoded in a genome is one of the principal challenges in modern biology. Comparative genomics should offer a powerful, general approach. Here, we present a comparative analysis of the yeast Saccharomyces cerevisiae based on high-quality draft sequences of three related species (S. paradoxus, S. mikatae and S. bayanus). We first aligned the genomes and characterized their evolution, defining the regions and mechanisms of change. We then developed methods for direct identification of genes and regulatory motifs. The gene analysis yielded a major revision to the yeast gene catalogue, affecting approximately 15% of all genes and reducing the total count by about 500 genes. The motif analysis automatically identified 72 genome-wide elements, including most known regulatory motifs and numerous new motifs. We inferred a putative function for most of these motifs, and provided insights into their combinatorial interactions. The results have implications for genome analysis of diverse organisms, including the human.

    Nature. 2003 May 15;423(6937):241-54. PMID: 12748633

C01. Crust: A New Voronoi-Based Surface Reconstruction Algorithm (pdf) (scholar)

    Amenta, Bern, Kellis (Kamvysselis)

    We describe our experience with a new algorithm for the reconstruction of surfaces from unorganized sample points in 3D. The algorithm is the first for this problem with provable guarantees. Given a "good sample" from a smooth surface, the output is guaranteed to be topologically correct and convergent to the original surface as the sampling density increases. The definition of a good sample is itself interesting: the required sampling density varies locally, rigorously capturing the intuitive notion that featureless areas can be reconstructed from fewer samples. The output mesh interpolates, rather than approximates, the input points. Our algorithm is based on the three-dimensional Voronoi diagram. Given a good program for this fundamental subroutine, the algorithm is quite easy to implement.

    ACM SIGGRAPH, v. 32, p. 415-421, Jul 19, 1998.

Group leader: Manolis Kellis
Professor of Computer Science
Karl Van Tassel Career Development Chair

Presidential Early Career Award in Science and Engineering (PECASE), 2008
Alfred P. Sloan Foundation Award, 2008
National Science Foundation Career Award, 2007
Karl Van Tassel Career Development Chair, 2007
Technology Review TR35 Top Young Innovators, 2006
Distinguished Alumnus 1964 Career Development Chair, 2005
Contact: MIT Stata Center, 32D-524
32 Vassar St, Cambridge, MA 02139
Assistant: Debbie Lehto 32G-675A 617-324-7303
D528 (Regulation Office): 617-253-6079
D526 (GWAS office): 617-715-4881
D524 (Manolis office): 617-253-2419
D516 (Networks office): 617-253-8170
D514 (QTL office): 617-324-8406
D512 (RNA/Epigenomics Office): 617-324-8439
D510 (Evolution office): 617-253-3434
D507 (Conference Room): 617-324-0419
Office Map: Stata D5

Massachusetts Institute of Technology

Broad Institute of MIT and Harvard

Computer Science and Artificial Intelligence Lab