Highlighted Publications



Mammalian evolution of human cis-regulatory elements and transcription factor binding sites


Gregory Andrews, Kaili Fan, Henry E. Pratt, Nishigandha Phalke, Zoonomia Consortium, Elinor K. Karlsson, Kerstin Lindblad-Toh, Steven Gazal, Jill E. Moore, and Zhiping Weng

authors contributed equally to this work

Science, 2023 Jan


Understanding the regulatory landscape of the human genome is a long-standing objective of modern biology. Using 241 mammalian genomes recently sequenced by the Zoonomia Consortium, we charted evolutionary trajectories for 0.92 million human candidate cis-regulatory elements (cCREs) and 15.6 million human transcription factor binding sites (TFBSs). We identified 439,461 cCREs and 2,024,062 TFBSs under evolutionary constraint. Genes near constrained elements perform fundamental cellular processes, while genes near primate-specific elements are involved in environmental interaction, including odor perception and immune response. About 20% of TFBSs are transposable element-derived and exhibit intricate patterns of gains and losses during primate evolution, while sequence variants associated with complex traits are enriched in constrained TFBSs. Our annotations illuminate the regulatory functions of the human genome.

Key words: cis-regulatory elements, transcription factor, evolutionary conservation

Toward a comprehensive catalog of regulatory elements 


Kaili Fan, Edith Pfister, and Zhiping Weng

Human Genetics, 2023 Jan


Regulatory elements are the genomic regions that interact with transcription factors to control cell-type-specific gene expression in different cellular environments. A precise and complete catalog of functional elements encoded by the human genome is key to understanding mammalian gene regulation. Here, we review the current state of regulatory element annotation. We first provide an overview of assays for characterizing functional elements, including genome, epigenome, transcriptome, three-dimensional chromatin interaction, and functional validation assays. We then discuss computational methods for defining regulatory elements, including peak-calling and other statistical modeling methods. Finally, we introduce several high-quality lists of regulatory element annotations and suggest potential future directions. 

Key words: regulatory elements, genome annotation, functional annotation, gene regulation, evolution, functional characterization


Genetic and epigenetic features of promoters with ubiquitous chromatin accessibility support ubiquitous transcription of cell-essential genes 


Kaili Fan, Jill E. Moore, Xiao-ou Zhang, and Zhiping Weng

Nucleic Acids Research, 2021 May


Gene expression is controlled by regulatory elements within accessible chromatin. Although most regulatory elements are cell type-specific, a subset is accessible in nearly all the 517 human and 94 mouse cell and tissue types assayed by the ENCODE consortium. We systematically analyzed 9000 human and 8000 mouse ubiquitously-accessible candidate cis-regulatory elements (cCREs) with promoter-like signatures (PLSs) from ENCODE, which we denote ubi-PLSs. These are more CpG-rich than non-ubi-PLSs and correspond to genes with ubiquitously high transcription, including a majority of cell-essential genes. ubi-PLSs are enriched with motifs of ubiquitously-expressed transcription factors and preferentially bound by transcriptional cofactors regulating ubiquitously-expressed genes. They are highly conserved between human and mouse at the synteny level but exhibit frequent turnover of motif sites; accordingly, ubi-PLSs show increased variation at their centers compared with flanking regions among the ∼186 thousand human genomes sequenced by the TOPMed project. Finally, ubi-PLSs are enriched in genes implicated in Mendelian diseases, especially diseases broadly impacting most cell types, such as deficiencies in mitochondrial functions. Thus, a set of roughly 9000 mammalian promoters are actively maintained in an accessible state across cell types by a distinct set of transcription factors and cofactors to ensure the transcriptional programs of cell-essential genes.

Key words: promoter, ubiquitous open chromatin, cell essential, cCRE, Mendelian disease


Long first exons and epigenetic marks distinguish conserved pachytene piRNA clusters from other mammalian genes


Tianxiong Yu, Kaili Fan, Deniz M. Özata, Gen Zhang, Yu Fu, William E. Theurkauf, Philip D. Zamore, and Zhiping Weng

authors contributed equally to this work

Nature Communications, 2021 Jan



In the male germ cells of placental mammals, 26–30-nt-long PIWI-interacting RNAs (piRNAs) emerge when spermatocytes enter the pachytene phase of meiosis. In mice, pachytene piRNAs derive from ~100 discrete autosomal loci that produce canonical RNA polymerase II transcripts. These piRNA clusters bear 5′ caps and 3′ poly(A) tails, and often contain introns that are removed before nuclear export and processing into piRNAs. What marks pachytene piRNA clusters to produce piRNAs, and what confines their expression to the germline? We report that an unusually long first exon (≥ 10 kb) or a long, unspliced transcript correlates with germline-specific transcription and piRNA production. Our integrative analysis of transcriptome, piRNA, and epigenome datasets across multiple species reveals that a long first exon is an evolutionarily conserved feature of pachytene piRNA clusters. Furthermore, a highly methylated promoter, often containing a low or intermediate level of CG dinucleotides, correlates with germline expression and somatic silencing of pachytene piRNA clusters. Pachytene piRNA precursor transcripts bind THOC1 and THOC2, THO complex subunits known to promote transcriptional elongation and mRNA nuclear export. Together, these features may explain why the major sources of pachytene piRNA clusters specifically generate these unique small RNAs in the male germline of placental mammals.

Key words: piRNA, long first exon, splicing, mammals