知识中心 - 北京概普生物科技有限公司(GapTech)

每周生物信息学文献速递（10.19)

生信干货 sxr2 ·2017年10月14日 17:53

1、一种解码基因转录调控规律的综合方法

An integrative method to decode regulatory logics in gene transcription(Nature Communications )

Abstract

Modeling of transcriptional regulatory networks (TRNs) has been increasingly used to dissect the nature of gene regulation. Inference of regulatory relationships among transcription factors (TFs) and genes, especially among multiple TFs, is still challenging. In this study, we introduced an integrative method, LogicTRN, to decode TF–TF interactions that form TF logics in regulating target genes. By combining cis-regulatory logics and transcriptional kinetics into one single model framework, LogicTRN can naturally integrate dynamic gene expression data and TF-DNA-binding signals in order to identify the TF logics and to reconstruct the underlying TRNs. We evaluated the newly developed methodology using simulation, comparison and application studies, and the results not only show their consistence with existing knowledge, but also demonstrate its ability to accurately reconstruct TRNs in biological complex systems.

2、稗草基因组分析提供了其作为杂草了适应和入侵的机制

Echinochloa crus-galli genome analysis provides insight into its adaptation and invasiveness as a weed(Nature Communications )

Abstract

Barnyardgrass (Echinochloa crus-galli) is a pernicious weed in agricultural fields worldwide. The molecular mechanisms underlying its success in the absence of human intervention are presently unknown. Here we report a draft genome sequence of the hexaploid species E. crus-galli, i.e., a 1.27 Gb assembly representing 90.7% of the predicted genome size. An extremely large repertoire of genes encoding cytochrome P450 monooxygenases and glutathione S-transferases associated with detoxification are found. Two gene clusters involved in the biosynthesis of an allelochemical 2,4-dihydroxy-7-methoxy-1,4-benzoxazin-3-one (DIMBOA) and a phytoalexin momilactone A are found in the E. crus-galli genome, respectively. The allelochemical DIMBOA gene cluster is activated in response to co-cultivation with rice, while the phytoalexin momilactone A gene cluster specifically to infection by pathogenic Pyricularia oryzae. Our results provide a new understanding of the molecular mechanisms underlying the extreme adaptation of the weed.

3、三倍体植物为多发性诱导多倍体提供了直接证据

Triparental plants provide direct evidence for polyspermy induced polyploidy(Nature Communications )

Abstract

It is considered an inviolable principle that sexually reproducing organisms have no more than two parents and fertilization of an egg by multiple sperm (polyspermy) is lethal in many eukaryotes. In flowering plants polyspermy has remained a hypothetical concept, due to the lack of tools to unambiguously identify and trace this event. We established a high-throughput polyspermy detection assay, which uncovered that supernumerary sperm fusion does occur in planta and can generate viable polyploid offspring. Moreover, polyspermy can give rise to seedlings with one mother and two fathers, challenging the bi-organismal concept of parentage. The polyspermy derived triploids are taller and produce bigger organs than plants resulting from a regular monospermic fertilization. In addition, we demonstrate the hybridization potential of polyspermy by instantly combining three different Arabidopsis accessions in one zygote. Our results provide direct evidence for polyspermy as a route towards polyploidy, which is considered a major plant speciation mechanism.

4、使用基因表达进行全基因组预测DNA酶I超敏感位点

Genome-wide prediction of DNase I hypersensitivity using gene expression(Nature Communications )

Abstract

We evaluate the feasibility of using a biological sample’s transcriptome to predict its genome-wide regulatory element activities measured by DNase I hypersensitivity (DH). We develop BIRD, Big Data Regression for predicting DH, to handle this high-dimensional problem. Applying BIRD to the Encyclopedia of DNA Elements (ENCODE) data, we found that to a large extent gene expression predicts DH, and information useful for prediction is contained in the whole transcriptome rather than limited to a regulatory element’s neighboring genes. We show applications of BIRD-predicted DH in predicting transcription factor-binding sites (TFBSs), turning publicly available gene expression samples in Gene Expression Omnibus (GEO) into a regulome database, predicting differential regulatory element activities, and facilitating regulome data analyses by serving as pseudo-replicates. Besides improving our understanding of the regulome–transcriptome relationship, this study suggests that transcriptome-based prediction can provide a useful new approach for regulome mapping.

5、人类肝脏中DNA相关蛋白的全基因组互作

A genome-wide interactome of DNA-associated proteins in the human liver(Genome Research）

Abstract

Large-scale efforts like the ENCODE Project have made tremendous progress in cataloging the genomic binding patterns of DNA-associated proteins (DAPs), such as transcription factors (TFs). However, most chromatin immunoprecipitation-sequencing (ChIP-seq) analyses have focused on a few immortalized cell lines whose activities and physiology differ in important ways from endogenous cells and tissues. Consequently, binding data from primary human tissue are essential to improving our understanding of in vivo gene regulation. Here, we identify and analyze more than 440,000 binding sites using ChIP-seq data for 20 DAPs in two human liver tissue samples. We integrated binding data with transcriptome and phased WGS data to investigate allelic DAP interactions and the impact of heterozygous sequence variation on the expression of neighboring genes. Our tissue-based data set exhibits binding patterns more consistent with liver biology than cell lines, and we describe uses of these data to better prioritize impactful noncoding variation. Collectively, our rich data set offers novel insights into genome function in human liver tissue and provides a valuable resource for assessing disease-related disruptions.

6、共表达网络揭示转录和可变剪接的组织特异性调节

Co-expression networks reveal the tissue-specific regulation of transcription and splicing(Genome Research）

Abstract

Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues

7、Krait：全基因组微卫星鉴定和引物设计的一个超快工具

Krait: an ultrafast tool for genome-wide survey of microsatellites and primer design（Bioinformatics）

Abstract

Microsatellites are found to be related with various diseases and widely used in population genetics as genetic markers. However, it remains a challenge to identify microsatellite from large genome and screen microsatellites for primer design from a huge result dataset. Here, we present Krait, a robust and flexible tool for fast investigation of microsatellites in DNA sequences. Krait is designed to identify all types of perfect or imperfect microsatellites on a whole genomic sequence, and is also applicable to identification of compound microsatellites. Primer3 was seamlessly integrated into Krait so that users can design primer for microsatellite amplification in an efficient way. Additionally, Krait can export microsatellite results in FASTA or GFF3 format for further analysis and generate statistical report as well as plotting.

8、从Hi-C数据中识别癌细胞中的拷贝数和易位变异

Identification of copy number variations and translocations in cancer cells from Hi-C data（Bioinformatics）

Abstract

Motivation

Eukaryotic chromosomes adapt a complex and highly dynamic three-dimensional (3D) structure, which profoundly affects different cellular functions and outcomes including changes in epigenetic landscape and in gene expression. Making the scenario even more complex, cancer cells harbor chromosomal abnormalities (e.g., copy number variations (CNVs) and translocations) altering their genomes both at the sequence level and at the level of 3D organization. High-throughput chromosome conformation capture techniques (e.g., Hi-C), which are originally developed for decoding the 3D structure of the chromatin, provide a great opportunity to simultaneously identify the locations of genomic rearrangements and to investigate the 3D genome organization in cancer cells. Even though Hi-C data has been used for validating known rearrangements, computational methods that can distinguish rearrangement signals from the inherent biases of Hi-C data and from the actual 3D conformation of chromatin, and can precisely detect rearrangement locations de novo have been missing.

Results

In this work, we characterize how intra and inter-chromosomal Hi-C contacts are distributed for normal and rearranged chromosomes to devise a new set of algorithms (i) to identify genomic segments that correspond to CNV regions such as amplifications and deletions (HiCnv), (Nurtdinov et al.) to call inter-chromosomal translocations and their boundaries (HiCtrans) from Hi-C experiments, and (iii) to simulate Hi-C data from genomes with desired rearrangements and abnormalities (AveSim) in order to select optimal parameters for and to benchmark the accuracy of our methods. Our results on 10 different cancer cell lines with Hi-C data show that we identify a total number of 105 amplifications and 45 deletions together with 90 translocations, whereas we identify virtually no such events for two karyotypically normal cell lines. Our CNV predictions correlate very well with whole genome sequencing (WGS) data among chromosomes with CNV events for a breast cancer cell line (r=0.89) and capture most of the CNVs we simulate using Avesim. For HiCtrans predictions, we report evidence from the literature for 30 out of 90 translocations for eight of our cancer cell lines. Furthermore, we show that our tools identify and correctly classify relatively understudied rearrangements such as double minutes (DMs) and homogeneously staining regions (HSRs).

Conclusions

Considering the inherent limitations of existing techniques for karyotyping (i.e., missing balanced rearrangements and those near repetitive regions), the accurate identification of CNVs and translocations in a cost-effective and high-throughput setting is still a challenge. Our results show that the set of tools we develop effectively utilize moderately sequenced Hi-C libraries (100-300 million reads) to identify known and de novo chromosomal rearrangements/abnormalities in well-established cancer cell lines. With the decrease in required number of cells and the increase in attainable resolution, we believe that our framework will pave the way towards comprehensive mapping of genomic rearrangements in primary cells from cancer patients using Hi-C.

欢迎关注生信人