Knowledge Center - Beijing GapTech Biotechnology Co., Ltd. (GapTech)

Weekly Bioinformatics Literature Digest (09.10)

Bioinformatics Essentials · sxr2 · September 9, 2017, 17:45

1. Prediction of genome-wide DNA methylation in repetitive elements (Nucleic Acids Research)

Abstract

DNA methylation in repetitive elements (RE) suppresses their mobility and maintains genomic stability, and decreases in it are frequently observed in tumor and/or surrogate tissues. Averaging methylation across RE in the genome is widely used to quantify global methylation. However, methylation may vary in specific RE and play diverse roles in disease development, thus averaging methylation across RE may lose significant biological information. The ambiguous mapping of short reads by, and the high cost of, current bisulfite sequencing platforms make them impractical for quantifying locus-specific RE methylation. Although microarray-based approaches (particularly Illumina's Infinium methylation arrays) provide cost-effective and robust genome-wide methylation quantification, the number of interrogated CpGs in RE remains limited. We report a random forest-based algorithm (and corresponding R package, REMP) that can accurately predict genome-wide locus-specific RE methylation based on Infinium array profiling data. We validated its prediction performance using alternative sequencing and microarray data. Testing its clinical utility with The Cancer Genome Atlas data demonstrated that our algorithm offers more comprehensively extended locus-specific RE methylation information that can be readily applied to large human studies in a cost-effective manner. Our work has the potential to improve our understanding of the role of global methylation in human diseases, especially cancer.
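To make the point about averaging concrete, here is a minimal sketch with made-up numbers. The locus names and beta values are hypothetical and are not data from the paper. Two samples can share the same average RE methylation while individual loci differ markedly.

```python
from statistics import mean

# Hypothetical methylation beta values (0 = unmethylated, 1 = fully methylated)
# for three repetitive-element loci; names and numbers are illustrative only.
normal = {"Alu_1": 0.80, "Alu_2": 0.80, "L1_1": 0.80}
tumor = {"Alu_1": 0.95, "Alu_2": 0.50, "L1_1": 0.95}

def global_methylation(sample):
    """Average beta value across all loci (the 'global' summary)."""
    return mean(sample.values())

def locus_differences(a, b):
    """Per-locus change in beta value going from sample a to sample b."""
    return {locus: round(b[locus] - a[locus], 2) for locus in a}

# The global average barely moves, while single loci shift substantially.
global_shift = round(global_methylation(tumor) - global_methylation(normal), 2)
per_locus = locus_differences(normal, tumor)
print("global shift:", global_shift)
print("per-locus shifts:", per_locus)
```

In this toy case the loss at one locus is cancelled by gains at the other two, which is the kind of information a locus-specific prediction is meant to retain.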

2. MCbiclust: a novel algorithm to discover large-scale functionally related gene sets from massive transcriptomics data collections (Nucleic Acids Research)

Abstract

The potential to understand fundamental biological processes from gene expression data has grown in parallel with the recent explosion of the size of data collections. However, to exploit this potential, novel analytical methods are required, capable of discovering large co-regulated gene networks. We found current methods limited in the size of correlated gene sets they could discover within biologically heterogeneous data collections, hampering the identification of multi-gene controlled fundamental cellular processes such as energy metabolism, organelle biogenesis and stress responses. Here we describe a novel biclustering algorithm called Massively Correlated Biclustering (MCbiclust) that selects samples and genes from large datasets with maximal correlated gene expression, allowing regulation of complex networks to be examined. The method has been evaluated using synthetic data and applied to large bacterial and cancer cell datasets. We show that the large biclusters discovered, so far elusive to identification by existing techniques, are biologically relevant and thus MCbiclust has great potential in the analysis of transcriptomics data to identify large-scale unknown effects hidden within the data. The identified massive biclusters can be used to develop improved transcriptomics based diagnosis tools for diseases caused by altered gene expression, or used for further network analysis to understand genotype-phenotype correlations.
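The idea of selecting samples over which a gene set is maximally correlated can be sketched on a toy matrix. This is not the MCbiclust implementation. The scoring function (mean absolute pairwise Pearson correlation), the exhaustive search, and the data are all assumptions chosen for illustration. Exhaustive search is only feasible for tiny inputs, so a real dataset would need a heuristic search.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def score(expr, samples):
    """Mean absolute pairwise correlation between genes over the chosen samples."""
    rows = [[row[s] for s in samples] for row in expr]
    pairs = list(combinations(rows, 2))
    return sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)

def best_sample_subset(expr, size):
    """Try every sample subset of the given size and keep the highest-scoring one."""
    n_samples = len(expr[0])
    return max(combinations(range(n_samples), size), key=lambda s: score(expr, s))

# Toy expression matrix: rows are genes, columns are samples.
# Samples 2 and 5 are hypothetical outliers that break the shared pattern.
expr = [
    [1, 2, 1, 3, 4, 5, 5],
    [2, 4, 9, 6, 8, 3, 10],
    [5, 4, 2, 3, 2, 4, 1],
]

all_samples = tuple(range(len(expr[0])))
best = best_sample_subset(expr, 5)
print("score over all samples:", round(score(expr, all_samples), 3))
print("best 5-sample subset:", best, "score:", round(score(expr, best), 3))
```

Absolute correlation is used so that anti-correlated genes count as co-regulated, which is one plausible reading of "maximal correlated gene expression".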

3. Web-based NGS data analysis using miRMaster: a large-scale meta-analysis of human miRNAs (Nucleic Acids Research)

Abstract

The analysis of small RNA NGS data together with the discovery of new small RNAs is among the foremost challenges in life science. For the analysis of raw high-throughput sequencing data we implemented the fast, accurate and comprehensive web-based tool miRMaster. Our toolbox provides a wide range of modules for quantification of miRNAs and other non-coding RNAs, discovering new miRNAs, isomiRs, mutations, exogenous RNAs and motifs. Use-cases comprising hundreds of samples are processed in less than 5 h with an accuracy of 99.4%. An integrative analysis of small RNAs from 1836 data sets (20 billion reads) indicated that context-specific miRNAs (e.g. miRNAs present only in one or few different tissues / cell types) still remain to be discovered while broadly expressed miRNAs appear to be largely known. In total, our analysis of known and novel miRNAs indicated nearly 22 000 candidates of precursors with one or two mature forms. Based on these, we designed a custom microarray comprising 11 872 potential mature miRNAs to assess the quality of our prediction. MiRMaster is a convenient-to-use tool for the comprehensive and fast analysis of miRNA NGS data. In addition, our predicted miRNA candidates provided as custom array will allow researchers to perform in depth validation of candidates interesting to them.

4. Two-tailed RT-qPCR: a novel method for highly accurate miRNA quantification (Nucleic Acids Research)

Abstract

MicroRNAs are a class of small non-coding RNAs that serve as important regulators of gene expression at the posttranscriptional level. They are stable in body fluids and have great potential to serve as biomarkers. Here, we present a highly specific, sensitive and cost-effective system to quantify miRNA expression based on two-step RT-qPCR with SYBR-green detection chemistry called Two-tailed RT-qPCR. It takes advantage of novel, target-specific primers for reverse transcription composed of two hemiprobes complementary to two different parts of the targeted miRNA, connected by a hairpin structure. The introduction of a second probe ensures high sensitivity and enables discrimination of highly homologous miRNAs irrespective of the position of the mismatched nucleotide. Two-tailed RT-qPCR has a dynamic range of seven logs and a sensitivity sufficient to detect down to ten target miRNA molecules. It is capable of capturing the full isomiR repertoire, leading to accurate representation of the complete miRNA content in a sample. The reverse transcription step can be multiplexed and the miRNA profiles measured with Two-tailed RT-qPCR show excellent correlation with the industry standard TaqMan miRNA assays (r² = 0.985). Moreover, Two-tailed RT-qPCR allows for rapid testing with a total analysis time of less than 2.5 hours.
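A dynamic range such as "seven logs" is typically assessed with a standard curve of Cq against log10 of input copies. The sketch below fits such a curve and derives amplification efficiency with the commonly used relation E = 10^(-1/slope) - 1. The Cq values are synthetic (idealised perfect doubling with a made-up intercept of 40 cycles) and are not measurements from the paper.

```python
from math import log10

def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; also returns r squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

def amplification_efficiency(slope):
    """Efficiency from the standard-curve slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0

# Synthetic tenfold dilution series from 10^1 to 10^8 copies (spanning seven logs).
copies = [10 ** k for k in range(1, 9)]
# Idealised Cq values: perfect doubling each cycle, hypothetical intercept of 40.
cq = [40.0 - log10(c) / log10(2) for c in copies]

slope, intercept, r2 = linear_fit([log10(c) for c in copies], cq)
efficiency = amplification_efficiency(slope)
print("slope:", round(slope, 3))
print("efficiency:", round(efficiency, 3))
print("r squared:", round(r2, 4))
```

With real data the points would scatter around the line, and the range over which the fit stays linear is what defines the usable dynamic range.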

5. OMTools: a software package for visualizing and processing optical mapping data (Bioinformatics)

Abstract

Summary: Optical mapping is a molecular technique capturing specific patterns of fluorescent labels along DNA molecules. It has been widely applied in assisted-scaffolding in sequence assemblies, microbial strain typing and detection of structural variations. Various computational methods have been developed to analyze optical mapping data. However, existing tools for processing and visualizing optical map data still have many shortcomings. Here, we present OMTools, an efficient and intuitive data processing and visualization suite to handle and explore large-scale optical mapping profiles. OMTools includes modules for visualization (OMView), data processing and simulation. These modules together form an accessible and convenient pipeline for optical mapping analyses.
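For readers new to the data type, an optical map can be thought of as an ordered list of label positions along a molecule. The sketch below derives such a list in silico from a sequence and a recognition motif. It does not use OMTools or its file formats. The motif, the toy sequence, and the function names are assumptions chosen for illustration.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def label_positions(sequence, motif):
    """Sorted 0-based start positions where the motif occurs on either strand."""
    sequence = sequence.upper()
    motifs = {motif.upper(), reverse_complement(motif.upper())}
    hits = set()
    for m in motifs:
        start = sequence.find(m)
        while start != -1:
            hits.add(start)
            start = sequence.find(m, start + 1)
    return sorted(hits)

def fragment_lengths(positions, total_length):
    """Distances between consecutive labels, including both molecule ends."""
    bounds = [0] + positions + [total_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Toy sequence with one forward and one reverse-strand occurrence of an
# example 7-bp motif; both are illustrative, not taken from the paper.
motif = "GCTCTTC"
sequence = "AAAA" + "GCTCTTC" + "TTTTT" + "GAAGAGC" + "CC"

positions = label_positions(sequence, motif)
fragments = fragment_lengths(positions, len(sequence))
print("label positions:", positions)
print("fragment lengths:", fragments)
```

Comparing such in silico patterns with measured label patterns is the basis of the alignment and scaffolding applications the abstract mentions.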

6. PhyD3: a phylogenetic tree viewer with extended phyloXML support for functional genomics data visualization (Bioinformatics)

Abstract

Motivation: Comparative and evolutionary studies utilize phylogenetic trees to analyze and visualize biological data. Recently, several web-based tools for the display, manipulation and annotation of phylogenetic trees, such as iTOL and Evolview, have released updates to be compatible with the latest web technologies. While those web tools operate an open server access model with a multitude of registered users, a feature-rich open source solution using current web technologies is not available.

Results: Here, we present an extension of the widely used PhyloXML standard with several new options to accommodate functional genomics or annotation datasets for advanced visualization. Furthermore, PhyD3 has been developed as a lightweight tool using the JavaScript library D3.js to achieve a state-of-the-art phylogenetic tree visualization in the web browser, with support for advanced annotations. The current implementation is open source, easily adaptable and easy to implement in third parties’ web sites.
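As background on the format being extended, the sketch below parses a tiny phyloXML document with Python's standard library and converts it to a Newick string. It uses only core phyloXML elements (`phylogeny`, `clade`, `name`, `branch_length`). PhyD3's own extension elements are not shown, and the tree itself is a made-up example.

```python
import xml.etree.ElementTree as ET

# Namespace used by phyloXML documents.
NS = {"phy": "http://www.phyloxml.org"}

# Hypothetical three-taxon tree written with core phyloXML elements only.
PHYLOXML = """<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <clade>
        <branch_length>0.05</branch_length>
        <clade><name>A</name><branch_length>0.1</branch_length></clade>
        <clade><name>B</name><branch_length>0.2</branch_length></clade>
      </clade>
      <clade><name>C</name><branch_length>0.3</branch_length></clade>
    </clade>
  </phylogeny>
</phyloxml>"""

def clade_to_newick(clade):
    """Recursively convert a phyloXML clade element to Newick notation."""
    children = clade.findall("phy:clade", NS)
    name = clade.findtext("phy:name", default="", namespaces=NS)
    length = clade.findtext("phy:branch_length", default=None, namespaces=NS)
    label = name
    if children:
        label = "(" + ",".join(clade_to_newick(c) for c in children) + ")" + name
    if length is not None:
        label += ":" + length.strip()
    return label

root = ET.fromstring(PHYLOXML)
top_clade = root.find("phy:phylogeny", NS).find("phy:clade", NS)
newick = clade_to_newick(top_clade) + ";"
print(newick)
```

Because clades are nested XML elements, additional per-node annotations can be attached as extra child elements, which is what makes the format a natural carrier for functional genomics data.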

7. Chromosome contacts in activated T cells identify autoimmune disease candidate genes (Genome Biology)

Abstract

Autoimmune disease-associated variants are preferentially found in regulatory regions in immune cells, particularly CD4+ T cells. Linking such regulatory regions to gene promoters in disease-relevant cell contexts facilitates identification of candidate disease genes.

Within 4 h, activation of CD4+ T cells invokes changes in histone modifications and enhancer RNA transcription that correspond to altered expression of the interacting genes identified by promoter capture Hi-C. By integrating promoter capture Hi-C data with genetic associations for five autoimmune diseases, we prioritised 245 candidate genes with a median distance from peak signal to prioritised gene of 153 kb. Just under half (108/245) prioritised genes related to activation-sensitive interactions. This included IL2RA, where allele-specific expression analyses were consistent with its interaction-mediated regulation, illustrating the utility of the approach.

Our systematic experimental framework offers an alternative approach to candidate causal gene identification for variants with cell state-specific functional effects, with achievable sample sizes.

8. Transposable elements are the primary source of novelty in primate gene regulation (Genome Research)

Abstract

Gene regulation shapes the evolution of phenotypic diversity. We investigated the evolution of liver promoters and enhancers in six primate species using ChIP-seq (H3K27ac and H3K4me1) to profile cis-regulatory elements (CREs) and using RNA-seq to characterize gene expression in the same individuals. To quantify regulatory divergence, we compared CRE activity across species by testing differential ChIP-seq read depths directly measured for orthologous sequences. We show that the primate regulatory landscape is largely conserved across the lineage, with 63% of the tested human liver CREs showing similar activity across species. Conserved CRE function is associated with sequence conservation, proximity to coding genes, cell-type specificity, and transcription factor binding. Newly evolved CREs are enriched in immune response and neurodevelopmental functions. We further demonstrate that conserved CREs bind master regulators, suggesting that while CREs contribute to species adaptation to the environment, core functions remain intact. Newly evolved CREs are enriched in young transposable elements (TEs), including Long-Terminal-Repeats (LTRs) and SINE-VNTR-Alus (SVAs), that significantly affect gene expression. Conversely, only 16% of conserved CREs overlap TEs. We tested the cis-regulatory activity of 69 TE subfamilies by luciferase reporter assays, spanning all major TE classes, and showed that 95.6% of tested TEs can function as either transcriptional activators or repressors. In conclusion, we demonstrated the critical role of TEs in primate gene regulation and illustrated potential mechanisms underlying evolutionary divergence among the primate species through the noncoding genome.

9. Reconstruction of enhancer–target networks in 935 samples of human primary cells, tissues and cell lines (Nature Genetics)

Abstract

We propose a new method for determining the target genes of transcriptional enhancers in specific cells and tissues. It combines global trends across many samples and sample-specific information, and considers the joint effect of multiple enhancers. Our method outperforms existing methods when predicting the target genes of enhancers in unseen samples, as evaluated by independent experimental data. Requiring few types of input data, we are able to apply our method to reconstruct the enhancer–target networks in 935 samples of human primary cells, tissues and cell lines, which constitute by far the largest set of enhancer–target networks. The similarity of these networks from different samples closely follows their cell and tissue lineages. We discover three major co-regulation modes of enhancers and find defense-related genes often simultaneously regulated by multiple enhancers bound by different transcription factors. We also identify differentially methylated enhancers in hepatocellular carcinoma (HCC) and experimentally confirm their altered regulation of HCC-related genes.

10. Identification of 153 new loci associated with heel bone mineral density and functional involvement of GPC6 in osteoporosis (Nature Genetics)

Abstract

Osteoporosis is a common disease diagnosed primarily by measurement of bone mineral density (BMD). We undertook a genome-wide association study (GWAS) in 142,487 individuals from the UK Biobank to identify loci associated with BMD as estimated by quantitative ultrasound of the heel. We identified 307 conditionally independent single-nucleotide polymorphisms (SNPs) that attained genome-wide significance at 203 loci, explaining approximately 12% of the phenotypic variance. These included 153 previously unreported loci, and several rare variants with large effect sizes. To investigate the underlying mechanisms, we undertook (1) bioinformatic, functional genomic annotation and human osteoblast expression studies; (2) gene-function prediction; (3) skeletal phenotyping of 120 knockout mice with deletions of genes adjacent to lead independent SNPs; and (4) analysis of gene expression in mouse osteoblasts, osteocytes and osteoclasts. The results implicate GPC6 as a novel determinant of BMD, and also identify abnormal skeletal phenotypes in knockout mice associated with a further 100 prioritized genes.
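The "phenotypic variance explained" figure can be illustrated with a common approximation: for an additive SNP with allele frequency p and per-allele effect β (in phenotype standard deviation units), the variance explained is roughly 2p(1-p)β², summed over independent significant SNPs. The sketch below applies this to made-up SNPs. The identifiers, frequencies, effects, and p-values are hypothetical, and the 5e-8 threshold is the conventional genome-wide cutoff rather than a value stated in the abstract.

```python
# Conventional genome-wide significance threshold (an assumption here).
GENOME_WIDE_P = 5e-8

def variance_explained(allele_freq, beta):
    """Approximate fraction of phenotypic variance explained by one additive SNP."""
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta ** 2

# Hypothetical summary statistics: (id, allele frequency, effect in SD units, p-value).
snps = [
    ("snp_a", 0.50, 0.10, 1e-20),
    ("snp_b", 0.25, -0.20, 3e-9),
    ("snp_c", 0.10, 0.05, 1e-4),  # does not reach genome-wide significance
]

significant = [s for s in snps if s[3] < GENOME_WIDE_P]
total = sum(variance_explained(freq, beta) for _, freq, beta, _ in significant)
print("significant SNPs:", [s[0] for s in significant])
print("total variance explained:", round(total, 4))
```

The sum is only meaningful for conditionally independent SNPs, which is why the abstract counts them separately from loci.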
