Knowledge Center - Beijing Gaipu Biotechnology Co., Ltd. (GapTech)

A roundup of recent bioinformatics papers in high-impact journals (1.6)

Bioinformatics Essentials · sxr2 · January 5, 2018, 07:55

1. IntPred: a structure-based predictor of protein–protein interaction sites (Bioinformatics)

Abstract

Motivation

Protein–protein interactions are vital for protein function, with the average protein having between three and ten interacting partners. Knowledge of precise protein–protein interfaces comes from crystal structures deposited in the Protein Data Bank (PDB), but only 50% of structures in the PDB are complexes. There is therefore a need to predict protein–protein interfaces in silico, and various methods exist for this purpose. Here we explore the use of a predictor that is based on structural features and exploits random forest machine learning, and we compare its performance with a number of popular established methods.

Results

On an independent test set of obligate and transient complexes, our IntPred predictor performs well (MCC = 0.370, ACC = 0.811, SPEC = 0.916, SENS = 0.411) and compares favourably with other methods. Overall, IntPred ranks second of six methods tested; SPPIDER has slightly better overall performance (MCC = 0.410, ACC = 0.759, SPEC = 0.783, SENS = 0.676) but considerably worse specificity than IntPred. As with SPPIDER, performance is better on an independent test set of obligate complexes (MCC = 0.381) and somewhat reduced on a dataset of transient complexes (MCC = 0.303). The trade-off between sensitivity and specificity relative to SPPIDER suggests that the choice of tool is application-dependent.
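The four scores quoted above (MCC, ACC, SPEC, SENS) are standard confusion-matrix metrics for a binary classifier, here interface vs. non-interface residues. As a minimal illustration, the Python sketch below computes them from counts of true/false positives and negatives. The function name `binary_metrics` and the counts in the usage lines are hypothetical and are not taken from the IntPred paper.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics for a binary classifier."""
    sens = tp / (tp + fn) if tp + fn else 0.0  # sensitivity (recall)
    spec = tn / (tn + fp) if tn + fp else 0.0  # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)      # overall accuracy
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"MCC": mcc, "ACC": acc, "SPEC": spec, "SENS": sens}


# Hypothetical counts for two predictors on the same 500 residues (100 interface).
conservative = binary_metrics(tp=40, fp=30, tn=370, fn=60)
liberal = binary_metrics(tp=70, fp=100, tn=300, fn=30)
for name, m in [("conservative", conservative), ("liberal", liberal)]:
    print(name, {key: round(v, 3) for key, v in m.items()})
```

Both hypothetical predictors end up with an MCC of about 0.37 to 0.38, yet one favours specificity and the other sensitivity, which is the kind of trade-off the abstract describes between IntPred and SPPIDER.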

2. ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data (Bioinformatics)

Abstract

Motivation

Sequencing studies on non-model organisms often interrogate both genomes and transcriptomes with massive amounts of short sequences. When neither the species nor closely related species have high-quality reference resources, such studies require de novo analysis tools and techniques. For certain applications, such as de novo annotation, information on putative exons and alternative splicing may be desirable.

Results

Here we present ChopStitch, a new method for finding putative exons de novo and constructing splice graphs using an assembled transcriptome and whole genome shotgun sequencing (WGSS) data. ChopStitch identifies exon-exon boundaries in de novo assembled RNA-Seq data with the help of a Bloom filter that represents the k-mer spectrum of WGSS reads. The algorithm also accounts for base substitutions in transcript sequences that may be derived from sequencing or assembly errors, haplotype variations, or putative RNA editing events. The primary output of our tool is a FASTA file containing putative exons. Further, exon edges are interrogated for alternative exon-exon boundaries to detect transcript isoforms, which are represented as splice graphs in DOT output format.
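The core idea of querying a Bloom filter of genomic k-mers with transcript k-mers can be sketched in a few lines. This is a simplified illustration, not ChopStitch's algorithm: the `BloomFilter` class, the `candidate_junctions` helper, the SHA-256-based hashing and the toy sequences are all hypothetical, and genomic k-mers are taken directly from a simulated locus rather than from WGSS reads. The reasoning is that the k-1 transcript k-mers straddling an exon-exon junction do not occur in the genome (there the intron sits between the two exons), whereas a single base substitution removes k overlapping k-mers, so the two events leave runs of different length.

```python
import hashlib
import random


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, n_bits: int, n_hashes: int) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive n_hashes bit positions by salting SHA-256 with the hash index.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))


def candidate_junctions(transcript: str, genome_kmers: BloomFilter, k: int) -> list:
    """Positions j where transcript[:j] / transcript[j:] looks like an exon-exon junction."""
    absent = [transcript[i:i + k] not in genome_kmers
              for i in range(len(transcript) - k + 1)]
    junctions, i = [], 0
    while i < len(absent):
        if not absent[i]:
            i += 1
            continue
        start = i
        while i < len(absent) and absent[i]:
            i += 1
        # Exactly k-1 k-mers straddle a junction; a single substitution
        # removes k overlapping k-mers instead, so it is not reported.
        if i - start == k - 1 and start > 0 and i < len(absent):
            junctions.append(i)  # first genomic k-mer again: start of the next exon
    return junctions


rng = random.Random(42)


def rand_seq(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


k = 15
exon1, intron, exon2 = rand_seq(60), rand_seq(200), rand_seq(60)
gene = exon1 + intron + exon2  # stands in for WGS reads covering the locus
genome = BloomFilter(n_bits=100_000, n_hashes=3)
for i in range(len(gene) - k + 1):
    genome.add(gene[i:i + k])

transcript = exon1 + exon2
print(candidate_junctions(transcript, genome, k))  # should report the boundary at 60

# A single substitution inside exon 2 leaves a run of k absent k-mers, which is skipped.
pos = 90
alt = "A" if transcript[pos] != "A" else "C"
mutated = transcript[:pos] + alt + transcript[pos + 1:]
print(candidate_junctions(mutated, genome, k))
```

A real tool must also cope with Bloom filter false positives, sequencing errors and clustered substitutions, which this sketch ignores.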

3. PRAPI: post-transcriptional regulation analysis pipeline for Iso-Seq (Bioinformatics)

Abstract

Summary

Single-molecule real-time (SMRT) isoform sequencing (Iso-Seq) on the Pacific Biosciences (PacBio) platform has received increasing attention for its ability to explore full-length isoforms. Comprehensive tools for Iso-Seq bioinformatics analysis are therefore extremely useful. Here, we present a one-stop solution for Iso-Seq analysis, called PRAPI, to comprehensively analyze alternative transcription initiation (ATI), alternative splicing (AS), alternative cleavage and polyadenylation (APA), natural antisense transcripts (NAT), and circular RNAs (circRNAs). PRAPI can combine Iso-Seq full-length isoforms with short-read data, such as RNA-Seq or polyadenylation site sequencing (PAS-seq), for differential expression analysis of NAT, AS, APA and circRNAs. Furthermore, PRAPI can annotate new genes and correct mis-annotated genes when a gene annotation is available. Finally, PRAPI generates high-quality vector graphics to visualize and highlight the Iso-Seq results.

4. TCGA-assembler 2: software pipeline for retrieval and processing of TCGA/CPTAC data (Bioinformatics)

Abstract

Motivation

The Cancer Genome Atlas (TCGA) program has produced huge amounts of cancer genomics data, providing unprecedented opportunities for research. In 2014, we developed TCGA-Assembler, a software pipeline for retrieval and processing of public TCGA data. In 2016, TCGA data were transferred from the TCGA data portal to the Genomic Data Commons (GDC), which is supported by a different set of data storage and retrieval mechanisms. In addition, new proteomics data of TCGA samples have been generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) program, and these were not available for download through TCGA-Assembler. It is desirable to acquire and integrate data from both GDC and CPTAC.

Results

We developed TCGA-assembler 2 (TA2) to automatically download and integrate data from GDC and CPTAC. We made substantial improvements to the functionality of TA2 to enhance user experience and software performance. Together with its previous version, TA2 has helped more than 2000 researchers from 64 countries access and use TCGA and CPTAC data in their research. The availability of TA2 will continue to allow existing and new users to conduct reproducible research based on TCGA and CPTAC data.

5. DotAligner: identification and clustering of RNA structure motifs (Genome Biology)

Abstract

The diversity of processed transcripts in eukaryotic genomes poses a challenge for the classification of their biological functions. Sparse sequence conservation in non-coding sequences and the unreliable nature of RNA structure predictions further exacerbate this conundrum. Here, we describe a computational method, DotAligner, for the unsupervised discovery and classification of homologous RNA structure motifs from a set of sequences of interest. Our approach outperforms comparable algorithms at clustering known RNA structure families, both in speed and accuracy. It identifies clusters of known and novel structure motifs from ENCODE immunoprecipitation data for 44 RNA-binding proteins.

6. PureCLIP: capturing target-specific protein–RNA interaction footprints from single-nucleotide CLIP-seq data (Genome Biology)

Abstract

The iCLIP and eCLIP techniques facilitate the detection of protein–RNA interaction sites at high resolution, based on diagnostic events at crosslink sites. However, previous methods do not explicitly model the specifics of iCLIP and eCLIP truncation patterns and possible biases. We developed PureCLIP (https://github.com/skrakau/PureCLIP), a hidden Markov model based approach, which simultaneously performs peak-calling and individual crosslink site detection. It explicitly incorporates a non-specific background signal and, for the first time, non-specific sequence biases. On both simulated and real data, PureCLIP is more accurate in calling crosslink sites than other state-of-the-art methods and has a higher agreement across replicates.
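The abstract only outlines PureCLIP's hidden Markov model. For readers unfamiliar with HMM decoding, the sketch below runs the Viterbi algorithm on a toy two-state HMM (background vs. enriched) with Poisson emissions over per-position truncation counts. The state set, the Poisson emissions and every parameter value are assumptions for illustration only; according to the abstract, PureCLIP additionally models non-specific background signal and sequence biases, which this toy model does not.

```python
import math


def poisson_logpmf(x: int, lam: float) -> float:
    """log P(X = x) for a Poisson distribution with mean lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def viterbi(counts, lambdas, log_trans, log_init):
    """Most likely hidden-state path for a Poisson-emission HMM.

    counts: observed per-position counts; lambdas[s]: emission mean of state s;
    log_trans[r][s]: log P(state s | previous state r); log_init[s]: log P(first state = s).
    """
    n = len(lambdas)
    # score[s]: best log-probability of any path ending in state s at the current position
    score = [log_init[s] + poisson_logpmf(counts[0], lambdas[s]) for s in range(n)]
    backptrs = []
    for x in counts[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + poisson_logpmf(x, lambdas[s]))
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# State 0 = background, state 1 = enriched (hypothetical parameters).
counts = [0, 1, 0, 0, 9, 12, 8, 0, 1, 0]
lambdas = [0.5, 10.0]
log_trans = [[math.log(0.95), math.log(0.05)],
             [math.log(0.2), math.log(0.8)]]
log_init = [math.log(0.9), math.log(0.1)]
print(viterbi(counts, lambdas, log_trans, log_init))  # [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
```

Working in log space avoids numerical underflow on long sequences, which is why the emissions and transitions are all expressed as log-probabilities.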

7. DE-kupl: exhaustive capture of biological variation in RNA-seq data through k-mer decomposition (Genome Biology)

Abstract

We introduce a k-mer-based computational protocol, DE-kupl, for capturing local RNA variation in a set of RNA-seq libraries, independently of a reference genome or transcriptome. DE-kupl extracts all k-mers with differential abundance directly from the raw data files. This enables the retrieval of virtually all variation present in an RNA-seq data set. This variation is subsequently assigned to biological events or entities such as differential long non-coding RNAs, splice and polyadenylation variants, introns, repeats, editing or mutation events, and exogenous RNA. Applying DE-kupl to human RNA-seq data sets identified multiple types of novel events, reproducibly across independent RNA-seq experiments.
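The first step of a reference-free k-mer approach (count k-mers per library, then compare their abundance between conditions) can be sketched as follows. This is a naive illustration that assumes counts-per-million normalization and a fixed log2 fold-change cutoff with a pseudocount; it is not DE-kupl's actual statistical test or filtering, and the function names and toy reads are hypothetical.

```python
import math
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in a library of reads (no reference needed)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def to_cpm(counts):
    """Normalize raw k-mer counts to counts per million k-mers in the library."""
    total = sum(counts.values())
    return {kmer: 1e6 * n / total for kmer, n in counts.items()}


def differential_kmers(cond_a, cond_b, k, min_abs_lfc=2.0, pseudo=1.0):
    """Return {kmer: log2 fold change of A vs. B} for k-mers passing a naive cutoff.

    cond_a and cond_b are lists of libraries; each library is a list of reads.
    """
    cpm_a = [to_cpm(count_kmers(lib, k)) for lib in cond_a]
    cpm_b = [to_cpm(count_kmers(lib, k)) for lib in cond_b]
    result = {}
    for kmer in set().union(*cpm_a, *cpm_b):
        mean_a = sum(lib.get(kmer, 0.0) for lib in cpm_a) / len(cpm_a)
        mean_b = sum(lib.get(kmer, 0.0) for lib in cpm_b) / len(cpm_b)
        lfc = math.log2((mean_a + pseudo) / (mean_b + pseudo))
        if abs(lfc) >= min_abs_lfc:
            result[kmer] = lfc
    return result


cond_a = [["AAAAACCCCC", "GGGGGTTTTT"]] * 2  # two replicate libraries
cond_b = [["AAAAACCCCC", "GGGGGTTTTA"]] * 2  # last base of the second read differs
print(differential_kmers(cond_a, cond_b, k=5))
```

In this toy data only the k-mer covering the final base of the second read differs, so `TTTTT` (condition A only) and `TTTTA` (condition B only) are reported; all shared k-mers have a log2 fold change of 0.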

8. Two genetic changes in cis-regulatory elements caused evolution of petal spot position in Clarkia (Nature Plants)

Abstract

A major premise in evolutionary developmental biology is that regulatory changes, often involving cis-regulatory elements, are responsible for much morphological evolution. This premise is supported by recent investigations of animal development, but information is just beginning to accumulate regarding whether it also applies to the evolution of plant morphology [1–4]. Here, we identify the genetic differences between species in the genus Clarkia that are responsible for evolutionary change in an ecologically important element of floral colour patterns: spot position. The evolutionary shift in spot position was due to two simple genetic changes that resulted in the appearance of a transcription factor binding site mutation in the R2R3 Myb gene that changes spot formation. These genetic changes caused R2R3 Myb to be activated by a different transcription factor that is expressed in a different position in the petal. These results suggest that the regulatory rewiring paradigm is as applicable to plants as it is to animals, and support the hypothesis that cis-regulatory changes may often play a role in plant morphological evolution.

9. Multi-trait analysis of genome-wide association summary statistics using MTAG (Nature Genetics)

Abstract

We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (N_eff = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). Compared with the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.

10. Multiancestry association study identifies new asthma risk loci that colocalize with immune-cell enhancer marks (Nature Genetics)

Abstract

We examined common variation in asthma risk by conducting a meta-analysis of worldwide asthma genome-wide association studies (23,948 asthma cases, 118,538 controls) of individuals from ethnically diverse populations. We identified five new asthma loci, found two new associations at two known asthma loci, established asthma associations at two loci previously implicated in the comorbidity of asthma plus hay fever, and confirmed nine known loci. Investigation of pleiotropy showed large overlaps in genetic variants with autoimmune and inflammatory diseases. The enrichment in enhancer marks at asthma risk loci, especially in immune cells, suggested a major role of these loci in the regulation of immunologically related mechanisms.
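The abstract does not state which meta-analysis model was used. As a generic illustration only, the sketch below implements inverse-variance weighted fixed-effect meta-analysis of one variant's per-study effect estimates, one common way GWAS results from several cohorts are combined. The function name and the example numbers are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis of one variant.

    betas: per-study effect estimates (e.g. log odds ratios); ses: their standard errors.
    Returns (combined beta, combined SE, z, two-sided p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty betas and ses")
    weights = [1.0 / se ** 2 for se in ses]  # more precise studies weigh more
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p


# Hypothetical log odds ratios and standard errors from three cohorts.
print(fixed_effect_meta([0.10, 0.12, 0.08], [0.02, 0.03, 0.04]))
```

The combined standard error is smaller than any single study's, which is why pooling cohorts increases power to detect loci of modest effect.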
