知识中心 - 北京概普生物科技有限公司(GapTech)

近期高水平生物信息学文献一览

生信干货 sxr3 ·2018年8月25日 19:11

1.用于定量miRNA分析的小RNA-SEQ各方法的综合评价

Comprehensive multi-center assessment of small RNA-seq methods for quantitative miRNA profiling(Nature Biotechnology)

Abstract

RNA-seq is increasingly used for quantitative profiling of small RNAs (for example, microRNAs, piRNAs and snoRNAs) in diverse sample types, including isolated cells, tissues and cell-free biofluids. The accuracy and reproducibility of the currently used small RNA-seq library preparation methods have not been systematically tested. Here we report results obtained by a consortium of nine labs that independently sequenced reference, 'ground truth' samples of synthetic small RNAs and human plasma-derived RNA. We assessed three commercially available library preparation methods that use adapters of defined sequence and six methods using adapters with degenerate bases. Both protocol- and sequence-specific biases were identified, including biases that reduced the ability of small RNA-seq to accurately measure adenosine-to-inosine editing in microRNAs. We found that these biases were mitigated by library preparation methods that incorporate adapters with degenerate bases. MicroRNA relative quantification between samples using small RNA-seq was accurate and reproducible across laboratories and methods.

2. 使用来自32个复杂性状的全基因组关联研究的汇总水平统计方法来估计复杂效应大小分布

Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits(Nature Genetics)

Abstract

We developed a likelihood-based approach for analyzing summary-level statistics and external linkage disequilibrium information to estimate effect-size distributions of common variants, characterized by the proportion of underlying susceptibility SNPs and a flexible normal-mixture model for their effects. Analysis of results available across 32 genome-wide association studies showed that, while all traits are highly polygenic, there is wide diversity in the degree and nature of polygenicity. Psychiatric diseases and traits related to mental health and ability appear to be most polygenic, involving a continuum of small effects. Most other traits, including major chronic diseases, involve clusters of SNPs that have distinct magnitudes of effects. We predict that the sample sizes needed to identify SNPs that explain most heritability found in genome-wide association studies will range from a few hundred thousand to multiple millions, depending on the underlying effect-size distributions of the traits. Accordingly, we project the risk-prediction ability of polygenic risk scores across a wide variety of diseases.

3.Hercules：基于HMM的长读长Read混合纠错算法

Hercules: a profile HMM-based hybrid error correction algorithm for long reads(Nucleic Acids Research)

Abstract

Choosing whether to use second or third generation sequencing platforms can lead to trade-offs between accuracy and read length. Several types of studies require long and accurate reads. In such cases researchers often combine both technologies and the erroneous long reads are corrected using the short reads. Current approaches rely on various graph or alignment based techniques and do not take the error profile of the underlying technology into account. Efficient machine learning algorithms that address these shortcomings have the potential to achieve more accurate integration of these two technologies. We propose Hercules, the first machine learning-based long read error correction algorithm. Hercules models every long read as a profile Hidden Markov Model with respect to the underlying platform’s error profile. The algorithm learns a posterior transition/emission probability distribution for each long read to correct errors in these reads. We show on two DNA-seq BAC clones (CH17-157L1 and CH17-227A2) that Hercules-corrected reads have the highest mapping rate among all competing algorithms and have the highest accuracy when the breadth of coverage is high. On a large human CHM1 cell line WGS data set, Hercules is one of the few scalable algorithms; and among those, it achieves the highest accuracy.

4.PAREsnip2：一种使用可配置的靶向规则从降解组测序数据中批量预测小RNA靶标的工具

PAREsnip2: a tool for high-throughput prediction of small RNA targets from degradome sequencing data using configurable targeting rules(Nucleic Acids Research)

Abstract

Small RNAs (sRNAs) are short, non-coding RNAs that play critical roles in many important biological pathways. They suppress the translation of messenger RNAs (mRNAs) by directing the RNA-induced silencing complex to their sequence-specific mRNA target(s). In plants, this typically results in mRNA cleavage and subsequent degradation of the mRNA. The resulting mRNA fragments, or degradome, provide evidence for these interactions, and thus degradome analysis has become an important tool for sRNA target prediction. Even so, with the continuing advances in sequencing technologies, not only are larger and more complex genomes being sequenced, but also degradome and associated datasets are growing both in number and read count. As a result, existing degradome analysis tools are unable to process the volume of data being produced without imposing huge resource and time requirements. Moreover, these tools use stringent, non-configurable targeting rules, which reduces their flexibility. Here, we present a new and user configurable software tool for degradome analysis, which employs a novel search algorithm and sequence encoding technique to reduce the search space during analysis. The tool significantly reduces the time and resources required to perform degradome analysis, in some cases providing more than two orders of magnitude speed-up over current methods.

5. HiGlass：基于网络的基因组交互的可视化探索与分析工具

HiGlass: web-based visual exploration and analysis of genome interaction maps(Genome Biology)

Abstract

We present HiGlass, an open source visualization tool built on web technologies that provides a rich interface for rapid, multiplex, and multiscale navigation of 2D genomic maps alongside 1D genomic tracks, allowing users to combine various data types, synchronize multiple visualization modalities, and share fully customizable views with others. We demonstrate its utility in exploring different experimental conditions, comparing the results of analyses, and creating interactive snapshots to share with collaborators and the broader public. HiGlass is accessible online at http://higlass.io and is also available as a containerized application that can be run on any platform.

6.STRetch：检测和发现致病性短串联重复序列扩增工具

STRetch: detecting and discovering pathogenic short tandem repeat expansions(Genome Biology)

Abstract

Short tandem repeat (STR) expansions have been identified as the causal DNA mutation in dozens of Mendelian diseases. Most existing tools for detecting STR variation with short reads do so within the read length and so are unable to detect the majority of pathogenic expansions. Here we present STRetch, a new genome-wide method to scan for STR expansions at all loci across the human genome. We demonstrate the use of STRetch for detecting STR expansions using short-read whole-genome sequencing data at known pathogenic loci as well as novel STR loci. STRetch is open source software, available fromgithub.com/Oshlack/STRetch.

7.染色体尺度基因组序列比较分析揭示两个小麦品种基因组动态变化的分子机制

Chromosome-scale comparative sequence analysis unravels molecular mechanisms of genome dynamics between two wheat cultivars(Genome Biology)

Abstract

Background

Recent improvements in DNA sequencing and genome scaffolding have paved the way to generate high-quality de novo assemblies of pseudomolecules representing complete chromosomes of wheat and its wild relatives. These assemblies form the basis to compare the dynamics of wheat genomes on a megabase scale.

Results

Here, we provide a comparative sequence analysis of the 700-megabase chromosome 2D between two bread wheat genotypes—the old landrace Chinese Spring and the elite Swiss spring wheat line ‘CH Campala Lr22a’. Both chromosomes were assembled into megabase-sized scaffolds. There is a high degree of sequence conservation between the two chromosomes. Analysis of large structural variations reveals four large indels of more than 100 kb. Based on the molecular signatures at the breakpoints, unequal crossing over and double-strand break repair were identified as the molecular mechanisms that caused these indels. Three of the large indels affect copy number of NLRs, a gene family involved in plant immunity. Analysis of SNP density reveals four haploblocks of 4, 8, 9 and 48 Mb with a 35-fold increased SNP density compared to the rest of the chromosome. Gene content across the two chromosomes was highly conserved. Ninety-nine percent of the genic sequences were present in both genotypes and the fraction of unique genes ranged from 0.4 to 0.7%.

Conclusions

This comparative analysis of two high-quality chromosome assemblies enabled a comprehensive assessment of large structural variations and gene content. The insight obtained from this analysis will form the basis of future wheat pan-genome studies.

8.消费级基因组学会改变你的生活，无论你是否接受过测试

Consumer genomics will change your life, whether you get tested or not(Genome Biology)

Abstract

With more than 10 million genotyped customers, the consumer genomics industry is maturing and becoming a mainstream phenomenon. At last, innovations and applications, some unforeseen, are being brought to the masses.

9.QsRNA-seq：高通量分析和量化小RNAs的一种方法

QsRNA-seq: a method for high-throughput profiling and quantifying small RNAs(Genome Biology)

Abstract

The ability to profile and quantify small non-coding RNAs (sRNAs), specifically microRNAs (miRNAs), using high-throughput sequencing is challenging because of their small size. We developed QsRNA-seq, a method for preparation of sRNA libraries for high-throughput sequencing that overcomes this difficulty by enabling a gel-free separation of fragments shorter than 100 nt that differ only by 20 nt in length. The method allows the use of unique molecular identifiers for quantification and is more amenable to automation than gel-based methods. We show that QsRNA-seq gives very accurate, comprehensive, and reproducible results by looking at miRNAs in Caenorhabditis elegans embryos and larvae.

10.差异基因表达分析工具对长链非编码RNA测序数据应用不合理

Differential gene expression analysis tools exhibit substandard performance for long non-coding RNA-sequencing data(Genome Biology)

Abstract

Background

Long non-coding RNAs (lncRNAs) are typically expressed at low levels and are inherently highly variable. This is a fundamental challenge for differential expression (DE) analysis. In this study, the performance of 25 pipelines for testing DE in RNA-seq data is comprehensively evaluated, with a particular focus on lncRNAs and low-abundance mRNAs. Fifteen performance metrics are used to evaluate DE tools and normalization methods using simulations and analyses of six diverse RNA-seq datasets.

Results

Gene expression data are simulated using non-parametric procedures in such a way that realistic levels of expression and variability are preserved in the simulated data. Throughout the assessment, results for mRNA and lncRNA were tracked separately. All the pipelines exhibit inferior performance for lncRNAs compared to mRNAs across all simulated scenarios and benchmark RNA-seq datasets. The substandard performance of DE tools for lncRNAs applies also to low-abundance mRNAs. No single tool uniformly outperformed the others. Variability, number of samples, and fraction of DE genes markedly influenced DE tool performance.

Conclusions

Overall, linear modeling with empirical Bayes moderation (limma) and a non-parametric approach (SAMSeq) showed good control of the false discovery rate and reasonable sensitivity. Of note, for achieving a sensitivity of at least 50%, more than 80 samples are required when studying expression levels in realistic settings such as in clinical cancer research. About half of the methods showed a substantial excess of false discoveries, making these methods unreliable for DE analysis and jeopardizing reproducible science. The detailed results of our study can be consulted through a user-friendly web application, giving guidance on selection of the optimal DE tool (http://statapps.ugent.be/tools/AppDGE/).

欢迎关注生信人

TCGA | 小工具 | 数据库 |组装| 注释 | 基因家族 | Pvalue

基因预测 |bestorf | sci | NAR | 在线工具 | 生存分析 | 热图

舞台|基因组 | 黄金测序 | 套路 | 杂谈组装 | 进化 | 测序简史