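It is already time for the January 2019 edition of our bioRxiv bioinformatics picks. In 2018, 20,748 new preprints were posted on bioRxiv, whereas the total number of bioRxiv articles through the end of 2017 was just under 20,000【1】. In other words, bioRxiv grew by more than 100% over the past year, more than doubling its total. The preprint family also welcomed an important new member: the opening of chemRxiv at the end of 2018 gave chemistry its own preprint server. Although this growth is only a small fraction of the roughly 600,000 papers added to PMC in 2018【1】, the rapid rise of preprints and the many high-quality studies posted on them leave us optimistic about their future!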
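The growth claim can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures quoted above; the 20,000 and 600,000 values are the rounded bounds given in the text, not exact counts, and the variable names are illustrative.

```python
# Figures quoted in the paragraph above (from reference 1).
new_preprints_2018 = 20_748

# The text only says "just under 20,000", so this is an upper bound,
# not an exact count.
total_through_2017_upper_bound = 20_000

# Because the true starting total is below the bound, the true growth
# rate is at least this value.
min_growth_rate = new_preprints_2018 / total_through_2017_upper_bound
print(f"2018 growth rate: more than {min_growth_rate:.1%}")

# Share relative to the roughly 600,000 papers added to PMC in 2018
# (a rounded figure from the text).
pmc_new_2018 = 600_000
share_of_pmc = new_preprints_2018 / pmc_new_2018
print(f"bioRxiv 2018 preprints vs. new PMC papers: about {share_of_pmc:.1%}")
```

The first ratio is a lower bound on growth, which is why the text can say "more than 100%" without knowing the exact 2017 total.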
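1. 【Bioinformatics】Jue Ruan (Chinese Academy of Agricultural Sciences) and Samtools author Heng Li develop wtdbg2, a long-read (third-generation) genome assembler claimed to be tens of times faster for human genome assembly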
Fast and accurate long-read assembly with wtdbg2(CC-BY-ND 4.0)
Existing long-read assemblers require tens of thousands of CPU hours to assemble a human genome and are being outpaced by sequencing technologies in terms of both throughput and cost. We developed a novel long-read assembler wtdbg2 that, for human data, is tens of times faster than published tools while achieving comparable contiguity and accuracy. It represents a significant algorithmic advance and paves the way for population-scale long-read assembly in future.
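Note: the latest version at the time of writing is 2.3. In addition, some researchers have reported online that the default parameters do not perform well on Nanopore data, so the parameters may need tuning.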
2. 【Sequencing】New PacBio protocol yields reads averaging 13.5 kb with accuracy as high as 99.8%
Highly-accurate long-read sequencing improves variant detection and assembly of a human genome(CC-BY-NC-ND 4.0)
The major DNA sequencing technologies in use today produce either highly-accurate short reads or noisy long reads. We developed a protocol based on single-molecule, circular consensus sequencing (CCS) to generate highly-accurate (99.8%) long reads averaging 13.5 kb and applied it to sequence the well-characterized human HG002/NA24385. We optimized existing tools to comprehensively detect variants, achieving precision and recall above 99.91% for SNVs, 95.98% for indels, and 95.99% for structural variants. We estimate that 2,434 discordances are correctable mistakes in the high-quality Genome in a Bottle benchmark. Nearly all (99.64%) variants are phased into haplotypes, which further improves variant detection. De novo assembly produces a highly contiguous and accurate genome with contig N50 above 15 Mb and concordance of 99.998%. CCS reads match short reads for small variant detection, while enabling structural variant detection and de novo assembly at similar contiguity and markedly higher concordance than noisy long reads.
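For readers less familiar with the metrics quoted in this abstract, the sketch below shows how contig N50, precision, and recall are conventionally computed. It is an illustration only: the function names and the toy numbers are hypothetical and are not taken from the paper or from any specific tool.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which, going from the longest
    contig down, the cumulative length first reaches half of the total.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from variant-call counts against a benchmark."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Toy contig lengths (hypothetical), total = 32; the two 8s already cover half.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)

# Toy call counts (hypothetical): 90 correct calls, 10 spurious, 30 missed.
toy_precision, toy_recall = precision_recall(true_pos=90, false_pos=10, false_neg=30)

print(toy_n50, toy_precision, toy_recall)
```

A contig N50 above 15 Mb therefore means that at least half of the assembled bases sit in contigs of 15 Mb or longer.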
3. 【Genomics】Large-scale GWAS meta-analysis of genes associated with eight psychiatric disorders
Genome wide meta-analysis identifies genomic relationships, novel loci, and pleiotropic mechanisms across eight psychiatric disorders(CC-BY-NC-ND 4.0)
Genetic influences on psychiatric disorders transcend diagnostic boundaries, suggesting substantial pleiotropy of contributing loci. However, the nature and mechanisms of these pleiotropic effects remain unclear. We performed a meta-analysis of 232,964 cases and 494,162 controls from genome-wide studies of anorexia nervosa, attention-deficit/hyperactivity disorder, autism spectrum disorder, bipolar disorder, major depression, obsessive-compulsive disorder, schizophrenia, and Tourette syndrome. Genetic correlation analyses revealed a meaningful structure within the eight disorders identifying three groups of inter-related disorders. We detected 109 loci associated with at least two psychiatric disorders, including 23 loci with pleiotropic effects on four or more disorders and 11 loci with antagonistic effects on multiple disorders. The pleiotropic loci are located within genes that show heightened expression in the brain throughout the lifespan, beginning in the second trimester prenatally, and play prominent roles in a suite of neurodevelopmental processes. These findings have important implications for psychiatric nosology, drug development, and risk prediction.
4. 【Genomics】New Broad Institute paper maps the mutational landscape of multiple cancers and offers clues to their origins
Tumor mutational landscape is a record of the pre-malignant state(CC-BY-NC-ND 4.0)
Chromatin structure has a major influence on the cell-specific density of somatic mutations along the cancer genome. Here, we present a pan-cancer study in which we searched for the putative cancer cell-of-origin of 2,550 whole genomes, representing 32 cancer types by matching their mutational landscape to the regional patterns of chromatin modifications ascertained in 104 normal tissue types. We found that, in almost all cancer types, the cell-of-origin can be predicted solely from their DNA sequences. Our analysis validated the hypothesis that high-grade serous ovarian cancer originates in the fallopian tube and identified distinct origins of breast cancer subtypes. We also demonstrated that the technique is equally capable of identifying the cell-of-origin for a series of 2,044 metastatic samples from 22 of the tumor types available as primaries. Moreover, cancer drivers, whether inherited or acquired, reside in active chromatin regions in the respective cell-of-origin. Taken together, our findings highlight that many somatic mutations accumulate while the chromatin structure of the cell-of-origin is maintained and that this historical record, captured in the DNA, can be used to identify the often elusive cancer cell-of-origin.
5. 【Bioinformatics】New software from Jian Yang's group at the University of Queensland supports complex trait analysis with multi-omic data
OSCA: a tool for omic-data-based complex trait analysis(CC-BY-NC-ND 4.0)
The rapid increase of omic data in the past decades has greatly facilitated the investigation of associations between omic profiles such as DNA methylation (DNAm) and complex traits in large cohorts. Here, we proposed a mixed-linear-model-based method (called MOMENT) that tests for association between a DNAm probe and trait with all other distal probes fitted in multiple random-effect components to account for the effects of unobserved confounders as well as the correlations between distal probes induced by the confounders. We demonstrated by simulations that MOMENT showed a lower false positive rate and more robustness than existing methods. MOMENT has been implemented in a versatile software package (called OSCA) together with a number of other implementations for omic-data-based analysis including the estimation of variance in a trait captured by all measures of multiple omic profiles, omic-data-based quantitative trait locus (xQTL) analysis, and meta-analysis of xQTL data.
6. 【Evolution】University of Michigan researchers offer a new perspective on plant phylogenetic inference based on chloroplast genes
Characterizing gene tree conflict in plastome-inferred phylogenies(CC-BY-NC 4.0)
Premise of the study: Evolutionary relationships among plants have been inferred primarily using chloroplast data. To date, no study has comprehensively examined the plastome for gene tree conflict. Methods: Using a broad sampling of angiosperm plastomes, we characterized gene tree conflict among plastid genes at various time scales and explore correlates to conflict (e.g., evolutionary rate, gene length, molecule type). Key results: We uncover notable gene tree conflict against a backdrop of largely uninformative genes. We find gene length is the strongest correlate to concordance, and that nucleotides outperform amino acids. Of the most commonly used markers, matK greatly outperforms rbcL; however, the rarely used gene rpoC2 is the top-performing gene in every analysis. We find that rpoC2 reconstructs angiosperm phylogeny as well as the entire concatenated set of protein-coding chloroplast genes. Conclusions: Our results suggest that longer genes are superior for phylogeny reconstruction. The alleviation of some conflict through the use of nucleotides suggests that systematic error is likely the root of most of the observed conflict, but further research on biological conflict within plastome is warranted given the documented cases of heteroplasmic recombination. We suggest rpoC2 as a useful marker for reconstructing angiosperm phylogeny, reducing the effort and expense of assembling and analyzing entire plastomes.
7. 【Omics】Karolinska Institutet (Sweden) combines GWAS with single-cell sequencing to offer new clues to the etiology of Parkinson's disease
Genetic Identification of Cell Types Underlying Brain Complex Traits Yields Novel Insights Into the Etiology of Parkinson's Disease(CC-BY-NC-ND 4.0)
Genome-wide association studies (GWAS) have discovered hundreds of loci associated with complex brain disorders, and provide the best current insights into the etiology of these idiopathic traits. However, it remains unclear in which cell types these variants may be active, which is essential for understanding disease etiology and for disease modelling. Here we integrate GWAS results with single-cell transcriptomic data from the entire nervous system to systematically identify cell types underlying psychiatric disorders, neurological conditions, and other brain complex traits. We show that psychiatric disorders are predominantly associated with excitatory neurons from the cortex/hippocampus, medium spiny neurons from the striatum, diverse sets of midbrain neurons, and inhibitory neurons from the cortex/hippocampus. Cognitive traits were generally associated with similar cell types but their associations were driven by different genes. Neurological disorders were associated with different cell types, consistent with other lines of evidence. Notably, we found that Parkinson's disease is not only genetically associated with dopaminergic neurons but also with serotonergic neurons and cells from the oligodendrocyte lineage. Using post-mortem brain transcriptomic data, we confirmed alterations in these cells, even at the earliest stages of disease progression. Altogether, our study provides a solid framework for understanding the cellular basis of complex brain disorders and reveals a new unexpected role of oligodendrocytes in Parkinson's disease.
8. 【Genomics】How are vitamin D and depression actually related? An interesting genomics study
Vitamin D: marker, cause or consequence of depression? An exploration using genomics(CC-BY-NC-ND 4.0)
Background: Observational studies suggest an association between circulating vitamin D and depression. Trials testing the effect of vitamin D supplementation on depression reported inconclusive findings. It remains unknown whether the vitamin D-depression association stems from shared etiology or from a direct causal relationship. We explored the nature of the association between 25-hydroxyvitamin D (25-OH-D) and major depressive disorder (MDD) exploiting data and statistical tools from genomics. Methods: Results from the two largest GWAS on 25-OH-D (79,366 samples) and major depressive disorder (MDD; 135,458 cases and 344,901 controls) were applied to individual-level data (>2,000 subjects with measures of genotype, circulating 25-OH-D and DSM-IV lifetime MDD) and summary-level data analyses. A genetic association between 25-OH-D and MDD was tested by polygenic risk scores (PRS) and by estimating genetic correlation between traits. Two-sample Mendelian Randomization (2SMR) analyses tested the potential bidirectional causality between 25-OH-D and depression. Results: In individual-level data, the 25-OH-D PRS was associated (p=1.4e-20) with 25-OH-D level, but not with lifetime MDD. Conversely, the MDD PRS was associated with MDD (p=2.3e-5), but not with 25-OH-D. In summary-level data analyses, the rg between the traits was low and not significant (-0.06, p=0.11). 2SMR analyses provided no evidence of a significant causal role of 25-OH-D for MD and vice versa. Conclusions: The use of genomics tools indicated that shared etiology or direct causality between vitamin D concentrations and depression is unlikely: vitamin D may represent a marker rather than a cause, or consequence, of depression.
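Two of the tools named in this abstract, polygenic risk scores and two-sample Mendelian randomization, reduce to simple weighted sums in their most basic forms. The sketch below is an assumption-laden illustration: it uses the common inverse-variance-weighted (IVW) estimator, which the abstract does not name, and all function names and numbers are hypothetical toy values rather than data from the study.

```python
import math


def polygenic_risk_score(effect_sizes, dosages):
    """Basic PRS: sum over SNPs of GWAS effect size times effect-allele dosage (0, 1, or 2)."""
    if len(effect_sizes) != len(dosages):
        raise ValueError("effect_sizes and dosages must have the same length")
    return sum(beta * dose for beta, dose in zip(effect_sizes, dosages))


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    Each SNP gives a ratio estimate beta_outcome / beta_exposure; the IVW
    estimate is their average weighted by beta_exposure**2 / se_outcome**2.
    Returns (estimate, standard_error).
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical toy data: three SNPs for one individual.
toy_prs = polygenic_risk_score(effect_sizes=[0.1, -0.2, 0.05], dosages=[2, 0, 1])

# Hypothetical toy summary statistics in which every SNP implies a ratio of 0.1.
toy_ivw, toy_ivw_se = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.3],
    beta_outcome=[0.01, 0.02, 0.03],
    se_outcome=[0.01, 0.01, 0.01],
)

print(toy_prs, toy_ivw, toy_ivw_se)
```

In practice, analyses like the one described add SNP selection, linkage disequilibrium pruning, and sensitivity estimators on top of these basic formulas.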
9. 【Evolution】Large-scale evolutionary analysis deciphers mammalian diversity
Ecological causes of uneven diversification and richness in the mammal tree of life(CC-BY-NC-ND 4.0)
The uneven distribution of species in the tree of life is rooted in unequal speciation and extinction among groups. Yet the causes of differential diversification are little known despite their relevance for sustaining biodiversity into the future. Here we investigate rates of species diversification across extant Mammalia, a compelling system that includes our own closest relatives. We develop a new phylogeny of nearly all ~6000 species using a 31-gene supermatrix and fossil node- and tip-dating approaches to establish a robust evolutionary timescale for mammals. Our findings link the causes of uneven modern species richness with ecologically-driven variation in diversification rates, including 24 detected rate shifts. Speciation rates are a stronger predictor of among-clade richness than clade age, countering claims of clock-like speciation in large phylogenies. Surprisingly, rate heterogeneity in recent radiations shows limited association with latitude, despite the well-known richness increase toward the equator. Instead, we find a deeper-time association where clades of high-latitude species have the highest speciation rates, suggesting that species durations are shorter outside than inside the tropics. At shallower timescales (i.e., young clades), diurnality and low vagility are both linked to greater speciation rates and extant richness. High turnover among small-ranged allopatric species may erase the signal of vagility in older clades, while diurnality may adaptively reduce competition and extinction. These findings highlight the underappreciated joint roles of ephemeral (turnover-based) and adaptive (persistence-based) diversification processes, which manifest as speciation gradients in recent and more ancient radiations to explain the evolution of mammal diversity.
10. 【Bioinformatics】The Hoffman Lab at the University of Toronto (Canada) develops a new genomic set enrichment analysis tool
BEHST: genomic set enrichment analysis enhanced through integration of chromatin long-range interactions
Transforming data from genome-scale assays into knowledge of affected molecular functions and pathways is a key challenge in biomedical research. Using vocabularies of functional terms and databases annotating genes with these terms, pathway enrichment methods can identify terms enriched in a gene list. With data that can refer to intergenic regions, however, one must first connect the regions to the terms, which are usually annotated only to genes. To make these connections, existing pathway enrichment approaches apply unwarranted assumptions such as annotating non-coding regions with the terms from adjacent genes. We developed a computational method that instead links genomic regions to annotations using data on long-range chromatin interactions. Our method, Biological Enrichment of Hidden Sequence Targets (BEHST), finds Gene Ontology (GO) terms enriched in genomic regions more precisely and accurately than existing methods. We demonstrate BEHST's ability to retrieve more pertinent and less ambiguous GO terms associated with results of in vivo mouse enhancer screens or enhancer RNA assays for multiple tissue types. BEHST will accelerate the discovery of affected pathways mediated through long-range interactions that explain non-coding hits in genome-wide association study (GWAS) or genome editing screens. BEHST is free software with a command-line interface for Linux or macOS and a web interface (http://behst.hoffmanlab.org/).
References
1. Morrison, Heather. 2018: best year yet for net growth of open access
https://poeticeconomics.blogspot.com/2018/12/2018-best-year-yet-for-net-growth-of.html