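Hello everyone! Let's take a look at the bioinformatics preprints from this past June that are worth reading. In addition, we note the license of each preprint before its abstract.
By the way: a 2017 news piece in Nature titled "Biologists debate how to license preprints" looked at the licensing of preprints on bioRxiv, where 29% of articles carried no license at all (see the figure below).
1. Genome sequencing lends a hand to Lafite winemaking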
A high-quality grapevine downy mildew genome assembly reveals rapidly evolving and lineage-specific putative host adaptation genes (1) CC-BY-NC-ND 4.0
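Readers who know a little about wine will be familiar with Bordeaux, France, home of the famous Château Lafite-Rothschild. Growing the grapes that go into wine is not easy: besides hoping for good weather, growers must fend off a range of pathogens. One of them, an oomycete (a eukaryotic microbe also known as a water mold, which looks like a fungus but is not closely related to fungi), is the culprit behind grapevine downy mildew, a major disease of grapevine. Recently, scientists from Bordeaux sequenced the whole genomes of two pathogenic oomycetes from the genus Plasmopara. The authors found that secreted protein-encoding genes and RXLR cytoplasmic effectors generally evolve faster and show higher dN/dS ratios. They identified 270 genes that may have been under positive selection, some of which encode transporters and components of the RNA machinery potentially involved in host specialization. (For more, see our post on how dN/dS relates to positive selection.) These candidate genes under positive selection, together with comparative genomics of oomycetes, should provide leads for follow-up functional studies and may help in controlling grapevine downy mildew.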
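As a rough illustration of the dN/dS ratio mentioned above: dN is the number of nonsynonymous substitutions per nonsynonymous site, dS is the synonymous equivalent, and a ratio above 1 is commonly read as a hint of positive selection. The sketch below is a minimal example that applies a Jukes-Cantor correction to observed proportions of differences. The function names and the counts are hypothetical, and this is not the method used in the preprint.

```python
import math


def jukes_cantor(p):
    """Correct an observed proportion of differing sites for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return dN/dS from counts of differences and sites (illustrative only)."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn / ds


# Made-up counts for a single gene compared between two species.
omega = dn_ds(nonsyn_diffs=30, nonsyn_sites=300, syn_diffs=5, syn_sites=100)
print(f"dN/dS = {omega:.2f}")
print("candidate for positive selection" if omega > 1 else "no excess of nonsynonymous change")
```

In practice such ratios are estimated with codon models and tested statistically; a single ratio above 1 from raw counts is only a starting point.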
2. A genome-wide analysis of trans-splicing
Trans-splicing of mRNAs links gene transcription to translational control regulated by mTOR (2) CC-BY 4.0
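Trans-splicing is an intriguing phenomenon. Unlike ordinary alternative splicing, it joins segments from different RNA molecules. It is fairly common in a small number of organisms, including nematodes. Compared with cis-splicing, however, much about the function of trans-splicing remains a mystery. Recently, researchers at the University of Bergen in Norway examined trans-splicing of genes genome-wide in the tunicate O. dioica when mTOR (mammalian target of rapamycin; an important growth-regulating gene) was inhibited. Their conclusions support regulation of TOP genes and other trans-spliced genes at the level of translation, as a rapid response to environmental change.
Schematic of the different modes of alternative splicing, taken from (3). For more details, see the papers by Rongjia Zhou of Wuhan University (3) and Wenfeng Qian of the Chinese Academy of Sciences (4).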
In phylogenetically diverse organisms, the 5' ends of a subset of mRNAs are trans-spliced with a spliced leader (SL) RNA. The functions of SL trans-splicing, however, remain largely enigmatic. Here, we quantified translation genome-wide in the marine chordate, Oikopleura dioica, under inhibition of mTOR, a central growth regulator. Translation of trans-spliced TOP mRNAs was suppressed, showing that the SL sequence permits nutrient-dependent translational control of growth-related mRNAs. Under crowded, nutrient-limiting conditions, O. dioica continues to filter-feed, but arrests growth until favorable conditions return. Upon release from such conditions, initial recovery was independent of nutrient-responsive, trans-spliced genes, suggesting animal density sensing as a first trigger for resumption of development. Our results demonstrate a role for trans-splicing in the coordinated translational down-regulation of nutrient-responsive genes under limiting conditions and suggest an innovative strategy for rapid evolution of mTOR targets in genomes of metazoans whose reproduction is tightly linked to nutritional cues.
3. Convergent evolution in rainforest hunters
Polygenic adaptation and convergent evolution across both growth and cardiac genetic pathways in African and Asian rainforest hunter-gatherers (5) CC-BY-NC-ND 4.0
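Convergent evolution is the independent evolution of the same or similar traits in organisms of different origins, often as a result of similar environments or ecological niches. Indigenous hunter-gatherers living in the rainforests of Asia and Africa have long been regarded as an important example of convergent evolution in humans. Recently, researchers at Penn State University used exome sequencing to analyze this phenomenon at the genetic level in the small-statured hunter-gatherers of the African and Asian tropical rainforests. They found that certain genes in rainforest hunters from the different regions had independently undergone positive selection; the functions of these genes include growth factor binding and cardiac development. Finally, the authors analyzed the same genes in agriculturalist populations and found no evidence of positive selection, suggesting that the convergent evolution of these genes is a convergent adaptation specific to the lifestyle of hunter-gatherers living in different parts of Asia and Africa.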
Different human populations facing similar environmental challenges have sometimes evolved convergent biological adaptations, for example hypoxia resistance at high altitudes and depigmented skin in northern latitudes on separate continents. The pygmy phenotype (small adult body size), a characteristic of hunter-gatherer populations inhabiting both African and Asian tropical rainforests, is often highlighted as another case of convergent adaptation in humans. However, the degree to which phenotypic convergence in this polygenic trait is due to convergent vs. population-specific genetic changes is unknown. To address this question, we analyzed high-coverage sequence data from the protein-coding portion of the genomes (exomes) of two pairs of populations, Batwa rainforest hunter-gatherers and neighboring Bakiga agriculturalists from Uganda, and Andamanese rainforest hunter-gatherers (Jarawa and Onge) and Brahmin agriculturalists from India. We observed signatures of convergent positive selection between the Batwa and Andamanese rainforest hunter-gatherers across the set of genes with annotated 'growth factor binding' functions (p<0.001). Unexpectedly, for the rainforest groups we also observed convergent and population-specific signatures of positive selection in pathways related to cardiac development (e.g. 'cardiac muscle tissue development'; p=0.001). We hypothesize that the growth hormone sub-responsiveness likely underlying the pygmy phenotype may have led to compensatory changes in cardiac pathways, in which this hormone also plays an essential role. Importantly, in the agriculturalist populations we did not observe similar patterns of positive selection on sets of genes associated with either growth or cardiac development, indicating that our results most likely reflect a history of convergent adaptation to the similar ecology of rainforest hunter-gatherers rather than a more common or general evolutionary pattern for human populations.
4. Reannotation of eight Drosophila genomes
Reannotation of eight Drosophila genomes (6) CC0
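Drosophila melanogaster is a classic model organism with a long history. In the early twentieth century, the great geneticist Thomas Hunt Morgan used it to discover many classic principles of genetics. Many Drosophila species now have whole-genome sequences and annotations. However, apart from the intensively studied D. melanogaster, the quality of sequencing and annotation in the other species is uneven, which to some extent limits further work in Drosophila comparative genomics. Last month, scientists at the US National Institutes of Health (NIH) analyzed RNA-seq data from 584 samples covering 8 tissues and organs in 9 Drosophila species, and produced new gene annotations for the 8 species other than D. melanogaster: D. yakuba (Dyak), D. ananassae (Dana), D. pseudoobscura (Dpse), D. persimilis (Dper), D. willistoni (Dwil), D. mojavensis (Dmoj), D. virilis (Dvir), and D. grimshawi (Dgri).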
The sequenced genomes in the Drosophila phylogeny are a central resource for comparative work supporting the understanding of the Drosophila melanogaster non-mammalian model system. These have also facilitated studying the selected and random differences that distinguish the thousands of extant species of Drosophila. However, full utility has been hampered by uneven genome annotation. We have generated a large expression profile dataset for nine species of Drosophila and trained a transcriptome assembly approach on Drosophila melanogaster to develop a pipeline that best matched the extensively curated annotation. We then applied this to the other species to add tens of thousands of new gene models per species. We also developed new orthologs to facilitate cross-species comparisons. We validated the new annotation of the distantly related Drosophila grimshawi with an extensive collection of newly sequenced cDNAs. This reannotation will facilitate understanding both the core commonalities and the species differences in this important group of model organisms.
5. Whole-genome sequencing sheds light on how Drosophila montana adapts to cold environments
Inter- and intra-specific genomic divergence in Drosophila montana shows evidence for cold adaptation (7) CC-BY 4.0
The genomes of species that are ecological specialists will likely contain signatures of genomic adaptation to their niche. However, distinguishing genes related to ecological specialism from other sources of selection and more random changes is a challenge. Here we describe the genome of Drosophila montana, which is the most extremely cold-adapted Drosophila species. We use branch tests to identify genes showing accelerated divergence in contrasts between cold- and warm adapted species and identify about 250 genes that show differences, possibly driven by a lower synonymous substitution rate in cold-adapted species. We look for evidence of accelerated divergence between D. montana and D. virilis, a previously sequenced relative, and do not find strong evidence for divergent selection on coding sequence variation. Divergent genes are involved in a variety of functions, including cuticular and olfactory processes. We also re-sequenced three populations of D. montana from its ecological and geographic range. Outlier loci were more likely to be found on the X chromosome and there was a greater than expected overlap between population outliers and those genes implicated in cold adaptation between Drosophila species, implying some continuity of selective process at these different evolutionary scales.
6. Draft genome of the Asian pear scab pathogen
Draft Genome Sequence of the Asian Pear Scab Pathogen, Venturia nashicola (8) CC-BY-ND 4.0
Venturia nashicola, which causes scab disease of Asian pear, is a host-specific, biotrophic fungus, with a sexual stage that occurs during saprobic growth. V. nashicola is endemic to Asia and is regarded as a quarantine threat to Asian pear production outside of this continent. Currently, fungicide applications are routinely used to control scab disease. However, fungicide resistance in V. nashicola, as in other fungal pathogens, is an ongoing challenge and alternative control or prevention measures that include, for example, the deployment of durable host resistance, are required. A close relative of V. nashicola, V. pirina, causes scab disease of European pear. European pear displays non-host resistance (NHR) to V. nashicola and Asian pears are non-hosts of V. pirina. It is anticipated that the host specificity of these two fungi is governed by differences in their effector arsenals, with a subset responsible for activating NHR. The Pyrus-Venturia pathosystems provide a unique opportunity to dissect the underlying genetics of non-host interactions and to understand coevolution in relation to this potentially more durable form of resistance. Here, we present the first V. nashicola draft whole genome sequence (WGS), which is made up of 40,800 scaffolds (totalling 45 Mb) and 11,094 predicted genes. Of these genes, 1,232 are predicted to encode a secreted protein by SignalP, with 273 of these predicted to be effectors by EffectorP. The V. nashicola WGS will enable comparison to the WGSs of other Venturia spp. to identify effectors that potentially activate NHR in the pear scab pathosystems.
7. Single-cell sequencing offers clues to cyclic changes in the human body
Single cell RNAseq provides a molecular and cellular cartography of changes to the human endometrium through the menstrual cycle (9) CC-BY-NC-ND 4.0
In a human menstrual cycle, the endometrium undergoes remodeling, shedding, and regeneration which are driven by substantial gene expression changes in the underlying cellular hierarchy. Despite its importance in human fertility and regenerative biology, mechanistic understanding of this unique type of tissue homeostasis remains rudimentary. Here, we characterized the transcriptomic transformation of human endometrium at single cell resolution, dissecting multidimensional cellular heterogeneity of the tissue across the entire natural menstrual cycle. We analyzed 6 endometrial cell types, including a previously uncharacterized ciliated epithelial cell type, during four major phases of endometrial transformation, and found characteristic signatures for each cell type and phase. We discovered that human window of implantation opens up with an abrupt and discontinuous transcriptomic activation in the epithelium, accompanied with widespread decidualized feature in the stroma. These data reveal signatures in the luminal and glandular epithelium during epithelial gland reconstruction, and suggest a mechanism for adult gland formation.
8. BGI showcases its latest NGS platform
Reliable Multiplex Sequencing with Rare Index Mis-Assignment on DNB-Based NGS Platform (10) CC-BY-NC-ND 4.0
Accurate next generation sequencing (NGS) is critical for understanding genetic predisposition to human disease and thus aiding clinical diagnosis and personalized precision medicine. Recent breakthroughs in massively parallel sequencing, especially when coupled with sample multiplexing, have driven sequencing cost down and made clinical genetic tests broadly affordable. However, intractable index mis-assignment (commonly exceeds 1%) has been reported on some widely used sequencing platforms. Burdensome unique dual indexing is now used to reduce this problem. Here, we investigated this quality issue on BGI sequencers using three major library preparation methods: whole genome sequencing (WGS) with PCR, PCR-free WGS, and two-step targeted PCR. BGI sequencers utilize a unique DNA nanoball (DNB) technology that is based on rolling circle replication (RCR) for array preparation; this linear amplification is PCR free and can avoid error accumulation. We demonstrate here that single index mis-assignment from free indexed oligos on these sequencers occurs at a rate of only one in 36 million reads, suggesting virtually no index hopping during DNB creation and arraying, as expected for the RCR process. Furthermore, the DNB-based NGS applications have achieved an unprecedentedly low sample-to-sample mis-assignment rate of 0.0001% to 0.0004% using only single indexing. Therefore, single indexing with DNB sequencing technology provides a simple but effective method for sensitive research and clinical genetic assays that require the detection of low abundance sequences in a large number of samples.
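To put the rates quoted in this abstract on a common scale, the arithmetic can be sketched as follows. This is only an illustration: the run size of 360 million reads is a made-up assumption and the function names are hypothetical, while the rates themselves (one in 36 million, 0.0001% to 0.0004%, and the 1% figure reported for some other platforms) come from the abstract.

```python
def one_in_n_to_percent(n):
    """Convert a 'one in n' rate into a percentage."""
    return 100.0 / n


def expected_misassigned(total_reads, rate_percent):
    """Expected number of mis-assigned reads at a given percentage rate."""
    return total_reads * rate_percent / 100.0


# Hypothetical run size, chosen only to make the numbers easy to read.
TOTAL_READS = 360_000_000

free_oligo_rate = one_in_n_to_percent(36_000_000)

scenarios = [
    ("free indexed oligos (1 in 36 million)", free_oligo_rate),
    ("sample-to-sample, lower bound (0.0001 %)", 0.0001),
    ("sample-to-sample, upper bound (0.0004 %)", 0.0004),
    ("rate reported for some other platforms (1 %)", 1.0),
]

for label, rate in scenarios:
    reads = expected_misassigned(TOTAL_READS, rate)
    print(f"{label}: about {reads:,.0f} reads")
```

Under that assumed run size, a 1% rate corresponds to millions of mis-assigned reads, which is why it matters for assays that look for low-abundance sequences across many samples.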
9. FADU, a prokaryote-specific read-counting tool: beyond HTSeq and featureCounts
FADU: A Feature Counting Tool for Prokaryotic RNA-Seq Analysis (11) CC-BY 4.0
Motivation: The major algorithms for quantifying transcriptomics data for differential gene expression analysis were designed for analyzing data from human or human-like genomes, specifically those with single gene transcripts and distinct transcriptional boundaries that extend beyond the coding sequence (CDS) as identified through expressed sequence tags (ESTs) or EST-like sequence data. Some eukaryotic genomes and all, or nearly all, bacterial genomes require alternate methods of quantification since they lack annotation of transcriptional boundaries with EST or EST-like data, have overlapping transcriptional boundaries, and/or have polycistronic transcripts. Results: An algorithm was developed and tested that better quantifies transcriptomics data for differential gene expression analysis in organisms with overlapping transcriptional units and polycistronic transcripts. Using data from standard libraries originating from Escherichia coli and Ehrlichia chaffeensis, and strand-specific libraries from the Wolbachia endosymbiont wBm, FADU can derive counts for genes that are missed by HTSeq and featureCounts. Using the default parameters with the E. coli data, FADU can detect transcription of 51 more genes than HTSeq in union mode and 21 genes more than featureCounts, with 42 and 18 of these features being ≤ 300 bp, respectively. Due to its ability to derive counts for otherwise unrepresented genes without overstating their abundance, we believe FADU to be an improved tool for quantifying transcripts in prokaryotic systems for RNA-Seq analyses. Availability and implementation: FADU is available at https://github.com/adkinsrs/FADU. FADU was implemented using Python3 and requires the PySAM module (version 0.12.0.1 or later).
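The problem described in this abstract can be illustrated with a toy example. The sketch below contrasts a counter that sets aside any read overlapping more than one gene as ambiguous with a counter that gives each gene a fraction of the read proportional to the overlapping bases. It is an assumption-laden illustration and not FADU's actual algorithm; the gene coordinates, read coordinates, and function names are all hypothetical.

```python
from collections import defaultdict


def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def count_discard_ambiguous(reads, genes):
    """Count a read only when it overlaps exactly one gene."""
    counts = defaultdict(float)
    for r_start, r_end in reads:
        hits = [
            name
            for name, (g_start, g_end) in genes.items()
            if overlap(r_start, r_end, g_start, g_end) > 0
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1.0
    return dict(counts)


def count_fractional(reads, genes):
    """Give each gene the fraction of the read's bases that fall inside it."""
    counts = defaultdict(float)
    for r_start, r_end in reads:
        read_len = r_end - r_start
        for name, (g_start, g_end) in genes.items():
            shared = overlap(r_start, r_end, g_start, g_end)
            if shared:
                counts[name] += shared / read_len
    return dict(counts)


# Two overlapping genes; geneB is short and sits at the end of geneA.
genes = {"geneA": (0, 1000), "geneB": (950, 1100)}
reads = [(100, 200), (940, 1040), (960, 1060)]

print("discard ambiguous:", count_discard_ambiguous(reads, genes))
print("fractional:", count_fractional(reads, genes))
```

In this toy case the short gene receives no counts at all under the discard strategy, because every read that touches it also touches its neighbor, while the fractional strategy still reports it.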
10. How do different GATK versions affect research?
Performance benchmarking of GATK3.8 and GATK4 (12) CC-BY-ND 4.0
Use of the Genome Analysis Toolkit (GATK) continues to be the standard practice in genomic variant calling in both research and the clinic. Recently the toolkit has been rapidly evolving. Significant computational performance improvements have been introduced in GATK3.8 through collaboration with Intel in 2017. The first release of GATK4 in early 2018 revealed significant rewrites in the code base, as the stepping stone toward a Spark implementation. As the software continues to be a moving target for optimal deployment in highly productive environments, we present a detailed analysis of these improvements, to help the community stay abreast with changes in performance. We re-evaluated the options previously identified as advantageous, such as threading, parallel garbage collection, I/O options and data-level parallelization. Based on our results, we consider the performance and cost trade-offs of using GATK3.8 and GATK4 for different types of analyses.
References
1. Dussert Y, Mazet ID, Couture C, Gouzy J, Piron M-C, Kuchly C, et al. A high-quality grapevine downy mildew genome assembly reveals rapidly evolving and lineage-specific putative host adaptation genes. bioRxiv. 2018.
2. Danks G, Galbiati H, Raasholm M, Torres Cleuren YN, Valen E, Navratilova P, et al. Trans-splicing of mRNAs links gene transcription to translational control regulated by mTOR. bioRxiv. 2018.
3. Lei Q, Li C, Zuo ZX, Huang CH, Cheng HH, Zhou RJ. Evolutionary Insights into RNA trans-Splicing in Vertebrates. Genome Biol Evol. 2016;8(3):562-77.
4. Yang YF, Zhang XQ, Ma XH, Zhao TL, Sun QS, Huan Q, et al. Trans-splicing enhances translational efficiency in C. elegans. Genome Res. 2017;27(9):1525-35.
5. Bergey CM, Lopez M, Harrison GF, Patin E, Cohen J, Quintana-Murci L, et al. Polygenic adaptation and convergent evolution across both growth and cardiac genetic pathways in African and Asian rainforest hunter-gatherers. bioRxiv. 2018.
6. Yang H, Jaime M, Polihronakis M, Kanegawa K, Markow T, Kaneshiro K, et al. Reannotation of eight Drosophila genomes. bioRxiv. 2018.
7. Parker DJ, Wiberg RAW, Trivedi U, Tyukmaeva VI, Gharbi K, Butlin RK, et al. Inter- and intra-specific genomic divergence in Drosophila montana shows evidence for cold adaptation. bioRxiv. 2018.
8. Johnson SJV, Jones D, Thrimawithana AH, Deng CH, Bowen JK, Mesarich CH, et al. Draft Genome Sequence of the Asian Pear Scab Pathogen, Venturia nashicola. bioRxiv. 2018.
9. Wang W, Vilella F, Moreno I, Pan W, Simon C, Quake SR. Single cell RNAseq provides a molecular and cellular cartography of changes to the human endometrium through the menstrual cycle. bioRxiv. 2018.
10. Li Q, Zhao X, Zhang W, Wang L, Wang J, Xu D, et al. Reliable Multiplex Sequencing with Rare Index Mis-Assignment on DNB-Based NGS Platform. bioRxiv. 2018.
11. Chung M, Adkins RS, Shetty AC, Sadzewicz L, Tallon LJ, Fraser CM, et al. FADU: A Feature Counting Tool for Prokaryotic RNA-Seq Analysis. bioRxiv. 2018.
12. Heldenbrand JR, Baheti S, Bockol MA, Drucker TM, Hart SN, Hudson ME, et al. Performance benchmarking of GATK3.8 and GATK4. bioRxiv. 2018.