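Previously we shared papers on the full-length transcriptomes of maize and sorghum.
Below are some of this year's latest full-length transcriptome papers.
1. Tetraploid cotton full-length transcriptome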
Wang, M., Wang, P. (2017). A global survey of alternative splicing in allopolyploid cotton: landscape, complexity and regulation. New Phytol. doi:10.1111/nph.14762
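Summary
Alternative splicing (AS) is an important regulatory mechanism in eukaryotes that greatly increases transcript diversity. Next-generation sequencing has revealed how widespread and complex AS is. In polyploid plants, however, the high sequence similarity between subgenomes makes this technique less effective at accurately identifying alternatively spliced isoforms. Here we characterize AS in tetraploid cotton. Using Pacific Biosciences single-molecule sequencing (Iso-Seq), we developed a pipeline for Iso-Seq transcriptome data analysis (https://github.com/Nextomics/pipeline-for-isoseq). We identified 176 849 full-length transcripts from 44 968 gene models and updated the gene annotation. These data allowed us to identify 15 102 fibre-specific AS events and to estimate that about 51.4% of homoeologous genes produce divergent isoforms in each subgenome. We found that AS allows miRNAs to regulate different isoforms of the same gene differentially. Our study also shows that nucleosome occupancy and DNA methylation play a role in defining exons at the chromatin level. This study provides new insights into the complexity and regulation of AS and will enhance our understanding of AS in polyploid species. Our Iso-Seq data analysis methodology can serve as a useful reference for AS studies in other species.
Original abstract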
Alternative splicing (AS) is a crucial regulatory mechanism in eukaryotes, which acts by greatly increasing transcriptome diversity. The extent and complexity of AS has been revealed in model plants using high-throughput next-generation sequencing. However, this technique is less effective in accurately identifying transcript isoforms in polyploid species because of the high sequence similarity between coexisting subgenomes. Here we characterize AS in the polyploid species cotton. Using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq), we developed an integrated pipeline for Iso-Seq transcriptome data analysis (https://github.com/Nextomics/pipeline-for-isoseq). We identified 176 849 full-length transcript isoforms from 44 968 gene models and updated gene annotation. These data led us to identify 15 102 fibre-specific AS events and estimate that c. 51.4% of homoeologous genes produce divergent isoforms in each subgenome. We reveal that AS allows differential regulation of the same gene by miRNAs at the isoform level. We also show that nucleosome occupancy and DNA methylation play a role in defining exons at the chromatin level. This study provides new insights into the complexity and regulation of AS, and will enhance our understanding of AS in polyploid species. Our methodology for Iso-Seq data analysis will be a useful reference for the study of AS in other species.
2. Rabbit full-length transcriptome
Chen S Y, Deng F, Jia X, et al. A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing[J]. Scientific Reports, 2017, 7.
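Summary
It is widely recognized that transcriptional diversity contributes substantially to biological regulation in eukaryotes. Since the advent of second-generation sequencing, a large number of RNA-seq studies have greatly improved our understanding of transcriptome complexity. However, because assembling short reads is difficult, obtaining full-length transcripts remains a major challenge. In this study, we used PacBio single-molecule long-read sequencing to profile the whole transcriptome of rabbit (Oryctolagus cuniculus). We obtained 36,186 high-confidence transcripts from 14,474 genic loci; more than 23% of these loci and 66% of the isoforms have not yet been annotated in the current reference genome. In addition, about 17% of the transcripts were computationally predicted to be non-coding RNAs. Up to 24,797 alternative splicing (AS) events and 11,184 alternative polyadenylation (APA) events were detected in the reconstructed transcriptome. The results provide a comprehensive reference transcript set and thus help improve the annotation of the rabbit genome.
Original abstract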
It is widely acknowledged that transcriptional diversity largely contributes to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, it still remains a huge challenge for obtaining full-length transcripts because of difficulties in the short read-based assembly. In the present study we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). We totally obtain 36,186 high-confidence transcripts from 14,474 genic loci, among which more than 23% of genic loci and 66% of isoforms have not been annotated yet within the current reference genome. Furthermore, about 17% of transcripts are computationally revealed to be non-coding RNAs. Up to 24,797 alternative splicing (AS) and 11,184 alternative polyadenylation (APA) events are detected within this de novo constructed transcriptome, respectively. The results provide a comprehensive set of reference transcripts and hence contribute to the improved annotation of rabbit genome.
3. Sugarcane full-length transcriptome
Hoang N V, Furtado A, Mason P J, et al. A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing[J]. BMC Genomics, 2017, 18(1):395.
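Summary
Iso-Seq was performed on a pooled RNA sample from leaf, internode and root tissues at different developmental stages of 22 sugarcane varieties, to explore the potential for capturing full-length transcript isoforms. A total of 107,598 unique transcript isoforms were obtained, representing about 71% of the total number of predicted sugarcane genes. Most of the dataset (92%) matched the plant protein database, while just over 2% were novel transcripts and over 2% were putative long non-coding RNAs. About 56% and 23% of all sequences were annotated against the Gene Ontology (GO) and KEGG pathway databases, respectively. Comparison with de novo contigs assembled from Illumina RNA-Seq of internode samples from the same experiment, and with public databases, showed that Iso-Seq recovered more full-length transcript isoforms, with a higher N50 and a larger average length of the largest 1,000 proteins, whereas RNA-Seq captured more of the gene content and RNA diversity. Only 62% of PacBio transcript isoforms matched 67% of the de novo contigs; the unmatched fractions were attributed, respectively, to the inclusion of leaf/root tissues and normalization in the PacBio library, and to the broader gene content and RNA classes represented in the de novo assembly. About 69% of PacBio transcript isoforms aligned to the sorghum genome, compared with about 41% of the de novo contigs.
Original abstract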
Despite the economic importance of sugarcane in sugar and bioenergy production, there is not yet a reference genome available. Most of the sugarcane transcriptomic studies have been based on Saccharum officinarum gene indices (SoGI), expressed sequence tags (ESTs) and de novo assembled transcript contigs from short-reads; hence knowledge of the sugarcane transcriptome is limited in relation to transcript length and number of transcript isoforms.
The sugarcane transcriptome was sequenced using PacBio isoform sequencing (Iso-Seq) of a pooled RNA sample derived from leaf, internode and root tissues, of different developmental stages, from 22 varieties, to explore the potential for capturing full-length transcript isoforms. A total of 107,598 unique transcript isoforms were obtained, representing about 71% of the total number of predicted sugarcane genes. The majority of this dataset (92%) matched the plant protein database, while just over 2% was novel transcripts, and over 2% was putative long non-coding RNAs. About 56% and 23% of total sequences were annotated against the gene ontology and KEGG pathway databases, respectively. Comparison with de novo contigs from Illumina RNA-Sequencing (RNA-Seq) of the internode samples from the same experiment and public databases showed that the Iso-Seq method recovered more full-length transcript isoforms, had a higher N50 and average length of largest 1,000 proteins; whereas a greater representation of the gene content and RNA diversity was captured in RNA-Seq. Only 62% of PacBio transcript isoforms matched 67% of de novo contigs, while the non-matched proportions were attributed to the inclusion of leaf/root tissues and the normalization in PacBio, and the representation of more gene content and RNA classes in the de novo assembly, respectively. About 69% of PacBio transcript isoforms and 41% of de novo contigs aligned with the sorghum genome, indicating the high conservation of orthologs in the genic regions of the two genomes.
The transcriptome dataset should contribute to improved sugarcane gene models and sugarcane protein predictions; and will serve as a reference database for analysis of transcript expression in sugarcane.
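The sugarcane comparison relies on N50 and average length as contiguity metrics. Below is a minimal sketch of how N50 is conventionally computed from a list of sequence lengths. The function names and toy lengths are hypothetical illustrations; this is not the authors' pipeline, and the numbers are not data from the paper.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest to the shortest sequence, accumulating bases
    # until at least half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def summarize(lengths):
    """Hypothetical helper: N50, mean length and count for a set of
    transcripts or contigs."""
    return {
        "n50": n50(lengths),
        "mean": sum(lengths) / len(lengths),
        "count": len(lengths),
    }


if __name__ == "__main__":
    # Made-up lengths (bp), for illustration only.
    long_read_like = [3200, 2800, 2500, 2100, 1800]
    short_read_contigs_like = [1500, 1200, 900, 600, 450, 300, 250]
    print(summarize(long_read_like))           # N50 = 2500
    print(summarize(short_read_contigs_like))  # N50 = 1200
```

A higher N50 only reflects longer sequences, not completeness; as the abstract itself notes, the short-read assembly still captured more gene content and RNA diversity.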