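July was a busy month for preprint news. First, Nature published a piece questioning preprints, scientific manuscripts posted online quickly without peer review:
The author writes: "Weak work that hasn't been reviewed could get overblown in the media. Conversely, better work could be ignored." The article quickly sparked heated discussion on social media. Some scholars voiced support, but many scientists felt its view of preprints was one-sided. Both camps laid out their arguments in a lively debate on Twitter.
In addition, the well-known literature repository Europe PMC announced that it will index preprints posted on servers including bioRxiv, PeerJ Preprints, ChemRxiv, and F1000Res. Going forward, preprints will be searchable on Europe PMC (see the figure below). For more information, visit https://europepmc.org.
That is all for preprint news. Now let us look at the noteworthy preprints posted on bioRxiv during a scorching July. As in previous issues, the copyright information is given after each title.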
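1. 【Genome sequencing】First goldfish genome released
De Novo assembly of the goldfish (Carassius auratus) genome and the evolution of genes after whole genome duplication (This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license; in other words, free to use)
Researchers from the US National Institutes of Health (NIH) and other institutions collaborated to complete the first draft assembly of the goldfish genome. The authors found that the common ancestor of goldfish and its close relative, the common carp, underwent a whole genome duplication 15 million years ago. Using 71X PacBio long-read whole-genome sequencing, the authors obtained a goldfish genome containing 70,324 protein-coding genes and more than 11,000 non-coding RNA transcripts. Compared with zebrafish, goldfish shows no large-scale genome rearrangement after the duplication. Interestingly, however, 14% of the duplicated genes have already lost one paralog copy. Set against 28% in common carp, this seems to suggest that the goldfish genome has changed less since the two lineages diverged, although the authors note that the difference could also stem from the incompleteness and assembly problems of the carp genome. In addition, transcriptome sequencing of seven tissues showed that 30% of the duplicates formed by the whole genome duplication differ in expression (this proportion depends heavily on the method; see the paper's Methods). For more on genome duplication, see the 生信人 article "Latest PNAS study reveals how common polyploidy is in insect evolution."
Figure 1e of the original paper: evolutionary relationships among goldfish, common carp, and zebrafish. The red arrow marks the timing of the whole genome duplication. Goldfish_1 and goldfish_2 denote the two goldfish subgenomes produced by the duplication.
Figure 3 of the original paper: genome synteny between goldfish and zebrafish.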
2. 【Genome editing】Harvard researchers: a systematic comparison of several CRISPR libraries (1)
Up, down, and out: optimized libraries for CRISPRa, CRISPRi, and CRISPR-knockout genetic screens (CC-BY-NC-ND 4.0)
Advances in CRISPR-Cas9 technology have enabled the flexible modulation of gene expression at large scale. In particular, the creation of genome-wide libraries for CRISPR knockout (CRISPRko), CRISPR interference (CRISPRi), and CRISPR activation (CRISPRa) has allowed gene function to be systematically interrogated. Here, we evaluate numerous CRISPRko libraries and show that our recently-described CRISPRko library (Brunello) is more effective than previously published libraries at distinguishing essential and non-essential genes, providing approximately the same perturbation-level performance improvement over GeCKO libraries as GeCKO provided over RNAi. Additionally, we developed genome-wide libraries for CRISPRi (Dolcetto) and CRISPRa (Calabrese). Negative selection screens showed that Dolcetto substantially outperforms existing CRISPRi libraries with fewer sgRNAs per gene and achieves comparable performance to CRISPRko in the detection of gold-standard essential genes. We also conducted positive selection CRISPRa screens and show that Calabrese outperforms the SAM library approach at detecting vemurafenib resistance genes. We further compare CRISPRa to genome-scale libraries of open reading frames (ORFs). Together, these libraries represent a suite of genome-wide tools to efficiently interrogate gene function with multiple modalities.
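3. 【Transcriptomics】Potential effects of RNAlater sample storage on RNA-seq (2)
RNAlater and flash freezing storage methods nonrandomly influence observed gene expression in RNAseq experiments (CC-BY-NC-ND 4.0)
As a high-throughput technique for measuring gene expression, RNA-seq is now widely used across all kinds of experiments. How samples are stored before sequencing, however, can bias the results. Two methods are in wide use to limit this bias: soaking samples in RNAlater, and flash freezing them in liquid nitrogen. Courtney N. Passow and colleagues at the University of Minnesota compared the two. They preserved samples from the same batch with either RNAlater or liquid nitrogen and then performed RNA-seq. The authors found that the storage method can change the outcome of differential expression analysis. Flash-frozen samples better preserved genes that are shorter and have higher GC content, and (in the authors' view) reflect true expression levels more faithfully. In RNAlater samples, genes with certain functions were enriched. The authors suggest this may be because RNAlater treatment triggers a rapid cellular response that shifts the expression of some genes. They therefore advise paying close attention to how samples were stored when using public datasets, since such differences can produce "false" differentially expressed genes. Notably, the preprint drew a great deal of attention on social media as soon as it was posted, reaching 571 retweets, 5,068 views, and 677 PDF reads within just three days. Some scholars were also quick to raise concerns. For example, Yoav Gilad, a professor of human genetics at the University of Chicago, commented below the preprint on bioRxiv that the paper has two main problems and that its conclusions are not supported by its data.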
4. 【Genome sequencing】Australian and Chinese researchers jointly resolve key regions of the wheat genome (3)
Optical and physical mapping with local finishing enables megabase-scale resolution of agronomically important regions in the wheat genome (CC-BY 4.0)
Background: Numerous scaffold-level sequences for wheat are now being released and, in this context, we report on a strategy for improving the overall assembly to a level comparable to that of the human genome. Results: Using chromosome 7A of wheat as a model, sequence-finished megabase scale sections of this chromosome were established by combining a new independent assembly based on a BAC-based physical map, BAC pool paired end sequencing, chromosome arm specific mate-pair sequencing and Bionano optical mapping with the IWGSC RefSeq v1.0 sequence and its underlying raw data. The combined assembly results in 18 super-scaffolds across the chromosome. The value of finished genome regions is demonstrated for two approximately 2.5 Mb regions associated with yield and the grain quality phenotype of fructan carbohydrate grain levels. In addition, the 50 Mb centromere region analysis incorporates cytological data highlighting the importance of non-sequence data in the assembly of this complex genome region. Conclusions: Sufficient genome sequence information is shown to be now available for the wheat community to produce sequence-finished releases of each chromosome of the reference genome. The high-level completion identified that an array of seven fructosyl transferase genes underpins grain quality and yield attributes are affected by five f-box-only-protein-ubiquitin ligase domain and four root-specific lipid transfer domain genes. The completed sequence also includes the centromere.
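5. 【Evolution】Metatranscriptomics lifts the veil on Asgard archaea metabolism (4)
Asgard archaea are diverse, ubiquitous, and transcriptionally active microbes (CC-BY-NC-ND 4.0)
Genomic and evolutionary studies of archaea have produced many breakthroughs in recent years and have strongly challenged traditional theories on the origin of eukaryotes. Recent work has uncovered a new archaeal group, Asgard (comprising several phyla, including Lokiarchaeota), yet our understanding of it remains very limited. Meng Li's group at the Institute for Advanced Study, Shenzhen University, together with collaborators at the University of Hong Kong, carried out an in-depth analysis of Asgard archaea. Using 16S rRNA, the researchers classified Asgard into 13 subgroups, five of which are reported for the first time, and found that the group has a remarkably broad geographic and environmental distribution. Genomic and metatranscriptomic analyses indicate that Asgard archaea have highly diverse metabolic pathways and may be metabolically mixotrophic.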
6. 【GWAS】A GWAS analysis of osteoporosis (5)
An Atlas of Human and Murine Genetic Influences on Osteoporosis (CC-BY-NC-ND 4.0)
Osteoporosis is a common debilitating chronic disease diagnosed primarily using bone mineral density (BMD). We undertook a comprehensive assessment of human genetic determinants of bone density in 426,824 individuals, identifying a total of 518 genome-wide significant loci, (301 novel), explaining 20% of the total variance in BMD - as estimated by heel quantitative ultrasound (eBMD). Next, meta-analysis identified 13 bone fracture loci in ~1.2M individuals, which were also associated with BMD. We then identified target genes from cell-specific genomic landscape features, including chromatin conformation and accessible chromatin sites, that were strongly enriched for genes known to influence bone density and strength (maximum odds ratio = 58, P = 10E-75). We next performed rapid throughput skeletal phenotyping of 126 knockout mice lacking eBMD Target Genes and showed that these mice had an increased frequency of abnormal skeletal phenotypes compared to 526 unselected lines (P < 0.0001). In-depth analysis of one such Target Gene, DAAM2, showed a disproportionate decrease in bone strength relative to mineralization. This comprehensive human and murine genetic atlas provides empirical evidence testing how to link associated SNPs to causal genes, offers new insights into osteoporosis pathophysiology and highlights opportunities for drug development.
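7. 【NGS】What useful information can unmapped reads reveal? (6)
Exploring the unmapped DNA and RNA reads in a songbird genome (CC-BY-NC-ND 4.0)
Mapping is central to next-generation sequencing. In practice, no matter how much bioinformatics methods improve, some fraction of reads always fails to align to the reference genome. These reads are often simply discarded, and the biological information hidden in them is lost along the way. Researchers at Wageningen University in the Netherlands analyzed DNA (normal and bisulfite-treated) and RNA sequencing reads from the great tit. The authors assembled the unmapped reads de novo and then ran BLAST searches against the NCBI non-redundant nucleotide database. Many of these sequences had no match in the great tit genome but showed some similarity to genes of other birds. In addition, some contigs may derive from blood parasites of the great tit, such as Plasmodium (malaria parasites) and Trypanosoma (trypanosomes). The paper shows readers how unmapped reads can be used to correct a reference genome and to identify possible sources of contamination in a sample.
Table 1 of the original paper. Aves: the taxonomic class of birds.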
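The first step of such an analysis, pulling the unmapped reads out of an alignment, can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: it assumes the alignments are available as SAM text, relies only on the SAM convention that FLAG bit 0x4 marks an unmapped read, and uses made-up reads. The function names `unmapped_reads` and `to_fasta` are hypothetical helpers introduced here.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, sequence) for every unmapped record in SAM text."""
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM columns: 1 = QNAME, 2 = FLAG, 10 = SEQ.
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED_FLAG:
            yield name, seq


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format records as FASTA text, e.g. as input for a de novo assembler."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy SAM input (hypothetical reads, not data from the paper).
EXAMPLE_SAM = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t101\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",  # mapped
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAA\tIIIIIIII",         # unmapped
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tGGGGAAAA\tIIIIIIII",        # unmapped, paired
]

if __name__ == "__main__":
    print(to_fasta(unmapped_reads(EXAMPLE_SAM)), end="")
```

On real BAM files the same filter is usually applied with a dedicated tool (for example `samtools view -f 4`) rather than by parsing text; the sketch only shows which field carries the information.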
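8. 【Genome sequencing】Draft genome of the mandrill, the world's largest monkey, released (7)
The draft genome sequence of mandrill (Mandrillus sphinx) (CC-BY-NC-ND 4.0)
Image source: Wikipedia
In Chinese, the mandrill's name, 山魈, is also the name of a famous one-legged monster from ancient Chinese mythology. The Guoyu (Discourses of Lu) says: "夔一足,越人谓之山臊" ("The Kui has a single foot; the people of Yue call it the shansao"). In reality, mandrills live mainly in the tropical rainforests of West Africa and are the largest monkeys in the world. They are fierce and can do considerable damage to crops; because of hunting and habitat destruction, their survival outlook is far from rosy. Recently, researchers from the University of Copenhagen in Denmark and BGI reported the first mandrill genome, sequenced to 96X depth (based on an estimated genome size of 2.9 Gb). More than 40% of the mandrill genome consists of repetitive sequence. We expect the release of this genome to provide a new resource for comparative genomics of humans and other primates, and we hope it can also contribute to mandrill conservation.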
Findings: Here we sequenced 284 Gb data, providing 96-fold coverage (considering the estimate genome size of 2.9 Gb), to construct a reference genome for mandrill. The assembled draft genome was 2.79 Gb with contig N50 of 20.48 Kb and scaffold N50 of 3.56 Mb. We annotated the mandrill genome to find 43.83% repeat elements, as well as 21,906 protein coding genes. We found good quality of the draft genome and gene annotation by BUSCO analysis which revealed 98% coverage of the BUSCOs. Conclusions: We established the first draft genome sequence of mandrill, which is valuable resource for future evolutionary and human diseases studies.
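For readers unfamiliar with the contig and scaffold N50 values quoted above, the statistic can be sketched as follows. This is a minimal illustration using the common definition (the length L such that sequences of length at least L together cover half of the assembled bases); the lengths are made up, and the function name `n50` is a hypothetical helper, not code from the paper.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sort the lengths from longest to shortest and accumulate them;
    N50 is the length at which the running sum first reaches half
    of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy contig lengths in bp (hypothetical, not the mandrill assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]

if __name__ == "__main__":
    # Total is 300 bp; 80 + 70 = 150 reaches half, so N50 is 70.
    print(n50(toy_contigs))
```

A larger N50 means that more of the assembly sits in long pieces, which is why the scaffold N50 (3.56 Mb) is much larger than the contig N50 (20.48 Kb) for the same genome.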
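9. 【Genome sequencing】Assembly of the 32 Gb axolotl genome raised to chromosome scale (8)
A Chromosome-Scale Assembly of the Enormous (32 Gb) Axolotl Genome (CC-BY-NC-ND 4.0)
Remember the axolotl genome, the largest genome ever sequenced, released not long ago? Earlier this year, researchers from Germany and Austria published it in Nature (9), and 生信人 ran a dedicated report at the time (see: "Nature | 32 Gb giant axolotl genome published"). That version of the genome has an N50 of 3 Mb; in fact, most contigs are only about one-thousandth of the total genome length (9). Recently, scientists at the University of Kentucky used SNP typing and linkage analysis, among other methods, to bring the axolotl assembly up to chromosome scale. The authors assembled the genome into 14 chromosomes and validated the assembly by fluorescence in situ hybridization. In the new assembly, 27.3 Gb of sequence and 94% of genes are placed on these 14 chromosomes. For more on chromosome-scale assembly, see the 生信人 article "Incredible | CAS Institute of Genetics develops a chromosome-level contig assembly method."
Figure 1 of the original paper.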
10. 【Single-cell】New software helps identify cell types in single-cell sequencing (10)
Massive single-cell RNA-seq analysis and imputation via deep learning (CC-BY-NC-ND 4.0)
Recent advances in large-scale RNA-seq enable fine-grained characterization of phenotypically distinct cellular states within heterogeneous tissues. We present scScope, a scalable deep-learning based approach that can accurately and rapidly identify cell-type composition from millions of noisy single-cell gene-expression profiles. (This is the first time we have seen an abstract this short: just two sentences.)
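11. Erratum:
Two days ago we covered a bioRxiv paper on mushroom genomics ("Both are fungi, so why can mushrooms grow so large?"). A reader commented that the photo of a giant honey fungus (Armillaria) in Oregon, standing several people tall, was of doubtful authenticity. We consulted a fungal expert from Oregon about this. The answer: Oregon does have a giant mushroom, but it is not the fruiting body that is huge, so the photo of a mushroom several people tall must have been photoshopped. What is huge is the underground mycelium: researchers collected samples at different sites, used barcodes to determine whether they belonged to the same individual, and estimated its size, finding that the honey fungus mycelium can spread underground over a very large area. We apologize for the error and thank our attentive readers for the correction.
References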
1. Sanson KR, Hanna RE, Hegde M, Donovan KF, Strand C, Sullender ME, et al. Up, down, and out: optimized libraries for CRISPRa, CRISPRi, and CRISPR-knockout genetic screens. bioRxiv. 2018.
2. Passow CN, Kono TJY, Stahl BA, Jaggard JB, Keene AC, McGaugh SE. RNAlater and flash freezing storage methods nonrandomly influence observed gene expression in RNAseq experiments. bioRxiv. 2018.
3. Keeble-Gagnère G, Rigault P, Tibbits J, Pasam R, Hayden M, Forrest K, et al. Optical and physical mapping with local finishing enables megabase-scale resolution of agronomically important regions in the wheat genome. bioRxiv. 2018.
4. Cai M, Liu Y, Zhou Z, Yang Y, Pan J, Gu J-D, et al. Asgard archaea are diverse, ubiquitous, and transcriptionally active microbes. bioRxiv. 2018.
5. Morris JA, Kemp JP, Youlten SE, Laurent L, Logan JG, Chai R, et al. An Atlas of Human and Murine Genetic Influences on Osteoporosis. bioRxiv. 2018.
6. Laine V, Gossmann TI, van Oers K, Visser ME, Groenen MAM. Exploring the unmapped DNA and RNA reads in a songbird genome. bioRxiv. 2018.
7. Yin Y, Yang T, Liu H, Huang Z, Zhang Y, Song Y, et al. The draft genome sequence of mandrill (Mandrillus sphinx). bioRxiv. 2018.
8. Smith JJ, Timoshevskaya N, Timoshevskiy VA, Keinath MC, Hardy D, Voss SR. A Chromosome-Scale Assembly of the Enormous (32 Gb) Axolotl Genome. bioRxiv. 2018.
9. Nowoshilow S, Schloissnig S, Fei JF, Dahl A, Pang AWC, Pippel M, et al. The axolotl genome and the evolution of key tissue formation regulators. Nature. 2018;554(7690):50-55.
10. Deng Y, Bao F, Dai Q, Wu L, Altschuler S. Massive single-cell RNA-seq analysis and imputation via deep learning. bioRxiv. 2018.