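At noon on November 28, countless eyes around the world were fixed on the Lee Shau Kee Lecture Centre at the University of Hong Kong, where Professor He Jiankui was speaking at the Second International Summit on Human Genome Editing about the gene-edited babies study he led. Despite a 13-hour time difference, this editor stayed up late to watch the talk. We open this issue of bioRxiv highlights with the event not only to ride the news cycle but also because the talk itself touches on our theme: during the Q&A, one question mentioned bioRxiv! Let's see what was asked.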
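Question 11 (translation from TMTPost, 钛媒体): Coming back to transparency, would you be willing to post the informed consent form and your manuscript in a public forum, for example on biorxiv.org or a public website for consent forms, so that the community can review them and read in detail what you did?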
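He Jiankui: Yes. In fact, the informed consent form is already on the website; search for my name and you will find it. Second, as for the informed consent form, about 10 people looked at it while I was drafting it, some of them from the United States. I will send it to a few people and ask them to comment. I may (not?) submit it to biorxiv. It should go through peer review before being posted on biorxiv.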
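We will not comment on the affair as a whole. The question alone, though, suggests that preprint servers such as bioRxiv now carry considerable weight in biomedical research and have even reached clinical research. Ivan Oransky, a well-known science journalist based in New York, recently voiced some reservations about this trend. He pointed out that many past academic scandals originated in peer-reviewed articles 【1】. We would press the point further: if that is so, what about the manuscripts on bioRxiv that have not been peer reviewed at all? Preprints are meant to speed up the sharing of results, to encourage exchange rather than working in isolation, and to gather feedback that improves the paper. On the other hand, if readers are careless, erroneous results in preprints posted without peer review can mislead scientists. So when we read papers on bioRxiv, we may need to be especially critical.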
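1. 【genome editing】First, riding the genome editing wave
1.1 The CAS Institute of Neuroscience and collaborators report substantial off-target effects of CRISPR single-base editing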
Base editing generates substantial off-target single nucleotide variants(CC-BY-NC-ND 4.0)
Genome editing tools including CRISPR/Cas9 and base editors hold great promise for correcting pathogenic mutations. Unbiased genome-wide off-target effects of the editing in mammalian cells is required before clinical applications, but determination of the extent of off-target effects has been difficult due to the existence of single nucleotide polymorphisms (SNPs) in individuals. Here, we developed a method named GOTI (Genome-wide Off-target analysis by Two-cell embryo Injection) to detect off-target mutations without interference of SNPs. We applied GOTI to both the CRISPR-Cas9 and base editing (BE3) systems by editing one blastomere of the two-cell mouse embryo and then compared whole genome sequences of progeny-cell populations at E14.5 stage. Sequence analysis of edited and non-edited cell progenies showed that undesired off-target single nucleotide variants (SNVs) are rare (average 10.5) in CRISPR-edited mouse embryos, with a frequency close to the spontaneous mutation rate. By contrast, BE3 editing induced over 20-fold higher SNVs (average 283), raising the concern of using base-editing approaches for biomedical application.
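Professor Burgio, a well-known genome editing researcher at the Australian National University, shared his views on this paper on Twitter right away.
1.2 New insights into DNA double-strand break (DSB) repair mediated by CRISPR/Cas9 and non-homologous end-joining (NHEJ)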
Massively parallel profiling and predictive modeling of the outcomes of CRISPR/Cas9-mediated double-strand break repair (CC-BY-NC-ND 4.0)
Non-homologous end-joining (NHEJ) plays an important role in double-strand break (DSB) repair of DNA. Recent studies have shown that the error patterns of NHEJ are strongly biased by sequence context, but these studies were based on relatively few templates. To investigate this more thoroughly, we systematically profiled ~1.16 million independent mutational events resulting from CRISPR/Cas9-mediated cleavage and NHEJ-mediated DSB repair of 6,872 synthetic target sequences, introduced into a human cell line via lentiviral infection. We find that: 1) insertions are dominated by 1 bp events templated by sequence immediately upstream of the cleavage site, 2) deletions are predominantly associated with microhomology, and 3) targets exhibit variable but reproducible diversity with respect to the number and relative frequency of the mutational outcomes to which they give rise. From these data, we trained a model that uses local sequence context to predict the distribution of mutational outcomes. Exploiting the bias of NHEJ outcomes towards microhomology mediated events, we demonstrate the programming of deletion patterns by introducing microhomology to specific locations in the vicinity of the DSB site. We anticipate that our results will inform investigations of DSB repair mechanisms as well as the design of CRISPR/Cas9 experiments for diverse applications including genome-wide screens, gene therapy, lineage tracing and molecular recording.
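To make the microhomology observation concrete, here is a minimal sketch that lists deletions a pair of identical short stretches flanking a cut site could produce. It is a toy enumeration only, not the predictive model trained in the paper. The function names, the `window` parameter, and the example sequence are all assumptions made for illustration.

```python
def microhomology_deletions(seq, cut, min_len=2, max_len=6, window=20):
    """List deletions that microhomology around a cut could explain.

    `cut` is the index of the first base to the right of the break.
    Each hit is (left_start, right_start, homology, deletion_length).
    Shorter matches nested inside a longer one are reported as well.
    """
    hits = []
    left_lo = max(0, cut - window)
    right_hi = min(len(seq), cut + window)
    for k in range(min_len, max_len + 1):
        # Left copy must end at or before the cut.
        for i in range(left_lo, cut - k + 1):
            left = seq[i:i + k]
            # Right copy must start at or after the cut.
            for j in range(cut, right_hi - k + 1):
                if seq[j:j + k] == left:
                    hits.append((i, j, left, j - i))
    return hits


def repaired(seq, hit):
    """Sequence after the deletion: one copy of the homology is kept."""
    i, j, homology, _ = hit
    return seq[:i + len(homology)] + seq[j + len(homology):]


# Hypothetical 17 bp target with the break between positions 7 and 8.
target = "TTGACGTCCATGACGAA"
hits = microhomology_deletions(target, cut=8, min_len=5, max_len=5)
print(hits)                       # [(1, 10, 'TGACG', 9)]
print(repaired(target, hits[0]))  # TTGACGAA
```

The two `TGACG` copies sit on either side of the break, so joining at the homology removes 9 bp and leaves a single copy behind.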
2. 【Bioinformatics】GimmeMotifs, a new tool for transcription factor binding site analysis
GimmeMotifs: an analysis framework for transcription factor motif analysis (CC-BY 4.0)
Background: Transcription factors (TFs) bind to specific DNA sequences, TF motifs, in cis-regulatory sequences and control the expression of the diverse transcriptional programs encoded in the genome. The concerted action of TFs within the chromatin context enables precise temporal and spatial expression patterns. To understand how TFs control gene expression it is essential to model TF binding. TF motif information can help to interpret the exact role of individual regulatory elements, for instance to predict the functional impact of non-coding variants. Findings: Here we present GimmeMotifs, a comprehensive computational framework for TF motif analysis. Compared to the previously published version, this release adds a whole range of new functionality and analysis methods. It now includes tools for de novo motif discovery, motif scanning and sequence analysis, motif clustering, calculation of performance metrics and visualization. Included with GimmeMotifs is a non-redundant database of clustered motifs. Compared to other motif databases, this collection of motifs shows competitive performance in discriminating bound from unbound sequences. Using our de novo motif discovery pipeline we find large differences in performance between de novo motif finders on ChIP-seq data. Using an ensemble method such as implemented in GimmeMotifs will generally result in improved motif identification compared to a single motif finder. Finally, we demonstrate maelstrom, a new ensemble method that enables comparative analysis of TF motifs between multiple high-throughput sequencing experiments, such as ChIP-seq or ATAC-seq. Using a collection of ~200 H3K27ac ChIP-seq data sets we identify TFs that play a role in hematopoietic differentiation and lineage commitment. Conclusion: GimmeMotifs is a fully-featured and flexible framework for TF motif analysis. 
It contains both command-line tools as well as a Python API and is freely available at: https://github.com/vanheeringen-lab/gimmemotifs.
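For readers new to motif scanning, the idea can be sketched in a few lines of plain Python: turn per-position base counts into log-odds scores, then slide the motif along a sequence. This does not use the GimmeMotifs API. The function names, the pseudocount, the uniform background, and the toy motif are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def log_odds(pfm, pseudocount=0.25, background=0.25):
    """Convert per-position base counts into log2 odds against background."""
    pwm = []
    for column in pfm:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def best_hit(seq, pwm):
    """Return (score, start) of the best-scoring window on the given strand.

    Assumes the sequence contains only A, C, G and T.
    """
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (sum(pwm[k][seq[i + k]] for k in range(width)), i)
        for i in range(len(seq) - width + 1)
    )


# Hypothetical 3 bp motif that strongly prefers "TGA".
toy_pfm = [{"T": 10}, {"G": 10}, {"A": 10}]
score, start = best_hit("CCTGACC", log_odds(toy_pfm))
print(start)  # 2
```

A real analysis would also scan the reverse strand and compare scores against a threshold calibrated on background sequences.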
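3. 【Evolution】IQ-TREE author Bui Quang Minh (裴广明) and colleagues at the Australian National University show why choosing an appropriate substitution model matters for phylogenetic tree inference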
The Prevalence and Impact of Model Violations in Phylogenetics (CC-BY 4.0)
In phylogenetic inference, we commonly use models which assume that sequence evolution is stationary, reversible and homogeneous (SRH). Although such assumptions are often criticized, the extent of SRH violations and their effects on phylogenetic inference are not well understood. Here, we extend the matched-pairs test of symmetry to assess the scale and impact of SRH model violations on 3,572 partitions from 35 published phylogenetic datasets. We show that many partitions (39.5%) reject the SRH assumptions, and that in most datasets, phylogenies inferred from all partitions differ significantly from those inferred using the subset of partitions that do not reject the SRH assumptions. These results suggest that the extent and effects of model violation in phylogenetics may be substantial. They highlight the importance of testing for model violations and possibly excluding partitions that violate models prior to tree reconstruction. They also suggest that further effort in developing models that do not require SRH assumptions could lead to large improvements in the accuracy of phylogenomic inference. The scripts necessary to perform the analysis are available in https://github.com/roblanf/SRHtests, and the new tests we describe are available as a new option in IQ-TREE (http://www.iqtree.org).
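The matched-pairs test of symmetry that the authors build on can be sketched in its classical pairwise form (Bowker's test). This is only that textbook form, not the extended version described in the paper or the option implemented in IQ-TREE. The function name and the toy sequences are assumptions made for illustration.

```python
from itertools import combinations

BASES = "ACGT"


def bowker_symmetry(seq_a, seq_b):
    """Bowker's matched-pairs test of symmetry for two aligned sequences.

    Returns (statistic, degrees_of_freedom). Under symmetry the statistic
    is approximately chi-square distributed with that many degrees of
    freedom.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Divergence matrix: counts[(x, y)] = sites with x in seq_a, y in seq_b.
    counts = {(x, y): 0 for x in BASES for y in BASES}
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in BASES and y in BASES:  # skip gaps and ambiguity codes
            counts[(x, y)] += 1
    stat, df = 0.0, 0
    for x, y in combinations(BASES, 2):
        n_xy, n_yx = counts[(x, y)], counts[(y, x)]
        if n_xy + n_yx > 0:  # only informative off-diagonal pairs count
            stat += (n_xy - n_yx) ** 2 / (n_xy + n_yx)
            df += 1
    return stat, df


# Toy alignment: three A->C sites against a single C->A site.
print(bowker_symmetry("AAAACCGT", "CCCAACGT"))  # (1.0, 1)
```

A large statistic relative to the chi-square distribution suggests the two sequences did not evolve under stationary, reversible and homogeneous conditions.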
4. 【Single-cell】The new algorithm LIGER aids cell-type definition in single-cell sequencing
Integrative inference of brain cell similarities and differences from single-cell genomics (CC-BY-NC-ND 4.0)
Defining cell types requires integrating diverse measurements from multiple experiments and biological contexts. Recent technological developments in single-cell analysis have enabled high-throughput profiling of gene expression, epigenetic regulation, and spatial relationships amongst cells in complex tissues, but computational approaches that deliver a sensitive and specific joint analysis of these datasets are lacking. We developed LIGER, an algorithm that delineates shared and dataset-specific features of cell identity, allowing flexible modeling of highly heterogeneous single-cell datasets. We demonstrated its broad utility by applying it to four diverse and challenging datasets from human and mouse brain cells. First, we defined both cell-type-specific and sexually dimorphic gene expression in the mouse bed nucleus of the stria terminalis, an anatomically complex brain region that plays important roles in sex-specific behaviors. Second, we analyzed gene expression in the substantia nigra of seven postmortem human subjects, comparing cell states in specific donors, and relating cell types to those in the mouse. Third, we jointly leveraged in situ gene expression and scRNA-seq data to spatially locate fine subtypes of cells present in the mouse frontal cortex. Finally, we integrated mouse cortical scRNA-seq profiles with single-cell DNA methylation signatures, revealing mechanisms of cell-type-specific gene regulation. Integrative analyses using the LIGER algorithm promise to accelerate single-cell investigations of cell-type definition, gene regulation, and disease states.
5. 【Genomics】The role of conserved non-coding DNA in the social evolution of bees
Rate variation in conserved noncoding DNA reveals regulatory pathways associated with social evolution (CC-BY-NC-ND 4.0)
The evolution of eusociality represents an increase in complexity from individual to caste-based, group reproduction. These behavioral transitions have been hypothesized to go hand-in-hand with an increased ability to regulate when and where genes are expressed. Bees have convergently evolved eusociality up to five times, providing a powerful framework to test this hypothesis. Here, we compare conserved, non-coding sequences in eleven bee species, encompassing three independent origins of reproductive division of labor and two elaborations of eusocial complexity to examine potential links between these putatively regulatory sequences and social evolution. We find that rates of evolution in a number of these loci are correlated with social transitions, suggesting they have played a role in the evolution of these behaviors. Interestingly, loci associated with social origins represent distinct molecular pathways to those associated with subsequent elaborations. We also find many novel non-coding regions that appear to have been recruited alongside the origin of sociality in corbiculate bees; these regions are enriched for cell development and nervous system functions. Thus, our results highlight the potential importance of non-coding change in the evolution of eusociality and are consistent with the idea that regulatory innovations play a key role in insect behavioral complexity.
6. 【Transcriptomics】Nanopore's latest technology enables direct sequencing of RNA molecules (and is claimed to be highly accurate)
Nanopore native RNA sequencing of a human poly(A) transcriptome (CC-BY 4.0)
High throughput RNA sequencing technologies have dramatically advanced our understanding of transcriptome complexity and regulation. However, these cDNA-based methods lose information contained in biological RNA because the copied reads are short or because modifications are not carried forward in cDNA. Here we address these limitations using a native poly(A) RNA sequencing strategy developed by Oxford Nanopore Technologies (ONT). Our study focused on poly(A) RNA isolated from the human cell line GM12878, from which we sequenced approximately 9.9 million individual aligned strands. These native RNA sequence reads had an N50 length of 1334 bases, and a maximum length of 22,000 bases. A total of 78,199 high-confidence isoforms were identified by combining long nanopore reads with short higher accuracy Illumina reads. Among these isoforms, over 50% are not present in GENCODE v24. We describe strategies for assessing 3'poly(A) tail length, base modifications and transcript haplotypes using this single molecule technology. Together, these nanopore-based techniques are poised to deliver new insights into RNA biology.
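The abstract reports an N50 read length. As a reminder of what that statistic measures, here is a minimal sketch using made-up read lengths, not data from the paper; the function name is an assumption.

```python
def n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("need at least one read length")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


toy_lengths = [200, 500, 800, 1300, 2200]  # made-up read lengths
print(n50(toy_lengths))  # 1300
```

Because it weights reads by the bases they contribute, N50 is pulled upward by a small number of long reads and usually sits above the median read length.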
7. 【ncRNA】Sequencing shows that plant extracellular vesicles carry diverse small RNAs and "tiny" RNAs (tyRNA)
Plant Extracellular Vesicles Contain Diverse Small RNA Species and Are Enriched in 10 to 17 Nucleotide "Tiny" RNAs (CC-BY-NC-ND 4.0)
Small RNAs (sRNAs) that are 21 to 24 nucleotides (nt) in length are found in most eukaryotic organisms and regulate numerous biological functions, including transposon silencing, development, reproduction, and stress responses, typically via control of the stability and/or translation of target mRNAs. Major classes of sRNAs in plants include microRNAs (miRNAs) and small interfering RNAs (siRNAs); sRNAs are known to travel as a silencing signal from cell to cell, root to shoot, and even between host and pathogen. In mammals, sRNAs are transported inside extracellular vesicles (EVs), which are mobile lipid compartments that participate in intercellular communication. In addition to sRNAs, EVs carry proteins, lipids, metabolites, and potentially other types of nucleic acids. Here we report that plant EVs also contain diverse species of sRNA. We found that specific miRNAs and siRNAs are preferentially loaded into plant EVs. We also report a previously overlooked class of "tiny RNAs" (10 to 17 nt) that are highly enriched in EVs. This new RNA category of unknown function has a broad and very diverse genome origin and might correspond to degradation products.
8. 【single-cell】Lior Pachter's group at Caltech introduces BUS, a new format specification for scRNA-seq pseudoalignments
The Barcode, UMI, Set format and BUStools (CC-BY-NC-ND 4.0)
We introduce the Barcode-UMI-Set format (BUS) for representing pseudoalignments of reads from single-cell RNA-seq experiments. The format can be used with all single-cell RNA-seq technologies, and we show that BUS files can be efficiently generated. BUStools is a suite of tools for working with BUS files and facilitates rapid quantification and analysis of single-cell RNA-seq data. The BUS format therefore makes possible the development of modular, technology-specific, and robust workflows for single-cell RNA-seq analysis.
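The core idea of a barcode, UMI, set record can be sketched in memory. This toy does not reproduce the on-disk BUS layout or call BUStools. The `BusRecord` type, its field names, and the example records are assumptions made for illustration.

```python
from collections import Counter, namedtuple

# One record per read: cell barcode, UMI, and the equivalence class
# (the set of transcripts the read is compatible with), here an integer id.
BusRecord = namedtuple("BusRecord", ["barcode", "umi", "eq_class"])


def collapse(records):
    """Sort records and merge exact duplicates, keeping a multiplicity."""
    return sorted(Counter(records).items())


def umi_counts(records):
    """Count distinct UMIs per (barcode, equivalence class)."""
    seen = {(r.barcode, r.eq_class, r.umi) for r in records}
    return Counter((barcode, eq_class) for barcode, eq_class, _ in seen)


reads = [
    BusRecord("AAAC", "GGT", 0),
    BusRecord("AAAC", "GGT", 0),  # PCR duplicate of the read above
    BusRecord("AAAC", "TTA", 0),
    BusRecord("CCGT", "GGT", 3),
]
for record, multiplicity in collapse(reads):
    print(record.barcode, record.umi, record.eq_class, multiplicity)
print(dict(umi_counts(reads)))  # {('AAAC', 0): 2, ('CCGT', 3): 1}
```

Keeping the records this small is what lets sorting, deduplication and counting run as separate, composable steps.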
9. 【Bioinformatics】New software helps recover and reuse unaligned reads
Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads (CC-BY-NC-ND 4.0)
Motivation: Read alignment is an important step in RNA-seq analysis as the result of alignment forms the basis for further downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align reads which should have been aligned, a problem we termed as the false-negative non-alignment problem. Results: We have developed Scavenger, a pipeline for recovering unaligned reads using a novel mechanism which utilises information from aligned reads. Scavenger performs recovery of unaligned reads by re-aligning unaligned reads against a putative location derived from aligned reads with sequence similarity against unaligned reads. We show that Scavenger can successfully recover unaligned reads in both simulated and real RNA-seq datasets, including single-cell RNA-seq data. The reads recovered contain more genetic variants compared to previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. We also explored the impact of read recovery on downstream analyses, in particular gene expression analysis, and showed that Scavenger is able to both recover genes which were previously non-expressed and also increase gene expression, with lowly expressed genes having the most impact from the addition of recovered reads. We also found that the majority of genes with >1 fold change in expression after recovery are categorised as pseudogenes, indicating that pseudogene expression can be affected by the false-negative non-alignment problem. Scavenger helps to solve the false-negative non-alignment problem through recovery of unaligned reads using information from previously aligned reads.
10. 【Evolution】A new version of BEAST (Bayesian Evolutionary Analysis Sampling Trees) is released
BEAST 2.5: An Advanced Software Platform for Bayesian Evolutionary Analysis (CC-BY 4.0)
Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.
References
1. Oransky I., 2018, Preprints are coming to clinical research. Reporters, are you ready? http://fnpi.org/es/blog/periodismosalud/los-preprints-estan-llegando-la-investigacion-clinica-reporteros-estan-listos