Knowledge Center - 北京概普生物科技有限公司 (GapTech)

A Quick Look at Notable Bioinformatics Preprints on bioRxiv, Year-End 2021

Bioinformatics Essentials · Montreal · January 25, 2022, 09:57

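For this year-end 2021 roundup of notable bioinformatics preprints on bioRxiv, our introduction naturally calls for a review of the year. This time, your editor is handing the stage to Richard Sever. Richard is a co-founder of the renowned bioRxiv and medRxiv and Assistant Director of Cold Spring Harbor Laboratory Press in the United States. At the very start of the new year, Richard took to Twitter to look back on bioRxiv's past year. Also, some readers have suggested that the papers recommended in each issue should come with translations, and your editor has decided to grant that wish in part: for the introduction, here is a translation of Richard's bioRxiv summary.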

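With the pandemic as bad as it was last year, every industry was heading downhill; only bioRxiv and medRxiv kept doing better and better, with more and more papers and frequent appearances in prominent spots across the major media. What can we say? The COVID-19 papers all get submitted to us first. Let's see who dares look down on us now.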

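Just look at the screenshot above: even on December 25 you lot did not know how to take a break and kept sending submissions to bioRxiv, so our back-office staff were kept busy right through Christmas. On that note, we ask authors in China to bear with us: if you found that bioRxiv handled submissions more slowly in late December, that is because Christmas in the United States is the equivalent of the Lunar New Year.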

Anyway, it is a new year, so happy New Year to everyone. If you see this, hurry up and give me a like.

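【Omicron】Montana State University: evolutionary analysis of SARS-CoV-2 suggests that spike protein mutations may reduce vaccine effectiveness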

The rise and fall of SARS-CoV-2 variants and the mutational profile of Omicron

Omicron is the fifth SARS-CoV-2 variant to be designated a Variant of Concern (VOC) by the World Health Organization (WHO). Here we provide a retrospective analysis of SARS-CoV-2 variants and explain how the Omicron variant is distinct. Our work shows that the spike protein is a ‘hotspot’ for viral evolution in all variants, suggesting that existing vaccines and diagnostics that target this protein may become less effective against Omicron and that our therapeutic and public health strategies will have to evolve along with the virus.

【Giant Squid】Cuttlefish, the de Bruijn graph construction tool, gets an upgrade

Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2

We present Cuttlefish 2, significantly advancing the existing state-of-the-art methods for construction of this graph. On a typical shared-memory machine, it reduces the construction of the compacted de Bruijn graph for 661K bacterial genomes (2.58 Tbp of input reference genomes) from about 4.5 days to 17–23 hours. Similarly on sequencing data, it constructs the graph for a 1.52 Tbp white spruce read set in about 10 hours, while the closest competitor, which also uses considerably more memory, requires 54–58 hours.
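For readers unfamiliar with the term, the toy sketch below illustrates what a compacted de Bruijn graph is. It is not the Cuttlefish 2 algorithm and makes no attempt at its scale. It assumes single-stranded reads (reverse complements are ignored), treats (k-1)-mers as nodes and k-mers as edges, and merges each maximal non-branching path into a single string (a unitig). The function names and the example reads are hypothetical.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Build a toy de Bruijn graph: nodes are (k-1)-mers, edges are distinct k-mers."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    succ, pred = defaultdict(set), defaultdict(set)
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        succ[prefix].add(suffix)
        pred[suffix].add(prefix)
    return succ, pred


def compact(succ, pred):
    """Merge every maximal non-branching path into one string (a unitig)."""

    def is_internal(node):
        # Exactly one way in and one way out: safe to merge through.
        return len(pred.get(node, ())) == 1 and len(succ.get(node, ())) == 1

    unitigs = []
    for start in set(succ) | set(pred):
        if is_internal(start):
            continue  # unitigs begin only at branching or terminal nodes
        for nxt in succ.get(start, ()):
            path = start + nxt[-1]
            while is_internal(nxt):
                (nxt,) = succ[nxt]  # the single successor of an internal node
                path += nxt[-1]
            unitigs.append(path)
    # Simplification: isolated cycles made only of internal nodes are skipped.
    return sorted(unitigs)


reads = ["ACGTTA", "ACGTCC"]  # hypothetical reads sharing the prefix ACGT
print(compact(*build_graph(reads, k=3)))
```

With these two reads and k=3, the shared prefix collapses into one unitig (ACGT) and each diverging tail becomes its own unitig (GTCC and GTTA), which is the kind of reduction that makes the compacted graph much smaller than the raw one.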

【Giant Viruses】A joint team from Université Paris-Saclay and Kyoto University (Japan) discovers new giant viruses in marine metagenomic data

Discovery of a class of giant virus relatives displaying unusual functional traits and prevalent within plankton: the Mirusviricetes

Large and giant DNA viruses of the phylum Nucleocytoviricota have a profound influence on the ecology and evolution of planktonic eukaryotes. Recently, various Nucleocytoviricota genomes have been characterized from environmental metagenomes based on the occurrence of hallmark genes identified from cultures. However, lineages diverging from the culture genomics functional principles have been overlooked thus far. Here, we developed a phylogeny-guided genome-resolved metagenomic framework using a single hallmark gene as compass, a subunit of DNA-dependent RNA polymerase encoded by most Nucleocytoviricota. We applied this method to large metagenomic data sets from the surface of five oceans and two seas and characterized 697 non-redundant Nucleocytoviricota genomes up to 1.45 Mbp in length. This database expands the known diversity of the class Megaviricetes and revealed two additional putative classes we named Proculviricetes and Mirusviricetes. Critically, the diverse and prevalent Mirusviricetes population genomes seemingly lack several hallmark genes, in particular those related to viral particle morphogenesis. Instead, they share various genes of known (e.g., TATA-binding proteins, histones, proteases and viral rhodopsins) and unknown functions rarely detected if not entirely missing in other characterized Nucleocytoviricota classes. Phylogenomics, comparative genomics, functional trends and the signal among planktonic cellular size fractions point to Mirusviricetes being a major, functionally divergent class of large DNA viruses that actively infect eukaryotes in the sunlit ocean using an enigmatic functional life style. Finally, we built a comprehensive marine genomic database for Nucleocytoviricota by combining multiple environmental surveys that might contribute to future endeavors exploring the ecology and evolution of plankton.

【Just One Character Off】DeepMed: the DeepMind of medical imaging research?

DeepMed: A unified, modular pipeline for end-to-end deep learning in computational pathology

The interpretation of digitized histopathology images has been transformed thanks to artificial intelligence (AI). End-to-end AI algorithms can infer high-level features directly from raw image data, extending the capabilities of human experts. In particular, AI can predict tumor subtypes, genetic mutations and gene expression directly from hematoxylin and eosin (H&E) stained pathology slides. However, existing end-to-end AI workflows are poorly standardized and not easily adaptable to new tasks. Here, we introduce DeepMed, a Python library for predicting any high-level attribute directly from histopathological whole slide images alone, or from images coupled with additional meta-data (https://github.com/KatherLab/deepmed). Unlike earlier computational pipelines, DeepMed is highly developer-friendly: its structure is modular and separates preprocessing, training, deployment, statistics, and visualization in such a way that any one of these processes can be altered without affecting the others. Also, DeepMed scales easily from local use on laptop computers to multi-GPU clusters in cloud computing services and therefore can be used for teaching, prototyping and for large-scale applications. Finally, DeepMed is user-friendly and allows researchers to easily test multiple hypotheses in a single dataset (via cross-validation) or in multiple datasets (via external validation). Here, we demonstrate and document DeepMed’s abilities to predict molecular alterations, histopathological subtypes and molecular features from routine histopathology images, using a large benchmark dataset which we release publicly. In summary, DeepMed is a fully integrated and broadly applicable end-to-end AI pipeline for the biomedical research community.

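【Caps in Bloom】Hungarian researchers: comparative genomics suggests that roughly 10% of genes in mushroom genomes are linked to fruiting body development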

Lessons on fruiting body morphogenesis from genomes and transcriptomes of Agaricomycetes

Altogether, our discussions cover 1480 genes of Coprinopsis cinerea, and their orthologs in Agaricus bisporus, Cyclocybe aegerita, Armillaria ostoyae, Auriculariopsis ampla, Laccaria bicolor, Lentinula edodes, Lentinus tigrinus, Mycena kentingensis, Phanerochaete chrysosporium, Pleurotus ostreatus, and Schizophyllum commune, providing functional hypotheses for ∼10% of genes in the genomes of these species. Although experimental evidence for the role of these genes will need to be established in the future, our data provide a roadmap for guiding functional analyses of fruiting related genes in the Agaricomycetes. We anticipate that the gene compendium presented here, combined with developments in functional genomics approaches will contribute to uncovering the genetic bases of one of the most spectacular multicellular developmental processes in fungi.

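【Survival on the Ice】How do plants adapt to polar environments? See what clues transcriptome analysis can offer. From the University of Oslo, Norway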

What can the cold-induced transcriptomes of Arctic Brassicaceae tell us about the evolution of cold tolerance?

We found that the cold response is highly species-specific. Among thousands of differentially expressed genes, ∼200 genes were shared among the three Arctic species and A. thaliana, and only ∼100 genes were specific to the three Arctic species alone. This pattern was also reflected in the functional comparison. Our results show that the cold response of Arctic plant species has mainly evolved independently, although it likely builds on a conserved basis found across Brassicaceae. The findings also confirm that highly polygenic traits, such as cold tolerance, may show less repeatable patterns of adaptation than traits involving only a few genes.

【Cerebellar Development】Heidelberg University (Germany): single-nucleus RNA sequencing analysis offers new insight into the origins of brain tumors

Mapping pediatric brain tumors to their origins in the developing cerebellum

Understanding the cellular origins of childhood brain tumors is key for discovering novel tumor-specific therapeutic targets. Previous strategies mapping cellular origins typically involved comparing human tumors to murine embryonal tissues [1,2], a potentially imperfect approach due to spatio-temporal gene expression differences between species [3]. Here we use an unprecedented single-nucleus atlas of the developing human cerebellum (Sepp, Leiss, et al.) and extensive bulk and single-cell transcriptome tumor data to map their cellular origins with focus on three most common pediatric brain tumors – pilocytic astrocytoma, ependymoma, and medulloblastoma. Using custom bioinformatics approaches, we postulate the astroglial and glial lineages as the origins for posterior fossa ependymomas and radiation-induced gliomas (secondary tumors after medulloblastoma treatment), respectively. Moreover, we confirm that SHH, Group3 and Group4 medulloblastomas stem from granule cell/unipolar brush cell lineages, whereas we propose pilocytic astrocytoma to originate from the oligodendrocyte lineage. We also identify genes shared between the cerebellar lineage of origin and corresponding tumors, and genes that are tumor specific; both gene sets represent promising therapeutic targets. As a common feature among most cerebellar tumors, we observed compositional heterogeneity in terms of similarity to normal cells, suggesting that tumors arise from or differentiate into multiple points along the cerebellar “lineage of origin”.

【Amplicons】Another powerful tool for amplicon analysis, from the Quadram Institute (UK)

LotuS2: An ultrafast and highly accurate tool for amplicon sequencing analysis

In LotuS2, six different sequence clustering algorithms as well as extensive pre- and post-processing options allow for flexible data analysis by both experts, where parameters can be fully adjusted, and novices, where defaults are provided for different scenarios. We benchmarked three independent gut and soil datasets, where LotuS2 was on average 29 times faster compared to other pipelines - yet could better reproduce the alpha- and beta-diversity of technical replicate samples. Further benchmarking a mock community with known taxa composition showed that, compared to the other pipelines, LotuS2 recovered a higher fraction of correctly identified genera and species (98% and 57%, respectively). At ASV/OTU level, precision and F-score were highest for LotuS2, as was the fraction of correctly reconstructed 16S sequences.
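The metrics named in this abstract can be made concrete with a small sketch. The code below uses common textbook definitions: the Shannon index as one example of alpha diversity, Bray-Curtis dissimilarity as one example of beta diversity, and precision, recall and F-score computed on sets of taxon names. These are assumptions for illustration; the LotuS2 benchmark may use different indices or definitions, and all function names and sample values here are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index (natural log) of one sample's taxon counts: an alpha-diversity measure."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples' count vectors: a beta-diversity measure."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


def precision_recall_f(found, truth):
    """Precision, recall and F-score of detected taxa against a known mock community."""
    true_pos = len(found & truth)
    precision = true_pos / len(found) if found else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical counts for the same three taxa in two technical replicates.
replicate_1 = [40, 35, 25]
replicate_2 = [42, 33, 25]
print(shannon(replicate_1), shannon(replicate_2))
print(bray_curtis(replicate_1, replicate_2))
print(precision_recall_f({"A", "B", "C"}, {"A", "B", "D"}))
```

Technical replicates of the same sample should give nearly identical Shannon values and a Bray-Curtis dissimilarity close to zero, which is the sense in which a pipeline can be said to reproduce alpha- and beta-diversity well.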

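【Well Read】Blog post by James Hadfield, a scientist at the UK biopharmaceutical company AstraZeneca: as 2022 begins, BGI moves into North America and Illumina finally meets a challenger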

illumina are finally getting some NGS competition

http://enseqlopedia.com/2021/12/illumina-finally-getting-ngs-competition/?utm_campaign=coregenomicstwitter&utm_medium=twitter&utm_source=twitter

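【Beyond Bioinformatics】Stanford University: direct observation of ribosomes reveals the scanning process and its regulation in eukaryotic translation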

Rapid 40S scanning and its regulation by mRNA structure during eukaryotic translation initiation

How the eukaryotic 43S preinitiation complex scans along the 5′ untranslated region (5′UTR) of a capped mRNA to locate the correct start codon remains elusive. Here, we directly track yeast 43S-mRNA binding, scanning, and 60S subunit joining by real-time single-molecule fluorescence spectroscopy. Once engaged with the mRNA, 43S scanning occurs at >100 nucleotides per second, independent of multiple cycles of ATP-hydrolysis by RNA helicases. The scanning ribosomes can proceed through RNA secondary structures, but 5′UTR hairpin sequences near start codons drive scanning ribosomes at start codons back in the 5′ direction, requiring rescanning to arrive once more at a start codon. Direct observation of scanning ribosomes provides a mechanistic framework for translational regulation by 5′UTR structures and upstream near-cognate start codons.