Knowledge Center - Beijing Gaipu Biotechnology Co., Ltd. (GapTech)

MUMmer: a collinearity analysis tool for closely related species

Bioinformatics Essentials · struggle · December 2, 2016 04:31

I. Aligning two draft sequences

To align two draft genomes, the usual tool is nucmer from the MUMmer package.

1. nucmer

Basic command (the alignments are written to ref_qry.delta):

nucmer --prefix=ref_qry ref.fasta qry.fasta

Options:

USAGE: nucmer [options] <Reference> <Query>

Reference  the reference genome file (FASTA)

Query  the genome file to align against the reference (FASTA)

OPTIONS (the two most important):

--mum  Use anchor matches that are unique in both the reference and query ### an anchor must occur only once in the reference and only once in the query

--mumreference  Use anchor matches that are unique in the reference but not necessarily unique in the query (default behavior)
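The following toy Python sketch shows how --mum and --mumreference differ. It treats fixed-length k-mers as anchors, which is a simplification made only for this sketch: nucmer works with maximal exact matches, not k-mers. The function names are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def anchors(ref, qry, k, mode="mumreference"):
    """Toy anchor selection using shared k-mers instead of maximal matches.

    mode="mum":          keep k-mers occurring exactly once in ref AND in qry
    mode="mumreference": keep k-mers occurring exactly once in ref (any count in qry)
    """
    ref_counts = Counter(kmers(ref, k))
    qry_counts = Counter(kmers(qry, k))
    shared = ref_counts.keys() & qry_counts.keys()
    if mode == "mum":
        keep = {m for m in shared if ref_counts[m] == 1 and qry_counts[m] == 1}
    elif mode == "mumreference":
        keep = {m for m in shared if ref_counts[m] == 1}
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(keep)


ref = "ACGTTGCA"
qry = "ACGTACGT"
print(anchors(ref, qry, 4, mode="mum"))           # []
print(anchors(ref, qry, 4, mode="mumreference"))  # ['ACGT']
```

ACGT occurs twice in the query. The --mum-style rule therefore drops it, while the --mumreference-style rule keeps it because it is unique in the reference.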

Filtering with delta-filter (optional):

Basic command:

delta-filter [options] <deltafile>

Options:

-1  1-to-1 alignment allowing for rearrangements (intersection of -r and -q alignments)

-g  1-to-1 global alignment not allowing rearrangements

-h  Display help information

-i float  Set the minimum alignment identity [0, 100], default 0 ### identity threshold

-l int  Set the minimum alignment length, default 0 ### minimum alignment length

-m  Many-to-many alignment allowing for rearrangements (union of -r and -q alignments)

-q  Maps each position of each query to its best hit in the reference, allowing for reference overlaps

-r  Maps each position of each reference to its best hit in the query, allowing for query overlaps

-u float  Set the minimum alignment uniqueness, i.e. percent of the alignment matching to unique reference AND query sequence [0, 100], default 0

-o float  Set the maximum alignment overlap for -r and -q options as a percent of the alignment length [0, 100], default 100
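The -i and -l options are simple per-alignment cut-offs. The sketch below applies the same two thresholds to a hypothetical in-memory Alignment record rather than to a real .delta file. Two details are assumptions of this sketch, not documented delta-filter behaviour: length is measured as the reference span, and the comparisons are inclusive (>=).

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record: 1-based inclusive coordinates plus % identity."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    identity: float  # percent identity, 0-100

    @property
    def ref_len(self) -> int:
        return abs(self.ref_end - self.ref_start) + 1


def filter_alignments(alns, min_identity=0.0, min_length=0):
    """Keep alignments meeting both cut-offs, in the spirit of delta-filter -i / -l.

    Assumptions: comparisons are inclusive (>=) and length is the reference span.
    """
    return [a for a in alns
            if a.identity >= min_identity and a.ref_len >= min_length]


alns = [
    Alignment(1, 1000, 1, 1000, 99.5),
    Alignment(2000, 2300, 5000, 4701, 88.0),  # fails an identity cut-off of 90
    Alignment(3000, 3040, 10, 50, 100.0),     # 41 bp, fails a length cut-off of 100
]
kept = filter_alignments(alns, min_identity=90, min_length=100)
print(kept)
```

With the default cut-offs of 0, as in delta-filter, nothing is removed.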

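Notes:
1) -r: keeps only the best hit for each reference position, i.e. removes alignments that overlap on the reference.

2) -q: keeps only the best hit for each query position, i.e. removes alignments that overlap on the query.

3) -r -q together: removes alignments that overlap on the reference as well as those that overlap on the query.

4) -o: the overlap tolerated by -r and -q, given as a percentage of the alignment length. Only an overlap larger than this value counts as an overlap.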

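5) Differences between -g and the -1 / -m options

A. -g "requires the alignments to be mutually consistent in their order": the aligned blocks must appear in the same order on both sequences, so inversions and translocations are not allowed. In the example below, adding -g removes the line containing 48489.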

[Figure: alignments without -g]

[Figure: alignments with -g]

B. The -1 and -m options are not required to be mutually consistent and therefore tolerate translocations, inversions, etc.

C. In general cases, the -m option is the best choice.

D. However, -1 can be handy for applications such as SNP finding, which require a 1-to-1 mapping.

E. For mapping query contigs, or sequencing reads, to a reference genome, use -q.

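6) When several options are combined, delta-filter applies them in the order -i -l -u -q -r -g -m -1. An alignment removed by an earlier step is ignored by the later ones.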
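The "mutually consistent order" enforced by -g can be pictured as choosing the heaviest chain of alignments that increases on both sequences at once. The Python below is a toy O(n²) illustration of that idea for a single reference/query pair, using forward-strand alignments only. Weighting each alignment by length × identity is an assumption of this sketch. It is not delta-filter's actual implementation.

```python
def consistent_chain(alns):
    """Heaviest chain of alignments whose order agrees on both sequences.

    alns: (ref_start, ref_end, qry_start, qry_end, identity) tuples for one
    reference/query pair, forward strand only. Weight = length * identity
    (an assumption of this sketch). Alignment j can precede i only if j ends
    before i starts on the reference AND on the query.
    """
    if not alns:
        return []
    alns = sorted(alns)                      # order by reference start
    weight = [(a[1] - a[0] + 1) * a[4] for a in alns]
    best = weight[:]                         # best chain weight ending at i
    prev = [-1] * len(alns)                  # back-pointers to rebuild the chain
    for i, (r_start, _, q_start, _, _) in enumerate(alns):
        for j in range(i):
            if alns[j][1] < r_start and alns[j][3] < q_start:
                if best[j] + weight[i] > best[i]:
                    best[i] = best[j] + weight[i]
                    prev[i] = j
    i = max(range(len(alns)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(alns[i])
        i = prev[i]
    return chain[::-1]


alns = [
    (1, 100, 1, 100, 99.0),
    (201, 300, 201, 300, 99.0),
    (401, 450, 101, 150, 99.0),   # out of order on the query
    (601, 700, 401, 500, 99.0),
]
for aln in consistent_chain(alns):
    print(aln)
```

The third alignment lies after the second on the reference but before it on the query, so it cannot join the heaviest chain and is dropped. This mirrors how -g removed the out-of-order line in the example in note 5.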

2. show-coords

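Filters the alignments from the previous step and converts the .delta file into a readable coordinate table.

Basic command:

show-coords -rclT ref_qry.delta > ref_qry.coords

Options: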

-b  Merges overlapping alignments regardless of match dir or frame and does not display any identity information. ### merges overlapping alignments

-c  Include percent coverage information in the output ### adds coverage columns

-H  Do not print the output header

-I float  Set minimum percent identity to display ### identity threshold

-l  Include the sequence length information in the output ### adds the query and reference sequence lengths

-L long  Set minimum alignment length to display ### only shows alignments at least this long

-q  Sort output lines by query IDs and coordinates ### sort by query ID

-r  Sort output lines by reference IDs and coordinates ### sort by reference ID

-T  Switch output to tab-delimited format ### tab-delimited output
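With -T the output is tab-delimited, so show-coords -rclT results are easy to load for further filtering. Below is a minimal Python parser. It assumes the column order [S1] [E1] [S2] [E2] [LEN 1] [LEN 2] [% IDY] [LEN R] [LEN Q] [COV R] [COV Q], followed by the reference and query tags as two separate columns. Check this against the header line of your own file. The sample text is made up for illustration.

```python
import csv
import io

# Assumed column order for `show-coords -rclT` (with -T the tags are two columns).
COLUMNS = ["S1", "E1", "S2", "E2", "LEN1", "LEN2", "IDY",
           "LENR", "LENQ", "COVR", "COVQ", "REF_ID", "QRY_ID"]
INT_COLS = ("S1", "E1", "S2", "E2", "LEN1", "LEN2", "LENR", "LENQ")
FLOAT_COLS = ("IDY", "COVR", "COVQ")


def parse_coords(text):
    """Parse tab-delimited show-coords output into a list of dicts.

    Header lines are skipped: they either have the wrong number of fields
    or do not start with an integer coordinate.
    """
    records = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        fields = [f.strip() for f in row]
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        rec = dict(zip(COLUMNS, fields))
        for key in INT_COLS:
            rec[key] = int(rec[key])
        for key in FLOAT_COLS:
            rec[key] = float(rec[key])
        records.append(rec)
    return records


# Made-up sample imitating the layout of show-coords -rclT output.
sample = (
    "/data/ref.fasta /data/qry.fasta\n"
    "NUCMER\n"
    "\n"
    "[S1]\t[E1]\t[S2]\t[E2]\t[LEN 1]\t[LEN 2]\t[% IDY]\t[LEN R]\t[LEN Q]\t[COV R]\t[COV Q]\t[TAGS]\n"
    "1\t5000\t1\t5000\t5000\t5000\t99.80\t100000\t20000\t5.00\t25.00\tchr1\tctg7\n"
    "6001\t7000\t9000\t8001\t1000\t1000\t97.50\t100000\t20000\t1.00\t5.00\tchr1\tctg7\n"
)

records = parse_coords(sample)
for rec in records:
    strand = "-" if rec["S2"] > rec["E2"] else "+"  # reverse matches have S2 > E2
    print(rec["REF_ID"], rec["QRY_ID"], rec["IDY"], rec["COVQ"], strand)
```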

3. Visualization

1) show-aligns, basic command:

show-aligns ref_qry.delta refname qryname > ref_qry.aligns

refname and qryname are sequence IDs from the reference and query FASTA files above.

[Figure: show-aligns result]
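refname and qryname must match the sequence IDs exactly. A small hypothetical helper can list the IDs in each file. It assumes the usual convention that a FASTA ID is the first whitespace-delimited word after '>', which is also how MUMmer appears to name sequences; treat that as an assumption.

```python
import os
import tempfile


def fasta_ids(path):
    """Return the sequence IDs of a FASTA file (first word after each '>')."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                words = line[1:].split()
                ids.append(words[0] if words else "")
    return ids


# Demo on a small throw-away file; in practice pass ref.fasta or qry.fasta.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">chr1 example reference sequence\nACGTACGT\n>plasmid_A\nGGCCAATT\n")
ids = fasta_ids(tmp.name)
os.remove(tmp.name)
print(ids)  # ['chr1', 'plasmid_A']
```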

2) mummerplot, basic command:

mummerplot ref_qry.delta -R ref.fasta -Q qry.fasta --png --filter --layout

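This produces a collinearity plot of the one-to-one mapped alignments. --filter keeps only the best hit for each position, and --layout arranges the sequences into a readable plot; --layout requires -R and -Q.

Three output formats are available: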

--x11 --postscript --png

[Figure: mummerplot result]
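Conceptually, each alignment in the plot is a line segment from (S1, S2) to (E1, E2), with the reference on the x axis and the query on the y axis. The sketch below converts coordinate tuples, such as rows from show-coords, into such segments. It relies on show-coords reporting reverse-strand matches with S2 > E2. It does no drawing itself and does not reproduce mummerplot's --filter or --layout logic. The function name is hypothetical.

```python
def dotplot_segments(alns):
    """Turn (S1, E1, S2, E2) coordinate tuples into dot-plot line segments.

    Returns ((x_start, y_start), (x_end, y_end), strand) per alignment, with
    the reference on x and the query on y. Reverse-strand matches (S2 > E2)
    run downwards, which is how inversions appear in a collinearity plot.
    """
    segments = []
    for s1, e1, s2, e2 in alns:
        strand = "+" if s2 <= e2 else "-"
        segments.append(((s1, s2), (e1, e2), strand))
    return segments


alns = [(1, 5000, 1, 5000), (6001, 7000, 9000, 8001)]  # made-up coordinates
for start, end, strand in dotplot_segments(alns):
    print(start, end, strand)
```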

II. Mapping a draft sequence to a finished sequence

The main commands are:

nucmer --prefix=ref_qry ref.fasta qry.fasta

delta-filter [options] <deltafile>   (optional filtering)

show-coords -rcl ref_qry.delta > ref_qry.coords

show-aligns ref_qry.delta refname qryname > ref_qry.aligns

show-tiling ref_qry.delta > ref_qry.tiling   (the step that differs from Part I)
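The Part II commands can also be chained from Python. This sketch rests on several assumptions:

- The MUMmer executables are on PATH.
- The filtered file name ref_qry.filter.delta and the use of -q for delta-filter (the option recommended in Part I for mapping contigs to a reference) are illustrative choices.
- When filtering is enabled, the later steps read the filtered file instead of ref_qry.delta.

show-aligns is left out because it needs specific sequence IDs. Nothing runs until run_steps() is called.

```python
import shutil
import subprocess


def part2_commands(ref, qry, prefix="ref_qry", filter_opts=None):
    """Build (argv, stdout_file) pairs for the draft-to-reference workflow.

    nucmer writes <prefix>.delta itself (stdout_file is None); the other
    tools print to stdout, so their output file is returned alongside.
    """
    delta = f"{prefix}.delta"
    steps = [(["nucmer", f"--prefix={prefix}", ref, qry], None)]
    if filter_opts:
        filtered = f"{prefix}.filter.delta"  # hypothetical file name
        steps.append((["delta-filter", *filter_opts, delta], filtered))
        delta = filtered  # later steps read the filtered alignments
    steps.append((["show-coords", "-rcl", delta], f"{prefix}.coords"))
    steps.append((["show-tiling", delta], f"{prefix}.tiling"))
    return steps


def run_steps(steps):
    """Run each step in order, redirecting stdout where a file is given."""
    for argv, out in steps:
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} not found on PATH")
        if out is None:
            subprocess.run(argv, check=True)
        else:
            with open(out, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)


steps = part2_commands("ref.fasta", "qry.fasta", filter_opts=("-q",))
for argv, out in steps:
    print(" ".join(argv) + (f" > {out}" if out else ""))
```

Printing the steps first makes it easy to compare them with the shell commands before calling run_steps(steps).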

When a draft assembly is mapped to a well-assembled reference genome, the commands differ from Part I only in the show-tiling step.

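The main show-tiling parameters are as follows:

These parameters mainly control how the results are filtered. This step places the draft genome onto the reference genome and outputs a list like the following: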

The columns are:

1) start in ref
2) end in ref
3) distance to next contig
4) length of this contig
5) alignment coverage
6) identity
7) orientation
8) contig ID
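show-tiling prints one row per placed contig, so its output is easy to post-process. The Python parser below assumes the layout described above: a '>' header line per reference whose first word is the reference ID, followed by eight whitespace-separated columns per contig. The sample text is made up for illustration.

```python
def parse_tiling(text):
    """Parse show-tiling output into {reference_id: [placement dicts]}."""
    tiling = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line, e.g. ">chr1 100000 bases": keep only the first word.
            current = line[1:].split()[0]
            tiling[current] = []
            continue
        if current is None:
            raise ValueError("contig row found before any '>' header")
        start, end, gap, length, cov, idy, orient, contig = line.split()
        tiling[current].append({
            "ref_start": int(start),
            "ref_end": int(end),
            "gap_to_next": int(gap),      # negative = overlaps the next contig
            "contig_len": int(length),
            "coverage": float(cov),       # percent
            "identity": float(idy),       # percent
            "orientation": orient,        # '+' or '-'
            "contig": contig,
        })
    return tiling


sample = """>chr1 100000 bases
1\t5145\t-17\t5145\t100.00\t99.98\t+\tctg_1
5129\t20128\t1500\t15000\t98.50\t99.10\t-\tctg_2
"""

tiling = parse_tiling(sample)
reversed_contigs = [p["contig"] for p in tiling["chr1"] if p["orientation"] == "-"]
print(reversed_contigs)  # ['ctg_2']
```

A negative distance to the next contig, as for ctg_1 here, means the two neighbouring contigs overlap on the reference.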