To align two draft genomes against each other, the usual choice is nucmer from the MUMmer package.
1. nucmer
Basic command (the alignments are written to ref_qry.delta):
nucmer --prefix=ref_qry ref.fasta qry.fasta
Parameters:
USAGE: nucmer [options] <Reference> <Query>
Reference  reference genome file (FASTA)
Query      genome file to align against the reference (FASTA)
OPTIONS (the two most important):
--mum Use anchor matches that are unique in both the reference
and query  ### anchors must be unique in both the reference and the query
--mumreference Use anchor matches that are unique in the
reference but not necessarily unique in the query (default behavior)
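To make the difference between --mum and --mumreference concrete, here is a toy sketch. It is a deliberate simplification: real nucmer works with maximal unique matches of variable length, whereas this sketch only considers exact matches of a fixed length k. The names kmer_counts and toy_anchors are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def toy_anchors(ref: str, qry: str, k: int, unique_in_query: bool) -> set:
    """Exact k-mer matches that may serve as anchors (toy model).

    unique_in_query=True  -> unique in ref AND query (spirit of --mum)
    unique_in_query=False -> unique in ref only      (spirit of --mumreference)
    """
    ref_counts = kmer_counts(ref, k)
    qry_counts = kmer_counts(qry, k)
    shared = ref_counts.keys() & qry_counts.keys()
    return {
        m for m in shared
        if ref_counts[m] == 1 and (not unique_in_query or qry_counts[m] == 1)
    }
```

With ref = "ACGTTGCA", qry = "ACGTACGTTT" and k = 4, ACGT occurs twice in the query, so only the reference-unique rule keeps it: the --mum-like call returns {"CGTT"}, while the --mumreference-like call returns {"ACGT", "CGTT"}.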
delta-filter filtering (optional):
Basic command:
delta-filter [options] <deltafile>
Parameters:
-1 1-to-1 alignment allowing for rearrangements
(intersection of -r and -q alignments)
-g 1-to-1 global alignment not allowing rearrangements
-h Display help information
-i float minimum alignment identity [0, 100], default 0  ## identity cutoff
-l int Set the minimum alignment length, default 0  ### minimum alignment length
-m Many-to-many alignment allowing for rearrangements
(union of -r and -q alignments)
-q Maps each position of each query to its best hit in
the reference, allowing for reference overlaps
-r Maps each position of each reference to its best hit
in the query, allowing for query overlaps
-u float Set the minimum alignment uniqueness, i.e. percent of
the alignment matching to unique reference AND query
sequence [0, 100], default 0
-o float Set the maximum alignment overlap for -r and -q options
as a percent of the alignment length [0, 100], default 100
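The -i, -l and -o thresholds can be pictured on individual alignment records. The sketch below is hypothetical: the Alignment class, measuring length on the reference, treating the cutoffs as inclusive (>=) and expressing overlap as a percentage of the shorter alignment are all assumptions, not delta-filter's documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical alignment record; coordinates are 1-based and inclusive."""
    ref_start: int
    ref_end: int
    qry_start: int   # qry_start > qry_end marks a reverse-strand hit
    qry_end: int
    identity: float  # percent identity, 0-100

    @property
    def ref_len(self) -> int:
        return abs(self.ref_end - self.ref_start) + 1


def passes(aln: Alignment, min_identity: float = 0.0, min_length: int = 0) -> bool:
    """Analogue of -i / -l: keep alignments meeting both cutoffs (assumed inclusive)."""
    return aln.identity >= min_identity and aln.ref_len >= min_length


def overlap_pct(a: Alignment, b: Alignment) -> float:
    """Analogue of the -o measure: reference overlap as a percent of the shorter alignment."""
    lo = max(min(a.ref_start, a.ref_end), min(b.ref_start, b.ref_end))
    hi = min(max(a.ref_start, a.ref_end), max(b.ref_start, b.ref_end))
    return 100.0 * max(0, hi - lo + 1) / min(a.ref_len, b.ref_len)
```

For example, alignments covering reference 1-1000 and 901-1400 share 100 bases, which is 20% of the shorter (500 bp) alignment.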
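Notes:
1) -r: for each reference position, keep only its best hit in the query; this removes alignments that overlap on the reference.
2) -q: for each query position, keep only its best hit in the reference; this removes alignments that overlap on the query.
3) -r -q together: removes alignments that overlap on the reference as well as those that overlap on the query.
4) -o: the overlap used by -r and -q, expressed as a percentage of the alignment length; an overlap only counts as such when it exceeds this value.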
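5) How -g differs from -1 and -m:
A. -g: the documentation says it "requires the alignments to be mutually consistent
in their order", i.e. the aligned blocks must occur in the same order in both sequences, so inversions and translocations are not allowed. In the example below, adding -g removes the 48489 line.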
(Figure: delta-filter output without -g, then with -g)
B. The -1 and -m options do not require the alignments to be
mutually consistent and therefore tolerate translocations,
inversions, etc.
C. In general, -m is the best choice.
D. -1 can be handy for applications such as SNP finding, which
require a 1-to-1 mapping.
E. For mapping query contigs or sequencing reads to a reference genome, use -q.
6) The options that set how stringent the filtering is are, in order: -i, -l, -u, -q, -r, -g, -m, -1.
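One way to picture the "mutually consistent order" that -g demands is as a chaining problem: pick the highest-weight subset of alignments whose blocks increase on both the reference and the query. The sketch below solves that with a quadratic, weighted longest-increasing-subsequence style dynamic program. It is an illustration rather than delta-filter's code; the tuple layout, the weight (for instance length times identity), requiring non-overlapping blocks and skipping reverse-strand hits are simplifying assumptions.

```python
def consistent_chain(alns):
    """Highest-weight subset of forward alignments whose blocks appear in
    the same order on reference and query (toy picture of delta-filter -g).

    alns: list of (ref_start, ref_end, qry_start, qry_end, weight) tuples.
    Reverse-strand hits (qry_start > qry_end) are simply skipped here.
    """
    fwd = sorted((a for a in alns if a[2] <= a[3]), key=lambda a: (a[0], a[2]))
    if not fwd:
        return []
    best = [a[4] for a in fwd]  # best chain weight ending at alignment i
    prev = [-1] * len(fwd)      # predecessor of i in that chain
    for i, cur in enumerate(fwd):
        for j in range(i):
            earlier = fwd[j]
            # j may precede i only if it ends before i starts on BOTH sequences
            if earlier[1] < cur[0] and earlier[3] < cur[2] and best[j] + cur[4] > best[i]:
                best[i] = best[j] + cur[4]
                prev[i] = j
    i = max(range(len(fwd)), key=best.__getitem__)
    chain = []
    while i != -1:  # walk back through predecessors
        chain.append(fwd[i])
        i = prev[i]
    return chain[::-1]
```

If one block at reference 101-200 maps to query 5001-5100 while its neighbours map collinearly, that block cannot be chained with the blocks after it and is dropped, much as a translocated block is removed by -g.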
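2. show-coords
Converts the alignments produced by the previous step into a coordinate table, optionally filtering them and changing the output format.
Basic command: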
show-coords -rclT ref_qry.delta > ref_qry.coords
Parameters:
-b Merges overlapping alignments regardless of match dir
or frame and does not display any identity information.  ### merge overlapping alignments
-c Include percent coverage information in the output  ### add coverage columns
-H Do not print the output header
-I float Set minimum percent identity to display  ## identity cutoff
-l Include the sequence length information in the output  ### add reference and query sequence lengths
-L long Set minimum alignment length to display  ### only show alignments at least this long
-q Sort output lines by query IDs and coordinates  ## sort by query ID
-r Sort output lines by reference IDs and coordinates  ## sort by reference ID
-T Switch output to tab-delimited format  ### tab-delimited output
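Because -T makes the output tab-delimited, a show-coords table is easy to load in a script. A minimal sketch, assuming the column order that show-coords -rclT prints in MUMmer 3 (S1, E1, S2, E2, LEN 1, LEN 2, % IDY, LEN R, LEN Q, COV R, COV Q, then the reference and query tags); check this against the [S1] ... header line of your own file. COLUMNS, parse_coords and is_reverse are hypothetical names.

```python
# Assumed column order of `show-coords -rclT` (MUMmer 3); verify against your header.
COLUMNS = ("S1", "E1", "S2", "E2", "LEN1", "LEN2", "IDY",
           "LENR", "LENQ", "COVR", "COVQ", "REF", "QRY")
INT_COLS = ("S1", "E1", "S2", "E2", "LEN1", "LEN2", "LENR", "LENQ")
FLOAT_COLS = ("IDY", "COVR", "COVQ")


def parse_coords(text: str) -> list:
    """Parse tab-delimited show-coords output into one dict per alignment."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if not fields[0].strip().isdigit():
            continue  # skip header lines: file paths, "NUCMER", blank, [S1] ...
        rec = dict(zip(COLUMNS, fields))
        for col in INT_COLS:
            rec[col] = int(rec[col])
        for col in FLOAT_COLS:
            rec[col] = float(rec[col])
        rows.append(rec)
    return rows


def is_reverse(rec: dict) -> bool:
    """Reverse-strand hits are reported with query start > query end."""
    return rec["S2"] > rec["E2"]
```

For instance, [r for r in parse_coords(text) if r["IDY"] >= 95] keeps the high-identity alignments, and is_reverse flags hits on the opposite strand.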
3. Visualization
1) show-aligns basic command:
show-aligns ref_qry.delta refname qryname > ref_qry.aligns
refname and qryname are sequence IDs from the reference and query FASTA files above.
(Figure: example show-aligns output)
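Since show-aligns needs exact sequence IDs, it can help to list them from the FASTA headers first. This sketch assumes the ID is the first whitespace-delimited word after '>', which we believe is how MUMmer names sequences; fasta_ids and show_aligns_argv are hypothetical helpers.

```python
def fasta_ids(lines):
    """Return the ID of each FASTA record, taken (by assumption) as the
    first whitespace-delimited word after '>'."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.append(words[0])
    return ids


def show_aligns_argv(delta: str, refname: str, qryname: str) -> list:
    """Argument list for one show-aligns call (hypothetical helper)."""
    return ["show-aligns", delta, refname, qryname]
```

An open FASTA file can be passed straight to fasta_ids, and the resulting argument list can be handed to subprocess.run with stdout redirected to ref_qry.aligns.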
2) mummerplot basic command:
mummerplot ref_qry.delta -R ref.fasta -Q qry.fasta --png --filter --layout
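This draws a collinearity (dot) plot of the alignments; --filter keeps only a one-to-one mapping between reference and query.
Three output formats are available:
--x11 --postscript --png
(Figure: example mummerplot output)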
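The dot plot is essentially simple geometry: each alignment becomes a line segment from (S1, S2) to (E1, E2), and when a genome has several sequences each one is shifted along its axis by the total length of the sequences placed before it. The sketch below assumes this picture and the show-coords convention that reverse hits have S2 > E2 (so they slope downward); it is not mummerplot's code, and axis_offsets and segment are hypothetical names.

```python
from itertools import accumulate


def axis_offsets(lengths: dict) -> dict:
    """Start offset of each sequence when sequences are laid end to end
    along one axis, in the dict's insertion order."""
    names = list(lengths)
    starts = [0, *accumulate(lengths[n] for n in names)][:-1]
    return dict(zip(names, starts))


def segment(s1, e1, s2, e2, ref_off=0, qry_off=0):
    """One alignment as a segment (x = reference, y = query) plus its strand.
    Reverse hits have s2 > e2, so their segment slopes downward."""
    strand = "-" if s2 > e2 else "+"
    return (ref_off + s1, qry_off + s2), (ref_off + e1, qry_off + e2), strand
```

With query contigs ctg1 (1000 bp) and ctg2 (500 bp), ctg2 starts at y = 1000, so a reverse hit on ctg2 is drawn shifted up by that amount.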
Summary of the main commands:
nucmer --prefix=ref_qry ref.fasta qry.fasta
delta-filter [options] <deltafile>  (optional filtering)
show-coords -rcl ref_qry.delta > ref_qry.coords
show-aligns ref_qry.delta refname qryname > ref_qry.aligns
show-tiling ref_qry.delta > ref_qry.tiling  (the step that differs from the workflow above)
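The command list above can be scripted. A minimal sketch, assuming the MUMmer binaries are on PATH; the delta-filter output name ref_qry.filter.delta and the default choice of -m are assumptions (the notes above recommend -m in general and -1 for SNP finding). As in the list, show-tiling reads the unfiltered delta, and show-aligns is left out because it needs sequence names.

```python
import subprocess


def mummer_steps(ref, qry, prefix="ref_qry", filter_opts=("-m",)):
    """(argv, stdout_file) pairs mirroring the command list above.
    filter_opts=None skips the optional delta-filter step."""
    raw_delta = f"{prefix}.delta"          # written by nucmer --prefix
    delta = raw_delta
    steps = [(["nucmer", f"--prefix={prefix}", ref, qry], None)]
    if filter_opts:
        delta = f"{prefix}.filter.delta"   # hypothetical output name
        steps.append((["delta-filter", *filter_opts, raw_delta], delta))
    steps.append((["show-coords", "-rcl", delta], f"{prefix}.coords"))
    # show-tiling reads the unfiltered delta, as in the command list
    steps.append((["show-tiling", raw_delta], f"{prefix}.tiling"))
    return steps


def run(steps):
    """Run each step in order, redirecting stdout to a file when given."""
    for argv, out in steps:
        if out is None:
            subprocess.run(argv, check=True)
        else:
            with open(out, "w") as fh:
                subprocess.run(argv, stdout=fh, check=True)
```

run(mummer_steps("ref.fasta", "qry.fasta")) executes the steps in order; passing filter_opts=None skips the filtering step.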
When aligning a draft assembly against a well-assembled reference genome, the main difference in the commands is the show-tiling step.
(Figure: main show-tiling parameters)
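These parameters mainly control filtering. show-tiling places the draft genome's contigs onto the reference genome and outputs a list such as:
(Figure: example show-tiling output)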
The columns are:
start in ref, end in ref, distance to next contig, length of this contig, alignment coverage, identity, orientation, and ID
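A show-tiling table can be read into Python in a few lines. This sketch assumes each reference block starts with a header line beginning with '>' (its first word taken as the reference ID), followed by rows holding the eight columns listed above; Tile and parse_tiling are hypothetical names.

```python
from typing import NamedTuple


class Tile(NamedTuple):
    ref_start: int
    ref_end: int
    gap: int            # distance to next contig; negative means overlap
    length: int         # length of this contig
    coverage: float     # alignment coverage (%)
    identity: float     # alignment identity (%)
    orientation: str    # "+" or "-"
    contig: str


def parse_tiling(text: str) -> dict:
    """Group show-tiling rows under their reference ID (assumed layout)."""
    tiling = {}
    current = None
    for line in text.splitlines():
        if line.startswith(">"):
            current = line[1:].split()[0]
            tiling[current] = []
        elif line.strip():
            f = line.split()
            tiling.setdefault(current, []).append(
                Tile(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                     float(f[4]), float(f[5]), f[6], f[7]))
    return tiling
```

A negative distance to the next contig indicates that neighbouring contigs overlap on the reference, so [t for t in tiles if t.gap < 0] lists them, and orientation "-" marks contigs placed in reverse complement.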