I'm trying to align paired-end (PE) reads to our reference genome (Tribolium), which has a couple of thousand unmapped contigs. The reference has 10 chromosomes, each around 10-30 Mb in size, while the rest of the contigs are around 5 kb in size.
Why is the insert size distribution different between these alignments? How can the insert size distribution be so large for the alignment that contains 2000+ contigs?
I ran this command on the reference with unknown contigs:
bwa mem -k 5 -t 10 -Ma Ref.fa reads_1.fastq reads_2.fastq > map.sam
[M::main_mem] read 1164592 sequences (100000019 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (23, 23, 12, 19)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (2763, 6434, 8256)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 19242)
[M::mem_pestat] mean and std.dev: (5558.52, 3049.04)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 24735)
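For context, the two "boundaries" lines in this log appear to follow directly from the 25th and 75th percentiles. The sketch below is my reading of bwa's pairing code, not something documented in the manual: it assumes the statistics window is the quartiles extended by 2x the interquartile range and the proper-pair window by 3x, with the lower bound clamped to 1. The function name `pestat_bounds` is made up for illustration, and the later step in which bwa may widen the proper-pair window to mean +/- 4 std.dev is left out because it would not change the values in either run.

```shell
# Sketch: recompute bwa mem's insert-size boundaries from the quartiles.
# Assumption: multipliers 2.0 (mean/std.dev window) and 3.0 (proper-pair window),
# rounding via +0.499 and truncation, lower bounds clamped to 1.
pestat_bounds() {
  awk -v p25="$1" -v p75="$2" 'BEGIN {
    iqr = p75 - p25
    lo_stat = int(p25 - 2.0 * iqr + 0.499); if (lo_stat < 1) lo_stat = 1
    hi_stat = int(p75 + 2.0 * iqr + 0.499)
    lo_pair = int(p25 - 3.0 * iqr + 0.499); if (lo_pair < 1) lo_pair = 1
    hi_pair = int(p75 + 3.0 * iqr + 0.499)
    # order: stats low, stats high, proper-pair low, proper-pair high
    print lo_stat, hi_stat, lo_pair, hi_pair
  }'
}

pestat_bounds 2763 8256   # run with the unknown contigs    -> 1 19242 1 24735
pestat_bounds 69 546      # run without the unknown contigs -> 1 1500 1 1977
```

Both calls reproduce the logged boundaries, so the wide proper-pair window in the first run is simply a consequence of its wide quartiles, which in turn were estimated from only 23 FF candidate pairs.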
And this one on the reference without the unknown contigs:
bwa mem -k 5 -t 10 -Ma Ref_noUnknown.fa reads_1.fastq reads_2.fastq > map_noUnknown.sam
[M::main_mem] read 1164592 sequences (100000019 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (64, 83, 49, 61)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (69, 198, 546)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1500)
[M::mem_pestat] mean and std.dev: (244.98, 238.31)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1977)
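To compare the two runs independently of bwa's own estimate, one option is to summarise the observed template lengths (TLEN, column 9) straight from each SAM file. This is only a rough sketch: the function name `tlen_quartiles` is hypothetical, it keeps first-in-pair primary records whose mate maps to the same reference, and it uses a simple rank-based quartile. bwa selects its "candidate unique pairs" by its own criteria, so the numbers are not expected to match the log exactly.

```shell
# Sketch: count pairs and print quartiles of |TLEN| from a SAM file.
# Output: <number of pairs> <25th> <50th> <75th percentile>
tlen_quartiles() {
  awk -F'\t' '
    /^@/ { next }                                   # skip header lines
    {
      flag = $2
      if (int(flag / 64) % 2 != 1) next             # keep first-in-pair (0x40) only
      if (int(flag / 256) % 2 == 1) next            # drop secondary (0x100), e.g. from -a
      if (int(flag / 2048) % 2 == 1) next           # drop supplementary (0x800)
      if (int(flag / 4) % 2 == 1) next              # drop unmapped reads (0x4)
      if (int(flag / 8) % 2 == 1) next              # drop reads with unmapped mate (0x8)
      if ($7 != "=") next                           # mate must be on the same reference
      t = $9 + 0; if (t < 0) t = -t                 # absolute template length
      print t
    }' "$1" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) { print "no pairs"; exit }
      # simple rank-based quartiles, good enough for a quick comparison
      print NR, v[int((NR - 1) * 0.25) + 1], v[int((NR - 1) * 0.50) + 1], v[int((NR - 1) * 0.75) + 1]
    }'
}
```

Running `tlen_quartiles map.sam` and `tlen_quartiles map_noUnknown.sam` would then show whether the full set of same-reference pairs differs between the two references as much as the handful of pairs that bwa used for its estimate.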