I am attempting to speed up the SAM-to-BAM conversion of a whole-genome (paired-end) alignment with samtools 0.1.19. The alignment was done with the bwa 0.7.7 mem algorithm.
Commands:
samtools view -bS -o .bam pe.sam (default)
samtools view -bS -@ 10 -m 2G -o .bam pe.sam (threaded)
Comparing the two output .bam files, there is a 0.4G difference in file size.
I ran samtools flagstat on both bam files.
Differences:
6,026,490 QC passed reads
6,026,490 paired in sequencing
779,134 read 1
5,247,356 read 2
All other metrics are identical.
Can anyone explain why threading would give fewer reads from the same input .sam file? I assume it may have to do with the merging of thread data.
Is there a way to correct this issue without losing the speed increase provided by threading? (Currently, conversion takes 6 hours without threading and 1 hour 15 minutes with threading.)
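To pin down which file is missing records, one sanity check is to count the alignment records in the input .sam and in each .bam and compare the numbers. The sketch below is only an illustration: the function names are made up for this post, `default.bam` and `threaded.bam` are hypothetical stand-ins for the two output files, and it assumes `samtools` is on the PATH.

```shell
#!/bin/sh
# Count alignment records in a SAM file: every line that is not a header line.
# Header lines start with '@'; per the SAM format, read names cannot contain '@'.
count_sam_records() {
    grep -vc '^@' "$1"
}

# Count records in a BAM file. Assumes samtools is installed;
# 'view -c' prints only the number of records instead of the records themselves.
count_bam_records() {
    samtools view -c "$1"
}

# Example usage (file names are hypothetical placeholders):
#   count_sam_records pe.sam
#   count_bam_records default.bam
#   count_bam_records threaded.bam
```

If the single-threaded .bam matches the .sam count and the threaded one does not, that would point at the threaded write path rather than at the input.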