Hi,
I am using Illumina paired-end data (read length 250 bp) to produce a BAM file with BWA and Samtools. Because BWA does not support read lengths greater than 200 bp, I had to use BWA-MEM with some test parameters (such as a gap opening penalty of 30 and a gap extension penalty of 5). However, one of my colleagues used the BWA that runs on the MiSeq machine and got a totally different alignment and mapping.
I am trying to replicate her workflow, so I planned to use the same parameters she used. However, according to the MiSeq manual, MiSeq-BWA automatically adjusts its parameters based on read lengths and error rates, and then estimates the insert size distribution (MiSeq manual page 22: BWA). The BAM file produced by MiSeq-BWA shows that the genome has some deletions, and these were confirmed by Sanger sequencing. So in this case I am more or less sure that MiSeq-BWA is doing the right thing. However, the BAM file created by my pipeline using BWA-MEM does not show the same result.
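For reference, the BWA-MEM step described above can be sketched roughly as follows. This is only a minimal illustration, not my exact pipeline: the file names, the thread count, and the helper `bwa_mem_command` are placeholders I made up for this post. The only values taken from my run are the gap opening penalty (`-O 30`) and gap extension penalty (`-E 5`). The sketch just builds and prints the command line; it does not run BWA.

```python
import shlex


def bwa_mem_command(ref, reads1, reads2, gap_open=30, gap_extend=5, threads=4):
    """Build a `bwa mem` argument list for paired-end reads.

    -O and -E are the bwa mem gap opening and gap extension penalties.
    The default values here mirror the test parameters mentioned above;
    the thread count is an arbitrary placeholder.
    """
    return [
        "bwa", "mem",
        "-t", str(threads),      # number of threads (placeholder value)
        "-O", str(gap_open),     # gap opening penalty
        "-E", str(gap_extend),   # gap extension penalty
        ref, reads1, reads2,     # reference FASTA and the two FASTQ files
    ]


# Hypothetical file names, for illustration only.
cmd = bwa_mem_command("ref.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz")
print(shlex.join(cmd))
```

The printed command writes SAM to standard output, which would then be piped into Samtools for sorting and conversion to BAM.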
Now my questions are:
1. Are MiSeq-BWA and the publicly available BWA tools implemented differently? If not, which one should I be using, BWA-SW or BWA-MEM?
2. Is there any way of adjusting parameters based on read lengths and error rates, and then estimating the insert size distribution? If I do not specify any parameters, I assume BWA uses its defaults.
I know these questions are quite broad, but it would be a great help if anyone has some advice for me.