![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
BWA and sub-optimal alignments | LauraR | Bioinformatics | 0 | 02-10-2012 08:46 AM |
multiple alignments with BWA | efoss | Bioinformatics | 0 | 08-07-2011 12:37 PM |
BWA: suboptimal alignments | ElMichael | Bioinformatics | 2 | 03-29-2011 02:31 PM |
merging bwa alignments | arnkas | Genomic Resequencing | 5 | 07-07-2010 08:26 AM |
BWA alignments and time | cggj | Bioinformatics | 5 | 05-13-2010 11:41 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: Houston, TX Join Date: Aug 2010
Posts: 10
|
![]()
Hi All,
When BWA aligns a read that spans over two chromosomes, it sets the flag as "unmapped" but leaves mapping quality (MAPQ) and CIGAR unchanged. Picard's validator tool ValidateSamFile throws a validation error for such reads because it expects an unmapped read to have MAPQ of zero, and CIGAR value of "*". I would like to understand the generally acceptable way of handling this type of reads. Should I modify these reads to have MAPQ 0 and CIGAR value of * ? I would appreciate any suggestions and best practices to follow for such reads. Thanks in advance. |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
|
![]()
Most people turn the stringency down on Picard to lenient. It'll yell at you about those reads, but it will go through. You'll want to do that with every Picard command.
|
![]() |
![]() |
![]() |
#3 |
Member
Location: Houston, TX Join Date: Aug 2010
Posts: 10
|
![]()
Hi swbarnes2,
Yes, we can have picard continue through these errors by the lenient stringency. But I was wondering what would be the correct way to handle the reads themselves so that no other downstream tool (not necessarily picard) will choke. |
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
|
![]()
I guess what he wants to know is if there is anyone out there that is doing more than ignoring those cases. Maybe patching BWA or rewriting the fields in the BAM. I can see the big sequencing centers doing something like before submitting their data.
__________________
-drd |
![]() |
![]() |
![]() |
#5 |
--Site Admin--
Location: SF Bay Area, CA, USA Join Date: Oct 2007
Posts: 1,358
|
![]()
It would be very interesting to understand what proportion of these type of reads are generated by chimerism introduced in the library prep steps, versus biologically significant events.
My guess is that the fraction caused by the library prep is fairly high, and people just haven't understood that yet because 1) the read lengths haven't been long enough to map both halves, and 2) most people are already overwhelmed by the reads they can map so throwing out a few % that map ambiguously is pretty common practice. Cool stuff. |
![]() |
![]() |
![]() |
#6 |
Nils Homer
Location: Boston, MA, USA Join Date: Nov 2008
Posts: 1,285
|
![]()
Spanning two chromosomes in BWA (short-read) comes from concatenating all chromosomes into one long sequence for use with the BWT and FM-index. So the case where a read is unmapped but has a chr/pos occurs when an alignment spans a chr/chr junction, where the two chromosomes are consecutive. This is different from the read spanning two random locations in the genome on different chromosomes (inter-chromosomal translocations etc.). The BWA (long-read) algorithm will actually produce the latter on longer reads (>100bp), giving two non-overlapping mappings for a read.
|
![]() |
![]() |
![]() |
#7 |
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
|
![]()
Now that I think more about it, BWA is following the SAM spec. If the unmapped flag is set, it doesn't matter what you have in the other fields. Downstream tools should take that into account.
So I'd suggest you tell picard to don't report those warnings, or just ignore them. I guess that is what other people do.
__________________
-drd |
![]() |
![]() |
![]() |
#8 |
Nils Homer
Location: Boston, MA, USA Join Date: Nov 2008
Posts: 1,285
|
![]()
The maintainers of the spec and the Picard developers have nonetheless decided to call it invalid. Better to patch BWA.
|
![]() |
![]() |
![]() |
#9 |
Member
Location: Houston, TX Join Date: Aug 2010
Posts: 10
|
![]()
I wrote a simple tool to handle this based on the API provided by Picard. Basically, it is an extension of Picard's CleanSam.jar with the change that if a read has unmapped flag, CIGAR is set to * and mapping quality to zero.
If anyone is interested in looking at it / using it, I can have it available in a public place. |
![]() |
![]() |
![]() |
#10 |
Junior Member
Location: northeast Join Date: Apr 2009
Posts: 8
|
![]()
Hi nirav99,
Could you post or make your code available? I think that I am having this problem on the GATK GenomeAnalysisTK.jar -T IndelRealigner command, which doesn't allow the VALIDATION_STRINGENCY=LENIENT parameter. Thanks, Vince |
![]() |
![]() |
![]() |
#11 |
Member
Location: Houston, TX Join Date: Aug 2010
Posts: 10
|
![]()
Hi Vince,
Download and build the java code at https://github.com/nirav99/Illumina-.../java/FixCIGAR This should help you. Nirav |
![]() |
![]() |
![]() |
#12 |
Junior Member
Location: LA USA Join Date: Aug 2011
Posts: 2
|
![]()
Hi Nirav
I would also like to fix my reads to have MAPQ 0 and CIGAR value of * using your FixCIGAR tool. however Java is so overwhelming for me ![]() i also tried filtering only the mapped reads using ViewSam.jar ALIGNMENT_STATUS=Aligned but it didn't work, don't these type of reads have a flag of 4? Thank you! Daphna |
![]() |
![]() |
![]() |
#13 |
Member
Location: Houston, TX Join Date: Aug 2010
Posts: 10
|
![]()
Hi Daphna,
I am assuming that you are running on Unix/ Linux and you have the latest java compiler. If so, you can follow these steps : 1) Download and untar / (unzip) latest Picard. We need two Jar files sam-*.jar and picard-*.jar. 2) Download the files from the FixCIGAR directory 3) Open GenerateJar.sh and edit the picardPath variable to the location where you have installed the picard library. You should point to the directory path where sam-*.jar and picard-*.jar are located. 4) Run GenerateJar.sh, it should build the tool for you. 5) It will create a .jar file FixCIGAR.jar. You can run it using the command java -jar FixCIGAR.jar INPUT=bamToFix or java -jar FixCIGAR.jar INPUT=bamToFix OUTPUT=FixedBAM |
![]() |
![]() |
![]() |
#14 |
Junior Member
Location: LA USA Join Date: Aug 2011
Posts: 2
|
![]()
Thank you so much for your help!
|
![]() |
![]() |
![]() |
Tags |
align, boundary, bwa, chromosome, unmapped |
Thread Tools | |
|
|