SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
BWA and sub-optimal alignments LauraR Bioinformatics 0 02-10-2012 08:46 AM
multiple alignments with BWA efoss Bioinformatics 0 08-07-2011 12:37 PM
BWA: suboptimal alignments ElMichael Bioinformatics 2 03-29-2011 02:31 PM
merging bwa alignments arnkas Genomic Resequencing 5 07-07-2010 08:26 AM
BWA alignments and time cggj Bioinformatics 5 05-13-2010 11:41 AM

Reply
 
Thread Tools
Old 12-15-2010, 09:26 AM   #1
nirav99
Member
 
Location: Houston, TX

Join Date: Aug 2010
Posts: 10
Smile Handling alignments spanning chromosomes boundaries in BWA

Hi All,

When BWA aligns a read that spans over two chromosomes, it sets the flag as "unmapped" but leaves mapping quality (MAPQ) and CIGAR unchanged.

Picard's validator tool ValidateSamFile throws a validation error for such reads because it expects an unmapped read to have MAPQ of zero, and CIGAR value of "*".

I would like to understand the generally acceptable way of handling this type of reads. Should I modify these reads to have MAPQ 0 and CIGAR value of * ?

I would appreciate any suggestions and best practices to follow for such reads.

Thanks in advance.
nirav99 is offline   Reply With Quote
Old 12-15-2010, 10:24 AM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Most people turn the stringency down on Picard to lenient. It'll yell at you about those reads, but it will go through. You'll want to do that with every Picard command.
swbarnes2 is offline   Reply With Quote
Old 12-15-2010, 10:55 AM   #3
nirav99
Member
 
Location: Houston, TX

Join Date: Aug 2010
Posts: 10
Default

Hi swbarnes2,

Yes, we can have picard continue through these errors by the lenient stringency. But I was wondering what would be the correct way to handle the reads themselves so that no other downstream tool (not necessarily picard) will choke.
nirav99 is offline   Reply With Quote
Old 12-15-2010, 01:16 PM   #4
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

I guess what he wants to know is if there is anyone out there that is doing more than ignoring those cases. Maybe patching BWA or rewriting the fields in the BAM. I can see the big sequencing centers doing something like before submitting their data.
__________________
-drd
drio is offline   Reply With Quote
Old 12-15-2010, 01:26 PM   #5
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358
Default

It would be very interesting to understand what proportion of these type of reads are generated by chimerism introduced in the library prep steps, versus biologically significant events.

My guess is that the fraction caused by the library prep is fairly high, and people just haven't understood that yet because 1) the read lengths haven't been long enough to map both halves, and 2) most people are already overwhelmed by the reads they can map so throwing out a few % that map ambiguously is pretty common practice.

Cool stuff.
ECO is offline   Reply With Quote
Old 12-15-2010, 01:58 PM   #6
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Spanning two chromosomes in BWA (short-read) comes from concatenating all chromosomes into one long sequence for use with the BWT and FM-index. So the case where a read is unmapped but has a chr/pos occurs when an alignment spans a chr/chr junction, where the two chromosomes are consecutive. This is different from the read spanning two random locations in the genome on different chromosomes (inter-chromosomal translocations etc.). The BWA (long-read) algorithm will actually produce the latter on longer reads (>100bp), giving two non-overlapping mappings for a read.
nilshomer is offline   Reply With Quote
Old 12-16-2010, 09:36 AM   #7
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

Now that I think more about it, BWA is following the SAM spec. If the unmapped flag is set, it doesn't matter what you have in the other fields. Downstream tools should take that into account.

So I'd suggest you tell picard to don't report those warnings, or just ignore them. I guess that is what other people do.
__________________
-drd
drio is offline   Reply With Quote
Old 12-16-2010, 09:11 PM   #8
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

The maintainers of the spec and the Picard developers have nonetheless decided to call it invalid. Better to patch BWA.
nilshomer is offline   Reply With Quote
Old 01-11-2011, 12:49 PM   #9
nirav99
Member
 
Location: Houston, TX

Join Date: Aug 2010
Posts: 10
Default

I wrote a simple tool to handle this based on the API provided by Picard. Basically, it is an extension of Picard's CleanSam.jar with the change that if a read has unmapped flag, CIGAR is set to * and mapping quality to zero.

If anyone is interested in looking at it / using it, I can have it available in a public place.
nirav99 is offline   Reply With Quote
Old 03-23-2011, 08:11 AM   #10
vschulz
Junior Member
 
Location: northeast

Join Date: Apr 2009
Posts: 8
Default Handling alignments spanning chromosomes boundaries in BWA

Hi nirav99,

Could you post or make your code available? I think that I am having this problem on the GATK GenomeAnalysisTK.jar -T IndelRealigner command, which doesn't allow the VALIDATION_STRINGENCY=LENIENT parameter.

Thanks,

Vince
vschulz is offline   Reply With Quote
Old 03-24-2011, 01:51 PM   #11
nirav99
Member
 
Location: Houston, TX

Join Date: Aug 2010
Posts: 10
Default

Hi Vince,

Download and build the java code at https://github.com/nirav99/Illumina-.../java/FixCIGAR

This should help you.

Nirav
nirav99 is offline   Reply With Quote
Old 08-07-2011, 06:52 AM   #12
daphnawv
Junior Member
 
Location: LA USA

Join Date: Aug 2011
Posts: 2
Default

Hi Nirav
I would also like to fix my reads to have MAPQ 0 and CIGAR value of * using your FixCIGAR tool. however Java is so overwhelming for me could you please explain in more detail how to build your script and show an example of a command line?

i also tried filtering only the mapped reads using ViewSam.jar ALIGNMENT_STATUS=Aligned but it didn't work, don't these type of reads have a flag of 4?

Thank you!
Daphna
daphnawv is offline   Reply With Quote
Old 08-12-2011, 08:23 AM   #13
nirav99
Member
 
Location: Houston, TX

Join Date: Aug 2010
Posts: 10
Default

Hi Daphna,

I am assuming that you are running on Unix/ Linux and you have the latest java compiler. If so, you can follow these steps :

1) Download and untar / (unzip) latest Picard. We need two Jar files sam-*.jar and picard-*.jar.

2) Download the files from the FixCIGAR directory

3) Open GenerateJar.sh and edit the picardPath variable to the location where you have installed the picard library. You should point to the directory path where sam-*.jar and picard-*.jar are located.

4) Run GenerateJar.sh, it should build the tool for you.

5) It will create a .jar file FixCIGAR.jar. You can run it using the command

java -jar FixCIGAR.jar INPUT=bamToFix

or

java -jar FixCIGAR.jar INPUT=bamToFix OUTPUT=FixedBAM
nirav99 is offline   Reply With Quote
Old 08-14-2011, 01:48 AM   #14
daphnawv
Junior Member
 
Location: LA USA

Join Date: Aug 2011
Posts: 2
Default

Thank you so much for your help!
daphnawv is offline   Reply With Quote
Reply

Tags
align, boundary, bwa, chromosome, unmapped

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:46 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO