SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
bwa mem -R option dakin Bioinformatics 2 05-02-2013 07:42 AM
why can't I get alignment with BWA? gigigou Bioinformatics 11 09-12-2012 11:43 PM
BWA produces odd alignment results dandyrilla Bioinformatics 2 11-27-2011 11:28 PM
velveth_de running time and mem Ori Bioinformatics 10 11-03-2011 04:39 PM
BWA odd behaviors jmartin Bioinformatics 4 04-07-2010 07:53 AM

Reply
 
Thread Tools
Old 05-13-2013, 01:18 PM   #1
myronpeto
Member
 
Location: Portland, OR

Join Date: Sep 2011
Posts: 10
Default Odd bwa-mem alignment

I'm trying to align a whole genome sample using bwa-mem. I've previously aligned using bwa aln and bwa sampe and it went through fine. The command I'm using is:

bwa mem -t 8 -R '@RG\tID:Label\tLB:Label\tSM:Label' reference_v37.fasta read_1.fq.gz read_2.fq.gz > Sample.sam

Followed by:

samtools import reference_v37.fasta.fai Sample.sam Sample_unsorted.bam
samtools sort Sample_unsorted.bam Sample_sorted

The error I got from picard MarkDuplicates was the following:

Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: 0749_7455:HS2000-1111A_168:5:1203:11571:84654
at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:294)
at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:117)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:169)
at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:101)



I found the offending reads from the bam file:

HS2000-1111A_168:5:1203:11571:84654 81 2 243164263 1 100M 1 231669 0 AGCCTATGTGATGACTACATGTCGTGCGGGATCCTGGATGGGATCCTGGGTCAGAGTAAGATAGAACTAAGGGAATCCAAATGAAATATGAACTTTAGTT DDDDDCCCCDEEDDCEDDB?CDEE8IJHFBIJIGBBJIHDHGHGF=JIIG@HIJIJJGJJIGDIHEB?IGIHIHEEGJJJHEGHEG>DHFDADDBFD@C@ NM:i:0 AS:i:100 XS:i:95 RG:Z:0749_7455
HS2000-1111A_168:5:1203:11571:84654 161 1 231669 0 51M49S 2 243164263 0 CAGAAAGTAAGTACTAAAAAAATTAAAATATATCAAACAAAAATAAAAGCCTAGAAATCTCCTTTGCAAAAGAATTCCAAATAACTGATGTAGACACTCA @@@FDFF2CFHFFEGGIEEHIJJIJJJIIGGCGGGGIJJGDIJIG>DCGEFGEHIIEHIHGGHIGIIHEEH@DFFEF@CA:@@AACCDCC@CDEDDDDDC NM:i:0 AS:i:51 XS:i:51 RG:Z:0749_7455 XP:Z:1,+231860,51S49M,0,0;
HS2000-1111A_168:5:1203:11571:84654 161 1 231860 0 51S49M 2 243164263 0 CAGAAAGTAAGTACTAAAAAAATTAAAATATATCAAACAAAAATAAAAGCCTAGAAATCTCCTTTGCAAAAGAATTCCAAATAACTGATGTAGACACTCA @@@FDFF2CFHFFEGGIEEHIJJIJJJIIGGCGGGGIJJGDIJIG>DCGEFGEHIIEHIHGGHIGIIHEEH@DFFEF@CA:@@AACCDCC@CDEDDDDDC NM:i:0 AS:i:49 XS:i:49 RG:Z:0749_7455 XP:Z:1,+231669,51M49S,0,0;

It seems that bwa-mem aligned the second of the pair twice. In addition, I got an insert size of zero. From a post a few years back (somewhere) I read that this indicates the alignment is unpaired or the mate reference ID is invalid.

I've tried using FixMateInformation, which shuffled some of the information around but still left me with two mappings of what appears to be the same read.

Anyone able to shed some light on this? I did a search but found very few threads related to bwa-mem, as it's a relatively new tool.

Myron Peto

Last edited by myronpeto; 05-13-2013 at 01:21 PM. Reason: Minor formatting issue.
myronpeto is offline   Reply With Quote
Old 05-13-2013, 03:38 PM   #2
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

The 2nd read cannot be aligned as a whole, if, for example, it contains a long deletion or bridges a break point due to reference assembly or structural variations. Bwa-mem gives you the opportunity to identify such events. You can use "bwa mem -M". As the command line help says, this option is for picard/gatk compatibility.
lh3 is offline   Reply With Quote
Old 05-24-2013, 02:31 PM   #3
myronpeto
Member
 
Location: Portland, OR

Join Date: Sep 2011
Posts: 10
Default

Thanks for the response. I tried your suggestion and although it took a while to figure it out (30X whole genome samples) it did work. I'll know for future reference to use that parameter.
myronpeto is offline   Reply With Quote
Old 07-17-2013, 12:23 PM   #4
bwubb
Member
 
Location: Philadelphia

Join Date: Jan 2012
Posts: 58
Default

I have been struggling with this error.

I re-aligned with bwa mem using -M option. I assigned readgroups for each sequencing run before merging files. I still receive the 'Value was put into PairInfoMap more than once.' error

I need to take a look at the reads themselves still, but Im in the process adding lane number info into the RGID field to distinguish reads from different lanes of the same sequencing run.

If that does not work I am not sure what else I can try. Any thoughts? Thank you.
bwubb is offline   Reply With Quote
Old 07-17-2013, 12:47 PM   #5
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

You can drop all the secondary alignments in the file (also use the latest version). If that does not work, please show the part of SAM that causes the problem. Thanks.
lh3 is offline   Reply With Quote
Old 07-17-2013, 02:39 PM   #6
bwubb
Member
 
Location: Philadelphia

Join Date: Jan 2012
Posts: 58
Default

I dont think I have any. I used -M with bwa mem and I did not use -a to output all alignments.

Here is an example:
Code:
Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once.  1: KTR308-FGC0298:HWI-ST970:298:C0MUAACXX:5:1310:1225:35905


HWI-ST970:298:C0MUAACXX:5:1310:1225:35905       97      1       63484   0       100M    2       65480858        0       GTGATTTTCCTCGGGTTATTAAACTTGCATGCATGGACACTTATGGGCTAGAATTTGTGGTCACTGCCAACAGTGGATTCATATCGATGGGCACCTTCTT  @@@ADDFFHHHHHDGCFB@FGHEGCHIIIIIGGHIIIFFAFCHGCII@;FHIIHHGGFGGCHEHIGHI@DGEHHC@>B?@?BCAA?=A>@?==<?CCDDC  RG:Z:KTR308-FGC0298     NM:i:0  AS:i:100        XS:i:100
HWI-ST970:298:C0MUAACXX:5:1310:1225:35905       177     1       63638   2       100M    2       120882793       0       AAATGATTTATCCAAAGCATTCTTCACTTCGTCGGCTCACATCACCGTAGTGGTTTTGTTTTTTGCTCCATGCATGTTTCTCTACGTGTGGCCTTTCCCT  >:CA4DCDDCACAA:>@CACCA@:>8B=;DBB@CCFEB?BEHHEBGHGECG@DF@HIHGJIHCGGEFECIHGBGF?C<EEAF>?EA<<+<:C;?DBD1@?  RG:Z:KTR308-FGC0298     NM:i:0  AS:i:100        XS:i:95

picard-tools v 1.94

I see they recently came out with 1.95 but I suspect I am not running into a bug. Ive analyzed some of this data a while back without issue. Only recently with additional sequencing data included.

Last edited by bwubb; 07-17-2013 at 02:41 PM.
bwubb is offline   Reply With Quote
Old 07-17-2013, 04:43 PM   #7
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

I think your read file contains reads with identical names. You should check that first.
lh3 is offline   Reply With Quote
Old 07-17-2013, 05:53 PM   #8
bwubb
Member
 
Location: Philadelphia

Join Date: Jan 2012
Posts: 58
Default

I do not understand how that could have happened though...
bwubb is offline   Reply With Quote
Old 07-17-2013, 06:41 PM   #9
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

The two reads you are showing have different sequences. They cannot be the same read.
lh3 is offline   Reply With Quote
Old 07-18-2013, 06:40 AM   #10
bwubb
Member
 
Location: Philadelphia

Join Date: Jan 2012
Posts: 58
Default

Yes that is the part I am confused about. I have checked the fastq files for that sample and run. There is only 1 read with that ID in the fastq, but two come up in the resulting sam file.

Perhaps I have set up my bwa mem command wrong?

Code:
bwa mem -M -t 16 ~/human_g1k_v37.fasta KTR308_FGC0298_5_1_GTCCGC.fastq.gz KTR308_FGC0298_5_2_GTCCGC.fastq.gz > DATA/KTR308.7.FGC0298.GTCCGC.sam
bwubb is offline   Reply With Quote
Old 07-18-2013, 06:52 AM   #11
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

You should get two reads in the SAM file with the same name, one from each of the fastq files.
dpryan is offline   Reply With Quote
Old 07-18-2013, 06:57 AM   #12
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Quote:
Originally Posted by dpryan View Post
You should get two reads in the SAM file with the same name, one from each of the fastq files.
I am also thinking about this. Apparently bwubb's two read files are not paired together. John has provided a pull request to check read names, such that the users may know their input is wrong. Unfortunately, it is not in the released package.
lh3 is offline   Reply With Quote
Old 07-18-2013, 06:59 AM   #13
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Quote:
Originally Posted by lh3 View Post
I am also thinking about this. Apparently bwubb's two read files are not paired together. John has provided a pull request to check read names, such that the users may know their input is wrong. Unfortunately, it is not in the released package.
Ah, that'll cause no end of problems.
dpryan is offline   Reply With Quote
Old 07-18-2013, 07:07 AM   #14
bwubb
Member
 
Location: Philadelphia

Join Date: Jan 2012
Posts: 58
Default

Have I done something wrong in the pairing?

I was use to using bwa aln to align each fastq and then use sampe to do the pairing. Once I ran into the picard-tools error, I went back and used bwa mem -M instead, but still am receiving the error.

If Im following correctly, I have the appropriate number of reads in my files. Am I missing an option or step for proper pairing?

After bwa mem alignment I import to bam and then sort, before Read groups and merge.

Edit:
I guess I should ask how the pairing is displayed in the sam files resulting from bwa mem. I do not know how to identify if my pairing is proper.

Last edited by bwubb; 07-18-2013 at 07:37 AM.
bwubb is offline   Reply With Quote
Old 07-18-2013, 10:15 AM   #15
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

The i-th read in your first file and the i-th read in the second must be in a read pair, i.e. they have the same read name.
lh3 is offline   Reply With Quote
Old 10-26-2013, 08:46 AM   #16
g781
Member
 
Location: Taiwan

Join Date: May 2010
Posts: 25
Default

Hi everyone,

I have the same problem as below:
Code:
SEA055486C:139:C06FAACXX:6:2207:3530:3130       83      chrVII  675196  60      59M42S  =       675159  -96     TTGTTCGTGATGCTTCCGAAATTCCCGAAGGTTGCGTATTGCAATCTGTTAACCCAGAATCAGCCAGAACAAGAAGGACAAAGTGATCAAGACATAGGGAA  C>59<DCCDCDDDDB?CCCCCAB9,.HHHJJJIGIJIJIIJIGGE@FGIGHDEGEHFHGEGCG<EHDHHIIJIIHEHDGHHHGGGIGGFHHHFFDDFDCC@   NM:i:1  AS:i:54 XS:i:0  XP:Z:chrXVI,-908080,52S49M,60,0;
SEA055486C:139:C06FAACXX:6:2207:3530:3130       83      chrXVI  908080  60      52S49M  chrVII  675159  0       TTGTTCGTGATGCTTCCGAAATTCCCGAAGGTTGCGTATTGCAATCTGTTAACCCAGAATCAGCCAGAACAAGAAGGACAAAGTGATCAAGACATAGGGAA  C>59<DCCDCDDDDB?CCCCCAB9,.HHHJJJIGIJIJIIJIGGE@FGIGHDEGEHFHGEGCG<EHDHHIIJIIHEHDGHHHGGGIGGFHHHFFDDFDCC@   NM:i:0  AS:i:49 XS:i:0  XP:Z:chrVII,-675196,59M42S,60,1;
SEA055486C:139:C06FAACXX:6:2207:3530:3130       163     chrVII  675159  60      96M5S   =       675196  96      ATTGTCTCGTTGATCAAGGCCATCGACGAAGTCACTGTTGTTCGTGATGCTTCCGAAATTCCAGAAGGTTGCGTATTGCAATCTGTTAACCCAGAATCAGC  B@@DBDDFHBFHDGIGGIIGHGGDCGIGEHH?BGHICGGHIIGI?FEGGEFA@FGGGEHHHGHEDEFF>AE@BB?BDDDDDDDD@DDCE>CBBDBDA@CCD   NM:i:0  AS:i:96 XS:i:0
The read marked in red was aligned on different chromosomes. I have no idea what's happened to this case. Is it caused by the split algorithm?

Thanks.
g781 is offline   Reply With Quote
Old 03-12-2015, 11:40 AM   #17
shuoguo
Member
 
Location: Memphis

Join Date: Sep 2012
Posts: 23
Default

I had the same error when I use an older version of picard (1.45). After downloading the newest picard (1.129) I no longer have the issue. Even without the -M option while mapping with bwa mem
shuoguo is offline   Reply With Quote
Reply

Tags
bwa mem illumina

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 06:47 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO