SEQanswers

12-10-2010, 03:25 AM   #1
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101
merge BAM files from 2 SOLiD PE runs and remove duplicates

I mapped two runs of SOLiD paired-end reads with BFAST, converted the output to BAM, merged the files, and now want to remove duplicates with Picard MarkDuplicates, but it fails with the following error:

[Fri Dec 10 11:32:00 CET 2010] net.sf.picard.sam.MarkDuplicates done.
Runtime.totalMemory()=18422956032
Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: null:656_1000_1619
at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:273)
at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:113)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:156)
at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:97)

I guess the problem is that the same read ID (656_1000_1619) is present in both runs. How can I tell Picard that the reads come from different runs? Would it help to add read group information before merging? Or would I need to make the IDs unique, and how would I do that in a BAM file? (I will also be given BAM files from the BioScope pipeline, so there is no way for me to change anything upstream.)

Thanks in advance for any useful suggestions.

Barbara
12-10-2010, 01:02 PM   #2
drio
Senior Member
 
Location: 4117'49"N / 24'42"E

Join Date: Oct 2008
Posts: 323

You want to add the read group information before merging. BFAST has an option for that in its postprocess step. You can also try Picard.
__________________
-drd
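Why the read group helps: the `null` in `null:656_1000_1619` appears to be the read group of the record, printed as null because the reads carry no RG tag. As far as I can tell from the Picard source of that time, MarkDuplicates keys each read pair by RG plus read name, so pairs from two runs that reuse a name collide unless their RG tags differ. The Python sketch below only mimics that assumed keying; `pair_key` and `colliding_keys` are hypothetical helpers, not Picard API.

```python
from collections import Counter


def pair_key(read_group, read_name):
    # Assumed Picard-style key "<RG>:<read name>"; a missing RG prints as "null".
    return f"{'null' if read_group is None else read_group}:{read_name}"


def colliding_keys(pairs):
    """Return keys used by more than one read pair.

    `pairs` holds one (read_group, read_name) tuple per read pair, not per mate.
    """
    counts = Counter(pair_key(rg, name) for rg, name in pairs)
    return sorted(key for key, n in counts.items() if n > 1)


# The same pair name occurring in two runs, without and with read groups.
without_rg = [(None, "656_1000_1619"), (None, "656_1000_1619")]
with_rg = [("run1", "656_1000_1619"), ("run2", "656_1000_1619")]

print(colliding_keys(without_rg))  # ['null:656_1000_1619']
print(colliding_keys(with_rg))     # []
```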
01-03-2011, 06:37 AM   #3
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101

Quote:
Originally Posted by drio
You want to add the read group information before merging. Bfast has that option in postprocessing. You can also try with picard.
Thanks and a happy new year!
But how do I add the read group to a BAM file? The methods mentioned here
http://seqanswers.com/forums/showthread.php?p=26635
don't work for me, and I couldn't find out which Picard tool to use.
01-03-2011, 07:16 AM   #4
drio
Senior Member
 
Location: 4117'49"N / 24'42"E

Join Date: Oct 2008
Posts: 323

Explore bfast's -r postprocessing option:

Code:
-r        RGFileName    Specifies to add the RG in the specified file to the SAM
                          header and updates the RG tag (and LB/PU tags if present) in
                          the reads (SAM only)
__________________
-drd
01-03-2011, 07:59 AM   #5
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101

Quote:
Originally Posted by drio
Explore bfast's -r postprocessing option
That's very nice, but (1) I don't want to run postprocess again on all my files, and (2) I also have BAM files from BioScope to deal with.
For some test data, samtools merge worked, although not ideally, since it infers the RG tag from the BAM file name. At least this option kills two birds with one stone: merging and adding the RG in one go.
01-03-2011, 11:44 AM   #6
drio
Senior Member
 
Location: 4117'49"N / 24'42"E

Join Date: Oct 2008
Posts: 323

Explore samtools reheader and Picard's ReplaceSamHeader (command-line tools).
Check whether they modify the RG field of the reads once you add your new header (with your @RG line).
I don't think they will. In that case, you will have to use Picard (or the samtools library,
in any of its flavours) to iterate over all your reads and change the RG field manually.
__________________
-drd
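The manual route can also be done outside Java: dump the BAM to SAM text (e.g. `samtools view -h in.bam > in.sam`), rewrite the text, and convert it back (e.g. `samtools view -bS out.sam > out.bam`). A minimal Python sketch of the rewrite step, assuming tab-separated SAM input in which all reads belong to one run; `set_read_group`, the record and the ID/sample values are hypothetical (the record is made up, reusing the read name from the error message).

```python
def set_read_group(sam_lines, rg_id, sample):
    """Yield SAM lines in which every read is tagged RG:Z:<rg_id>.

    Assumes the whole file is one read group: existing @RG header lines and
    RG tags are dropped, and one new @RG line is added at the end of the header.
    """
    rg_header = f"@RG\tID:{rg_id}\tSM:{sample}"
    rg_written = False
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            if not line.startswith("@RG\t"):
                yield line  # keep @HD, @SQ, @PG, ... unchanged
            continue
        if not rg_written:
            # First alignment line reached: the header is complete.
            yield rg_header
            rg_written = True
        fields = line.split("\t")
        # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields follow.
        tags = [tag for tag in fields[11:] if not tag.startswith("RG:")]
        yield "\t".join(fields[:11] + tags + [f"RG:Z:{rg_id}"])
    if not rg_written:
        yield rg_header  # header-only input


# Hypothetical SAM text for one run.
sam = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:1\tLN:249250621",
    "656_1000_1619\t99\t1\t100\t60\t50M\t=\t300\t250\t*\t*\tNM:i:0",
]
for out_line in set_read_group(sam, "run1", "sample1"):
    print(out_line)
```

Running each run's file through this with a different `rg_id` before merging gives every read a distinct RG, which is what the advice above aims for.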
01-04-2011, 07:04 AM   #7
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101

Thank you again. I'll use samtools merge and check whether Picard works with the RG tag. (If not, I'll go through the BAM file and add unique read names and RG.) For future BFAST alignments I'll certainly use the postprocess -r option.
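The fallback mentioned here, unique read names, can be sketched in the same SAM-text style. The key constraint is that both mates of a pair keep identical names, so the sketch prefixes every QNAME in one run's file with the same label; `prefix_read_names`, the label and the records are hypothetical. The SAM specification also limits QNAME (no whitespace, and at most 254 characters in current versions), so the label should be short.

```python
def prefix_read_names(sam_lines, run_label):
    """Yield SAM lines with each read name (QNAME, first column) prefixed by run_label.

    Header lines pass through unchanged. Both mates of a pair share a QNAME,
    so they get the same prefix and remain a pair.
    """
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            yield line
            continue
        qname, rest = line.split("\t", 1)
        yield f"{run_label}_{qname}\t{rest}"


# Hypothetical pair from run 1: same name on both mates.
run1 = [
    "@SQ\tSN:1\tLN:249250621",
    "656_1000_1619\t99\t1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "656_1000_1619\t147\t1\t300\t60\t4M\t=\t100\t-204\tACGT\tIIII",
]
for out_line in prefix_read_names(run1, "run1"):
    print(out_line)
```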
01-13-2011, 12:26 PM   #8
Milan Radovich
Junior Member
 
Location: Indianapolis

Join Date: Oct 2009
Posts: 8

I had the same issue. With BioScope-produced BAM files, merging with Picard works, while merging with samtools does not.
01-19-2011, 04:03 AM   #9
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101

BioScope-produced BAM files have RG information in the header, and Picard keeps it there. I don't know what samtools does; it might ignore it unless you use the -r and -h options (which will change the RG tag). At any rate, samtools merge with -r and -h solved my problem with the BFAST files.
03-22-2012, 02:45 AM   #10
dnusol
Senior Member
 
Location: Spain

Join Date: Jul 2009
Posts: 133

Hi, sorry to revive this old thread. I am having the same problem with my BWA-aligned Illumina reads when trying to remove duplicates, but only with some specific files. I don't think it is a problem with the headers, because the files that work have the same headers as the files that do not:

@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:1 LN:249250621
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:M LN:16571
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
@PG ID:bwa PN:bwa VN:0.6.1-r104


Any suggestions?

Also, I am wondering: if I want to merge BAM files from different lanes and/or runs, do they need different RG headers? Could I merge two BAMs with the headers above?

Thanks for any help

D.
03-22-2012, 11:57 AM   #11
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101

You need an @RG line in the header of each BAM file in order to merge them with Picard. You can supply it to bwa sampe as an option (-r). If you missed doing that, there is now a Picard tool for it, AddOrReplaceReadGroups.
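To check this before merging, the headers can be inspected, e.g. from `samtools view -H file.bam`. The Python sketch below, with hypothetical helper and file names, lists files whose header has no @RG line and RG IDs declared in more than one file; a header like the one posted above (only @SQ and @PG lines) would be reported as missing a read group. Whether a shared ID is a problem depends on whether the files really come from the same run, so the sketch only reports it.

```python
from collections import defaultdict


def read_group_ids(header_text):
    """Return the ID values of all @RG lines in a SAM header (tab-separated text)."""
    ids = []
    for line in header_text.splitlines():
        if line.startswith("@RG\t"):
            for field in line.split("\t")[1:]:
                if field.startswith("ID:"):
                    ids.append(field[3:])
    return ids


def check_before_merge(headers):
    """`headers` maps file name -> header text. Returns (files_without_rg, shared_ids)."""
    missing = sorted(name for name, text in headers.items() if not read_group_ids(text))
    owners = defaultdict(set)
    for name, text in headers.items():
        for rg_id in read_group_ids(text):
            owners[rg_id].add(name)
    shared = {rg_id: sorted(files) for rg_id, files in owners.items() if len(files) > 1}
    return missing, shared


# Hypothetical headers for three lanes.
headers = {
    "lane1.bam": "@SQ\tSN:1\tLN:249250621\n@PG\tID:bwa\tPN:bwa\tVN:0.6.1-r104",
    "lane2.bam": "@SQ\tSN:1\tLN:249250621\n@RG\tID:lane2\tSM:s1\n@PG\tID:bwa\tPN:bwa",
    "lane3.bam": "@SQ\tSN:1\tLN:249250621\n@RG\tID:lane2\tSM:s1",
}
print(check_before_merge(headers))
# (['lane1.bam'], {'lane2': ['lane2.bam', 'lane3.bam']})
```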
03-23-2012, 01:39 AM   #12
dnusol
Senior Member
 
Location: Spain

Join Date: Jul 2009
Posts: 133

Thanks, epigen.

Can @RG lines be included during merging, using the -r and -h options as in this example?

http://sourceforge.net/apps/mediawik...rged_alignment


I guess that would save me from rerunning the sampe/samse commands.
03-23-2012, 02:55 AM   #13
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101

That is more or less what I did before switching to the new Picard tool. The -r option automatically attaches an RG tag to each read; the tag value is inferred from the BAM file name, so in the example

samtools merge -rh rg.txt [merged.bam] ga.bam 454.bam

the RG values will be ga and 454, respectively. If your BAM files do not have such convenient names, you can trick samtools merge by creating symbolic links like this:

ln -s your_long_named_file.bam desired_rg_name.bam
(same for the BAM index file)

and then use the desired_rg_name.bam files as input to samtools merge.
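The rg.txt passed with -h should declare one @RG line per inferred tag, i.e. per input file name with the directory and the .bam suffix stripped; as I recall, the samtools example behind this command uses @RG lines with ID:ga and ID:454. A hedged Python sketch, assuming that inference rule: it builds the rg.txt text and the ln -s commands for renamed inputs (BAM plus .bai index) without touching the filesystem. The function names and the sample and platform values are hypothetical.

```python
import os


def rg_id_from_path(path):
    """RG value samtools merge -r is assumed to infer: basename without '.bam'."""
    name = os.path.basename(path)
    return name[:-len(".bam")] if name.endswith(".bam") else name


def build_rg_file(inputs, sample):
    """Return rg.txt text with one tab-separated @RG line per (bam_path, platform) pair."""
    lines = []
    for path, platform in inputs:
        rg_id = rg_id_from_path(path)
        # LB reuses the RG ID here; use real library names if you have them.
        lines.append(f"@RG\tID:{rg_id}\tSM:{sample}\tLB:{rg_id}\tPL:{platform}")
    return "\n".join(lines) + "\n"


def symlink_plan(renames):
    """renames maps long BAM file name -> desired RG name.

    Returns (target, link_name) pairs for `ln -s target link_name`, covering
    the BAM and its .bai index.
    """
    plan = []
    for long_name, rg_name in renames.items():
        plan.append((long_name, f"{rg_name}.bam"))
        plan.append((f"{long_name}.bai", f"{rg_name}.bam.bai"))
    return plan


# Hypothetical inputs, following the names used in this post.
for target, link in symlink_plan({"your_long_named_file.bam": "desired_rg_name"}):
    print(f"ln -s {target} {link}")
print(build_rg_file([("desired_rg_name.bam", "ILLUMINA")], "sample1"), end="")
```

The printed text would be saved as rg.txt and passed to samtools merge -rh together with the linked files.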

All times are GMT -8.