#1 |
Senior Member
Location: Germany Join Date: May 2010
Posts: 101
I mapped two runs of SOLiD paired-end reads with BFAST, converted them into BAMs, merged them, and now want to remove duplicates using Picard, but it gives the following error:
Code:
[Fri Dec 10 11:32:00 CET 2010] net.sf.picard.sam.MarkDuplicates done. Runtime.totalMemory()=18422956032
Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: null:656_1000_1619
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
    at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:273)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:113)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:156)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:97)
I guess the problem is that the same read ID (656_1000_1619) was present in both runs. How can I tell Picard that it's from different runs? Would it help to add read group information before merging? Or would I need to make unique IDs, and how would I do that in a BAM file? (I'll also be given BAM files from the BioScope pipeline, so there is no way for me to change anything before that.)
Thanks in advance for any useful suggestions.
Barbara
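A minimal sketch for checking that suspicion, assuming each run has first been dumped to SAM text (for example with `samtools view run1.bam > run1.sam`). The function name `shared_read_names` and the demo file names are made up for illustration; the two tiny SAM files are fabricated test records, not data from this thread.

```shell
# Print read names (QNAME, column 1) that occur in both SAM text files.
shared_read_names() {
    awk -F '\t' '
        /^@/      { next }                            # skip header lines
        NR == FNR { seen[$1] = 1; next }              # first file: remember names
        ($1 in seen) && !reported[$1]++ { print $1 }  # second file: report each shared name once
    ' "$1" "$2"
}

# Demo with two tiny hand-written SAM files (hypothetical records).
demo_dir=$(mktemp -d)
printf '656_1000_1619\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >  "$demo_dir/run1.sam"
printf '777_10_20\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n'     >> "$demo_dir/run1.sam"
printf '656_1000_1619\t0\tchr1\t500\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >  "$demo_dir/run2.sam"

shared=$(shared_read_names "$demo_dir/run1.sam" "$demo_dir/run2.sam")
printf '%s\n' "$shared"
```

If this prints any names, the two runs reuse read IDs, which would be consistent with the guess above.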
#2 |
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
You want to add the read group information before merging. BFAST has that option in postprocessing. You can also try with Picard.
__________________
-drd |
#3 | |
Senior Member
Location: Germany Join Date: May 2010
Posts: 101
But how do I add the read group to a BAM file? The methods mentioned here http://seqanswers.com/forums/showthread.php?p=26635 don't work for me, and I couldn't find out which Picard tool to use.
#4 |
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
Explore bfast's -r postprocessing option:
Code:
-r RGFileName Specifies to add the RG in the specified file to the SAM header and updates the RG tag (and LB/PU tags if present) in the reads (SAM only)
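As a rough illustration of what such an RG file might contain: the sketch below assumes the file simply holds one tab-separated `@RG` header line, which is worth checking against `bfast postprocess -h` for your version. The ID, sample, and library values are placeholders.

```shell
# Write a single tab-separated @RG header line to a file.
# ID is the one field the SAM spec requires; SM/LB/PL values here are placeholders.
rg_file=$(mktemp)
printf '@RG\tID:%s\tSM:%s\tLB:%s\tPL:%s\n' run1 sampleA libA SOLID > "$rg_file"
cat "$rg_file"

# Hypothetical use (remaining postprocess options omitted):
#   bfast postprocess ... -r "$rg_file"
```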
__________________
-drd |
#5 |
Senior Member
Location: Germany Join Date: May 2010
Posts: 101
That's very nice, but (1) I don't want to run postprocess again on all my files and (2) I also have BAM files from BioScope to deal with.
For some test data samtools merge worked, although not in an optimal way since it infers the RG tag from the BAM file name. At least this option would kill two birds with one stone: merging and adding RG in one go. |
#6 |
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
Explore samtools reheader and Picard's ReplaceSamHeader (command-line tools).
Check whether they modify the RG field of the reads once you add your new header (with your RG). I don't think they will. In that case, you will have to use Picard (or the samtools library in any of its flavours) to iterate over all your reads and change the RG field manually.
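One way to sketch the "iterate over all reads" step without writing Java is to stream the alignments as SAM text through awk. This is only an illustration: the function name `set_read_group` is made up, it replaces any existing `@RG` header lines with a single new one, and it overwrites any `RG:Z:` tag already on a read.

```shell
# Read SAM text on stdin, write SAM text with one @RG header line and an
# RG:Z:<id> tag on every read. Usage: set_read_group <id> <sample>
set_read_group() {
    awk -F '\t' -v OFS='\t' -v id="$1" -v sm="$2" '
        function emit_rg() {
            if (!printed) { print "@RG", "ID:" id, "SM:" sm; printed = 1 }
        }
        /^@RG\t/ { next }          # drop existing @RG header lines
        /^@/     { print; next }   # keep all other header lines
        {
            emit_rg()              # new @RG goes after the remaining header
            line = $1
            for (i = 2; i <= NF; i++)
                # fields 1-11 are mandatory; only optional fields can be tags
                if (i < 12 || $i !~ /^RG:Z:/) line = line OFS $i
            print line, "RG:Z:" id
        }
        END { emit_rg() }          # header-only input still gets the @RG line
    '
}

# Demo on a hand-written record (hypothetical data).
demo_out=$(printf '@SQ\tSN:1\tLN:100\nr1\t0\t1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:old\n' |
    set_read_group run1 sampleA)
printf '%s\n' "$demo_out"
```

With real files this could be wired up as `samtools view -h in.bam | set_read_group run1 sampleA | samtools view -b -o out.bam -`; older samtools versions may additionally need `-S` to read SAM input.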
__________________
-drd |
#7 |
Senior Member
Location: Germany Join Date: May 2010
Posts: 101
Thank you again. I'll use samtools merge and see whether Picard works with the RG tag. (If not, I'll go through the BAM file and add unique read names and RG.) For future BFAST alignments I'll surely use the postprocess -r option.
#8 |
Junior Member
Location: Indianapolis Join Date: Oct 2009
Posts: 8
I had the same issue. With BioScope-produced BAM files, using Picard to merge the .bam files works, while using samtools does not.
#9 |
Senior Member
Location: Germany Join Date: May 2010
Posts: 101
BioScope-produced BAM files have RG information in the header, and Picard keeps it there. I don't know what samtools does; it might ignore it unless you use the -r and -h options (which will change the RG tag). At least samtools merge with -r and -h solved my problem with the BFAST files.
#10 |
Senior Member
Location: Spain Join Date: Jul 2009
Posts: 133
Hi, sorry to revive this old thread. I am having the same problem with my BWA-aligned Illumina reads when trying to remove duplicates, but only with some specific files. I don't think it is a problem with the headers, because the files that work have the same headers as the files that do not:
Code:
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:1 LN:249250621
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:M LN:16571
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
@PG ID:bwa PN:bwa VN:0.6.1-r104
Any suggestions?
Also, I am just wondering: if I want to merge BAM files from different lanes and/or runs, do they have to have different RG headers? Could I merge two BAMs with the headers above?
Thanks for any help,
D.
#11 |
Senior Member
Location: Germany Join Date: May 2010
Posts: 101
You need an @RG line in the header of each single BAM file to merge them with Picard. You can supply it to bwa sampe as an option. If you forgot to do that, there is now a Picard tool to "add or replace read groups" (AddOrReplaceReadGroups).
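For reference, a hedged sketch of what the Picard call can look like. It only prints the command (a dry run) so it can be inspected before anything is executed. The helper name `picard_add_rg_cmd`, the file names, and all read group values are placeholders, and the exact invocation depends on the Picard version: older releases ship a separate `AddOrReplaceReadGroups.jar` and use `INPUT=`/`OUTPUT=`.

```shell
# Print an AddOrReplaceReadGroups command line for one BAM file.
# Usage: picard_add_rg_cmd <in.bam> <out.bam> <read group id> <sample>
picard_add_rg_cmd() {
    in_bam=$1; out_bam=$2; rg_id=$3; sample=$4
    # RGLB, RGPL and RGPU are filled with placeholder values here.
    printf 'java -jar picard.jar AddOrReplaceReadGroups I=%s O=%s RGID=%s RGLB=%s RGPL=%s RGPU=%s RGSM=%s\n' \
        "$in_bam" "$out_bam" "$rg_id" "lib_$sample" illumina "$rg_id" "$sample"
}

picard_cmd=$(picard_add_rg_cmd lane1.bam lane1.rg.bam lane1 sampleA)
printf '%s\n' "$picard_cmd"

# The bwa route mentioned above would pass the line at alignment time, e.g.
#   bwa sampe -r '@RG\tID:lane1\tSM:sampleA' ref.fa r1.sai r2.sai r1.fq r2.fq
```

Giving each lane or run its own RGID is what lets the merged file keep them apart.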
#12 |
Senior Member
Location: Spain Join Date: Jul 2009
Posts: 133
Thanks epigen,
following the example in http://sourceforge.net/apps/mediawik...rged_alignment, can @RG lines be included during merging using the -r and -h options? I guess that would save me from rerunning the sampe/samse commands.
#13 |
Senior Member
Location: Germany Join Date: May 2010
Posts: 101
That is more or less what I did before using the new Picard tool. The -r option automatically creates the RG tag for each read; the name is inferred from the BAM file name, so that in the example
Code:
samtools merge -rh rg.txt [merged.bam] ga.bam 454.bam
these will be ga and 454, respectively. If your BAM files do not have such nice names, you can trick samtools merge by setting symbolic links like this:
Code:
ln -s your_long_named_file.bam desired_rg_name.bam
(same for the BAM index file) and then use the desired_rg_name.bam files as input for samtools merge.
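Putting the symlink trick and the header file together, a small sketch. The long file names, the short names `run1`/`run2`, and the SM/PL values are all hypothetical, and the empty placeholder files only stand in for real BAMs so the script can run anywhere. The final merge is left as a comment.

```shell
work_dir=$(mktemp -d)

# Empty placeholders standing in for real BAM files with unwieldy names.
: > "$work_dir/sample_run1_bfast_postprocessed_sorted.bam"
: > "$work_dir/sample_run2_bfast_postprocessed_sorted.bam"

# Short symlink names; samtools merge -r infers the RG value from these.
ln -s sample_run1_bfast_postprocessed_sorted.bam "$work_dir/run1.bam"
ln -s sample_run2_bfast_postprocessed_sorted.bam "$work_dir/run2.bam"

# One @RG line per input, with the ID equal to the file name minus ".bam".
: > "$work_dir/rg.txt"
for bam in run1.bam run2.bam; do
    printf '@RG\tID:%s\tSM:sampleA\tPL:SOLID\n' "$(basename "$bam" .bam)" >> "$work_dir/rg.txt"
done
cat "$work_dir/rg.txt"

# With real BAMs, run from inside the directory:
#   cd "$work_dir" && samtools merge -rh rg.txt merged.bam run1.bam run2.bam
```

Afterwards it is worth checking the result with `samtools view -H merged.bam`, since how the `-h` file is combined with the existing header lines may differ between samtools versions.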