  • epigen
    Senior Member
    • May 2010
    • 101

    merge BAM files from 2 SOLiD PE runs and remove duplicates

    I mapped two runs of SOLiD paired end reads with BFAST, converted into BAMs, merged them and now want to remove duplicates using Picard, but it gives the following error:

    [Fri Dec 10 11:32:00 CET 2010] net.sf.picard.sam.MarkDuplicates done.
    Runtime.totalMemory()=18422956032
    Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: null:656_1000_1619
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
    at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:273)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:113)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:156)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:97)

    I guess the problem is that the same read ID (656_1000_1619) was present in both runs. How can I tell Picard that it's from different runs? Would it help to add read group information before merging? Or would I need to make unique IDs - and how to do that in a BAM file?! (I'll also be given BAM files from the BioScope pipeline so no way for me to change anything before that.)

    Thanks in advance for any useful suggestions.

    Barbara
  • drio
    Senior Member
    • Oct 2008
    • 323

    #2
    You want to add the read group information before merging. Bfast has that option in postprocessing. You can also try with picard.
    -drd


    • epigen
      Senior Member
      • May 2010
      • 101

      #3
      Originally posted by drio
      You want to add the read group information before merging. Bfast has that option in postprocessing. You can also try with picard.
      Thanks and a happy new year!
      But how do I add the read group to a BAM file? The methods mentioned here ("Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc") don't work for me, and I couldn't find out which Picard tool to use.


      • drio
        Senior Member
        • Oct 2008
        • 323

        #4
        Explore bfast's -r postprocessing option:

        Code:
        -r        RGFileName    Specifies to add the RG in the specified file to the SAM
                                  header and updates the RG tag (and LB/PU tags if present) in
                                  the reads (SAM only)
        -drd


        • epigen
          Senior Member
          • May 2010
          • 101

          #5
          Originally posted by drio
          Explore bfast's -r postprocessing option
          That's very nice, but (1) I don't want to run postprocess again on all my files and (2) I also have BAM files from BioScope to deal with.
          For some test data samtools merge worked, although not in an optimal way since it infers the RG tag from the BAM file name. At least this option would kill two birds with one stone: merging and adding RG in one go.


          • drio
            Senior Member
            • Oct 2008
            • 323

            #6
            Explore samtools reheader and picard's ReplaceSamHeader (command line tools).
            Check if they modify the RG field once you add your new header (with your RG).
            I don't think they'll change it. In that case, you will have to use picard (or samtools lib
            in any of their flavours) to iterate over all your reads and change the RG field manually.
            -drd
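The "iterate over all your reads" step can also be done on the SAM text stream rather than through a library. The following is a minimal sketch, not the method anyone in this thread used: the function name `add_rg_tag`, the read group ID `run1`, the sample name `sampleA`, and the file names are all hypothetical. It assumes the input has no @RG header line yet and leaves reads that already carry an RG:Z tag untouched.

```shell
# Add an @RG header line and an RG:Z tag to a SAM text stream.
# Usage: add_rg_tag RGID SAMPLE < in.sam > out.sam
# Assumption: the input header contains no @RG line yet.
add_rg_tag() {
    awk -v rg="$1" -v sm="$2" '
        BEGIN { FS = OFS = "\t"; emitted = 0 }
        # Existing header lines pass through unchanged.
        /^@/ { print; next }
        # Emit the new @RG line once, right after the existing header.
        !emitted { print "@RG", "ID:" rg, "SM:" sm; emitted = 1 }
        # Tag each alignment record that has no RG:Z tag yet.
        { if ($0 !~ /\tRG:Z:/) $0 = $0 OFS "RG:Z:" rg; print }
        # Header-only input: still emit the @RG line.
        END { if (!emitted) print "@RG", "ID:" rg, "SM:" sm }
    '
}

# Demo on a tiny hand-written SAM record (not real data).
printf '@SQ\tSN:1\tLN:100\nread1\t0\t1\t5\t30\t4M\t*\t0\t0\tACGT\tIIII\n' |
    add_rg_tag run1 sampleA

# With samtools installed, the same filter can sit between two view calls
# (older samtools releases need -S to read SAM input):
#   samtools view -h run1.bam | add_rg_tag run1 sampleA | samtools view -b -o run1.rg.bam -
```

Running it once per input BAM, with a different ID each time, gives every run its own read group before merging.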


            • epigen
              Senior Member
              • May 2010
              • 101

              #7
              Thank you again. I'll use samtools merge and check whether Picard works with the RG tag. (If not, I'll go through the BAM file and add unique read names and RG.) For future BFAST alignments I'll surely use the postprocess -r option.
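For the fallback of making read names unique, one possible approach is to prefix every read name with a run identifier before merging. This is only a sketch under assumptions: the function name `prefix_read_names`, the prefix `run1`, and the file names are hypothetical. Both mates of a pair share the same read name, so they receive the same prefix and stay paired.

```shell
# Prefix the read name (SAM column 1) of every alignment record.
# Usage: prefix_read_names PREFIX < in.sam > out.sam
prefix_read_names() {
    awk -v prefix="$1" '
        BEGIN { FS = OFS = "\t" }
        # Header lines pass through unchanged.
        /^@/ { print; next }
        # Rewrite the read name; all other columns stay as they are.
        { $1 = prefix "_" $1; print }
    '
}

# Demo on a hand-written record that reuses the read ID from the error message.
printf '@SQ\tSN:1\tLN:100\n656_1000_1619\t0\t1\t5\t30\t4M\t*\t0\t0\tACGT\tIIII\n' |
    prefix_read_names run1

# With samtools installed (older releases need -S to read SAM input):
#   samtools view -h run1.bam | prefix_read_names run1 | samtools view -b -o run1.renamed.bam -
```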


              • Milan Radovich
                Junior Member
                • Oct 2009
                • 8

                #8
                I had the same issue. With BioScope produced bam files, using Picard to merge the .bam files works while using samtools does not.


                • epigen
                  Senior Member
                  • May 2010
                  • 101

                  #9
                  BioScope-produced BAM files have RG information in the header, and Picard keeps it there. I don't know what samtools does; it might ignore it unless you use the -r and -h options (which will change the RG tag). At least samtools merge with -r and -h worked to solve my problem with the BFAST files.


                  • dnusol
                    Senior Member
                    • Jul 2009
                    • 136

                    #10
                    Hi, sorry to revive this old thread. I am having the same problem with my BWA-aligned Illumina reads when trying to remove duplicates, but only with some specific files. I don't think it is a problem with the headers, because the files that work have the same headers as the files that do not:

                    @SQ SN:10 LN:135534747
                    @SQ SN:11 LN:135006516
                    @SQ SN:12 LN:133851895
                    @SQ SN:13 LN:115169878
                    @SQ SN:14 LN:107349540
                    @SQ SN:15 LN:102531392
                    @SQ SN:16 LN:90354753
                    @SQ SN:17 LN:81195210
                    @SQ SN:18 LN:78077248
                    @SQ SN:19 LN:59128983
                    @SQ SN:1 LN:249250621
                    @SQ SN:20 LN:63025520
                    @SQ SN:21 LN:48129895
                    @SQ SN:22 LN:51304566
                    @SQ SN:2 LN:243199373
                    @SQ SN:3 LN:198022430
                    @SQ SN:4 LN:191154276
                    @SQ SN:5 LN:180915260
                    @SQ SN:6 LN:171115067
                    @SQ SN:7 LN:159138663
                    @SQ SN:8 LN:146364022
                    @SQ SN:9 LN:141213431
                    @SQ SN:M LN:16571
                    @SQ SN:X LN:155270560
                    @SQ SN:Y LN:59373566
                    @PG ID:bwa PN:bwa VN:0.6.1-r104


                    Any suggestions?

                    Also, I am just wondering: if I want to merge BAM files from different lanes and/or runs, do they have to have different RG headers? Could I merge two BAMs with the headers above?

                    Thanks for any help

                    D.


                    • epigen
                      Senior Member
                      • May 2010
                      • 101

                      #11
                      You need an @RG line in the header of each BAM file in order to merge them with Picard. You can supply it to bwa sampe as an option. If you forgot to do that, there is now a Picard tool, AddOrReplaceReadGroups, that adds or replaces read groups.
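As a rough illustration of the Picard route, the sketch below prints (rather than runs) one AddOrReplaceReadGroups command per input BAM so the commands can be reviewed first. The helper name `picard_add_rg_cmd`, the file names, and the library, platform, and sample values are all hypothetical. The jar name `picard.jar` is also an assumption: older Picard releases ship one jar per tool, such as AddOrReplaceReadGroups.jar.

```shell
# Print a Picard AddOrReplaceReadGroups command for one BAM file.
# Usage: picard_add_rg_cmd INPUT.bam RGID SAMPLE
# RGLB and RGPL are placeholder values; RGPU simply reuses the RGID here.
picard_add_rg_cmd() {
    in_bam=$1
    rgid=$2
    sample=$3
    out_bam="${in_bam%.bam}.rg.bam"
    printf 'java -jar picard.jar AddOrReplaceReadGroups INPUT=%s OUTPUT=%s RGID=%s RGLB=%s RGPL=%s RGPU=%s RGSM=%s\n' \
        "$in_bam" "$out_bam" "$rgid" lib1 illumina "$rgid" "$sample"
}

# One read group per lane or run, same sample.
picard_add_rg_cmd lane1.bam lane1 sampleA
picard_add_rg_cmd lane2.bam lane2 sampleA

# Alternative at alignment time, via the bwa sampe option mentioned above
# (reference and read file names are placeholders):
#   bwa sampe -r '@RG\tID:lane1\tSM:sampleA' ref.fa r1.sai r2.sai r1.fq r2.fq > lane1.sam
```

Giving each lane or run its own RGID while sharing RGSM keeps the data attributable to one sample but distinguishable by origin after merging.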


                      • dnusol
                        Senior Member
                        • Jul 2009
                        • 136

                        #12
                        Thanks epigen,

                        Can @RG lines be included during merging using the -r and -h options, as in the example on the linked SAMtools page?

                        I guess that would save me from rerunning the sampe/samse commands.


                        • epigen
                          Senior Member
                          • May 2010
                          • 101

                          #13
                          That is more or less what I did before using the new Picard tool. The -r option automatically attaches an RG tag to each read; the name is inferred from the BAM file name, so that in the example

                          samtools merge -rh rg.txt [merged.bam] ga.bam 454.bam

                          these will be ga and 454, respectively. If your BAM files do not have such nice names, you can trick samtools merge by setting symbolic links like this:

                          ln -s your_long_named_file.bam desired_rg_name.bam
                          (same for the BAM index file)

                          and then use those desired_rg_name.bam files as input for samtools merge.
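The rg.txt file passed to -h has to declare read groups whose IDs match the names that -r infers from the input files. A minimal sketch of generating it from the file names follows. The helper name `make_rg_header` and the sample name `hs` are hypothetical, and how -h treats the other header lines should be checked against the installed samtools version.

```shell
# Print one @RG header line per BAM file, using the file name without
# directory and ".bam" suffix as the read group ID (the name that
# samtools merge -r is described above as inferring).
# Usage: make_rg_header SAMPLE file1.bam [file2.bam ...]
make_rg_header() {
    sample=$1
    shift
    for bam in "$@"; do
        id=$(basename "$bam" .bam)
        printf '@RG\tID:%s\tSM:%s\n' "$id" "$sample"
    done
}

# Write the header file for the two example inputs.
make_rg_header hs ga.bam 454.bam > rg.txt
cat rg.txt

# Then, with samtools installed:
#   samtools merge -rh rg.txt merged.bam ga.bam 454.bam
```

If symbolic links are used to get nicer names, pass the link names to the helper so the IDs in rg.txt match what ends up in the reads.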
