
  • merge BAM files from 2 SOLiD PE runs and remove duplicates

    I mapped two runs of SOLiD paired end reads with BFAST, converted into BAMs, merged them and now want to remove duplicates using Picard, but it gives the following error:

    [Fri Dec 10 11:32:00 CET 2010] net.sf.picard.sam.MarkDuplicates done.
    Runtime.totalMemory()=18422956032
    Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: null:656_1000_1619
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
    at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:273)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:113)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:156)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:97)

    I guess the problem is that the same read ID (656_1000_1619) was present in both runs. How can I tell Picard that it's from different runs? Would it help to add read group information before merging? Or would I need to make unique IDs - and how to do that in a BAM file?! (I'll also be given BAM files from the BioScope pipeline so no way for me to change anything before that.)

    Thanks in advance for any useful suggestions.

    Barbara
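
A minimal sketch of how the guess above could be checked, assuming both runs have been dumped to SAM text first (for example with `samtools view -h`). The function names and the example records are hypothetical, not taken from the actual files:

```python
def read_names(sam_lines):
    """Collect read names (QNAME, first column) from SAM text, skipping '@' header lines."""
    names = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def shared_read_names(run_a, run_b):
    """Return the sorted read names that occur in both runs."""
    return sorted(read_names(run_a) & read_names(run_b))


# Hypothetical records: the same read ID appears in both runs.
run1 = [
    "@HD\tVN:1.0\tSO:coordinate",
    "656_1000_1619\t99\t1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "656_1000_1619\t147\t1\t300\t60\t50M\t=\t100\t-250\t*\t*",
]
run2 = [
    "656_1000_1619\t99\t1\t5000\t60\t50M\t=\t5200\t250\t*\t*",
    "700_12_34\t99\t2\t100\t60\t50M\t=\t300\t250\t*\t*",
]
collisions = shared_read_names(run1, run2)
print(collisions)
```

A non-empty result means the merged file contains reads from different runs that cannot be told apart by name alone.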

  • #2
    You want to add the read group information before merging. Bfast has that option in postprocessing. You can also try with picard.
    -drd


    • #3
      Originally posted by drio
      You want to add the read group information before merging. Bfast has that option in postprocessing. You can also try with picard.
      Thanks and a happy new year!
      But how do I add the read group to a BAM file? The methods mentioned here [link: "Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc"] don't work for me, and I couldn't find out which Picard tool to use.


      • #4
        Explore bfast's -r postprocessing option:

        Code:
        -r        RGFileName    Specifies to add the RG in the specified file to the SAM
                                  header and updates the RG tag (and LB/PU tags if present) in
                                  the reads (SAM only)
        -drd


        • #5
          Originally posted by drio
          Explore bfast's -r postprocessing option
          That's very nice, but (1) I don't want to run postprocess again on all my files and (2) I also have BAM files from BioScope to deal with.
          For some test data samtools merge worked, although not in an optimal way since it infers the RG tag from the BAM file name. At least this option would kill two birds with one stone: merging and adding RG in one go.


          • #6
            Explore samtools reheader and Picard's ReplaceSamHeader (both command-line tools).
            Check whether they modify the RG field of the reads once you add your new header (with your RG).
            I don't think they will. In that case, you will have to use Picard (or the samtools library
            in any of its flavours) to iterate over all your reads and change the RG field manually.
            -drd
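
The manual route described above could look roughly like the following sketch. It works on SAM text (for example the output of `samtools view -h`) rather than on BAM directly, and it assumes the input has no @RG header line yet. The function name and the `sample1` default are hypothetical placeholders:

```python
def add_read_group(sam_lines, rg_id, sample="sample1"):
    """Yield SAM lines with one @RG header line added and RG:Z:<rg_id> set on every read."""
    rg_header = f"@RG\tID:{rg_id}\tSM:{sample}"
    header_written = False
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            yield line  # pass existing header lines through unchanged
            continue
        if not header_written:
            yield rg_header  # emit @RG after the last existing header line
            header_written = True
        fields = line.split("\t")
        # Columns 1-11 are mandatory; anything after them is an optional tag.
        tags = [t for t in fields[11:] if not t.startswith("RG:Z:")]
        yield "\t".join(fields[:11] + tags + [f"RG:Z:{rg_id}"])
    if not header_written:
        yield rg_header  # header-only input


example = [
    "@SQ\tSN:1\tLN:249250621",
    "656_1000_1619\t0\t1\t100\t60\t50M\t*\t0\t0\t*\t*\tRG:Z:old",
]
tagged = list(add_read_group(example, "run1"))
print("\n".join(tagged))
```

The result would then be converted back to BAM before merging, so that each run carries its own RG value.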


            • #7
              Thank you again. I'll use samtools merge and check whether Picard works with the RG tag. (If not, I'll go through the BAM file and add unique read names and RG.) For future BFAST alignments I'll surely use the postprocess -r option.
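
For the fallback mentioned in parentheses, one possible way to make read names unique is to prefix each name with a run label. This is only a sketch on SAM text; the function name and the `run1`/`run2` labels are hypothetical:

```python
def prefix_read_names(sam_lines, run_label):
    """Yield SAM lines with '<run_label>_' prepended to every read name (QNAME)."""
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            yield line  # header lines carry no read names
        else:
            qname, rest = line.split("\t", 1)
            yield f"{run_label}_{qname}\t{rest}"


run1 = ["656_1000_1619\t99\t1\t100\t60\t50M\t=\t300\t250\t*\t*"]
run2 = ["656_1000_1619\t99\t1\t5000\t60\t50M\t=\t5200\t250\t*\t*"]
renamed = list(prefix_read_names(run1, "run1")) + list(prefix_read_names(run2, "run2"))
print([line.split("\t", 1)[0] for line in renamed])
```

Both mates of a pair receive the same prefix, so pairing by read name is preserved within each run.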


              • #8
                I had the same issue. With BioScope-produced BAM files, merging with Picard works, while merging with samtools does not.


                • #9
                  BioScope-produced BAM files have RG information in the header, and Picard keeps it there. I don't know what samtools does; it might ignore it unless you use the -r and -h options (which will change the RG tag). At least samtools merge with -r and -h solved my problem with the BFAST files.


                  • #10
                    Hi, sorry to revive this old thread. I am having the same problem with my BWA-aligned Illumina reads when trying to remove duplicates, but only with some specific files. I don't think it is a problem with the headers, because the files that work have the same headers as the files that do not:

                    @SQ SN:10 LN:135534747
                    @SQ SN:11 LN:135006516
                    @SQ SN:12 LN:133851895
                    @SQ SN:13 LN:115169878
                    @SQ SN:14 LN:107349540
                    @SQ SN:15 LN:102531392
                    @SQ SN:16 LN:90354753
                    @SQ SN:17 LN:81195210
                    @SQ SN:18 LN:78077248
                    @SQ SN:19 LN:59128983
                    @SQ SN:1 LN:249250621
                    @SQ SN:20 LN:63025520
                    @SQ SN:21 LN:48129895
                    @SQ SN:22 LN:51304566
                    @SQ SN:2 LN:243199373
                    @SQ SN:3 LN:198022430
                    @SQ SN:4 LN:191154276
                    @SQ SN:5 LN:180915260
                    @SQ SN:6 LN:171115067
                    @SQ SN:7 LN:159138663
                    @SQ SN:8 LN:146364022
                    @SQ SN:9 LN:141213431
                    @SQ SN:M LN:16571
                    @SQ SN:X LN:155270560
                    @SQ SN:Y LN:59373566
                    @PG ID:bwa PN:bwa VN:0.6.1-r104


                    Any suggestions?

                    Also, I am just wondering: if I want to merge BAM files from different lanes and/or runs, do they have to have different RG headers? Could I merge two BAMs with the headers above?

                    Thanks for any help

                    D.
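
A quick way to see what a header like the one above offers for merging is to list its read group IDs. The sketch below is hypothetical helper code, not part of any tool. It splits on any whitespace because the header was pasted here with spaces, whereas real SAM headers are tab-separated:

```python
def read_group_ids(header_lines):
    """Return the ID values of all @RG lines found in a SAM header."""
    ids = []
    for line in header_lines:
        fields = line.split()
        if fields and fields[0] == "@RG":
            ids.extend(f[3:] for f in fields[1:] if f.startswith("ID:"))
    return ids


# Excerpt of the header posted above.
header = """@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@PG ID:bwa PN:bwa VN:0.6.1-r104""".splitlines()
print(read_group_ids(header))
```

An empty list means the header defines no read group at all.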


                    • #11
                      You need an @RG line in the header of every single BAM file in order to merge them with Picard. You can supply it to bwa sampe as an option. If you forgot to do that, there is now a Picard tool for it, AddOrReplaceReadGroups ("add or replace read groups").
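
As a rough illustration, the Picard call could be assembled like this. The helper function, the file names and all read group values are hypothetical. The sketch also assumes a single `picard.jar`, whereas older Picard releases shipped one jar per tool, so check the usage message of your installed version first:

```python
def add_or_replace_read_groups_cmd(in_bam, out_bam, rg_id, library, platform, unit, sample,
                                   picard_jar="picard.jar"):
    """Build (but do not run) an argument list for Picard's AddOrReplaceReadGroups."""
    return [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"INPUT={in_bam}",
        f"OUTPUT={out_bam}",
        f"RGID={rg_id}",      # read group ID, should differ between lanes/runs
        f"RGLB={library}",    # library
        f"RGPL={platform}",   # platform
        f"RGPU={unit}",       # platform unit, e.g. lane
        f"RGSM={sample}",     # sample name
    ]


cmd = add_or_replace_read_groups_cmd(
    "lane1.bam", "lane1.rg.bam", "lane1", "lib1", "illumina", "unit1", "sample1"
)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths and values have been adapted.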


                      • #12
                        Thanks epigen,

                        Can @RG lines be included during merging using the -r and -h options, as in the example in

                        http://sourceforge.net/apps/mediawik...rged_alignment


                        I guess that would save me from rerunning the sampe/samse commands.


                        • #13
                          That is more or less what I did before using the new Picard tool. The -r option automatically creates the RG tag for each read; the name is inferred from the BAM file name, so that in the example

                          samtools merge -rh rg.txt [merged.bam] ga.bam 454.bam

                          these will be ga and 454, respectively. If your BAM files do not have such nice names, you can trick samtools merge by creating symbolic links like this:

                          ln -s your_long_named_file.bam desired_rg_name.bam
                          (same for the BAM index file)

                          and then use the desired_rg_name.bam files as input for samtools merge.
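
The symlink trick can be scripted. The sketch below only models the naming described above (file name minus the `.bam` suffix) and creates the links; it does not call samtools. The function names and the `.bam.bai` index suffix are assumptions:

```python
import os
import tempfile


def rg_name_from_path(bam_path):
    """Model the naming described above: file name without directory and '.bam' suffix."""
    name = os.path.basename(bam_path)
    return name[:-4] if name.endswith(".bam") else name


def link_with_rg_name(bam_path, rg_name, link_dir):
    """Create <link_dir>/<rg_name>.bam as a symlink to bam_path (and link the index if present)."""
    link_path = os.path.join(link_dir, rg_name + ".bam")
    os.symlink(os.path.abspath(bam_path), link_path)
    index_path = bam_path + ".bai"  # assumed index naming
    if os.path.exists(index_path):
        os.symlink(os.path.abspath(index_path), link_path + ".bai")
    return link_path


# Demonstration with an empty placeholder file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "your_long_named_file.bam")
    open(original, "wb").close()
    link = link_with_rg_name(original, "desired_rg_name", tmp)
    linked_name = rg_name_from_path(link)
    link_is_symlink = os.path.islink(link)
print(linked_name, link_is_symlink)
```

The link paths would then be given to samtools merge in place of the original file names.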
