Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • cow_girl
    Junior Member
    • Dec 2010
    • 6

    RG tags in merged BAM files, muliple lanes one sample

    I know there are a few threads on this topic but I can't seem to find the answer to my particular problem!

    I am merging BAM files from 8 lanes of sequencing, they are all the same sample, I want to end up with one file with all my sample data in it to use GATK downstream, which requires the RG tags.

    So how to I generate the RG tags at samtools merge when all my files are the same sample? The example in the documentation :

    perl -e 'print "@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina\n@RG\tID:454\tSM:hs\tLB:454\tPL:454\n"' > rg.txt
    samtools merge -rh rg.txt - ga.bam 454.bam | samtools rmdup - - | samtools rmdup -s - aln.bam

    tags alignments from each input file separately,

    where I have lane1_sorted.bam, lane2_sorted.bam... etc

    I'm sure its something quite simple, any tips would be appreciated!
    Thanks.
  • cow_girl
    Junior Member
    • Dec 2010
    • 6

    #2
    Just found the answer to my own question... it was right there under my nose in the documentation of course! I will post here in case anyone else encounters the same thing

    A concrete example may be instructive. Support I have a trio of samples: MOM, DAD, and KID. Each has two DNA libraries prepared, one with 400 bp inserts and another with 200 bp inserts. Each of these libraries is run on two lanes of an illumina hiseq, requiring 3 x 2 x 2 = 12 lanes of data. When the data come off the sequencer, I would create 12 bam files, with the following @RG fields in the header:

    Dad's data:
    @RG ID:FLOWCELL1.LANE1 PL:illumina LB:LIB-DAD-1 SMAD PI:200
    @RG ID:FLOWCELL1.LANE2 PL:illumina LB:LIB-DAD-1 SMAD PI:200
    @RG ID:FLOWCELL1.LANE3 PL:illumina LB:LIB-DAD-2 SMAD PI:400
    @RG ID:FLOWCELL1.LANE4 PL:illumina LB:LIB-DAD-2 SMAD PI:400

    Mom's data:
    @RG ID:FLOWCELL1.LANE5 PL:illumina LB:LIB-MOM-1 SM:MOM PI:200
    @RG ID:FLOWCELL1.LANE6 PL:illumina LB:LIB-MOM-1 SM:MOM PI:200
    @RG ID:FLOWCELL1.LANE7 PL:illumina LB:LIB-MOM-2 SM:MOM PI:400
    @RG ID:FLOWCELL1.LANE8 PL:illumina LB:LIB-MOM-2 SM:MOM PI:400

    Kid's data:
    @RG ID:FLOWCELL2.LANE1 PL:illumina LB:LIB-KID-1 SM:KID PI:200
    @RG ID:FLOWCELL2.LANE2 PL:illumina LB:LIB-KID-1 SM:KID PI:200
    @RG ID:FLOWCELL2.LANE3 PL:illumina LB:LIB-KID-2 SM:KID PI:400
    @RG ID:FLOWCELL2.LANE4 PL:illumina LB:LIB-KID-2 SM:KID PI:400

    Note the hierarchical relationship between read groups (unique for each lane) to libraries (sequenced on two lanes) and samples (across four lanes, two lanes for each library)

    Comment

    • cow_girl
      Junior Member
      • Dec 2010
      • 6

      #3
      Ok I thought I had fixed the problem but its still not working!

      When running a GATK tool I still get a read group error

      ##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/586/586_merged.bam} is malformed: Read HWI-ST240R_178:5:1206:13820:25121 lacks read group information; Please associate all reads with read groups

      I thought I had attached RG info to all reads with the following commands:

      perl -e 'print "@RG\tID:586_1_sorted\tSM:586\tLB:586_1\tPL:Illumina\n@RG\tID:586_2_sorted\tSM:586\tLB:586_2\tPL:Illumina\n@RG\tID:586_3_sorted\tSM:586\tLB:586_3\tPL:Illumina\n@RG\tID:586_4_sorted\tSM:586\tLB:586_4\tPL:Illumina\n@RG\tID:586_5_sorted\tSM:586\tLB:586_5\tPL:Illumina\n@RG\tID:586_6_sorted\tSM:586\tLB:586_6\tPL:Illumina\n@RG\tID:586_7_sorted\tSM:586\tLB:586_7\tPL:Illumina\n@RG\tID:586_8_sorted\tSM:586\tLB:586_8\tPL:Illumina\n"' >rg.txt


      samtools merge -rh rg.txt 586_merged.bam 586_1_sorted.bam 586_2_sorted.bam 586_3_sorted.bam 586_4_sorted.bam 586_5_sorted.bam 586_6_sorted.bam 586_7_sorted.bam 586_8_sorted.bam

      So why I am getting this error and what can I do to get GATK tools to run on my file!

      Comment

      • aaronh
        Member
        • Sep 2008
        • 46

        #4
        Use picard

        This works great for GATK downstream analysis

        Comment

        • aforntacc
          Member
          • Jun 2011
          • 48

          #5
          Originally posted by cow_girl View Post
          Ok I thought I had fixed the problem but its still not working!

          When running a GATK tool I still get a read group error

          ##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/586/586_merged.bam} is malformed: Read HWI-ST240R_178:5:1206:13820:25121 lacks read group information; Please associate all reads with read groups

          I thought I had attached RG info to all reads with the following commands:

          perl -e 'print "@RG\tID:586_1_sorted\tSM:586\tLB:586_1\tPL:Illumina\n@RG\tID:586_2_sorted\tSM:586\tLB:586_2\tPL:Illumina\n@RG\tID:586_3_sorted\tSM:586\tLB:586_3\tPL:Illumina\n@RG\tID:586_4_sorted\tSM:586\tLB:586_4\tPL:Illumina\n@RG\tID:586_5_sorted\tSM:586\tLB:586_5\tPL:Illumina\n@RG\tID:586_6_sorted\tSM:586\tLB:586_6\tPL:Illumina\n@RG\tID:586_7_sorted\tSM:586\tLB:586_7\tPL:Illumina\n@RG\tID:586_8_sorted\tSM:586\tLB:586_8\tPL:Illumina\n"' >rg.txt


          samtools merge -rh rg.txt 586_merged.bam 586_1_sorted.bam 586_2_sorted.bam 586_3_sorted.bam 586_4_sorted.bam 586_5_sorted.bam 586_6_sorted.bam 586_7_sorted.bam 586_8_sorted.bam

          So why I am getting this error and what can I do to get GATK tools to run on my file!

          Hello @cowgirl
          please how did you solve this problem, because i have the same issue now
          please can you share it. i remain grateful

          here is the error i got

          hello every one please i got this error while running this command

          elendin@elendin-HP-Pavilion-dv6700-Notebook-PC:~/analysis of rnaseq bamfiles$ java -jar GenomeAnalysisTK.jar -R VitisVinifera.fasta -T SomaticIndelDetector -o indels.vcf -verbose indels.txt -I:normal wt.bam -I:tumor newmut.bam

          MESSAGE: SAM/BAM file SAMFileReader{/home/elendin/analysis of rnaseq bamfiles/newmut.bam} is malformed: Read HWI-ST132_0461:3:2201:1211:140854#GTCCTA is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://www.broadinstitute.org/gsa/wi...laceReadGroups to fix this problem

          ERROR ------------------------------------------------------------------------------------------
          how can i add the missing read group for HWI-ST132_0461:3:2201:1211:140854#GTCCTA or define it in the header. thanks

          thanks

          Comment

          • aforntacc
            Member
            • Jun 2011
            • 48

            #6
            hello every one please i got this error while running this command

            elendin@elendin-HP-Pavilion-dv6700-Notebook-PC:~/analysis of rnaseq bamfiles$ java -jar GenomeAnalysisTK.jar -R VitisVinifera.fasta -T SomaticIndelDetector -o indels.vcf -verbose indels.txt -I:normal wt.bam -I:tumor newmut.bam

            MESSAGE: SAM/BAM file SAMFileReader{/home/elendin/analysis of rnaseq bamfiles/newmut.bam} is malformed: Read HWI-ST132_0461:3:2201:1211:140854#GTCCTA is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://www.broadinstitute.org/gsa/wi...laceReadGroups to fix this problem

            ERROR ------------------------------------------------------------------------------------------
            how can i add the missing read group for HWI-ST132_0461:3:2201:1211:140854#GTCCTA or define it in the header. thanks

            thanks

            Comment

            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              Originally posted by aforntacc View Post
              how can i add the missing read group for HWI-ST132_0461:3:2201:1211:140854#GTCCTA or define it in the header. thanks
              That error message has a link, which tells you to just use the AddOrReplaceReadGroups command in Picard. Have you done that?

              Comment

              • aforntacc
                Member
                • Jun 2011
                • 48

                #8
                Originally posted by dpryan View Post
                That error message has a link, which tells you to just use the AddOrReplaceReadGroups command in Picard. Have you done that?
                this is what i am doing now and i am applying stringency to "Lenient hope it works. my main aim is to merge my 3 libraries(3bam files into 1bam) together after adding the read group individually is this possible? if yes please how can i do

                Comment

                • dpryan
                  Devon Ryan
                  • Jul 2011
                  • 3478

                  #9
                  Originally posted by aforntacc View Post
                  this is what i am doing now and i am applying stringency to "Lenient hope it works. my main aim is to merge my 3 libraries(3bam files into 1bam) together after adding the read group individually is this possible? if yes please how can i do
                  Sure, that's easily possible. Replace the headers with picard and then merge them with samtools or picard (MergeSamFiles, which is a somewhat poorly named command since it also says it can handle BAM files).

                  Comment

                  • aforntacc
                    Member
                    • Jun 2011
                    • 48

                    #10
                    Hi all i was able to add the readgroups fine. but now i am running the somaticindeldetector and i am having this warning below
                    please what does the warning mean?
                    here is the command line i used. elendin@elendin-HP-Pavilion-dv6700-Notebook-PC:~/analysis of rnaseq bamfiles$ java -jar GenomeAnalysisTK.jar -R V.fasta -T SomaticIndelDetector -o indelswt1vmut4.vcf -verbose indels.txt -I:normal wty1.bam -I:tumor mut4.bam

                    WARNING: a count of 10000 reads reached in a window chr1:20955379-20955578 in tumor sample. The whole window will be dropped.
                    WARNING: a count of 10000 reads reached in a window chr1:20955680-20955879 in tumor sample. The whole window will be dropped.
                    WARNING: a count of 10000 reads reached in a window chr1:21076423-21076622 in tumor sample. The whole window will be dropped.
                    WARNING: a count of 10000 reads reached in a window chr1:21077327-21077526 in tumor sample. The whole window will be dropped.
                    WARNING: a count of 10000 reads reached in a window chr1:21077618-21077817 in tumor sample. The whole window will be dropped.
                    WARNING: a count of 10000 reads reached in a window chr1:21077909-21078108 in tumor sample. The whole window will be dropped.
                    WARNING: a count of 10000 reads reached in a window chr1:21078412-21078611 in tumor sample. The whole window will be dropped.
                    INFO 12:35:18,002 TraversalEngine - chr1:21174588 9.45e+06 4.5 m 28.6 s 4.4% 103.4 m 98.9 m


                    thanks a lot

                    Comment

                    • dpryan
                      Devon Ryan
                      • Jul 2011
                      • 3478

                      #11
                      Originally posted by aforntacc View Post
                      Hi all i was able to add the readgroups fine. but now i am running the somaticindeldetector and i am having this warning below
                      please what does the warning mean?
                      Have a look at your alignments in that region in IGV or some other genome browser. I expect you'll find the depth goes crazy high. Think about the things that could cause that and how they might affect your Indel calls.

                      Comment

                      • aforntacc
                        Member
                        • Jun 2011
                        • 48

                        #12
                        Originally posted by dpryan View Post
                        Have a look at your alignments in that region in IGV or some other genome browser. I expect you'll find the depth goes crazy high. Think about the things that could cause that and how they might affect your Indel calls.

                        Ok, yes i had a look at the alignment but i am confused because i do not really understand to be honest what the effect of this warning will be on the output.
                        please if you can shed more light or explain more because i think i am a novice on this one.

                        thanks a lot and kindly help

                        Comment

                        • jp.
                          Senior Member
                          • Jul 2013
                          • 142

                          #13
                          Hi there
                          I got the same error, can someone explain me please.
                          Thank you

                          Comment

                          • dpryan
                            Devon Ryan
                            • Jul 2011
                            • 3478

                            #14
                            Originally posted by jp. View Post
                            Hi there
                            I got the same error, can someone explain me please.
                            Thank you
                            Which one? There are multiple error messages in this thread, most of which with solutions.

                            Comment

                            • jp.
                              Senior Member
                              • Jul 2013
                              • 142

                              #15
                              Thank you for your kind and fast reply. I got errror:
                              WARNING: a count of 10000 reads reached in a window chr1:20955379-20955578 in tumor sample. The whole window will be dropped.
                              WARNING: a count of 10000 reads reached in a window chr1:20955680-20955879 in tumor sample. The whole window will be dropped.
                              WARNING: a count of 10000 reads reached in a window chr1:21076423-21076622 in tumor sample. The whole window will be dropped.
                              WARNING: a count of 10000 reads reached in a window chr1:21077327-21077526 in tumor sample. The whole window will be dropped.
                              WARNING: a count of 10000 reads reached in a window chr1:21077618-21077817 in tumor sample. The whole window will be dropped.
                              WARNING: a count of 10000 reads reached in a window chr1:21077909-21078108 in tumor sample. The whole window will be dropped.
                              WARNING: a count of 10000 reads reached in a window chr1:21078412-21078611 in tumor sample. The whole window will be dropped


                              Originally posted by dpryan View Post
                              Which one? There are multiple error messages in this thread, most of which with solutions.

                              Comment

                              Latest Articles

                              Collapse

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, 06-09-2026, 11:58 AM
                              0 responses
                              22 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              28 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              39 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 12:03 PM
                              0 responses
                              61 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...