Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • RG tags in merged BAM files, muliple lanes one sample

    I know there are a few threads on this topic but I can't seem to find the answer to my particular problem!

    I am merging BAM files from 8 lanes of sequencing, they are all the same sample, I want to end up with one file with all my sample data in it to use GATK downstream, which requires the RG tags.

    So how to I generate the RG tags at samtools merge when all my files are the same sample? The example in the documentation :

    perl -e 'print "@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina\n@RG\tID:454\tSM:hs\tLB:454\tPL:454\n"' > rg.txt
    samtools merge -rh rg.txt - ga.bam 454.bam | samtools rmdup - - | samtools rmdup -s - aln.bam

    tags alignments from each input file separately,

    where I have lane1_sorted.bam, lane2_sorted.bam... etc

    I'm sure its something quite simple, any tips would be appreciated!
    Thanks.

  • #2
    Just found the answer to my own question... it was right there under my nose in the documentation of course! I will post here in case anyone else encounters the same thing

    A concrete example may be instructive. Support I have a trio of samples: MOM, DAD, and KID. Each has two DNA libraries prepared, one with 400 bp inserts and another with 200 bp inserts. Each of these libraries is run on two lanes of an illumina hiseq, requiring 3 x 2 x 2 = 12 lanes of data. When the data come off the sequencer, I would create 12 bam files, with the following @RG fields in the header:

    Dad's data:
    @RG ID:FLOWCELL1.LANE1 PL:illumina LB:LIB-DAD-1 SMAD PI:200
    @RG ID:FLOWCELL1.LANE2 PL:illumina LB:LIB-DAD-1 SMAD PI:200
    @RG ID:FLOWCELL1.LANE3 PL:illumina LB:LIB-DAD-2 SMAD PI:400
    @RG ID:FLOWCELL1.LANE4 PL:illumina LB:LIB-DAD-2 SMAD PI:400

    Mom's data:
    @RG ID:FLOWCELL1.LANE5 PL:illumina LB:LIB-MOM-1 SM:MOM PI:200
    @RG ID:FLOWCELL1.LANE6 PL:illumina LB:LIB-MOM-1 SM:MOM PI:200
    @RG ID:FLOWCELL1.LANE7 PL:illumina LB:LIB-MOM-2 SM:MOM PI:400
    @RG ID:FLOWCELL1.LANE8 PL:illumina LB:LIB-MOM-2 SM:MOM PI:400

    Kid's data:
    @RG ID:FLOWCELL2.LANE1 PL:illumina LB:LIB-KID-1 SM:KID PI:200
    @RG ID:FLOWCELL2.LANE2 PL:illumina LB:LIB-KID-1 SM:KID PI:200
    @RG ID:FLOWCELL2.LANE3 PL:illumina LB:LIB-KID-2 SM:KID PI:400
    @RG ID:FLOWCELL2.LANE4 PL:illumina LB:LIB-KID-2 SM:KID PI:400

    Note the hierarchical relationship between read groups (unique for each lane) to libraries (sequenced on two lanes) and samples (across four lanes, two lanes for each library)

    Comment


    • #3
      Ok I thought I had fixed the problem but its still not working!

      When running a GATK tool I still get a read group error

      ##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/586/586_merged.bam} is malformed: Read HWI-ST240R_178:5:1206:13820:25121 lacks read group information; Please associate all reads with read groups

      I thought I had attached RG info to all reads with the following commands:

      perl -e 'print "@RG\tID:586_1_sorted\tSM:586\tLB:586_1\tPL:Illumina\n@RG\tID:586_2_sorted\tSM:586\tLB:586_2\tPL:Illumina\n@RG\tID:586_3_sorted\tSM:586\tLB:586_3\tPL:Illumina\n@RG\tID:586_4_sorted\tSM:586\tLB:586_4\tPL:Illumina\n@RG\tID:586_5_sorted\tSM:586\tLB:586_5\tPL:Illumina\n@RG\tID:586_6_sorted\tSM:586\tLB:586_6\tPL:Illumina\n@RG\tID:586_7_sorted\tSM:586\tLB:586_7\tPL:Illumina\n@RG\tID:586_8_sorted\tSM:586\tLB:586_8\tPL:Illumina\n"' >rg.txt


      samtools merge -rh rg.txt 586_merged.bam 586_1_sorted.bam 586_2_sorted.bam 586_3_sorted.bam 586_4_sorted.bam 586_5_sorted.bam 586_6_sorted.bam 586_7_sorted.bam 586_8_sorted.bam

      So why I am getting this error and what can I do to get GATK tools to run on my file!

      Comment


      • #4
        Use picard

        This works great for GATK downstream analysis

        Comment


        • #5
          Originally posted by cow_girl View Post
          Ok I thought I had fixed the problem but its still not working!

          When running a GATK tool I still get a read group error

          ##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/586/586_merged.bam} is malformed: Read HWI-ST240R_178:5:1206:13820:25121 lacks read group information; Please associate all reads with read groups

          I thought I had attached RG info to all reads with the following commands:

          perl -e 'print "@RG\tID:586_1_sorted\tSM:586\tLB:586_1\tPL:Illumina\n@RG\tID:586_2_sorted\tSM:586\tLB:586_2\tPL:Illumina\n@RG\tID:586_3_sorted\tSM:586\tLB:586_3\tPL:Illumina\n@RG\tID:586_4_sorted\tSM:586\tLB:586_4\tPL:Illumina\n@RG\tID:586_5_sorted\tSM:586\tLB:586_5\tPL:Illumina\n@RG\tID:586_6_sorted\tSM:586\tLB:586_6\tPL:Illumina\n@RG\tID:586_7_sorted\tSM:586\tLB:586_7\tPL:Illumina\n@RG\tID:586_8_sorted\tSM:586\tLB:586_8\tPL:Illumina\n"' >rg.txt


          samtools merge -rh rg.txt 586_merged.bam 586_1_sorted.bam 586_2_sorted.bam 586_3_sorted.bam 586_4_sorted.bam 586_5_sorted.bam 586_6_sorted.bam 586_7_sorted.bam 586_8_sorted.bam

          So why I am getting this error and what can I do to get GATK tools to run on my file!

          Hello @cowgirl
          please how did you solve this problem, because i have the same issue now
          please can you share it. i remain grateful

          here is the error i got

          hello every one please i got this error while running this command

          elendin@elendin-HP-Pavilion-dv6700-Notebook-PC:~/analysis of rnaseq bamfiles$ java -jar GenomeAnalysisTK.jar -R VitisVinifera.fasta -T SomaticIndelDetector -o indels.vcf -verbose indels.txt -I:normal wt.bam -I:tumor newmut.bam

          MESSAGE: SAM/BAM file SAMFileReader{/home/elendin/analysis of rnaseq bamfiles/newmut.bam} is malformed: Read HWI-ST132_0461:3:2201:1211:140854#GTCCTA is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://www.broadinstitute.org/gsa/wi...laceReadGroups to fix this problem

          ERROR ------------------------------------------------------------------------------------------
          how can i add the missing read group for HWI-ST132_0461:3:2201:1211:140854#GTCCTA or define it in the header. thanks

          thanks

          Comment


          • #6
            hello every one please i got this error while running this command

            elendin@elendin-HP-Pavilion-dv6700-Notebook-PC:~/analysis of rnaseq bamfiles$ java -jar GenomeAnalysisTK.jar -R VitisVinifera.fasta -T SomaticIndelDetector -o indels.vcf -verbose indels.txt -I:normal wt.bam -I:tumor newmut.bam

            MESSAGE: SAM/BAM file SAMFileReader{/home/elendin/analysis of rnaseq bamfiles/newmut.bam} is malformed: Read HWI-ST132_0461:3:2201:1211:140854#GTCCTA is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://www.broadinstitute.org/gsa/wi...laceReadGroups to fix this problem

            ERROR ------------------------------------------------------------------------------------------
            how can i add the missing read group for HWI-ST132_0461:3:2201:1211:140854#GTCCTA or define it in the header. thanks

            thanks

            Comment


            • #7
              Originally posted by aforntacc View Post
              how can i add the missing read group for HWI-ST132_0461:3:2201:1211:140854#GTCCTA or define it in the header. thanks
              That error message has a link, which tells you to just use the AddOrReplaceReadGroups command in Picard. Have you done that?

              Comment


              • #8
                Originally posted by dpryan View Post
                That error message has a link, which tells you to just use the AddOrReplaceReadGroups command in Picard. Have you done that?
                this is what i am doing now and i am applying stringency to "Lenient hope it works. my main aim is to merge my 3 libraries(3bam files into 1bam) together after adding the read group individually is this possible? if yes please how can i do

                Comment


                • #9
                  Originally posted by aforntacc View Post
                  this is what i am doing now and i am applying stringency to "Lenient hope it works. my main aim is to merge my 3 libraries(3bam files into 1bam) together after adding the read group individually is this possible? if yes please how can i do
                  Sure, that's easily possible. Replace the headers with picard and then merge them with samtools or picard (MergeSamFiles, which is a somewhat poorly named command since it also says it can handle BAM files).

                  Comment


                  • #10
                    Hi all i was able to add the readgroups fine. but now i am running the somaticindeldetector and i am having this warning below
                    please what does the warning mean?
                    here is the command line i used. elendin@elendin-HP-Pavilion-dv6700-Notebook-PC:~/analysis of rnaseq bamfiles$ java -jar GenomeAnalysisTK.jar -R V.fasta -T SomaticIndelDetector -o indelswt1vmut4.vcf -verbose indels.txt -I:normal wty1.bam -I:tumor mut4.bam

                    WARNING: a count of 10000 reads reached in a window chr1:20955379-20955578 in tumor sample. The whole window will be dropped.
                    WARNING: a count of 10000 reads reached in a window chr1:20955680-20955879 in tumor sample. The whole window will be dropped.
                    WARNING: a count of 10000 reads reached in a window chr1:21076423-21076622 in tumor sample. The whole window will be dropped.
                    WARNING: a count of 10000 reads reached in a window chr1:21077327-21077526 in tumor sample. The whole window will be dropped.
                    WARNING: a count of 10000 reads reached in a window chr1:21077618-21077817 in tumor sample. The whole window will be dropped.
                    WARNING: a count of 10000 reads reached in a window chr1:21077909-21078108 in tumor sample. The whole window will be dropped.
                    WARNING: a count of 10000 reads reached in a window chr1:21078412-21078611 in tumor sample. The whole window will be dropped.
                    INFO 12:35:18,002 TraversalEngine - chr1:21174588 9.45e+06 4.5 m 28.6 s 4.4% 103.4 m 98.9 m


                    thanks a lot

                    Comment


                    • #11
                      Originally posted by aforntacc View Post
                      Hi all i was able to add the readgroups fine. but now i am running the somaticindeldetector and i am having this warning below
                      please what does the warning mean?
                      Have a look at your alignments in that region in IGV or some other genome browser. I expect you'll find the depth goes crazy high. Think about the things that could cause that and how they might affect your Indel calls.

                      Comment


                      • #12
                        Originally posted by dpryan View Post
                        Have a look at your alignments in that region in IGV or some other genome browser. I expect you'll find the depth goes crazy high. Think about the things that could cause that and how they might affect your Indel calls.

                        Ok, yes i had a look at the alignment but i am confused because i do not really understand to be honest what the effect of this warning will be on the output.
                        please if you can shed more light or explain more because i think i am a novice on this one.

                        thanks a lot and kindly help

                        Comment


                        • #13
                          Hi there
                          I got the same error, can someone explain me please.
                          Thank you

                          Comment


                          • #14
                            Originally posted by jp. View Post
                            Hi there
                            I got the same error, can someone explain me please.
                            Thank you
                            Which one? There are multiple error messages in this thread, most of which with solutions.

                            Comment


                            • #15
                              Thank you for your kind and fast reply. I got errror:
                              WARNING: a count of 10000 reads reached in a window chr1:20955379-20955578 in tumor sample. The whole window will be dropped.
                              WARNING: a count of 10000 reads reached in a window chr1:20955680-20955879 in tumor sample. The whole window will be dropped.
                              WARNING: a count of 10000 reads reached in a window chr1:21076423-21076622 in tumor sample. The whole window will be dropped.
                              WARNING: a count of 10000 reads reached in a window chr1:21077327-21077526 in tumor sample. The whole window will be dropped.
                              WARNING: a count of 10000 reads reached in a window chr1:21077618-21077817 in tumor sample. The whole window will be dropped.
                              WARNING: a count of 10000 reads reached in a window chr1:21077909-21078108 in tumor sample. The whole window will be dropped.
                              WARNING: a count of 10000 reads reached in a window chr1:21078412-21078611 in tumor sample. The whole window will be dropped


                              Originally posted by dpryan View Post
                              Which one? There are multiple error messages in this thread, most of which with solutions.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              10 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              9 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              67 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X