  • ocs
    Member
    • May 2011
    • 27

    What exactly is AddOrReplaceReadGroups (picard tools) doing?

    Hello folks,

    I have been struggling with this for a few days now, but I still don't get it:

    What exactly is AddOrReplaceReadGroups doing?


    Also, I can't find a good definition of what a read group is. From some descriptions at http://www.broadinstitute.org/gsa/wi...sked_Questions I infer that it records which reads came from which lane and chip, so that all reads in a read group share their own error model.
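To make the definition concrete: in a SAM/BAM file a read group is declared once in the header as an `@RG` line (with fields such as `ID`, `SM` for sample, `LB` for library, `PL` for platform and `PU` for platform unit), and each read points to one of those declarations through an `RG:Z:<ID>` tag. The minimal shell/awk sketch below uses a made-up two-lane SAM stream; the functions `toy_sam` and `read_samples`, and all IDs and sample names, are hypothetical illustrations, not part of Picard or GATK.

```shell
#!/bin/sh
# Hypothetical toy SAM stream: two read groups (lanes) declared in the header,
# and one aligned read in each, linked to its group by an RG:Z:<ID> tag.
toy_sam() {
  printf '@HD\tVN:1.6\tSO:coordinate\n'
  printf '@SQ\tSN:chr01\tLN:1000\n'
  printf '@RG\tID:lane1\tSM:indiv1\tLB:lib1\tPL:ILLUMINA\tPU:run.1\n'
  printf '@RG\tID:lane2\tSM:indiv2\tLB:lib2\tPL:ILLUMINA\tPU:run.2\n'
  printf 'r1\t0\tchr01\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:lane1\n'
  printf 'r2\t0\tchr01\t104\t60\t4M\t*\t0\t0\tTTGA\tIIII\tRG:Z:lane2\n'
}

# Print "<read name><TAB><sample>" for every alignment by looking up the
# read's RG:Z:<ID> tag in the @RG header lines (ID -> SM).
read_samples() {
  awk -F '\t' '
    /^@RG/ {
      id = ""; sm = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/) id = substr($i, 4)
        if ($i ~ /^SM:/) sm = substr($i, 4)
      }
      sample[id] = sm
      next
    }
    /^@/ { next }   # other header lines carry no read-group information
    {
      rg = ""
      # optional tags start at column 12 in SAM
      for (i = 12; i <= NF; i++)
        if ($i ~ /^RG:Z:/) rg = substr($i, 6)
      s = (rg in sample) ? sample[rg] : "NA"
      print $1 "\t" s
    }'
}

toy_sam | read_samples
```

Each read name is printed next to the sample of its read group; this ID-to-sample link is what downstream tools use to tell which reads belong together.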

    So what exactly is this tool doing? I'm using it in our pipeline because without it one of the later steps fails. According to the description, it replaces all read groups. But why? What might be wrong with the old groups, and how do the new groups differ from them?

    My call:
    Code:
    java -Djava.io.tmpdir="$TMPDIR" -jar /opt/biosw/picard-tools-1.45/AddOrReplaceReadGroups.jar \
        RGLB="fastq/$basename.fastq" RGPL=solexa RGPU=run RGSM=9111 I="output/$basename.bam" \
        O="output/$basename.sorted.bam" SORT_ORDER=coordinate CREATE_INDEX=TRUE VALIDATION_STRINGENCY=LENIENT
    Thanks in advance,
    Oliver
    Last edited by ocs; 06-07-2011, 10:54 PM. Reason: link added
  • francois.sabot
    Member
    • Dec 2009
    • 41

    #2
    A read group assigns an origin to a set of reads, so that during SNP/indel calling the genotypes can be attributed to that origin. Without this step you will have a set of SNPs, but you cannot assign them to a specific genotype. The AddOrReplaceReadGroups step is required by the GATK pipeline because GATK assumes you will call genotypes, not only SNPs. If you only need a raw set of SNPs, you can use the pileup format and the VarScan pileup2snp utility.
    Francois Sabot, PhD

    Be realistic. Demand the Impossible.
    www.wikiposon.org


    • ocs
      Member
      • May 2011
      • 27

      #3
      Hello Francois,

      Thank you for your quick answer. I'm getting a glimpse of the idea, but your answer is not fully clear to me. By "origin", do you mean where the reads physically came from (e.g. chip, lane)? I know what SNP calling is (locating SNPs in comparison to the reference genome), but what is genotype calling? I imagine it is the sum of all SNPs, but I'm not sure. Even with this knowledge I can't see what this step is useful for. My thought is that the read groups are determined by the sequencing technology itself, since it knows which reads were sequenced on which lanes and chips. So I think of these groups as a constant that should not be changed, and that is actually my problem.

      Thanks for any hints on this!


      • francois.sabot
        Member
        • Dec 2009
        • 41

        #4
        In my case the origin can be either a lane or the name of the individual/organism. For example, you can have 10 individuals barcoded in a single lane; each is mapped separately and then assigned to a group (e.g. Indiv1, Indiv2, ...). All reads from a single individual are then tagged with the same RG tag at the end of the SAM line. When you merge those 10 SAM files, each read still carries the tag of its origin.
        Then you ask, for example, the GATK genotyper to "call the genotype". This means that SNPs will be identified based on depth, quality, etc., and because each read can be assigned to a specific individual, the resulting VCF file can contain information such as "Indiv1 has an A instead of a G at position chr01:234554".

        This is genotype calling, i.e. assigning specific SNPs to a specific individual.
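What AddOrReplaceReadGroups does to a file can be sketched in a few lines of awk. As far as I understand Picard's documentation, the tool assigns all reads in the input to a single new read group: existing @RG header lines are replaced by the one built from the RG* arguments, and every read gets an RG tag pointing to it. The function `add_rg` (and its helper `print_rg`) below is a hypothetical, simplified illustration on uncompressed SAM text only (no BAM I/O, sorting or indexing, and only the ID and SM fields); it is not Picard's implementation. Running it once per individual before merging gives the per-individual tagging described above.

```shell
#!/bin/sh
# Hypothetical, simplified sketch of the idea behind AddOrReplaceReadGroups,
# working on uncompressed SAM text only (no BAM, sorting or indexing):
#   1. drop every existing @RG header line,
#   2. write one new @RG line with the given ID and sample (SM),
#   3. remove any old RG:Z: tag from each read and append RG:Z:<ID>.
# Usage: add_rg <read_group_id> <sample> < in.sam > out.sam
add_rg() {
  awk -F '\t' -v OFS='\t' -v id="$1" -v sm="$2" '
    function print_rg() { print "@RG", "ID:" id, "SM:" sm; done = 1 }
    /^@RG/ { next }                  # old read groups are discarded
    /^@/   { print; next }           # keep all other header lines
    !done  { print_rg() }            # new @RG goes right after the header
    {
      line = $1
      # columns 1-11 are mandatory; drop only an old RG:Z: optional tag
      for (i = 2; i <= NF; i++)
        if (i < 12 || $i !~ /^RG:Z:/) line = line OFS $i
      print line, "RG:Z:" id
    }
    END { if (!done) print_rg() }    # header-only input still gets the @RG line
  '
}

# Example: tag two individuals separately before merging them.
# add_rg indiv1_lane1 Indiv1 < indiv1.sam > indiv1.rg.sam
# add_rg indiv2_lane1 Indiv2 < indiv2.sam > indiv2.rg.sam
```

Replacing rather than appending keeps the sketch close to the "replace" half of the tool's name: after it runs, the file contains exactly one read group and no read refers to an old one.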

        • ocs
          Member
          • May 2011
          • 27

          #5
          Hello Francois,

          Thank you again for your answer. I now understand what a read group and genotype calling are. But the last part of my previous post is still unclear: I use the FASTQ files to align to the reference genome, and then in the AddOrReplaceReadGroups step I pass the same files as the read-group library. Isn't that redundant? Shouldn't the tool already have the read-to-read-group assignment? This is what still confuses me.

          Thanks,
          Oliver


          • francois.sabot
            Member
            • Dec 2009
            • 41

            #6
            Yes, it looks redundant at first, but if you did not specify the RG tag during mapping (which BWA, for example, allows), this information is not in the SAM file. You therefore need to add it, because a standard SAM header contains no reference to the origin of the reads.

            If you had specified it, then there is no need to perform this step.
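Whether the step is needed can be checked on the alignment file itself: if the header already has @RG lines and every read carries an RG:Z: tag (for example because the read group was passed to the aligner, as recent BWA versions allow with `bwa mem -R`), adding read groups again is unnecessary. A minimal sketch, assuming uncompressed SAM on standard input; `needs_read_groups` is a hypothetical helper name, and for a BAM file you would stream it through `samtools view -h` first (assuming samtools is installed).

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when a SAM stream still needs
# read groups, i.e. the header has no @RG line or some read lacks an RG:Z: tag;
# fails (exit status 1) when read groups are already fully present.
# For BAM input, stream the text first, e.g. samtools view -h in.bam | needs_read_groups
needs_read_groups() {
  awk -F '\t' '
    /^@RG/ { has_header = 1; next }
    /^@/   { next }
    {
      tagged = 0
      for (i = 12; i <= NF; i++)
        if ($i ~ /^RG:Z:/) tagged = 1
      if (!tagged) missing = 1
    }
    END { if (has_header && !missing) exit 1; exit 0 }
  '
}

# Example: only run the read-group step when it is actually needed.
# if needs_read_groups < output/sample.sam; then echo "add read groups"; fi
```

Using the exit status rather than printed text lets the check drop straight into an `if` in a pipeline script.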

            • DZhang
              Senior Member
              • Jun 2010
              • 177

              #7
              Hi Ocs,

              If read groups are not critical to your pipeline, you can use "VALIDATION_STRINGENCY=SILENT" to suppress the warning. I used this option a few months ago but am not sure whether it still works; give it a try and report back. As I see it, Picard is under very active and rapid development.

              Douglas
