  • How to use AddOrReplaceReadGroups.jar with Tophat2 BAMs?

    My travails started when I tried to produce mapping stats (exons, introns, intergenic regions) for RNA-Seq samples aligned with Tophat2 and Ensembl GTF files. For a number of reasons RNA-SeQC seems to be the most appropriate tool but I am struggling to use it as I could not find a description of all the steps that I need to perform to prepare my data to be used with RNA-SeQC.

    The various internet sources seem to agree that the first thing I need to do is to use AddOrReplaceReadGroups.jar with the BAM file but I could not find documentation regarding the parameters. Google finds several command line examples and they all use different parameters. My samples are single-end and are produced on Illumina MiSeq using TruSeq. This is all I know right now. What parameters shall I use with AddOrReplaceReadGroups.jar?

  • #2
    It would help to know the exact problems you are having. You've found different examples with different parameters -- did you try any of them and find them lacking in some way? Anyway, here is my example:
    Code:
    PicardCommandLine AddOrReplaceReadGroups INPUT=$out.sam OUTPUT=$out.bam RGID=$acc RGLB=$acc RGPU=$acc RGSM=$acc RGPL=illumina
    Where '$acc' is the read group. '$out' is my output prefix.
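For a BAM holding a single sample, the same idea can be sketched as a small dry-run helper. This is only an illustration under assumptions: the function name `rg_command`, the label `sample1`, and the file names are hypothetical (accepted_hits.bam is simply Tophat2's default output name), and the function prints the command instead of running it, so Picard does not need to be installed to try it.

```shell
# Dry-run sketch: prints the AddOrReplaceReadGroups command, runs nothing.
# One label is reused for every read group field, as in the example above:
#   RGID = read group ID      RGLB = library      RGPL = platform
#   RGPU = platform unit      RGSM = sample name
# Assumes file names without spaces, since the command is only echoed.
rg_command() {
    in_bam=$1; out_bam=$2; sample=$3
    echo "PicardCommandLine AddOrReplaceReadGroups" \
        "INPUT=$in_bam" "OUTPUT=$out_bam" \
        "RGID=$sample" "RGLB=$sample" "RGPL=illumina" \
        "RGPU=$sample" "RGSM=$sample"
}

# Hypothetical single-sample call.
rg_command accepted_hits.bam accepted_hits.rg.bam sample1
```

Once the printed line looks right, it can be pasted into the shell or piped to `sh` to actually run it.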

    • #3
      Originally posted by westerman
      Where '$acc' is the read group. '$out' is my output prefix.
      I don't have a clear idea about the read groups - my hunch is that I need to worry about them only if I have multiplexed samples in the same BAM. My BAMs contain single samples - what is the parameter that I need to use?

      My problem is that I am trying to use RNA-SeQC but could not find a clear description of all the pre-processing steps - different sites mention different tools. My (rather crude at the moment) understanding is that, since my BAMs were produced by tophat2, I need to do roughly the following pre-processing:
      1. AddOrReplaceReadGroups.jar
      2. samtools index
      3. samtools faidx
      4. CreateSequenceDictionary.jar
      5. ReorderSam.jar
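The steps listed above could be strung together roughly as follows. This is a hedged dry-run sketch, not a verified RNA-SeQC recipe: the function name `prep_commands`, the file names, and the sample label are hypothetical, and the ordering is an assumption (reference-side files first, BAM index last so that it matches the final reordered BAM). The jar names follow the per-tool Picard jars in the list; the arguments are the long-form ones those jars accept as far as I know. It only prints the commands.

```shell
# Dry-run sketch: prints one command per pre-processing step, runs nothing.
# Assumes file names without spaces and a BAM name ending in .bam.
prep_commands() {
    bam=$1; ref=$2; sample=$3
    base=${bam%.bam}

    # Reference side: FASTA index and sequence dictionary (genome.fa -> genome.dict).
    echo "samtools faidx $ref"
    echo "java -jar CreateSequenceDictionary.jar REFERENCE=$ref OUTPUT=${ref%.*}.dict"

    # BAM side: add read groups, reorder contigs to match the reference, then index.
    echo "java -jar AddOrReplaceReadGroups.jar INPUT=$bam OUTPUT=$base.rg.bam RGID=$sample RGLB=$sample RGPL=illumina RGPU=$sample RGSM=$sample"
    echo "java -jar ReorderSam.jar INPUT=$base.rg.bam OUTPUT=$base.rg.reordered.bam REFERENCE=$ref"
    echo "samtools index $base.rg.reordered.bam"
}

# Hypothetical inputs: a Tophat2 BAM, a reference FASTA, and a sample label.
prep_commands accepted_hits.bam genome.fa sample1
```

Printing first makes it easy to compare the parameters against whichever recipe is being followed before anything is executed.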


      The problem is that the recipes differ not just in the tools used but also in their parameters (I have no problem with samtools but the other tools are still new to me). I haven't used GATK before so I am still trying to find out what exactly I need to do.
      Last edited by feralBiologist; 02-21-2015, 09:25 PM.

      • #4
        [quote]
        I don't have a clear idea about the read groups - my hunch is that I need to worry about them only if I have multiplexed samples in the same BAM. My BAMs contain single samples - what is the parameter that I need to use?
        [/quote]

        Since I don't use RNA-SeQC (although I probably should start), I can only guess, but there does not seem to be a parameter in RNA-SeQC to tell it that you have a single sample per BAM file. So AddOrReplaceReadGroups is needed. My example above sets every read group field to the same value, '$acc'. This is probably overkill and may be improper, but it does work to get a single read group name into the BAM.
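One way to check that the read group actually landed in the file is to count the @RG lines in the header. A minimal sketch, assuming a hypothetical helper named `count_read_groups` that reads SAM header text on standard input; with samtools installed, the header would come from `samtools view -H`, while the demo below feeds it a made-up two-line header so it runs without any BAM.

```shell
# Counts @RG header lines read from standard input.
# Real use (assumes samtools is installed; file name is hypothetical):
#   samtools view -H accepted_hits.rg.bam | count_read_groups
count_read_groups() {
    grep -c '^@RG'
}

# Demo with a made-up header: one @HD line and one @RG line.
printf '@HD\tVN:1.0\tSO:coordinate\n@RG\tID:sample1\tLB:sample1\tPL:illumina\tPU:sample1\tSM:sample1\n' \
    | count_read_groups
```

A single-sample BAM processed as described above would be expected to report exactly one read group.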
