  • SAM format readgroup, what is it exactly?

    EDIT: I found an old thread saying that each read group refers to a lane on the Illumina platform. My question now becomes: if I had two lanes of the same library and sample, could I assign them different read group IDs while still being able to merge them into a single dataset for downstream analysis? Thanks, and sorry for not searching more thoroughly in the first place.


    Could anyone comment on what a read group means physically?
    In the SAM file format, the @RG header consists of several subfields. I am having a hard time imagining what exactly read group (ID), library (LB) and sample (SM) refer to in real life. My guess is that SM is the sample name I assign to the DNA material being prepared as a library, for example HUMAN_SAMPLE_A; LB is another name I make up when I finish making a library, for example batch1_of_HUMAN_SAMPLE_A or batch2_of_HUMAN_SAMPLE_A; and the read group is yet another name, but I have no idea how it maps to the real world or how it affects downstream analysis.
    Thanks,

    CSoong
    Last edited by csoong; 12-23-2010, 01:58 PM. Reason: found an old thread about read group
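A minimal sketch of the two-lane case from the EDIT, assuming the common convention of one read group per lane: each lane gets its own ID (and PU), while LB and SM are shared, so sample-level tools can still pool both lanes. The helper make_rg_line, the ID/PU naming scheme and the sample/library names are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: build @RG header lines for two lanes of the same
# library and sample. ID and PU differ per lane; LB and SM are shared.

def make_rg_line(rg_id: str, sample: str, library: str,
                 platform_unit: str, platform: str = "ILLUMINA") -> str:
    """Return one tab-delimited @RG header line (ID is the only required tag)."""
    fields = [
        "@RG",
        f"ID:{rg_id}",          # unique per read group, here one per lane
        f"SM:{sample}",         # biological sample the DNA came from
        f"LB:{library}",        # library prepared from that sample
        f"PU:{platform_unit}",  # platform unit, e.g. flowcell + lane
        f"PL:{platform}",       # sequencing platform
    ]
    return "\t".join(fields)


# Two lanes of one library from one sample (all names are made up).
lanes = [("FC1.L1", "FC1_1"), ("FC1.L2", "FC1_2")]
header_lines = [
    make_rg_line(rg_id, "HUMAN_SAMPLE_A", "batch1_of_HUMAN_SAMPLE_A", pu)
    for rg_id, pu in lanes
]
for line in header_lines:
    print(line)
```

Keeping the IDs distinct preserves per-lane information (useful, for example, for per-read-group quality recalibration), while the shared SM tells sample-aware tools that both lanes belong to the same individual.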

  • #2
    All the tags from the @RG record are optional except ID.

    The record is useful when a single BAM contains data (alignments) from multiple sources. Its tags are meant to capture every level of granularity you might need: reads may come from different libraries, runs, instruments, platforms, etc. The @RG record lets you keep everything in one BAM and still determine, in as much detail as you want, where each read came from.

    The ID tag in the @RG record links together reads that belong to the same group (you define what a group means to you through the other tags in the @RG record). Each read points to its group with an RG:Z:<ID> tag.
    -drd
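To make the grouping concrete, here is a minimal sketch that parses tab-delimited @RG header lines into an ID-to-tags map and then groups read group IDs by sample (SM). The function names are hypothetical; the header lines are the ones from the SAM spec example quoted later in this thread, rewritten with tabs as in a real SAM file.

```python
from __future__ import annotations

from collections import defaultdict


def parse_read_groups(header_text: str) -> dict[str, dict[str, str]]:
    """Map each @RG ID to its remaining tags, e.g. {'L1': {'SM': 'NA12891', ...}}."""
    groups: dict[str, dict[str, str]] = {}
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue  # skip @HD, @SQ, @PG, ... lines
        tags = {}
        for field in line.split("\t")[1:]:
            key, _, value = field.partition(":")  # split on the first colon only
            tags[key] = value
        if "ID" not in tags:
            raise ValueError(f"@RG line without the required ID tag: {line!r}")
        groups[tags.pop("ID")] = tags
    return groups


def read_groups_by_sample(groups: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Collect read group IDs under the sample (SM) they belong to."""
    by_sample: dict[str, list[str]] = defaultdict(list)
    for rg_id, tags in groups.items():
        by_sample[tags.get("SM", "<no SM>")].append(rg_id)
    return dict(by_sample)


header = "\n".join([
    "@HD\tVN:1.0",
    "@SQ\tSN:chr20\tLN:62435964",
    "@RG\tID:L1\tPU:SC_1_10\tLB:SC_1\tSM:NA12891",
    "@RG\tID:L2\tPU:SC_2_12\tLB:SC_2\tSM:NA12891",
])
groups = parse_read_groups(header)
print(read_groups_by_sample(groups))  # {'NA12891': ['L1', 'L2']}
```

Two read groups that differ in ID, PU and even LB still collapse onto one sample here, which is the sense in which several read groups can be merged for sample-level analysis.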



    • #3
      Thanks again Drio.

      I pasted the example from sam1.pdf (the SAM format specification, page 4) below.
      It has two @RG headers; how can one tell which RG ID each of the two trailing reads belongs to? I don't see a correspondence between the read records and the RG IDs.
      ~~~~
      I see it now: the information is in the last column.
      ~~~~
      @HD VN:1.0
      @SQ SN:chr20 LN:62435964
      @RG ID:L1 PU:SC_1_10 LB:SC_1 SM:NA12891
      @RG ID:L2 PU:SC_2_12 LB:SC_2 SM:NA12891
      read_28833_29006_6945 99 chr20 28833 20 10M1D25M = 28993 195 \
      AGCTTAGCTAGCTACCTATATCTTGGTCTTGGCCG <<<<<<<<<<<<<<<<<<<<<<9/,&,22;;<<< NM:i:1 RG:Z:L1
      read_28701_28881_323b 147 chr20 28834 30 35M = 28701 -168 \
      ACCTATATCTTGGCCTTGGCCGATGCGGCCTTGCA <<<<<;<<<<7;:<<<6;<<<<<<<<<<<<7<<<< MF:i:18 RG:Z:L2
      Last edited by csoong; 12-23-2010, 03:36 PM. Reason: I see it



      • #4
        Check the RG tag at the end of each read record: read 1 points to read group L1 and read 2 to L2.
        -drd
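A minimal sketch of reading that last column programmatically: in a SAM alignment line the first 11 tab-separated columns are mandatory (QNAME through QUAL), and optional TAG:TYPE:VALUE fields such as RG:Z:L1 follow. The helper read_group_of is hypothetical; the two records are the spec example above, tab-delimited, with the quality strings replaced by "*" (SAM's "not available" value) for brevity, and rg_to_sample is copied by hand from the two @RG header lines.

```python
from __future__ import annotations


def read_group_of(record: str) -> str | None:
    """Return the value of the RG:Z optional tag of one SAM alignment line, or None."""
    fields = record.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields start at column 12.
    for opt in fields[11:]:
        tag, typ, value = opt.split(":", 2)
        if tag == "RG" and typ == "Z":
            return value
    return None


r1 = "\t".join(["read_28833_29006_6945", "99", "chr20", "28833", "20", "10M1D25M",
                "=", "28993", "195", "AGCTTAGCTAGCTACCTATATCTTGGTCTTGGCCG", "*",
                "NM:i:1", "RG:Z:L1"])
r2 = "\t".join(["read_28701_28881_323b", "147", "chr20", "28834", "30", "35M",
                "=", "28701", "-168", "ACCTATATCTTGGCCTTGGCCGATGCGGCCTTGCA", "*",
                "MF:i:18", "RG:Z:L2"])

# ID -> SM, taken from the @RG lines of the example header.
rg_to_sample = {"L1": "NA12891", "L2": "NA12891"}

for rec in (r1, r2):
    rg = read_group_of(rec)
    print(rec.split("\t")[0], rg, rg_to_sample.get(rg))
```

For real BAM files a library is the safer route; for instance, pysam's AlignedSegment.get_tag("RG") returns the same value, and samtools view -r <ID> restricts output to a single read group.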
