
  • readgroup id, sample, library confusion

    We recently sequenced a specific mouse strain. The sequencing data were generated on the 5500 XL platform from a single mate-pair library prepared from the liver of a single male mouse. The sequencing was done on three flowchips, using 6, 6, and 3 lanes respectively, for a total of 15 lanes of data.

    I have a few doubts about the terminology, such as sample and read group ID, as it applies to my experiment. I am writing down what I have understood so far. Please correct me if I am wrong.

    1) The six lanes in a flowchip are independent, which means that beads in different lanes may have the same bead IDs. I also noticed this in the output csfasta files: two csfasta files (2 lanes), whether from the same flowchip or from different flowchips, contain the same csfasta header or tag ID (e.g. >96_579_1392) for reads with different sequences.

    From what I have read in other resources, each lane must be assigned a different read group ID in the BAM file. That way, even if we later merge BAM files generated from independent lanes, the read group ID still distinguishes reads that share a name, as shown below:

    96_579_1392 115 10 ....... RG:Z:lane1 NH:i:1 CM:i:5 NM:i:0 CQ:Z:>;6?@@@==@@@@;@@@?.@@=--@@8=*8@@8*@?@ CS:Z:T1113323122311310213123332020212001
    96_579_1392 131 5 ....... RG:Z:lane2 NH:i:0 CM:i:2 NM:i:0 CQ:Z:>;@@;@@@?.@@=--@@8=*8@@8*@?@@0/@@@5;@@ CS:Z:T1131233320202113323122311310212001

    Can you tell me if my understanding of this concept is correct?
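
    To make the idea concrete, here is a minimal Python sketch of what I mean. It assumes heavily simplified, hypothetical records (only the read name, flag, reference name and RG tag are kept, not full SAM lines), and `read_key` is just an illustrative helper, not part of any tool:

```python
# Hypothetical, simplified records: QNAME, FLAG, RNAME and the RG tag only.
# Real SAM lines have 11 mandatory fields before the optional tags.
records = [
    "96_579_1392\t115\t10\tRG:Z:lane1",
    "96_579_1392\t131\t5\tRG:Z:lane2",
]


def read_key(sam_line):
    """Return (read group, read name) for one tab-separated SAM-like line."""
    fields = sam_line.split("\t")
    qname = fields[0]
    # Optional tags have the form TAG:TYPE:VALUE; pick out the RG tag if present.
    rg = next(
        (f.split(":", 2)[2] for f in fields[1:] if f.startswith("RG:Z:")),
        None,
    )
    return (rg, qname)


# Using the read name alone, the two reads collapse into one identifier.
names_only = {line.split("\t")[0] for line in records}

# Using (read group, read name), they remain distinct after a merge.
keys = {read_key(line) for line in records}

print(len(names_only), len(keys))
```

    With the read name alone the two reads are indistinguishable, while the (read group, read name) pair keeps them apart.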

    2) My second question is about the sample (SM) and library (LB) tags in the SAM format. As I understand it, the main organizational units for NGS analysis nest as lane < library < sample < multiple samples. In other words, multiple libraries (paired-end, single-end, or different insert sizes) can be made from the same sample, and each library can be sequenced on one or more lanes. In our case, we have 1 sample (the mouse strain), 1 library (mate-pair), and 15 lanes of data. This means that my 15 SAM/BAM files should share the same library and sample IDs but have different read group IDs.

    Am I correct?
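
    As a rough sketch of the layout I have in mind, the Python below builds one @RG header line per lane. The sample name, library name and the `fcN_laneM` ID scheme are placeholders I made up for illustration; PL:SOLID and PU are included only as an assumption about what a complete header might carry:

```python
# Lanes used on each of the three flowchips (from the experiment above).
lanes_per_flowchip = [6, 6, 3]

SAMPLE = "mouse_strain"    # hypothetical sample name (SM)
LIBRARY = "matepair_lib1"  # hypothetical library name (LB)


def build_rg_lines(lanes_per_flowchip, sample, library):
    """Build one @RG header line per lane, sharing SM and LB."""
    lines = []
    for chip, n_lanes in enumerate(lanes_per_flowchip, start=1):
        for lane in range(1, n_lanes + 1):
            rg_id = f"fc{chip}_lane{lane}"  # unique per flowchip and lane
            lines.append("\t".join([
                "@RG",
                f"ID:{rg_id}",
                f"SM:{sample}",
                f"LB:{library}",
                "PL:SOLID",
                f"PU:{rg_id}",
            ]))
    return lines


rg_lines = build_rg_lines(lanes_per_flowchip, SAMPLE, LIBRARY)
for line in rg_lines:
    print(line)
```

    This gives 15 read groups with distinct IDs, all pointing to the same sample and the same library.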

    Thanks a lot for your time.
