  • seq_lover
    Member
    • Oct 2011
    • 18

    readgroup id, sample, library confusion

    We recently sequenced a specific mouse strain. The data were generated on the 5500 XL platform from a single mate-pair library prepared from the liver of a single male mouse. Sequencing was done on three flowchips, using 6, 6, and 3 lanes respectively, for a total of 15 lanes of data.

    I have a few doubts about the terminology that applies to my experiment, such as sample and read group ID. Below is what I have understood so far. Please correct me if I am wrong.

    1) The six lanes in a flowchip are independent. This means that beads in different lanes may have the same bead IDs. I also see this in the output csfasta files: two csfasta files (two lanes), whether from the same flowchip or from different flowchips, can contain the same csfasta header or tag ID (e.g. >96_579_1392) for reads with different sequences.

    Now, what I have understood from other resources is that each lane must be assigned a different read group ID in the BAM file. That way, even if we later merge BAM files generated from independent lanes, the read group ID (the RG tag) keeps reads with identical names apart, as shown below:

    96_579_1392 115 10 ....... RG:Z:lane1 NH:i:1 CM:i:5 NM:i:0 CQ:Z:>;6?@@@==@@@@;@@@?.@@=--@@8=*8@@8*@?@ CS:Z:T1113323122311310213123332020212001
    96_579_1392 131 5 ....... RG:Z:lane2 NH:i:0 CM:i:2 NM:i:0 CQ:Z:>;@@;@@@?.@@=--@@8=*8@@8*@?@@0/@@@5;@@ CS:Z:T1131233320202113323122311310212001

    Can you tell me if my understanding of this concept is correct?
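
    To make the idea concrete, here is a minimal Python sketch of how I picture it (not taken from any tool). The function `read_key` is a hypothetical helper, and the alignment fields that I elided above as "......." are filled with made-up placeholder values purely so that the lines parse. It only illustrates that the read name alone is ambiguous after a merge, while the pair (read group, read name) is not.

```python
def read_key(sam_line):
    """Return (read_group, read_name) for one tab-separated SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    read_name = fields[0]          # QNAME is the first mandatory field
    read_group = None
    for tag in fields[11:]:        # optional tags start after the 11 mandatory fields
        if tag.startswith("RG:Z:"):
            read_group = tag[len("RG:Z:"):]
            break
    return (read_group, read_name)


# Two records with the same read name from different lanes.
# POS, MAPQ, CIGAR, etc. are placeholder values, not real data.
merged = [
    "\t".join(["96_579_1392", "115", "10", "1000", "255", "35M",
               "=", "2500", "1500", "*", "*", "RG:Z:lane1", "NH:i:1"]),
    "\t".join(["96_579_1392", "131", "5", "2000", "255", "35M",
               "=", "3500", "1500", "*", "*", "RG:Z:lane2", "NH:i:0"]),
]

names_only = {line.split("\t")[0] for line in merged}
keys = {read_key(line) for line in merged}

print(len(names_only))  # 1: the read name alone cannot tell the two lanes apart
print(len(keys))        # 2: read group plus read name is unique
```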

    2) My second question is about the sample (SM) and library (LB) tags in the SAM format. As I understand it, the major organizational units for NGS analysis are lane < Library < Sample < Multiple-samples. In other words, multiple libraries (PE, SE, or different insert sizes) can be made for the same sample, and each library can be sequenced on one or more lanes. In our case, we have 1 sample (the mouse strain), 1 library (mate pair), and 15 lanes of data. This means that my 15 SAM/BAM files should all have the same library and sample IDs, but a different read group ID each.

    Am I correct?
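
    As a sketch of what I think the headers would look like, the Python below builds one @RG line per lane for the 6, 6, 3 layout. The ID scheme (`fc1_lane1`, ...), the sample and library names, and the PU values are all placeholders I made up; only the tag names (ID, SM, LB, PL, PU) come from the SAM format, and I am assuming PL:SOLID is the right platform value for 5500 XL data.

```python
def build_read_groups(lanes_per_flowchip, sample, library, platform="SOLID"):
    """Build one @RG header line per lane; SM and LB are shared, ID is unique."""
    lines = []
    for chip, n_lanes in enumerate(lanes_per_flowchip, start=1):
        for lane in range(1, n_lanes + 1):
            rg_id = f"fc{chip}_lane{lane}"      # hypothetical naming scheme
            lines.append("\t".join([
                "@RG",
                f"ID:{rg_id}",
                f"SM:{sample}",                 # same sample for every lane
                f"LB:{library}",                # same library for every lane
                f"PL:{platform}",
                f"PU:fc{chip}.{lane}",          # placeholder platform unit
            ]))
    return lines


# Three flowchips with 6, 6 and 3 lanes; names are placeholders.
rg_lines = build_read_groups([6, 6, 3], sample="mouse_strain", library="matepair_lib1")

for line in rg_lines:
    print(line)
```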

    Thanks a lot for your time.