readgroup id, sample, library confusion

    We recently sequenced a specific mouse strain. The data were generated on the 5500 XL platform from a single mate-pair library prepared from the liver of one male mouse. Sequencing was done on three flowchips, using 6, 6, and 3 lanes respectively, for a total of 15 lanes of data.

    I have a few doubts about the terminology, such as sample and read group ID, as it applies to my experiment. Below is what I have understood so far. Please correct me if I am wrong.

    1) The six lanes in a flowchip are independent. This means that beads in different lanes may have the same bead IDs. I also noticed this in the output csfasta files: two csfasta files (2 lanes), whether from the same flowchip or from different flowchips, can contain the same csfasta header or tag ID (e.g. >96_579_1392) for reads with different sequences.

    What I have understood from other resources is that each lane must be assigned a different read group ID in the BAM file. That way, even if we later merge BAM files generated from independent lanes, the read group ID resolves the ambiguity, as shown below:

    96_579_1392 115 10 ....... RG:Z:lane1 NH:i:1 CM:i:5 NM:i:0 CQ:Z:>;6?@@@==@@@@;@@@?.@@=--@@8=*8@@8*@?@ CS:Z:T1113323122311310213123332020212001
    96_579_1392 131 5 ....... RG:Z:lane2 NH:i:0 CM:i:2 NM:i:0 CQ:Z:>;@@;@@@?.@@=--@@8=*8@@8*@?@@0/@@@5;@@ CS:Z:T1131233320202113323122311310212001

    Can you tell me if my understanding of this concept is correct?
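
The idea in question 1 can be illustrated with a minimal Python sketch. It assumes tab-separated SAM records; the two records below are hypothetical (only the read name, flag, chromosome, and RG tag are taken from the example above, the remaining columns are placeholders), and `read_group` is a helper name introduced here for illustration. It shows that the read name alone collides after a merge, while the pair (read group, read name) stays unique.

```python
from collections import Counter


def read_group(sam_line):
    """Return the RG:Z: tag value of a tab-separated SAM record, or None."""
    # Optional tags start after the 11 mandatory SAM columns.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("RG:Z:"):
            return field[len("RG:Z:"):]
    return None


# Hypothetical records from two lanes that share the same bead/tag ID.
records = [
    "96_579_1392\t115\t10\t100\t60\t5M\t=\t300\t205\tACGTA\tIIIII\tRG:Z:lane1",
    "96_579_1392\t131\t5\t200\t60\t5M\t=\t400\t205\tTTGCA\tIIIII\tRG:Z:lane2",
]

# Counting by read name alone: the two lanes collide.
by_name = Counter(r.split("\t")[0] for r in records)

# Counting by (read group, read name): each record is distinct.
by_rg_and_name = Counter((read_group(r), r.split("\t")[0]) for r in records)

print(by_name)
print(by_rg_and_name)
```

Whether a given downstream tool actually keys on the read group in this way is something to check in that tool's documentation; the sketch only shows why the tag makes the records distinguishable.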

    2) My second question concerns the sample (SM) and library (LB) tags in the SAM format. As I understand it, the major organizational units for NGS analysis are lane < library < sample < multiple samples. In other words, multiple libraries (PE, SE, or different insert sizes) can be made for the same sample, and each can be sequenced on one or more lanes. In our case, we have 1 sample (the mouse strain), 1 library (mate pair), and 15 lanes of data. This means that my 15 SAM/BAM files should have the same library and sample ID but different read group IDs.

    Am I correct?
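
The layout described in question 2 can be sketched as a set of `@RG` header lines, one per lane. This is only an illustration under stated assumptions: the function name `build_rg_headers`, the ID scheme `fc<chip>_lane<lane>`, and the sample and library names are hypothetical, and `PL:SOLID` and `PU` are added because they are standard `@RG` fields, not because the original post mentions them.

```python
def build_rg_headers(lanes_per_flowchip, sample, library, platform="SOLID"):
    """Build one @RG header line per lane, sharing SM and LB across all lanes."""
    headers = []
    for chip, n_lanes in enumerate(lanes_per_flowchip, start=1):
        for lane in range(1, n_lanes + 1):
            # Hypothetical ID scheme: unique per flowchip and lane.
            rg_id = f"fc{chip}_lane{lane}"
            fields = [
                "@RG",
                f"ID:{rg_id}",      # unique per lane
                f"SM:{sample}",     # same sample for every lane
                f"LB:{library}",    # same library for every lane
                f"PL:{platform}",
                f"PU:{rg_id}",      # platform unit, here reusing the lane ID
            ]
            headers.append("\t".join(fields))
    return headers


# Three flowchips with 6, 6, and 3 lanes: 15 read groups in total.
rg_headers = build_rg_headers(
    [6, 6, 3], sample="mouse_strain", library="matepair_lib1"
)
for line in rg_headers:
    print(line)
```

Each lane's SAM/BAM file would carry its own `@RG` line, with every record in that file tagged with the matching `RG:Z:` value.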

    Thanks a lot for your time.
