EDIT: I found an old thread saying that, for the Illumina platform, each read group corresponds to a lane. My question now becomes: if I had two lanes of the same library and sample, could I assign them different read group IDs and still merge them into a single dataset for downstream analysis? Thanks, and sorry for not searching thoroughly in the first place.
Could anyone comment on what exactly a read group means physically?
In the SAM file format, the @RG header line consists of several subfields. I am having a hard time picturing what read group (ID), library (LB), and sample (SM) refer to in real life. My guess is that SM is the sample name I assign to the DNA material being prepped as a library, for example HUMAN_SAMPLE_A, and that LB is another name I make up when I finish making a library, for example batch1_of_HUMAN_SAMPLE_A or batch2_of_HUMAN_SAMPLE_A. The read group ID is yet another name, but I have no idea how it links to the real world or how it affects downstream analysis.
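To make my guess concrete, here is a minimal Python sketch of what I imagine the @RG header lines would look like for two lanes of one library from one sample. The helper name `rg_line`, the ID naming scheme (sample.batch.lane), and the PL value are my own assumptions for illustration only; I am not claiming this is the recommended convention.

```python
def rg_line(rg_id, sample, library, platform="ILLUMINA"):
    """Build one SAM @RG header line as tab-separated TAG:VALUE fields.

    rg_line is a hypothetical helper, not part of any SAM library.
    """
    fields = [("ID", rg_id), ("SM", sample), ("LB", library), ("PL", platform)]
    return "@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in fields)


sample = "HUMAN_SAMPLE_A"
library = "batch1_of_HUMAN_SAMPLE_A"

# Assumed scenario: the same library sequenced on two lanes,
# one read group ID per lane (the ID strings are made up).
header = [
    rg_line(f"{sample}.batch1.{lane}", sample, library)
    for lane in ("lane1", "lane2")
]

for line in header:
    print(line)
```

As far as I understand the format, each read in the file would then carry an `RG:Z:` tag whose value matches one of these IDs, so the two lanes share SM and LB but stay distinguishable by ID.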
Thanks,
CSoong