Hi all,
First of all, I must say I'm a newbie in NGS and completely inexperienced with forums like this, so please don't be too hard on me. In fact, this is my first post.
My objective is to build a variant-calling pipeline and validate its calls against the well-known reference sample NA12878 from the Genome in a Bottle (GIAB) consortium. I want to use a HiSeq 300x dataset (ftp://ftp.ncbi.nlm.nih.gov/giab/ftp/...01_HiSeq_300x/), and this is where my doubts started. The directory contains folders like these:
- 131219_D00360_005_BH814YADXX
- 131219_D00360_006_AH81VLADXX
- 131223_D00360_007_BH88WKADXX
- 131223_D00360_008_AH88U0ADXX
... and so on, 14 folders in total. As I understand it, each folder is one run, so I went to the earliest run, "131219_D00360_005_BH814YADXX", which contains 6 samples. I can't understand how and why those samples were generated. I think they were obtained from the same library (right?), so in theory each sample covers the same regions. Can I merge the R1 files of all samples into a single R1 and the R2 files into a single R2, or should I use just one sample?
The main problem is that I don't understand the "sample" concept here. If it's the same individual, why make 6 samples when you could just sequence one?
I hope I have explained my doubts clearly enough. Thank you in advance,
NielQC