11-09-2016, 08:13 AM   #1
NielQC
Junior Member
 
Location: Valencia

Join Date: Nov 2016
Posts: 2
GIAB NA12878 dataset

Hi all,
First of all, I must say I'm a newbie in NGS and completely inexperienced in forums like this, so please don't blame me too much. In fact, it's my first post...

My objective is to create a variant calling pipeline, using the well-known reference genome NA12878 from the Genome in a Bottle (GIAB) consortium to validate my variant calls. I want to use a HiSeq 300x dataset (ftp://ftp.ncbi.nlm.nih.gov/giab/ftp/...01_HiSeq_300x/), and this is where my doubts started. In this directory you can see folders like these:

- 131219_D00360_005_BH814YADXX
- 131219_D00360_006_AH81VLADXX
- 131223_D00360_007_BH88WKADXX
- 131223_D00360_008_AH88U0ADXX

... and so on, up to 14 folders. I understand that each folder is a run, so I went to the first run listed, "131219_D00360_005_BH814YADXX", which contains 6 samples. I can't understand how and why those samples were generated. I think they were obtained from the same library (right?), so in theory each sample covers the same regions. Can I merge the R1 files and the R2 files of all samples into a single R1 and a single R2, or should I use just one sample?
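
To make the question concrete, this is roughly the merge I have in mind. It is only a sketch under my own assumptions: the Illumina-style file names (something like Sample_L001_R1_001.fastq.gz), the function names and the output file names are hypothetical and not taken from the GIAB directory. It relies on the fact that gzip files can be concatenated byte by byte, and it keeps R1 and R2 in the same file order so the read pairs stay in sync.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# ASSUMPTION: Illumina-style names such as Sample_L001_R1_001.fastq.gz.
READ_TAG = re.compile(r"_(R[12])_\d+\.fastq\.gz$")


def group_by_read(paths):
    """Return {'R1': [...], 'R2': [...]} with each list sorted by path."""
    groups = defaultdict(list)
    for path in sorted(map(Path, paths)):
        match = READ_TAG.search(path.name)
        if match:
            groups[match.group(1)].append(path)
    return dict(groups)


def merge_fastqs(paths, out_dir):
    """Concatenate all R1 files and all R2 files into one file each."""
    groups = group_by_read(paths)
    r1, r2 = groups.get("R1", []), groups.get("R2", [])

    # Every R1 file must have its R2 mate at the same position in the list,
    # otherwise the merged files would no longer be paired read by read.
    expected_r2 = [p.name.replace("_R1_", "_R2_") for p in r1]
    if not r1 or expected_r2 != [p.name for p in r2]:
        raise ValueError("R1 and R2 files do not pair up one to one")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for tag, files in (("R1", r1), ("R2", r2)):
        target = out_dir / f"merged_{tag}.fastq.gz"  # hypothetical name
        with open(target, "wb") as out:
            for src in files:
                # Byte-wise copy: concatenated gzip members form a valid gzip file.
                with open(src, "rb") as handle:
                    shutil.copyfileobj(handle, out)
        outputs.append(target)
    return tuple(outputs)
```

I would call it as merge_fastqs(Path("run_folder").glob("*/*.fastq.gz"), "merged"), where "run_folder" is again just a placeholder for wherever the run was downloaded.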

The main problem here is that I don't understand the "sample" concept. If it's the same individual, why make 6 samples when you could just sequence one?

I hope I have explained my doubts clearly enough. Thank you in advance.

NielQC

Last edited by NielQC; 11-09-2016 at 09:48 AM.