Hi there!
This thread is intended for questions related to converting Complete Genomics data to SAM/BAM format.
So here's mine:
I'm trying to get BAM files for some of the individuals sequenced by CG. I want all reads mapped to the reference, not just the alignments around variants (evidenceDnbs files), so I downloaded all the MAP data for the individuals I'm interested in. For each individual I got a directory structure like this, e.g.:
NA06994
  GS06994-1100-36-MAP
    GS19360-FS3-L01
      mapping_GS19360-FS3-L01_001.tsv.bz2
      reads_GS19360-FS3-L01_001.tsv.bz2
      ..
      reads_GS19360-FS3-L01_011.tsv.bz2
      mapping_GS19360-FS3-L01_011.tsv.bz2
    GS19360-FS3-L02
    GS19360-FS3-L03
    GS19360-FS3-L04
    GS19360-FS3-L05
    ..
    GS20181-FS3-L08
As far as I can tell, there’s no way to get a BAM file for a particular location or chromosome (which is what I’m interested in) other than going through the entire set of files. I’m currently converting each pair of reads and mapping files to a BAM file (it’s taking quite a long time) using the map2sam command provided by CG here:
My question is: to get a whole-genome BAM file, do I first need to make BAM files for all those reads/mapping pairs and then merge them all using samtools? I think the answer is ‘yes, you do’, but I’m still a bit confused by this data structure and by why there are so many folders and files for every single individual.
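To make the convert-then-merge idea concrete, here is a minimal Python sketch of the bookkeeping only. It assumes the file naming shown in the listing above (reads_<lane>_<part>.tsv.bz2 and mapping_<lane>_<part>.tsv.bz2). The helper names pair_lane_files and merge_command are hypothetical, and the map2sam conversion step itself is deliberately left out. The only external call it relies on is the standard "samtools merge <out.bam> <in1.bam> <in2.bam> ..." form, which expects its inputs to be sorted already.

```python
import re
from collections import defaultdict

# Assumed naming, taken from the listing above:
#   reads_GS19360-FS3-L01_001.tsv.bz2 / mapping_GS19360-FS3-L01_001.tsv.bz2
_PART = re.compile(r"^(reads|mapping)_(?P<lane>.+)_(?P<part>\d+)\.tsv\.bz2$")


def pair_lane_files(filenames):
    """Pair each reads file with the mapping file of the same lane and part.

    Hypothetical helper: returns (reads, mapping) tuples sorted by lane and
    part number, skipping names that do not match or have no partner.
    """
    found = defaultdict(dict)
    for name in filenames:
        match = _PART.match(name)
        if match:
            key = (match.group("lane"), match.group("part"))
            found[key][match.group(1)] = name
    pairs = []
    for key in sorted(found):
        entry = found[key]
        if "reads" in entry and "mapping" in entry:
            pairs.append((entry["reads"], entry["mapping"]))
    return pairs


def merge_command(output_bam, input_bams):
    """Build (but do not run) a samtools merge command as an argument list."""
    return ["samtools", "merge", output_bam, *input_bams]
```

Each pair would be converted to its own sorted BAM first, and the resulting list of per-part BAM paths would then be handed to merge_command, for example via subprocess.run(merge_command("NA06994.bam", bams), check=True).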
Many thanks in advance,
J. Rodrigo Flores
[email protected]