Hi all,
I need to extract a large number of human genome fragments (more than 500 million of them) from arbitrary positions.
This is one part of a larger process. I have a .sam result file from bowtie containing alignments of 10 million human genome reads. I want to compare each query read with the reference sequence it aligned to, as reported in the SAM file. The reference I used is hg19.fa from UCSC, so I need to retrieve sequence from hg19.fa (or the per-chromosome files) using the alignment locations in the SAM file.
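The reference interval for each alignment can be taken directly from the SAM columns: RNAME (column 3), the 1-based leftmost POS (column 4) and the CIGAR string (column 6), where the M, D, N, = and X operations consume reference bases. Below is a minimal sketch that assumes plain-text SAM input; the function and class names are hypothetical. SAM stores SEQ on the forward reference strand even for reverse-strand hits (FLAG 0x10), so the read can be compared to the reference slice without reverse-complementing it.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# CIGAR operations that consume reference bases (per the SAM spec): M, D, N, =, X
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")


class RefInterval(NamedTuple):
    qname: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    seq: str    # read sequence as stored in SAM (forward reference strand)


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(int(n) for n, op in _CIGAR_RE.findall(cigar) if op in _REF_CONSUMING)


def sam_intervals(lines: Iterable[str]) -> Iterator[RefInterval]:
    """Yield the reference interval of every mapped record in SAM text lines."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar, seq = int(fields[3]), fields[5], fields[9]
        if flag & 0x4 or rname == "*" or cigar == "*":
            continue  # unmapped read: no reference interval
        start = pos - 1  # SAM POS is 1-based
        yield RefInterval(qname, rname, start, start + reference_span(cigar), seq)
```

Iterating over an open .sam file handle with sam_intervals() streams the records without loading the whole file.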
For example, given chr4:35654-35695, I get this 42 bp sequence:
gtcttccagggtttttatatttttgggttttacacttaagt
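This region notation is 1-based and inclusive at both ends (35695 - 35654 + 1 = 42 bases), whereas Python slices are 0-based and half-open. A small hypothetical helper that converts between the two conventions, so the result can be used directly as a slice:

```python
from typing import Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Convert 'chrom:start-end' (1-based, inclusive) to (chrom, start0, end0),
    where start0/end0 are 0-based and half-open, ready for seq[start0:end0]."""
    chrom, _, span = region.rpartition(":")
    # Allow thousands separators such as 'chr1:1,000-2,000'
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    if not chrom or start < 1 or end < start:
        raise ValueError(f"bad region: {region!r}")
    return chrom, start - 1, end
```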
So far I have tried two solutions:
1. A Python script that fetches sequences from the UCSC DAS server:
http://genome.ucsc.edu/cgi-bin/das/h...r4:35654,35695
2. A Python script that calls the 'samtools faidx' command and returns its output, taken from this post:
http://seqanswers.com/forums/showthr...ome+coordinate
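Part of the cost of approach 2 is probably the overhead of starting a new samtools process for every region. One way around it, sketched below, is to open the FASTA once and seek directly to each region using a samtools-style .fai index, whose columns are NAME, LENGTH, OFFSET, LINEBASES and LINEWIDTH. This assumes the index already exists next to the FASTA (running 'samtools faidx hg19.fa' once creates hg19.fa.fai); the class and function names are hypothetical. Libraries such as pysam (FastaFile.fetch) and pyfaidx provide the same kind of in-process access.

```python
from typing import Dict, NamedTuple, Optional


class FaiEntry(NamedTuple):
    length: int     # sequence length in bases
    offset: int     # byte offset of the first base
    linebases: int  # bases per sequence line
    linewidth: int  # bytes per sequence line, including the line terminator


def read_fai(fai_path: str) -> Dict[str, FaiEntry]:
    """Parse a samtools-style .fai index into {name: FaiEntry}."""
    index = {}
    with open(fai_path) as fh:
        for line in fh:
            name, length, offset, linebases, linewidth = line.rstrip("\n").split("\t")[:5]
            index[name] = FaiEntry(int(length), int(offset), int(linebases), int(linewidth))
    return index


class IndexedFasta:
    """Random access to an indexed FASTA file, kept open across many queries."""

    def __init__(self, fasta_path: str, fai_path: Optional[str] = None):
        self.index = read_fai(fai_path or fasta_path + ".fai")
        self._fh = open(fasta_path, "rb")

    @staticmethod
    def _byte_pos(e: FaiEntry, pos: int) -> int:
        # Skip whole lines, then the remaining bases within the final line.
        return e.offset + (pos // e.linebases) * e.linewidth + pos % e.linebases

    def fetch(self, chrom: str, start: int, end: int) -> str:
        """Return bases [start, end) of chrom (0-based, half-open)."""
        e = self.index[chrom]
        start, end = max(0, start), min(end, e.length)
        if start >= end:
            return ""
        first = self._byte_pos(e, start)
        last = self._byte_pos(e, end - 1)
        self._fh.seek(first)
        raw = self._fh.read(last - first + 1)
        # Drop the line terminators that fall inside the requested span.
        return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")

    def close(self) -> None:
        self._fh.close()
```

Each fetch is then one seek plus one small read, with no process creation or network round trip.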
But both are slow. samtools faidx is a bit faster than the DAS server, but still too slow.
So, is there a FAST way to do this? I have the separate chromosome FASTA files as well as the hg19.fa file.
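With the per-chromosome files, another option is to avoid per-query file access altogether: group the alignment intervals by chromosome, load each chromosome into memory once (the largest hg19 chromosome, chr1, is about 249 Mb) and take plain string slices. A minimal sketch, assuming the files are named '<chrom>.fa' in one directory (as in UCSC's chromFa download) and that all intervals fit in memory; the function names are hypothetical. It also includes a simple mismatch count, which only makes sense for ungapped alignments.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# (read_id, chrom, start0, end0) with 0-based, half-open coordinates
Interval = Tuple[str, str, int, int]


def load_chromosome(fasta_path: str) -> str:
    """Read a single-sequence FASTA file into one uppercase string."""
    with open(fasta_path) as fh:
        return "".join(line.strip() for line in fh if not line.startswith(">")).upper()


def fetch_by_chromosome(intervals: Iterable[Interval], chrom_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, reference_sequence) for every interval.

    Intervals are bucketed by chromosome so that each chromosome file is read
    only once and only one chromosome sequence is held in memory at a time.
    Assumes files are named '<chrom>.fa' inside chrom_dir (hypothetical layout).
    """
    buckets: Dict[str, List[Tuple[str, int, int]]] = defaultdict(list)
    for read_id, chrom, start, end in intervals:
        buckets[chrom].append((read_id, start, end))
    for chrom, items in buckets.items():
        seq = load_chromosome(os.path.join(chrom_dir, chrom + ".fa"))
        for read_id, start, end in items:
            yield read_id, seq[start:end]


def mismatches(read: str, ref: str) -> int:
    """Count differing positions for an ungapped alignment of equal length."""
    if len(read) != len(ref):
        raise ValueError("lengths differ; gapped alignments need CIGAR-aware comparison")
    return sum(a != b for a, b in zip(read.upper(), ref))
```

load_chromosome() uppercases the sequence, which discards UCSC's lowercase soft-masking of repeats (the lowercase bases in the example above); drop the .upper() call if that information matters. The bucketed intervals also live in memory; for very large inputs, coordinate-sorting the alignments first would let the same idea work as a stream.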