Hello,
I recently started working with exome data. To test my skills, I aligned an exome from the HapMap project using the following commands.
First I downloaded SRR292250.sra from SRA:
$ fastqdump SRR292250.sra SRR292250.fastq
$ bwa aln -t 8 SRR292250.fastq > SRR292250.sai
$ bwa samse -r "@RG\tID:IDa\tSM:SM\tPL:Illumina" gatk/ensembl_g37/human_g1k_v37.fasta SRR292250.sai SRR292250.fastq > SRR292250.sam
$ java -Xmx4g -Djava.io.tmpdir=/tmp -jar picard/SortSam.jar SO=coordinate INPUT=SRR292250.sam OUTPUT=SRR292250.bam VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
I continued with the GATK pipeline to call variants, but it didn't work. I think the problem is in my alignment. Below are the results from samtools flagstat and idxstats. Do they look correct?
$ samtools flagstat SRR292250.bam
85493722 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
1680 + 0 mapped (0.00%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
$ samtools idxstats SRR292250.bam
1 249250621 22 0
2 243199373 1332 0
3 198022430 26 0
4 191154276 30 0
5 180915260 23 0
6 171115067 24 0
7 159138663 9 0
8 146364022 20 0
9 141213431 19 0
10 135534747 9 0
11 135006516 10 0
12 133851895 20 0
13 115169878 21 0
14 107349540 16 0
15 102531392 5 0
16 90354753 16 0
17 81195210 10 0
18 78077248 5 0
19 59128983 2 0
20 63025520 7 0
21 48129895 8 0
22 51304566 3 0
X 155270560 40 0
Y 59373566 3 0
MT 16569 0 0
GL000207.1 4262 0 0
...
* 0 0 85492042
Is it correct?
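To put a single number on the idxstats output, the mapped and unmapped columns can be totalled with awk. This is only a minimal sketch: `summarize_idxstats` is a hypothetical helper name, and it assumes the usual four-column idxstats layout (contig, length, mapped reads, unmapped reads) arriving on standard input.

```shell
# Hypothetical helper: totals column 3 (mapped) and column 4 (unmapped)
# of samtools idxstats output read from stdin, then prints the mapped percentage.
summarize_idxstats() {
    awk '
        { mapped += $3; unmapped += $4 }
        END {
            total = mapped + unmapped
            # Guard against empty input to avoid dividing by zero.
            pct = (total > 0) ? 100 * mapped / total : 0
            printf "mapped=%d unmapped=%d mapped_pct=%.4f\n", mapped, unmapped, pct
        }'
}

# Usage (assumes samtools is on the PATH):
#   samtools idxstats SRR292250.bam | summarize_idxstats
```

The `*` line is included in the totals on purpose, since that is where idxstats reports the reads that have no assigned contig.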
If you want, I can also post the coverage results.
Any hints?
Thank you very much for your help.
tuka.