Hi everyone, I am mapping color-space FASTA and QV files (sequenced on an ABI SOLiD) from a human exome against the reference from the GATK resource bundle (karyotype-sorted). I am using Bowtie for both indexing and mapping, with the -C (color-space) flag. My main goal is to run a variant-calling analysis with GATK.
My doubt is about the SAM output: it has no alignment section below the header. The @PG line is the end of my SAM file, so the mandatory fields (QNAME, FLAG, RNAME, etc.) are missing and, most important for me, so is the mapping quality (normally the fifth column).
So why can't I get these required fields in the SAM file? I even used --mapq 60 (normally Bowtie reports MAPQ as 0 or 255), but nothing is reported.
My command line was:
Code:
bowtie -p 40 --best --strata -a --mapq 60 --chunkmbs 1000 human_ref38 -f -C my_file.csfasta -Q my_file.qual -S align.sam
The results look good (I have seen that, in general, the number of mapped reads for color-space data is very low):
Code:
# reads processed: 9540575
# reads with at least one reported alignment: 7585206 (79.50%)
# reads that failed to align: 1955369 (20.50%)
Reported 188945896 alignments to 1 output stream(s)
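As a quick sanity check on these numbers, the summary can be turned into a few ratios. This is only a sketch: the variable names are mine and the figures are copied from the output above. Because -a --best --strata reports every alignment in the best stratum, each aligned read contributes on average about 25 SAM records, so one would expect align.sam to hold roughly 189 million alignment lines after the header.

```python
# Figures copied from the Bowtie summary above
reads_processed = 9540575
reads_aligned = 7585206
reads_failed = 1955369
alignments_reported = 188945896


def pct(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


aligned_pct = pct(reads_aligned, reads_processed)
failed_pct = pct(reads_failed, reads_processed)
# -a --best --strata reports every alignment in the best stratum,
# so a single read can produce many SAM records
records_per_aligned_read = alignments_reported / reads_aligned

print(f"aligned: {aligned_pct:.2f}%  failed: {failed_pct:.2f}%")
print(f"mean alignments per aligned read: {records_per_aligned_read:.1f}")
```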
This is the beginning and end of my SAM file:
Code:
@HD VN:1.0 SO:unsorted
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chr10 LN:133797422
@SQ SN:chr11 LN:135086622
@SQ SN:chr12 LN:133275309
@SQ SN:chr13 LN:114364328
@SQ SN:chr14 LN:107043718
@SQ SN:chr15 LN:101991189
@SQ SN:chr16 LN:90338345
@SQ SN:chr17 LN:83257441
@SQ SN:chr18 LN:80373285
@SQ SN:chr19 LN:58617616
@SQ SN:chr20 LN:64444167
@SQ SN:chr21 LN:46709983
@SQ SN:chr22 LN:50818468
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@SQ SN:chrM LN:16569
@SQ SN:chr1_KI270706v1_random LN:175055
End:
Code:
@SQ SN:HLA-DRB1*13:01:01 LN:13935
@SQ SN:HLA-DRB1*13:02:01 LN:13941
@SQ SN:HLA-DRB1*14:05:01 LN:13933
@SQ SN:HLA-DRB1*14:54:01 LN:13936
@SQ SN:HLA-DRB1*15:01:01:01 LN:11080
@SQ SN:HLA-DRB1*15:01:01:02 LN:11571
@SQ SN:HLA-DRB1*15:01:01:03 LN:11056
@SQ SN:HLA-DRB1*15:01:01:04 LN:11056
@SQ SN:HLA-DRB1*15:02:01 LN:10313
@SQ SN:HLA-DRB1*15:03:01:01 LN:11567
@SQ SN:HLA-DRB1*15:03:01:02 LN:11569
@SQ SN:HLA-DRB1*16:02:01 LN:11005
@PG ID:Bowtie VN:0.12.5 CL:"/home/alsalas/.linuxbrew/bin/bowtie/bowtie -p 40 --best --strata -a --mapq 60 --chunkmbs 1000 human_ref38 -f -C my_file.csfasta -Q my_file.qual -S align.sam"
I need the MAPQ values for base quality score recalibration with GATK and for other analyses. At the moment GATK filters out all my mapped reads, reporting that all of them (100.00% of total) fail the MappingQualityUnavailableFilter.
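To confirm whether align.sam really contains no alignment lines, rather than only the header having been displayed, a minimal Python sketch like the one below could stream through the file. It relies only on the SAM specification: header lines start with @ (a QNAME may not), every alignment line has at least 11 tab-separated mandatory fields, and MAPQ is the fifth field. A MAPQ of 255 means "unavailable" in the spec, which, as far as I understand, is what GATK's MappingQualityUnavailableFilter rejects. The function names and the script name scan_sam.py are hypothetical.

```python
import sys
from collections import Counter
from typing import Dict, Iterable

MANDATORY_FIELDS = 11  # QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
MAPQ_UNAVAILABLE = 255  # SAM spec: 255 means the mapping quality is not available


def scan_sam_lines(lines: Iterable[str]) -> Dict[str, object]:
    """Count header lines and alignment records and tally MAPQ values."""
    header_lines = records = malformed = unmapped = 0
    mapq_counts: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("@"):
            header_lines += 1  # a QNAME cannot start with '@', so this is header
            continue
        fields = line.split("\t")
        if len(fields) < MANDATORY_FIELDS:
            malformed += 1
            continue
        records += 1
        if int(fields[1]) & 0x4:  # FLAG bit 0x4: segment unmapped
            unmapped += 1
        mapq_counts[int(fields[4])] += 1
    return {
        "header_lines": header_lines,
        "records": records,
        "malformed": malformed,
        "unmapped": unmapped,
        "mapq_unavailable": mapq_counts[MAPQ_UNAVAILABLE],
        "mapq_counts": mapq_counts,
    }


def scan_sam(path: str) -> Dict[str, object]:
    """Stream a SAM file from disk without loading it into memory."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return scan_sam_lines(handle)


def main() -> None:
    if len(sys.argv) != 2:
        print("usage: python scan_sam.py align.sam", file=sys.stderr)
        return
    summary = scan_sam(sys.argv[1])
    for key in ("header_lines", "records", "malformed", "unmapped", "mapq_unavailable"):
        print(f"{key}: {summary[key]}")
    for mapq, count in sorted(summary["mapq_counts"].items()):
        print(f"MAPQ {mapq}: {count}")


if __name__ == "__main__":
    main()
```

Run as `python scan_sam.py align.sam`. A file that really ends at the @PG line would report `records: 0`, while a file whose reads all carry MAPQ 255 would show them under `mapq_unavailable`.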
Previously I converted the color-space FASTA and QV files to FASTQ, but in some cases this was not possible (the script stopped with errors), and in other cases the resulting FASTQ was very unreliable.
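For the csfasta/QV to FASTQ conversion, a minimal Python sketch that pairs the two files record by record and fails loudly on mismatches might look like the following. Its assumptions are mine, not taken from any official tool: lines starting with # are comments, each sequence starts with the primer base, the QV file lists one integer per color call, -1 marks a missing call and is written as quality 0, and the output uses Phred+33 with one quality character per color (no primer-base quality). Whether Bowtie's -C mode expects exactly this layout should be checked against the Bowtie manual. The function names and csfasta_to_fastq.py are hypothetical.

```python
import sys
from itertools import zip_longest
from typing import Iterable, Iterator, List, Optional, Tuple


def read_records(lines: Iterable[str], sep: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, body) pairs from a FASTA-like file, skipping '#' comments."""
    name: Optional[str] = None
    body: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, sep.join(body)
            name, body = line[1:], []
        elif name is None:
            raise ValueError(f"data before first header: {line[:30]!r}")
        else:
            body.append(line)  # wrapped lines are joined with `sep`
    if name is not None:
        yield name, sep.join(body)


def qual_to_phred33(qual_text: str) -> str:
    """Space-separated QVs to Phred+33; -1 (no call) becomes 0, capped at 93."""
    return "".join(chr(min(max(int(q), 0), 93) + 33) for q in qual_text.split())


def csfasta_qual_to_fastq(cs_lines: Iterable[str], qual_lines: Iterable[str]) -> Iterator[str]:
    """Yield color-space FASTQ records, raising ValueError on any mismatch."""
    pairs = zip_longest(read_records(cs_lines, ""), read_records(qual_lines, " "))
    for cs_rec, qual_rec in pairs:
        if cs_rec is None or qual_rec is None:
            raise ValueError("csfasta and qual files have different numbers of records")
        (cs_name, seq), (qual_name, qual_text) = cs_rec, qual_rec
        if cs_name != qual_name:
            raise ValueError(f"read names differ: {cs_name!r} vs {qual_name!r}")
        quals = qual_to_phred33(qual_text)
        n_colors = len(seq) - 1  # the first character is the primer base
        if len(quals) != n_colors:
            raise ValueError(f"{cs_name}: {n_colors} colors but {len(quals)} QVs")
        yield f"@{cs_name}\n{seq}\n+\n{quals}\n"


def main() -> None:
    if len(sys.argv) != 3:
        print("usage: python csfasta_to_fastq.py reads.csfasta reads.qual > reads.fastq",
              file=sys.stderr)
        return
    with open(sys.argv[1]) as cs_handle, open(sys.argv[2]) as qual_handle:
        for fastq_record in csfasta_qual_to_fastq(cs_handle, qual_handle):
            sys.stdout.write(fastq_record)


if __name__ == "__main__":
    main()
```

Failing on the first inconsistent record, with its read name in the message, makes it easier to see which reads break a conversion than silently skipping them.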
Maybe it is not possible to get the alignment fields in Bowtie's SAM output with this kind of data?
Thanks in advance; any suggestion or comment is very welcome.
Alexis.