Old 07-12-2014, 06:13 AM   #7
pengchy
Senior Member
 
Location: China

Join Date: Feb 2009
Posts: 116

Quote:
Originally Posted by kmcarr
Pengchy,

Again, you are showing us different reads from the FastQ file and from the BAM file. To best understand what is going on you really need to show us the FastQ record and the BAM record for the SAME read(s).

Do you know what the quality encoding was in your input FastQ? You really need to have this information before you start your analysis.

What version of Tophat are you using?

When you ran Tophat did you use a command line parameter to tell Tophat what q-score encoding was used for the input FastQ or did you leave it as default?
Hi kmcarr, thank you for the reply.

Actually, I did my work in the following steps:
1. I mapped the original FastQ reads, whose qualities are in Solexa 1.3 encoding, for example:
Code:
@FCC22UBACXX:2:2112:11384:88500#ACACGCGG/1
ATATTGCTTCTATTTCGGTTTTGTTCAAGCGTTGACCGTTGCAGGCGCT
+
a_aeeeeegggggiiiiWeghiighhhiiiihiiiiiiihhhiiihihi
This read was mapped by tophat2 with the parameter --solexa1.3-quals.

2. Then the tophat2 output file "unmapped.bam" was converted to FastQ format with bedtools-2.17.0/bin/bamToFastq. The output record for this read is:
Code:
@FCC22UBACXX:2:2112:11384:88500#ACACGCGG
ATATTGCTTCTATTTCGGTTTTGTTCAAGCGTTGACCGTTGCAGGCGCT
+
B@BFFFFFHHHHHJJJJ8FHIJJHIIIJJJJIJJJJJJJIIIJJJIJIJ
The base quality characters in the bamToFastq output are the same as those shown by "samtools view".

3. The base quality of this FastQ is phred+33, and the reads were mapped by tophat2 with the parameters:
Code:
/path/to/tophat2 --solexa-quals --library-type fr-unstranded -o /path/to/output/directory/ /genome/index /path/to/fastq.gz
Running "samtools view" on the output BAM file gives base qualities like:
Code:
FCC22UBACXX:2:2112:11384:88500#ACACGCGG 0       AF024514.1      14      50      49M     *       0       0       ATATTGCTTCTATTTCGGTTTTGTTCAAGCGTTGACCGTTGCAGGCGCT     %$%(((((*****++++"(*+++*+++++++++++++++++++++++++       AS:i:-4 XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:MD:Z:17A1A29     YT:Z:UU NH:i:1
Here the read from the original FastQ and the read from the BAM file are the same read. I have checked many reads manually, and the records in the output BAM file all have similar quality characters.

I don't know whether this phenomenon is caused by Tophat2 or by "samtools view".

I have set --solexa-quals for tophat2 correctly.

We can check the first quality character of this read at each stage:
the original: "a" == 97
the first mapping result: "B"== 66
the second mapping result: "%" == 37

From this we can see that the first transformation reduced the ASCII value by 31 and the second by 29.
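
The same check can be done for every position, not just the first character. Below is a minimal Python sketch that uses only the three quality strings quoted in this post. The variable and function names are made up for illustration.

```python
# Quality strings copied from the three records shown in this post.
original = "a_aeeeeegggggiiiiWeghiighhhiiiihiiiiiiihhhiiihihi"    # input FastQ
first_map = "B@BFFFFFHHHHHJJJJ8FHIJJHIIIJJJJIJJJJJJJIIIJJJIJIJ"   # bamToFastq output
second_map = '%$%(((((*****++++"(*+++*+++++++++++++++++++++++++'   # samtools view


def ascii_offsets(before, after):
    """Return the sorted distinct values of ord(before[i]) - ord(after[i])."""
    return sorted({ord(b) - ord(a) for b, a in zip(before, after)})


# Distinct per-position shifts for each transformation.
print(ascii_offsets(original, first_map))
print(ascii_offsets(first_map, second_map))
```

For these strings the first list contains only 31, which is what a phred+64 to phred+33 conversion would give (64 - 33 = 31). The second list contains several values: 29 for the first character, but 28 for "@" to "$" and 30 for "F" to "(". So the second step does not look like a fixed subtraction.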

My guess is that "samtools view" always transforms the ASCII quality values of the input BAM by subtracting some value.
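
Another possibility can be tested directly against the strings above. It is only a hypothesis, not a statement about Tophat2 or samtools internals. Suppose the phred+33 characters from step 2 are read as Solexa+64 values and converted with the standard Solexa-to-Phred formula Q_phred = 10*log10(10^(Q_solexa/10) + 1). The result can then be compared with the step 3 string. The function name and the round-to-nearest choice in this Python sketch are assumptions for illustration.

```python
import math


def solexa64_to_phred33(quals):
    """Read each character as Solexa+64 and re-encode it as Phred+33."""
    out = []
    for ch in quals:
        q_solexa = ord(ch) - 64                               # may be negative
        q_phred = 10 * math.log10(10 ** (q_solexa / 10) + 1)  # standard formula
        out.append(chr(int(q_phred + 0.5) + 33))              # round to nearest (assumed)
    return "".join(out)


step2 = "B@BFFFFFHHHHHJJJJ8FHIJJHIIIJJJJIJJJJJJJIIIJJJIJIJ"    # bamToFastq output
step3 = '%$%(((((*****++++"(*+++*+++++++++++++++++++++++++'    # samtools view

converted = solexa64_to_phred33(step2)
print(converted)
print(converted == step3)
```

If the two strings match, the change would come from how the qualities were interpreted during the step 3 mapping, not from a fixed offset applied by "samtools view".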