View Single Post
Old 07-12-2014, 06:32 AM   #8
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

Quote:
Originally Posted by pengchy View Post
Hi kmcarr, thank you for the reply.

Actually, I have done my work in the steps:
1. mapped fastq of the reads with quality solexa 1.3, like:
Code:
@FCC22UBACXX:2:2112:11384:88500#ACACGCGG/1
ATATTGCTTCTATTTCGGTTTTGTTCAAGCGTTGACCGTTGCAGGCGCT
+
a_aeeeeegggggiiiiWeghiighhhiiiihiiiiiiihhhiiihihi
the read was mapped by tophat2 with parameters --solexa1.3-quals

2. Then the output file "unmapped.bam" of tophat2 were converted to fastq format by bedtools-2.17.0/bin/bamToFastq, the output file of this read is:
Code:
@FCC22UBACXX:2:2112:11384:88500#ACACGCGG
ATATTGCTTCTATTTCGGTTTTGTTCAAGCGTTGACCGTTGCAGGCGCT
+
B@BFFFFFHHHHHJJJJ8FHIJJHIIIJJJJIJJJJJJJIIIJJJIJIJ
the base quality characters of bamToFastq output is the same to the "samtools view".

3. The base quality is phred+33, the reads were mapped by tophat2 with the parameters:
Code:
/path/to/tophat2 --solexa-quals --library-type fr-unstranded -o /path/to/output/directory/ /genome/index /path/to/fastq.gz
The "samtools view " of the output bam file give the following base quality like
Code:
FCC22UBACXX:2:2112:11384:88500#ACACGCGG 0       AF024514.1      14      50      49M     *       0       0       ATATTGCTTCTATTTCGGTTTTGTTCAAGCGTTGACCGTTGCAGGCGCT     %$%(((((*****++++"(*+++*+++++++++++++++++++++++++       AS:i:-4 XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:MD:Z:17A1A29     YT:Z:UU NH:i:1
Here the read from the original fastq and bam file is same. I have checked many reads manually, the output bam file all have the similar quality characters.

I don't know this phenomenon is caused by Tophat2 or "samtools view".

I have correctly set the --solexa-quals for tophat2 (No, this is incorrect).

We can have a check for the first character of these reads:
the original: "a" == 97
the first mapping result: "B"== 66
the second mapping result: "%" == 37

We can got that the first transformation has minus the ASCII by 31 and the second by 29.

I guess the "samtools view" always transform the input bam ASCII by minus some value.
So if I am following your logic you:

1. Map your original reads to some reference with Tophat2.

2. Extract the unmapped reads from unmapped.bam to a new FastQ

3. Map the unmapped reads to some reference with Tophat2. I am going to assume that this mapping is to some different reference since trying to align unmapped reads back to the same reference wouldn't make sense.

I have highlighted the problem in red above. You correctly state "The base quality is phred+33", but then you incorrectly set "--solexa-quals" in your tophat command, which is telling tophat that your FastQ file is phred+64. Drop "--solexa-quals" from your second tophat alignment (step 3). Phred+33 is the default for tophat2 so there is no command line option used to specify it.
kmcarr is offline   Reply With Quote