Old 07-11-2014, 04:20 AM   #3
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

Quote:
Originally Posted by pengchy
fastq
Code:
@FCC22UBACXX:2:1101:1463:2233#ACACGCGG/1
TGGTCTTCTAAATATTGTCTGAGGGCTCCGTAAGCCTGTGTTTTAGCAC
+
___acdeegee[efghgbbhgfdehhhfffffhZa^fggbaefhhghhh
After TopHat2 mapping, the base qualities were changed to:
Code:
FCC22UBACXX:2:1115:5744:8409#ACACGCGG   256     EQ110773        344     3       49M     *       0       0       ATAAAACCCGACAAAAGCTGTTCGGAAAGCTCTACGGGCTCGACCGGCA     CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJHFDDD       AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:MD:Z:49  YT:Z:UU NH:i:2  CC:Z:EQ123351   CP:i:3745       HI:i:0
Also, if the input was Solexa-quality, the base qualities in the output BAM will be 0-40, which will influence the downstream analysis.

Any idea?
Thank you.
TopHat has done nothing to change the encoding. Quality scores in BAM files are stored as numeric values, not as ASCII characters the way they are in FASTQ files. When you view the contents of a BAM file with samtools view, those numbers are converted to ASCII characters for display, and the display always uses the Sanger Phred+33 encoding. The only thing that matters is that, when you use FASTQ files as input, you correctly identify which encoding that FASTQ uses. The correct Q-score numbers will then be stored in the BAM, and everything downstream will work fine.
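To make the difference between the stored numbers and the displayed characters concrete, here is a minimal Python sketch. It assumes the FASTQ quality string quoted above is Phred+64 encoded (its characters fall in the range that encoding produces); the helper names decode_quals and encode_quals are made up for illustration and are not part of TopHat or samtools. Note that the SAM line quoted above belongs to a different read, so its quality string is not expected to match this output.

```python
def decode_quals(qual_string, offset):
    """Turn an ASCII quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in qual_string]


def encode_quals(scores, offset=33):
    """Turn integer Phred scores back into an ASCII string (Phred+33 by default)."""
    return "".join(chr(q + offset) for q in scores)


# Quality line from the FASTQ record in the quoted post (assumed Phred+64).
fastq_qual = "___acdeegee[efghgbbhgfdehhhfffffhZa^fggbaefhhghhh"

# These integers are what is conceptually stored in the BAM.
scores = decode_quals(fastq_qual, offset=64)

# This is how the same scores look when displayed as Phred+33 text.
sam_qual = encode_quals(scores, offset=33)

print(scores[:5])
print(sam_qual)
```

The numeric scores are identical before and after; only the ASCII offset used to print them differs. If the input encoding were misidentified (for example, decoding this string with offset 33), the stored numbers themselves would be wrong, which is the case to avoid.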