#1 |
Senior Member
Location: China Join Date: Feb 2009
Posts: 116
fastq
Code:
@FCC22UBACXX:2:1101:1463:2233#ACACGCGG/1
TGGTCTTCTAAATATTGTCTGAGGGCTCCGTAAGCCTGTGTTTTAGCAC
+
___acdeegee[efghgbbhgfdehhhfffffhZa^fggbaefhhghhh

Code:
FCC22UBACXX:2:1115:5744:8409#ACACGCGG 256 EQ110773 344 3 49M * 0 0 ATAAAACCCGACAAAAGCTGTTCGGAAAGCTCTACGGGCTCGACCGGCA CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJHFDDD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:MD:Z:49 YT:Z:UU NH:i:2 CC:Z:EQ123351 CP:i:3745 HI:i:0

Any idea? Thank you.
#2 |
Senior Member
Location: Montreal Join Date: May 2013
Posts: 367
Your argument that TopHat has changed the quality scores would be more convincing if you posted the same sequence before and after alignment with TopHat.
You've posted two different sequences, with obviously different quality scores. Can you post the same sequence, before and after alignment with TopHat?

TopHat2 does have optional arguments to handle quality scores in the Solexa scale, incidentally:

--solexa-quals
    Use the Solexa scale for quality values in FASTQ files.
--solexa1.3-quals
    As of the Illumina GA pipeline version 1.3, quality scores are encoded in Phred-scaled base-64. Use this option for FASTQ files from pipeline 1.3 or later.
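For readers unsure which of these options applies to their data, the offset can often be guessed from the range of ASCII codes in the quality lines. The following is a minimal Python sketch and not part of TopHat. The function name `guess_offset` and the cut-offs 59 and 74 (taken from the usual Illumina-era score ranges) are assumptions, and a small sample of reads can be genuinely ambiguous.

```python
def guess_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) of FASTQ quality strings.

    Heuristic only: assumes Illumina-style score ranges.
    Returns None when the observed range fits both offsets.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Below ';' (59): too low for a +64 encoding (Solexa+64 bottoms out at 59).
        return 33
    if highest > 74:
        # Above 'J' (74): higher than the Phred+33 scores Illumina pipelines of that era emitted.
        return 64
    return None


# Quality strings quoted in this thread
print(guess_offset(["___acdeegee[efghgbbhgfdehhhfffffhZa^fggbaefhhghhh"]))  # 64
print(guess_offset(["%$%(((((*****++++"]))  # 33
```

Passing many reads at once makes the guess more reliable, since a single high-quality read may fall entirely inside the overlapping range and return `None`.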
#3 | |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
#4 |
Senior Member
Location: Montreal Join Date: May 2013
Posts: 367
Very informative answer, kmcarr.
Much better than mine.
#5 | |
Senior Member
Location: China Join Date: Feb 2009
Posts: 116
Another question: when I use a FASTQ with base qualities like this:
Code:
@FCC22UBACXX:2:1101:1463:2233#ACACGCGG
TGGTCTTCTAAATATTGTCTGAGGGCTCCGTAAGCCTGTGTTTTAGCAC
+
@@@BDEFFHFF<FGHIHCCIHGEFIIIGGGGGI;B?GHHCBFGIIHIII

Code:
FCC22UBACXX:2:2112:11384:88500#ACACGCGG 0 AF024514.1 14 50 49M * 0 0 ATATTGCTTCTATTTCGGTTTTGTTCAAGCGTTGACCGTTGCAGGCGCT %$%(((((*****++++"(*+++*+++++++++++++++++++++++++ AS:i:-4 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:MD:Z:17A1A29 YT:Z:UU NH:i:1
FCC22UBACXX:2:1313:19247:88511#ACACGCGG 0 AF024514.1 15 50 49M * 0 0 TATTGCTTCTATTTCGATTTTGTTCAAGCGTTGACCGTTGCAGGCGCTT $$$(((((&((&(*))'%(**+&(*++*&)''')&)+++%()))'&()% AS:i:-2 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:MD:Z:18A30 YT:Z:UU NH:i:1
#6 | |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
Again, you are showing us different reads from the FastQ file and from the BAM file. To best understand what is going on, you really need to show us the FastQ record and the BAM record for the SAME read(s).

Do you know what the quality encoding was in your input FastQ? You really need to have this information before you start your analysis. What version of Tophat are you using? When you ran Tophat, did you use a command line parameter to tell Tophat what q-score encoding was used for the input FastQ, or did you leave it as default?
#7 | |
Senior Member
Location: China Join Date: Feb 2009
Posts: 116
Actually, I did my work in the following steps:

1. I mapped the FASTQ reads, which have Solexa 1.3 quality encoding, like:
Code:
@FCC22UBACXX:2:2112:11384:88500#ACACGCGG/1
ATATTGCTTCTATTTCGGTTTTGTTCAAGCGTTGACCGTTGCAGGCGCT
+
a_aeeeeegggggiiiiWeghiighhhiiiihiiiiiiihhhiiihihi

2. The tophat2 output file "unmapped.bam" was then converted to FASTQ format with bedtools-2.17.0/bin/bamToFastq. The output for this read is:
Code:
@FCC22UBACXX:2:2112:11384:88500#ACACGCGG
ATATTGCTTCTATTTCGGTTTTGTTCAAGCGTTGACCGTTGCAGGCGCT
+
B@BFFFFFHHHHHJJJJ8FHIJJHIIIJJJJIJJJJJJJIIIJJJIJIJ

3. The base quality is phred+33. The reads were mapped by tophat2 with the parameters:
Code:
/path/to/tophat2 --solexa-quals --library-type fr-unstranded -o /path/to/output/directory/ /genome/index /path/to/fastq.gz

Code:
FCC22UBACXX:2:2112:11384:88500#ACACGCGG 0 AF024514.1 14 50 49M * 0 0 ATATTGCTTCTATTTCGGTTTTGTTCAAGCGTTGACCGTTGCAGGCGCT %$%(((((*****++++"(*+++*+++++++++++++++++++++++++ AS:i:-4 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:MD:Z:17A1A29 YT:Z:UU NH:i:1

I don't know whether this phenomenon is caused by Tophat2 or by "samtools view". I have correctly set --solexa-quals for tophat2. We can check the first quality character of this read:
the original: "a" == 97
the first mapping result: "B" == 66
the second mapping result: "%" == 37
So the first transformation subtracted 31 from the ASCII value and the second subtracted 29. I guess "samtools view" always shifts the ASCII values of the input BAM down by some amount.
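The same check can be extended from the first character to every base of the read. The following is a small Python sketch using the three quality strings quoted in this post. The helper name `shifts` and the variable names are only illustrative.

```python
original = "a_aeeeeegggggiiiiWeghiighhhiiiihiiiiiiihhhiiihihi"      # input FASTQ (step 1)
after_first = "B@BFFFFFHHHHHJJJJ8FHIJJHIIIJJJJIJJJJJJJIIIJJJIJIJ"   # bamToFastq output (step 2)
after_second = '%$%(((((*****++++"(*+++*+++++++++++++++++++++++++'  # SAM record (step 3)


def shifts(before, after):
    """Return the set of per-base ASCII differences between two quality strings."""
    return {ord(b) - ord(a) for b, a in zip(before, after)}


first_shift = shifts(original, after_first)
second_shift = shifts(after_first, after_second)
print(sorted(first_shift))
print(sorted(second_shift))
```

For this read the first pass gives a single shift of 31 at every base, while the second pass gives shifts ranging from 22 to 31 depending on the score. That pattern suggests the second change is a rescaling of the scores and not a fixed subtraction.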
#8 | |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
1. Map your original reads to some reference with Tophat2.
2. Extract the unmapped reads from unmapped.bam to a new FastQ.
3. Map the unmapped reads to some reference with Tophat2.

I am going to assume that this mapping is to some different reference, since trying to align unmapped reads back to the same reference wouldn't make sense.

I have highlighted the problem in red above. You correctly state "The base quality is phred+33", but then you incorrectly set "--solexa-quals" in your tophat command, which tells tophat that your FastQ file is in the Solexa (+64) encoding rather than phred+33. Drop "--solexa-quals" from your second tophat alignment (step 3). Phred+33 is the default for tophat2, so there is no command line option used to specify it.
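The quality characters in the second SAM record are consistent with this diagnosis. The following Python sketch reinterprets a Phred+33 string as Solexa+64 and re-encodes it as Phred+33. It assumes the standard Solexa-to-Phred relation and nearest-integer rounding; it is not TopHat's actual code, and `solexa64_to_phred33` is a hypothetical name.

```python
import math


def solexa64_to_phred33(quality):
    """Reinterpret a quality string as Solexa+64 and re-encode it as Phred+33."""
    converted = []
    for ch in quality:
        q_solexa = ord(ch) - 64
        # Standard Solexa-to-Phred relation:
        # Q_phred = 10 * log10(10^(Q_solexa / 10) + 1)
        q_phred = 10 * math.log10(10 ** (q_solexa / 10) + 1)
        # Rounding to the nearest integer is an assumption of this sketch.
        converted.append(chr(round(q_phred) + 33))
    return "".join(converted)


# Phred+33 string from the bamToFastq output, misread as Solexa+64
print(solexa64_to_phred33("B@BFFFFFHHHHHJJJJ8FHIJJHIIIJJJJIJJJJJJJIIIJJJIJIJ"))
```

Under these assumptions the output begins with `%$%(((((*****++++"(*+++*` and continues with `+` to the end of the read, which matches the quality string in the SAM record from the second alignment. Applying the same function to the start of the original read, `a_a`, gives `B@B`, matching the first alignment, where the Solexa and Phred scales barely differ at high scores.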
#9 | |
Senior Member
Location: China Join Date: Feb 2009
Posts: 116
Your suggestion is very useful and helped me a lot.

Best,
Pengcheng