#1
Member
Location: Seattle
Join Date: Jul 2011
Posts: 98
I extracted some fastq files from sra files. Here are some lines from one of them:
Code:
@SRR034473.2 X8097_104:6:1:881:909 length=39
TCAAAAAATGAAGAAGAAGAAAAAAATGAAAAGGGTGCA
+SRR034473.2 X8097_104:6:1:881:909 length=39
CC<7C<<CCC<<C<7C<?CC<<,;,7CC6:?0??:(??.
@SRR034473.3 X8097_104:6:1:900:876 length=39
TGAAGTTCTTGTGGTTCAACCAAGTGTATTGCCAGTACT
+SRR034473.3 X8097_104:6:1:900:876 length=39
C<?<CCCC77CCC?4?C<4C<<$C,C47?C<??7*<(44
@SRR034473.4 X8097_104:6:1:905:908 length=39
TTGATGTGACTTGAAGGCTTCATCTCCTTTTTAGTGATT
+SRR034473.4 X8097_104:6:1:905:908 length=39
CCCCCCCCACCCA??CAACAA-CCCA?ACCAA6<A7+??

1. Why are there no #s or Bs?
2. How can I figure out what scoring system was used here?
3. With what I consider a more normal fastq file (with lines like those pasted below), what is the best way to figure out what scoring system is being used?

Code:
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

Thanks.
Eric
#2
Devon Ryan
Location: Freiburg, Germany
Join Date: Jul 2011
Posts: 3,480
1. Maybe they quality controlled prior to uploading?
2. Look for numbers in the quality score line. If it has numbers, it's Phred+33, since the digits (ASCII 48 to 57) sit below the range used by both Phred+64 and Solexa+64.
3. As above, look for numbers in the quality scores. You could look for symbols too, but I always forget which ones come before the capital letters in ASCII and which ones are between the capital and lower case letters. Phred+64 and Solexa scores will look similar, except that Solexa can also contain the characters ; < = > ? (ASCII 59 to 63), which never occur in Phred+64.
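The rule of thumb above can be sketched in Python. This is a minimal heuristic, assuming an uncompressed 4-line FASTQ with no wrapped sequence or quality lines (like the records in the first post); the function names and the labels "phred33", "solexa64" and "phred64" are made up for this example. It looks only at the lowest ASCII code in the quality lines: anything below ';' (59), digits included, can only be Phred+33, ';' to '?' points to Solexa+64, and '@' and above is reported as Phred+64.

```python
import sys


def quality_lines(lines):
    """Yield the quality line (4th line) of each record in a 4-line FASTQ."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\r\n")


def guess_quality_encoding(quality_strings):
    """Heuristically guess the quality encoding from the lowest ASCII code seen.

    Phred+33  uses ASCII 33 and up ('!' upward).
    Solexa+64 uses ASCII 59 and up (';' upward, Solexa scores from -5).
    Phred+64  uses ASCII 64 and up ('@' upward).
    """
    lowest = min((ord(c) for q in quality_strings for c in q), default=None)
    if lowest is None:
        raise ValueError("no quality characters to inspect")
    if lowest < 59:
        # Below ';' (this includes the digits 0-9): only Phred+33 reaches here.
        return "phred33"
    if lowest < 64:
        # ';' '<' '=' '>' '?' occur in Solexa+64 but never in Phred+64.
        # Caveat: a Phred+33 file whose worst base is Q26-Q30 also lands here.
        return "solexa64"
    # Only '@' and above: Phred+64, although a Phred+33 file containing
    # nothing below Q31 would look the same.
    return "phred64"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print(guess_quality_encoding(quality_lines(handle)))
```

Usage would be something like `python guess_encoding.py reads.fastq` (both file names hypothetical). Scanning many reads makes the guess more reliable, because a small sample of high-quality Phred+33 reads may never dip below ';' or '@'.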
#3
Member
Location: US
Join Date: Sep 2012
Posts: 91
I'm sure you saw this on the SRA website, but for reference
Quote:
(http://www.bioinformatics.babraham.a...ojects/fastqc/)

Edit: And http://maq.sourceforge.net/fastq.shtml
Quote:
Last edited by winsettz; 02-01-2013 at 08:29 AM.
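Once the encoding is known, the characters can be turned into numeric scores. Below is a minimal Python sketch; the function name and the encoding labels are hypothetical, and Solexa values are mapped onto the Phred scale with the standard conversion Q_phred = 10 * log10(10^(Q_solexa / 10) + 1), rounded to the nearest integer. It also shows why '#' (Phred+33) and 'B' (Phred+64) are singled out in the first post: both decode to Q2.

```python
import math


def decode_qualities(quality, encoding):
    """Convert a FASTQ quality string to a list of integer Phred scores.

    encoding is one of "phred33", "phred64" or "solexa64" (labels chosen
    for this sketch). Solexa+64 scores are converted to the Phred scale.
    """
    if encoding == "phred33":
        return [ord(c) - 33 for c in quality]
    if encoding == "phred64":
        return [ord(c) - 64 for c in quality]
    if encoding == "solexa64":
        # Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1)
        return [round(10 * math.log10(10 ** ((ord(c) - 64) / 10) + 1))
                for c in quality]
    raise ValueError(f"unknown encoding: {encoding!r}")


if __name__ == "__main__":
    # First five quality characters of SRR034473.2 from the first post.
    print(decode_qualities("CC<7C", "phred33"))  # [34, 34, 27, 22, 34]
```

For high scores the Solexa and Phred scales nearly coincide, so the conversion mainly matters for low-quality bases.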
#4
Member
Location: Seattle
Join Date: Jul 2011
Posts: 98
Thanks very much for the help. These suggestions are really useful.
Best wishes,
Eric