
**Chipper** · 04-23-2008, 10:14 AM

I am interested in this also. Have you compared quality scores for the misaligned bases at different positions in the reads?

I guess the sequences to use depend on the application: if you are only counting aligned positions (ChIP-seq, transcriptome, etc.), it doesn't matter if the last part is crap as long as the alignment is correct.

**apfejes** · 04-23-2008, 11:03 AM

We often run multiple Eland alignments, using read lengths of 32, 29, 26 and 23 (or some variation of the above), and then identify the longest read length at which we get a unique match. This somewhat ignores your question.
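The "longest unique match" selection described above can be sketched roughly as follows. This is only an illustration: the dictionary of hit counts per trimmed length is a hypothetical structure, not Eland's actual output format, and the numbers are made up.

```python
def longest_unique_length(hit_counts):
    """Return the longest read length with exactly one alignment hit, or None.

    hit_counts maps a trimmed read length to the number of genomic positions
    reported for the read at that length (hypothetical structure, not an
    actual Eland output format).
    """
    unique_lengths = [length for length, hits in hit_counts.items() if hits == 1]
    return max(unique_lengths) if unique_lengths else None


# Made-up example: no match at 32 or 29, unique at 26, ambiguous at 23.
example = {32: 0, 29: 0, 26: 1, 23: 3}
print(longest_unique_length(example))
```

In this toy case the read would be kept at length 26, the longest length that still gives a single hit.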

As Chipper pointed out, this is really a strategy that's applicable for ChIP-seq or possibly transcriptome data. I wouldn't apply it to all analyses.

There's a formula for converting the quality score to a probability, though, which you could use to figure out what probability you're comfortable with.

I believe it is P = 1 / (1 + 10^(-Q/10))

You might want to confirm that, however, before you do anything with it. (I only obtained the equation second hand.)
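For anyone who wants to try the numbers, here is a minimal sketch of the formula exactly as quoted above. The function name is made up for illustration. Taken at face value, the formula gives 0.5 at Q = 0 and approaches 1 as Q grows, so P reads as the probability that the base call is correct; as noted, it is worth confirming against the pipeline documentation before relying on it.

```python
def prob_from_quality(q):
    """P = 1 / (1 + 10^(-Q/10)), the formula as quoted in the post above."""
    return 1.0 / (1.0 + 10.0 ** (-q / 10.0))


# Print P for a few scores across the -40 to 40 range.
for q in (-40, -5, 0, 10, 20, 40):
    print(q, round(prob_from_quality(q), 4))
```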

**chris** · 04-24-2008, 12:31 AM

Thanks for the replies.

Chipper. No, I haven't looked at the scores for misaligned bases, as in our analysis so far we only looked at two mismatches. It appears that the poor quality is concentrated at the 3' end, over up to 6 nt, so our analysis doesn't find matches with these.

apfejes. Do you mean the same formula as the one for converting Solexa scores to Phred scores? It is shown here: http://maq.sourceforge.net/fastq.shtml

I think the best course of action will be to test various cut-offs and see what I get. I'll post back here if I get anything useful.
Cheers.

**apfejes** · 04-24-2008, 08:06 AM

chris,

The formula on that page is related, but not identical. (Obviously, since they're both performing very similar transformations.) However, I was referring to the older format prb files, which contain values between -40 and 40, whereas the version you've indicated is used in the new eland pipeline. (I can't recall version numbers for them off hand.)

If your probabilities are displayed in a format consistent with what's on that page, however, then it's most likely the correct format to use. If you are using the old-style prb files, where each base is represented by four numbers, then the version I've written above is more likely to be correct.

Cheers,
Anthony

**chris** · 04-25-2008, 12:14 AM

Hi Anthony,

The scores I have range from -5 to 40 which I believe is the current Solexa Genome Analyser quality score range, so I guess I'll stick with the 'new' formula.
Thanks,

Chris.

**chris** · 04-28-2008, 07:52 AM

Right. A quick looksee of the raw Solexa quality scores at a variety of cut-offs gives:

Code:

Q Cut-off    Frequency
0            584079
5            641244
10           406655
15           179174
20            63783
25            20300
30             6389
35             3454

The frequency counts are for the number of sequence reads whose quality scores are *all* above the cut-off. Each sequence is only counted once and binned at the highest cut-off which it satisfies.
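The binning described above could be reproduced with something like the sketch below. It is not the script used for the table: the function names and toy reads are invented, and it assumes "above the cut-off" means greater than or equal to it. Reads whose worst score is below the lowest cut-off are left uncounted.

```python
from collections import Counter

CUTOFFS = (0, 5, 10, 15, 20, 25, 30, 35)


def highest_cutoff(scores, cutoffs=CUTOFFS):
    """Return the highest cut-off that every score in the read meets, or None.

    Assumption: a read satisfies a cut-off when its lowest score is >= it.
    """
    worst = min(scores)
    passed = [c for c in cutoffs if worst >= c]
    return max(passed) if passed else None


def bin_reads(reads):
    """Count each read once, in the bin of the highest cut-off it satisfies."""
    counts = Counter()
    for scores in reads:
        cutoff = highest_cutoff(scores)
        if cutoff is not None:
            counts[cutoff] += 1
    return counts


# Toy reads (invented scores), one list of per-base scores per read.
toy_reads = [[40, 40, 12], [7, 35, 22], [-5, 40, 40], [36, 38, 40]]
for cutoff, n in sorted(bin_reads(toy_reads).items()):
    print(cutoff, n)
```

Because each read is binned by its single worst base, one poor call near the 3' end is enough to drop an otherwise good read into a low bin.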

I'm a bit worried, as the majority of the data has a quality score of <10. A Solexa score of 10 is equivalent to a Phred score of 10.4, or roughly 90% accuracy.

Does anyone else get this kind of quality or is this really a bad run?
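The Phred figure quoted above can be checked with the Solexa-to-Phred conversion, Q_phred = 10 * log10(10^(Q_solexa/10) + 1), together with the usual Phred definition of error probability, 10^(-Q/10). A small sketch, with function names chosen only for illustration:

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled score to a Phred-scaled score."""
    return 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)


def phred_to_error(q_phred):
    """Error probability implied by a Phred-scaled score."""
    return 10.0 ** (-q_phred / 10.0)


q = solexa_to_phred(10)
print(round(q, 1))                      # 10.4
print(round(1 - phred_to_error(q), 3))  # 0.909
```

The two scales agree closely at high scores and diverge only at the low end, which is where most of this data sits.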

**bioinfosm** · 04-28-2008, 09:01 AM

Chris,
Is this data from all 8 lanes? Did you convert the prb values back from Solexa to Q values? By *all*, do you mean the entire 36bp read?
Let me know and I can get similar quality scores for the data.

**chris** · 04-29-2008, 12:43 AM

I'm not exactly sure. This data is kind of second hand and from a file called 's_3_sequence.txt'. There are 2.2M reads in the form:

Code:

HWI-EAS111_2:3:17:156:119:AGTGAGGTAGTAGATTGTATAGTTTCGTATGCC:23 40 40 40 40 40 40 40 26 40 40 40 40 21 40 40 40 40 40 40 40 33 40 40 40 31 29 40 11 38 7 35 22

And these are all 33bp reads.
Thanks for your help bioinfosm
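A line in the form shown above can be split into its parts with a few lines of Python. This is a sketch that assumes the layout seen in the example: the last colon-separated field holds the space-separated scores, the field before it is the sequence, and everything earlier is the read identifier. The function name is invented.

```python
def parse_read(line):
    """Split one line into (read_id, sequence, scores).

    Assumes the last colon-separated field is the score list, the one before
    it is the sequence, and everything earlier is the read identifier.
    """
    read_id, sequence, score_field = line.strip().rsplit(":", 2)
    scores = [int(s) for s in score_field.split()]
    if len(scores) != len(sequence):
        raise ValueError("score count does not match sequence length")
    return read_id, sequence, scores


# The example read from the post above.
line = ("HWI-EAS111_2:3:17:156:119:AGTGAGGTAGTAGATTGTATAGTTTCGTATGCC:"
        "23 40 40 40 40 40 40 40 26 40 40 40 40 21 40 40 40 40 40 40 40 "
        "33 40 40 40 31 29 40 11 38 7 35 22")
read_id, sequence, scores = parse_read(line)
print(read_id, len(sequence), min(scores))
```

For this read the lowest score is 7, so under an "all bases above the cut-off" rule it would pass a cut-off of 5 but not 10.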


Quality score threshold?
