
**mrawlins** · 08-16-2010, 01:03 PM

I count about 21 bases that are not N's in that sequence. You may not have enough bases for a unique match to your genome (it depends on the genome size). A quality character of '%' is the fifth-lowest one possible with an offset of 33 ('!' is the lowest), so it is most likely either a 4 (if the scale starts at 0, as standard Phred+33 does) or a -1 (if it starts at -5, as Solexa scores do).
Personally I would throw this read out, because most of the bases aren't called and none of them have reasonable scores.
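To make the arithmetic in this post concrete, here is a small Python sketch (an illustration, not code from the thread; all function names are hypothetical). `phred33_scores` decodes a quality string using the standard offset of 33, and `expected_chance_hits` estimates how many places a read with k called bases would match purely by chance, using the usual approximation 2G/4^k. That approximation assumes a uniformly random genome of size G searched on both strands, exact matches only, and N positions acting as wildcards; real genomes are not random, so treat the result as a rough guide.

```python
# Sketch: decode Phred+33 quality characters and estimate chance matches.
# Assumes a uniformly random genome, both strands, exact matching only.
from __future__ import annotations

PHRED33_OFFSET = 33  # '!' (ASCII 33) encodes Q0 in Phred+33


def phred33_scores(qual: str) -> list[int]:
    """Convert a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - PHRED33_OFFSET for ch in qual]


def called_bases(seq: str) -> int:
    """Count positions that are not ambiguous 'N' calls."""
    return sum(1 for base in seq.upper() if base != "N")


def expected_chance_hits(k: int, genome_size: int) -> float:
    """Expected random exact matches for k fixed bases: 2 * G / 4**k.

    The factor 2 covers both strands; N positions are treated as
    wildcards, so only the k called bases constrain the match.
    """
    return 2 * genome_size / 4 ** k


print(phred33_scores("%%%!"))                                # [4, 4, 4, 0]
print(called_bases("NNNACGTNN"))                             # 4
print(round(expected_chance_hits(21, 3_100_000_000), 4))     # 0.0014
print(round(expected_chance_hits(15, 3_100_000_000), 1))     # 5.8
```

For a genome of about 3.1 × 10^9 bases (roughly human-sized), 21 called bases give about 0.0014 expected chance hits while 15 give almost 6, which is why the answer depends on genome size. Allowing mismatches during alignment raises these numbers considerably.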

**days369** · 08-16-2010, 01:29 PM

Hi mrawlins,

Thanks for answering. Do you know what quality-score cutoff people commonly use to trim sequences?

**mrawlins** · 08-16-2010, 02:12 PM

I don't know what scores people use to trim or reject reads. We use SOLiD machines, so base calling is done differently than on Solexa and the scores are different; for one thing, we never see N's. I would probably throw out any read that didn't have at least 20 contiguous base calls and 25 base calls in total (though to be safe I might require at least 25 contiguous base calls). A read that long is unlikely to match the genome by random chance, so if a low-quality read is mis-called it will most likely just fail to map to the genome.
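The rule of thumb above can be written as a simple read filter. This is a minimal Python sketch under stated assumptions: reads are plain strings, missing calls are marked with 'N' (configurable through the hypothetical `no_call` parameter, since other platforms may use a different no-call symbol), and the 20/25 thresholds are defaults that can be tightened to 25 contiguous calls as suggested.

```python
# Sketch of the "20 contiguous / 25 total called bases" read filter.
# The function and parameter names are hypothetical, for illustration only.


def longest_called_run(seq: str, no_call: str = "N") -> int:
    """Length of the longest stretch of consecutive called bases."""
    best = current = 0
    for base in seq.upper():
        if base in no_call:
            current = 0  # a no-call breaks the run
        else:
            current += 1
            best = max(best, current)
    return best


def keep_read(seq: str, min_run: int = 20, min_total: int = 25,
              no_call: str = "N") -> bool:
    """Keep a read only if it has at least `min_run` contiguous calls
    and at least `min_total` calls overall."""
    total = sum(1 for base in seq.upper() if base not in no_call)
    return longest_called_run(seq, no_call) >= min_run and total >= min_total


# 20 calls, 10 N's, then 5 more calls: passes 20/25 but not a 25-base run
print(keep_read("A" * 20 + "N" * 10 + "A" * 5))              # True
print(keep_read("A" * 20 + "N" * 10 + "A" * 5, min_run=25))  # False
```

Checking the contiguous run separately from the total matters because a read with many scattered calls can pass a total-count test while offering no stretch long enough for a seed-based aligner to anchor on.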

**Torst** · 08-16-2010, 08:19 PM

Originally posted by mrawlins

I would probably throw out any read that didn't have at least 20 contiguous base calls and 25 base calls in total (though to be safe I might require at least 25 contiguous base calls). A read that long is unlikely to match the genome by random chance, so if a low-quality read is mis-called it will most likely just fail to map to the genome.

This is reasonable, BUT you have to make sure your software can actually handle ambiguous/unknown bases like 'N'. For example, some fast read aligners will NOT align a read if it contains an 'N', and some assembly software ignores them or converts them to 'A'.

We throw away every read that still contains an N after trimming from the 3' end. This usually rejects only about 1% to 5% of the total.
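The trim-then-reject approach can be sketched in Python as follows. The post does not say how the 3' trimming is done, so this assumes a simple rule: drop bases from the 3' end while their Phred+33 quality is below a hypothetical cutoff `MIN_QUAL = 20`. As an extra assumption, reads left empty by trimming are also discarded. The function returns the rejected fraction so it can be compared with the 1% to 5% figure above.

```python
# Sketch: trim low-quality bases from the 3' end, then reject reads with any N.
# MIN_QUAL is a hypothetical cutoff; the thread does not specify one.
from __future__ import annotations

MIN_QUAL = 20
PHRED33_OFFSET = 33


def trim_3prime(seq: str, qual: str, min_qual: int = MIN_QUAL) -> tuple[str, str]:
    """Remove bases from the 3' end while their Phred+33 quality is below min_qual."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - PHRED33_OFFSET < min_qual:
        end -= 1
    return seq[:end], qual[:end]


def filter_reads(reads, min_qual: int = MIN_QUAL):
    """Trim each (seq, qual) pair, then drop reads that are empty or contain an N.

    Returns the kept reads and the fraction of reads rejected.
    """
    kept = []
    total = 0
    for seq, qual in reads:
        total += 1
        tseq, tqual = trim_3prime(seq, qual, min_qual)
        if not tseq or "N" in tseq.upper():
            continue  # rejected: nothing left, or an internal N survived trimming
        kept.append((tseq, tqual))
    rejected = (total - len(kept)) / total if total else 0.0
    return kept, rejected


# The trailing low-quality N is trimmed; the internal N causes rejection.
kept, rejected = filter_reads([("ACGTN", "IIII#"), ("ACNGT", "IIIII")])
print(kept, rejected)  # [('ACGT', 'IIII')] 0.5
```

Trimming first means trailing N's, which usually carry very low quality, are removed before the N check, so only reads with N's in their high-quality portion are thrown away.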

Do I need to trim the sequences like this?

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News