**Brugger** · 07-02-2010, 05:58 AM

I assume that you are looking in the qseq.txt files. The last column in that file is whether the read passed the chastity filter or not. So any line with a 0 in that column should be dropped.

Run: awk '$11 == 1' FILE > NEWFILE

If you want to map the reads with bwa/bowtie/b*, you need to recalculate the quality values (QV) first; this has been covered in several older threads on SEQanswers.
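As a rough illustration of both steps (dropping reads that failed the filter and rescaling the qualities), here is a minimal Python sketch. It assumes the usual 11-column, tab-separated qseq layout with sequence, quality and filter flag in columns 9 to 11, and it assumes the qualities are Phred+64 encoded (Illumina pipeline 1.3 or later) and should end up as Phred+33. The function name `qseq_to_fastq` and the read-name layout are made up for this example; older Solexa-scaled qualities would need a different conversion.

```python
def qseq_to_fastq(qseq_lines, keep_failed=False):
    """Yield FASTQ records (4-line strings) from an iterable of qseq lines.

    Assumptions: 11 tab-separated columns, filter flag in column 11,
    Phred+64 qualities in column 10.
    """
    for line in qseq_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 11:
            continue  # skip malformed lines
        machine, run, lane, tile, x, y, index, read_no, seq, qual, flag = fields
        if flag != "1" and not keep_failed:
            continue  # read did not pass the chastity filter
        seq = seq.replace(".", "N")  # qseq writes uncalled bases as '.'
        # Phred+64 -> Phred+33: shift each character down by 31,
        # clamping anything that would fall below Q0 to '!'.
        qual33 = "".join(chr(max(ord(c) - 31, 33)) for c in qual)
        name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
        yield f"@{name}\n{seq}\n+\n{qual33}\n"
```

Usage would be something like `out.writelines(qseq_to_fastq(open("FILE")))`, with `out` an open output file.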

**pratibhamani** · 07-04-2010, 10:12 PM

hi

want to know the same too..

**simonandrews** · 07-05-2010, 12:28 AM

If you're seeing this you're probably using one of the more recent Illumina pipelines. Illumina uses a quality value of 2 (which is what B decodes to) as a way to mark the point at which it believes a read to become unreliable. This only applies to strings of B which run to the end of the read. In these cases this isn't intended as a true estimate of the error rate, but rather more as a flag to suggest you don't use these bases.
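If you do decide to drop those flagged bases, a small sketch along these lines would do it. It only removes a run of B that extends to the 3' end of the read and leaves any B inside the read alone, as described above. The function name `trim_trailing_b` is just for illustration, and it assumes Phred+64 qualities, where B corresponds to Q2.

```python
def trim_trailing_b(seq, qual, marker="B"):
    """Remove the run of `marker` qualities at the 3' end of a read.

    Returns the trimmed (sequence, quality) pair. A read whose quality
    string is all `marker` comes back empty and should be discarded
    by the caller.
    """
    kept = len(qual.rstrip(marker))  # length left after stripping trailing B's
    return seq[:kept], qual[:kept]
```

For example, `trim_trailing_b("ACGTACGT", "hhhgBBBB")` keeps only the first four bases.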

Frankly it's a bit of a pain when collecting aggregate statistics about sequence qualities as it tends to skew the distribution of Phred scores.

**pratibhamani** · 07-05-2010, 08:13 PM

Thank you simonandrews..

This is what I wanted to know. Okay so if I use these bases in assembly, say de-novo, then will it lead to mis-assembly? Should I remove these reads before I proceed or is it Okay to use them?

**BENM** · 07-06-2010, 02:08 AM

Originally posted by ritzriya:

Hello everyone,

I have come across instances or articles stating that the reads with quality as only 'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB' are meant to be LOW quality as they are called read-quality indicators. They define that it is the 3' end of the read which must be removed or filtered to prevent mis-assembly.

Is this true or is that okay to ignore?

Also, if I want to quality filter my Illumina sequence file, is there any available tool or any specific feature to recognise bad quality reads, so that I can write a custom program to do so if nothing is ready?

I know this might be a trivial question, but I am new to Illumina technology. Kindly help.

Thanks in advance!

Hi ritzriya,

I can give you a simple script for quality control, which filters out low quality reads or reads with too many N's, depending on the options you set:
-n the N content of the read, default 0.5, i.e. if over 50% of the bases are N, the read is filtered out.
-q the lowest acceptable average quality, default 20, i.e. if the average quality is lower than 20, the read is filtered out.
-sd the standard deviation of the quality values, default 10; it is used to filter out reads with fluctuating, unstable quality.
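For readers who cannot open the attachment, the three criteria can be sketched in Python roughly as follows. This is not the attached Perl script, only an illustration of the same idea. The function name `passes_qc`, the Phred+64 offset, the use of the population standard deviation, and the choice to reject reads whose standard deviation is above the threshold are all assumptions.

```python
from statistics import mean, pstdev

def passes_qc(seq, qual, max_n=0.5, min_mean_q=20, max_sd=10, offset=64):
    """Return True if a read survives the three filters.

    max_n      : maximum allowed fraction of N bases (like -n)
    min_mean_q : minimum allowed average quality (like -q)
    max_sd     : maximum allowed standard deviation of quality (like -sd)
    offset     : ASCII offset of the quality encoding (assumed Phred+64)
    """
    if not seq or len(seq) != len(qual):
        return False  # empty or inconsistent record
    n_fraction = seq.upper().count("N") / len(seq)
    scores = [ord(c) - offset for c in qual]
    return (n_fraction <= max_n
            and mean(scores) >= min_mean_q
            and pstdev(scores) <= max_sd)
```

With these defaults a read whose qualities are all B (Q2) fails the average quality test, so it would be dropped entirely rather than trimmed.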

Hope this helps.

Attached file: solexa-qc.pl (1,015 bytes)

**pratibhamani** · 07-08-2010, 08:42 PM

Problem still unsolved!

Thank you BENM for the filtering script. It will surely be useful to quality filter when necessary.

But my question remains the same: does it matter if I use reads with 'BBB' quality during assembly? Will it lead to misassembly, or won't it make much of a difference? That's all I want to know.

**BENM** · 07-08-2010, 09:04 PM

Originally posted by pratibhamani:

Thank you BENM for the filtering script. It will surely be useful to quality filter when necessary.

But my question remains the same: does it matter if I use reads with 'BBB' quality during assembly? Will it lead to misassembly, or won't it make much of a difference? That's all I want to know.

It depends on the assembly tool you use.
For Solexa reads, though, the error rate is below 1%. Sequences containing errors will therefore show up at low coverage, so common tools like VELVET can still deal with them, although VELVET does not take read quality into account.

**pratibhamani** · 07-09-2010, 11:14 PM

??

It depends on the assembly tool you use.
For Solexa reads, though, the error rate is below 1%. Sequences containing errors will therefore show up at low coverage, so common tools like VELVET can still deal with them, although VELVET does not take read quality into account.

That's exactly why I am asking this question. Because VELVET does not take the quality of the reads into account, I will have to deal with it myself before it starts processing.

It is obvious if my input is incorrect or with errors, then my output will not be pleasing enough, no matter how many times I change my kmer, right?

I hope everyone got what I have explained above..

**BENM** · 07-10-2010, 06:44 AM

Originally posted by pratibhamani:

That's exactly why I am asking this question. Because VELVET does not take the quality of the reads into account, I will have to deal with it myself before it starts processing.

It is obvious if my input is incorrect or with errors, then my output will not be pleasing enough, no matter how many times I change my kmer, right?

I hope everyone got what I have explained above..

Hello pratibhamani,

If you have enough sequencing coverage, there is no need to worry about the sequencing error rate, because a de Bruijn graph algorithm can deal with errors by weighting paths according to their coverage. This works thanks to the high throughput and quality control of NGS technologies.
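A toy Python sketch of why coverage helps: k-mers that come from a sequencing error are seen only once or twice, while true k-mers are seen about as often as the region is covered, so a simple count cutoff separates the two. This is a deliberately simplified illustration, not how VELVET is actually implemented; the function names and the cutoff are made up for the example.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(reads, k, min_count):
    """Return the k-mers seen at least `min_count` times."""
    return {kmer for kmer, n in kmer_counts(reads, k).items() if n >= min_count}

# Five correct copies of a read plus one copy with a single base error.
reads = ["ACGTAC"] * 5 + ["ACGAAC"]
print(sorted(solid_kmers(reads, k=4, min_count=2)))
```

In this example only the three k-mers from the correct reads survive the cutoff; the three k-mers created by the erroneous base each occur once and are dropped.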

If you're still anxious about it, you can compare assemblies with and without prior error correction, using data from a sequencing project with a known genome. Alternatively, you can use another tool such as SOAPdenovo (http://soap.genomics.org.cn/soapdenovo.html), which is based on the same algorithm as VELVET but has an error correction function.

As for the k-mer option of VELVET, it should be chosen according to your sequencing depth, not the error rate; see the link below:

http://www.ebi.ac.uk/~zerbino/velvet/hash_length_choice.html
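Since that page may no longer be reachable, here is the relation between sequencing depth and k-mer coverage as I recall it from the VELVET manual: Ck = C * (L - k + 1) / L, where C is the base coverage, L the read length and k the hash length. The small Python helper below just evaluates it; the function name is invented for the example and the formula should be checked against the manual before relying on it.

```python
def kmer_coverage(base_coverage, read_length, k):
    """k-mer coverage Ck = C * (L - k + 1) / L.

    base_coverage : average per-base coverage C
    read_length   : read length L
    k             : k-mer (hash) length, 1 <= k <= L
    """
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length
```

For example, 36 bp reads at 50x base coverage with k = 21 give `kmer_coverage(50, 36, 21)`, i.e. roughly 22x k-mer coverage, which shows how quickly a long k eats into the effective depth of short reads.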

You can also use the VelvetOptimiser script from the VELVET contrib package ("contrib/VelvetOptimiser-2.1.0/") to find an appropriate k-mer setting.

Hope this helps.

**pratibhamani** · 07-12-2010, 12:56 AM

Thanks!

Yes BENM. I do have a good coverage of sequencing in my case, so I need not worry about these reads. Fine.

I will surely have a look at the links you have sent. Thanks for the information. It will help me surely!

**palmgenome** · 01-27-2011, 08:03 AM

Very helpful threads. And thanks BENM for the script!


BBBBBB read quality
