SEQanswers

Old 06-30-2010, 10:07 PM   #1
ritzriya
Member
 
Location: Canada

Join Date: Jun 2010
Posts: 49
Question BBBBBB read quality

Hello everyone,

I have come across articles stating that reads whose quality string consists only of 'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB' are low quality, and that the B characters act as read-quality indicators. They say this marks the 3' end of the read, which must be removed or filtered to prevent mis-assembly.

Is this true, or is it okay to ignore?

Also, if I want to quality-filter my Illumina sequence file, is there an available tool or a specific feature for recognising bad-quality reads? If nothing is ready-made, I can write a custom program to do it.

I know this might be a trivial question, but I am new to Illumina technology. Kindly help.

Thanks in advance!
Old 07-02-2010, 05:58 AM   #2
Brugger
Member
 
Location: Cambridge, UK

Join Date: Mar 2010
Posts: 21
Default

I assume that you are looking at the qseq.txt files. The last column in that file says whether the read passed the chastity filter (1) or not (0), so any line with a 0 in that column should be dropped.

Run: awk '$11 == 1' FILE > NEWFILE

If you want to map the reads with bwa/bowtie/b*, you need to recalculate the quality values (QV); this is covered in several older threads on SEQanswers.
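As a rough illustration of what such a recalculation can look like, here is a minimal Python sketch. It assumes the recalculation in question is re-encoding Phred+64 quality characters (Illumina pipeline 1.3 and later) as Phred+33 (Sanger), which is what many mappers expect; the function name is made up for this example, and older Solexa-scaled scores would need a different formula.

```python
def illumina64_to_sanger33(quality: str) -> str:
    """Re-encode a quality string from Phred+64 to Phred+33.

    Assumption: the input uses the Illumina 1.3+ encoding (ASCII = Phred + 64).
    """
    converted = []
    for ch in quality:
        phred = ord(ch) - 64  # decode the Illumina character to a Phred score
        if phred < 0:
            raise ValueError(f"character {ch!r} is not valid Phred+64")
        converted.append(chr(phred + 33))  # encode the score as Sanger
    return "".join(converted)


if __name__ == "__main__":
    print(illumina64_to_sanger33("hhhhBBBB"))
```

Check which encoding your files actually use before converting, since applying the shift to data that is already Phred+33 would corrupt the qualities.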
Brugger is offline   Reply With Quote
Old 07-04-2010, 10:12 PM   #3
pratibhamani
Junior Member
 
Location: UK

Join Date: Oct 2009
Posts: 6
Question hi

I want to know the same thing too.

Last edited by pratibhamani; 07-04-2010 at 10:19 PM.
Old 07-05-2010, 12:28 AM   #4
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

If you're seeing this you're probably using one of the more recent Illumina pipelines. Illumina uses a quality value of 2 (which is what B decodes to) as a way to mark the point at which it believes a read to become unreliable. This only applies to strings of B which run to the end of the read. In these cases this isn't intended as a true estimate of the error rate, but rather more as a flag to suggest you don't use these bases.

Frankly it's a bit of a pain when collecting aggregate statistics about sequence qualities as it tends to skew the distribution of Phred scores.
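For anyone who wants to act on that flag, a minimal Python sketch of trimming the trailing run of B from a read is below. It assumes Phred+64 encoding, where B decodes to a quality of 2, and the function name is only illustrative. B characters inside the read are left alone, because only the run that reaches the end of the read carries the special meaning.

```python
from typing import Tuple


def trim_trailing_b(sequence: str, quality: str) -> Tuple[str, str]:
    """Drop the bases covered by the run of 'B' at the 3' end of a read.

    Assumption: qualities are Phred+64, so 'B' is the quality-2 marker.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    keep = len(quality.rstrip("B"))  # number of bases before the trailing B run
    return sequence[:keep], quality[:keep]


if __name__ == "__main__":
    print(trim_trailing_b("ACGTACGT", "hhBhhBBB"))
```

A read whose quality string is all B comes back empty, so a caller would normally discard reads that end up shorter than some minimum length.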
Old 07-05-2010, 08:13 PM   #5
pratibhamani
Junior Member
 
Location: UK

Join Date: Oct 2009
Posts: 6
Question

Thank you simonandrews..

This is what I wanted to know. So if I use these bases in an assembly, say de novo, will it lead to mis-assembly? Should I remove these reads before I proceed, or is it okay to use them?
Old 07-06-2010, 02:08 AM   #6
BENM
Member
 
Location: PRC

Join Date: May 2009
Posts: 33
Default

Hi ritzriya,

I can give you a simple script for quality control. It filters out low-quality reads or reads with too many Ns, depending on the options you set:
-n the N content of the read, default 0.5, i.e. a read is filtered if more than 50% of its bases are N.
-q the minimum average quality, default 20, i.e. a read is filtered if its average quality is lower than 20.
-sd the standard deviation of the quality values, default 10; it is used to filter reads whose quality fluctuates and is unstable.
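
The three checks described above can be sketched in Python roughly as follows. This is not the attached solexa-qc.pl, only an illustration of the same idea: the function name, the Phred+64 offset, and the choice to reject reads whose standard deviation is above the threshold are all assumptions.

```python
from statistics import mean, pstdev

PHRED_OFFSET = 64  # assumption: Illumina 1.3+ quality encoding


def passes_qc(sequence: str, quality: str,
              max_n_fraction: float = 0.5,
              min_mean_quality: float = 20.0,
              max_quality_sd: float = 10.0,
              offset: int = PHRED_OFFSET) -> bool:
    """Return True if a read survives the N-content, mean and SD checks."""
    if not sequence or len(sequence) != len(quality):
        return False
    # Check 1: fraction of uncalled bases (N) in the read.
    if sequence.upper().count("N") / len(sequence) > max_n_fraction:
        return False
    scores = [ord(ch) - offset for ch in quality]
    # Check 2: average quality over the whole read.
    if mean(scores) < min_mean_quality:
        return False
    # Check 3: spread of the quality values (population standard deviation).
    if pstdev(scores) > max_quality_sd:
        return False
    return True


if __name__ == "__main__":
    print(passes_qc("ACGTACGT", "hhhhhhhh"))  # uniformly high quality
    print(passes_qc("ACGTACGT", "hhhhBBBB"))  # good start, B-flagged tail
```

The second example read has an average quality just above 20 but a large spread, so it is only caught by the standard deviation check.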


Hope this helps.
Attached Files
File Type: pl solexa-qc.pl (1,015 Bytes, 93 views)

Last edited by BENM; 07-08-2010 at 08:48 PM.
Old 07-08-2010, 08:42 PM   #7
pratibhamani
Junior Member
 
Location: UK

Join Date: Oct 2009
Posts: 6
Question Problem still unsolved!

Thank you BENM for the filtering script. It will surely be useful to quality filter when necessary.

But my question remains the same: does it matter if I use reads with a quality of 'BBB' during assembly? Will it lead to misassembly, or won't it make much of a difference? That's all I want to know.
Old 07-08-2010, 09:04 PM   #8
BENM
Member
 
Location: PRC

Join Date: May 2009
Posts: 33
Default

It depends on the assembly tool you use.
For Solexa reads the error rate is below 1%, so sequences containing errors will appear at lower coverage. Common tools like VELVET can therefore still identify them, even though VELVET does not take read quality into account.
Old 07-09-2010, 11:14 PM   #9
pratibhamani
Junior Member
 
Location: UK

Join Date: Oct 2009
Posts: 6
Question ??

That's exactly why I am asking this question. Because VELVET does not take read quality into account, I will have to deal with it myself before it starts processing.

Surely if my input is incorrect or contains errors, my output will not be good enough, no matter how many times I change my kmer, right?

I hope everyone got what I have explained above..
Old 07-10-2010, 06:44 AM   #10
BENM
Member
 
Location: PRC

Join Date: May 2009
Posts: 33
Default

Hello pratibhamani,

If you have enough sequencing coverage, there is no need to worry about the sequencing error rate, because the de Bruijn graph algorithm can deal with errors by weighting paths according to their coverage. This is thanks to the high throughput and good quality control of NGS technologies.
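
To make the coverage argument concrete, here is a toy Python sketch. It is not how VELVET is implemented, just an illustration of the principle: k-mers that come from a sequencing error are seen far less often than true k-mers when coverage is good. The function names, the reads and the cutoff are made up for the example.

```python
from collections import Counter
from typing import Iterable, Set


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer over all reads (the node weights of a toy graph)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_coverage_kmers(counts: Counter, min_count: int) -> Set[str]:
    """Return the k-mers seen fewer than min_count times (likely errors)."""
    return {kmer for kmer, n in counts.items() if n < min_count}


if __name__ == "__main__":
    correct = "ACGTTGCAAGCT"
    with_error = "ACGTTACAAGCT"  # single substitution at position 6 (G -> A)
    reads = [correct] * 5 + [with_error]
    counts = kmer_counts(reads, k=4)
    print(sorted(low_coverage_kmers(counts, min_count=2)))
```

With five correct copies and one erroneous read, only the four 4-mers that overlap the substituted base fall below the cutoff, which is why an assembler can prune them when depth is sufficient.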

If you're still anxious about it, you can compare assemblies made with and without prior error correction, using a sequencing project for a known genome. Alternatively, you can use another tool such as SOAPdenovo (http://soap.genomics.org.cn/soapdenovo.html), which uses the same algorithm as VELVET but has an error correction function.

The kmer option of VELVET is estimated from your sequencing depth, not the error rate; see the link below:
http://www.ebi.ac.uk/~zerbino/velvet...th_choice.html
You can also use the VELVET contrib package "contrib/VelvetOptimiser-2.1.0/" to find an appropriate kmer setting.

Hope this helps.
Old 07-12-2010, 12:56 AM   #11
pratibhamani
Junior Member
 
Location: UK

Join Date: Oct 2009
Posts: 6
Question Thanks!

Yes, BENM. I do have good sequencing coverage in my case, so I need not worry about these reads. Fine.

I will certainly have a look at the links you sent. Thanks for the information, it will definitely help me!
Old 01-27-2011, 07:03 AM   #12
palmgenome
Junior Member
 
Location: London

Join Date: Jan 2011
Posts: 1
Default

Very helpful thread. And thanks BENM for the script!
All times are GMT -8.

