SEQanswers

11-19-2008, 04:32 AM   #1
foolishbrat
Member
 
Location: South East Asia

Join Date: Nov 2008
Posts: 44
Next Gen versus SAGE sequencing error

Dear all,

What's the primary difference between Next Gen sequencing
and SAGE in terms of sequencing error?

In particular, the errors affecting the tag counts.
11-19-2008, 09:37 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
I think that you have to rephrase your question to be more specific. I presume you are asking about the difference between doing expression profiling with Next Gen sequencing versus doing profiling via SAGE with classical Sanger sequencing.

Hands down a single Sanger read will be more accurate than a Next Gen read. That is one answer.

However, since Next Gen platforms give many more reads per unit cost than Sanger, you can sequence to a greater depth. If you are willing to throw away any Next Gen reads that do not have significant depth, then your accuracy will go way up. So that may be your answer.
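
As a rough illustration of the depth filtering described above, here is a minimal Python sketch. The function names, the example tags, and the `min_count` threshold of 3 are assumptions made up for this example, not values recommended anywhere in this thread.

```python
from collections import Counter


def count_tags(reads):
    """Count how many times each identical tag sequence was observed."""
    return Counter(reads)


def filter_by_depth(tag_counts, min_count):
    """Keep only tags observed at least min_count times.

    min_count is a hypothetical cutoff; a real analysis would choose it
    from the platform's error rate and the total number of reads.
    """
    return {tag: n for tag, n in tag_counts.items() if n >= min_count}


# Toy data: one tag seen 6 times, a singleton differing from it by 1 bp
# (plausibly a sequencing error), and a second tag seen 3 times.
reads = (
    ["ACGTACGTACGTACGTA"] * 6
    + ["ACGTACGTACGTACGTT"]
    + ["TTGCATTGCATTGCATT"] * 3
)

counts = count_tags(reads)
kept = filter_by_depth(counts, min_count=3)
print(kept)
```

The singleton is dropped while the two well-supported tags are kept. The trade-off is that genuinely rare transcripts are discarded along with the errors.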

The NextGen technology that you use will greatly influence the answer. I suspect that the SOLiD, despite not being the most accurate platform on a per-read basis, will be a good expression profiling platform simply due to the large quantity of reads at a low cost.

Unfortunately I know of no papers that cover this question. I am not even sure if it is a question that someone would want to go through the effort of answering.
11-21-2008, 01:33 PM   #3
Josliu
Junior Member
 
Location: State College, PA

Join Date: Nov 2008
Posts: 4

There are several types of error in sequence tag counts from Next Gen sequencing.
1. Base-call error rates are high, 0.5%-1% per base. For the 17-base tags of long SAGE, that means roughly 8%-16% of tags contain at least one miscalled base (1 - (1 - p)^17; the simple estimate 17 x 1% gives 17%), and more at higher error rates. Since different systems have different error profiles, results from different labs using different systems can be hard to compare.
2. Counts for low-abundance gene tags may be inflated by miscalled reads from high-abundance tags that differ by 1 bp, since expression levels may differ by as much as seven orders of magnitude.
3. There is also shot noise of about sqrt(N), where N is the count for a tag. This is a problem for low-abundance genes.
4. Two or more genes may share the same tag. There is no way to tell how much of the count comes from each gene.
5. One gene might have two tags because of multiple isoforms. It is a challenge to decide how to report them.
6. Many gene tags are shorter than 17 bp, for example 12 bp. Tag counts for those genes will have high errors.
7. Errors may also depend on the channel location in the flowcell.
8. The enzyme efficiency might depend on the sequence content.
You may use NextGENe software to handle such problems. Generally the error will be minimal once a tag reaches about 500 counts.
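
The arithmetic behind items 1 to 3 can be sketched in a few lines of Python. This is only a back-of-the-envelope model under stated assumptions: base-call errors are independent and uniform along the tag, a miscalled base is equally likely to become any of the other three bases, and counts follow Poisson statistics. The function names are made up for this example; real platforms have position-dependent and sequence-dependent error profiles.

```python
import math


def tag_error_probability(per_base_error, tag_length):
    """Probability that a tag contains at least one miscalled base,
    assuming independent errors at a constant per-base rate."""
    return 1.0 - (1.0 - per_base_error) ** tag_length


def relative_shot_noise(count):
    """Relative counting error sqrt(N)/N = 1/sqrt(N) for a tag seen N times,
    assuming Poisson statistics."""
    return 1.0 / math.sqrt(count)


def expected_spillover(abundant_count, per_base_error, tag_length):
    """Expected number of copies of an abundant tag that are misread as one
    specific tag differing from it at exactly one position.

    Assumes the miscalled base is equally likely to be any of the three
    wrong bases and that the other positions are read correctly.
    """
    p_specific_neighbor = (per_base_error / 3.0) * (
        (1.0 - per_base_error) ** (tag_length - 1)
    )
    return abundant_count * p_specific_neighbor


# Item 1: fraction of 17-base tags with at least one error.
for rate in (0.005, 0.01):
    print(rate, round(tag_error_probability(rate, 17), 3))

# Item 3: relative shot noise at a low count and at 500 counts.
for n in (10, 500):
    print(n, round(relative_shot_noise(n), 3))

# Item 2: reads leaking from a tag with a million copies into one neighbor.
print(round(expected_spillover(1_000_000, 0.01, 17)))
```

Under these assumptions the per-tag error comes out at about 8% and 16% for the two quoted rates, and the relative shot noise at 500 counts is about 4.5%, which is consistent with the remark that the error is small once a tag reaches 500 counts. The spillover figure shows why a tag that differs by 1 bp from a very abundant tag can look expressed even when it is not.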


josliu
11-23-2008, 05:21 AM   #4
foolishbrat
Member
 
Location: South East Asia

Join Date: Nov 2008
Posts: 44
Thanks so much for the reply. This is truly invaluable.

Do you know of any existing programs or papers that correct
such tag count errors?