SEQanswers

Old 04-03-2013, 01:22 AM   #1
maimaiti2008
Member
 
Location: university

Join Date: Apr 2013
Posts: 16
How likely are 2 tandem duplicate genes to show the same sequence due to assembly error?

If two duplicated regions are similar in sequence (90% identity) and located very close to each other (100 bp apart), how likely is it that they would show identical sequences due to assembly error? Thanks very much!
Old 04-03-2013, 09:03 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

It would probably depend on the assembler you use, and on the length of your reads. Different assemblers give different results. If you have long reads or paired reads that can span the duplicated regions, then the assemblers are less likely to make an 'error'.
Old 04-03-2013, 09:47 AM   #3
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

To me, 90% identity doesn't sound too high for de novo assemblers to tell the two copies apart.

If you're using a k-mer based approach, you need an exact match over k-1 bases to join neighboring k-mers. So, if 1 in 10 bases differ and you're assembling at k=35, the k-mers covering these genes will have, on average, 3.5 nucleotide differences between the two copies. So unless you have regions of near 100% identity, where the assembler might think the more divergent region between them is just a bubble, you should be fine. And even then, paired-end reads or longer k-mers would likely take that problem away entirely.
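The arithmetic above can be sketched in a few lines of Python. This is only a rough illustration, not how any particular assembler works: the function names are made up for this example, and the probability calculation assumes that every aligned base mismatches independently with the same probability, which real duplicated genes generally do not obey.

```python
def expected_mismatches(k: int, identity: float) -> float:
    """Mean number of differing bases in a window of k aligned bases."""
    return k * (1.0 - identity)


def prob_exact_match(length: int, identity: float) -> float:
    """Chance that `length` consecutive aligned bases all match.

    ASSUMPTION: each base mismatches independently with probability
    (1 - identity). Real divergence is usually clustered, so treat the
    result as a ballpark figure only.
    """
    return identity ** length


if __name__ == "__main__":
    k = 35           # k-mer size from the post
    identity = 0.90  # sequence identity between the two copies

    print(f"expected differences per {k}-mer: "
          f"{expected_mismatches(k, identity):.1f}")
    print(f"P(both copies share a given {k}-mer): "
          f"{prob_exact_match(k, identity):.4f}")
    print(f"P(exact {k - 1}-base overlap between copies): "
          f"{prob_exact_match(k - 1, identity):.4f}")

    # Longer k-mers make an accidental exact match even less likely.
    for longer_k in (51, 71):
        print(f"k={longer_k}: P(shared k-mer) = "
              f"{prob_exact_match(longer_k, identity):.6f}")
```

Under the independence assumption, only a few percent of aligned 35-base windows would be identical between the two copies, and the fraction shrinks quickly as k grows. Because real divergence tends to be uneven, stretches of near 100% identity can still occur, which is the case where paired-end reads or longer k-mers matter.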