**mastal** · 04-03-2013, 08:03 AM

Originally posted by maimaiti2008

If the two duplicated regions are similar in sequence (90% identity) and located very close to each other (100 bp apart), how likely are they to show identical sequences due to assembly error? Thanks very much!

It would probably depend on the assembler you use and on the length of your reads. Different assemblers give different results. If you have long reads or paired reads that can span the duplicated regions, then the assembler is less likely to make an 'error' and collapse the two copies.

**Wallysb01** · 04-03-2013, 08:47 AM

To me, 90% identity doesn't sound too high for de novo assemblers to tell the two copies apart.

If you're using a k-mer based approach, neighboring k-mers must match exactly over k-1 bases to be joined. So, if 1 in 10 bases differ and you're assembling at k=35, the k-mers covering these genes will differ between the two copies by 3.5 nucleotides on average. So unless you have stretches of near 100% identity, where the assembler might treat the more divergent region between them as just a bubble, you should be fine. And even then, paired-end reads or longer k-mers would likely take that problem away entirely.
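As a rough back-of-the-envelope check of the numbers above, here is a minimal Python sketch. It assumes mismatches are independent and spread uniformly along the sequence, which real duplicates often violate (differences tend to cluster, leaving long identical stretches). The function names are made up for illustration and do not come from any assembler.

```python
def expected_mismatches(k, identity):
    """Average number of differing bases in a k-mer at the given identity."""
    return k * (1.0 - identity)


def prob_kmer_identical(k, identity):
    """Chance that a k-mer is identical in both copies.

    Assumption: every base mismatches independently with
    probability (1 - identity).
    """
    return identity ** k


def kmers(seq, k):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(a, b, k):
    """Fraction of the k-mers of `a` that also occur in `b`."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka) if ka else 0.0


if __name__ == "__main__":
    k, identity = 35, 0.90
    print("expected mismatches per k-mer:", expected_mismatches(k, identity))
    print("P(k-mer identical in both copies):", prob_kmer_identical(k, identity))
```

Under these assumptions only a few percent of 35-mers would be shared between the two copies at 90% identity, which is why the copies should usually stay separate in the graph. With your own sequences, `shared_kmer_fraction(copy_a, copy_b, 35)` gives the observed value, and that is the more relevant number if the differences are clustered.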

How likely are 2 tandem duplicate genes to show same sequences due to assembly error?
