SEQanswers

Old 01-30-2012, 06:32 PM   #1
Retro
Member
 
Location: Pennsylvania

Join Date: Apr 2011
Posts: 27
clustering algorithm for single reads from transposon integrations

We have Ion Torrent reads from retrovirus (transposon) integration sites in an unsequenced genome, and we need to cluster them by sequence identity. The first fifty bases of each read are always the transposon end, and the rest is an essentially random piece of genomic DNA that flanks the insertion. We need to collapse or cluster the reads from each unique integration site together. Currently we use de novo assembly algorithms, but those perform poorly. We have to relax the alignment stringency because of sequencing errors, and then the de novo assembler artificially joins clusters together. Our clusters should have the length of only one read.

Would anybody know of a suitable algorithm to create these single-read clusters?
Old 01-31-2012, 09:01 AM   #2
SES
Senior Member
 
Location: Vancouver, BC

Join Date: Mar 2010
Posts: 275

As I was preparing a response, it became less clear exactly what you are trying to achieve. When you say that you want to relax the alignment stringency used in assembly and take a clustering approach, that makes sense. When you say that clusters should contain one read, that seems to conflict completely with the previous statement. Could you clarify your post?
Old 01-31-2012, 09:16 AM   #3
Retro
Member
 
Location: Pennsylvania

Join Date: Apr 2011
Posts: 27

Thanks for your response. The clusters should have a length of one read. They can contain, for example, 50 reads, but all reads start at position 1 (the "left side" of the aligned cluster). The reads in a cluster might differ in length depending on the initial fragmentation.

To make it more difficult, our reads come from a pool of animals, so in addition to sequencing errors we also see SNPs. That is why we cannot use assembly based on, let's say, 99% homology. The de novo algorithm then starts adding reads to our clusters that extend the cluster in length, mostly based on random inverted repeats in the genomic tags.
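
To make the requirement concrete, here is a rough sketch of left-anchored greedy clustering in Python. It is only an illustration of the idea, not how any particular tool works. The names `prefix_identity` and `cluster_reads` and the 0.9 identity threshold are made up for this example. It compares reads base by base from position 1 with no gaps, so it tolerates substitutions (SNPs, miscalls) but not the indels that Ion Torrent reads often carry; a real tool would need a gapped alignment at that step.

```python
def prefix_identity(a, b):
    """Fraction of matching bases over the shared, left-anchored prefix."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def cluster_reads(reads, transposon_len=50, min_identity=0.9):
    """Greedy clustering of reads that all start at the same position.

    Returns a list of clusters, each a list of indices into `reads`.
    transposon_len and min_identity are assumed values for illustration.
    """
    # Drop the constant transposon end so only the genomic flank is compared.
    flanks = [read[transposon_len:] for read in reads]
    # Process the longest flanks first so a cluster's representative spans
    # every member and the cluster never grows beyond one read length.
    order = sorted(range(len(flanks)), key=lambda i: len(flanks[i]), reverse=True)

    representatives = []  # flank sequence representing each cluster
    clusters = []         # read indices belonging to each cluster
    for i in order:
        flank = flanks[i]
        if not flank:
            continue  # read contains no genomic flank at all
        for c, rep in enumerate(representatives):
            if prefix_identity(flank, rep) >= min_identity:
                clusters[c].append(i)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            representatives.append(flank)
            clusters.append([i])
    return clusters
```

Because every comparison starts at position 1 of the flank, a read can only join a cluster it overlaps from the left edge, which is what keeps inverted repeats deeper in the reads from chaining clusters together.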
Old 02-21-2012, 06:58 PM   #4
Retro
Member
 
Location: Pennsylvania

Join Date: Apr 2011
Posts: 27

OK, I finally found a great program, USEARCH (http://www.drive5.com/usearch/usearch_docs.html), that does exactly that.
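
For anyone who ends up here with the same problem: assuming the clustering run was asked to write its cluster assignments in USEARCH's tab-separated UC format, a small Python sketch like the one below can group read labels by cluster. The helper name `clusters_from_uc` is made up, and the column layout (record type in column 1, cluster number in column 2, read label in column 9) is my understanding of the UC format, so please check it against the documentation for the version you run.

```python
from collections import defaultdict


def clusters_from_uc(lines):
    """Group read labels by cluster number from UC-format lines.

    Assumed layout: tab-separated, column 1 = record type, column 2 =
    cluster number, column 9 = query (read) label. "S" records are
    cluster seeds and "H" records are hits assigned to a cluster.
    """
    clusters = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete record
        record_type, cluster_id, label = fields[0], int(fields[1]), fields[8]
        # Skip "C" summary records so seeds are not counted twice.
        if record_type in ("S", "H"):
            clusters[cluster_id].append(label)
    return dict(clusters)
```

Each resulting list would then correspond to one candidate integration site, with the seed read listed first.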