01-30-2012, 05:32 PM   #1
Retro
Member
 
Location: Pennsylvania

Join Date: Apr 2011
Posts: 27
Clustering algorithm for single reads from transposon integrations

We have Ion Torrent reads from retrovirus (transposon) integration sites in an unsequenced genome, and we need to cluster them by sequence identity. The first fifty bases of each read are always the transposon end, and the rest is an essentially random piece of genomic DNA flanking the insertion. We need to collapse, or cluster, the reads from each unique integration site. Currently we use de novo assembly algorithms, but they perform poorly: we have to relax the alignment stringency to tolerate sequencing errors, and the assembler then artificially joins separate clusters together. Each cluster should span the length of only one read.
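
To make the goal concrete, here is a minimal Python sketch of the kind of clustering described above, offered as an illustration rather than a tested pipeline. It trims the fifty-base transposon end, takes a fixed window of the genomic flank, and greedily assigns each read to the first cluster whose representative flank is within a small edit distance. Because reads are only ever compared to a cluster's representative, clusters are never chained or extended the way an assembler would extend contigs. The 30-base window, the 3-edit threshold, and the names `edit_distance` and `cluster_reads` are all assumptions made for the example.

```python
TRANSPOSON_LEN = 50  # from the post: first fifty bases are the transposon end
WINDOW = 30          # assumption: number of flank bases compared per read
MAX_EDITS = 3        # assumption: sequencing errors tolerated within the window


def edit_distance(a, b):
    """Levenshtein distance; counts indels as well as substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion from a
                           cur[j - 1] + 1,              # insertion into a
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def cluster_reads(reads, transposon_len=TRANSPOSON_LEN,
                  window=WINDOW, max_edits=MAX_EDITS):
    """Greedy clustering of reads by their genomic flank.

    Returns a list of clusters, each a list of indices into `reads`.
    The first read placed in a cluster serves as its representative.
    """
    clusters = []  # list of (representative_flank, member_indices)
    for idx, read in enumerate(reads):
        flank = read[transposon_len:transposon_len + window]
        if len(flank) < window:
            continue  # flank too short to compare; skipped in this sketch
        for rep, members in clusters:
            if edit_distance(flank, rep) <= max_edits:
                members.append(idx)
                break
        else:
            # no existing representative is close enough: start a new cluster
            clusters.append((flank, [idx]))
    return [members for _, members in clusters]
```

Calling `cluster_reads(list_of_read_strings)` returns groups of read indices, one group per putative integration site. This all-against-representatives comparison is quadratic in the worst case, so for a large run it would need an index (for example on k-mers of the flank) in front of the edit-distance check, and the threshold would have to be tuned to the observed error rate.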

Does anybody know of a suitable algorithm for creating these single-read clusters?