SEQanswers

03-10-2013, 10:49 PM   #1
FGponce
Junior Member
 
Location: New Zealand

Join Date: Dec 2011
Posts: 9
Reduce complexity of a de novo assembly prior to annotation

Hi everyone.

We have a Trinity assembly that presumably contains lots of isoforms, hybrids, paralogs, etc.

We also have a second organism in there at a low level, so we don't want to filter out isoforms with low read evidence, as others might.

I'm aware Trinity creates a component that it then splits into isoforms, but this component doesn't appear to be written out as a FASTA file at any point.

How can we filter out the best transcript models before we do some in-depth annotation? Can we run cuffmerge on a Trinity assembly, e.g. put in two copies of the assembly as two transcriptomes and get it to merge with itself?

Any other ideas?

Cheers,

FGPonce
03-11-2013, 12:23 AM   #2
Apexy
Member
 
Location: Africa

Join Date: Apr 2011
Posts: 62

Have you tried using any of the de novo clustering algorithms to curb redundancy?
03-11-2013, 12:31 PM   #3
FGponce
Junior Member
 
Location: New Zealand

Join Date: Dec 2011
Posts: 9

No, I haven't heard of those. Are they incorporated as a function of a particular tool or suite of tools?
03-11-2013, 02:47 PM   #4
Apexy
Member
 
Location: Africa

Join Date: Apr 2011
Posts: 62

Hi,
My understanding is that you have assembled a set of transcripts and would like to annotate them. As is usual with transcriptome assemblers, there is a lot of redundancy. It has become common to group such transcripts together before annotating them with BLASTX. However, the biggest challenge is identifying the threshold beyond which further clustering leads to loss of information, because sequence similarity does not invariably mean functional similarity. Programs such as WCD: http://www.sanbi.ac.za/resources/sof...downloads/wcd/, CD-HIT: http://cd-hit.org/, Vmatch: http://www.vmatch.de/ and CAP3: http://seq.cs.iastate.edu/ have been used to cluster assembled transcripts, ESTs and 454-type sequences.
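
To give a feel for what this kind of clustering does, here is a rough Python sketch of greedy, longest-first clustering. It is only an illustration and not how any of the tools above are actually implemented: the similarity measure (fraction of shared k-mers) and the names read_fasta, kmers, greedy_cluster, k and min_shared are all my own assumptions. It also ignores reverse complements, which matters for assemblies from non-strand-specific reads.

```python
def read_fasta(lines):
    """Parse an iterable of FASTA text lines into {id: sequence}."""
    parts, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the sequence id
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(p) for n, p in parts.items()}


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(seqs, k=8, min_shared=0.9):
    """Greedy longest-first clustering (toy similarity, not an alignment).

    A sequence joins the first representative that contains at least
    min_shared of its k-mers; otherwise it becomes a new representative.
    Returns {representative_id: [member_ids]}.
    """
    # Longest sequences first; ties broken by id so results are deterministic
    order = sorted(seqs, key=lambda n: (-len(seqs[n]), n))
    reps = []       # (representative id, its k-mer set)
    clusters = {}
    for name in order:
        words = kmers(seqs[name], k)
        for rep, rep_words in reps:
            if words and len(words & rep_words) / len(words) >= min_shared:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough: start a new cluster
            reps.append((name, words))
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    with open("Trinity.fasta") as handle:  # hypothetical file name
        transcripts = read_fasta(handle)
    for rep, members in greedy_cluster(transcripts).items():
        print(rep, len(members))
```

The min_shared value plays the role of the threshold mentioned above: set it lower and more transcripts collapse into each representative, at the risk of merging paralogs or sequences from the low-abundance second organism. For a real assembly you would want one of the dedicated tools, since this sketch compares every sequence against every representative and will be slow.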

HTH
All times are GMT -8.

