
Old 06-02-2015, 08:23 AM   #1
Location: Scotland

Join Date: Feb 2014
Posts: 27
Do CEGMA complete/partial scores depend on the number of sequences?

Hello Everybody

I have a question about CEGMA complete and partial scores. We assembled our data with Trinity and then assembled the Trinity transcripts further into a more complete transcript set. When we compared the number of complete and partial CEGMA genes for the two sets, the results confused us. Here are the numbers:

Trinity Transcripts [A] [Total 31259] = 202 complete, 229 partial
Assembled Trinity Transcripts [B] [Total 19920] = 186 complete, 213 partial

Because set B has more complete transcripts, we expected higher complete and partial scores for it than for set A (where the transcripts are fragmented). We suspect the difference in scores is due to the total number of transcripts: set B has fewer transcripts than set A, so it gets a lower CEGMA score than set A. But we are not sure. So my question is:

1) Do the CEGMA complete and partial scores also depend on the total number of transcripts/sequences?

Any suggestions would be very helpful.

Many Thanks,
reema is offline   Reply With Quote
Old 06-06-2015, 11:07 AM   #2
Location: Davis, CA

Join Date: May 2011
Posts: 53

CEGMA does not care about the number of sequences in your input assembly. CEGMA was designed to work against genome, not transcriptome, assemblies; so you could have just one (very long) sequence in your input and still have 248 complete genes present.
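To make that concrete, here is a rough sketch (not part of CEGMA itself) that expresses the counts reported in the question as a share of the fixed 248-gene core set. The function name `fraction_found` and the dictionary layout are made up for illustration. The point is that the denominator is always 248, never the number of input sequences.

```python
# Illustrative only: the counts come from the question above; the
# denominator is the 248 core genes mentioned in this reply.
CORE_GENES = 248


def fraction_found(count: int, total: int = CORE_GENES) -> float:
    """Return the percentage of the core gene set that was reported."""
    return 100.0 * count / total


# Hypothetical container for the two result sets from the question.
results = {
    "A (31259 transcripts)": {"complete": 202, "partial": 229},
    "B (19920 transcripts)": {"complete": 186, "partial": 213},
}

if __name__ == "__main__":
    for name, counts in results.items():
        complete = fraction_found(counts["complete"])
        partial = fraction_found(counts["partial"])
        print(f"{name}: complete {complete:.1f}%, partial {partial:.1f}%")
```

The transcript totals appear only in the labels; they never enter the calculation.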

The most parsimonious explanation of your results, to me anyway, is that your dataset B contains many incorrectly assembled transcripts compared to dataset A, such that CEGMA cannot map as many complete (or partial) genes to the sequences. You could identify the complete genes that are in A and not B, and then BLAST them against B to see if they exist in any form (maybe as strange chimeric transcripts?).
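The first step of that comparison could be sketched as below. This assumes you have already pulled the IDs of the genes called complete in each run into two plain-text files, one ID per line. That file layout, the helper names, and the example IDs are all hypothetical, and CEGMA's own report files would need parsing first.

```python
from typing import Iterable, List, Set


def read_ids(path: str) -> Set[str]:
    """Read one gene ID per line from a text file (assumed layout)."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def complete_only_in_a(ids_a: Iterable[str], ids_b: Iterable[str]) -> List[str]:
    """Return the IDs that are complete in A but not in B, sorted."""
    return sorted(set(ids_a) - set(ids_b))


if __name__ == "__main__":
    # Placeholder IDs for illustration only, not real results.
    a = ["KOG0001", "KOG0002", "KOG0003"]
    b = ["KOG0002"]
    for gene_id in complete_only_in_a(a, b):
        print(gene_id)
```

The resulting list gives the genes whose sequences are worth BLASTing against set B.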
kbradnam is offline   Reply With Quote
