SEQanswers

07-24-2013, 03:24 PM   #1
mruizm
Member
 
Location: Santiago

Join Date: Apr 2013
Posts: 22
De novo assembly

Hi everyone, I'd like to know what the term "unigene" actually means in a de novo assembly context.

Best regards!

07-25-2013, 05:12 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
De novo assembly

I don't think there is a de novo assembly concept of 'Unigene'.

I think you may be confusing UniGene

http://www.ncbi.nlm.nih.gov/unigene

with unitig

http://sourceforge.net/apps/mediawik...er_Terminology

09-13-2013, 04:19 AM   #3
Blahah404
Member
 
Location: Cambridge, UK

Join Date: Dec 2011
Posts: 48

@mastal, unigene is a concept in transcriptome assembly, exactly the same as in the NCBI definition.

@mruizm, in a de novo transcriptome assembly, a unigene is a hypothetical gene represented by a cluster of similar transcripts that are thought to be isoforms of that gene.

see for example this Safflower transcriptome paper:
Quote:
We obtained a total of 4.69 Gb in clean nucleotides comprising 52,119,104 clean sequencing reads, 195,320 contigs, and 120,778 unigenes. Based on similarity searches with known proteins, we annotated 70,342 of the unigenes (about 58% of the identified unigenes) with cut-off E-values of 10^-5. In total, 21,943 of the safflower unigenes were found to have COG classifications, and BLAST2GO assigned 26,332 of the unigenes to 1,754 GO term annotations. In addition, we assigned 30,203 of the unigenes to 121 KEGG pathways.
I think it's a confusing name, because there's also the concept of a unigene in the phylogenetic context, where it refers to a gene which always occurs in a single copy in any genome.
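
To make the "cluster of isoforms" definition concrete, here is a rough Python sketch. It assumes you already have a mapping from each transcript to a cluster ID (how that mapping is produced depends on the assembler or clustering tool). The function name, the toy sequences, and the choice of the longest transcript as the cluster's representative are all assumptions for illustration, not something taken from the paper quoted above.

```python
from collections import defaultdict


def pick_unigenes(transcripts, cluster_of):
    """Collapse transcripts into one representative per cluster.

    transcripts: dict mapping transcript name -> sequence (hypothetical data)
    cluster_of:  dict mapping transcript name -> cluster ID (hypothetical data)
    Returns a dict mapping cluster ID -> name of the representative transcript.
    """
    clusters = defaultdict(list)
    for name, seq in transcripts.items():
        clusters[cluster_of[name]].append((name, seq))

    # Assumption: the longest transcript represents the cluster.
    # Ties go to the name that sorts last, only to keep the result deterministic.
    return {
        cluster_id: max(members, key=lambda m: (len(m[1]), m[0]))[0]
        for cluster_id, members in clusters.items()
    }


# Toy example: t1 and t2 are putative isoforms of one gene, t3 stands alone.
toy_transcripts = {"t1": "ATGGCC", "t2": "ATGGCCAAAT", "t3": "GGGTTT"}
toy_clusters = {"t1": "g1", "t2": "g1", "t3": "g2"}
print(pick_unigenes(toy_transcripts, toy_clusters))
```

Under this definition the three toy transcripts collapse to two unigenes, one per cluster.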

Last edited by Blahah404; 09-13-2013 at 04:24 AM.

09-13-2013, 05:34 AM   #4
mruizm
Member
 
Location: Santiago

Join Date: Apr 2013
Posts: 22

Yeah! I agree with you, @Blahah404, the concept is redundant. Thanks for your answer!

09-13-2013, 05:36 AM   #5
Blahah404
Member
 
Location: Cambridge, UK

Join Date: Dec 2011
Posts: 48

@mruizm I also just remembered that some authors define unigenes as all contigs + unassembled reads.

See for example http://www.biomedcentral.com/1471-2164/14/465:
Quote:
The notions of contig and singleton are straightforward for perfect assemblies: a contig is any sequence produced by two or more overlapping reads, while singletons are the remaining isolated reads. By contrast, the assembler we compare with produces a variety of output types: first, portions of overlapping reads are assembled into “contigs” representing putative exons. Groups of contigs that appear to constitute a single gene are then arranged to form “isotigs” representing putative splice variants of the gene. Note that an isotig may consist of only a single contig. When this splice variant reconstruction fails, some “orphan” contigs may be unused in isotigs. Thus, unique sequence in a Newbler assembly is represented by unassembled singleton reads, (orphan) contigs, and isotigs. For our purposes we consider both Newbler orphan contigs and isotigs as unique assembled sequence comparable to perfectly assembled contigs. We shall refer to this combined set of orphan contigs and isotigs as c-isotigs. Further, we shall refer to the combined set of perfect contigs and singletons (and non-perfect c-isotigs and singletons) for a single assembly as the set of unigenes.
So there are at least two conflicting definitions in use for transcriptome assembly - and to add to the confusion they are pretty much opposites! The first involves collapsing the contigs, the second adds to them.
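
A quick Python sketch of how far apart the two definitions can land on the same assembly. The function names and the toy data are made up for illustration; this is not code from either paper.

```python
def count_unigenes_collapsed(contig_to_cluster):
    """Definition 1: one unigene per cluster of putative isoforms."""
    return len(set(contig_to_cluster.values()))


def count_unigenes_union(contigs, singletons):
    """Definition 2: every contig plus every unassembled read is a unigene."""
    return len(contigs) + len(singletons)


# Hypothetical toy assembly: three contigs in two clusters, two leftover reads.
contig_to_cluster = {"c1": "g1", "c2": "g1", "c3": "g2"}
singletons = ["read_17", "read_42"]

print(count_unigenes_collapsed(contig_to_cluster))                # 2
print(count_unigenes_union(list(contig_to_cluster), singletons))  # 5
```

Same data, 2 unigenes versus 5, so it is worth checking which definition a paper uses before comparing unigene counts across studies.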

09-13-2013, 08:06 AM   #6
mruizm
Member
 
Location: Santiago

Join Date: Apr 2013
Posts: 22

Quote:
Originally Posted by Blahah404
@mruizm I also just remembered that some authors define unigenes as all contigs + unassembled reads.

See for example http://www.biomedcentral.com/1471-2164/14/465:


So there are at least two conflicting definitions in use for transcriptome assembly - and to add to the confusion they are pretty much opposites! The first involves collapsing the contigs, the second adds to them.
See also: http://www.plosone.org/article/info%...l.pone.0038653

There they use the word "unigene" to mean a single transcript!

All times are GMT -8.