SEQanswers

Old 03-29-2011, 11:19 PM   #1
Rachel
Junior Member
 
Location: Malaysia

Join Date: Dec 2008
Posts: 9
RNA-seq assembly

Hi

I have RNA-seq data (Illumina platform) without a reference sequence, so the only option I have is to do a de novo assembly, followed by gene prediction or megablast to identify the content of my mRNA.

However, if the gene content is unknown, is there any software available to identify the unknown genes, or any pipeline that I can use?

Say I have hypothetical proteins: how do I determine that something is a hypothetical protein, and what its function is (any software)?

Thanks
Old 03-30-2011, 01:25 AM   #2
schmima
Member
 
Location: Zürich

Join Date: Apr 2010
Posts: 56

To annotate an assembly, http://www.blast2go.org/ may help you.
Old 03-30-2011, 06:11 PM   #3
Rachel
Junior Member
 
Location: Malaysia

Join Date: Dec 2008
Posts: 9

Thanks, I appreciate your reply.

If I am not mistaken, blast2go is able to annotate genes that are available in the database. If the genes or hypothetical proteins are not in the database, what do I need to do to predict a new or novel gene? Thanks
Old 03-30-2011, 09:41 PM   #4
schmima
Member
 
Location: Zürich

Join Date: Apr 2010
Posts: 56

Just to make sure that I got it right: you have an assembled transcriptome and you want to annotate it (?). I guess that for this you will always have to rely on other databases. I don't know of anything that would be able to tell you ab initio what kind of sequence would produce what kind of protein.

In other words: you have to rely on existing knowledge. However, there's quite a lot around. For example, blast2go does more or less the following:
1. It uses blast (blastx in the case of transcripts) to search for similar known proteins that have been described somewhere, somehow (some have experimental evidence, others are based only on predictions). This step does not only find transcripts that are identical to known ones; it also finds cases where you have some similarity.
2. Annotation then via GO, InterProScan, KEGG etc. (InterPro runs, I think, only on the ones which have a GO annotation; I did not finish it due to the rather slow processing)
3. Some statistics

Using blast2go you will be able to annotate quite a few of your transcripts. Nonetheless, you will definitely have others which are not similar to any of the known ones (to be exact: they may be similar to a certain extent, but less than the threshold you chose for blastx).
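
A minimal sketch of how the "annotated" versus "left over" split could be made by hand, assuming you ran blastx yourself with tabular output (`-outfmt 6`, where column 11 is the e-value). The function name `split_by_hit`, the transcript IDs, and the `1e-5` cutoff are illustrative assumptions, not something blast2go prescribes:

```python
import csv
import io


def split_by_hit(blast_tsv, all_ids, max_evalue=1e-5):
    """Split transcript IDs into those with and without a blastx hit.

    blast_tsv: file-like object holding BLAST tabular output (-outfmt 6).
    all_ids:   every transcript ID in the assembly.
    max_evalue is a hypothetical cutoff; pick your own.
    """
    annotated = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, evalue = row[0], float(row[10])
        if evalue <= max_evalue:
            annotated.add(query_id)
    unannotated = [t for t in all_ids if t not in annotated]
    return sorted(annotated), unannotated


# Made-up example rows: tx1 has a strong hit, tx2 only a weak one, tx3 none.
example = io.StringIO(
    "tx1\tsp|P1\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t300\n"
    "tx2\tsp|P2\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t25\n"
)
annotated, unannotated = split_by_hit(example, ["tx1", "tx2", "tx3"])
print(annotated, unannotated)
```

Here tx2 lands in the unannotated list even though it has a hit, which is the "similar, but below your threshold" case described above.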

Now, if I got it right, you would like to do something with the remaining unannotated transcripts (?). Hm, I'm not really an expert on this. But I guess that "gene prediction" is not really what you need: these programs annotate a genome sequence (with the help of the transcripts you provide from your assembly), and you don't have a genome sequence. Well, there may be some programs which check transcripts directly; it would be nice to know if you find something.

Another possibility would be to search for protein domains (InterProScan etc., but this time on the sequences which were left out by blast2go). However, as far as I know, you need protein sequences to do so. That means you need to translate your transcripts into proteins (if the library is not strand-specific: six translations, three frames from each strand). Just keep in mind:
1. the domain scanners are again based on "similarity to known things"
2. translating transcripts into proteins can be quite error-prone (imagine you had some intronic reads, e.g. from unspliced pre-mRNA or antisense transcripts: they will be incorporated into your transcript, and during in silico translation they will mix up your protein sequence quite badly)
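
The six-frame translation mentioned above can be sketched in plain Python. This assumes the standard genetic code and uses hypothetical helper names (`six_frames`, `translate`); codons containing ambiguous bases are written as `X`, and stop codons as `*`:

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse)."""
    seq = seq.upper().replace("U", "T")
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


frames = six_frames("ATGGCCTAA")
print(frames["+1"])  # MA*
print(frames["-1"])  # LGH
```

With strand-specific data you would only keep the three frames of the sequenced strand.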


In summary:
I don't know of a "good" way of dealing with unknown transcripts which are not similar to anything that is known (well, there are some, but not on the computer: you would have to go to the bench).
Old 03-30-2011, 11:03 PM   #5
Rachel
Junior Member
 
Location: Malaysia

Join Date: Dec 2008
Posts: 9

Hi

I really appreciate you going through my questions in detail ^_^ That is exactly what I want to know: how to deal with the unknown transcripts.

Well, I have not done anything on the project yet. But I am kind of assuming: if I have something different from what is in the known databases, then what should I do...

Share with me if there is any additional info ^_^ Have a nice day ahead ya
Old 03-30-2011, 11:45 PM   #6
schmima
Member
 
Location: Zürich

Join Date: Apr 2010
Posts: 56

was a pleasure

Quote:
Well I have not done anything on the project yet. But I would kind of assuming if I have something different from the known database then what should I do...
I guess if it is totally different you'll have a hard time. Well, in principle you could translate into protein and do some crazy stuff, maybe via the structure... but I think this is anything but easy...

Well, if you just have a few of them (or could filter down to a few based on whatever criteria):
1. back to the lab: try to get/confirm the transcript (meaning: clone and sequence it the old way)
2. still in the lab: use other methods to characterize it...
3. some years later: either , , , , , or ...



have a nice day - and in case you find a solution, let me know

all the best
Old 03-30-2011, 11:58 PM   #7
Rachel
Junior Member
 
Location: Malaysia

Join Date: Dec 2008
Posts: 9

WOW, that seems very challenging, with a lot of stuff to be done if that happens!!!
Will see what else I can do with it....

Anyway, I much appreciate the sharing... THANKS!
Old 04-04-2011, 03:14 PM   #8
eskirton
Junior Member
 
Location: Walnut Creek, CA

Join Date: Dec 2009
Posts: 1
try hmmscan vs pfam

Maybe try a blast-based annotation first (as recommended above), and for your remaining (and low-confidence) transcripts, try a more sensitive HMM-based annotation.

First identify the coding regions and translate them (e.g. using prodigal or similar), then run hmmscan vs Pfam. Novel proteins will likely have conserved domains, so even if they don't have "full-length" hits to known proteins, the domains themselves are informative.
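
A rough sketch of how the per-domain hits could be collected afterwards, assuming hmmscan was run with its `--domtblout` option (for example `hmmscan --domtblout hits.domtbl Pfam-A.hmm proteins.faa`, where the file names are placeholders). The function name `domains_per_query`, the `1e-3` independent E-value cutoff, and the two example rows are made up for illustration:

```python
import io
from collections import defaultdict


def domains_per_query(domtbl, max_ievalue=1e-3):
    """Group Pfam domain hits by query protein from a --domtblout table.

    Column positions (0-based) follow the HMMER domain table layout:
    0 = domain name, 3 = query name, 12 = independent E-value,
    17/18 = alignment start/end on the query.
    """
    hits = defaultdict(list)
    for line in domtbl:
        if line.startswith("#") or not line.strip():
            continue  # header/footer comments and blank lines
        fields = line.split(maxsplit=22)  # field 22 is the free-text description
        domain, query = fields[0], fields[3]
        i_evalue = float(fields[12])
        ali_from, ali_to = int(fields[17]), int(fields[18])
        if i_evalue <= max_ievalue:
            hits[query].append((domain, ali_from, ali_to, i_evalue))
    return dict(hits)


# Two invented rows: one convincing kinase domain, one weak zinc-finger hit.
example = io.StringIO(
    "# target name accession tlen query name ...\n"
    "Pkinase PF00069.28 264 tx7_+1 - 310 1.2e-40 135.0 0.0 1 1 3.0e-44 1.5e-40 "
    "134.7 0.0 2 263 20 290 19 291 0.95 Protein kinase domain\n"
    "zf-C2H2 PF00096.29 23 tx9_-2 - 120 0.8 10.1 0.1 1 1 0.4 0.5 "
    "10.5 0.1 1 23 40 62 40 62 0.80 Zinc finger, C2H2 type\n"
)
domain_hits = domains_per_query(example)
print(domain_hits)
```

Queries that never show up in the result are the ones with no confident domain hit at the chosen cutoff.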
Old 04-04-2011, 08:47 PM   #9
schmima
Member
 
Location: Zürich

Join Date: Apr 2010
Posts: 56

By the way, besides the protein-similarity searches via blastx and domain scanners (I forgot to note that blast2go only tries to annotate protein-coding transcripts, as GOs are only associated with proteins), I would also search for similarities at the nucleotide level (normal blast/blat; I don't know of any software that wraps everything, so if anyone knows of one, that would be interesting). I believe you will be able to annotate some of the transcripts that had no protein(-domain) similarity (some of them could also be rather interesting biologically).
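
Since nothing wraps the whole thing, one way to chain the rounds by hand is to write the transcripts that got no protein-level annotation into their own FASTA file and feed that to the nucleotide search. A minimal sketch, with hypothetical function names (`read_fasta`, `write_unannotated`) and made-up sequences:

```python
import io


def read_fasta(handle):
    """Yield (id, sequence) pairs; the ID is the first word of the header."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def write_unannotated(fasta_in, fasta_out, annotated_ids, width=60):
    """Copy every record whose ID is not in annotated_ids; return the count."""
    kept = 0
    for name, seq in read_fasta(fasta_in):
        if name in annotated_ids:
            continue
        fasta_out.write(f">{name}\n")
        for i in range(0, len(seq), width):
            fasta_out.write(seq[i:i + width] + "\n")
        kept += 1
    return kept


assembly = io.StringIO(">tx1 len=6\nATGGCC\n>tx2\nTTTT\nGGGG\n")
leftover = io.StringIO()
n_kept = write_unannotated(assembly, leftover, {"tx1"})
print(n_kept, leftover.getvalue())
```

With real files you would pass open file handles instead of the `io.StringIO` objects used here.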

All the best (writing at the phone is tricky - sry for mistakes...)
All times are GMT -8.

