SEQanswers

01-31-2012, 06:30 AM   #1
mhadidi2002
Member
 
Location: Germany

Join Date: Jun 2011
Posts: 24
RNA-Seq annotation

Hello,

Is there any software that takes a reference genome and RNA-Seq reads as input and outputs a genomic annotation derived from those reads?

Thanks
01-31-2012, 08:36 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

By that, do you mean is there a program that will align RNA-seq reads against a reference genome (returning the aligned position and any mismatches), or do you want to add a particular annotation (presumably associated gene or transcript) to already aligned reads? I assume that what you want is the former, in which case you can use tophat/bowtie, novoalign, bwa, etc.
01-31-2012, 11:55 AM   #3
jjohnson
Member
 
Location: Washington DC Metro Area

Join Date: Aug 2009
Posts: 20

If you are new to NGS analysis, I would also recommend a commercial platform such as CLC or DNAnexus for RNA-Seq analysis. CLC is a local workbench with yearly license fees (very reasonable for academics or non-profits), while DNAnexus is cloud-based with a pay-per-GB model.

Service providers are also an option for getting consulting work done.

If you have the informatics chops, packages such as DESeq, the Tuxedo suite, and several others can align reads and help you annotate and evaluate expression data.
__________________
Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio
01-31-2012, 11:47 PM   #4
mhadidi2002
Member
 
Location: Germany

Join Date: Jun 2011
Posts: 24

Thanks, dpryan, for the reply.
Actually, the reads are already aligned. I need to add annotations to the aligned reads, including ORFs, splice variants, and SNPs, and how all of these affect the protein.

That's it.

Thanks
01-31-2012, 11:57 PM   #5
mhadidi2002
Member
 
Location: Germany

Join Date: Jun 2011
Posts: 24

Hi jjohnson,

Thanks for the reply.
Can these programs annotate aligned reads? The information I need is ORFs, SNPs, splice variants, etc.

Thanks again
02-01-2012, 12:22 AM   #6
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

So, you want to know for every single one of your millions of reads whether it sits on a SNP, an ORF, a splice variant, etc., giving you a huge list with millions of lines. Are you sure you know what you want to do next with that? There is a reason why this is not how it is usually done.
02-01-2012, 12:32 AM   #7
mhadidi2002
Member
 
Location: Germany

Join Date: Jun 2011
Posts: 24

Simon, thanks for joining the conversation.

The main aim is to find differences among 4 species, each with its own RNA reads aligned to the genome. These 4 species are under different stresses, and I need to know the effect of these stresses on expression. Is that the right way? My assumption is that by knowing which gene each RNA was expressed from, I can figure out which genes are involved in tolerating the stress.

Thanks
02-01-2012, 06:02 AM   #8
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

It sounds like what you actually want to do is look at differential expression between the groups. Searching the forum should turn up plenty of threads on that subject.
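
To make "differential expression" concrete, here is a minimal Python sketch of the underlying arithmetic: scale each sample's per-gene read counts by library size, then take a log2 ratio between conditions. The gene names, the counts, and the function names are all invented for illustration. This is only an exploratory calculation, not what a statistical package such as DESeq does; a real analysis needs replicates and a proper statistical model.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts by library size so samples are comparable."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_changes(control, stressed, pseudocount=1.0):
    """Return log2(stressed / control) per gene on CPM-normalized counts.

    The pseudocount avoids division by zero for genes with no reads.
    """
    cpm_c = counts_per_million(control)
    cpm_s = counts_per_million(stressed)
    genes = sorted(set(cpm_c) | set(cpm_s))
    return {
        g: math.log2(
            (cpm_s.get(g, 0.0) + pseudocount) / (cpm_c.get(g, 0.0) + pseudocount)
        )
        for g in genes
    }


# Hypothetical counts, invented for illustration only.
control = {"geneA": 100, "geneB": 400, "geneC": 500}
stressed = {"geneA": 400, "geneB": 400, "geneC": 1200}

lfc = log2_fold_changes(control, stressed)
# Print genes ordered by the size of the change, largest first.
for gene, value in sorted(lfc.items(), key=lambda kv: -abs(kv[1])):
    print(f"{gene}\t{value:+.2f}")
```

Note that geneB has the same raw count in both samples but still comes out as down-regulated, because the stressed library is twice as large. That is why raw counts should not be compared directly.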
02-01-2012, 06:38 AM   #9
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

Could you give some more details about the species and how you did the alignments? You say you aligned them to the genome. Do all four have a sequenced genome or was there just a single genome you aligned them to? Also do you know if there are annotation files for this/these genomes, like a gff or gtf file?

I have done a similar experiment with 3 different plant species, for which we did not have an annotated genome. To get at the question of differential expression, we aligned to the sequenced genome of a closely related species and then used that as our reference.
02-01-2012, 07:21 AM   #10
mhadidi2002
Member
 
Location: Germany

Join Date: Jun 2011
Posts: 24

Hello Chadn737,

My work is similar to yours. I have 4 species of the same plant; their genomes aren't sequenced, but there is a sequenced genome for 1 closely related plant, which serves as the reference.

I aligned the 4 different species to the closely related species. The output is in BED and BAM files. I need to re-annotate that reference genome based on these 4 species. I have an annotation for the reference genome, but in GFF format.

Do you have any ideas?

Thanks for the discussion.
02-01-2012, 07:30 AM   #11
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

This is exactly the situation we have worked with.

Your problem of "annotation" is fairly straightforward. What you really want is a list of gene names and the number of reads mapping to each, for downstream analysis like differential expression, right?

There are a couple of ways to do this. The approach I prefer is htseq-count, which was written by Simon Anders, but there are other approaches using bedtools and the like.
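
To illustrate what "the number of reads mapping to each gene" means in practice, here is a small self-contained Python sketch. It is not htseq-count's implementation or its API, just the basic idea: read gene intervals from a GFF annotation, read aligned intervals from a BED file, and count a read for a gene only when it overlaps exactly one gene. It assumes a GFF3-style `ID=` attribute on `gene` features, ignores strand and spliced (multi-block) reads, and uses toy data invented for the example.

```python
from collections import defaultdict


def parse_gff_genes(lines):
    """Yield (chrom, start, end, gene_id) for 'gene' features.

    GFF coordinates are 1-based and inclusive; they are converted here
    to 0-based, half-open coordinates to match BED.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, feature, start, end, attrs = cols[0], cols[2], cols[3], cols[4], cols[8]
        if feature != "gene":
            continue
        # Assumption: GFF3-style "ID=..." attribute; other files may differ.
        fields = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        yield chrom, int(start) - 1, int(end), fields["ID"]


def count_reads_per_gene(gff_lines, bed_lines):
    """Count reads overlapping exactly one gene; skip all other reads."""
    genes = defaultdict(list)
    for chrom, start, end, gene_id in parse_gff_genes(gff_lines):
        genes[chrom].append((start, end, gene_id))
    counts = {gid: 0 for regions in genes.values() for _, _, gid in regions}
    for line in bed_lines:
        if not line.strip():
            continue
        chrom, start, end = line.split("\t")[:3]
        start, end = int(start), int(end)
        # Simple linear scan; fine for a sketch, too slow for real data.
        hits = {gid for s, e, gid in genes.get(chrom, []) if start < e and end > s}
        if len(hits) == 1:  # no gene or several genes: read is not counted
            counts[hits.pop()] += 1
    return counts


# Toy annotation and alignments, invented for illustration only.
gff = [
    "##gff-version 3",
    "chr1\ttoy\tgene\t101\t200\t.\t+\t.\tID=geneA",
    "chr1\ttoy\tgene\t301\t400\t.\t-\t.\tID=geneB",
    "chr1\ttoy\tgene\t381\t480\t.\t+\t.\tID=geneC",
]
bed = [
    "chr1\t110\t160",  # inside geneA
    "chr1\t190\t240",  # overlaps the end of geneA
    "chr1\t310\t360",  # inside geneB only
    "chr1\t370\t420",  # overlaps geneB and geneC, so it is skipped
    "chr1\t500\t550",  # overlaps no gene
]

counts = count_reads_per_gene(gff, bed)
for gene_id in sorted(counts):
    print(f"{gene_id}\t{counts[gene_id]}")
```

The resulting table of gene names and counts, one per sample, is the kind of input that differential expression tools work from.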

If you don't mind me asking, what genome did you align to? If it is Arabidopsis, maize, or another well-annotated plant species, then there will be a lot more tools available for downstream analysis once you find your differentially expressed genes.