SEQanswers

aprice67 08-31-2015 06:02 AM

How can I identify homologous genes between these two datasets?
 
Hi. Thanks for taking the time to read my question. I am a PhD student and need some help getting over a bump on a project I'm working on.

I have an RNA-seq dataset. I aligned the reads to the reference genome with bowtie2 and have a BAM file for this.

I also assembled the same reads de novo using Trinity, then aligned the reads to the assembly using bowtie2, and I have a BAM file for this too. I also ordered the contigs against the reference genome using Mauve and did some gene finding using RAST. It's not a perfect assembly by any means.
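For the Trinity assembly, each contig is a candidate transcript, so a rough per-contig read count can be taken straight from the alignments. The sketch below is a minimal illustration under stated assumptions: the BAM has been converted to SAM text (for example with `samtools view aln.bam`), and one count per primary alignment is wanted, so unmapped, secondary and supplementary records are skipped by their FLAG bits. Counting against the reference genome instead needs gene coordinates (a GFF/GTF), which dedicated tools such as featureCounts or htseq-count handle.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits for records that should not be counted
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_reads_per_reference(sam_lines: Iterable[str]) -> Counter:
    """Count primary mapped alignments per reference name (SAM column 3, RNAME).

    Assumes plain SAM text, e.g. the output of `samtools view aln.bam`.
    For paired-end data each mate is counted separately.
    """
    counts: Counter = Counter()
    for line in sam_lines:
        # Skip header lines (starting with '@') and blank lines
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Drop unmapped, secondary and supplementary records
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        counts[fields[2]] += 1
    return counts
```

Passing `sys.stdin` to `count_reads_per_reference` while piping in the output of `samtools view` would give a `Counter` mapping each contig name to its read count.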

I want to compare gene expression levels between these two cases, but that means I have to identify the homologous genes. I need to be able to say, "In the first case gene A is expressed this much, and in the second case gene A is expressed that much." I'm just not sure how to get there from where I am now. I was thinking maybe I have to BLAST the data and parse out position values or something, but I'm not sure. I feel like people must have run into this problem before.
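Once each reference gene is paired with an assembled contig, the two expression values can be put side by side. Raw counts are not directly comparable across the two references, because a contig and its reference gene can differ in length, so the sketch below first converts counts to TPM (transcripts per million) within each reference and then joins on the homolog pairs. It is a minimal illustration under stated assumptions: `pairs` is a hypothetical list of (reference gene, contig) tuples, such as the output of a reciprocal best hit search, and `lengths` uses plain sequence length in bases rather than the effective length that quantifiers like RSEM or Salmon compute.

```python
from typing import Dict, List, Tuple


def tpm(counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million: length-normalize counts, then scale to sum to 1e6.

    `lengths` (in bases) defines the feature set; features missing from
    `counts` are treated as having zero reads.
    """
    rates = {name: counts.get(name, 0) / length for name, length in lengths.items()}
    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in rates}
    return {name: rate / total * 1e6 for name, rate in rates.items()}


def paired_expression(
    pairs: List[Tuple[str, str]],
    ref_tpm: Dict[str, float],
    asm_tpm: Dict[str, float],
) -> List[Tuple[str, str, float, float]]:
    """One row per homolog pair: (ref_gene, contig, reference TPM, assembly TPM)."""
    return [
        (gene, contig, ref_tpm.get(gene, 0.0), asm_tpm.get(contig, 0.0))
        for gene, contig in pairs
    ]
```

TPM is used here only because it sums to the same total in each reference; it is a descriptive comparison, not a statistical test.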

I really appreciate any advice anyone can offer. Thanks very much in advance!

AntonioRFranco 08-31-2015 07:21 AM

It is time to use an R package such as edgeR or DESeq2.
It will do the differential expression analysis for you.
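These packages expect a count matrix with the same features (rows) in every sample, which is why a reference-based count and an assembly-based count still need to be linked by homology before they can share a row. To illustrate what the normalization step does, the sketch below reimplements, in simplified form, the median-of-ratios size factors that DESeq2 uses by default. It is not DESeq2's code; it assumes every sample is a dict over the same gene names, and genes with a zero count in any sample are skipped because their geometric mean is zero.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(samples: List[Dict[str, int]]) -> List[float]:
    """Median-of-ratios size factors (simplified DESeq-style normalization).

    Each sample is a dict gene -> raw count; all samples must share the same genes.
    """
    genes = list(samples[0])
    # A zero count makes the geometric mean zero, so such genes are excluded
    usable = [g for g in genes if all(s[g] > 0 for s in samples)]
    if not usable:
        raise ValueError("no gene has a non-zero count in every sample")
    # Log of the per-gene geometric mean across samples (the pseudo-reference)
    log_ref = {g: sum(math.log(s[g]) for s in samples) / len(samples) for g in usable}
    factors = []
    for s in samples:
        # Median of the log ratios to the pseudo-reference, back-transformed
        log_ratios = [math.log(s[g]) - log_ref[g] for g in usable]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

Dividing each sample's counts by its factor puts the samples on a common scale; edgeR's default TMM normalization pursues the same goal by a different route.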

maubp 08-31-2015 09:48 AM

I would suggest a Reciprocal Best Blast Hit (RBBH) analysis as your first step in finding candidate homologues. If you have or expect to have lots of gene duplication in either species, then more sophisticated methods/analysis may be needed.

For example, you could use my script and Galaxy wrapper:
https://github.com/peterjc/galaxy_bl...ocal_best_hits

See also the reference suggested in the help:

Punta and Ofran (2008) The Rough Guide to In Silico Function Prediction, or How To Use Sequence and Structure Information To Predict Protein Function. PLoS Comput Biol 4(10): e1000160.
http://dx.doi.org/10.1371/journal.pcbi.1000160
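The reciprocal best hit idea can be sketched in a few lines. This is a simplified illustration, not the linked script: it assumes BLAST has been run in both directions with tabular output (`-outfmt 6`, whose 12th column is the bit score), picks each query's best hit by bit score, keeps the first hit seen when scores tie, and applies none of the identity or coverage filters a careful analysis would add.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(tabular_lines: Iterable[str]) -> Dict[str, str]:
    """Map each query ID to its highest-bit-score subject ID (BLAST -outfmt 6)."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # Strict '>' means the first hit seen wins on a tie
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[str], b_vs_a: Iterable[str]
) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

Passing the two BLAST output files (opened line by line, with whatever names they were saved under) to `reciprocal_best_hits` returns the candidate homolog pairs, sorted by the first dataset's IDs.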

GenoMax 08-31-2015 09:56 AM

@aprice67: If a reference genome is available, what was the reason for doing a Trinity assembly? Were you expecting to improve on the available annotation?

What exactly do you mean by this?
Quote:

I want to check gene expression levels between these two cases

aprice67 08-31-2015 02:03 PM

Quote:

Originally Posted by maubp (Post 179852)
I would suggest a Reciprocal Best Blast Hit (RBBH) analysis as your first step in finding candidate homologues. If you have or expect to have lots of gene duplication in either species, then more sophisticated methods/analysis may be needed.

e.g. You could use my script & Galaxy wrapper:
https://github.com/peterjc/galaxy_bl...ocal_best_hits

See also the reference suggested in the help,

Punta and Ofran (2008) The Rough Guide to In Silico Function Prediction, or How To Use Sequence and Structure Information To Predict Protein Function. PLoS Comput Biol 4(10): e1000160.
http://dx.doi.org/10.1371/journal.pcbi.1000160

@maubp: Thanks very much! I'm going to give this a try and see where it leads.


All times are GMT -8.