SEQanswers

10-02-2013, 02:58 PM   #1
marct
Member
 
Location: Tempe, AZ

Join Date: Oct 2013
Posts: 11
Genome Annotation Pipeline Help Required

First post as a user here, so please go easy on me for any lack of due diligence.

We have a genome assembled from Illumina data. There is a reference genome of a closely related species (same genus). I downloaded the proteins from this reference genome and sought to map them to our genome using local tblastn, as a homology-based annotation (we also have predicted transcripts from MAKER as an ab initio annotation method).

I have seen this method used in the literature, but all the method descriptions skip an important step - actually physically mapping the best tblastn hits (from whatever criteria) to the genome.

I assume there is some way to convert the BLAST XML output to an annotation file (GFF or similar), one that preserves the information from the BLAST hit (especially the protein name and function). I looked into Biopython and BioPerl but could not find a suitable method for doing this.

Can someone please point me in the right direction?
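
One possible way to do this kind of conversion, sketched with only the Python standard library: the function below reads BLAST XML (`-outfmt 5` in BLAST+) and emits one GFF3 `protein_match` line per HSP, keeping the protein name and description in the attributes column. The function name, the `tblastn` source label, the e-value cutoff, and the choice of feature type are assumptions made for illustration, and the sketch has not been run against real output from this project.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def _esc(text):
    # GFF3 reserves characters such as ; = , & and % in attribute values,
    # so percent-encode everything outside a small safe set.
    return quote(text, safe=" /:|()[]")


def blast_xml_to_gff3(xml_source, max_evalue=1e-5, source="tblastn"):
    """Yield one GFF3 line per HSP from a BLAST XML file or file object."""
    for _event, elem in ET.iterparse(xml_source):
        if elem.tag != "Iteration":
            continue  # wait until a whole query result has been parsed
        query_def = elem.findtext("Iteration_query-def", default="unknown")
        name, _sep, description = query_def.partition(" ")
        for h, hit in enumerate(elem.iter("Hit"), start=1):
            seqid = hit.findtext("Hit_id", default="")
            # Assumption: databases built without parsed IDs report a
            # placeholder Hit_id and keep the real name in Hit_def.
            if seqid.startswith("gnl|BL_ORD_ID|"):
                seqid = hit.findtext("Hit_def", default=seqid).split()[0]
            for n, hsp in enumerate(hit.iter("Hsp"), start=1):
                evalue = hsp.findtext("Hsp_evalue")
                if float(evalue) > max_evalue:
                    continue
                a = int(hsp.findtext("Hsp_hit-from"))
                b = int(hsp.findtext("Hsp_hit-to"))
                frame = int(hsp.findtext("Hsp_hit-frame", default="1"))
                attrs = ["ID={}.hit{}.hsp{}".format(_esc(name), h, n),
                         "Name={}".format(_esc(name))]
                if description:
                    attrs.append("Note={}".format(_esc(description)))
                attrs.append("evalue={}".format(evalue))
                yield "\t".join([
                    seqid, source, "protein_match",
                    str(min(a, b)), str(max(a, b)),      # GFF needs start <= end
                    hsp.findtext("Hsp_bit-score", default="."),
                    "-" if frame < 0 else "+",           # strand from frame sign
                    ".", ";".join(attrs),
                ])
        elem.clear()  # free memory for large result files
```

Something like `for line in blast_xml_to_gff3("hits.xml"): print(line)` would write the features, where `hits.xml` is a placeholder file name. Note that each HSP becomes its own feature here; nothing in this sketch chains HSPs into exon structures.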
10-02-2013, 03:52 PM   #2
whataBamBam
Member
 
Location: Italy

Join Date: May 2013
Posts: 27

I'm currently doing something similar using GMAP, though in my case I'm mapping a transcriptome to a genome (we assembled the transcriptome, and the genome of a related species was released afterwards). I also have ESTs that I'm mapping to a genome with GMAP. Exonerate, I believe, does a similar job and has a protein-matching mode. Anyway, both of these programs can output GFF.
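
As a rough illustration of the Exonerate route, the Python sketch below builds a `protein2genome` command line and pulls the GFF lines out of the tool's mixed output. The file names are placeholders, the helper names are made up for this example, and the options (`--model`, `--showtargetgff`, `--showalignment`, `--showvulgar`, `--bestn`) and the GFF dump markers are given as recalled from the Exonerate documentation, so check them against `exonerate --help` for the installed version.

```python
import subprocess


def exonerate_command(proteins_fasta, genome_fasta, bestn=1):
    """Build an argument list for a protein-to-genome alignment with GFF output."""
    return [
        "exonerate",
        "--model", "protein2genome",   # spliced protein-to-genome alignment
        "--query", proteins_fasta,
        "--target", genome_fasta,
        "--showtargetgff", "yes",      # report features in genome coordinates
        "--showalignment", "no",
        "--showvulgar", "no",
        "--bestn", str(bestn),         # keep the top N alignments per protein
    ]


def extract_gff(exonerate_output):
    """Yield feature lines found between Exonerate's GFF dump markers."""
    in_dump = False
    for line in exonerate_output.splitlines():
        if line.startswith("# --- START OF GFF DUMP ---"):
            in_dump = True
        elif line.startswith("# --- END OF GFF DUMP ---"):
            in_dump = False
        elif in_dump and line and not line.startswith("#"):
            yield line


def run_exonerate(proteins_fasta, genome_fasta):
    """Run Exonerate (must be on PATH) and return its GFF feature lines."""
    result = subprocess.run(
        exonerate_command(proteins_fasta, genome_fasta),
        capture_output=True, text=True, check=True,
    )
    return list(extract_gff(result.stdout))
```

A call such as `run_exonerate("reference_proteins.fa", "assembly.fa")` would return the feature lines. Exonerate writes GFF2-style attributes, so a further conversion step may be needed if downstream tools expect GFF3.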
10-07-2013, 08:48 AM   #3
marct
Member
 
Location: Tempe, AZ

Join Date: Oct 2013
Posts: 11

Thanks for the reply. As of this moment I am running genBlast and exonerate as well as the tblastn. In all of these cases, I am using the protein database of the model species as the query and my genomic sequence as the target (or database). I'll let you know how it goes.
10-07-2013, 10:26 AM   #4
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130

How come the functions of MAKER were not sufficient for your analysis?
10-07-2013, 01:07 PM   #5
marct
Member
 
Location: Tempe, AZ

Join Date: Oct 2013
Posts: 11

I am joining this project somewhat in the middle of the process.

From my understanding, the initial MAKER run was ab initio only; we do not have ESTs or RNA-seq data to add to the pipeline. So, while important, the SNAP/Augustus etc. gene calls from MAKER should constitute one line of evidence for our annotations, while direct alignment of homologous proteins coupled with splice-site detection (a la Exonerate) should constitute another, homology-based line of evidence.

Stop me if I'm wrong.
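
To make the "two lines of evidence" idea concrete, here is a minimal Python sketch that flags which ab initio gene models overlap at least one homology-based alignment on the same sequence and strand. The tuple layout, the function name, and the `min_overlap` parameter are hypothetical, and the simple nested loop is only meant to show the logic, not to scale to a whole genome.

```python
from collections import defaultdict


def supported_genes(gene_models, protein_hits, min_overlap=1):
    """Return IDs of gene models overlapped by at least one protein hit.

    Each feature is (feature_id, seqid, start, end, strand) with 1-based,
    inclusive coordinates, as in GFF.
    """
    # Group homology hits by sequence and strand for quick lookup.
    hits_by_location = defaultdict(list)
    for _hit_id, seqid, start, end, strand in protein_hits:
        hits_by_location[(seqid, strand)].append((start, end))

    supported = []
    for gene_id, seqid, start, end, strand in gene_models:
        for hit_start, hit_end in hits_by_location.get((seqid, strand), []):
            # Number of shared bases; zero or negative means no overlap.
            overlap = min(end, hit_end) - max(start, hit_start) + 1
            if overlap >= min_overlap:
                supported.append(gene_id)
                break  # one supporting hit is enough
    return supported
```

Gene calls that come back unsupported are not necessarily wrong; they simply rest on the ab initio evidence alone.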
10-07-2013, 06:09 PM   #6
whataBamBam
Member
 
Location: Italy

Join Date: May 2013
Posts: 27

Hang on..

I thought MAKER was just for ab initio prediction. Can you use it to bring together ESTs and RNA-Seq data too?

That's what I have. I'm just mapping my transcripts to the ESTs at the moment.
Tags
annotation, tblastn
