SEQanswers

09-17-2013, 05:56 PM   #1
mictadlo
Junior Member
 
Location: Island

Join Date: Sep 2013
Posts: 1
How to annotate genes from gene prediction

Hi,
For a completely novel genome, I ran the gene prediction software SNAP (http://korflab.ucdavis.edu/software.html), which produced a GFF file and a FASTA file of predicted proteins (excerpts below).

SNAP's GFF:
Code:
XA8	SNAP	Einit	6161	7325	-5.800	-	.	XA8r-snap.4
XA8	SNAP	Eterm	5974	6008	5.650	-	.	XA8r-snap.4
SNAP's FASTA:
Code:
>XA8-snap.4
MAAHPPTLLDRAYGVNNIKSHIPIILDNNDHNYDAWRELLLTHCQSFEVA
GHLDGTLLPTDDNDQLWIKRDGLVKLWLYGTISKDLFRSVFKTGGTSREI
WTRIENYFRDNKEARAIRLDHELRNKTIGDLTIHAYRQDLKSISELLANV
ESPVSERTLVTYMINGLSAKFDNIINVIMHRQPFPTFEQARSMLILEEER
LNKGDKSPLVKDSPSSDKVLNVSATSQPPATTQQPQQQQRFYNNRGSKRN
NRGRGRNYNNNQRPMYNQWGVPFWPNAYSFWGNQQQAPWGQQQFNNQGIL
GPRPSQQAHQVQTQGQFPSAAPFVPTTDFASAFNTMTLTDPTDHQWYMDS
GATAHLTNNPGNLKSILNTGTKQTVKVANGDIIPITKTGPSNSTDNSPQ*
In the next step, I ran BLASTP with SNAP's FASTA file as the query against UniRef90 and got the following XML output:

Code:
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_version>BLASTP 2.2.26+</BlastOutput_version>
  <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&amp;auml;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), &quot;Gapped BLAST and PSI-BLAST: a new generation of protein database search programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>/sw/db/uniprot/uniref90</BlastOutput_db>
  <BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
  <BlastOutput_query-def>XA8-snap.1</BlastOutput_query-def>
  <BlastOutput_query-len>96</BlastOutput_query-len>
  <BlastOutput_param>
    <Parameters>
      <Parameters_matrix>BLOSUM62</Parameters_matrix>
      <Parameters_expect>1e-05</Parameters_expect>
      <Parameters_gap-open>11</Parameters_gap-open>
      <Parameters_gap-extend>1</Parameters_gap-extend>
      <Parameters_filter>F</Parameters_filter>
    </Parameters>
  </BlastOutput_param>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>4</Iteration_iter-num>
      <Iteration_query-ID>Query_4</Iteration_query-ID>
      <Iteration_query-def>XA8-snap.4</Iteration_query-def>
      <Iteration_query-len>400</Iteration_query-len>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>UR090:UniRef90_Q9FX16</Hit_id>
          <Hit_def>F12G12.10 protein n=1 Tax=Arabidopsis thaliana RepID=Q9FX16_ARATH</Hit_def>
          <Hit_accession>UR090:UniRef90_Q9FX16</Hit_accession>
          <Hit_len>308</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>278.1</Hsp_bit-score>
              <Hsp_score>710</Hsp_score>
              <Hsp_evalue>1.87694e-87</Hsp_evalue>
              <Hsp_query-from>10</Hsp_query-from>
              <Hsp_query-to>290</Hsp_query-to>
              <Hsp_hit-from>8</Hsp_hit-from>
              <Hsp_hit-to>286</Hsp_hit-to>
              <Hsp_query-frame>0</Hsp_query-frame>
              <Hsp_hit-frame>0</Hsp_hit-frame>
              <Hsp_identity>146</Hsp_identity>
              <Hsp_positive>192</Hsp_positive>
              <Hsp_gaps>10</Hsp_gaps>
              <Hsp_align-len>285</Hsp_align-len>
              <Hsp_qseq>DRAYGVNNIKSHIPIILDNNDHNYDAWRELLLTHCQSFEVAGHLDGTLLPTDDNDQLWIKRDGLVKLWLYGTISKDLFRSVFKTGGTSREIWTRIENYFRDNKEARAIRLDHELRNKTIGDLTIHAYRQDLKSISELLANVESPVSERTLVTYMINGLSAKFDNIINVIMHRQPFPTFEQARSMLILEEERLNKGDK-SPLVKDSPSSDKVLNVSATSQPPATT-QQPQQQQRFYNNRGSKRN-NRGRGRNYNNNQRPMYNQWGV-PFWPNAYSFWGNQQQAPWG</Hsp_qseq>
              <Hsp_hseq>EQIYGVSNIKSHIPVMLDIEESNYDAWRELFLTHCLSFDVMGHIDGTLLPTNANDVNWQKRDGIVKLSLYGTLTPKQFQGSFVTSSTSRDIWLRIKNQFRNNKDARALRLDSELRTKDIGDMRVADYYRKMKKLADSLRNVDVPVTDRNLVMYVLNGLNPKFDNIINVIKHRQPFPSFDDAATMLQEEEDRLKRAIKPNPTHVDHSSSSTVL--ACSEAPPVTNFQRSGGNQMGYRGRGRGNNIFRGRGGRFSYYNMPTFNSWNRPPFYQNSYQMWNH----PWG</Hsp_hseq>
              <Hsp_midline>++ YGV+NIKSHIP++LD  + NYDAWREL LTHC SF+V GH+DGTLLPT+ ND  W KRDG+VKL LYGT++   F+  F T  TSR+IW RI+N FR+NK+ARA+RLD ELR K IGD+ +  Y + +K +++ L NV+ PV++R LV Y++NGL+ KFDNIINVI HRQPFP+F+ A +ML  EE+RL +  K +P   D  SS  VL  + +  PP T  Q+    Q  Y  RG   N  RGRG  ++    P +N W   PF+ N+Y  W +    PWG</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
...
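One way to read such a report is sketched below. It is only a minimal sketch using Python's standard library: the function name `best_hits` is hypothetical, it expects a complete XML file (the listing above is truncated), and it assumes the first `<Hit>` listed for each query is the best one, which should be checked against the BLAST settings used.

```python
import xml.etree.ElementTree as ET


def best_hits(xml_source):
    """Return {query name: (hit_id, hit_def, evalue)} for the first hit of each query.

    xml_source is a path or an open file holding a complete BLAST XML report.
    """
    hits = {}
    for iteration in ET.parse(xml_source).iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hit = iteration.find("Iteration_hits/Hit")  # first listed hit only
        if query is None or hit is None:
            continue  # skip queries that have no hit at all
        evalue = hit.findtext("Hit_hsps/Hsp/Hsp_evalue")
        hits[query] = (
            hit.findtext("Hit_id"),
            hit.findtext("Hit_def"),
            float(evalue) if evalue else None,
        )
    return hits
```

Called as `best_hits("blast.xml")` (a hypothetical file name), this would give one description per predicted protein that has at least one hit; queries without hits are simply absent from the result.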
How can I generate a GFF3 or GTF file with these annotations for use with SnpEff (http://snpeff.sourceforge.net/)?
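To make the goal concrete, here is a rough sketch of the kind of conversion meant here. It is an illustration under assumptions, not a tested pipeline: `snap_to_gff3` and `gff3_escape` are hypothetical names, the input is assumed to be tab-separated with nine columns as in the SNAP excerpt above, each predicted gene is assumed to be complete (so the first exon starts in phase 0), and `notes` is assumed to map gene names exactly as they appear in the GFF (`XA8r-snap.4` there, while the FASTA header reads `XA8-snap.4`) to a description such as a BLAST hit definition. Whether gene/mRNA/exon/CDS records like these are what SnpEff expects should be checked against its documentation.

```python
from collections import defaultdict


def gff3_escape(text):
    """Percent-encode the characters that GFF3 reserves in column 9."""
    for ch in "%;=&,\t\n":  # '%' first so earlier escapes are not re-escaped
        text = text.replace(ch, "%{:02X}".format(ord(ch)))
    return text


def snap_to_gff3(snap_lines, notes):
    """Group SNAP exon lines by gene and emit gene/mRNA/exon/CDS GFF3 lines."""
    genes = defaultdict(list)
    for line in snap_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue  # skip comments and malformed lines
        seqid, source, _kind, start, end, score, strand, _phase, gene = cols
        genes[(seqid, source, strand, gene)].append((int(start), int(end), score))

    out = ["##gff-version 3"]
    for (seqid, source, strand, gene), exons in genes.items():
        # Order exons in transcription order (descending on the minus strand).
        exons.sort(key=lambda x: x[0], reverse=(strand == "-"))
        lo = str(min(e[0] for e in exons))
        hi = str(max(e[1] for e in exons))
        attrs = "ID={0};Name={0}".format(gene)
        if gene in notes:
            attrs += ";Note=" + gff3_escape(notes[gene])
        mrna = gene + ".t1"  # hypothetical transcript ID scheme
        out.append("\t".join([seqid, source, "gene", lo, hi, ".", strand, ".", attrs]))
        out.append("\t".join([seqid, source, "mRNA", lo, hi, ".", strand, ".",
                              "ID={};Parent={}".format(mrna, gene)]))
        coded = 0  # coding bases emitted so far, used to derive the CDS phase
        for i, (start, end, score) in enumerate(exons, 1):
            phase = str((3 - coded % 3) % 3)
            span = [str(start), str(end)]
            out.append("\t".join([seqid, source, "exon"] + span + [".", strand, ".",
                       "ID={0}.exon{1};Parent={0}".format(mrna, i)]))
            out.append("\t".join([seqid, source, "CDS"] + span + [score, strand, phase,
                       "ID={0}.cds;Parent={0}".format(mrna)]))
            coded += end - start + 1
    return out
```

Writing the result with `"\n".join(snap_to_gff3(open("snap.gff"), notes))` (again a hypothetical file name) would give a single-transcript GFF3 per predicted gene, with the description stored in the `Note` attribute of the gene record.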


Is there a good annotation pipeline available?

Thank you in advance.

Last edited by mictadlo; 09-17-2013 at 06:00 PM.

Tags
blastp, gene annotation, gene prediction, gff3 file, gtf annotation file

All times are GMT -8.