SEQanswers

11-01-2017, 01:50 AM   #1
gtduarte
Junior Member
 
Location: Brazil

Join Date: Jan 2016
Posts: 5
Script for creating GO annotation file from Interproscan output

Dear community,

I am working with a non-model organism, so I have to use alternative approaches for many of the analyses. I now want to run a GO enrichment analysis of my differentially expressed transcripts with BiNGO, and for that I have to load my own annotation file for the transcriptome assembly, which I annotated with InterProScan (IPRS). An example of the IPRS output can be found here: https://github.com/ebi-pf-team/inter...example-output

So basically, from the IPRS output I have to create one line per association between a transcript ID (1st column) and a GO term (14th column). The issue is that many transcripts have several GO terms associated (separated by "|" and spread over several lines), while some lines have none, for instance:

transcript_1 ...columns_2-13... GO:0004601|GO:0006979|GO:0020037|GO:0055114
transcript_1 ...columns_2-13... GO:0004601|GO:0006979|GO:0020037|GO:0055114
transcript_1 ...columns_2-13...
transcript_1 ...columns_2-13... GO:0004601|GO:0055114
transcript_1 ...columns_2-13... GO:0004601|GO:0042744
transcript_2 ...columns_2-13...
transcript_2 ...columns_2-13... GO:0055085

And here is how the custom annotation file should look:

transcript_1 = 0004601
transcript_1 = 0006979
transcript_1 = 0020037
transcript_1 = 0055114
transcript_1 = 0042744
transcript_2 = 0055085

Could someone help me with this? It wouldn't really be a problem if the script's output contains redundant lines; I can remove duplicates later.
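For reference, here is a minimal Python 3 sketch of this conversion. It is not the rearrange_go script attached later in the thread; the function names `iprs_to_bingo` and `convert_file` are hypothetical. It assumes tab-separated IPRS output with the transcript ID in column 1 and "|"-separated GO terms in column 14. A missing, empty, or "-" 14th field is treated as "no GO terms", and a parenthesised source suffix such as "(InterPro)" is stripped if present (an assumption about some IPRS versions). Each transcript/GO pair is kept once, in first-seen order, so no separate de-duplication step is needed.

```python
from typing import Dict, Iterable, List, Tuple

GO_COLUMN = 13  # 14th column, as a 0-based index (assumed IPRS TSV layout)


def iprs_to_bingo(lines: Iterable[str]) -> List[str]:
    """Turn IPRS TSV lines into 'ID = GOnumber' lines, one per unique pair."""
    # A dict preserves insertion order, so it doubles as an ordered set.
    seen: Dict[Tuple[str, str], None] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= GO_COLUMN:
            continue  # GO column absent: this line carries no GO terms
        go_field = fields[GO_COLUMN].strip()
        if go_field in ("", "-"):
            continue  # blank or placeholder GO column
        transcript = fields[0]
        for term in go_field.split("|"):
            term = term.split("(")[0].strip()  # drop a "(source)" suffix, if any
            if term.startswith("GO:"):
                seen[(transcript, term[3:])] = None  # keep only the number
    return [f"{tid} = {num}" for tid, num in seen]


def convert_file(in_path: str, out_path: str = "custom_annotation.txt") -> None:
    """Read an IPRS TSV file and write the BiNGO-style annotation file."""
    with open(in_path) as src, open(out_path, "w") as out:
        for entry in iprs_to_bingo(src):
            out.write(entry + "\n")
```

Applied to the example lines above, `iprs_to_bingo` yields exactly the six lines of the desired annotation file; with files on disk, `convert_file("your_iprs_output.tsv")` (placeholder name) writes them to custom_annotation.txt.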

Best regards,

Gustavo
11-01-2017, 08:41 PM   #2
neavemj
Member
 
Location: MA, USA

Join Date: Feb 2014
Posts: 28

Hi Gustavo,

I've attached a little Python script that should do what you want. I had to name it "rearrange_go.txt" because of SEQanswers attachment restrictions; just rename it "rearrange_go.py".

First open the Python file in a text editor and replace INPUT_FILE_NAME with the name of your file. Then put the script in the same directory as your file and run the following:

python rearrange_go.py

A new file called 'custom_annotation.txt' should be created. The script assumes you have Python installed (macOS and Linux usually do by default). It also assumes that column 14 always contains GO terms and nothing else, though it can be blank.
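Since the conversion relies on column 14 holding only GO terms, a quick check before running it may be worthwhile. Below is a minimal Python 3 sketch using a hypothetical helper, `find_non_go_entries`. It reports the line number and value of every non-empty 14th-column entry that does not look like a seven-digit GO ID (optionally followed by a parenthesised source, an assumption about some IPRS versions). A missing, blank, or "-" field is accepted as "no GO terms", consistent with the column being allowed to be blank.

```python
import re
from typing import Iterable, List, Tuple

GO_COLUMN = 13  # 14th column, as a 0-based index (assumed IPRS TSV layout)
# A GO ID such as GO:0004601, optionally followed by "(source)".
GO_TERM = re.compile(r"GO:\d{7}(?:\([^)]*\))?")


def find_non_go_entries(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, value) for each 14th-column entry that is not a GO term."""
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= GO_COLUMN:
            continue  # column absent: allowed, means no GO terms
        value = fields[GO_COLUMN].strip()
        if value in ("", "-"):
            continue  # blank or placeholder: allowed
        for term in value.split("|"):
            if not GO_TERM.fullmatch(term.strip()):
                problems.append((lineno, term.strip()))
    return problems
```

An empty result, e.g. from `find_non_go_entries(open("your_iprs_output.tsv"))` (placeholder name), suggests the column-14 assumption holds for that file.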

Give it a go and let me know if it works!

Cheers,

Matt.
Attached file: rearrange_go.txt (490 bytes)
11-02-2017, 11:57 AM   #3
gtduarte
Junior Member
 
Location: Brazil

Join Date: Jan 2016
Posts: 5

Hi Matt,

It worked, thanks a lot!!! You saved my day xD

Cheers,

Gustavo
11-06-2017, 05:36 PM   #4
neavemj
Member
 
Location: MA, USA

Join Date: Feb 2014
Posts: 28

Excellent! Glad it worked