Dear community,
I am working with a non-model organism, so I have to use alternative approaches for many analyses. At the moment I want to run a GO enrichment analysis of my differentially expressed transcripts using BiNGO, and for that I have to load my own annotation file for the transcriptome assembly, which I created with InterProScan (IPRS). An example of the IPRS output can be found here: https://github.com/ebi-pf-team/inter...example-output
So basically, using the IPRS output, I have to write one association per line, pairing the transcript ID in the 1st column with each GO term in the 14th column. The issue is that many transcripts have several GO terms associated with them, while others have none, for instance:
transcript_1 ...columns_2-13... GO:0004601|GO:0006979|GO:0020037|GO:0055114
transcript_1 ...columns_2-13... GO:0004601|GO:0006979|GO:0020037|GO:0055114
transcript_1 ...columns_2-13...
transcript_1 ...columns_2-13... GO:0004601|GO:0055114
transcript_1 ...columns_2-13... GO:0004601|GO:0042744
transcript_2 ...columns_2-13...
transcript_2 ...columns_2-13... GO:0055085
And here is how the custom annotation file should look:
transcript_1 = 0004601
transcript_1 = 0006979
transcript_1 = 0020037
transcript_1 = 0055114
transcript_1 = 0042744
transcript_2 = 0055085
Please, can someone help me with that? It wouldn't really be a problem if the script's output contained redundant lines; I can remove the duplicates later.
Best regards,
Gustavo
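A minimal sketch of one way to do this conversion in Python, assuming the IPRS output is tab-separated and the GO terms sit in the 14th column as described above. The function name `iprs_to_bingo` and the file names in the comments are hypothetical, chosen only for illustration. GO IDs are pulled out with a regular expression rather than a plain split on `|`, so any extra text around an ID is ignored. Duplicates are dropped while the first-seen order is kept.

```python
import re

# Matches a GO accession such as GO:0004601 and captures only the digits.
GO_ID = re.compile(r"GO:(\d{7})")


def iprs_to_bingo(lines, go_column=13):
    """Yield 'transcript = GOnumber' strings from tab-separated IPRS rows.

    go_column is the 0-based index of the GO field (13 = 14th column,
    an assumption taken from the description above).
    """
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= go_column:
            continue  # row has no GO column at all
        transcript = fields[0]
        for go_number in GO_ID.findall(fields[go_column]):
            pair = (transcript, go_number)
            if pair not in seen:  # skip associations already written
                seen.add(pair)
                yield f"{transcript} = {go_number}"


if __name__ == "__main__":
    # Tiny in-memory demo: 13 placeholder columns, then the GO field.
    filler = "\t".join(["x"] * 12)
    demo = [
        f"transcript_1\t{filler}\tGO:0004601|GO:0055114\n",
        f"transcript_1\t{filler}\n",
        f"transcript_2\t{filler}\tGO:0055085\n",
    ]
    for association in iprs_to_bingo(demo):
        print(association)

    # For a real file (hypothetical names), something like:
    # with open("iprs_output.tsv") as src, open("annotation.txt", "w") as dst:
    #     for association in iprs_to_bingo(src):
    #         dst.write(association + "\n")
```

Rows without a 14th column, or with an empty one, simply produce no output, so transcripts lacking GO terms are skipped rather than written with an empty value.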