SEQanswers

Old 10-17-2013, 08:19 AM   #1
jason_ARGONAUTE
Member
 
Location: china

Join Date: Aug 2013
Posts: 14
Default gene feature converter

Hi guys,
I have a GO annotation file for my differentially expressed genes. It looks like this:

FBgn00001 GO:0016301 [Name:****(annotation)]
FBgn00002 GO:0016301 [Name:****(annotation)]
FBgn00003 GO:0016301 [Name:****(annotation)]
FBgn00004 GO:0003700 [Name:****(annotation)]
FBgn00004 GO:0009651 [Name:****(annotation)]
FBgn00004 GO:0006355 [Name:****(annotation)]
FBgn00005 GO:0009556 [Name:****(annotation)]
FBgn00005 GO:0005515 [Name:****(annotation)]
FBgn00005 GO:0080019 [Name:****(annotation)]
FBgn00005 GO:0016563 [Name:****(annotation)]
FBgn00005 GO:0016627 [Name:****(annotation)]
FBgn00006 GO:0003700 [Name:****(annotation)]
FBgn00006 GO:0010018 [Name:****(annotation)]

Now I want to use WEGO, so I need to convert it to this format:

FBgn00001 GO:0016301
FBgn00002 GO:0016301
FBgn00003 GO:0016301
FBgn00004 GO:0003700 GO:0009651 GO:0006355
FBgn00005 GO:0009556 GO:0005515 GO:0080019 GO:0016563 GO:0016627
FBgn00006 GO:0003700 GO:0010018

I think this could be solved with a Perl script, but I am a beginner and can't write it myself. Can someone help me out? A simple Perl script is good enough for me^^
Old 10-17-2013, 08:20 AM   #2
jason_ARGONAUTE
Both files are tab-delimited.
jason_ARGONAUTE is offline   Reply With Quote
Old 10-18-2013, 07:39 AM   #3
Ciaran
Junior Member
 
Location: Cambridge

Join Date: Sep 2011
Posts: 9
Default

This might help

sed 's/\[.*\]//g' genes_file
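Note that this substitution deletes everything from the first "[" to the last "]" on each line. It removes the annotation column but still leaves one line per GO term, so the lines for each gene have to be merged in a separate step. A rough Python equivalent, as a sketch only (the function name strip_annotation is made up for illustration):

```python
import re

def strip_annotation(line):
    # Same idea as sed 's/\[.*\]//g': drop the bracketed annotation.
    # The greedy .* removes from the first "[" to the last "]" on the line.
    return re.sub(r"\[.*\]", "", line)

line = "FBgn00004\tGO:0003700\t[Name:****(annotation)]"
print(repr(strip_annotation(line)))
```

The tab that preceded the annotation is left in place, so the result may need trailing whitespace trimmed as well.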
Old 10-18-2013, 10:24 AM   #4
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default

This python script should work

import csv
reader = csv.reader(open("GO.txt","r"), delimiter="\t")
new={}
for row in reader:
if row[0] not in new.keys():
new[row[0]] = [row[1]]
else:
new[row[0]].append(row[1])


with open("wego.txt","w") as f:
for key, value in sorted(new.items()):
f.write(key+"\t"+"\t".join(value)+"\n")

Last edited by crazyhottommy; 10-18-2013 at 10:26 AM.
Old 10-18-2013, 10:27 AM   #5
crazyhottommy

I don't know why the indentation is messed up....

Old 10-18-2013, 01:42 PM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Quote:
Originally Posted by crazyhottommy View Post
I don't know why the indentation is messed up....
You need to wrap the script in "["CODE"]" tags (remove the quotes). In the advanced editing mode, you can also click the hash symbol (#) in the toolbar.

Code:
I'm in a code block
    and I can be indented to not muck up python
Old 10-19-2013, 04:55 AM   #7
crazyhottommy

Test...


Code:
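import csv

# Collect the GO terms (column 2) for each gene ID (column 1), in file order
new = {}
with open("GO.txt", "r") as infile:
    reader = csv.reader(infile, delimiter="\t")
    for row in reader:
        if row[0] not in new:
            new[row[0]] = [row[1]]
        else:
            new[row[0]].append(row[1])

# Write one tab-delimited line per gene: the ID followed by all its GO terms
with open("wego.txt", "w") as f:
    for key, value in sorted(new.items()):
        f.write(key + "\t" + "\t".join(value) + "\n")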

Last edited by crazyhottommy; 10-19-2013 at 04:57 AM.
Old 10-21-2013, 11:23 AM   #8
crazyhottommy

This one line awk can do the trick...

awk '{ if (a[$1]) a[$1]=a[$1]"\t"$2; else a[$1]=$2;} END { for (i in a) print i, a[i]}' OFS="\t" input.txt
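One caveat with the one-liner above: awk does not define the order in which "for (i in a)" visits the keys, so the genes may come out in a different order than in the input. If the order matters, the same merge can be sketched in Python. This is a minimal sketch; the function name merge_go_terms is hypothetical, and it assumes whitespace-separated columns with the gene ID first and the GO term second, as awk does by default.

```python
def merge_go_terms(lines):
    # Map each gene ID (column 1) to the list of its GO terms (column 2).
    # Plain dicts keep insertion order in Python 3.7+, so genes come out
    # in the order in which they first appear in the input.
    merged = {}
    for line in lines:
        fields = line.split()  # split on any whitespace, like awk's default
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        merged.setdefault(fields[0], []).append(fields[1])
    return ["\t".join([gene] + terms) for gene, terms in merged.items()]

rows = [
    "FBgn00004\tGO:0003700\t[Name:****(annotation)]",
    "FBgn00004\tGO:0009651\t[Name:****(annotation)]",
    "FBgn00001\tGO:0016301\t[Name:****(annotation)]",
]
for out in merge_go_terms(rows):
    print(out)
```

To run it on a file, pass an open file object instead of the list, since iterating over a file yields its lines.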
Old 11-04-2013, 08:15 PM   #9
jason_ARGONAUTE

I like the simplicity, thank you!
Old 11-04-2013, 08:18 PM   #10
jason_ARGONAUTE

Quote:
Originally Posted by Ciaran View Post
This might help

sed 's/\[.*\]//g' genes_file

I like the simplicity of Linux commands, thank you!
Old 11-04-2013, 08:19 PM   #11
jason_ARGONAUTE

Quote:
Originally Posted by crazyhottommy View Post
This one line awk can do the trick...

awk '{ if (a[$1]) a[$1]=a[$1]"\t"$2; else a[$1]=$2;} END { for (i in a) print i, a[i]}' OFS="\t" input.txt
I'm new to awk, but thanks anyway^^
Old 11-04-2013, 08:23 PM   #12
jason_ARGONAUTE
many people told me to learn Python instead of Perl, maybe i'll learn python someday^^
Old 11-05-2013, 02:21 AM   #13
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

Quote:
Originally Posted by jason_ARGONAUTE View Post
many people told me to learn Python instead of Perl, maybe i'll learn python someday^^
A Perl version? Okay, here's something that might work:

Code:
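perl -ane '
  if($gn ne $F[0]){
    # new gene ID: end the previous output line (if any) and print the ID
    print(($gn ? "\n" : "") . $F[0]);
  }
  # append the GO term, tab-separated to match the requested output
  print "\t".$F[1];
  $gn = $F[0];
  END {
    print "\n";
  }'
[This assumes that all lines for a gene are adjacent in the input. It reads from standard input unless a file name is added after the closing quote. The input delimiter can be changed with the -F option, e.g. -F '/\t/']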
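For comparison, the same streaming idea (only merge lines that sit next to each other, without holding the whole file in memory) can be sketched in Python with itertools.groupby. This is an illustration under the same assumptions as the one-liner, not a replacement for it; the function name merge_adjacent is made up.

```python
from itertools import groupby

def merge_adjacent(lines):
    # Split each line on whitespace and keep rows with at least two columns.
    rows = (line.split() for line in lines)
    rows = (r for r in rows if len(r) >= 2)
    # groupby only merges consecutive rows that share a key, so a gene ID
    # that reappears later in the input starts a new output line.
    for gene, group in groupby(rows, key=lambda r: r[0]):
        yield "\t".join([gene] + [r[1] for r in group])

rows = [
    "FBgn00005\tGO:0009556\t[Name:****(annotation)]",
    "FBgn00005\tGO:0005515\t[Name:****(annotation)]",
    "FBgn00006\tGO:0003700\t[Name:****(annotation)]",
]
for out in merge_adjacent(rows):
    print(out)
```

If the input is not grouped by gene ID, sort it first or use a dictionary-based approach instead.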