SEQanswers

07-09-2010, 08:31 AM   #1
kapoormanav
Junior Member
Location: St Louis
Join Date: Jul 2010
Posts: 9
Re: New to programming need help for inserting text

Hi all

I am new to programming. I have two very big files:

File 1 (>80,000 lines):

CCDS635.1_0 123 G S 255 33 1.00 63 60 G 255 C
CCDS635.1_0 175 C S 255 51 0.94 63 62 C 243 G
CCDS635.1_8 259 G R 90 254 1.00 63 62 G 255 A
CCDS635.1_14 328 A T 39 4 0.00 63 36 N 158 N
CCDS635.1_16 80 G K 139 22 1.00 63 62 G 255 T

File 2:

CCDS635.1_0 1 67162857
CCDS635.1_1 1 67143396
CCDS635.1_2 1 67131424
CCDS635.1_3 1 67129349
CCDS635.1_4 1 67112976

I want to insert all three columns of File 2 into File 1, matching rows on the
identifier in column 1, so that the identifier in each File 1 row is replaced
by the whole matching line of File 2.

For example, File 3 should be:

CCDS635.1_0 1 67162857 123 G S 255 33 1.00 63 60 G 255 C
CCDS635.1_0 1 67162857 175 C S 255 51 0.94 63 62 C 243 G

Can anyone help me with this?

Manav
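What is being asked for is a lookup-table (hash) join: read the smaller File 2 into a dictionary keyed on the identifier, then stream through File 1 once and swap each identifier for its full File 2 line. This needs no sorting, handles any number of identifiers in a single pass, and lets one File 2 line annotate many File 1 rows. A minimal in-memory sketch, assuming whitespace-separated fields; annotate_rows is a hypothetical helper name, and passing rows with no File 2 entry through unchanged is an assumption the post does not specify:

```python
def annotate_rows(file1_lines, file2_lines):
    """Hypothetical helper: put the full File 2 row (identifier plus its
    extra columns) in place of the identifier at the start of each File 1 row."""
    # Lookup table built from File 2: identifier -> whole File 2 row.
    lookup = {}
    for line in file2_lines:
        fields = line.split()
        if fields:
            lookup[fields[0]] = " ".join(fields)

    # Single pass over File 1; no sorting needed, and one File 2 row
    # can annotate any number of File 1 rows.
    out = []
    for line in file1_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: keep the bare identifier if File 2 has no entry for it.
        head = lookup.get(fields[0], fields[0])
        out.append(" ".join([head] + fields[1:]))
    return out


file1 = [
    "CCDS635.1_0 123 G S 255 33 1.00 63 60 G 255 C",
    "CCDS635.1_0 175 C S 255 51 0.94 63 62 C 243 G",
]
file2 = [
    "CCDS635.1_0 1 67162857",
    "CCDS635.1_1 1 67143396",
]

for row in annotate_rows(file1, file2):
    print(row)
```

Run on the two sample rows, this prints exactly the two File 3 lines shown in the question.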

07-09-2010, 08:45 AM   #2
adamdeluca
Member
Location: Iowa City, IA
Join Date: Jul 2010
Posts: 95

join 2.txt 1.txt > 3.txt

on Mac/Linux

07-09-2010, 08:57 AM   #3
kapoormanav
Junior Member
Location: St Louis
Join Date: Jul 2010
Posts: 9

Thanks for replying.
I tried join, but it gives these errors:

join: file 1 is not in sorted order
join: file 2 is not in sorted order

There is another problem too: the two files have different numbers of lines.
I don't want to merge the files line by line; I want to create a new file in
which each identifier is replaced by the three columns of the 2nd file. If an
identifier occurs in 4 rows of the 1st file, then the 2 extra columns for that
identifier in file 2 should be inserted before the 2nd column of file 1 in all
4 rows. It is like a find-and-replace where "CCDS635.1_0" is replaced by
"CCDS635.1_0 1 67162857" in all rows. I can do this with the Unix "sed"
command, but sed handles one identifier at a time, I have >1000 of them,
and every run creates a new file. So I am looking for a solution that
replaces every identifier with its corresponding 3-column entry in one go.

Manav
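With any find-and-replace approach (sed or otherwise), one thing to watch is that some identifiers are prefixes of others: CCDS635.1_1 is the start of CCDS635.1_14, which appears in File 1. An unanchored replacement for CCDS635.1_1 therefore also rewrites the CCDS635.1_14 rows and silently corrupts them. A small illustration using the File 1 row for CCDS635.1_14 and the File 2 entry for CCDS635.1_1; both function names are hypothetical:

```python
def replace_substring(line, old, new):
    """Plain find-and-replace of the first occurrence,
    similar to an unanchored sed 's/old/new/'."""
    return line.replace(old, new, 1)


def replace_first_field(line, old, new):
    """Replace only when the whole first field equals the identifier."""
    fields = line.split()
    if fields and fields[0] == old:
        fields[0] = new
    return " ".join(fields)


row = "CCDS635.1_14 328 A T 39 4 0.00 63 36 N 158 N"
old, new = "CCDS635.1_1", "CCDS635.1_1 1 67143396"

print(replace_substring(row, old, new))
# -> CCDS635.1_1 1 671433964 328 A T 39 4 0.00 63 36 N 158 N  (corrupted)
print(replace_first_field(row, old, new))
# -> CCDS635.1_14 328 A T 39 4 0.00 63 36 N 158 N  (unchanged, correct)
```

Matching the whole first field avoids this; with sed, the pattern would have to be anchored to the start of the line and followed by the field separator.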

07-09-2010, 10:28 AM   #4
kapoormanav
Junior Member
Location: St Louis
Join Date: Jul 2010
Posts: 9

Thanks... It worked when I gave join the --nocheck-order option.

Manav
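A caveat on --nocheck-order: the option only switches off the order check. join still walks both files in step and assumes they are sorted on the join field, so with unsorted input it can silently skip matches instead of reporting an error. The sketch below is a simplified, hypothetical model of that sort-merge strategy (not coreutils' actual code; merge_join is an illustrative name), assuming whitespace-separated rows, at most one File 2 row per identifier, and plain string ordering. It also shows why these identifiers are a trap: as strings, CCDS635.1_10 sorts before CCDS635.1_2, so if File 2 continues in numeric order past _9, it is not sorted the way join expects.

```python
def merge_join(file2_rows, file1_rows):
    """Simplified sort-merge join in the spirit of Unix 'join'.

    Assumes BOTH lists are sorted on field 1 in plain string order and that
    file2_rows has at most one row per key (file1_rows may repeat a key).
    File 1 rows without a partner are dropped, as join does by default.
    """
    out = []
    i = 0  # position in file2_rows; it only ever moves forward
    for row in file1_rows:
        key, rest = row.split(None, 1)
        # Skip File 2 rows whose key sorts before the current key.
        while i < len(file2_rows) and file2_rows[i].split()[0] < key:
            i += 1
        if i < len(file2_rows) and file2_rows[i].split()[0] == key:
            out.append(file2_rows[i] + " " + rest)
    return out


file1 = ["CCDS635.1_10 5 A", "CCDS635.1_2 7 C"]             # sorted as strings
numeric_order = ["CCDS635.1_2 1 100", "CCDS635.1_10 1 200"]  # _2 before _10

print(merge_join(sorted(numeric_order), file1))
# -> ['CCDS635.1_10 1 200 5 A', 'CCDS635.1_2 1 100 7 C']
print(merge_join(numeric_order, file1))
# -> ['CCDS635.1_2 1 100 7 C']   (the _10 row is silently lost)
```

Sorting both files on the first field under the same locale (for example with sort -k1,1) before running join avoids this; the lookup-table approaches elsewhere in this thread sidestep sorting altogether.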

07-09-2010, 10:30 AM   #5
Jose Blanca
Member
Location: Valencia, Spain
Join Date: Aug 2009
Posts: 70

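Maybe something like this would work:

Code:
#!/usr/bin/env python

def merge_files(fname1, fname2, out_fname):
    # Read file 2 into memory: identifier -> whole line
    # (the identifier plus its two extra columns)
    file2 = {}
    with open(fname2) as fhand:
        for line in fhand:
            line = line.strip()
            if not line:
                continue
            line_name = line.split()[0]
            file2[line_name] = line

    # Now create the output: in each file 1 row, replace the
    # identifier with the matching file 2 line
    with open(fname1) as in_fhand, open(out_fname, 'w') as output_fhand:
        for line in in_fhand:
            line = line.strip()
            if not line:
                continue
            # split once on any whitespace (spaces or tabs)
            line_name, rest = line.split(None, 1)
            # keep the original identifier if file 2 has no entry for it
            new_name = file2.get(line_name, line_name)
            output_fhand.write(new_name + ' ' + rest + '\n')


if __name__ == '__main__':
    merge_files('file1.txt', 'file2.txt', 'merged.txt')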

07-09-2010, 12:23 PM   #6
kapoormanav
Junior Member
Location: St Louis
Join Date: Jul 2010
Posts: 9

Thanks for the help.

awk 'BEGIN{FS=OFS="\t"}FNR==NR{a[$1]=$2}FNR!=NR{print $1,a[$1],a[$2],$2,$3,$4,$5,$6,$7,$8,$9}' exonlegth.txt coor.txt > coor1.txt

This worked for me, and I was able to join the two files without sorting.

Thanks all for the help.

Manav
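The awk command relies on a common two-file idiom: NR counts lines across all input files, while FNR restarts at 1 for each file, so FNR==NR is true only while the first file (here the lookup file) is being read. Below is a rough Python analogue using the standard fileinput module, whose FileInput.lineno() and FileInput.filelineno() play the roles of NR and FNR. It is a sketch, not a translation of the exact command above: the function name two_file_join is hypothetical, and it assumes tab-separated files and inserts every extra lookup column right after the identifier, following the File 1/File 2 layout from the first post.

```python
import fileinput


def two_file_join(lookup_path, data_path, sep="\t"):
    """Hypothetical Python analogue of the awk two-file idiom
        awk 'FNR==NR{...} FNR!=NR{...}' lookup_file data_file
    FileInput.lineno() is the running line count over all files (awk NR);
    FileInput.filelineno() restarts at 1 in each file (awk FNR)."""
    table = {}
    out = []
    with fileinput.FileInput(files=(lookup_path, data_path)) as stream:
        for line in stream:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split(sep)
            if stream.filelineno() == stream.lineno():
                # FNR == NR: still reading the first (lookup) file.
                table[fields[0]] = fields[1:]
            else:
                # Second file: put the lookup columns right after the identifier.
                key = fields[0]
                out.append(sep.join([key] + table.get(key, []) + fields[1:]))
    return out
```

Like the awk version, this needs no sorting because the whole lookup file is held in memory. It also shares the idiom's known caveat: if the lookup file is empty, FNR==NR holds for every line of the second file too, so those lines are treated as lookup entries and nothing is printed.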

All times are GMT -8.