![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Common elements within two set of miRNAs | Giorgio C | Bioinformatics | 1 | 01-05-2012 10:35 AM |
Scripting help to identify adaptors in reads | Giorgio C | Bioinformatics | 7 | 11-10-2011 07:01 AM |
How to extract Common Genes from 2 spreadsheets | byou678 | Bioinformatics | 4 | 10-04-2011 08:42 PM |
Fastq sorting and common part? | stoker | Bioinformatics | 0 | 07-07-2011 01:53 AM |
Help please, regulatory elements | johnshembb | Bioinformatics | 1 | 04-29-2010 10:57 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: ITALY Join Date: Oct 2010
Posts: 89
|
![]()
Hi all,
I have a question hoping someone can help me. I have a file with two columns A and B with a list of gene coordinates in each of them: I'd like to write in another file every element that is present in column A but not in B. (I mean I'd like to write in output the NON common elements of the two columns). Do you have any suggestions ( little script in perl or python) not exel cuz the list is huge. Thanks you in advance, Giorgio |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: Cambridge, UK Join Date: May 2010
Posts: 311
|
![]()
Hi Giorgio,
See if this helps: say you have test.txt like this (tab-separated): Code:
chr1 chr2 chr2 chr1 chr3 chr4 chr5 chr4 This python script should do it. It assumes that all the unique elements in column A can fit in memory: Code:
python -c " fin= open('test.txt') col_a= set() for line in fin: ## Unique elements in column A a= line.strip().split()[0] col_a.add(a) fin.seek(0) for line in fin: ## Print out elements in column B not in set A b= line.strip().split()[1] if b not in col_a: print(b) fin.close() " Hope it helps and I haven't made any mistake! Dario |
![]() |
![]() |
![]() |
#3 |
Member
Location: ITALY Join Date: Oct 2010
Posts: 89
|
![]()
Thank you so much for your answer I've tried but it give me an error:
Traceback (most recent call last): File "script.py", line 9, in <module> a= line.strip().split()[0] IndexError: list index out of range Do you know what may be the problem? |
![]() |
![]() |
![]() |
#4 | |
Senior Member
Location: Cambridge, UK Join Date: May 2010
Posts: 311
|
![]() Quote:
In fact, this version will skip blank lines: Code:
python -c " fin= open('test.txt') col_a= set() for line in fin: if line.strip() == '': continue ## Unique elements in column A a= line.strip().split()[0] col_a.add(a) fin.seek(0) for line in fin: if line.strip() == '': continue ## Print out elements in column B not in set A b= line.strip().split()[1] if b not in col_a: print(b) fin.close() " |
|
![]() |
![]() |
![]() |
#5 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
|
![]()
"comm" command is useful for this stuff ..
Code:
-bash-3.00$ cat junk chr1 chr2 chr2 chr1 chr3 chr4 chr5 chr4 -bash-3.00$ awk '{print $1}' < junk | sort | uniq > junkcol1 -bash-3.00$ awk '{print $2}' < junk | sort | uniq > junkcol2 -bash-3.00$ comm -3 junkcol1 junkcol2 chr3 chr4 chr5 check out the "comm" command with "man comm" or via search engine. |
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
|
![]()
Rather than write your own tool, BEDTools intersectBed can do exactly this, if you convert your lists to .bed format
|
![]() |
![]() |
![]() |
#7 |
Member
Location: ITALY Join Date: Oct 2010
Posts: 89
|
![]()
Thank you Dariober,
as you said it was a problem of some empty lines in the middle of the list that I did not see at first. Now it works good and however the 'skip blank script' version works greatly. Richard Finney thank you for your other suggestion, it works good too. |
![]() |
![]() |
![]() |
#8 | |
Senior Member
Location: Cambridge, UK Join Date: May 2010
Posts: 311
|
![]() Quote:
Good luck! Dario |
|
![]() |
![]() |
![]() |
#9 |
Member
Location: ITALY Join Date: Oct 2010
Posts: 89
|
![]()
Yes infact, that was one of the thing I've noticed, in the tab separeted file needs to add (\t) to the script.
By the way it works good ! Thank you again for your precious help. Cheers, Giorgio |
![]() |
![]() |
![]() |
#10 |
Junior Member
Location: Austin, tx Join Date: Aug 2013
Posts: 2
|
![]()
This is a great discussion for a novice scriptor of Python and bash/C shells. Thank you Giorgio C for all of your questions. They provide food for my thoughts. And thanks to Richard Finney for pointing out the use of the "comm" and "awk" commands.
This code is a combo of my first thoughts on how to approach the problem with subsequent introduction of the "comm" command. Code:
bash-3.2$ cat junk chr1 chr2 chr2 chr1 chr3 chr4 chr5 chr4 bash-3.2$ cut -f1 junk | sort | uniq > junkcol1 bash-3.2$ cut -f2 junk | sort | uniq > junkcol2 bash-3.2$ comm -3 junkcol1 junkcol2 > commonjunk bash-3.2$ cat commonjunk chr3 chr4 chr5 John |
![]() |
![]() |
![]() |
#11 |
Member
Location: ITALY Join Date: Oct 2010
Posts: 89
|
![]()
Sure John,
this is a very useful forum |
![]() |
![]() |
![]() |
Thread Tools | |
|
|