Perl Script

**rahularjun86** · 10-15-2012, 02:30 AM

Hi,
You can use the Unix 'join' command for this task. Please paste the first 5 lines of your input/output files so that someone can write the script. You may refer to this link: http://www.albany.edu/~ig4895/join.htm.
Best wishes,
Rahul
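A minimal sketch of the 'join' route. It assumes the four columns are whitespace-separated, there is a single header row, and every row has all four fields. It splits the file into two gene/count tables (Horn from columns 1-2, DEGSeq from columns 3-4), sorts each on the gene ID because 'join' requires input sorted on the join field, and then joins them. The function name `overlap_join` is hypothetical. `LC_ALL=C` keeps 'sort' and 'join' agreeing on collation order.

```shell
# overlap_join INPUT (hypothetical helper): print gene<TAB>Horn reads<TAB>DEGSeq reads
# for every gene ID found in both column 1 and column 3 of INPUT.
# Assumes whitespace-separated columns and one header row.
overlap_join() {
    tab=$(printf '\t')
    tmp=$(mktemp -d) || return 1
    # Horn table: columns 1-2, header skipped, sorted on the gene ID.
    awk 'NR > 1 && $1 != "" {print $1 "\t" $2}' "$1" | LC_ALL=C sort -t "$tab" -k1,1 > "$tmp/horn.txt"
    # DEGSeq table: columns 3-4, header skipped, sorted the same way.
    awk 'NR > 1 && $3 != "" {print $3 "\t" $4}' "$1" | LC_ALL=C sort -t "$tab" -k1,1 > "$tmp/degseq.txt"
    # join keeps only IDs present in both tables; output is ID, Horn reads, DEGSeq reads.
    LC_ALL=C join -t "$tab" "$tmp/horn.txt" "$tmp/degseq.txt"
    rm -r "$tmp"
}
```

Usage: `overlap_join input.txt > output.txt`. Because each list is sorted independently, it does not matter which line a gene sits on in the original file.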

**AdrianJ217** · 10-15-2012, 03:09 AM

Hi, thank you. Here are the first several lines of the file:

Gene_Horn Uninduced_Horn Gene_DEGSeq Uninduced_DEGSeq
Tb04.24M18.150 12 Tb04.24M18.150 172
Tb04.3I12.100 21 Tb04.3I12.100 11
Tb05.28F8.200 97 Tb05.5K5.100 52
Tb05.30F7.410 43 Tb05.5K5.10 19
Tb06.3A7.270 572 Tb05.5K5.110 5
Tb06.3A7.960 74 Tb05.5K5.120 9
Tb07.26A24.210 100 Tb05.5K5.130 24
Tb09.142.0320 56 Tb05.5K5.140 63
Tb09.142.0350 201 Tb05.5K5.150 12

There are thousands of these lines. Basically, I want a script that looks at Gene_Horn and Gene_DEGSeq, finds only the genes present in both columns, and writes them as the first column of the output file along with the two corresponding read columns (Uninduced_Horn and Uninduced_DEGSeq).

**rahularjun86** · 10-15-2012, 04:10 AM

You mean: where column1 (Gene_Horn) and column3 (Gene_DEGSeq) are the same, print column1, column2 and column3?

**AdrianJ217** · 10-15-2012, 04:15 AM

I mean: where column1 (Gene_Horn) and column3 (Gene_DEGSeq) are the same, print three columns: the overlapping gene, its reads from Horn (Uninduced_Horn), and its reads from DEGSeq (Uninduced_DEGSeq). That way I can plot both sets of reads for each gene on a scatter plot to see how much variance there is between the two data sets.

**rahularjun86** · 10-15-2012, 04:46 AM

OK, thanks. Please try the following Unix one-liner:

awk '$1 ~ $3 {print $1"\t"$2"\t"$4}' input.txt > output.txt

Best,
Rahul

**rahularjun86** · 10-15-2012, 04:51 AM

Oops, sorry. Please use the following command instead. It compares the two IDs as exact strings rather than as a regular expression, so it will generate accurate results:

awk '$1 == $3 {print $1"\t"$2"\t"$4}' input.txt > output.txt

Thanks
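Why a plain `~` match over-reports, shown on two IDs from the DEGSeq column of the sample: `~` treats the right-hand field as a regular expression that may match anywhere inside the left-hand field, and `.` matches any character. Also, in awk `"\b"` inside a string is a single backspace character, not a word boundary (GNU awk spells a word boundary `\y`), so an exact string comparison with `==` is the simpler way to get whole-ID matches. The test row below is a hypothetical two-field line built just for this check.

```shell
# Hypothetical test row: two different gene IDs that share a prefix.
row='Tb05.5K5.100 Tb05.5K5.10'

# Regex match: "Tb05.5K5.10" occurs inside "Tb05.5K5.100", so this prints the row.
printf '%s\n' "$row" | awk '$1 ~ $2 {print "regex match:", $1, $2}'

# Exact string comparison: the IDs differ, so this prints nothing.
printf '%s\n' "$row" | awk '$1 == $2 {print "exact match:", $1, $2}'

# In awk, "\b" in a string literal is one backspace character.
awk 'BEGIN {print length("\b")}'   # prints 1
```

Note that both forms still compare only the two IDs on the same line.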

**AdrianJ217** · 10-15-2012, 04:58 AM

Rahul, thanks so much, but it only gave me 3 genes that lined up. I checked, and the problem is that the one-liner only finds rows where the two IDs match on the same line. Column1 and column3 don't line up, because some genes are in one list and not in the other. So I need a script that looks at all of column1 and all of column3 and gives me every gene found in both, not just the ones that happen to sit on the same line.
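A minimal awk sketch that does not depend on rows lining up. It assumes whitespace-separated columns, a single header row, and unique gene IDs within each column. The file is read twice: on the first pass (`NR == FNR`, true only while reading the first copy) every DEGSeq gene in column 3 and its reads from column 4 go into an associative array; on the second pass each Horn gene in column 1 is printed with both read counts if it is a key in that array. The function name `overlap_awk` and the output header `Gene` are hypothetical choices.

```shell
# overlap_awk INPUT (hypothetical helper): for every gene that appears anywhere in
# column 1 (Gene_Horn) and anywhere in column 3 (Gene_DEGSeq), print
# gene<TAB>Horn reads<TAB>DEGSeq reads, in the order of the Horn column.
# Assumes whitespace-separated columns and one header row.
overlap_awk() {
    awk '
        # Pass 1 (first copy of the file): remember DEGSeq reads by gene ID.
        NR == FNR {
            if (FNR > 1 && $3 != "") deg[$3] = $4
            next
        }
        # Pass 2 (second copy): write an output header in place of the input header.
        FNR == 1 { print "Gene\tUninduced_Horn\tUninduced_DEGSeq"; next }
        # Print each Horn gene that was also seen in the DEGSeq column.
        $1 != "" && ($1 in deg) { print $1 "\t" $2 "\t" deg[$1] }
    ' "$1" "$1"
}
```

Usage: `overlap_awk input.txt > output.txt`. The `($1 in deg)` test checks membership without creating empty array entries as a side effect. If the file is tab-delimited and one gene list is longer than the other (leaving blank cells), adding `-F'\t'` to the awk call keeps the empty cells in their column positions.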
