Hi, I have two sets of two columns of data. In each set, one column holds gene names and the other holds the number of reads per gene. Each set was produced by a different program for mapping the genes and counting reads, so the rows don't match up. I would like to plot both sets against each other to see the correlation, but first I need to build a single column of the genes that appear in both sets, with the two read-count columns next to it. Does anyone have a Perl script that can do this?
-
Hi,
You can use the Unix 'join' command for this task. Please paste the first 5 lines of your input/output files so that someone can write the script. You may refer to this link: http://www.albany.edu/~ig4895/join.htm.
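'join' works on two separate files and expects both to be sorted on the join field (the gene name here), so a four-column file would first have to be split into a Horn file and a DEGSeq file (for example with `cut -f1,2` and `cut -f3,4`, assuming the file is tab-separated). The Python sketch below mimics the sort-merge logic that 'join' applies; the function name `merge_join` is hypothetical, and it assumes each gene name appears at most once per input. The pairs are taken from the sample lines in this thread, deliberately listed out of order.

```python
def merge_join(left, right):
    """Sort-merge inner join of two lists of (gene, count) pairs.

    Roughly what `sort a > a.s; sort b > b.s; join a.s b.s` does:
    both inputs are sorted on the gene name, then walked in step.
    Assumes gene names are unique within each list.
    """
    left = sorted(left)
    right = sorted(right)
    i = j = 0
    joined = []
    while i < len(left) and j < len(right):
        gene_l, count_l = left[i]
        gene_r, count_r = right[j]
        if gene_l == gene_r:
            joined.append((gene_l, count_l, count_r))
            i += 1
            j += 1
        elif gene_l < gene_r:
            i += 1  # gene only in the left input: skip it
        else:
            j += 1  # gene only in the right input: skip it
    return joined


horn = [("Tb05.28F8.200", 97), ("Tb04.24M18.150", 12), ("Tb04.3I12.100", 21)]
degseq = [("Tb04.3I12.100", 11), ("Tb05.5K5.100", 52), ("Tb04.24M18.150", 172)]
for row in merge_join(horn, degseq):
    print(*row, sep="\t")
```

With the real command, 'sort' and 'join' must agree on collation order; setting `LC_ALL=C` for both is a common way to ensure that.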
Best wishes,
Rahul
Last edited by rahularjun86; 10-15-2012, 02:39 AM.
Rahul Sharma,
Ph.D.
Frankfurt am Main, Germany
-
Hi, thank you. Here are the first several lines of the file:
Gene_Horn Uninduced_Horn Gene_DEGSeq Uninduced_DEGSeq
Tb04.24M18.150 12 Tb04.24M18.150 172
Tb04.3I12.100 21 Tb04.3I12.100 11
Tb05.28F8.200 97 Tb05.5K5.100 52
Tb05.30F7.410 43 Tb05.5K5.10 19
Tb06.3A7.270 572 Tb05.5K5.110 5
Tb06.3A7.960 74 Tb05.5K5.120 9
Tb07.26A24.210 100 Tb05.5K5.130 24
Tb09.142.0320 56 Tb05.5K5.140 63
Tb09.142.0350 201 Tb05.5K5.150 12
There are thousands of these lines. Basically, I want a script that looks at Gene_Horn and Gene_DEGSeq, finds only the genes that appear in both columns, and writes them as the first column of the output file along with the corresponding two read-count columns (Uninduced_Horn and Uninduced_DEGSeq).
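A minimal sketch of the lookup-table approach, written in Python rather than Perl (the same idea maps directly onto a Perl hash). It assumes the file is tab-separated with the header line shown above, that a gene name appears at most once per program, and that where one program reports fewer genes the unused cells are simply empty; the function name `overlap_table` is hypothetical.

```python
import io


def overlap_table(handle):
    """Return [(gene, horn_reads, degseq_reads), ...] for genes seen by both programs.

    Assumes tab-separated columns Gene_Horn, Uninduced_Horn, Gene_DEGSeq,
    Uninduced_DEGSeq, with one header line. Empty cells are skipped.
    """
    horn, degseq = {}, {}
    next(handle)  # skip the header line
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        fields += [""] * (4 - len(fields))  # pad short rows to four cells
        gene_h, reads_h, gene_d, reads_d = fields[:4]
        if gene_h:
            horn[gene_h] = int(reads_h)
        if gene_d:
            degseq[gene_d] = int(reads_d)
    # Keep only genes present in both dictionaries, in Horn order.
    return [(g, horn[g], degseq[g]) for g in horn if g in degseq]


sample = io.StringIO(
    "Gene_Horn\tUninduced_Horn\tGene_DEGSeq\tUninduced_DEGSeq\n"
    "Tb04.24M18.150\t12\tTb04.24M18.150\t172\n"
    "Tb04.3I12.100\t21\tTb04.3I12.100\t11\n"
    "Tb05.28F8.200\t97\tTb05.5K5.100\t52\n"
)
print("Gene\tUninduced_Horn\tUninduced_DEGSeq")
for gene, h, d in overlap_table(sample):
    print(f"{gene}\t{h}\t{d}")
```

Because every gene is looked up by name, it does not matter on which line a gene appears in either pair of columns; for a real file, pass `open("your_file.txt")` instead of the in-memory sample.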
-
In other words: wherever a gene in column 1 (Gene_Horn) also appears in column 3 (Gene_DEGSeq), print that gene as output column 1, followed by its Horn read count (Uninduced_Horn, input column 2) and its DEGSeq read count (Uninduced_DEGSeq, input column 4). That way I can plot both sets of reads for each gene on a scatter plot to see how much the two data sets differ.
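Alongside the scatter plot, a single number can summarize how closely the two programs agree. This sketch computes a Pearson correlation on log2(reads + 1); the log transform and the +1 pseudocount are assumptions (a common choice because read counts span several orders of magnitude), not something specified in this thread, and the helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either column is constant.
    return cov / math.sqrt(var_x * var_y)


def log_count_correlation(rows):
    """rows: (gene, horn_reads, degseq_reads) tuples for the overlapping genes."""
    rows = list(rows)
    # log2(count + 1): the +1 pseudocount keeps zero counts finite (assumption).
    horn = [math.log2(h + 1) for _, h, _ in rows]
    degseq = [math.log2(d + 1) for _, _, d in rows]
    return pearson(horn, degseq)
```

Pass it the (gene, Horn reads, DEGSeq reads) rows for the overlapping genes. A rank-based (Spearman) correlation is another common choice and is unaffected by the log transform.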
-
Rahul, thanks so much, but it only gave me 3 genes that lined up. I checked, and the problem is that the one-liner you gave me only finds genes that match on the same line. Column 1 and column 3 don't line up, because some genes are present in one list but not in the other. So I need a script that looks at all of column 1 and all of column 3 and gives me every gene found in both, not just the ones that happen to sit on the same line.
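The difference described above can be seen on a tiny made-up example (the gene names `geneA` to `geneE` are hypothetical): comparing the two gene columns row by row only catches genes that happen to sit on the same line, while set operations compare every gene against every other. The same sets also report how many genes each program found on its own, which is a quick sanity check before joining.

```python
def compare_gene_lists(horn_genes, degseq_genes):
    """Contrast same-row matching with whole-column (set) matching."""
    # Same-row comparison: only counts genes that sit on the same line.
    same_row = sum(1 for h, d in zip(horn_genes, degseq_genes) if h == d)
    horn_set, degseq_set = set(horn_genes), set(degseq_genes)
    return {
        "same_row_matches": same_row,
        "in_both": len(horn_set & degseq_set),
        "only_horn": len(horn_set - degseq_set),
        "only_degseq": len(degseq_set - horn_set),
    }


# Hypothetical gene lists: geneB and geneD are shared but not on the same rows.
horn = ["geneA", "geneB", "geneC", "geneD"]
degseq = ["geneB", "geneD", "geneE"]
print(compare_gene_lists(horn, degseq))
# {'same_row_matches': 0, 'in_both': 2, 'only_horn': 2, 'only_degseq': 1}
```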