**gene_x** · 02-26-2013, 02:02 PM

Originally posted by rkk

Hello,

I have a file like the following

chr1 1234
chr1 2345
chr2 94837
chr2 73457

How can I split this data into two files?

chr1.txt

chr1 1234
chr1 2345

chr2.txt

chr2 94837
chr2 73457

Thanks in advance.

$ awk '$1 =="chr1"' file > file1
$ awk '$1 =="chr2"' file > file2

This in theory should work..

**gokhulkrishnakilaru** · 02-26-2013, 02:10 PM

Code:

awk '{print > $1".txt"}' input

**alexdobin** · 02-26-2013, 02:11 PM

A more "universal" way to do it:
awk '{print > $1 ".txt"}' Input.file.txt

**gene_x** · 02-26-2013, 02:21 PM

Good to learn an easier way to do this.. can you explain a bit how it works?

**rkk** · 02-26-2013, 02:43 PM

awk: syntax error at source line 1
context is
{print > $1 >>> ".txt" <<<
awk: illegal statement at source line 1

I am getting the above error...

**gene_x** · 02-26-2013, 02:45 PM

Originally posted by rkk

awk: syntax error at source line 1
context is
{print > $1 >>> ".txt" <<<
awk: illegal statement at source line 1

I am getting the above error...

Make sure you pay attention to single quotes, double quotes, brackets, etc. It worked for me.

**rkk** · 02-26-2013, 02:48 PM

$head -5 test.txt

1 9992
1 9992
1 9993
1 9994
1 9994

$awk '{print > $1 ".txt"}' test.txt

awk: syntax error at source line 1
context is
{print > $1 >>> ".txt" <<<
awk: illegal statement at source line 1

This is what I get for my test.txt file

**gene_x** · 02-26-2013, 02:51 PM

Originally posted by rkk

$head -5 test.txt

1 9992
1 9992
1 9993
1 9994
1 9994

$awk '{print > $1 ".txt"}' test.txt

awk: syntax error at source line 1
context is
{print > $1 >>> ".txt" <<<
awk: illegal statement at source line 1

This is what I get for my test.txt file

It worked for me.... not sure why it's not working for you.

**gokhulkrishnakilaru** · 02-26-2013, 02:55 PM

Originally posted by rkk

$head -5 test.txt

1 9992
1 9992
1 9993
1 9994
1 9994

$awk '{print > $1 ".txt"}' test.txt

awk: syntax error at source line 1
context is
{print > $1 >>> ".txt" <<<
awk: illegal statement at source line 1

This is what I get for my test.txt file

Where are you running it?

Are you on a Linux server or in your Mac's terminal?

Try using nawk or gawk instead of awk.
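
For what it's worth, that error format looks like the one printed by BSD ("one true") awk, the default awk on macOS, though that is only a guess from the message. The expression after `>` in the one-liner is an unparenthesized concatenation, which the gawk manual describes as ambiguous and which causes a syntax error in some awk versions. Wrapping the file name in parentheses is the portable spelling. A minimal sketch, assuming the five lines of test.txt shown in the post (the demo directory name is made up):

```shell
# Hypothetical scratch directory so the demo does not clutter the current one.
mkdir -p awk_paren_demo && cd awk_paren_demo

# Recreate the five lines of test.txt quoted in the post.
printf '%s\n' '1 9992' '1 9992' '1 9993' '1 9994' '1 9994' > test.txt

# Parenthesize the concatenation so the redirection target is unambiguous.
awk '{ print > ($1 ".txt") }' test.txt

# All five lines have "1" in column 1, so they all land in 1.txt.
cat 1.txt
```

If this runs cleanly where the original command failed, the parentheses were the issue; if not, the nawk/gawk suggestion is still worth trying.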

**gokhulkrishnakilaru** · 02-26-2013, 03:00 PM

Originally posted by gene_x

Good to learn an easier way to do this.. can you explain a bit how it works?

Code:

awk '{print > $1".txt"}' input

$1 refers to the first column.

For each distinct value in column 1, print the line to another file named after that value ($1).
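
To spell out where the separation happens: awk runs the action once per input line, and the file name after `>` is re-evaluated for every line, so each line goes to whichever file its own first field names. Unlike the shell's `>`, awk's `>` empties a file only when it is first opened during the run; later prints to the same name append. A longer equivalent of the one-liner, as a sketch using the sample data from the first post (the variable `out` and the demo directory are just illustrative names):

```shell
# Hypothetical scratch directory for the demo.
mkdir -p awk_split_demo && cd awk_split_demo

# Sample data from the original question.
printf '%s\n' 'chr1 1234' 'chr1 2345' 'chr2 94837' 'chr2 73457' > input

awk '{
    out = $1 ".txt"     # build the output file name from column 1 of THIS line
    print $0 > out      # write the whole line to that file
}' input

# chr1 lines ended up in chr1.txt, chr2 lines in chr2.txt.
cat chr1.txt
cat chr2.txt
```

Storing the name in a variable first also sidesteps any ambiguity about how `> $1 ".txt"` is parsed.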

**gene_x** · 02-26-2013, 03:02 PM

Originally posted by gokhulkrishnakilaru

Code:

awk '{print > $1".txt"}' input

$1 refers to the first column.

For each distinct value in column 1, print the line to another file named after that value ($1).

I understand printing to another file named after the column value. What I don't get is where the separation based on the first column's contents happens..

**rkk** · 02-26-2013, 03:27 PM

I should use that command on Linux...

Now I have another issue.

I have a file like the following. I need to bin the first column into 100 bp regions and count the second-column values for each bin:
10175 1
10179 1
10189 1
10191 1
10201 1
10243 1
10249 1
10262 1
10313 1
10414 1
10485 1
10499 1

The output should be something like this..

10101-10200 4
10201-10300 4
10301-10400 1
10401-10500 3

Can someone help with this..

Thanks in advance..

**gokhulkrishnakilaru** · 02-26-2013, 04:01 PM

Originally posted by rkk

I should use that command on Linux...

Now I have another issue.

I have a file like the following. I need to bin the first column into 100 bp regions and count the second-column values for each bin:
10175 1
10179 1
10189 1
10191 1
10201 1
10243 1
10249 1
10262 1
10313 1
10414 1
10485 1
10499 1

The output should be something like this..

10101-10200 4
10201-10300 4
10301-10400 1
10401-10500 3

Can someone help with this..

Thanks in advance..

Do you already know your bins?

If not, what are your start values and end values to consider bins at 100bp?

**rkk** · 02-26-2013, 04:04 PM

The command has to identify the min and max values in column 1 and then bin that range into 100 bp regions...
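
One possible way to do this in awk, offered as a sketch and not a tested pipeline: it assumes bins are aligned as in the expected output above (10101-10200, 10201-10300, ...), that positions are positive integers, and that "count" means summing column 2 (the same thing here, since every value is 1). The file names `bins_input.txt` and `bins_output.txt` are made up for the example. Bins between the lowest and highest occupied bin are all printed, with 0 for any that are empty.

```shell
# Hypothetical input file reproducing the data from the post.
printf '%s\n' '10175 1' '10179 1' '10189 1' '10191 1' '10201 1' '10243 1' \
              '10249 1' '10262 1' '10313 1' '10414 1' '10485 1' '10499 1' > bins_input.txt

awk '
{
    b = int(($1 - 1) / 100)            # bin index, e.g. 10101..10200 -> 101
    sum[b] += $2                       # accumulate column 2 per bin
    if (NR == 1 || b < lo) lo = b      # track the lowest bin seen
    if (NR == 1 || b > hi) hi = b      # track the highest bin seen
}
END {
    if (NR == 0) exit                  # empty input: print nothing
    for (b = lo; b <= hi; b++)         # walk bins in order, min to max
        printf "%d-%d %d\n", b * 100 + 1, (b + 1) * 100, sum[b]
}' bins_input.txt > bins_output.txt

cat bins_output.txt
```

Looping from `lo` to `hi` is used instead of `for (b in sum)` because awk does not guarantee the order of that kind of loop.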


Awk command
