**rflrob** · 10-18-2012, 12:44 PM

Assuming you want to keep the seq number, it could be done with a moderately simple Python script:

Code:

# Python 3
best_lines = {}  # id_base -> (highest fpkm seen, seq suffix)
with open('file_name') as fh:
    print(fh.readline(), end='')  # Pass the header line through unchanged
    for line in fh:
        if not line.strip():
            continue  # Skip blank lines
        seq_id, fpkm = line.split()
        fpkm = float(fpkm)  # Turn into a number
        id_base, id_seqnum = seq_id.rsplit('_', 1)  # Assume everything before _seq is the same

        # Keep only the highest-FPKM entry for each id_base
        if id_base not in best_lines or fpkm > best_lines[id_base][0]:
            best_lines[id_base] = (fpkm, id_seqnum)

for id_base, (fpkm, id_seqnum) in best_lines.items():
    print(id_base + "_" + id_seqnum, fpkm)

This won't necessarily retain the original order of the file, but will deal with the possibility that, for instance, comp267138_c0_seq1 and comp267138_c0_seq2 aren't in adjacent lines.
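
To illustrate the non-adjacent case, here is a minimal sketch of the same idea wrapped in a Python 3 function and run on a tiny made-up input. The function name best_isoforms, the FPKM values, and the comp1_c0_seq1 ID are invented for illustration; only the two comp267138_c0 IDs come from the example above. It assumes Python 3.7 or later, where dicts keep insertion order, so each ID base comes out in the order it was first seen.

```python
def best_isoforms(lines):
    """Return {id_base: (fpkm, seq_suffix)}, keeping the highest FPKM per base.

    `lines` is an iterable of 'id<whitespace>fpkm' data rows (header removed).
    """
    best = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        seq_id, fpkm = line.split()
        fpkm = float(fpkm)
        id_base, id_seqnum = seq_id.rsplit('_', 1)
        if id_base not in best or fpkm > best[id_base][0]:
            best[id_base] = (fpkm, id_seqnum)
    return best


# Hypothetical rows: the two comp267138_c0 entries are deliberately not adjacent.
sample_rows = [
    "comp267138_c0_seq1\t2.5",
    "comp1_c0_seq1\t10.0",
    "comp267138_c0_seq2\t7.25",
]

result = best_isoforms(sample_rows)
for id_base, (fpkm, id_seqnum) in result.items():
    print(id_base + "_" + id_seqnum, fpkm)
```

With these rows, comp267138_c0_seq2 is kept over comp267138_c0_seq1 even though another ID sits between them in the input.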

**upendra_35** · 10-18-2012, 12:59 PM

Hi rflrob, it worked perfectly. I had been struggling to write something like this in Perl for a while but couldn't get it to work, and your script worked like a charm. Don't worry about the order of the IDs; I'm not too concerned about it as long as I filter the columns. Thanks a lot again, man.

how do i filter rownames based on column value
