**Heisman** · 08-04-2013, 01:27 PM

I wrote an equivalent program in C++ and it works in <5 seconds on my work computer, lol. Still have no idea what's going wrong with the python script. Sometimes I hate computers.

**fpr** · 08-04-2013, 06:32 PM

Hi Heisman,

When you are not sure what is causing the slowdown, use a profiler.

Code:

python -m cProfile yourScript.py
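If you would rather profile one function than the whole script, the same profiler can be driven from inside Python. This is only a rough sketch: `slow_scan` and `profile_call` are made-up names standing in for whatever part of your script is slow, and it assumes Python 3.

```python
import cProfile
import io
import pstats


def slow_scan(n):
    """Hypothetical stand-in for the slow part of a script: a nested loop."""
    hits = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                hits += 1
    return hits


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # show the five costliest entries
    return result, buffer.getvalue()


result, report = profile_call(slow_scan, 200)
print(report)
```

The report lists call counts and cumulative time per function, which is usually enough to see which loop is eating the time.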

Anyhow, I will venture a few guesses here. I don't know what version of python you are using, but the first thing that stands out is your use of range. If you are using python 2.x you probably want xrange, since range builds a whole new list every time it is called, and you call it once per pass of the outer loop. The modulo operation can be expensive too. Another thing that comes to mind is that you are not using slicing to move around the array. With slicing, this is how the top of your loop would look:

Code:

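# data and sample are the lists of rows your script has already read
begin = 0   # where the next search in data starts (skip if already set earlier)
temp = 0    # comparison counter for progress output (skip if already set earlier)
dataIndex = range(len(data))
for i in range(0,len(sample)):
        # only look at the part of data after the previous match
        for j in dataIndex[begin:]:
                temp = temp + 1
                if ((temp%100) == 0):
                        print("temp: ", temp)
                if (sample[i][1] == data[j][1]):
                        begin=(j+1)
                        break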
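For comparison, here is a rough sketch of the same resume-after-the-last-match scan without taking a slice on every pass. `match_in_order` and the toy rows are invented for illustration, and I am assuming column 1 is the value you compare on. Note that a sample row with no match still walks all the way to the end of data, so many misses will keep it slow.

```python
def match_in_order(sample, data):
    """Return (sample_index, data_index) pairs where column 1 agrees.

    Each search resumes just after the previous match, as in the loop above.
    """
    matches = []
    begin = 0
    for i, sample_row in enumerate(sample):
        # range(begin, n) yields indices lazily on Python 3 (use xrange on 2.x),
        # so no copy of the index list is made per sample row.
        for j in range(begin, len(data)):
            if sample_row[1] == data[j][1]:
                matches.append((i, j))
                begin = j + 1
                break
    return matches


# Made-up rows shaped like (chrom, position); not taken from the original files.
data = [("chr1", 10), ("chr1", 20), ("chr1", 30), ("chr1", 40)]
sample = [("chr1", 20), ("chr1", 40)]
print(match_in_order(sample, data))
```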

There are many things that can be done, but I am not 100% sure what you are trying to do. If you are trying to intersect two BED files, I would store one of them (preferably the smaller) in a dictionary, then check whether each line of the second file is present in it while that file is being read. A quick way to implement this:

Code:

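import csv
import sys
from collections import OrderedDict

# Index every row of the first file by its contents.
# "rb" is the python 2 way to open csv files; on python 3 use open(path, newline="").
datafile = open(sys.argv[1], "rb")
datareader = csv.reader(datafile)
data = OrderedDict()

for lineNo,row in enumerate(datareader, 1):
    data[tuple(row)]=lineNo  # csv rows are lists; dict keys must be hashable, hence tuple

# Stream the second file and report rows that also appear in the first.
samplefile = open(sys.argv[2], "rb")
samplereader = csv.reader(samplefile)

for lineNo,row in enumerate(samplereader, 1):
    key = tuple(row)
    if key in data: print('MATCH LINE %s:%d == %s:%d'%(sys.argv[2],lineNo, sys.argv[1],data[key]))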

I have not tested this, but this is the general idea.

**Heisman** · 08-04-2013, 07:10 PM

Thank you, I'm a new python user. School is starting up for me tomorrow so it'll be a bit before I can look at this in more detail but I will do so within a week. Thank you very much for the feedback.

**fpr** · 08-05-2013, 05:19 AM

Originally posted by Heisman

Thank you, I'm a new python user. School is starting up for me tomorrow so it'll be a bit before I can look at this in more detail but I will do so within a week. Thank you very much for the feedback.

No problem! BTW, my code is just for illustration. Usually I would build a key from part of each row to use with the dictionary (instead of the whole row), e.g. chrom+pos. It all depends on the problem.
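To make the chrom+pos idea concrete, here is a small sketch. It assumes tab-separated rows with chrom in column 0 and pos in column 1; the helper names (`index_by_key`, `find_matches`) and the in-memory sample data are invented for the example, and it targets Python 3.

```python
import csv
import io


def index_by_key(rows):
    """Map (chrom, pos) -> first 1-based line number where that key appears."""
    index = {}
    for line_no, row in enumerate(rows, 1):
        key = (row[0], row[1])  # assumed layout: chrom in column 0, pos in column 1
        index.setdefault(key, line_no)
    return index


def find_matches(index, rows):
    """Yield (line number in rows, line number in indexed file) for shared keys."""
    for line_no, row in enumerate(rows, 1):
        key = (row[0], row[1])
        if key in index:
            yield line_no, index[key]


# In-memory stand-ins for the two files; the contents are invented for illustration.
data_text = "chr1\t100\tA\nchr1\t200\tB\nchr2\t100\tC\n"
sample_text = "chr2\t100\tX\nchr1\t300\tY\nchr1\t100\tZ\n"

data_rows = csv.reader(io.StringIO(data_text), delimiter="\t")
sample_rows = csv.reader(io.StringIO(sample_text), delimiter="\t")

matches = list(find_matches(index_by_key(data_rows), sample_rows))
print(matches)
```

Each lookup is a single dictionary probe, so the second file is read once no matter how large the first one is. Rows that share chrom and pos but differ in other columns still count as matches here, which may or may not be what you want.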

Python is a cool language. Explore dictionaries and sets; they are really useful.

Good luck.

python script running slowly, can't figure out why