
**sBeier** · 02-12-2013, 11:25 PM

If you don't have memory considerations, why don't you read in both files first, map them in a way useful for you (in a dict, or OrderedDict) and then iterate once over the first map?
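A minimal sketch of that idea, assuming hypothetical tab-separated inputs (a gene file with `gene_id, chrom, start, end` and a SNP file with `snp_id, chrom, pos`); the column layout and the function names are illustrative, not taken from the original question:

```python
import io
from collections import defaultdict

def snps_by_chrom(snp_handle):
    """Read the whole SNP file once and group (pos, snp_id) by chromosome."""
    table = defaultdict(list)
    for line in snp_handle:
        snp_id, chrom, pos = line.rstrip("\n").split("\t")
        table[chrom].append((int(pos), snp_id))
    return table

def snps_in_genes(gene_handle, snp_table):
    """Iterate once over the genes and look up SNPs on the same chromosome."""
    for line in gene_handle:
        gene_id, chrom, start, end = line.rstrip("\n").split("\t")
        start, end = int(start), int(end)
        hits = [snp_id for pos, snp_id in snp_table.get(chrom, [])
                if start <= pos <= end]
        yield gene_id, hits

# Toy data standing in for the two input files (assumed layout).
genes = io.StringIO("geneA\tchr1\t100\t200\ngeneB\tchr2\t50\t80\n")
snps = io.StringIO("rs1\tchr1\t150\nrs2\tchr1\t300\nrs3\tchr2\t60\n")
result = dict(snps_in_genes(genes, snps_by_chrom(snps)))
```

Each file is read exactly once; the cost is holding the SNP table in memory, which is why this only applies when memory is not a concern.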

**dariober** · 02-13-2013, 01:35 AM

Originally posted by gwilymh:

Does anyone know a technique to begin the next iteration of the first loop without beginning from the very first line of the second input file, i.e. a way to ‘save’ the iterator position on the second for loop?

What about using the tell() and seek() file methods? They seem to do what you need. From the Python docs (http://docs.python.org/2/tutorial/inputoutput.html):

f.tell() returns an integer giving the file object’s current position in the file, measured in bytes from the beginning of the file. To change the file object’s position, use f.seek(offset, from_what)
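As a rough illustration of bookmarking a position with tell() and returning to it with seek(), here is a small sketch that uses an in-memory `io.StringIO` as a stand-in for a real file (the contents are made up):

```python
import io

handle = io.StringIO("snp1\nsnp2\nsnp3\n")  # stand-in for an opened file

handle.readline()            # consume the first line
bookmark = handle.tell()     # remember where the second line starts
rest = handle.read()         # run on to the end of the file
handle.seek(bookmark)        # jump back to the saved position
resumed = handle.readline()  # reads the second line again

# Note: on a real text file in Python 3, call tell() between readline()
# calls rather than inside a "for line in handle" loop, where tell()
# raises OSError.
```

The value returned by tell() should be treated as an opaque bookmark and passed back to seek() unchanged.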

(However, depending on exactly what you need to do, a better data structure such as an interval tree might scale better.)

Best
Dario

**ECO** · 02-15-2013, 02:32 PM

Here's a tip, post in the correct forum!

Moving to Bioinfx.

**Luyi Tian** · 02-16-2013, 09:22 AM

First, there are two ways to read lines from a file while keeping track of the position:

Code:

    handle = open('yourfile', 'r')
    handle.readline()  # read one line; each further call returns the next line
    next(handle)       # advance the file iterator by one line, similar to readline()
    # In Python 2, stick to one of the two: mixing next() with readline()
    # can skip data because iteration uses a read-ahead buffer.

Second, you could use PyPy to accelerate your script: if it contains a lot of 'for' and 'while' loops, PyPy could make it up to 10 times faster. You could also call the file object's readlines(10000) to read a batch of lines at a time and save I/O time; note that the argument is an approximate size in bytes, not a line count.
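A small sketch of batched reading with readlines(), where the argument is a size hint in bytes/characters (reading stops once the lines collected so far exceed it) rather than a number of lines. The helper name and the toy input are assumptions for illustration:

```python
import io

def count_lines_in_batches(handle, hint=10000):
    """Read the file in batches of roughly `hint` bytes of whole lines."""
    batches = 0
    total = 0
    while True:
        batch = handle.readlines(hint)  # list of complete lines
        if not batch:                   # empty list means end of file
            break
        batches += 1
        total += len(batch)
    return batches, total

# 1000 short lines standing in for a real input file.
handle = io.StringIO("".join("line%d\n" % i for i in range(1000)))
batches, total = count_lines_in_batches(handle, hint=100)
```

Whether batching actually saves time depends on the file and the platform, since file objects are already buffered; it is worth timing before relying on it.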

**maubp** · 02-18-2013, 10:12 AM

It sounds like you can just open file 1 and file 2 once BEFORE starting the nested loops, but perhaps I've not understood your problem fully.

Based on the filenames you might have one line per gene in both files, so a loop iterating over both files together could work. For example something like this:

Code:

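handle1 = open(...)
handle2 = open(...)
# zip pairs the two files line by line (on Python 2, use itertools.izip
# instead so that both files are not read into memory first)
for line1, line2 in zip(handle1, handle2):
    # assert line1 and line2 are for the same gene
    pass
handle1.close()
handle2.close()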

**rflrob** · 02-18-2013, 04:00 PM

Originally posted by maubp:

It sounds like you can just open file 1 and file 2 once BEFORE starting the nested loops, but perhaps I've not understood your problem fully.

Based on the filenames you might have one line per gene in both files, so a loop iterating over both files together could work. For example something like this:

Code:

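handle1 = open(...)
handle2 = open(...)
# zip pairs the two files line by line (on Python 2, use itertools.izip
# instead so that both files are not read into memory first)
for line1, line2 in zip(handle1, handle2):
    # assert line1 and line2 are for the same gene
    pass
handle1.close()
handle2.close()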

Even if you don't have one line per gene, you can still use the same trick of opening the handles once:

Code:

handle1 = open(...)
handle2 = open(...)

for gene in handle1:
    # do stuff
    for snp in handle2:  # resumes where the previous gene's loop stopped
        # do stuff
        if condition:
            break

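You'd have to be careful not to lose the first snp for each gene, of course: the snp that triggers the break has already been read from handle2, so the next gene's loop starts on the line after it.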
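One way to avoid dropping that snp is to hold it in a variable until the next gene is processed. The sketch below assumes both files are sorted by position, cover a single chromosome, and that genes do not overlap; the whitespace-separated layouts (`gene_id start end` and `snp_id pos`) and the function name are hypothetical:

```python
import io

def assign_snps(gene_handle, snp_handle):
    """Single pass over both sorted files; yields (gene_id, [snp_id, ...])."""
    pending = None  # snp already read but not yet assigned to a gene
    for gene_line in gene_handle:
        gene_id, start, end = gene_line.split()
        start, end = int(start), int(end)
        hits = []
        while True:
            if pending is None:
                snp_line = snp_handle.readline()
                if not snp_line:
                    break  # snp file exhausted
                snp_id, pos = snp_line.split()
                pending = (snp_id, int(pos))
            snp_id, pos = pending
            if pos > end:
                break  # past this gene: keep the snp for the next one
            if pos >= start:
                hits.append(snp_id)
            pending = None  # consumed (inside this gene, or before it)
        yield gene_id, hits

# Toy data standing in for the two sorted input files.
genes = io.StringIO("geneA 100 200\ngeneB 300 400\n")
snps = io.StringIO("rs1 50\nrs2 150\nrs3 300\nrs4 350\nrs5 500\n")
result = list(assign_snps(genes, snps))
```

Here rs3 is read while scanning geneA, found to lie beyond it, and kept in `pending` so that geneB still sees it.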

As a hint, there are code tags that you can use that will maintain the indentation of your post, which will make understanding your python code much easier.

**brentp** · 02-19-2013, 07:29 AM

Use bedtools, or pybedtools from Python. If your data is in BED format, this will make your script much faster and much simpler.


Tips on using nested for statements in Python to maximize program efficiency
