visse226 12-19-2016 05:30 AM

Add 'missing' lines of data by using python code
I am a beginner when it comes to programming and Python, but I think I have a very simple question.

I have large tab-delimited files that for example contain lines like this:

10000 7
20000 1
30000 2
60000 3

What I want to have, is a file that also contains the 'missing' lines, such as this:

10000 7
20000 1
30000 2
40000 0
50000 0
60000 3

The files are rather large, as I am working with whole-genome sequence data. The first column is a position in the genome and the second column is the number of SNPs I find within that 10 kb window. I don't think this information is even relevant, though; I just want to write some simple Python code that adds these lines to the file using if/else statements.

So if the position does not match the position of the previous line + 10000, the 'missing line' is written, otherwise the normal occurring line is written.

I just foresee one problem in this, namely when several lines in a row are missing (as in my example). Does anyone have a smart solution for this simple problem?

Many thanks!

wdecoster 12-19-2016 05:48 AM

An easy solution would be to loop over the file and have a variable 'previous':

Untested sample code, written by a tired, coffee-deprived me:


previous = 0
for line in file:
    now = line.split('\t')[0]
    if  now != previous + 10000:
        for n in range(previous + 10000, now, step=10000):
            print(n + "\t0")
    previous = now

visse226 12-22-2016 03:50 AM

I will definitely try this soon! It always looks so simple in the end, but writing it yourself is still a struggle when you've only just started figuring out coding. Thank you so much; I might come back to it!

visse226 12-22-2016 07:04 AM

When I run this, though, I get an error saying that the range function does not take keyword arguments. I am not sure how to solve this yet.
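For what it's worth, that error most likely comes from `step=10000`: Python's built-in `range` only accepts positional arguments, so the step has to be passed as the third argument, `range(start, stop, 10000)`. The sample also compares a string with a number and never prints the lines that are already present. A minimal sketch of one way to handle all three is below. It assumes the first column is an integer, that the file is sorted by position, and that nothing should be filled in before the first line. The function name `fill_missing` and the constant `WINDOW` are made up for illustration and do not come from the thread.

```python
import io

WINDOW = 10000  # window size from the question (10 kb); adjust if yours differs


def fill_missing(lines, window=WINDOW):
    """Yield each input line, plus a 'position<TAB>0' line for every skipped window.

    Assumes tab-delimited lines sorted by position (first column).
    """
    previous = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        position = int(line.split("\t")[0])  # convert the text to a number
        if previous is not None:
            # The step is the third positional argument; range() rejects step=...
            # This loop also covers several missing windows in a row.
            for missing in range(previous + window, position, window):
                yield f"{missing}\t0"
        yield line
        previous = position


# Small demonstration using the example data from the first post.
example = io.StringIO("10000\t7\n20000\t1\n30000\t2\n60000\t3\n")
for out_line in fill_missing(example):
    print(out_line)
```

With the example data this prints the four original lines plus `40000` and `50000` with a count of 0. Because the function reads one line at a time, it should also cope with large files: pass it an open file object instead of the `io.StringIO` stand-in and write each yielded line to an output file.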

All times are GMT -8.