SEQanswers

12-19-2016, 06:30 AM   #1
visse226
Junior Member
 
Location: The Netherlands

Join Date: Nov 2016
Posts: 9
Add 'missing' lines of data by using python code

I am a beginner when it comes to programming and Python, but I think I have a very simple question.

I have large tab-delimited files that for example contain lines like this:

10000 7
20000 1
30000 2
60000 3

What I want is a file that also contains the 'missing' lines, like this:

10000 7
20000 1
30000 2
40000 0
50000 0
60000 3

The files are rather large because I am working with whole-genome sequence data. The first column is a position in the genome and the second column is the number of SNPs I find within that 10 kb window. I don't think these details matter much, though: I just want to write a simple Python script that adds these lines to the file using if/else statements.

So if the position does not equal the previous line's position + 10000, the 'missing' line is written; otherwise the line is written as it occurs in the file.

I foresee just one problem with this: what happens when several lines in a row are missing (as in my example)? Does anyone have a smart solution for this simple problem?

Many thanks!
12-19-2016, 06:48 AM   #2
wdecoster
Member
 
Location: Antwerp, Belgium

Join Date: Oct 2015
Posts: 95

An easy solution would be to loop over the file and have a variable 'previous':

Untested sample code, written by a tired, coffee-deprived me:

Code:
previous = 0
for line in file:
    now = line.split('\t')[0]
    if  now != previous + 10000:
        for n in range(previous + 10000, now, step=10000):
            print(n + "\t0")
    print(line)
    previous = now
12-22-2016, 04:50 AM   #3
visse226

I will definitely try this soon! It always looks so simple in the end, but writing it yourself is still a struggle when you've only just started figuring out coding. Thank you so much. I might come back to it!
12-22-2016, 08:04 AM   #4
visse226

If I do this, though, I get an error saying that the range function does not take keyword arguments. I'm not sure how to solve this yet.
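
The error arises because Python's built-in range() accepts only positional arguments, so the step has to be passed as the third argument (range(start, stop, 10000)) rather than as step=10000. Two further points are likely to come up once that is fixed: the position read from the file is a string and needs int() before any arithmetic, and the filled-in position has to be turned back into text before it is joined with "\t0". Below is a minimal sketch of the same 'previous' idea with those points addressed. The function name fill_missing_windows and the window parameter are made up for illustration, and the sketch assumes the input is sorted by position, covers a single chromosome, and that the first window is expected at 10000.

```python
def fill_missing_windows(lines, window=10000):
    """Yield each input line, adding 'position<TAB>0' for skipped windows.

    Hypothetical helper: assumes tab-delimited lines sorted by position,
    with positions that are multiples of `window`.
    """
    previous = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        now = int(line.split("\t")[0])  # the position is text until converted
        # range() takes the step as its third positional argument;
        # the range is empty when no window was skipped
        for n in range(previous + window, now, window):
            yield "{}\t0".format(n)
        yield line
        previous = now


# Example data taken from the first post
example = ["10000\t7\n", "20000\t1\n", "30000\t2\n", "60000\t3\n"]
filled = list(fill_missing_windows(example))
for row in filled:
    print(row)
```

Because the function only iterates over its input, an open file handle can be passed in place of the list, which means a large file is processed line by line rather than loaded into memory at once.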