  • Add 'missing' lines of data using Python code

    I'm a beginner when it comes to programming and Python, but I think I have a very simple question.

    I have large tab-delimited files that contain lines like these:

    10000 7
    20000 1
    30000 2
    60000 3

    What I want is a file that also contains the 'missing' lines, like this:

    10000 7
    20000 1
    30000 2
    40000 0
    50000 0
    60000 3

    The files are rather large, as I am working with whole-genome sequence data. The first column is a position in the genome and the second column is the number of SNPs I find within that 10 kb window. I don't think this detail matters much, though; I just want to write a simple Python script that adds these lines to the file using if/else statements.

    So if the position does not equal the previous line's position + 10000, the 'missing' line is written; otherwise, the existing line is written as it is.

    I foresee one problem, though: several lines in a row can be missing (as in my example). Does anyone have a smart solution for this?
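A minimal sketch of that if/else idea, with one change: a `while` loop replaces the single `if`, so that several consecutive missing windows are all filled, not just the first. It assumes the positions are integers that increase in steps of 10000 and start at 10000; `fill_missing` is a hypothetical helper name, and the `sample` list stands in for the real file.

```python
def fill_missing(lines, step=10000):
    """Yield lines, inserting '<pos>\t0' for every skipped window.

    Assumption: the first column is an integer position that increases
    in multiples of `step`, with the first window at `step`.
    """
    expected = step  # position the next line should have
    for line in lines:
        pos = int(line.split()[0])  # split() handles tabs or spaces
        # A while loop (not a single if) fills several missing lines in a row
        while expected < pos:
            yield f"{expected}\t0"
            expected += step
        yield line.rstrip("\n")
        expected = pos + step


sample = ["10000\t7\n", "20000\t1\n", "30000\t2\n", "60000\t3\n"]
for out in fill_missing(sample):
    print(out)
```

For a real file, you could pass the open file object in place of `sample` and write each yielded line to a new output file instead of printing it.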

    Many thanks!

  • #2
    An easy solution is to loop over the file and keep the previous position in a variable, 'previous':

    (Untested sample code written by a tired, coffee-deprived me:)

    Code:
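    # Note: range() takes its step positionally, range(start, stop, step)
    with open("snps.tsv") as file:  # hypothetical input file name
        previous = 0
        for line in file:
            if not line.strip():
                continue  # skip blank lines
            now = int(line.split()[0])  # convert the position text to an int
            # write a '0' line for every skipped window (empty range if none is missing)
            for n in range(previous + 10000, now, 10000):
                print(f"{n}\t0")
            print(line.rstrip("\n"))
            previous = now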


    • #3
      I will definitely try this soon! It always looks so simple in the end, but writing it yourself is still a struggle when you've only just started coding. Thank you so much; I might come back to it!


      • #4
        If I do this, though, I get an error saying that the range function does not take keyword arguments. I'm not sure how to solve this yet.
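In Python 3, `range()` takes only positional arguments, `range(start, stop[, step])`, so the step must be given as the third argument without `step=`. Separately, `line.split('\t')[0]` returns a string, which has to be converted with `int()` before 10000 can be added to it. A small check using the numbers from the example data (the variable names are just for illustration):

```python
# range() only accepts positional arguments: range(start, stop[, step])
previous, now = 30000, 60000
gap = list(range(previous + 10000, now, 10000))
print(gap)  # [40000, 50000]

# Passing the step as a keyword is what raises the error described above
try:
    range(previous + 10000, now, step=10000)
except TypeError as err:
    print("TypeError:", err)
```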


        • #5
          I won't write out the code since I have to go to a meeting, but you could also use pandas DataFrame objects. If you are new to Python, learn pandas as soon as possible.

          Create a one-column data frame that contains the values:
          10000
          20000
          30000
          ...
          max_value

          Then create a second data frame from your actual values and do a left "join" of the full position table with it. Every expected position is kept, and positions missing from your data come back as NaN, which you can then replace with 0.
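A minimal sketch of the join approach from #5, assuming pandas is installed, the file has no header row, the window size is 10 kb, and the first window sits at position 10000. The `data` string stands in for the real file (with a real file you would pass its path to `pd.read_csv`). Because pandas' `DataFrame.join` aligns on the index, this sketch does the column-based left join with `DataFrame.merge(..., how="left")` instead.

```python
import io

import pandas as pd

# Sample input standing in for the real tab-delimited file (no header row assumed)
data = "10000\t7\n20000\t1\n30000\t2\n60000\t3\n"
counts = pd.read_csv(io.StringIO(data), sep="\t", header=None, names=["pos", "snps"])

step = 10000  # window size, assumed to be 10 kb
max_value = int(counts["pos"].max())

# One-column frame holding every expected window position
full = pd.DataFrame({"pos": range(step, max_value + step, step)})

# Left join keeps every expected position; windows absent from the data get NaN
filled = full.merge(counts, on="pos", how="left")
filled["snps"] = filled["snps"].fillna(0).astype(int)

print(filled.to_csv(sep="\t", header=False, index=False), end="")
```

The NaN-to-0 step is needed because a join only aligns the two tables; it does not know that a missing window should count as zero SNPs.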
