
  • Add 'missing' lines of data by using python code

    I am a beginner when it comes to programming and Python, but I think I have a very simple question.

    I have large tab-delimited files that for example contain lines like this:

    10000 7
    20000 1
    30000 2
    60000 3

    What I want is a file that also contains the 'missing' lines, like this:

    10000 7
    20000 1
    30000 2
    40000 0
    50000 0
    60000 3

    The files are rather large, as I am working with whole-genome sequence data. The first column is a position in the genome and the second column is the number of SNPs I find within that 10 kb window. I don't think this information is even relevant, though; I just want to write a simple Python script that adds these lines to the file using if/else statements.

    So if the position does not match the position of the previous line + 10000, the 'missing line' is written, otherwise the normal occurring line is written.

    I just foresee one problem in this, namely when several lines in a row are missing (as in my example). Does anyone have a smart solution for this simple problem?

    Many thanks!

  • #2
    An easy solution would be to loop over the file and have a variable 'previous':

    Untested sample code, generated by a tired, coffee-deprived me:

    Code:
    previous = 0
    for line in file:
        now = line.split('\t')[0]
        if  now != previous + 10000:
            for n in range(previous + 10000, now, step=10000):
                print(n + "\t0")
        print(line)
        previous = now



    • #3
      I will definitely try this soon! It always looks so simple in the end, but writing it yourself is still a struggle when you've only just started figuring out coding. Thank you so much. I might come back to it!



      • #4
        If I do this, though, I get an error that the range function does not take keywords as arguments. I'm not sure how to solve this yet.
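
That error comes up because Python's built-in `range` accepts its step only as a third positional argument, so `step=10000` is rejected. The loop from #2 also needs the position converted from text to an integer before doing arithmetic on it. Below is a minimal sketch of one way the loop could be repaired. The function name `fill_missing` and the in-memory example data are made up for illustration, and the sketch assumes the positions are sorted in ascending order and that the first window starts at 10000.

```python
import io

WINDOW = 10000  # window size taken from the question


def fill_missing(lines, window=WINDOW):
    """Yield each input line, plus a 'position<TAB>0' line for every skipped window."""
    previous = 0  # assumption: the first expected window starts at `window`
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # The first column is read as text, so convert it to an integer.
        now = int(line.split("\t")[0])
        # range takes the step as its third positional argument, not a keyword.
        # If no window was skipped, this range is empty and nothing is added.
        for n in range(previous + window, now, window):
            yield f"{n}\t0"
        yield line
        previous = now


# Hypothetical input mirroring the example in the question.
example = io.StringIO("10000\t7\n20000\t1\n30000\t2\n60000\t3\n")
for out in fill_missing(example):
    print(out)
```

Because the inner `range` is simply empty when consecutive positions are exactly one window apart, no separate if/else is needed, and several missing lines in a row are handled by the same loop. To process a real file, the open file object can be passed in place of `example`.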



        • #5
          I won't write out the code since I have to go to a meeting, but you could also take advantage of the power of Pandas data frame objects. If you are new to Python, learn Pandas as soon as possible.

          But you could create a data frame of one column that contains the values:
          10000
          20000
          30000
          ...
          max_value

          Then create a data frame object of your actual values. Then you simply do a "join" on the two tables and it will fill in the missing values by virtue of joining the 2 tables.
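
The join described above could look roughly like the following sketch. The column names `pos` and `snps`, the function name `fill_missing_windows`, and the in-memory example data are assumptions made for illustration. Note that a left join leaves the unmatched rows as NaN, so the sketch replaces those with 0 in an extra step.

```python
import io

import pandas as pd

WINDOW = 10000  # window size taken from the question


def fill_missing_windows(source, window=WINDOW):
    # Read the two tab-delimited columns; the column names are made up here.
    data = pd.read_csv(source, sep="\t", header=None, names=["pos", "snps"])
    # One-column frame holding every expected window position up to the maximum.
    max_pos = int(data["pos"].max())
    full = pd.DataFrame({"pos": range(window, max_pos + window, window)})
    # A left join keeps every expected position; positions absent from the data get NaN.
    merged = full.merge(data, on="pos", how="left")
    # Replace the NaN counts with 0 and restore the integer type.
    merged["snps"] = merged["snps"].fillna(0).astype(int)
    return merged


# Hypothetical input mirroring the example in the question.
example = io.StringIO("10000\t7\n20000\t1\n30000\t2\n60000\t3\n")
result = fill_missing_windows(example)
print(result.to_csv(sep="\t", header=False, index=False), end="")
```

A file path can be passed instead of `example`. For very large files this loads everything into memory at once, which the line-by-line loop avoids.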
