Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • question about random select certain number of reads from ChIP-seq bed file

    Hi everyone:
    Now I want to normalize ChIP-seq bed file reads to certain number by randomly choose certain number of reads from the original bed file.
    Dose anyone have such kind of scripts to do this? I tried to do the perl script by ran() operator. It will work for small bed file. But it will calculate forever if the reads if around 38 million. So is there any way I can unsort the @array element? So that I can use head to form a certain number of reads by randomly selection. Or Does anyone have better idea about that?

    Thank you

    Lijing

  • #2
    Hi Lijing,

    The (python) script below will downsample a file by writing out each line with probability p. E.g. if you want to sample a bed file from 38 million reads to 1 million p= 1/38 (~0.026):

    Code:
    linesampler.py full.bed sampled.bed 0.026
    You can optionally pass a 4th argument as seed to the random number generator to make the sampling repeatable.

    It's not super-fast being Python but it shouldn't take more than few minutes for sampling a bed file of tens of millions of rows.

    The code for linesampler.py:

    Code:
    #!/usr/local/bin/python
    
    import sys
    import random
    
    if len(sys.argv) < 3 or len(sys.argv) > 5:
        sys.exit("""
    Sample lines from file.
    
    USAGE:
        linesampler.py <file in> <file out> <p> <seed>
        
    file in:  File to sample
    file out: Output file
    p:        Probability of a line to be sampled (sent to output)
    seed :    (Optional) Seed to start the sequence of random numbers
                """)
        
    p= float(sys.argv[3])
    if p < 0 or p > 1:
        sys.exit('Invalid p (%s): p must be between 0 and 1' %(sys.argv[3]))
    
    if len(sys.argv) == 5:
        rseed= sys.argv[4]
    else:
        rseed= None
    random.seed(rseed)
    
    fin= open(sys.argv[1])
    fout= open(sys.argv[2], 'w')
    
    for line in fin:
        prand= random.uniform(0,1)
        if prand <= p:
            fout.write(line)
    
    fin.close()
    fout.close()
    sys.exit()
    Hope it helps
    Dario

    Comment

    Latest Articles

    Collapse

    • seqadmin
      Strategies for Sequencing Challenging Samples
      by seqadmin


      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
      03-22-2024, 06:39 AM
    • seqadmin
      Techniques and Challenges in Conservation Genomics
      by seqadmin



      The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

      Avian Conservation
      Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
      03-08-2024, 10:41 AM

    ad_right_rmr

    Collapse

    News

    Collapse

    Topics Statistics Last Post
    Started by seqadmin, Yesterday, 06:37 PM
    0 responses
    10 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, Yesterday, 06:07 PM
    0 responses
    10 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 03-22-2024, 10:03 AM
    0 responses
    51 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 03-21-2024, 07:32 AM
    0 responses
    67 views
    0 likes
    Last Post seqadmin  
    Working...
    X