  • bug in samtools view -s?

    Hi, I'm experiencing difficulties trying to downsample .bam files using samtools view -s. Specifically some of the commands fail while others work; this seems sometimes to be correlated with the -s float argument being > 0.5 (but not always). Here I'm c/p'ing some of the code that worked and some which failed.

    Thanks for any helpful suggestions!

    samtools view -b -s 0.271 1.bam > 1_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.5077 2.bam > 2_ds.bam # gives 0 reads unexpectedly
    samtools view -b -s 0.2113 3.bam > 3_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.3322 4.bam > 4_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.5306 5.bam > 5_ds.bam # gives 0 reads unexpectedly
    samtools view -b -s 0.204 6.bam > 6_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.3841 7.bam > 7_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.4691 8.bam > 8_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.6861 9.bam > 9_ds.bam # gives 0 reads unexpectedly
    samtools view -b -s 0.2261 10.bam > 10_ds.bam # gives 730697 reads as expected

    samtools view -b -s 0.6653 23.bam > 23_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0444 24.bam > 24_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0492 25.bam > 25_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.1648 26.bam > 26_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0801 27.bam > 27_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.171 28.bam > 28_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0979 29.bam > 29_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0511 30.bam > 30_ds.bam # gives 730697 reads as expected
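For reference, the samtools view manual (1.x releases) documents `-s FLOAT` as a combined value: the integer part seeds the random number generator and the part after the decimal point is the fraction of templates/pairs to keep, so `-s 0.5077` means seed 0 with fraction 0.5077. Since every file above is being reduced to roughly the same number of reads, the fraction is presumably the target count divided by each file's total. The helper below is a minimal sketch of building such a value; `s_arg_for_target` is a hypothetical name (not part of samtools), and it assumes the target is smaller than the total and that both were counted the same way (e.g. with `samtools view -c`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of samtools): build an -s value of the form
# SEED.FRACTION that keeps roughly TARGET of TOTAL records.
# Assumes 0 < TARGET < TOTAL and that both were counted the same way.
s_arg_for_target() {
  local seed=$1 target=$2 total=$3
  awk -v s="$seed" -v t="$target" -v n="$total" 'BEGIN {
    frac = sprintf("%.4f", t / n)   # e.g. 250/1000 -> "0.2500"
    if (frac + 0 >= 1 || frac + 0 <= 0) {
      print "target must be between 0 and total" > "/dev/stderr"
      exit 1
    }
    sub(/^0/, "", frac)             # "0.2500" -> ".2500"
    print s frac                    # seed 7 -> "7.2500"
  }'
}

# Usage (requires samtools on PATH):
#   total=$(samtools view -c 1.bam)
#   samtools view -b -s "$(s_arg_for_target 0 730697 "$total")" 1.bam > 1_ds.bam
```

Because subsampling is done per template rather than per record, the resulting count is approximate rather than exact.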

  • #2
    Not a direct answer to your question, but you could also use reformat.sh from the BBMap suite for this. It lets you specify sampling parameters with more granularity (even as an exact number of reads).



    • #3
      Originally posted by jkzebrafish
      Hi, I'm experiencing difficulties trying to downsample .bam files using samtools view -s. Specifically some of the commands fail while others work; this seems sometimes to be correlated with the -s float argument being > 0.5 (but not always). Here I'm c/p'ing some of the code that worked and some which failed.
      It seems that specific read alignments are causing the failures. You could confirm this by taking one of the BAMs that failed and subsampling it with a small fraction and several different random seeds; you should then see the failure some percentage of the time.
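The check suggested above could be scripted along these lines. This is a minimal sketch: `seed_sweep` and `count_per_seed` are hypothetical helper names, the seed range 1 to 10 is arbitrary, and it relies only on the documented SEED.FRACTION form of `-s` plus `samtools view -c`, which prints a record count instead of the records.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one -s value per seed, all with the same fraction.
# FRACTION is written with a leading zero, e.g. 0.05.
seed_sweep() {
  local frac=$1 first=$2 last=$3 s
  for ((s = first; s <= last; s++)); do
    echo "${s}${frac#0}"   # strip the leading 0: seed 3 + 0.05 -> 3.05
  done
}

# Count the records kept for each seed (requires samtools on PATH).
# If the hypothesis holds, only some seeds should come back with 0.
count_per_seed() {
  local bam=$1 frac=$2 s_arg
  for s_arg in $(seed_sweep "$frac" 1 10); do
    printf '%s\t%s\n' "$s_arg" "$(samtools view -c -s "$s_arg" "$bam")"
  done
}

# Usage: count_per_seed 9.bam 0.05
```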

      Are these alignments of very long reads (> 65 kbp)? Alignments whose CIGAR string has more than 65,535 operations (the limit of the 16-bit CIGAR operation count in BAM) can behave strangely.
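One way to rule this out is to find the largest number of CIGAR operations in the file, since the 16-bit BAM field limits the number of operations, not the read length directly. The sketch below uses a hypothetical helper name, `max_cigar_ops`; it reads SAM text (for example from `samtools view`), skips header lines and records whose CIGAR is `*`, and counts one operation per operation letter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the largest number of CIGAR operations found in
# SAM text on stdin (header lines and CIGAR "*" are skipped).
max_cigar_ops() {
  awk -F'\t' '!/^@/ && $6 != "*" {
    n = gsub(/[MIDNSHP=X]/, "", $6)   # every operation ends with one letter
    if (n > max) max = n
  } END { print max + 0 }'
}

# Usage (requires samtools on PATH); values near 65,535 would be suspicious:
#   samtools view 9.bam | max_cigar_ops
```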



      • #4
        Thanks, cstack, for the response. These are paired-end 75 bp reads, nothing crazy.

        Here is a little more information:

        samtools view -b -s 0.6861 9.bam > 9_ds.bam # gives 0 reads
        samtools view -b -s 0.4861 9.bam > 9_ds.bam # gives ~50k reads
        samtools view -b -s 0.5 9.bam > 9_ds.bam # gives ~50k reads
        samtools view -b -s 0.5001 9.bam > 9_ds.bam # gives 0 reads
        samtools view -b -s 1.6861 9.bam > 9_ds.bam # gives 0 reads
        samtools view -b -s 5.6861 9.bam > 9_ds.bam # gives 0 reads
        samtools view -b -s 100.6861 9.bam > 9_ds.bam # gives 0 reads

        No errors or warnings are given, hence my confusion. Thanks for any insight.
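These results fit the documented reading of `-s`: changing the integer part (0, 1, 5, 100) only changes the seed while the fraction stays 0.6861, and for this file only values whose fractional part is above 0.5 returned 0 reads. Newer samtools releases also accept the seed and the fraction as separate `--subsample-seed` and `--subsample` options (check `samtools view` help for your version); whether they behave any differently on this file is untested. The sketch below uses a hypothetical helper name, `s_to_subsample_opts`, to translate an old-style `-s` value into those two options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate an old-style -s SEED.FRACTION value into the
# separate --subsample-seed / --subsample options of newer samtools releases.
s_to_subsample_opts() {
  local arg=$1 seed frac
  if [[ $arg == *.* ]]; then
    seed=${arg%%.*}          # integer part -> seed
    frac="0.${arg#*.}"       # part after the decimal point -> fraction
  else
    seed=$arg
    frac=0
  fi
  echo "--subsample-seed ${seed:-0} --subsample ${frac}"
}

# Usage (requires a samtools version with these options):
#   samtools view -b $(s_to_subsample_opts 0.6861) 9.bam > 9_ds.bam
```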
