Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools -d option to limit read depth

    Dear all,

    I need some help to understand the -d option of samtools, which is meant to do:

    - At a position, read maximally INT reads per input BAM. [250]


    I am working on PCR amplified data and the extreme read depth ~ 2,000 is affecting the QC steps. Hence I would like to make sure that I do not load more than 250 reads at each position. However, whatever I do using:

    samtools mpileup -d 200 -f ...

    I still get the same VCF output:
    11 26463579 . C T 9.52 . DP=1990;VDB=0.0000;AF1=0.5;AC1=1;DP4=960,0,998,0;MQ=60;FQ=12.3;PV4=1,0,1,1
    With the DP4 flags indicating a read depth around 2,000, so the full dataset without any cap on read depth. It looks like samtools is completely ignoring the -d option. Am I missing something here? How do I force samtools to cap the read depth?

    Suggestions welcome,

    Thanks,

    Vincent

  • #2
    I've never used the -d option but I think you are misinterpreting what it does.

    I bet what it does it looks at the bam, and after it sees that 200 reads have all started at the exact same base, it skips all the rest of the reads that start at that base, and moves onto the next base.

    But that would still allow for lots more that 200x coverage at any one base.

    So try dropping that -d even further. The other thing you can try is downsampling. Picard is one program that can do that.

    Comment


    • #3
      Thank you for your answer. But I can drop -d to very low levels, nothing changes in the resulting vcf. Moreover, because of the PCR amplification all my reads should start at the same place so the read depth should be effectively capped even with your interpretation. It looks to me like this samtools -d option is broken.

      Thanks for the Picard suggestion, I think I'll try that, but it would be easier if it could be implemented at the calling stage rather than creating yet another BAM file.

      Comment


      • #4
        I might be wrong, but I think -d will only affect the pileup output, but not the snp prediction.

        Comment


        • #5
          Originally posted by vplagnol View Post
          Thank you for your answer. But I can drop -d to very low levels, nothing changes in the resulting vcf. Moreover, because of the PCR amplification all my reads should start at the same place so the read depth should be effectively capped even with your interpretation. It looks to me like this samtools -d option is broken.

          Thanks for the Picard suggestion, I think I'll try that, but it would be easier if it could be implemented at the calling stage rather than creating yet another BAM file.
          Did you ever find a solution to your problem with the -d parameter? I believe I am dealing with very similar data as you where I have PCR amplified amplicons and targeted re-sequencing generating depth of like 10,000. We you trying to limit the depth that is reported?

          I am running into inconsistency where the depth reported by samtools is very small compared to what is reported in IGV.

          Comment


          • #6
            For my situation, I am running into inconsistencies with what is reported in IGV and samtools mpileup.

            Where in IGV, I see 9162 reads supporting a position. When I run samtools mpileup like this:

            Code:
            samtools mpileup -A -B -q 1 -f ref.fa in.bam > out.mpileup
            This position is reported as only containing 562 reads. I figured setting the -d parameter to a higher number would just solve this:

            Code:
            samtools mpileup -A -B -q 1 -d 10000 -f ref.fa in.bam > out.mpileup
            But instead of returning 9162, it only reports 2562. When I increased it to 50,000:

            Code:
            samtools mpileup -A -B -q 1 -d 50000 -f ref.fa in.bam > out.mpileup
            This ended up reporting all 9162 reads at this positions. I am not clear as to how the -d parameter actual works because it is behaving oddly. Also, by default it says the max depth is 250, but when you run samtools without using -d it always says the max depth per bam is set to 8000 at a position. So again, not sure why there is this discrepancy...

            Does anyone know?

            Thanks,

            Comment


            • #7
              I would suggest setting the -d flag very high (like 100 million) in order to get accurate coverage results. If your coverage at a position is above the threshold, then the reported depth may not be correct. It may report a number much less than the threshold even if the coverage is above the threshold. This depth flag was probably put in there to limit the amount of memory being used - otherwise the maximum memory needed is proportional to the deepest depth. Setting the -d flag very high will make sure you don't hit that cap, which is especially useful for amplicons since you expect many reads to start at the same position.

              Justin

              Comment


              • #8
                Originally posted by BAMseek View Post
                I would suggest setting the -d flag very high (like 100 million) in order to get accurate coverage results. If your coverage at a position is above the threshold, then the reported depth may not be correct. It may report a number much less than the threshold even if the coverage is above the threshold. This depth flag was probably put in there to limit the amount of memory being used - otherwise the maximum memory needed is proportional to the deepest depth. Setting the -d flag very high will make sure you don't hit that cap, which is especially useful for amplicons since you expect many reads to start at the same position.

                Justin
                Thanks for that suggestion. Will implement.

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Advancing Precision Medicine for Rare Diseases in Children
                  by seqadmin




                  Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
                  12-16-2024, 07:57 AM
                • seqadmin
                  Recent Advances in Sequencing Technologies
                  by seqadmin



                  Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

                  Long-Read Sequencing
                  Long-read sequencing has seen remarkable advancements,...
                  12-02-2024, 01:49 PM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, 12-17-2024, 10:28 AM
                0 responses
                26 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 12-13-2024, 08:24 AM
                0 responses
                42 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 12-12-2024, 07:41 AM
                0 responses
                28 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 12-11-2024, 07:45 AM
                0 responses
                42 views
                0 likes
                Last Post seqadmin  
                Working...
                X