Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Samtools/bcftools problem

    I'm using samtools to look for SNPs with ultradeep (>18,000X) coverage. For some reason samtools and bcftools does not call some of the SNPs.

    Code:
    gi|251831106|ref|NC_012920.1|	16125	.	G	.	99	.	DP=18552;AF1=0;CI95=1.5,0;DP4=832,646,0,1;MQ=35;PV4=0.44,0.14,1,1	PL	0
    gi|251831106|ref|NC_012920.1|	16126	.	T	C	99	.	DP=18770;AF1=1;CI95=1,1;DP4=28,21,777,596;MQ=35;PV4=1,0,0.00042,0.28	PL	219,255,0
    For example, at position 16215, IGV gives me:
    Total count: 18522
    A: 17030 (92%, 8124+, 8906-)
    C: 0
    G: 1516 (8%, 861+, 655-)
    T: 5 (0%, 4+, 1-)
    N:1 (0%, 1+, 0-)

    Sam/bcftools should be making a SNP call at position 16215 but is not. Also, the first two counts in DP4 almost agree between bcftools and IGV, but the third and fourth do not. The sequences are quality trimmed and the reads containing the A allele are not of low quality.

    Anyone know why? Thanks.

  • #2
    Did you get any response to your post?

    I have similar data with ultra deep coverage where the total depth counts seen in IGV match the bcftools DP but the alternate allele counts are close but do not match those in the DP4 bcftools output.

    I assume this is to do with the quality of the calls but I thought I had disabled this with -BQ0 when generating the mpileup

    Any ideas?

    Thanks

    Comment


    • #3
      Originally posted by abelhj View Post
      I'm using samtools to look for SNPs with ultradeep (>18,000X) coverage. For some reason samtools and bcftools does not call some of the SNPs.
      Hi,

      Can you post the commands of samtools/bcftools you are using? If you use vcfutils.pl varFilter there is a -D option that does not call SNPs exceeding D depth (the default is 10000000 so probably this is not the problem if you didn't change it!)

      Sorry I can't be of more help...

      All the best
      Dario

      Comment


      • #4
        Hi, thanks for the quick response

        I am simply running

        samtools mpileup -l site.txt -BQ0 -d1000000 -uf ref.fasta test.sorted.bam > test_mpileup.bcf

        bcftools view -Acg test_mpileup.bcf

        I am just after the number of reads for each allele at a specific position, like the display you can get in IGV that the OP showed

        Comment


        • #5
          Maybe try:

          mpileup -l site.txt -BQ0 -d1000000 ref.fasta test.sorted.bam > test_mpileup.pileup

          To see what the quality of your alternate letters are according to mpileup. Then look at the individual sam entries for those reads. Maybe there's a discrepancy there.

          Comment


          • #6
            Thanks for the reply, I've just come across VarScan (http://varscan.sourceforge.net/index.html) which works on the pileup output from samtools.

            Using this I am able to get the counts per allele per position in the pileup which also matches the IGV calls

            All the best

            Chris

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Current Approaches to Protein Sequencing
              by seqadmin


              Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
              04-04-2024, 04:25 PM
            • seqadmin
              Strategies for Sequencing Challenging Samples
              by seqadmin


              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
              03-22-2024, 06:39 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 04-11-2024, 12:08 PM
            0 responses
            31 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 04-10-2024, 10:19 PM
            0 responses
            32 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 04-10-2024, 09:21 AM
            0 responses
            28 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 04-04-2024, 09:00 AM
            0 responses
            53 views
            0 likes
            Last Post seqadmin  
            Working...
            X