Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Pileup shows variants - Not called as SNPs

    Hello,

    I notice that with samtools pileup/mpileup, there are several positions which have variants that can be classified as SNPs. However, they fail to show up when using the subsequent SNP program in its most basic form (i.e. without any filters or flags). What can be causing these 'calls' to drop off?

    Thanks,
    BertieW

  • #2
    Maybe you could post some examples?

    Comment


    • #3
      First guess, the quality of the alternate letters is too low. Post the line of the mpileup that you think is a putative SNP, and see if that's the case.

      Comment


      • #4
        I have a similar situation with several SNPs. I filtered out reads that align to this SNP position to investigate the mapping quality and base qualities. All reads aligning to the position where I observe this SNP on IGV have a mapping quality of 37 which is pretty good. All of these reads match perfectly without mismatch to the reference and a majority of these reads (>40) differ at this position compared to reference. But Samtools pileup does not call it a SNP. Is it just the limitation of samtools or is there something I could try to overcome this? The SNP I observe is at position 33388713
        Code:
        328MB44S:8:42:10743:11542#0 0 1 33388674 37 70M * 0 0 CCTTAGGTCACACTGCAGAACAGAGTGATATTGCAAAAAACCAAATTATCAGTCTACGTGCAACCACCAT gcghhhffdhhghfhhhhhhhhhfhehfhhhghghhgcfhghghhhhghhhhgdfhhhhhhffheahhhf XT:A:U NM:i:1 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:39G30

        Comment


        • #5
          Looks like your base qualities are in Illumina's phred+64 format (see http://en.wikipedia.org/wiki/FASTQ_format). They should in Sanger format. Since the base qualities are out of range, all bets are off.

          Comment


          • #6
            Thanks Nils. I am redoing the workflow with phred+33 qualities now. However, I am thinking what difference should I expect? It looks like with phred+64, I was overestimating the base quality and hence I was seeing lot many SNPs and indels than expected. With the corrected base qualities, this may go down. I wonder how I can expect this fix to enable the detection of the SNP that samtools had missed earlier. Any thoughts?

            Comment


            • #7
              Post your results when you are done.

              Comment


              • #8
                BWA aln has a -I option if your fastq files are in Illumina format rather than Sanger. If this option is used are quality scores changed to phred+33 in the output?

                Comment


                • #9
                  I have a similar question and I hope someone can help me with this. If you need more info, just let me know.

                  I aligned my 2 x 100 bp reads with BWA - I and applied realignment with GATK in a later stage and looking at the results in IGV, everything looks quite good. There are some problems though.

                  The bam files of two family members show an AC-deletion at chr1:165,378,729 and that one is being called. At position 165,378,733 and 165,378,735 I find a C > T variants that are supported by 62 out of 62 and 58 out of 68 reads in one patient and by 36 out of 42 and 22 out of 42 reads in the second patient. Seems like clear snp's to me and the base call quality is definitely in SANGER format and between 30 and 40.

                  Does samtools allow snp's near indels or could the problem be somewhere else? (or maybe just because it is Monday)

                  Thank you!

                  EDIT: I know that both the deletion and the C>T snp are actual known variants (dbSNP132), although I can imagine it to be a bit complex, because they are in a repetitive region/simple repeats.
                  Last edited by Papillon; 05-09-2011, 11:49 AM.

                  Comment


                  • #10
                    Similar Problem

                    I have a similar problem and don't want to start a separate thread unless necessary.

                    Illumina HiSeq paired-end genomic DNA for Arabidopsis.
                    Reads aligned using BWA (aln and sampe modules).
                    Alignment is sorted and indexed.

                    I used Samtools mpileup:
                    samtools mpileup -uf TAIR10.fa arab1.sorted.bam | bcftools view -c - > abar1.var.raw.vcf

                    When I compare a particular region: Chr5:4,764,955 between the output of mpileup & bcftools with the samtools tview -- I don't see the same results. I will post here if possible:

                    from the vcf file output:
                    Chr5 4763954 . T . 45 . DP=65;AF1=4.92e-07;CI95=1.5,0;DP4=1,4,0,0;MQ=54 PL 0
                    Chr5 4763955 . C . 3.67 . DP=64;AF1=1;CI95=0.5,1;DP4=0,0,0,3;MQ=60 PL 40
                    Chr5 4763956 . C . 42 . DP=64;AF1=4.837e-07;CI95=1.5,0;DP4=1,3,0,0;MQ=60 PL 0

                    The tview "view" is attached. I'm not sure why, with 64 reads, ALL with a T at position 4763955, that it is not called as a SNP. I'm trying to assess why? Is it the gap nearby?

                    Please note that this "gap" inserted into the reference is caused by a single low-quality read (not shown in the png file). How can I get BWA or mpileup to not insert such low quality inserts ??

                    Many Thanks !

                    Jim
                    Attached Files

                    Comment


                    • #11
                      Do'h, do I feel stupid!

                      I am also using SAMTools' varFilter to loose the variants which have too low coverage. But it actually does much more and this is why I am loosing these SNP's.

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Current Approaches to Protein Sequencing
                        by seqadmin


                        Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                        04-04-2024, 04:25 PM
                      • seqadmin
                        Strategies for Sequencing Challenging Samples
                        by seqadmin


                        Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                        03-22-2024, 06:39 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, 04-11-2024, 12:08 PM
                      0 responses
                      18 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-10-2024, 10:19 PM
                      0 responses
                      22 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-10-2024, 09:21 AM
                      0 responses
                      16 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-04-2024, 09:00 AM
                      0 responses
                      47 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X