Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Samtools mpileup finds SNPs on each position

    Hello everyone,

    I am trying to use samtools mpileup. I am using the settings:
    Code:
    samtools mpileup -f ../chr12Ref.fa chr12.bam > chr12.vcf
    When I look in the output, I see this result:
    Code:
    SL2_40ch12      1       C       0
    SL2_40ch12      2       T       0
    SL2_40ch12      3       A       2       .^].    ;=
    SL2_40ch12      4       T       2       ..      ;@
    SL2_40ch12      5       A       2       ..      D@
    SL2_40ch12      6       G       2       ..      BD
    SL2_40ch12      7       C       2       ..      AD
    SL2_40ch12      8       A       3       ..^].   B?@
    SL2_40ch12      9       A       3       ...     DBB
    SL2_40ch12      10      A       4       ...^].  ?:C@
    This means each position has SNP, and the read base is an integer...

    When looking at the input bam file, everything looks fine in IGV. Also the source looks exactly as expected:
    Code:
    @HD     VN:1.3  SO:coordinate
    @SQ     SN:SL2_40ch12   LN:65486253
    @PG     ID:bwa  PN:bwa  VN:0.7.4-r385
    FCD116PACXX:8:1212:1169:72997#GTCCAGAA  117     SL2_40ch12      1       0       *       =       1       0       ATGCATAGACAAGGTCTTGACGGACCTCCACAAACAAATTTGACATTTTTGATGTCAGAATCCGGATCACCCAAAAAATGCTTTGCTATAGCACACAAAA    EEDEEEDDDDCDDCDDDDDDDDC?DDEDFFFFHFHHFHGIJIHFJJJIJJJJJJJJJIJHFIGIJIHGCJJJJJJJJIIIJJJJJJJHHHGHEDDFFCCB
    FCD116PACXX:8:1212:1169:72997#GTCCAGAA  153     SL2_40ch12      1       37      100M    =       1       0       CTATAGCAAACCTTTTTTTGGGTGATTCGGATTCCGATGTCAAAAATGCCAAATTTTTTTGTGGACGCCCGTCAAGACCATGTATATGCATCCGGTTGGC    EEEEDDDDC>@DBDBDDBDDDCDDB@@DDC@DDDBDC@C?BDEDDDDCEEEEEDHJJJJIIEIIGIGGIHCIJIGHFJJJJJJJIJJHHGHHFFFFFCC> XT:A:U  NM:i:0  SM:i:37 AM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100        ZQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@E
    The reference sequence looks like this (so you can see the first reference bases are found by samtools mpileup):
    Code:
    >SL2_40ch12
    CTATAGCAAACCTTTTTTTGGGTGATTCGGATTCCGATGTCAAAAATGCCAAATTTTTTTGTGGACGCCCGTCAAGACCA
    TGTATATGCATCCGGTTGGCCCTCACGGCCCGTCCGACCCATTAAGAAGGTCAAACGAGCCCCGAAGCGAGCATACCCCT
    CATTTCGATCATTTTCGTGTGCTATAGCAAACCATTTTTTGGGTGATCCGGATTTCGACGTCAAAAATGCCAAATTTTTT
    When looking at this data, there has something to be wrong with the bam file, does anyone know what this might be?

  • #2
    I think I have found the problem...
    Samtools mpileup doesn't create a vcf file by default but a file in pileup format.
    Now I am using the -ug option and piping the output to bcftools view.

    Comment


    • #3
      Hi Jetse,
      the mpileup output is simply another way to represent the alignment.
      Consider this output line:
      SL2_40ch12 7 C 2 .. AD
      It means that in chr12 at position 7 you have a base C in your reference genome, In your alignment the coordinate have a depth of 2 and there are not any missmatch (..). AD are the quality values of your sequenced C.

      The following row is the same but the depth is 3
      SL2_40ch12 9 A 3 ... DBB

      If you have a SNP you should have something like this:

      SL2_40ch12 9 A 3 GGG DBB

      That means homozygous SNP AtoG

      Comment


      • #4
        That was indeed my problem, but now I converted this to vcf format by using the -ug option and piping this output to bcftools view. Now my output looks like this:
        Code:
        #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  chr12.bam
        SL2_40ch12      1       .       C       X       0       .       DP=7;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=0.000000,0.000000,0.000000,0.000000 PL      0,0,0
        SL2_40ch12      2       .       T       X       0       .       DP=7;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=-nan,0.000000,0.000000,0.000000     PL      0,0,0
        SL2_40ch12      3       .       A       X       0       .       DP=8;I16=2,0,0,0,54,1460,0,0,89,4441,0,0,6,36,0,0;QS=1.000000,0.000000,0.000000,0.000000        PL      0,6,49
        SL2_40ch12      4       .       T       X       0       .       DP=8;I16=2,0,0,0,57,1637,0,0,89,4441,0,0,8,50,0,0;QS=1.000000,0.000000,0.000000,0.000000        PL      0,6,52
        SL2_40ch12      5       .       A       X       0       .       DP=8;I16=2,0,0,0,66,2186,0,0,89,4441,0,0,10,68,0,0;QS=1.000000,0.000000,0.000000,0.000000       PL      0,6,55
        SL2_40ch12      6       .       G       X       0       .       DP=8;I16=2,0,0,0,68,2314,0,0,89,4441,0,0,12,90,0,0;QS=1.000000,0.000000,0.000000,0.000000       PL      0,6,59
        SL2_40ch12      7       .       C       X       0       .       DP=8;I16=2,0,0,0,67,2249,0,0,89,4441,0,0,14,116,0,0;QS=1.000000,0.000000,0.000000,0.000000      PL      0,6,59
        SL2_40ch12      8       .       A       X       0       .       DP=9;I16=3,0,0,0,94,2950,0,0,149,8041,0,0,16,146,0,0;QS=1.000000,0.000000,0.000000,0.000000     PL      0,9,75
        This is the VCF output I expected already from samtools mpileup, but now I see at each position an X as alternative base...

        [EDIT]
        When looking more into the generated vcf file, I see the SNPs, for instance:
        Code:
        SL2_40ch12      35      .       C       T,X     0       .       DP=17;I16=10,0,1,0,378,14442,24,576,538,30482,60,3600,177,3745,25,625;QS=0.938931,0.061069,0.000000,0.000000;RPB=1.106797e+00   PL      0,9,167,30,170,181
        Here the alternative bases are T,X. When looking in IGV, there is only one read at that position that has a T. So there has to be something in this line which indicates this is probably not a SNP...
        Last edited by Jetse; 06-19-2013, 01:14 AM.

        Comment


        • #5
          Found the answer!

          the problem was with bcftools view. Had to use the options -vcg so I retrieved the output as expected!
          This command gave me this file:
          Code:
          #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  chr12.bam
          SL2_40ch12      241     .       T       TG      8.17    .       INDEL;IS=11,0.733333;DP=15;VDB=3.823388e-02;AF1=1;AC1=2;DP4=0,0,15,0;MQ=54;FQ=-79.5     GT:PL:GQ        1/1:48,45,0:49
          SL2_40ch12      840     .       T       A       222     .       DP=37;VDB=2.932610e-03;AF1=1;AC1=2;DP4=0,0,21,16;MQ=60;FQ=-138  GT:PL:GQ        1/1:255,111,0:99
          SL2_40ch12      841     .       A       G       222     .       DP=37;VDB=2.393675e-03;AF1=1;AC1=2;DP4=0,0,21,16;MQ=60;FQ=-138  GT:PL:GQ        1/1:255,111,0:99

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Current Approaches to Protein Sequencing
            by seqadmin


            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
            04-04-2024, 04:25 PM
          • seqadmin
            Strategies for Sequencing Challenging Samples
            by seqadmin


            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
            03-22-2024, 06:39 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 04-11-2024, 12:08 PM
          0 responses
          24 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 10:19 PM
          0 responses
          25 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 09:21 AM
          0 responses
          21 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-04-2024, 09:00 AM
          0 responses
          52 views
          0 likes
          Last Post seqadmin  
          Working...
          X