Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Samtools mpileup - start and end segments

    Hi,

    I'm having a problem calling a variant with NGS which has been confirmed via Sanger sequencing.

    I'm using samtools mpileup followed by VarScan.

    Bit of background
    The lab are doing resequencing of human candidate genes to identify rare mutations.

    Target enrichment is performed via HaloPlex, sequencing is done on a MiSeq with 150bp paired end reads.

    I am aligning reads with BWA using default PE settings againts the entire human genome
    (I have tried the UCSC, 1KG, and MiSeq ref genomes - all the same result)

    The problem

    In the pileup (generated with or without BAQ computation) there is a ~20bp region 'missing'

    Example commands
    samtools mpileup -f ref.fa test.bam > test_BAQ.pileup
    samtools mpileup -Bf ref.fa test.bam > test_NO_BAQ.pileup

    From positions chr1:76740134 to chr1:76740154 are not present in the pileup?

    The bases flanking the 'missing' region are marked by ^ (start of read segment) and $ (end of read segment).

    However, when I look in IGV there are plenty reads covering the 'missing' region and the variant I know is there is there clear as day! It just doesn't make it into the pileup?!

    Any help / advice about what is going on here would be much appreciated

    BW

    Chris

    Example pileup
    chr1 76740129 T 51 ................................................... FF:FGBFG?FGFFEGBF6F??GBGE@FB5DF?FGDFBBFGGFD.FF?G?FG
    chr1 76740130 G 51 ................................................... GF@FGFDFBFFFFEG?DBFBBGDF2DGFBFGDFGFFFDFGFFFDFF?FBFG
    chr1 76740131 G 51 ...$................................................ GG9,GFDFFFGFBEGDB?FBDGFG;EGD,F>BGGBFGFFFGGDFFFBGDFG
    chr1 76740132 A 50 .................................................. GG?FFDGBGFFB@G>DFFBFGFG@DGFBG?;GG?GFFFFFGFFFG>F;GF
    chr1 76740133 A 50 .................................................. GGFFFFFDBFDD@G>F?FBBFF>@DGF>G??FFBFGBFGBFGFFG?>,FB
    chr1 76740134 T 50 .$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$ >>>>>>>>>>>>9>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    chr1 76740154 A 42 ^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^], B=ECAACCAEBA@BEB=;C?CCBCC;>ECAECC==>>
    chr1 76740155 T 42 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, EEEGEG=GBGBD@EGEBEE;EDEGGEDEEBGDEBEGBEDDGD
    chr1 76740156 A 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,^],^],^],^],^],^],^],^],^], FDDGFGDGEGEG@FGEBEGDGEEGGEGEEFGEGDFGGEDFEF@BBB@BB>9
    chr1 76740157 A 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, FBDGFG4GEGDEEFGEDDGEGFFGGEGEEFGEGEFGGDDEGF@;DEDE@DD
    chr1 76740158 T 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, FBBGFG;GGG,EEFEEDBGDGFFGGEGBFEEEGEFGGF=DGF**DD9E@BD
    chr1 76740159 A 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, FEEGFGBGGGDDEEGE=BG;GEEDEEGFFEDEEEEEGFEEGFED@99BE>9

  • #2
    OK I have found a work around

    If I use the mpileup -A flag (count anomolous read pairs) then the region is included in the pileup

    BW

    Chris

    Comment


    • #3
      Chris,

      I'm glad you came across the solution - it looks like there's a universal issue mapping reads in "proper" pairs at that location. Thanks for using VarScan!

      Comment


      • #4
        Originally posted by cboustred View Post
        OK I have found a work around

        If I use the mpileup -A flag (count anomolous read pairs) then the region is included in the pileup

        BW

        Chris
        How well does Haloplex perform based on your experience? I am thinking about trying it out. However, I am worrying about FFPE DNA input requirement.
        Any potential to get copy number information using Haloplex because of the multiple amplicon design? I understand the reads are not good as random reads from hybridization capture.

        Comment


        • #5
          I am having a similar problem with samtools mpileup:
          #If I use :
          samtools mpileup -r chr12:25398277-25398285 Sample1.bam >output1
          where output1 is below:
          chr12 25398279 N 16 G$A$*A$C$G$A$G$CCCCCCCC BB11BACBHHHHGHF0
          chr12 25398280 N 9 *GGGGGGGA 1EGG?AC01
          chr12 25398281 N 9 CCCCCCCCC 1EGGA/EE

          These are obviously truncated lines! Why does mpileup truncate the lines in the last two columns?

          Any help shall be appreciated.
          ~Rini

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Strategies for Sequencing Challenging Samples
            by seqadmin


            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
            03-22-2024, 06:39 AM
          • seqadmin
            Techniques and Challenges in Conservation Genomics
            by seqadmin



            The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

            Avian Conservation
            Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
            03-08-2024, 10:41 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, Yesterday, 06:37 PM
          0 responses
          11 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, Yesterday, 06:07 PM
          0 responses
          10 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 03-22-2024, 10:03 AM
          0 responses
          51 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 03-21-2024, 07:32 AM
          0 responses
          67 views
          0 likes
          Last Post seqadmin  
          Working...
          X