Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools mpileup consensus calls A instead of a gap

    Hello,

    I'm using samtools with the following command to call the consensus of my mapping:

    samtools mpileup -uf Ref SEQ | bcftools view -cg - | vcfutils.pl vcf2fq > SEQ.fq

    The problem is that I have sequences that have 1 and some have 2 copies of a short repeat - the reference has 2. If I use the reference with 2 copies I end up with 2 copies in the consensus since it sometimes maps the reads to the first and sometimes to the second copy of the repeat. So I exchanged one copy in the reference with N's., but end up with A's instead of the gap in the consensus.

    Example:

    Ref AANNNTATTATTA
    Read1 AA---TATTATTA
    Read2 AA---TATTATTA

    it calls AAAAATATTATTA instead of AATATTATTA.

    I also get the following error if I use the reference with N's instead of the second copy (that I don't get if I use the reference with 2 copies):

    Unknown command "vcf2fq".


    I've had the same problem with the previous pileup function!

    Any idea how to change that?

    Cheers,
    Stefan

  • #2
    I wonder if this is an issue with treating N in the reference as A, or down to the fact that the reads have an A before the gap? Do you have some other examples with different bases used instead of the N's?

    Comment


    • #3
      Hi,

      Replacing "N" by "-" seams to help. It looks like it's really reading N's as A's.

      Thanks!

      Stefan

      Comment


      • #4
        So if you replaced the "N" with "C" or "T" or "G" do you get those letters instead of "A"? That would pretty much confirm the nature of this bug / design limitation, and give a good clue about where in the code to start looking.

        Reminds me of this tweet from Vince Buffalo,
        ‏@vsbuffalo: Any time you only handle A, T, C, or G, the members of IUPAC kill an innocent kitten. We've all done it. #bioinfo
        Last edited by maubp; 12-01-2012, 06:16 AM. Reason: formatting

        Comment


        • #5
          Hi,

          Yes, I checked and it calls the reference in case the position is not covered by any read.

          I should probably report that as a bug.

          Thanks!

          Stefan

          Comment


          • #6
            It seams they changed it a little bit since the pileup function, which also interprets "-" as "A", but didn't fully resolve the problem.

            Cheers,
            Stefan

            Comment


            • #7
              It seams to be a problem only when positions are "covered" by gaps in the read. Since it's calling N in the consensus if the position is not covered by a single read.

              Stefan

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Current Approaches to Protein Sequencing
                by seqadmin


                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                04-04-2024, 04:25 PM
              • seqadmin
                Strategies for Sequencing Challenging Samples
                by seqadmin


                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                03-22-2024, 06:39 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, 04-11-2024, 12:08 PM
              0 responses
              25 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 10:19 PM
              0 responses
              29 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 09:21 AM
              0 responses
              25 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-04-2024, 09:00 AM
              0 responses
              52 views
              0 likes
              Last Post seqadmin  
              Working...
              X