  • very short deletion messes up SAMtools SNP calling

    Hi,

    I used Bowtie to align SOLiD reads to the human genome, and now I'm trying to use SAMtools to call SNPs. When I use the pileup function, I get tons of SNPs, most of them due to a polymorphism (a poly-C tract) where my reads differ from the reference by a very short deletion. As SAMtools doesn't recognize this as a deletion, I get tons of SNPs from that point onwards.

    I would appreciate any help on the matter.

    Thanks,

    Eyal

  • #2
    Use an aligner that is capable of gapped alignment. This is ESSENTIAL. No variant caller can work well with an ungapped aligner.
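
To illustrate why this matters, here is a minimal Python sketch of what happens when a read with a one-base deletion in a poly-C tract is compared to the reference without gaps. The sequences and the helper `ungapped_mismatches` are made up for illustration; they are not taken from the poster's data or from any aligner.

```python
def ungapped_mismatches(ref, read, start=0):
    """Return reference positions where the read differs, with no gaps allowed."""
    return [
        start + i
        for i, base in enumerate(read)
        if start + i < len(ref) and base != ref[start + i]
    ]

# Hypothetical reference with a run of five C's (positions 5-9).
ref = "GATTACCCCCGTAGCTAGGA"
# Hypothetical read carrying a one-base deletion inside the poly-C run.
read = ref[:5] + ref[6:]

mismatches = ungapped_mismatches(ref, read)
print("first mismatch at reference position:", mismatches[0])
print("number of apparent SNPs:", len(mismatches))

# A gapped alignment places a single one-base gap after the first nine bases,
# and the rest of the read then matches the reference exactly.
gapped_ok = read[:9] == ref[:9] and read[9:] == ref[10:]
print("read matches once a one-base gap is allowed:", gapped_ok)
```

Without a gap, nearly every base downstream of the poly-C tract looks like a SNP, which matches the symptom described in the first post.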



    • #3
      Thanks a lot for the reply!

      Which aligners are capable of gapped alignment? I understand MAQ is, but I couldn't get it to run as I do not have access to a cluster, so I need software that can run on my Core i7 with 12 GB of RAM (so 10 GB max for alignment).

      Many thanks,

      Eyal



      • #4
        Since Li Heng is too polite to suggest BWA, I will recommend it. It's comparable to Bowtie in terms of speed and supports gapped alignments: http://bio-bwa.sourceforge.net/index.shtml



        • #5
          Thanks. I'll try it.



          • #6
            I tried BWA. I used the supplied solid2fastq.pl script to create a gzipped FASTQ file of my reads, ran BWA with the default settings, and then ran pileup. This gave me a very bad alignment, with no connection at all between the reference genome (I'm checking only the mitochondria) and the consensus call.
            What could I be doing wrong?



            • #7
              I'm not really sure since I don't work with SOLiD reads, but I think that BWA just uses the FASTQ format as a container for the colorspace reads, with the letters ACGT standing in for the four colors. If you then align these FASTQ files as if they were ordinary nucleotide reads, you will get many errors because of the nature of the colorspace encoding.

              What you should do is generate a colorspace reference of your genome of interest and then align against that. The command looks like this for a human sized genome:
              bwa index -a bwtsw -c genome.fa
              You then align your reads in colorspace:
              bwa aln -c genome.fa reads.fastq > alignment.sai
              In any case, you should definitely always align colorspace reads in colorspace.
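
To make the encoding point concrete, here is a small Python sketch of the usual SOLiD two-base color code (each color is the XOR of the 2-bit codes of adjacent bases) and of the "double encoding" that writes colors 0/1/2/3 as A/C/G/T. The function names and the example sequence are made up for illustration, and the sketch ignores the leading primer base that real SOLiD reads carry.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
COLOR_TO_LETTER = "ACGT"  # double encoding: 0->A, 1->C, 2->G, 3->T

def to_colors(seq):
    """Convert a nucleotide sequence to colors (one per adjacent base pair)."""
    codes = [BASE_TO_BITS[base] for base in seq.upper()]
    return [left ^ right for left, right in zip(codes, codes[1:])]

def double_encode(colors):
    """Write a list of colors as an ACGT string, as stored in the FASTQ file."""
    return "".join(COLOR_TO_LETTER[color] for color in colors)

seq = "ATGGCA"                    # hypothetical nucleotide sequence
colors = to_colors(seq)           # [3, 1, 0, 3, 1]
encoded = double_encode(colors)   # "TCATC"

print("nucleotides:   ", seq)
print("colors:        ", "".join(str(c) for c in colors))
print("double-encoded:", encoded)
```

The double-encoded string looks like DNA but bears no base-by-base resemblance to the original sequence, which is why it only makes sense to align it against a reference that has been converted to colorspace as well.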



              • #8
                These steps are exactly the ones I followed. When I look at the SAM file now, I see many N's in the reads, in similar places in the sequence, for instance:

                NGGNGNNNTAGGGNANNNANGCCNGNTNGNGNTNGNNNGATNGNCNNNN
                NTCNTNNNAGTGCNANNNGNGTGGGNGNGNTNANCGNNGCGCGNANNNN
                etc...



                • #9
                  That looks odd. What do the "raw" FASTQ files look like?
