  • very short deletion messes up SAMtools SNP calling

    Hi,

    I used Bowtie to align SOLiD reads to the human genome, and now I'm trying to call SNPs with samtools. When I run the pileup function, I get a huge number of SNPs, most of them caused by a single polymorphism: a very short deletion in a poly-C stretch where my reads differ from the reference. Since the deletion isn't recognized as such, every position from that point onwards is reported as a SNP.

    I would appreciate any help on the matter.

    Thanks,

    Eyal

  • #2
    Use an aligner that is capable of gapped alignment. This is ESSENTIAL. No variant caller can work well with an ungapped aligner.
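To illustrate why this matters, here is a minimal sketch in Python. It uses a made-up reference and read (not data from this thread) and a hypothetical helper, `ungapped_mismatches`. It shows how a single deleted base in a poly-C run turns into a long tail of mismatches when the read is forced onto the reference without gaps, and how those mismatches disappear once the part after the deletion is shifted by one base.

```python
def ungapped_mismatches(ref, read, offset=0):
    """Return 0-based read positions that disagree with the reference
    when the read is laid onto it at `offset` with no gaps allowed."""
    window = ref[offset:offset + len(read)]
    return [i for i, (r, q) in enumerate(zip(window, read)) if r != q]


# Toy reference with a run of six Cs (illustrative only).
ref = "ACGTTGCACCCCCCATGGATCGTAGCTA"

# The read is the reference with one C of the run deleted.
read = ref[:8] + ref[9:]

# Ungapped: everything after the poly-C run is out of register.
print(ungapped_mismatches(ref, read))

# Gapped (one-base deletion): shift the part after the gap by one base.
print(ungapped_mismatches(ref, read[8:], offset=9))
```

A variant caller looking at the ungapped placement sees each of those out-of-register bases as a candidate SNP, which matches the symptom described in the first post.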


    • #3
      Thanks a lot for the reply!

      Which aligners are capable of gapped alignment? I understand MAQ is, but I couldn't get it to run because I don't have access to a cluster. I need software that can run on my Core i7 with 12 GB of RAM (so 10 GB at most for the alignment).

      Many thanks,

      Eyal


      • #4
        Since Li Heng is too polite to suggest BWA, I will recommend it. It's comparable to Bowtie in speed and supports gapped alignment: http://bio-bwa.sourceforge.net/index.shtml


        • #5
          Thanks. I'll try it.


          • #6
            I tried BWA. I used the supplied solid2fastq.pl script to create a gzipped FASTQ file of my reads, ran BWA with the default settings, and then ran pileup. The alignment is very bad: the consensus call bears no relation at all to the reference genome (I'm checking only the mitochondrion).
            What could I be doing wrong?


            • #7
              I'm not really sure since I don't work with SOLiD reads, but I think BWA just uses the FASTQ format as a container for the colorspace reads, with the letters ACGT standing in for the four colors. If you then align these FASTQ files as if they were ordinary nucleotide reads, you will get many errors because of the nature of the colorspace encoding.

              What you should do is generate a colorspace index of your genome of interest and then align against that. For a human-sized genome, the command looks like this:
              bwa index -a bwtsw -c genome.fa
              You then align your reads in colorspace:
              bwa aln -c genome.fa reads.fastq > alignment.sai
              In any case, you should definitely always align colorspace reads in colorspace.
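For anyone unsure what "ACGT as color representations" means, here is a rough Python sketch of the usual SOLiD scheme as I understand it: each color is derived from a pair of adjacent bases (equivalent to XOR-ing 2-bit base codes), and the colors 0/1/2/3 are then written as A/C/G/T ("double encoding"). The function names and the toy sequences are mine for illustration, and the sketch ignores the primer base and quality values.

```python
# 2-bit codes; the color of a dinucleotide is the XOR of its two codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
DOUBLE_ENCODE = "ACGT"  # color 0 -> A, 1 -> C, 2 -> G, 3 -> T


def to_colors(seq):
    """Translate a nucleotide string into its list of colors (0-3)."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def double_encode(colors):
    """Write colors as letters so they fit into a FASTQ record."""
    return "".join(DOUBLE_ENCODE[c] for c in colors)


reference = "ACGT"
snp_read = "ACTT"  # same sequence with one base changed (G -> T)

ref_colors = to_colors(reference)
read_colors = to_colors(snp_read)
print(ref_colors, double_encode(ref_colors))
print(read_colors, double_encode(read_colors))

# The letters of a double-encoded read are colors, not bases, so the
# encoded string generally does not occur in the nucleotide reference.
print(double_encode(ref_colors) in reference)
```

In this toy case the single base change alters two adjacent colors, and the double-encoded string looks nothing like the underlying bases, which is why both the index and the reads need to be in colorspace.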


              • #8
                These steps are exactly the ones I followed. When I look at the SAM file now, I see many N's in the reads, in similar places in the sequence, for instance:

                NGGNGNNNTAGGGNANNNANGCCNGNTNGNGNTNGNNNGATNGNCNNNN
                NTCNTNNNAGTGCNANNNGNGTGGGNGNGNTNANCGNNGCGCGNANNNN
                etc...
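A quick way to check whether the N's really fall in the same places is to compare their positions across reads. The sketch below is a hypothetical helper in Python, applied to the two reads quoted above. It only describes the pattern and does not say where the N's come from; the same function could be run on the raw FASTQ sequences for comparison.

```python
def n_positions(read):
    """Return the set of 0-based positions holding an N."""
    return {i for i, base in enumerate(read) if base == "N"}


# The two reads quoted in this post.
reads = [
    "NGGNGNNNTAGGGNANNNANGCCNGNTNGNGNTNGNNNGATNGNCNNNN",
    "NTCNTNNNAGTGCNANNNGNGTGGGNGNGNTNANCGNNGCGCGNANNNN",
]

# Positions that are N in every read.
shared = set.intersection(*(n_positions(r) for r in reads))

for r in reads:
    frac = len(n_positions(r)) / len(r)
    print(f"{len(r)} bases, {frac:.0%} N")

print("positions that are N in all reads:", sorted(shared))
```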


                • #9
                  Looks odd. What do the "raw" FASTQ files look like?
