  • #16
    Hi Kasycas,
    What exactly do you mean by annotation: synonymous vs. non-synonymous, or the SNP's location in the genome?
    John



    • #17
      @jgibbons1

      Both, actually. It would be nice to have interpretable output showing how relevant a particular SNP is, so I was trying to get information such as which gene it falls in, its position within the gene, the resulting amino acid change (if any), and whether it is synonymous or non-synonymous.

      I'm finding it hard to believe a tool for this purpose doesn't exist!

      Thanks for the reply.



      • #18
        Hmmm... OK. I haven't figured out how to tell whether a SNP is synonymous or non-synonymous, but all of the other information is in the SNP output after you run the "cns2snp" command.

        Here's an example showing the first five columns of the output:

        chromosome, position, reference base, consensus base, Phred-like consensus quality

        GENE; SITE; REF_BASE; SNP_BASE; QUALITY_SCORE
        lcl|AL123456.2_gene_1725 268 T C 255
        lcl|AL123456.2_gene_1731 219 C T 255
        lcl|AL123456.2_gene_1731 447 T C 255
        lcl|AL123456.2_gene_1731 485 C T 255
        lcl|AL123456.2_gene_1732 69 A G 255

        Do you get the same output? If you find software that characterizes the SNP itself, I would love to know about it too!



        • #19
          Yep, got that all right. Position alone isn't enough, because you then need to look up the gene it's affecting. I guess that means writing a script.

          Thanks for your response anyway; it's always better than nothing!

          Kas
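One way to write that script, assuming each read set was mapped against individual gene sequences (as the gene-level sequence names in the example rows suggest), is to treat SITE as a 1-based position on the coding strand and translate the affected codon before and after the change. The sketch below assumes exactly that, uses the standard genetic code (bacterial translation table 11 assigns the same amino acids and differs only in its alternative start codons), and does not handle ambiguous bases; `annotate_snp` and the toy gene are hypothetical.

```python
from itertools import product

# Standard genetic code: codons in TCAG order, '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def annotate_snp(gene_seq: str, site: int, ref: str, alt: str) -> dict:
    """Classify a single-base change at 1-based `site` of a coding sequence.

    Assumes gene_seq is the coding strand starting at the first base of the
    start codon, so codon boundaries fall every 3 bases."""
    gene_seq, ref, alt = gene_seq.upper(), ref.upper(), alt.upper()
    idx = site - 1
    if gene_seq[idx] != ref:
        raise ValueError(f"reference mismatch at {site}: {gene_seq[idx]} != {ref}")
    codon_start = idx - idx % 3              # first base of the affected codon
    offset = idx % 3                         # position of the SNP within that codon
    ref_codon = gene_seq[codon_start:codon_start + 3]
    if len(ref_codon) < 3:
        raise ValueError("SNP lies in an incomplete trailing codon")
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    codon_number = codon_start // 3 + 1
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return {
        "codon_number": codon_number,
        "codon_change": f"{ref_codon}>{alt_codon}",
        "aa_change": f"{ref_aa}{codon_number}{alt_aa}",
        "effect": "synonymous" if ref_aa == alt_aa else "non-synonymous",
    }


toy_gene = "ATGAAACTGTAA"  # hypothetical gene: ATG AAA CTG TAA -> M K L *
print(annotate_snp(toy_gene, 6, "A", "G"))  # AAA>AAG, K2K, synonymous
print(annotate_snp(toy_gene, 4, "A", "G"))  # AAA>GAA, K2E, non-synonymous
```

Looking up the gene sequence by the GENE column (for example from the FASTA the reads were mapped to) and calling `annotate_snp` once per row would give the gene, position, amino acid change and syn/non-syn call asked about above.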



          • #20
            Hi garwuf,
            I want to ask you about the samtools mpileup command for a haploid bacterial genome. I have tried it many times, but it always hangs.
            Note that I did my alignment using Bowtie 2, which allows gapped alignments.
            This is my command:

            samtools mpileup -uf NC_008596.1.fasta mt1sortfilter.bam ->snp/pileup/mt1.pileup

            I don't know what's wrong, but it freezes and produces nothing for hours.

            Thanks



            Originally posted by garwuf
            I gave Freebayes quite an extensive try recently and wouldn't recommend it in its current state. I have tried it on several bacterial datasets (4-6 Mb in size) that were previously evaluated with Gigabayes, Samtools and GATK, and found that Freebayes reports nonexistent SNPs while missing well-defined ones. In fact, not a single SNP was correctly predicted, no matter which parameters were used.

            Then, after reading d17's post above, I decided to try Freebayes on a smaller reference. I generated two artificial sets of reads against a 128 kb template with 10 variant sites of varying complexity. One set provided 50x coverage and the other 400x, and the alignment was performed with bwa. On these alignments Freebayes generated sane VCF output: no false positives, and several SNPs were detected correctly. Still, the detection rate was quite low: on the 50x dataset it never reported more than 3 of the 10 variants, and on the 400x dataset it reported 4-5, depending on settings. For comparison, Samtools 1.18 detected all 10 variants even on the 50x dataset.

            In my view, Freebayes may have a problem handling cached sequence data, which would explain why it works with kb-sized references but fails on Mb-sized ones. On the other hand, it is still being developed, so these bugs may eventually be fixed.
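A benchmark like the one described above (known variant sites on a simulated template, counting how many each caller recovers) comes down to set arithmetic. A minimal sketch, assuming each variant is a hypothetical (position, ref, alt) tuple and that only exact matches count; indels and complex alleles usually need normalizing before a fair comparison, which this ignores. With 3 correct calls out of 10 true sites, sensitivity is 3/10 = 0.3.

```python
def score_calls(truth: set, calls: set) -> dict:
    """Score a variant caller against a known truth set.

    Each variant is a (position, ref, alt) tuple and only exact matches
    count as true positives."""
    tp = len(truth & calls)          # true sites the caller found
    fp = len(calls - truth)          # calls that are not true sites
    fn = len(truth - calls)          # true sites the caller missed
    sensitivity = tp / len(truth) if truth else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "sensitivity": sensitivity}


# Hypothetical truth set: 10 SNP sites spread along a 128 kb template.
truth = {(pos, "A", "G") for pos in range(10_000, 110_000, 10_000)}
# Hypothetical caller output: 3 of the true sites, no false positives.
calls = set(sorted(truth)[:3])
print(score_calls(truth, calls))
# {'TP': 3, 'FP': 0, 'FN': 7, 'sensitivity': 0.3}
```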



            • #21
              mpileup and GATK commands for haploid genomes

              Hi,
              I want to ask about the samtools mpileup and GATK commands for a haploid bacterial genome.
              I have tried them many times, but they always hang.
              Note that I did my alignment using Bowtie 2, which allows gapped alignments.
              For instance, this is my mpileup command:

              samtools mpileup -uf NC_008596.1.fasta mt1sortfilter.bam ->snp/pileup/mt1.pileup

              I don't know what's wrong, but it freezes and produces nothing for hours.

              Thanks



              • #22
                Originally posted by Medo
                Hi,
                I want to ask about the samtools mpileup and GATK commands for a haploid bacterial genome.
                I have tried them many times, but they always hang.
                Note that I did my alignment using Bowtie 2, which allows gapped alignments.
                For instance, this is my mpileup command:

                samtools mpileup -uf NC_008596.1.fasta mt1sortfilter.bam ->snp/pileup/mt1.pileup

                I don't know what's wrong, but it freezes and produces nothing for hours.

                Thanks

                The - before the > might be the problem.
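Why a stray `-` can cause a hang: the shell splits `->snp/pileup/mt1.pileup` into a separate argument `-` followed by the redirection `>snp/pileup/mt1.pileup`, and samtools treats an input file name of `-` as standard input, so it waits for a second BAM on stdin that never arrives. Python's `shlex` module with `punctuation_chars=True` tokenizes roughly like a POSIX shell, which is enough to show the difference; it is an approximation rather than a full shell parser, and `split_redirect` is a hypothetical helper.

```python
import shlex


def split_redirect(command: str):
    """Roughly tokenize a simple shell command and separate the program's
    argument list from the target of a '>' redirection.

    shlex with punctuation_chars=True splits '>' into its own token, which
    approximates (but does not fully reproduce) POSIX shell parsing."""
    tokens = list(shlex.shlex(command, posix=True, punctuation_chars=True))
    if ">" in tokens:
        i = tokens.index(">")
        target = tokens[i + 1] if i + 1 < len(tokens) else None
        return tokens[:i], target
    return tokens, None


bad = "samtools mpileup -uf NC_008596.1.fasta mt1sortfilter.bam ->snp/pileup/mt1.pileup"
good = "samtools mpileup -uf NC_008596.1.fasta mt1sortfilter.bam > snp/pileup/mt1.pileup"

for cmd in (bad, good):
    argv, target = split_redirect(cmd)
    # A bare "-" argument is read as an input file meaning standard input.
    print(argv, "| output:", target, "| waits on stdin:", "-" in argv)
```

In the first command the argument list ends with `-`; in the second it does not, while both still send output to the same file.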



                • #23
                  Hi vv85,
                  Thanks a lot, that was the reason.
                  But do you know whether samtools pileup and GATK are really applicable to haploid genomes, or will I get false-positive variants?

                  Thanks a lot



                  • #24
                    Like another poster mentioned, I prefer using samtools on haploid genomes. False-positive variants are always possible, depending on the sequencing data you start from and the specific features of your genome.



                    • #25
                      You might try the recent program SNVer.

                      It includes the number of haploids as a parameter of its model, so it is applicable to haploid genomes too.


                      Originally posted by d17
                      Does anyone have any thoughts on calling SNPs from short-read data (e.g. Illumina) in haploid genomes? It seems that many SNP calling programs are set up to deal only with diploid genomes (e.g. GATK's UnifiedGenotyper).

                      I found the program FreeBayes from the Marth Lab, which allows you to specify the ploidy. This looks like a good candidate and I will definitely try it. It appears to be unpublished.

                      Does anyone have any experience with calling SNPs in haploid genomes using FreeBayes or another program?

                      Thanks!



                      • #26
                        @Kasycas and @jgibbons1:
                        It's quite possible you have already written or found a script to map your SNPs onto genes (or to identify synonymous and non-synonymous mutations).
                        I use the snpEff program for that. All you need is your VCF file and gene annotations in GFF format.

                        I shamefully admit that I wrote an (inferior) script to do it myself before finding snpEff.
                        Gowthaman
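snpEff works from a VCF file, while the cns2snp output shown earlier in the thread is a plain whitespace-separated table. A minimal conversion sketch, assuming only the first five columns matter and skipping consensus bases that are not plain A/C/G/T (such as IUPAC ambiguity codes); `cns2snp_to_vcf` is a hypothetical helper. The CHROM values also need to match the sequence names in whatever annotation snpEff is given; in the example rows the "chromosomes" are individual gene sequences, so positions may first need converting to genome coordinates.

```python
# Minimal VCF header; real files usually also carry ##contig and ##INFO lines.
VCF_HEADER = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
PLAIN_BASES = {"A", "C", "G", "T"}


def cns2snp_to_vcf(rows):
    """Turn cns2snp rows (chromosome, position, reference base,
    consensus base, quality, ...) into minimal VCF text.

    Assumption: only the first five whitespace-separated columns matter,
    and rows whose consensus base is not a plain A/C/G/T are skipped
    rather than converted."""
    out = list(VCF_HEADER)
    for row in rows:
        fields = row.split()
        if len(fields) < 5 or not fields[1].isdigit():
            continue  # column-name line or malformed row
        chrom, pos, ref, alt, qual = fields[:5]
        ref, alt = ref.upper(), alt.upper()
        if alt not in PLAIN_BASES or alt == ref:
            continue
        out.append("\t".join([chrom, pos, ".", ref, alt, qual, ".", "."]))
    return "\n".join(out) + "\n"


# The example rows posted earlier in the thread.
rows = """GENE; SITE; REF_BASE; SNP_BASE; QUALITY_SCORE
lcl|AL123456.2_gene_1725 268 T C 255
lcl|AL123456.2_gene_1731 219 C T 255
lcl|AL123456.2_gene_1731 447 T C 255
lcl|AL123456.2_gene_1731 485 C T 255
lcl|AL123456.2_gene_1732 69 A G 255""".splitlines()

print(cns2snp_to_vcf(rows), end="")
```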



                        • #27
                          Originally posted by garwuf
                          I gave Freebayes quite an extensive try recently and wouldn't recommend it in its current state. I have tried it on several bacterial datasets (4-6 Mb in size) that were previously evaluated with Gigabayes, Samtools and GATK, and found that Freebayes reports nonexistent SNPs while missing well-defined ones. In fact, not a single SNP was correctly predicted, no matter which parameters were used.

                          Then, after reading d17's post above, I decided to try Freebayes on a smaller reference. I generated two artificial sets of reads against a 128 kb template with 10 variant sites of varying complexity. One set provided 50x coverage and the other 400x, and the alignment was performed with bwa. On these alignments Freebayes generated sane VCF output: no false positives, and several SNPs were detected correctly. Still, the detection rate was quite low: on the 50x dataset it never reported more than 3 of the 10 variants, and on the 400x dataset it reported 4-5, depending on settings. For comparison, Samtools 1.18 detected all 10 variants even on the 50x dataset.

                          In my view, Freebayes may have a problem handling cached sequence data, which would explain why it works with kb-sized references but fails on Mb-sized ones. On the other hand, it is still being developed, so these bugs may eventually be fixed.

                          I'm the author of freebayes.

                          Did you submit bug reports about these issues? We have been using freebayes for haploid detection without issue.

                          When you say that freebayes was reporting many false SNPs, was this before or after you filtered the output on the QUAL field? We expect users to filter the output, which deliberately includes many SNPs with very low reported quality so that filtering can be done at any desired level.

                          The test setup you are describing is very similar to one we use during development, but your results are dramatically different.

                          Also, I am not aware of any existing issues with larger genomes, as we typically work with human samples, but again, given a bug report I will be able to resolve anything.

                          If other users reported the same issues, it's likely they have been resolved in the time since you tested.
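Filtering on QUAL, as suggested above, only needs the sixth column of each VCF record. A minimal sketch in plain Python, assuming a tab-separated VCF; the threshold of 20 and the example records are hypothetical, and a sensible cutoff depends on your data. Records with a missing QUAL (`.`) are dropped here, which is a choice made for this sketch rather than anything freebayes prescribes.

```python
from typing import Iterable, Iterator


def filter_vcf_by_qual(lines: Iterable[str], min_qual: float) -> Iterator[str]:
    """Pass header lines through and keep records with QUAL >= min_qual.

    QUAL is the sixth tab-separated column of a VCF record. Records whose
    QUAL is missing ('.') are dropped in this sketch."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]
        if qual != "." and float(qual) >= min_qual:
            yield line


# Hypothetical records; only the QUAL column matters here.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr\t100\t.\tA\tG\t812.3\t.\t.",
    "chr\t250\t.\tC\tT\t0.04\t.\t.",
    "chr\t400\t.\tG\tA\t.\t.\t.",
]
for line in filter_vcf_by_qual(example, min_qual=20):
    print(line)
```

For a real file, an open file handle can be passed in place of `example`; its lines keep their trailing newlines, so they would be written back out with `print(line, end="")`.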



                          • #28
                            To answer the original post, simply running

                            % freebayes -p 1 -f reference.fasta alignments.bam

                            is sufficient to generate haploid SNP, indel, and complex allele calls using freebayes. The method is described in arXiv:1207.3907, "Haplotype-based variant detection from short-read sequencing."

                            If anyone has issues with this method, please report them to me (via email) or to the freebayes mailing list.

                            Happy variant detecting.
