  • SNP analysis of microbial genomes from 454 data

    Hello,

    We've done some great 454 runs on some of our favorite microbes, de novo assembled each, sorted the contigs into a database and completed the annotation of all of them. However, now I'm sorta stuck on the part where we compare them to look for variation, mostly point mutations and/or small insertions/deletions.

    I know the Roche GS Mapper can do such analysis; however, it refuses to read my annotation files (all in GFF3) because it requires goldenpath type 128 files. And I can't seem to find anything else that would give me a nice output of SNPs in genes and the corresponding amino acid changes. I have the consensus reads in several formats, but the good thing about the Roche mapper is that it includes the sequencing depth (from the sff files) at which each SNP region was established, so as to eliminate false positives.
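The depth-based filtering mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not gsMapper's own logic: the `VariantCall` record, its fields, the example calls and the thresholds (`min_depth=10`, `min_fraction=0.8`) are all hypothetical, and a real script would need to parse whatever columns your variant file actually provides. A high variant fraction is assumed because bacterial genomes are haploid.

```python
from dataclasses import dataclass

@dataclass
class VariantCall:
    """Hypothetical minimal variant record (not the real 454HCDiffs layout)."""
    contig: str
    position: int       # 1-based position on the reference contig
    ref: str
    alt: str
    total_depth: int    # reads covering the position
    variant_reads: int  # reads supporting the alternate allele

def passes_depth_filter(call, min_depth=10, min_fraction=0.8):
    """Keep a call only if it is well covered and supported by most reads.

    The default thresholds are illustrative assumptions, not gsMapper defaults.
    """
    if call.total_depth < min_depth:
        return False  # also avoids dividing by zero for uncovered positions
    return call.variant_reads / call.total_depth >= min_fraction

def filter_calls(calls, min_depth=10, min_fraction=0.8):
    """Return only the calls that pass the depth and allele-fraction filter."""
    return [c for c in calls if passes_depth_filter(c, min_depth, min_fraction)]

# Made-up example calls, for illustration only.
calls = [
    VariantCall("contig01", 1520, "A", "G", total_depth=42, variant_reads=41),
    VariantCall("contig01", 3307, "C", "T", total_depth=4, variant_reads=4),
    VariantCall("contig02", 88, "G", "A", total_depth=30, variant_reads=12),
]
for c in filter_calls(calls):
    print(c.contig, c.position, c.ref, "->", c.alt)  # prints: contig01 1520 A -> G
```

Only the first call survives: the second is too shallow and the third is supported by fewer than 80% of the covering reads.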

    I've browsed these forums, yet I can't find anyone else with this specific problem. Can someone give me some advice on how I can complete my analysis?

    Thanks in advance

  • #2
    The Roche gsMapper probably does the best job of aligning 454 reads and predicting variants. However, I have recently discovered that its predictions of the effects of those variants are completely unreliable. My project was very similar to yours: mapping reads to a well-annotated bacterial genome (I created my own refGene.txt file, and yes, I'm sure it was correct). In some cases the amino acid calls by gsMapper were from the wrong frame, even though it reported the correct frame as read from the refGene.txt file. In other cases the errors were due to the odd way in which gsMapper reports some SNPs: it will sometimes report a SNP as a deletion followed by an insertion, which may throw off prediction of the SNP's effect. I manually went through the HCDiffs file changing these to substitutions. You should too if you decide to use gsMapper for mapping the reads and calling variants, followed by another tool for predicting the effects of those variants.
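The manual fix described above (turning a deletion-plus-insertion pair back into a substitution) could be automated along these lines. This sketch assumes a hypothetical, simplified record with `-` marking a gap, input sorted by contig and position, and that both halves of the pair are reported at the same reference position. Check these assumptions against your actual HCDiffs file before relying on it, since the real file has more columns and may use a different position convention.

```python
from typing import List, NamedTuple

class Diff(NamedTuple):
    """Hypothetical simplified variant record; '-' marks a gap."""
    contig: str
    position: int  # 1-based reference position
    ref: str
    alt: str

def merge_del_ins_pairs(diffs: List[Diff]) -> List[Diff]:
    """Collapse 'ref base -> gap' followed by 'gap -> new base' at the same
    position into a single substitution. Input must be sorted by contig and position.
    """
    merged: List[Diff] = []
    i = 0
    while i < len(diffs):
        cur = diffs[i]
        nxt = diffs[i + 1] if i + 1 < len(diffs) else None
        is_split_snp = (
            nxt is not None
            and cur.contig == nxt.contig
            and cur.position == nxt.position  # assumed shared position
            and len(cur.ref) == 1 and cur.alt == "-"
            and nxt.ref == "-" and len(nxt.alt) == 1
            and cur.ref != nxt.alt  # same base put back: not a substitution, leave as-is
        )
        if is_split_snp:
            merged.append(Diff(cur.contig, cur.position, cur.ref, nxt.alt))
            i += 2  # consume both halves of the pair
        else:
            merged.append(cur)
            i += 1
    return merged

diffs = [
    Diff("contig01", 1520, "A", "-"),
    Diff("contig01", 1520, "-", "G"),
    Diff("contig01", 2044, "T", "-"),  # a genuine single-base deletion
]
for d in merge_del_ins_pairs(diffs):
    print(d)
```

The pair at position 1520 becomes one A to G substitution, while the lone deletion at 2044 is passed through unchanged.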

    Have you checked out the Ensembl Bacteria Variant and SNP effect predictor?

    Ensembl Bacteria is a genome-centric portal for bacterial species of scientific interest

    • #3
      We focus on SNP calling from 454 data rather than starting with de novo assembly. I have been using gsMapper's SNP calls from the 454HCDiffs files, with AlignmentInfo.tsv as a comparison.

      As kmcarr says, the output format is suboptimal, with multiple lines per putative SNP position (ref to deletion on one line, deletion to SNP base on the next). Why they can't make it like a pileup to avoid confusion is beyond me.

      We assess the effects of SNPs on amino acids with SnpEff. It's the best software I've found so far for this purpose.
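For readers curious what effect prediction involves, the core step is finding the codon that contains the SNP in the annotated reading frame and translating it before and after the change. The sketch below is a simplified illustration, not how SnpEff is implemented: it assumes a single-segment CDS given in 1-based inclusive coordinates, a single-base substitution, and the standard codon assignments (bacterial translation table 11 differs from table 1 only in its start codons). The function name `snp_effect` and the toy sequences are made up for the example.

```python
BASES = "TCAG"
# Standard codon assignments, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def snp_effect(genome, cds_start, cds_end, strand, pos, alt):
    """Return (codon number, ref codon, alt codon, ref aa, alt aa) for a SNP.

    cds_start, cds_end and pos are 1-based inclusive genome coordinates;
    alt is the new base as read on the forward (+) strand.
    """
    if not cds_start <= pos <= cds_end:
        raise ValueError("SNP lies outside the CDS")
    cds = genome[cds_start - 1:cds_end].upper()
    alt = alt.upper()
    if strand == "+":
        offset = pos - cds_start
        coding_alt = alt
    else:
        # On the minus strand, read the CDS as its reverse complement.
        cds = reverse_complement(cds)
        offset = cds_end - pos
        coding_alt = COMPLEMENT[alt]
    codon_index, codon_pos = divmod(offset, 3)
    ref_codon = cds[3 * codon_index:3 * codon_index + 3]
    if len(ref_codon) < 3:
        raise ValueError("SNP falls in an incomplete trailing codon")
    alt_codon = ref_codon[:codon_pos] + coding_alt + ref_codon[codon_pos + 1:]
    return (codon_index + 1, ref_codon, alt_codon,
            CODON_TABLE[ref_codon], CODON_TABLE[alt_codon])

print(snp_effect("ATGAAATTTTAA", 1, 12, "+", 5, "G"))
# → (2, 'AAA', 'AGA', 'K', 'R')
```

On the toy CDS `ATG AAA TTT TAA`, changing position 5 from A to G turns codon 2 from AAA (K) into AGA (R). Running the equivalent change on the reverse-complemented sequence (minus strand, position 8, T to C) should report the same codon change, which is a quick way to check that strand handling and frame are right.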

      • #4
        Thanks for the quick responses; I'll have a look at the Ensembl site.

        On the other hand, so far I haven't been able to actually use GS Mapper, as it can't read my annotation file (in GFF3); it asks for goldenpath format instead. Does anyone know a way to convert from GFF3 to a file I can use as an annotation file in GS Mapper?

        I can look for SNPs in the FASTA files coupled with the sff files, but then I lose my annotation.
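One possible route is to write the GFF3 CDS features out as a tab-separated file in the UCSC refGene table layout, which appears to be what gsMapper means by goldenpath (the thread mentions refGene.txt files). This is a hedged sketch: it assumes gsMapper accepts the 16-column UCSC refGene layout (including the leading `bin` column), that each bacterial CDS is a single segment starting in phase 0, and that the `Name`/`ID` and `gene`/`locus_tag` attributes are the right sources for the name columns. The function names and the example line are hypothetical; validate the output against a refGene.txt that gsMapper is known to accept before trusting it.

```python
from urllib.parse import unquote

def ucsc_bin(start, end):
    """UCSC standard bin for a 0-based, half-open interval (features up to 512 Mb)."""
    start_bin, end_bin = start >> 17, (end - 1) >> 17
    for offset in (585, 73, 9, 1, 0):  # 512+64+8+1, 64+8+1, 8+1, 1, 0
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= 3
        end_bin >>= 3
    raise ValueError("interval too large for the standard bin scheme")

def parse_attributes(field):
    """Split a GFF3 column-9 string like 'ID=cds1;Name=dnaA' into a dict."""
    pairs = (item.split("=", 1) for item in field.split(";") if "=" in item)
    return {key: unquote(value) for key, value in pairs}

def gff3_to_refgene(lines):
    """Yield one refGene-style row (16 tab-separated columns) per GFF3 CDS line.

    Assumes single-segment CDS features (typical for bacteria) in phase 0.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):  # embedded sequence section: stop parsing
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        seqid, _source, _type, start, end, _score, strand, _phase, attrs = cols
        attributes = parse_attributes(attrs)
        name = attributes.get("Name") or attributes.get("ID") or f"{seqid}:{start}-{end}"
        name2 = attributes.get("gene") or attributes.get("locus_tag") or name
        # GFF3 is 1-based inclusive; UCSC tables are 0-based half-open.
        tx_start, tx_end = int(start) - 1, int(end)
        row = [
            ucsc_bin(tx_start, tx_end), name, seqid, strand,
            tx_start, tx_end,                 # txStart, txEnd
            tx_start, tx_end,                 # cdsStart, cdsEnd (CDS only, no UTRs)
            1, f"{tx_start},", f"{tx_end},",  # exonCount, exonStarts, exonEnds
            0, name2, "cmpl", "cmpl", "0,",   # score, name2, cdsStartStat, cdsEndStat, exonFrames
        ]
        yield "\t".join(str(value) for value in row)

example = [
    "##gff-version 3\n",
    "contig01\tprodigal\tCDS\t101\t1000\t.\t+\t0\tID=cds1;Name=dnaA\n",
]
for row in gff3_to_refgene(example):
    print(row)
```

For a real file, iterate over an open handle of the GFF3 and write each yielded row plus a newline to the output file; multi-segment CDSs and non-zero phases would need extra handling that this sketch deliberately leaves out.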

        • #5
          Shouldn't the gsMapper reference be in FASTA format?
          It might be tricky to assess the effects of variants on a whole lot of new contigs, all with new annotations.

          • #6
            It takes a FASTA file as the reference, yes, but in the third tab you can add an annotation file.

            If I just use the FASTA file (which I have), the output doesn't include the amino acid substitutions. But perhaps I can take the output and run it through the Ensembl site.
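If the FASTA-only run is used, the missing annotation can be re-attached afterwards by checking which annotated CDS each variant position falls in. A minimal sketch, assuming 1-based GFF3 coordinates, variant positions reported on the same contig names as the GFF3 `seqid` column, and a hypothetical list of `(contig, position)` pairs taken from the variant output. A linear scan is used for clarity, which is fine for the few thousand genes of a bacterial genome but could be replaced by an interval tree for larger inputs.

```python
from collections import defaultdict

def load_gene_intervals(gff3_lines, feature_type="CDS"):
    """Map contig -> list of (start, end, strand, label) from GFF3 lines (1-based, inclusive)."""
    genes = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("##FASTA"):  # embedded sequence section: stop parsing
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        label = attrs.get("Name") or attrs.get("ID", "unnamed")
        genes[cols[0]].append((int(cols[3]), int(cols[4]), cols[6], label))
    return genes

def annotate_positions(variants, genes):
    """For each (contig, 1-based position), list the labels of the features covering it."""
    annotated = []
    for contig, pos in variants:
        hits = [label for start, end, _strand, label in genes.get(contig, [])
                if start <= pos <= end]
        annotated.append((contig, pos, hits))
    return annotated

# Made-up annotation and variant positions, for illustration only.
gff = [
    "##gff-version 3\n",
    "contig01\t.\tCDS\t100\t400\t.\t+\t0\tID=cds1;Name=geneA\n",
    "contig01\t.\tCDS\t350\t900\t.\t-\t0\tID=cds2\n",
]
genes = load_gene_intervals(gff)
for contig, pos, hits in annotate_positions([("contig01", 360), ("contig01", 50)], genes):
    print(contig, pos, hits or "intergenic")
```

Overlapping bacterial genes are handled naturally, since a position inside two CDSs simply lists both labels.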

            • #7
              Originally posted by kmcarr
              [...] In some cases the amino acid calls by gsMapper were from the wrong frame, even though it reported the correct frame as read from the refGene.txt file. [...]

              Have you checked out the Ensembl Bacteria Variant and SNP effect predictor?

              http://bacteria.ensembl.org/tools.html

              I had the same problem with gsMapper: it reported the wrong amino acid variant because it used the wrong translation frame (different from the one specified in the refGene.txt file).

              But it happens only for some variants; the majority are correct, so I haven't been able to figure out what the problem is...
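One way to narrow this down is to rule out the annotation first: take every CDS in its annotated frame and flag any that do not start with a start codon, do not end with a stop, contain an internal stop, or have a length that is not a multiple of three. If every CDS passes, the wrong calls are more likely on the gsMapper side. This is a sketch under stated assumptions: 1-based inclusive coordinates, single-segment CDSs, and only ATG/GTG/TTG treated as start codons (table 11 also allows some rarer ones, so those genes would be flagged for manual review rather than being wrong). The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Common bacterial start codons; rarer table-11 starts are deliberately not listed.
START_CODONS = {"ATG", "GTG", "TTG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def cds_sequence(genome, start, end, strand):
    """Coding-strand sequence of a CDS given 1-based inclusive coordinates."""
    seq = genome[start - 1:end].upper()
    return seq.translate(COMPLEMENT)[::-1] if strand == "-" else seq

def check_cds_frame(genome, start, end, strand):
    """Return a list of problems with the annotated frame (empty list = looks fine)."""
    seq = cds_sequence(genome, start, end, strand)
    problems = []
    if len(seq) % 3:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons:
        return problems + ["no complete codons"]
    if codons[0] not in START_CODONS:
        problems.append(f"first codon {codons[0]} is not ATG/GTG/TTG")
    if codons[-1] not in STOP_CODONS:
        problems.append(f"last codon {codons[-1]} is not a stop codon")
    internal = [n for n, codon in enumerate(codons[:-1], start=1) if codon in STOP_CODONS]
    if internal:
        problems.append(f"internal stop codon(s) at codon number(s) {internal}")
    return problems

print(check_cds_frame("ATGAAATTTTAA", 1, 12, "+"))   # → []
print(check_cds_frame("CATGAAATTTTAA", 1, 12, "+"))  # coordinates shifted by one base
```

The second call mimics an annotation that is off by one base: the start and stop checks both fail, which is the kind of signal that separates a bad refGene.txt entry from a gsMapper translation error.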
