  • jschuur1
    Junior Member
    • Mar 2011
    • 3

    SNP analysis of microbial genomes from 454 data

    Hello,

    We've done some great 454 runs on some of our favorite microbes, de novo assembled each, sorted the contigs to a database and completed the annotation of all of them. However, I'm now sort of stuck on the part where we compare them to look for variation, mostly point mutations and/or small insertions/deletions.

    I know the Roche GS Mapper can do such analysis; however, it refuses to read my annotation files (all in GFF3), as it requires goldenpath type 128 files. And I can't seem to find anything else that would give me a nice output of SNPs in genes and the possible corresponding changes in amino acids. I have the consensus reads in several formats, but the good thing about the Roche mapper is that it includes the sequencing depth (from the SFF files) at which the region with a SNP was established, so that false positives can be eliminated.
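Using read depth to weed out false-positive SNPs can be sketched in Python. This is a minimal sketch assuming the variant calls have already been parsed into simple records; the `Variant` fields and the thresholds `min_depth=10` and `min_alt_fraction=0.8` are hypothetical choices, not values taken from gsMapper or its output files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical parsed variant call (not a gsMapper data structure)."""
    contig: str
    pos: int        # 1-based reference position
    ref: str
    alt: str
    depth: int      # total reads covering the position
    alt_reads: int  # reads supporting the alternative allele


def filter_by_depth(variants, min_depth=10, min_alt_fraction=0.8):
    """Keep calls with enough coverage and a high enough fraction of supporting reads."""
    kept = []
    for v in variants:
        # Skip poorly covered positions (and avoid dividing by zero).
        if v.depth == 0 or v.depth < min_depth:
            continue
        # For a haploid microbe, a real SNP should be carried by most reads.
        if v.alt_reads / v.depth >= min_alt_fraction:
            kept.append(v)
    return kept
```

For a haploid bacterial genome a high alternative-allele fraction is a reasonable default, but both thresholds should be tuned to your own coverage.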

    I've browsed these forums, yet I can't find anyone else with this specific problem. Can someone give me some advice on how I can complete my analysis?

    Thanks in advance
  • kmcarr
    Senior Member
    • May 2008
    • 1181

    #2
    The Roche gsMapper probably does the best job of aligning 454 reads and predicting variants. However, I have recently discovered that its predictions of the effects of those variants are completely unreliable. My project was very similar to yours: mapping reads to a well-annotated bacterial genome (I created my own refGenes.txt file, and yes, I'm sure it was correct). In some cases the amino acid calls by gsMapper were from the wrong frame, even though it reported the correct frame as read from the refGene.txt file. In other cases the errors were due to the odd way in which gsMapper reports some SNPs: it will sometimes report a SNP as a deletion followed by an insertion, which may throw off prediction of the SNP effect. I manually went through the HCDiffs file changing these to substitutions. You should too if you decide to use gsMapper for mapping the reads and calling variants, followed by another tool for predicting the effects of those variants.
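The manual clean-up of split SNPs (a deletion followed by an insertion turned back into a substitution) can be automated along these lines. This sketch assumes the HCDiffs rows have already been reduced to hypothetical `(position, ref, alt)` tuples sorted by position, with `-` marking a gap, and that a split SNP appears as a deletion row immediately followed by an insertion row at the same position of the same length. That pairing rule is an assumption; check it against your own files first.

```python
def merge_del_ins(records):
    """Collapse 'deletion then insertion at the same position' pairs into substitutions.

    records: list of (pos, ref, alt) tuples sorted by pos; '-' marks a gap.
    """
    merged = []
    i = 0
    while i < len(records):
        pos, ref, alt = records[i]
        if i + 1 < len(records):
            next_pos, next_ref, next_alt = records[i + 1]
            is_deletion = ref != "-" and alt == "-"
            is_insertion = next_ref == "-" and next_alt != "-"
            if (is_deletion and is_insertion and next_pos == pos
                    and len(ref) == len(next_alt)):
                # Deleted reference base(s) replaced by the inserted base(s).
                merged.append((pos, ref, next_alt))
                i += 2
                continue
        merged.append(records[i])
        i += 1
    return merged
```

Rows that do not match the pattern are passed through unchanged, so genuine indels are kept.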

    Have you checked out the Ensembl Bacteria Variant and SNP effect predictor?

    Ensembl Bacteria is a genome-centric portal for bacterial species of scientific interest


    • colindaven
      Senior Member
      • Oct 2008
      • 417

      #3
      We put the emphasis on SNP calling from 454 data rather than on de novo assembly to start with. I have been using gsMapper results: SNP calls from the 454HCDiffs file, with AlignmentInfo.tsv as a comparison.

      As kmcarr says, the output format is suboptimal, with multiple lines per putative SNP position (ref to deletion on one line, deletion to SNP base on the next). Why they can't make it like a pileup to avoid confusion is beyond me.

      We assess the effects of SNPs on amino acids with SnpEff. It's the best software I've found so far for this purpose.


      • jschuur1
        Junior Member
        • Mar 2011
        • 3

        #4
        Thanks for the quick responses, I'll have a look at the Ensembl site.

        On the other hand, so far I haven't been able to actually use GSMapper, as it can't read my reference file (in GFF3); it asks for goldenpath format. Does anyone know a way to convert from GFF3 to a file I can use as an annotation file in GSMapper?

        I can look for SNPs in the FASTA files coupled with the SFF files, but then I lose my annotation.
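One possible route is to write a UCSC refGene-style table from the GFF3 file yourself. The sketch below assumes gsMapper accepts the standard UCSC refGene column layout (check the GS Reference Mapper manual for the exact columns it expects) and that every gene is a single-segment `CDS` feature, which is usually true for bacteria. The placeholder `bin` value of `0` and taking the gene name from the `Name` or `ID` attribute are choices of this sketch; `gff3_cds_to_refgene` is a hypothetical helper.

```python
import urllib.parse


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=cds1;Name=dnaA' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = urllib.parse.unquote(value)  # GFF3 values are URL-escaped
    return attrs


def gff3_cds_to_refgene(lines):
    """Turn single-segment GFF3 CDS features into refGene-style tab-separated rows."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # only CDS features (also skips any ##FASTA sequence lines)
        seqid, _, _, start, end, _, strand, _, attr_field = cols
        attrs = parse_attributes(attr_field)
        name = attrs.get("Name") or attrs.get("ID") or f"{seqid}:{start}-{end}"
        # GFF3 is 1-based inclusive; refGene uses 0-based half-open coordinates.
        tx_start, tx_end = int(start) - 1, int(end)
        fields = [
            "0",                           # bin (placeholder, not computed)
            name, seqid, strand,
            str(tx_start), str(tx_end),    # txStart, txEnd
            str(tx_start), str(tx_end),    # cdsStart, cdsEnd (whole feature is coding)
            "1",                           # exonCount: one segment per gene
            f"{tx_start},", f"{tx_end},",  # exonStarts, exonEnds (comma-terminated)
            "0", name,                     # score, name2
            "cmpl", "cmpl",                # cdsStartStat, cdsEndStat
            "0,",                          # exonFrames
        ]
        rows.append("\t".join(fields))
    return rows
```

Writing the rows with `"\n".join(...)` to a file gives the annotation table; genes split across several CDS lines would need to be grouped by their `Parent` attribute first, which this sketch does not do.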


        • colindaven
          Senior Member
          • Oct 2008
          • 417

          #5
          Shouldn't the gsMapper reference be in FASTA format?
          It might be tricky to assess the effects of variants on a whole lot of new contigs, all with new annotations.


          • jschuur1
            Junior Member
            • Mar 2011
            • 3

            #6
            It takes a FASTA file as a reference, yes, but in the third tab you can add an annotation file.

            If I just use the FASTA file (which I have), the output doesn't include the amino acid substitutions. But perhaps I can take the output and run it through the Ensembl site.


            • enrico
              Junior Member
              • Jul 2010
              • 5

              #7
              Originally posted by kmcarr
              The Roche gsMapper probably does the best job of aligning 454 reads and predicting variants. However, I have recently discovered that its predictions of the effects of those variants are completely unreliable. My project was very similar to yours: mapping reads to a well-annotated bacterial genome (I created my own refGenes.txt file, and yes, I'm sure it was correct). In some cases the amino acid calls by gsMapper were from the wrong frame, even though it reported the correct frame as read from the refGene.txt file. In other cases the errors were due to the odd way in which gsMapper reports some SNPs: it will sometimes report a SNP as a deletion followed by an insertion, which may throw off prediction of the SNP effect. I manually went through the HCDiffs file changing these to substitutions. You should too if you decide to use gsMapper for mapping the reads and calling variants, followed by another tool for predicting the effects of those variants.

              Have you checked out the Ensembl Bacteria Variant and SNP effect predictor?

              http://bacteria.ensembl.org/tools.html

              I had the same problem with gsMapper: wrong amino acid variants reported because of a wrong translation frame (different from the one specified in the refGene.txt file).

              But it happens for some variants only; the majority of them are correct, so I haven't been able to figure out what the problem is...
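One independent check of the amino acid calls is to recompute them from the reference sequence. The sketch below counts codons from the CDS start on the plus strand and from the CDS end on the minus strand, which is exactly where a frame error would show up. It assumes 1-based inclusive CDS coordinates, an uppercase A/C/G/T sequence, a SNP given as a forward-strand base, and the standard codon table (bacterial table 11 codes the same amino acids and differs only in its start codons). `snp_effect` is a hypothetical helper, not part of gsMapper.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def snp_effect(genome, cds_start, cds_end, strand, pos, alt):
    """Return (codon_number, ref_codon, ref_aa, alt_codon, alt_aa), or None outside the CDS."""
    if not cds_start <= pos <= cds_end:
        return None
    cds = genome[cds_start - 1:cds_end]
    i = pos - cds_start  # offset of the SNP within the forward-strand CDS slice
    mut_cds = cds[:i] + alt + cds[i + 1:]
    if strand == "+":
        offset = i
    else:
        # Minus-strand genes are read from cds_end, so translate the reverse complement.
        cds, mut_cds = revcomp(cds), revcomp(mut_cds)
        offset = cds_end - pos
    n = offset // 3  # 0-based codon index in the correct reading frame
    ref_codon = cds[3 * n:3 * n + 3]
    alt_codon = mut_cds[3 * n:3 * n + 3]
    return n + 1, ref_codon, CODON_TABLE[ref_codon], alt_codon, CODON_TABLE[alt_codon]
```

For example, with the plus-strand gene `ATGAAACCCGGGTAA`, an A to G change at position 5 turns codon 2 from `AAA` (K) into `AGA` (R). Comparing such recomputed calls with gsMapper's output would flag the variants reported in the wrong frame.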
