  • Help! SNP determination across multiple assemblies

    First off, I apologize for the long post.

    Second, I have not found anything via searches that definitively helps me, so I am starting a new thread here.

    Ok, so let me give you some info on my project (I am a Ph.D. student in Microbiology) so that you can understand the question I am trying to answer.
    1) I study a bacterium
    2) If grown on a petri dish with nutrient agar, and allowed to grow up from a single cell, bacteria will form what we call "colonies" which are visible by eye and are usually quite similar in their physical characteristics within a given species
    3) My bacterium likes to make different-looking colonies
    4) These different-looking colonies are more than just interesting to look at, they also have implications for physiology, behavior and pathogenicity (my bug is an opportunistic pathogen).
    5) A collaborator of ours offered to sequence some of these "colony variants" for us and I have the data back and assembled
    5a) The data was gathered on a Roche 454 platform
    5b) I have been using DNASTAR's SeqMan NGen to assemble the data and DNASTAR's SeqMan Pro to view the assembly
    5c) I have not closed any of the genomes with new sequence data

    6) I have not yet been able to find a piece of software that will let me compare all of my sequenced variants at one time to determine whether any given mutation is important. So I built a spreadsheet by hand: for every site of interest (usually found in the SNP report for one of the variants), I manually searched all of the assemblies for the read/base-pair composition at that site. Unfortunately, this produces about 1,000 SNPs and introduces an unacceptable amount of human error (discovered the hard way). Neither problem can be brute-force-fixed with re-sequencing or with man hours (trust me, I've tried).

    I **desperately** need a tool that will:
    -take different assemblies from a nearly isogenic collection of samples and arrange them to see what is similar/different about them
    -highlight regions that *could* be of interest but would normally be filtered out due to low depth of coverage (and could be filled in by targeted re-sequencing)
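To make the two requirements above concrete, here is a minimal Python sketch of the comparison the hand-built spreadsheet was doing. It is only an illustration under stated assumptions, not the method of any particular tool: it assumes the SNP calls and per-site read depths have already been exported from each assembly into plain dictionaries keyed by (contig, position), and the function names and the `min_depth` threshold of 10 are hypothetical.

```python
def build_snp_matrix(calls_by_sample, depth_by_sample, min_depth=10):
    """Combine per-sample SNP calls into one site-by-sample table.

    calls_by_sample: {sample: {(contig, pos): variant_base}}  (assumed input layout)
    depth_by_sample: {sample: {(contig, pos): read_depth}}    (assumed input layout)
    Each cell is the variant base, "ref" (no variant called), or "low"
    (depth below min_depth, so the call cannot be trusted).
    """
    samples = sorted(calls_by_sample)
    # Every site reported as a SNP in at least one sample.
    sites = sorted({site for calls in calls_by_sample.values() for site in calls})
    matrix = {}
    for site in sites:
        row = {}
        for sample in samples:
            depth = depth_by_sample.get(sample, {}).get(site, 0)
            base = calls_by_sample[sample].get(site)
            if depth < min_depth:
                row[sample] = "low"
            elif base is None:
                row[sample] = "ref"
            else:
                row[sample] = base
        matrix[site] = row
    return samples, matrix


def variable_sites(matrix):
    """Sites where the confidently called samples disagree with each other."""
    return [site for site, row in matrix.items()
            if len({call for call in row.values() if call != "low"}) > 1]


def low_coverage_sites(matrix):
    """Sites where at least one sample is too shallow to call:
    candidates for targeted re-sequencing."""
    return [site for site, row in matrix.items() if "low" in row.values()]
```

Calling `build_snp_matrix` once over all variants replaces the manual site-by-site lookup, and `low_coverage_sites` lists the positions that a depth filter would otherwise hide.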

    I was under the impression that SAMtools (SAM stands for Sequence Alignment/Map) could do this for me with its pileup function, and I am in the process of installing it. However, I have run into several snags. The biggest is that the GNU Compiler Collection (GCC) will not update from 4.1.2 to anything higher, and it looks like I am going to have to find a new OS and re-install because, apparently, my OS (CentOS 5.6) is no longer supported.

    Yet I cannot keep throwing my time away on this project - it is supposed to be a preliminary side project and we've been working on it for over a year now, so I have to stop with this trial-and-error nonsense and actually finish the data analysis. I am begging you guys: if there is any piece of software you know of that would do this, what is it and what do I need to run it? (And will you coach me through getting the thing up and running?) Or, ***if you have personal experience doing this kind of thing (or know someone who does), could you PLEASE contact me*** - I will seriously buy you beer and make you cookies or whatever you want.
    Last edited by austic; 01-30-2012, 06:08 PM.

  • #2
    Send me a PM and I can talk with you offline.

    • #3
      Thank you guys so much for all your help!

      • #4
        Sounds like you are sorted. I just saw this, so the following is probably not needed, but
        I have a tool that (I think) does what you want - it will simultaneously assemble all of your samples and just tell you what the differences are and which strains have what.

        Website: cortexassembler.sourceforge.net
        Paper:
        Z Iqbal, M Caccamo, I Turner, P Flicek, G McVean. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nature Genetics (2012) (doi:10.1038/ng.1028)

        Feel free to contact me (zam AT well.ox.ac.uk) if you want to find out more.

        • #5
          PS - if you let me know how many strains you have and what kind of coverage, I can tell you how hard this is. But roughly speaking, I would expect you to be able to get results within a day.
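For anyone unsure what "coverage" figure to report, a rough estimate is the total number of sequenced bases divided by the genome size. The Python sketch below shows that arithmetic; the function name is hypothetical, and the read count, read length, and genome size in the usage line are made-up illustration values, not numbers from this thread.

```python
def estimated_coverage(num_reads, mean_read_length, genome_size):
    """Approximate average depth of coverage: total bases sequenced / genome length."""
    return num_reads * mean_read_length / genome_size


# Hypothetical example: 300,000 reads of ~400 bp over a 6 Mb genome.
print(estimated_coverage(300_000, 400, 6_000_000))
```

This is only an average; real depth varies along the genome, which is why individual sites can still fall below a calling threshold.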
