  • Help! SNP determination across multiple assemblies

    First off, I apologize for the long post.

    Second, I have not found anything via searches that definitively helps me, so I am starting a new thread here.

    OK, let me give you some info on my project (I am a Ph.D. student in Microbiology) so that you can understand the question I am trying to answer.
    1) I study a bacterium
    2) If grown on a petri dish with nutrient agar, and allowed to grow up from a single cell, bacteria will form what we call "colonies", which are visible by eye and are usually quite similar in their physical characteristics within a given species
    3) My bacterium likes to make different-looking colonies
    4) These different-looking colonies are more than just interesting to look at; they also have implications for physiology, behavior and pathogenicity (my bug is an opportunistic pathogen).
    5) A collaborator of ours offered to sequence some of these "colony variants" for us, and I have the data back and assembled
    5a) The data was gathered on a Roche 454 platform
    5b) I have been using DNASTAR's SeqMan NGen to assemble the data and DNASTAR's SeqMan Pro to view the assembly
    5c) I have not closed any of the genomes with new sequence data

    6) I have not yet been able to find a piece of software that will let me compare all of my sequenced variants at one time to determine whether any given mutation is important. So I built a spreadsheet by hand and manually searched every assembly for the read/base-pair composition at each site of interest (usually found in the SNP report for one of the variants). Unfortunately, this produces about 1,000 SNPs and introduces an unacceptable amount of human error (discovered the hard way), neither of which can be brute-force-fixed with re-sequencing or with man hours (trust me, I've tried)

    I **desperately** need a tool that will:
    -take different assemblies from a nearly isogenic collection of samples and arrange them to see what is similar/different about them
    -highlight regions that *could* be of interest but would normally be filtered out due to low depth of coverage (and could be filled in by targeted re-sequencing)
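
The two requirements above (lining up per-sample calls at each site, and flagging differences that rest only on low-coverage calls) can be sketched in a few lines of Python. This is only a minimal illustration, not the method of any particular tool: the input layout, the sample names, the `compare_samples` function and the `min_depth` threshold are all hypothetical, and real SNP reports would first have to be parsed into this form.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout (an assumption, not a real file format):
# sample name -> {reference position: (consensus base, read depth)}
Calls = Dict[str, Dict[int, Tuple[str, int]]]


def compare_samples(calls: Calls, min_depth: int = 10) -> List[dict]:
    """Return one row per site where the samples disagree.

    status "variant":   at least two different bases among calls with
                        depth >= min_depth.
    status "candidate": the disagreement only shows up when low-depth
                        calls are included (a re-sequencing target).
    Sites where every sample agrees are skipped.
    """
    samples = sorted(calls)
    positions = sorted({pos for per_sample in calls.values() for pos in per_sample})
    rows = []
    for pos in positions:
        bases = {}
        low_depth = []
        for sample in samples:
            # A sample with no call at this site is recorded as "-" at depth 0.
            base, depth = calls[sample].get(pos, ("-", 0))
            bases[sample] = base
            if depth < min_depth:
                low_depth.append(sample)
        observed = {b for b in bases.values() if b != "-"}
        confident = {
            bases[s] for s in samples if s not in low_depth and bases[s] != "-"
        }
        if len(confident) > 1:
            status = "variant"
        elif len(observed) > 1:
            status = "candidate"
        else:
            continue
        rows.append(
            {"position": pos, "status": status, "bases": bases, "low_depth": low_depth}
        )
    return rows


if __name__ == "__main__":
    # Made-up example data for three hypothetical colony variants.
    example = {
        "smooth": {100: ("A", 40), 250: ("G", 35), 900: ("C", 50)},
        "wrinkly": {100: ("T", 38), 250: ("G", 30), 900: ("T", 3)},
        "mucoid": {100: ("A", 42), 250: ("G", 33), 900: ("C", 45)},
    }
    for row in compare_samples(example, min_depth=10):
        print(row)
```

With the made-up data, site 100 is reported as a variant, site 250 is skipped because all samples agree, and site 900 is reported as a candidate because the only differing call has a depth of 3. Keeping low-depth calls in the table instead of filtering them out is what makes the second requirement possible.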

    I was under the impression that SAMtools (the Sequence Alignment/Map toolkit) could do this for me with its pileup function, and I am in the process of installing it. However, I have run into several snags, the biggest of which is that the GNU Compiler Collection (GCC) will not update from 4.1.2 to anything higher. It looks like I am going to have to find a new OS and re-install because, apparently, my OS (CentOS 5.6) is no longer supported.

    Yet I cannot keep throwing my time away on this project. It is supposed to be a preliminary side project and we've been working on it for over a year now, so I have to stop with this trial-and-error nonsense and actually finish the data analysis. I am begging you guys: if there is any piece of software you know that would do this, what is it and what do I need to run it? (And will you coach me into getting the thing up and running?) Or, ***if you have personal experience doing this kind of thing (or know someone who does), could you PLEASE contact me*** - I will seriously buy you beer and make you cookies or whatever you want.
    Last edited by austic; 01-30-2012, 06:08 PM.

  • #2
    Send me a PM and I can talk with you offline.


    • #3
      Thank you guys so much for all your help!


      • #4
        I only just saw this, and it sounds like you are sorted, so the following is probably not needed, but
        I have a tool that (I think) does what you want: it will simultaneously assemble all the samples and tell you what the differences are and which strains carry which variants.

        Website: cortexassembler.sourceforge.net
        Paper:
        Z Iqbal, M Caccamo, I Turner, P Flicek, G McVean. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nature Genetics (2012) (doi:10.1038/ng.1028)

        Feel free to contact me (zam AT well.ox.ac.uk) if you want to find out more.


        • #5
          PS - if you let me know how many strains you have and what kind of coverage, I can tell you how hard this is. But roughly speaking, I would expect you to be able to get results within a day.
