  • SNP discovery without a reference

    I have paired-end (76 bp) output from a GA on which I would like to try SNP discovery. The hiccup is that there is no reference genome for my species.

    Does anyone have any ideas, or know any tool that could do this?

    Most of the tools that do SNP discovery well work on a pre-aligned dataset. If I were to assemble the data, is there something that could convert the ACE output into the format a SNP discovery tool expects?

    Thanks

  • #2
    Hi,

    You could do a de novo assembly with one of several tools, such as SOAPdenovo, Velvet, ABySS or MIRA, and then use the contigs as a reference (separately or joined into a single sequence) to align your reads back to. Mosaik will output an assembly in GigaBayes format for SNP discovery. I have also used SOAP to align my reads back to contigs generated by SOAPdenovo, and then used MapView to view the alignment and find SNPs.

    Commercial software like SeqMan NGen will do de novo assembly and SNP detection together, from what I understand.

    Matt



    • #3
      I thought about this and wondered, without any concrete reason, whether some kind of bias would be introduced into the results, since the reads used to build the assembly would then be aligned back to a consensus built from those same reads.

      Can't hurt trying though (except for a few lost CPU hours :-) )

      thanks



      • #4
        I can't think of any reason why this wouldn't work myself... but I stand to be corrected. In fact, I think it makes for an interesting comparison between the de novo assembly program, with the parameters used there, and the corresponding parameters in the reference-guided assembler.

        Matt



        • #5
          I am in the middle of trying this approach for SNP discovery. My starting material was normalized cDNA from several individuals. I used SSAKE for the assembly and MAQ to look for SNPs. I am hoping to test some of the putative SNPs soon.



          • #6
            Hi,


            Why can't you try using ESTs as the reference for aligning?



            • #7
              There are no ESTs for my fungal genome, as far as I know.

              I tried MattB's approach and it seemed to work well. I get somewhat more SNPs than would be expected, but the lab will validate a few as QC.



              • #8
                I'd be suspicious about SNPs only found on the last one or two bases of your reads (I posted a separate thread on this), as they could well be remnants of adaptor sequence (adaptor trimming won't work when only one or few bases of adaptor are present on the ends of your reads).
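The check described in this post could be sketched roughly as below. This is only an illustration, assuming ungapped alignments described by a hypothetical `(read_start, read_length, is_reverse)` tuple in 0-based reference coordinates; the function names and the `end_window` parameter are made up here and do not come from any of the tools mentioned in the thread.

```python
def distance_from_3prime_end(snp_pos, read_start, read_len, is_reverse):
    """Number of bases between the SNP and the 3' end of an ungapped read.

    Coordinates are 0-based on the reference. For a reverse-strand read the
    3' end of the read maps to the leftmost reference position.
    """
    offset = snp_pos - read_start
    if not 0 <= offset < read_len:
        raise ValueError("read does not cover the SNP position")
    return offset if is_reverse else read_len - 1 - offset


def only_at_read_ends(snp_pos, supporting_reads, end_window=2):
    """True if every read carrying the variant has it in its last `end_window` bases.

    `supporting_reads` is a list of (read_start, read_len, is_reverse) tuples
    for the reads that show the variant allele (a hypothetical input format).
    """
    if not supporting_reads:
        return False
    return all(
        distance_from_3prime_end(snp_pos, start, length, rev) < end_window
        for start, length, rev in supporting_reads
    )
```

A candidate for which `only_at_read_ends` returns True would be one to treat with suspicion before sending it for validation; a SNP also seen in the middle of other reads would not be flagged.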



                • #9
                  Is there a need to obtain flanking sequence to design a genotyping assay? If so, how will you get sufficient flanking sequence if you are mapping short reads to the contig consensus sequences (assuming no reference genome)?



                  • #10
                    Boonie, it depends on the type of genotyping assay (i.e. number of SNPs) that you are interested in. For the Illumina Infinium iSelect assay, Illumina specify a minimum of 50 bp on EITHER side of the SNP for probe design, so short contigs in theory aren't such a problem (although it would be nice to have 50 bp on both sides so Illumina can pick the 'best' probe). For other genotyping applications like Sequenom iPlex, you will need more flanking sequence on both sides.
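As a rough illustration of the flanking-sequence requirement, candidate SNPs on short contigs could be screened like this. The 50 bp default is the figure quoted in the post; the function names and the `both_sides` switch are assumptions made for this sketch, not part of any vendor tool.

```python
def flanking_lengths(contig_len, snp_pos):
    """Return (left, right) flanking lengths for a 0-based SNP position on a contig."""
    if not 0 <= snp_pos < contig_len:
        raise ValueError("SNP position lies outside the contig")
    return snp_pos, contig_len - 1 - snp_pos


def has_enough_flank(contig_len, snp_pos, min_flank=50, both_sides=False):
    """Check whether a SNP has at least `min_flank` bases of contig sequence next to it.

    both_sides=False accepts the SNP if EITHER side is long enough;
    both_sides=True requires both sides to be long enough.
    """
    left, right = flanking_lengths(contig_len, snp_pos)
    if both_sides:
        return left >= min_flank and right >= min_flank
    return left >= min_flank or right >= min_flank
```

For example, a SNP at position 20 of an 80 bp contig passes the either-side check (59 bp on the right) but not the both-sides check.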



                    • #11
                      This is great, MattB.
                      I am trying to develop SNPs from a de novo assembled EST library.
                      How did you join the contigs into a single sequence? Did you put them together in some sort of order, or simply join all the contig sequences?
                      Thanks.



                      • #12
                        Once you have your de novo assembly, treat it as your reference (as MattB suggests above). After that, remap the reads back to the "new" reference and pile up the alignments. Finally, you can set up your filters to try to get the best SNPs possible.

                        Let us know how it goes.
                        -drd
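The remap, pileup and filter idea above could be sketched roughly as follows. This is a toy illustration only, assuming ungapped alignments given as hypothetical `(ref_start, read_sequence)` pairs already oriented on the reference strand; the `min_depth` and `min_alt_fraction` thresholds are placeholder values, not recommendations, and real callers also use base and mapping qualities.

```python
from collections import Counter, defaultdict


def pileup(alignments):
    """Count the bases observed at each reference position.

    `alignments` is an iterable of (ref_start, read_sequence) pairs for
    ungapped alignments, with 0-based start coordinates.
    """
    columns = defaultdict(Counter)
    for start, seq in alignments:
        for i, base in enumerate(seq):
            columns[start + i][base] += 1
    return columns


def call_snps(reference, alignments, min_depth=8, min_alt_fraction=0.2):
    """Return (pos, ref_base, alt_base, depth, alt_count) for sites passing the filters."""
    calls = []
    for pos, counts in sorted(pileup(alignments).items()):
        if not 0 <= pos < len(reference):
            continue  # read overhangs the end of the contig
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to trust the site
        ref_base = reference[pos]
        alt_counts = {b: n for b, n in counts.items() if b != ref_base and b != "N"}
        if not alt_counts:
            continue
        # Keep only the most frequent non-reference base at this site.
        alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
        if alt_n / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, depth, alt_n))
    return calls
```

Raising `min_depth` or `min_alt_fraction` trades sensitivity for fewer false positives, which is the kind of filter tuning the post refers to.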



                        • #13
                          We just joined the contigs in the order they were output by the de novo assembler, so essentially at random. Since I posted that, however, I have been using the CLC NGS Cell software to perform de novo assembly, reference-guided alignment and SNP detection on the contigs separately...

                          So naturally if the alignment/SNP detection software can handle thousands of separate contigs, then this is probably preferable, and makes life easier if you are BLASTing your assembled ESTs...

                          Matt
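For anyone who does need a single joined sequence, a minimal sketch of joining contigs in their output order is shown below. Note the assumptions: the run of N bases between contigs is an addition made for this sketch (to discourage reads from aligning across the artificial junctions) and was not described in the post, and the function names are hypothetical. The recorded intervals let a SNP position on the joined sequence be translated back to its contig.

```python
def join_contigs(contigs, spacer_len=100):
    """Concatenate (name, sequence) contigs in the given order, separated by N spacers.

    Returns the joined sequence and a list of (name, start, end) intervals,
    0-based and end-exclusive, locating each contig in the joined sequence.
    """
    spacer = "N" * spacer_len
    parts = []
    intervals = []
    pos = 0
    for i, (name, seq) in enumerate(contigs):
        if i > 0:
            parts.append(spacer)
            pos += spacer_len
        intervals.append((name, pos, pos + len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), intervals


def to_contig_coords(intervals, joined_pos):
    """Map a 0-based position on the joined sequence to (contig_name, offset).

    Returns None if the position falls inside a spacer or outside the sequence.
    """
    for name, start, end in intervals:
        if start <= joined_pos < end:
            return name, joined_pos - start
    return None
```

As the post says, keeping the contigs as separate reference sequences avoids this bookkeeping altogether when the aligner can handle thousands of them.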



                          • #14
                            Hi, we are starting a project aiming to detect SNPs in a species without a reference genome.
                            I have also thought of assembling my short reads de novo and using the resulting contigs as the reference.
                            From your experience, what is the best NGS technology for an approach like this? We are deciding between 454 Titanium and Solexa (75 bp reads).
                            Also, how many individuals are necessary for reliable SNP detection?
                            Thanks for your help!
                            P



                            • #15
                              We worked with hybrid assemblies, using the larger-insert paired-end 454 libraries to build bigger scaffolds (we used 8 kb because our lab had trouble with the 20 kb protocol) and Illumina 76 bp short-insert paired-end reads to get greater depth of coverage (we didn't use the 5 kb long inserts again because the lab had had some trouble with them in the past).

                              We used wgs-celera to assemble, remapped the reads, and used samtools to call the SNPs.

                              It worked rather well. The drawback is the cost, since you need double the number of libraries.

