  • lletourn
    Member
    • Oct 2009
    • 63

    SNP discovery without a reference

    I have paired-end (76bp) output from a GA on which I would like to try SNP discovery. The hiccup is that there is no reference genome for my species.

    Does anyone have any ideas, or know of any tool that could do this?

    Most of the tools that do SNP discovery well work on a pre-aligned dataset. If I were to assemble the data, is there something that could convert ace -> (SNP discovery tool format) to do the work?

    Thanks
  • MattB
    Member
    • Aug 2008
    • 35

    #2
    Hi,

    you could do a de novo assembly with several tools, such as SOAPdenovo, Velvet, Abyss, MIRA etc. and then use the contigs as a reference (separately or joined into a single sequence) to align your reads back to. Mosaik will output an assembly in Gigabayes format for SNP discovery. I have also used SOAP to align my reads back to contigs generated by SOAPdenovo, and then used MapView to view the alignment and find SNPs.

    Commercial software like Seqman NGen will do de novo assembly and SNP detection together from what I understand.

    Matt


    • lletourn
      Member
      • Oct 2009
      • 63

      #3
      I thought about this and, without any good reason, I wondered if any 'bias' or something of the sort would be added to the results since the reads used to build an assembly would be aligned to themselves.

      Can't hurt trying though (except for a few lost CPU hours :-) )

      thanks


      • MattB
        Member
        • Aug 2008
        • 35

        #4
        I can't think of any reason why this wouldn't work myself... but I stand to be corrected. In fact, I think it makes for an interesting comparison between the de novo assembly program and its parameters and the corresponding parameters in the reference-guided assembler.

        Matt


        • Nick Miller
          Junior Member
          • Jun 2009
          • 2

          #5
          I am in the middle of trying this approach for SNP discovery. My starting material was normalized cDNA from several individuals. I used SSAKE for the assembly and maq to look for SNPs. I am hoping to test some of the putative SNPs soon.


          • bioenvisage
            Member
            • Oct 2009
            • 40

            #6
            Hi,


            Why can't you try using ESTs as the reference for aligning?


            • lletourn
              Member
              • Oct 2009
              • 63

              #7
              There are no ESTs for my fungal genome, as far as I know.

              I tried MattB's approach and it seemed to work well. I have somewhat more SNPs than would be expected, but the lab will validate a few as QC.


              • MattB
                Member
                • Aug 2008
                • 35

                #8
                I'd be suspicious about SNPs only found on the last one or two bases of your reads (I posted a separate thread on this), as they could well be remnants of adaptor sequence (adaptor trimming won't work when only one or few bases of adaptor are present on the ends of your reads).
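A minimal sketch of the read-end check described above, assuming you can already extract, for each read supporting a putative SNP, the 0-based offset of the variant base within that read. The function names and the `margin` parameter are hypothetical and not taken from any particular tool.

```python
def near_read_end(offset_in_read, read_length, margin=2):
    """True if a 0-based offset lies within the last `margin` bases of the read."""
    return offset_in_read >= read_length - margin


def supported_only_at_read_ends(offsets, read_length, margin=2):
    """True when every supporting read carries the variant in its last bases.

    `offsets` holds one 0-based offset per supporting read. Assumption: all
    reads have the same length, as with untrimmed 76bp reads.
    """
    return bool(offsets) and all(
        near_read_end(o, read_length, margin) for o in offsets
    )


# A variant seen only at offsets 74 and 75 of 76bp reads looks suspicious;
# one that is also seen mid-read does not.
suspicious = supported_only_at_read_ends([74, 75, 75], read_length=76)
fine = supported_only_at_read_ends([74, 30], read_length=76)
```

SNPs flagged this way are candidates for closer inspection rather than automatic rejection, since a real variant can also happen to fall near read ends at low coverage.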


                • Boonie
                  Junior Member
                  • Mar 2009
                  • 6

                  #9
                  Is there a need to obtain flanking sequence to design a genotyping assay? If so, how will you get sufficient flanking sequence if you are mapping short reads to the contig consensus seqs (assuming no reference genome)?


                  • MattB
                    Member
                    • Aug 2008
                    • 35

                    #10
                    Boonie, it depends on the type of genotyping assay (i.e. the number of SNPs) that you are interested in. For the Illumina Infinium iSelect assay, Illumina specify a minimum of 50bp on EITHER side of the SNP for probe design, so short contigs in theory aren't such a problem (although it would be nice to have 50bp on both sides so Illumina can pick the 'best' probe). For other genotyping applications like Sequenom iPlex, you will need more flanking sequence on both sides.
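The flanking-sequence rule above can be checked per SNP with a few lines of Python. This is only a sketch: the function name is hypothetical, positions are assumed to be 0-based on the contig, and the 50bp default simply mirrors the figure quoted in the post.

```python
def has_enough_flank(contig_length, snp_pos, flank=50, both_sides=False):
    """Check whether a SNP on a contig has enough flanking sequence.

    snp_pos is a 0-based position on the contig. With both_sides=False,
    `flank` bases on either the left or the right side are sufficient.
    """
    left = snp_pos                      # bases before the SNP
    right = contig_length - snp_pos - 1  # bases after the SNP
    if both_sides:
        return left >= flank and right >= flank
    return left >= flank or right >= flank


# An 80bp contig with a SNP at position 60 has 60bp on the left and 19bp
# on the right: enough for an "either side" design, not for "both sides".
either_ok = has_enough_flank(80, 60)
both_ok = has_enough_flank(80, 60, both_sides=True)
```

For assays that need flanking sequence on both sides, passing `both_sides=True` with a larger `flank` gives a quick way to see how many candidate SNPs survive on short contigs.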


                    • little_beetle
                      Junior Member
                      • Mar 2010
                      • 1

                      #11
                      This is great MattB.
                      I am trying to develop SNPs from a de novo assembled EST library.
                      How did you join the contigs into a single sequence? Did you put them together in some sort of order, or simply join all the contig sequences?
                      Thanks.


                      • drio
                        Senior Member
                        • Oct 2008
                        • 323

                        #12
                        Once you have your de novo assembly, treat it as your reference (as MattB suggests). After that, remap the reads back to the "new" reference and pile up the alignments. Finally, you can set up your filters to try to get the best SNPs possible.

                        Let us know how it goes.
                        -drd
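To make the pileup-then-filter idea above concrete, here is a toy sketch in Python. It assumes ungapped alignments given as (0-based start, read sequence) pairs against one contig, which is a simplification; real aligners report indels, base qualities and mapping qualities that a proper caller would use. The function names and the thresholds (`min_depth`, `min_alt_fraction`) are hypothetical.

```python
from collections import Counter


def pileup(reference, alignments):
    """Count the read bases observed at each reference position."""
    columns = [Counter() for _ in reference]
    for start, seq in alignments:
        for i, base in enumerate(seq):
            pos = start + i
            if 0 <= pos < len(reference):
                columns[pos][base] += 1
    return columns


def candidate_snps(reference, columns, min_depth=4, min_alt_fraction=0.3):
    """Return (pos, ref_base, alt_base, depth, alt_count) for filtered sites."""
    calls = []
    for pos, counts in enumerate(columns):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to say anything
        alts = {b: n for b, n in counts.items()
                if b != reference[pos] and b != "N"}
        if not alts:
            continue
        alt, n_alt = max(alts.items(), key=lambda kv: kv[1])
        if n_alt / depth >= min_alt_fraction:
            calls.append((pos, reference[pos], alt, depth, n_alt))
    return calls


contig = "ACGTACGT"
reads = [(0, "ACGTA"), (0, "ACTTA"), (1, "CTTAC"), (2, "GTAC"), (2, "TTACG")]
calls = candidate_snps(contig, pileup(contig, reads))
```

In this toy data only position 2 has both enough depth and a non-reference base above the fraction threshold. In practice you would tighten or loosen the two thresholds depending on coverage and on whether the sample is a single individual or a pool.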


                        • MattB
                          Member
                          • Aug 2008
                          • 35

                          #13
                          We just joined the contigs in the order they were output by the de novo assembler, so essentially at random. Since I posted that, however, I have been using the CLC NGS Cell software to perform de novo assembly, reference-guided alignment and SNP detection on the contigs separately...

                          So naturally if the alignment/SNP detection software can handle thousands of separate contigs, then this is probably preferable, and makes life easier if you are BLASTing your assembled ESTs...

                          Matt
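For anyone who does need a single joined sequence, a rough sketch of the idea in Python follows. Note the assumptions: the run of N spacers between contigs (so reads are less likely to align across a junction) is an addition of this sketch and is not something the post says was done, and the function names are hypothetical. The second function maps a position on the joined sequence back to its contig.

```python
from bisect import bisect_right


def join_contigs(contigs, spacer_length=100):
    """Join (name, sequence) pairs in the given order, separated by N spacers.

    Returns the joined sequence and the 0-based start of each contig in it.
    """
    spacer = "N" * spacer_length
    starts, parts, offset = [], [], 0
    for i, (_name, seq) in enumerate(contigs):
        if i > 0:
            parts.append(spacer)
            offset += spacer_length
        starts.append(offset)
        parts.append(seq)
        offset += len(seq)
    return "".join(parts), starts


def to_contig_coordinate(pos, starts, contigs):
    """Map a 0-based position on the joined sequence to (contig name, offset).

    Returns None if the position is out of range or falls inside a spacer.
    """
    if pos < 0:
        return None
    idx = bisect_right(starts, pos) - 1
    name, seq = contigs[idx]
    local = pos - starts[idx]
    if local >= len(seq):
        return None
    return name, local


contigs = [("contig1", "ACGT"), ("contig2", "GG")]
joined, starts = join_contigs(contigs, spacer_length=3)
```

Keeping the `starts` table around is what lets you report SNPs in contig coordinates afterwards, which matters if you later BLAST individual contigs.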


                          • pfranchini
                            Member
                            • May 2009
                            • 19

                            #14
                            Hi, we are starting a project aiming to detect SNPs in a species without a reference genome.
                            I have also thought of assembling my short reads de novo and using the resulting contigs as a reference.
                            From your experience, what is the best NGS technology for an approach like this? We are deciding between 454 Titanium and Solexa (75 bp reads).
                            Also, how many individuals are necessary for reliable SNP detection?
                            Thanks for your help!
                            P


                            • lletourn
                              Member
                              • Oct 2009
                              • 63

                              #15
                              We worked with hybrid assemblies, using the bigger PE 454 libraries to build bigger scaffolds (we used 8k because our lab had trouble with the 20k protocol) and Illumina 76bp short-insert PE reads to get greater depth of coverage (we didn't use the 5k long inserts, again because the lab had some trouble with them in the past).

                              We used wgs-celera to assemble, remapped the reads, and used samtools to call the SNPs.

                              It worked rather well. The drawback is cost, since you need double the number of libraries.
