
  • #16
    Originally posted by pfranchini View Post
    how many individuals are necessary for a reliable SNPs detection?
    P
    I'm still not sure about the right answer here. For mapping to a reference, I would go as high as 25x-30x to eliminate many of the false positives (that is for heterozygous calls; for homozygous calls, lower coverage would still be fine).
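A rough way to see why heterozygous sites need more depth: a minimal sketch, assuming each read at a het site samples either allele with probability 0.5, with no sequencing error or mapping bias. The threshold of 3 supporting reads is a hypothetical caller setting chosen for illustration, not something from this thread.

```python
from math import comb

def prob_at_least(k, depth, p):
    """P(X >= k) for X ~ Binomial(depth, p)."""
    return sum(comb(depth, i) * p**i * (1 - p)**(depth - i)
               for i in range(k, depth + 1))

def het_detection_prob(depth, min_alt_reads=3):
    """Chance that a het site shows at least `min_alt_reads` reads of the
    non-reference allele (min_alt_reads is an assumed threshold)."""
    return prob_at_least(min_alt_reads, depth, 0.5)

# Compare a few depths; real depth also varies from site to site,
# so the mean coverage has to sit above the depth you want per site.
for depth in (5, 10, 20, 30):
    print(depth, round(het_detection_prob(depth), 4))
```

A homozygous difference from the reference shows up in every read, which is why lower coverage is still workable there.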

    But starting from an assembly which won't be perfect to start with, I don't really know but it should probably be around the same.

    Actually you could use only one individual for the 454 run, and use all the individuals (separately) for the alignment part.

    Use individual A 454PE + individual A GAPE to assemble
    Use all individuals on that assembly to find snps.

    Comment


    • #17
      We will be using paired-end 75bp Illumina reads for our next project, since we believe the higher sequence output will outweigh the longer read lengths of 454. Ultimately, if you are just trying to identify SNPs more or less at random then you don't necessarily need big contigs, just enough to have sufficient flanking sequence.

      Depth will of course be related to what you originally sequence, but I'd suggest transcriptome or reduced representation library sequencing to ensure adequate depth without resorting to huge amounts of sequencing.

      We have used 10-20 pooled individuals, I think it is reasonably important here that these individuals are representative of any downstream SNP genotyping that you have in mind (if that is what you plan to do).

      Comment


      • #18
        I agree it depends on what you want.

        In our case we wanted the assembly (we're working on finishing...the painful part), but if SNPs are the only thing of interest, long PE reads aren't necessary, as you mentioned.

        The transcriptome is fine for exonic snps, but if you're looking at regulatory or others, it's not really an option.

        Comment


        • #19
          yep, agree with lletourn that the optimal strategy very much depends on what type of SNPs you want to find and what you want to do with them afterwards

          Comment


          • #20
            Originally posted by MattB View Post
            We have used 10-20 pooled individuals
            Again it depends (I hate that sentence and it keeps cropping up).

            The more individuals you pool, the fewer rare SNPs you'll see, unless you have higher coverage.

            But you'll see more of the SNPs that are 'frequent' in your population.

            If you want 'all' the SNPs between a ref and an individual, with a coverage around 30x you probably won't get false negatives using the GA.

            But if you have 2 individuals pooled, your reads are spread between them, so you'll miss rarer SNPs.

            So if you want population genetics, pool away
            if you want a specific mutation for a phenotype (say ENU induced), don't pool. (this is extreme since you know only one individual has the mutation, but same goes for rare diseases).
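The pooling trade-off can be put in numbers with a small sketch. It assumes diploid individuals contributing equally to the pool, a fixed total depth at the site, and no sequencing error; the 3-read threshold is a hypothetical setting used only for illustration.

```python
from math import comb

def pooled_allele_fraction(n_individuals, n_het_carriers=1):
    """Expected read fraction of an allele carried as a heterozygote by
    `n_het_carriers` individuals in an equal-contribution diploid pool."""
    return n_het_carriers / (2 * n_individuals)

def prob_seen(total_depth, allele_fraction, min_reads=3):
    """P(at least `min_reads` reads carry the allele) at a site covered by
    `total_depth` reads, using a binomial model."""
    miss = sum(comb(total_depth, i)
               * allele_fraction**i
               * (1 - allele_fraction)**(total_depth - i)
               for i in range(min_reads))
    return 1 - miss

# Same 30x total depth, spread over pools of different sizes.
for n in (1, 2, 10, 20):
    frac = pooled_allele_fraction(n)
    print(n, frac, round(prob_seen(30, frac), 3))
```

Under these assumptions an allele private to one animal is almost certain to be seen at 30x when that animal is sequenced alone, and is usually missed at the same total depth in a 20-animal pool, whereas an allele common in the pool keeps a high read fraction however many animals are pooled.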

            BTW, I never thanked you for the first reply...thanks :-)

            Comment


            • #21
              Good points there. In our case, we were not too concerned about rare alleles, in fact we planned to avoid those SNPs!

              I don't think I've really mentioned it above, but in our case the goal was to find 3000 or so polymorphic SNPs throughout the genome for subsequent genotyping in a linkage mapping study. Therefore, the strategy I describe is related to this goal:

              -20 individuals represented the subsequent mapping population
              -transcriptome sequencing gave us good depth and some annotation info
              -pooling enabled us to run only 2 lanes for cost efficiency

              All these things probably need to be modified like lletourn suggests if you have different goals in mind.

              Comment


              • #22
                Thanks for the info.

                Actually, we were thinking of sequencing the transcriptome to obtain higher coverage and find as many SNPs as possible, to use them in genotyping and linkage map studies.
                We are more interested in the most common SNPs and not in rare ones; for this reason we thought of using many animals and optimizing costs by using as few lanes as possible of a single Illumina run.
                We already have a preliminary transcriptome obtained from three Illumina lanes (single and paired short reads of about 40-45 bp) for a total of 18 different but closely related animals. The coverage of the contigs file we built with Velvet is around 30X. In theory, should these data be a good starting point for detecting SNPs?
                P

                Comment


                • #23
                  Sure, the only disadvantage of transcriptomics is uniformity. Some transcripts are more highly expressed than others, and expression is tissue- and time-specific, so SNP discovery will be biased towards specific transcripts.

                  If you know that what you are looking for is in a moderately to highly expressed transcript, this is a great and cheap way of getting the answer.

                  If all you know is that it's expressed but don't know in which tissue or if it's highly expressed or not, reduced genome approach targeting exons might be better...if you know the genome, which I guess, is not your case.

                  ABySS does a good job at assembling RNA; I don't know about Velvet. There are special considerations when assembling RNA because alternate splice sites confuse assemblers if they don't know they're there.

                  Comment


                  • #24
                    Just to note: another strategy that has been employed for this problem is to pool genomic DNA from multiple individuals, digest with a restriction enzyme, size select and make libraries from that defined fraction. I believe the recent chicken paper in Nature did this and definitely it has been published for other species (cows & pigs?).

                    By performing the restriction digestion & size selection you essentially create a smaller genome, which can then be sequenced exhaustively.
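The digest-and-size-select idea can be sketched in silico. This is a toy illustration, not the protocol from any of the papers mentioned: the enzyme (EcoRI, recognition site GAATTC, cutting after the G), the size window and the sequence are all assumptions picked for the example.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`; the cut falls
    `cut_offset` bases into the site. Returns the fragments in order."""
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if cut > start:                      # skip empty fragments
            fragments.append(sequence[start:cut])
            start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments

def size_select(fragments, min_len, max_len):
    """Keep fragments whose length is within [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

def reduced_fraction(sequence, site, cut_offset, min_len, max_len):
    """Fraction of the input that survives digestion plus size selection."""
    kept = size_select(digest(sequence, site, cut_offset), min_len, max_len)
    return sum(len(f) for f in kept) / len(sequence)

# Toy sequence with two EcoRI sites (hypothetical data).
toy = "AAAAGAATTCCCCCCCGAATTCTT"
print(digest(toy, "GAATTC", 1))
print(reduced_fraction(toy, "GAATTC", 1, 6, 12))
```

On a real genome the size window is what sets how small the "smaller genome" is, and therefore how deep a given number of lanes will cover it.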

                    Comment


                    • #25
                      We did think about reducing the genome with techniques like the one you suggested. We are leaning towards the transcriptome because we already have some sets of Illumina short reads (about 40 bp long) and we would like to improve our de novo transcriptome assembly and, with the increased data and depth of coverage, obtain more reliable SNP detection.

                      Regarding lletourn's suggestions, I tried some preliminary SNP detection with MAQ and SAMtools and got some results. The one problem with my sample is the relatedness between individuals (19 animals, but 16 of them are very closely related, being siblings), and I think it could affect the analyses. What do you think? As suggested by MattB, 20 animals are sufficient to start, but for genotyping and linkage map studies the individuals should not be related. Am I correct? What do you think about the sample composition?
                      Thanks for all suggestions and comments!

                      Comment


                      • #26
                        Well, you'll definitely pick up polymorphic SNPs in the 16 animal family...and I'd suspect they'll probably be polymorphic in other individuals/populations unless there are dramatic genetic differences between them and 'other' animals (and avoid the very low MAF SNPs if mapping is your goal).

                        The best thing to do is probably select a small number of your discovered SNPs, and test them in a validation panel of other individuals. This will tell you if your assembly and SNP discovery is doing a good job.

                        Comment


                        • #27
                          Originally posted by MattB View Post
                          The best thing to do is probably select a small number of your discovered SNPs, and test them in a validation panel of other individuals. This will tell you if your assembly and SNP discovery is doing a good job.
                          This is a very good approach to take for data validation. It tests your discovery pipeline at the same time and if you build a good panel, it's cheaper in the end to run on many of your individuals.

                          Comment


                          • #28
                            SNPs and de novo assembly

                            I'm working on a similar dataset that I've inherited - 454 transcriptome runs from 30 pooled individuals. I've been wondering how assemblers handle major-frequency SNPs. Would contigs be split in two at the site of a SNP that's present in a roughly 50:50 ratio in your reads, or would one or the other variant be selected as the representative base at that position?

                            Does anyone have experience with different assemblers and how they handle polymorphisms when constructing contigs? So far I have assemblies from Newbler and CLC for this dataset and from ABySS and CLC for an Illumina dataset, but I'm not sure how to compare between the different assemblers really.

                            Comment


                            • #29
                              Does anyone have experience with different assemblers and how they handle polymorphisms when constructing contigs? So far I have assemblies from Newbler and CLC for this dataset and from ABySS and CLC for an Illumina dataset, but I'm not sure how to compare between the different assemblers really.
                              I think overall the de novo assemblers handle SNPs quite OK (at least from my experience with CLC, ABySS and SOAPdenovo). I think SOAPdenovo chooses one of the SNP alleles at random for the contig sequence (i.e. the consensus), but others may use the 'major' allele. Using CLC, I realigned my reads back to my de novo reference and used the SNP detector ('find variants'). This actually allows you to replace the SNP allele in your reference with the 'major' SNP allele if it wasn't already there.
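The "replace with the major allele" step can be illustrated generically. This is a minimal sketch and not how CLC or any assembler implements it: it assumes you already have per-position base counts from realigned reads (the `pileup` dict here is a hypothetical input format), and it only handles substitutions, not indels.

```python
from collections import Counter

def major_allele(base_counts):
    """Most frequent base; ties are broken alphabetically so the result
    is deterministic (an arbitrary choice for this sketch)."""
    return min(base_counts, key=lambda b: (-base_counts[b], b))

def apply_major_alleles(reference, pileup):
    """Return a copy of `reference` where every position listed in
    `pileup` (0-based position -> Counter of observed bases) carries
    the major allele seen in the reads."""
    bases = list(reference)
    for pos, counts in pileup.items():
        if counts:                           # leave uncovered positions alone
            bases[pos] = major_allele(counts)
    return "".join(bases)

# Toy contig where position 1 was assembled with the minor allele.
contig = "ACGT"
pileup = {1: Counter("CCTTT"), 3: Counter("TT")}
print(apply_major_alleles(contig, pileup))
```

With a true 50:50 SNP the choice of consensus base is essentially arbitrary, so the variant still has to come from the SNP caller rather than from the contig sequence.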

                              I can't really comment on how the de novo assemblers compare in terms of SNP handling performance, but those mentioned above have worked OK for me with 50:50 SNPs.

                              Matt

                              Comment


                              • #30
                                Originally posted by MattB View Post
                                I think overall the de novo assemblers handle SNPs quite OK (at least from my experience with CLC; Abyss and SOAPdenovo)....

                                Matt
                                Did you change any settings for ABySS or SOAPdenovo to specifically help handle SNPs or go with defaults?

                                Do you have any feel for how they cope with things that aren't 50:50? In a pool of individuals where you have lower frequency alleles coming from high quality reads do you know if this will cause contig splitting or is it that a consensus base will be called?

                                Comment
