  • Recommendations for yeast mutation identification

    Hi
    I am just starting a new project and I'm fishing for the latest recommendations on bioinformatic tools or workflows for identifying mutations in yeast. I have Illumina sequences of the parent and mutant strains.
    Thanks!
    noa

  • #2
    We've used a pipeline of BFAST -> Samtools -> Annovar with success for S. cerevisiae. However, be aware that the SNP density is very high, and you'll need high read coverage (at least 100X) to obtain accurate results.
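To put the 100X figure in perspective, here is a back-of-the-envelope sketch of how many reads that takes. It assumes a nuclear genome of roughly 12.1 Mb for S288C (a rounded figure; substitute your own reference length) and ignores duplicates, unmappable reads and uneven coverage, so real requirements will be higher.

```python
import math

# Approximate S. cerevisiae S288C nuclear genome size (~12.1 Mb); a rounded assumption.
GENOME_SIZE_BP = 12_100_000


def mean_coverage(n_reads: int, read_length: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Mean depth = total sequenced bases / genome size (no duplicate or mappability correction)."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int = GENOME_SIZE_BP) -> int:
    """Reads required to reach a target mean depth, rounded up."""
    return math.ceil(target_coverage * genome_size / read_length)


print(reads_needed(100, 100))  # 100X with 100 bp reads -> 12100000 reads
```

For paired-end data, count each mate as one read of `read_length` bases.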



    • #3
      Thanks. Can you please elaborate on that pipeline?
      Also, I am working on data previously generated by the lab I just joined. The data were collected on Illumina, not from a single clone but from a mix of ~150 yeast clones (pooled into a single Illumina lane without barcoding). The goal is to find the genes causing a specific phenotype. Is this feasible, or should I redo the experiment and sequence single clones?
      Thanks



      • #4
        I am working with bwa -> samtools -> GATK. I haven't verified much of my work so far, but it looks good.
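For anyone wanting to try the bwa -> samtools -> GATK route, a minimal sketch of the steps follows. It assumes bwa, samtools and GATK4 (the `gatk` wrapper) are on PATH and that the input is paired-end FASTQ; file names and the read-group string are placeholders and nothing is tuned. HaplotypeCaller assumes diploid samples by default, so for haploid clones its `--sample-ploidy` option is worth a look. `build_pipeline` only assembles the commands; `run` executes them.

```python
import shlex
import subprocess
from typing import List, Optional, Tuple

Step = Tuple[List[str], Optional[str]]  # (argv, file to capture stdout into, or None)


def build_pipeline(ref: str, fq1: str, fq2: str, sample: str) -> List[Step]:
    """Commands for a minimal bwa -> samtools -> GATK4 run (placeholder file names)."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    # Literal backslash-t: bwa expands "\t" in the read-group line itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        (["bwa", "index", ref], None),                            # bwa index of the reference
        (["samtools", "faidx", ref], None),                       # .fai index needed by GATK
        (["gatk", "CreateSequenceDictionary", "-R", ref], None),  # .dict needed by GATK
        (["bwa", "mem", "-R", read_group, ref, fq1, fq2], sam),   # align; SAM written to stdout
        (["samtools", "sort", "-o", bam, sam], None),             # coordinate-sort into BAM
        (["samtools", "index", bam], None),                       # .bai index
        (["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", vcf], None),
    ]


def run(steps: List[Step]) -> None:
    """Run each step in order, stopping at the first non-zero exit status."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, check=True, stdout=out)


if __name__ == "__main__":
    # Dry run: print the shell-quoted commands instead of executing them.
    for argv, out in build_pipeline("S288C.fa", "mutant_R1.fq.gz", "mutant_R2.fq.gz", "mutant"):
        print(shlex.join(argv) + (f" > {out}" if out else ""))
```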



        • #5
          150 clones together? So a true mutation would be seen in <1% of the reads? You'll need huge coverage to distinguish true rare mutations from background sequencing error, and I'm not sure offhand which software will reliably call SNPs at that frequency.
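The arithmetic behind the "huge coverage" point can be sketched with a simple binomial model. This is a simplification: it assumes every clone is equally represented in the pool, that each read independently carries a single-clone variant with probability 1/(number of clones), and that errors producing the same alternative base occur at a fixed per-read rate. The 0.1% error rate and 3-read threshold in the usage line are illustrative guesses, not measured values.

```python
from math import comb


def prob_at_least(k: int, depth: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, p)."""
    k = max(k, 0)
    return sum(comb(depth, i) * p**i * (1 - p) ** (depth - i) for i in range(k, depth + 1))


def detection_summary(depth: int, n_clones: int, error_rate: float, min_alt: int) -> dict:
    """Compare a variant carried by one clone in the pool against sequencing error at one site."""
    true_freq = 1 / n_clones
    return {
        "expected_alt_reads": depth * true_freq,
        "p_detect_true": prob_at_least(min_alt, depth, true_freq),  # sensitivity at this site
        "p_false_call": prob_at_least(min_alt, depth, error_rate),  # error-only false positive
    }


print(detection_summary(depth=200, n_clones=150, error_rate=0.001, min_alt=3))
```

Keep in mind that `p_false_call` is per site, and a yeast genome has roughly 12 million sites, so even a small per-site false-call probability adds up to many spurious calls genome-wide.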

          If you redid, say, 10 clones, found their mutations, and then Sanger-sequenced candidate genes in the rest of the clones, that might work better.



          • #6
            What elaboration would you like? I'm happy to answer specific questions.

            Regarding the 150 pooled clones: are these merely independent segregants from the same diploid genotype, or isolates from 150 different mutant strains? If the latter, the data will be useless for identifying mutations. If the former, then you should be fine. See previous comment re: coverage.

            -Harold
            Last edited by HESmith; 03-20-2012, 10:48 AM.



            • #7
              Thanks for all your help on this. OK, so the way I understand it (and please don't ask why the experiment was done this way; I was not involved then), we have ~200x coverage of each of the parent lines and ~200x coverage of a pooled sample from the 5th generation after several backcrosses to one of the parents. That sample was made by taking DNA from the whole population rather than from a defined number of individuals, so I don't even know whether some of the yeasts are over-represented. Then we have about 600x coverage of the 10% of the yeast that showed the phenotype of interest. This was done by picking ~100 individual yeast clones, extracting their DNA, and pooling identical quantities of each to build one Illumina library (so each of these 100 clones is roughly equally represented). I think the idea was something like extreme-QTL analysis.

              Is it possible, or even likely, that many of these 100 clones harbor the same few mutations (since they came from the same parents and presumably inherited the phenotype from one parent via introgression before the backcrossing), so that the coverage would be enough to identify something?



              • #8
                The coverage should be sufficient for mutation identification using the following criteria:
                1) The causative mutation should be homozygous.
                2) If the parental strains used for sequencing are pre-mutagenesis, then the causative mutation should be unique (i.e., absent in the parents).
                3) Variants that were preexisting in the mutagenized strain and tightly linked to the causative mutation should also be homozygous (and, conversely, unique variants from the backcross strain should be absent in this interval).
                4) Variants that are unique to either parent should be heterozygous at most loci.
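The four criteria can be expressed as a per-variant classifier. This is only a sketch: it assumes variants have already been called in each parent and the alternative-allele fraction estimated in the pool, that parent A is the (pre-mutagenesis) mutagenized strain and parent B the backcross strain, and the 0.9 / 0.2 / 0.8 cutoffs are arbitrary placeholders.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    in_parent_a: bool  # called in parent A (the pre-mutagenesis strain)
    in_parent_b: bool  # called in parent B (the backcross strain)
    pool_af: float     # alternative-allele fraction in the pooled phenotype sample


def classify(v: Variant, hom: float = 0.9, het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Label a variant by the four criteria; thresholds are illustrative, not tuned."""
    if v.in_parent_a and v.in_parent_b:
        return "shared"  # present in both parents: uninformative
    if not (v.in_parent_a or v.in_parent_b):
        # Criterion 2 (absent from both parents) plus criterion 1 (homozygous in the pool).
        return "candidate" if v.pool_af >= hom else "unique_not_fixed"
    # Criterion 3: A-specific variants fixed, or B-specific variants absent, near the mutation.
    linked = (v.in_parent_a and v.pool_af >= hom) or (v.in_parent_b and v.pool_af <= 1 - hom)
    if linked:
        return "linked"
    # Criterion 4: unlinked parent-specific variants segregate and look heterozygous.
    if het_low <= v.pool_af <= het_high:
        return "unlinked"
    return "ambiguous"
```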

                Good luck,
                Harold



                • #9
                  1) How can the causative mutation be homozygous if my sequencing data come from 100 strains? Can I just use allele frequency and assume that the frequency should be much higher than in the sample sequenced from the entire generation (not selected for the phenotype)?
                  2) There was no mutagenesis, so I can't tell whether a SNP arose spontaneously and was selected because it gives the phenotype, or whether the phenotype comes from one or a few genes contributed by the donor parent at the start of the introgression.
                  3) Same problem as in 1: how can I be sure a variant is homozygous if we are looking at a population? Can I use allele frequency?
                  4) I wasn't sure what you meant by #4. Why heterozygous?
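The comparison proposed in question 1 (allele frequency in the phenotype pool versus the unselected population) can be sketched as follows. The `"chrom:pos"` site keys, the `(ref_reads, alt_reads)` count tuples and the 20-read depth filter are hypothetical choices; a real analysis would also want a statistical test and some smoothing along the chromosome.

```python
import math
from typing import Dict, List, Tuple

Counts = Tuple[int, int]  # (ref_reads, alt_reads) at one site; hypothetical input format


def allele_freq(counts: Counts) -> float:
    """Alternative-allele fraction, or NaN when there are no reads."""
    ref, alt = counts
    total = ref + alt
    return alt / total if total else math.nan


def rank_by_af_shift(selected: Dict[str, Counts], unselected: Dict[str, Counts],
                     min_depth: int = 20) -> List[Tuple[str, float, float, float]]:
    """Rank sites by how much the alt-allele fraction rises in the selected (phenotype) pool."""
    rows = []
    for site, sel in selected.items():
        unsel = unselected.get(site)
        if unsel is None:
            continue  # site not covered in the unselected sample
        if sum(sel) < min_depth or sum(unsel) < min_depth:
            continue  # too few reads to estimate a frequency
        af_sel, af_unsel = allele_freq(sel), allele_freq(unsel)
        rows.append((site, af_sel, af_unsel, af_sel - af_unsel))
    return sorted(rows, key=lambda r: r[3], reverse=True)
```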



                  • #10
                    From the way you described the experiment, I assumed that you have a single variant locus that produces your phenotype of interest. The criteria I outlined are based on the parental and pooled data sets only. I also assumed that the pooled sample came from segregants of parent A crossed to parent B.

                    1) You said that you picked and pooled only those isolates that had the phenotype; each of those isolates should contain the causative mutation, which will appear as a homozygous variant in that sample (i.e., allele frequency should be 1).
                    2) Okay, so you can't use uniqueness as a criterion.
                    3 & 4) You have data from each of the parent strains. Identify all of the variants present in parent A and in parent B. Each variant will be unique to A, unique to B, or present in both. Ignore the last. Unlinked variants in your pooled sample will segregate randomly and be present in ~50% of the isolates; those will be reported as heterozygotes. Linked variants should be present or absent from all isolates for the same reason as in #1.
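The same logic suggests a way to localize the causative interval: walk along each chromosome and look for stretches where parent-specific markers stop behaving like ~50% heterozygotes. A rough sketch, assuming each marker has been converted to the fraction of pooled reads carrying the parent-A allele (for a parent-B-specific variant that is 1 minus its alt fraction); the 20 kb window, three-marker minimum and 0.9 cutoff are placeholders.

```python
from statistics import mean
from typing import List, Tuple

Marker = Tuple[int, float]  # (position, fraction of pooled reads carrying the parent-A allele)


def linked_windows(markers: List[Marker], window: int = 20_000, min_markers: int = 3,
                   cutoff: float = 0.9) -> List[Tuple[int, int, float]]:
    """Non-overlapping windows on one chromosome whose mean parent-A fraction is >= cutoff
    or <= 1 - cutoff, i.e. where the pool departs from the ~0.5 expected when unlinked."""
    if window <= 0:
        raise ValueError("window must be positive")
    markers = sorted(markers)
    if not markers:
        return []
    hits = []
    start, last = 0, markers[-1][0]
    while start <= last:
        end = start + window
        fracs = [f for pos, f in markers if start <= pos < end]
        if len(fracs) >= min_markers:
            m = mean(fracs)
            if m >= cutoff or m <= 1 - cutoff:
                hits.append((start, end, m))
        start = end
    return hits
```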

                    If the assumptions that I made were incorrect, then the analysis becomes more complicated. For example, if the phenotype results from two loci, then you'll have to look for two homozygous alleles in your pooled sample. Or, if the pooled sample was generated after five backcrosses to parent B, then you'll have to filter out the homozygous parent B variants from your pooled sample since they're a consequence of the backcrossing rather than the phenotype.
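The backcrossing caveat can be quantified with the usual halving argument. Assuming Mendelian segregation, no selection at the locus in question, and a large, evenly represented pool, the expected share of donor (parent A) alleles at an unlinked locus after the initial A x B cross plus n backcrosses to parent B is (1/2)^(n+1):

```python
def expected_donor_fraction(n_backcrosses: int) -> float:
    """Expected fraction of parent-A alleles at an unlinked, unselected locus after the
    initial A x B cross (1/2) followed by n backcrosses to B (each one halves it)."""
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be >= 0")
    return 0.5 ** (n_backcrosses + 1)


for n in range(6):
    print(n, expected_donor_fraction(n))  # 0.5, 0.25, ..., 0.015625
```

After five backcrosses that is 1/64 (about 1.6%), so most unlinked parent-A markers would be expected to look nearly absent from the pool, and parent-B variants nearly homozygous, regardless of phenotype. Under those assumptions, a donor-specific marker that remains near fixation in the phenotype pool stands out much more clearly.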

                    One more complication: since your mutation may be spontaneous, it may be a transposon insertion. Standard SNP pipelines will almost certainly not detect this type of lesion, so you'll need to screen your data by a different approach.
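One possible screen is to look for positions where many reads are soft-clipped at the same reference coordinate, since reads running from unique sequence into an inserted element often align only partially. The sketch below works on plain SAM text (for example, `samtools view` output) and reports coordinates with at least `min_reads` reads clipped by at least `min_clip` bases; both thresholds are arbitrary, and target-site duplications or repeats can shift or blur the signal.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_positions(pos: int, cigar: str, min_clip: int = 20) -> Iterator[int]:
    """Yield the 1-based reference base just after a clipped junction for long soft clips."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]  # drop hard clips
    if not ops:
        return
    if ops[0][1] == "S" and ops[0][0] >= min_clip:
        yield pos  # leading clip: junction sits just before the first aligned base
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        ref_span = sum(n for n, op in ops if op in "MDN=X")  # ops that consume reference
        yield pos + ref_span  # trailing clip: first reference base after the aligned part


def clip_hotspots(sam_lines: Iterable[str], min_clip: int = 20, min_reads: int = 5) -> Counter:
    """Count reads soft-clipped at each (chrom, position) and keep well-supported sites."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read
        for bp in clip_positions(pos, cigar, min_clip):
            counts[(chrom, bp)] += 1
    return Counter({site: c for site, c in counts.items() if c >= min_reads})
```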



                    • #11
                      Thanks for all your help.
                      One more question: you mentioned a transposon insertion. I was planning on looking for indels as well, and I assume I need a different tool for that. Any tools you know of?

                      Finally, one additional worry: which genome should I map to? So far I have been calling SNPs against the S288C reference genome, which is more or less identical to one of our parents. Our other parent is a natural S. cerevisiae isolate. My worry is that if a gene (or genes) is present only in the natural isolate, we could miss it entirely among the unmapped reads. Is it common for huge regions or whole genes to remain unmapped when a natural isolate is mapped to the reference genome? Should I assemble the entire parental genome, or should I BLAST contigs assembled de novo from the unmapped reads?
                      Thanks again...



                      • #12
                        Check the wiki for recommended software for indel/structural variant analysis. You can also use split-end reads for both transposon and indel mapping. De novo assembly of the unmapped reads might be useful for identifying novel segments of the natural isolate.
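For the de novo route, the unmapped reads first have to be pulled out of the alignment. `samtools fastq -f 4` can do this directly; the sketch below does the same on SAM text to make the flag logic explicit. It keeps primary records with FLAG bit 0x4 (unmapped) set, skips secondary (0x100) and supplementary (0x800) records, and reverse-complements reads stored with bit 0x10. For pairs where only one mate is unmapped you may also want the mapped mate, which this sketch does not handle.

```python
from typing import Iterable, Iterator

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ records for unmapped primary reads (SAM FLAG bit 0x4)."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        name, flag, seq, qual = f[0], int(f[1]), f[9], f[10]
        if not flag & 0x4 or flag & 0x900:
            continue  # mapped, secondary or supplementary
        if flag & 0x10:  # stored reverse-complemented: restore sequencing orientation
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        if flag & 0x40:
            name += "/1"  # first mate of a pair
        elif flag & 0x80:
            name += "/2"  # second mate of a pair
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```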

