SEQanswers

02-29-2012, 01:44 AM   #1
Noa
Member
 
Location: haifa israel

Join Date: Jun 2011
Posts: 60
Recommendations for yeast mutation identification

Hi,
I am just starting a new project and am looking for the latest recommendations on bioinformatics tools or workflows for identifying mutations in yeast. I have Illumina sequences of the parent and mutant strains.
Thanks!
Noa
02-29-2012, 06:29 AM   #2
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 499

We've used a BFAST -> SAMtools -> ANNOVAR pipeline successfully for S. cerevisiae. However, be aware that the SNP density is very high, so you'll need high read coverage (at least 100X) to obtain accurate results.
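For a sense of what "at least 100X" means in raw reads, here is a minimal sketch of the standard mean-coverage arithmetic (total sequenced bases divided by genome size). The ~12.1 Mb genome size and the 100 bp read length are assumptions; substitute your own values. Mean depth also hides uneven coverage, so real runs need some headroom.

```python
import math

# Assumption: ~12.1 Mb haploid S. cerevisiae nuclear genome; adjust for your strain.
GENOME_SIZE_BP = 12_100_000


def mean_coverage(n_reads: int, read_length: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Mean depth = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int = GENOME_SIZE_BP) -> int:
    """Reads required to reach a target mean depth, rounded up."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical run: 100X target with 100 bp reads
    print(reads_needed(100, 100))
```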
03-13-2012, 12:34 AM   #3
Noa
Member
 
Location: haifa israel

Join Date: Jun 2011
Posts: 60

Thanks. Can you please elaborate on that pipeline?
Also, I am working on data previously generated by the new lab I joined. The Illumina data were collected not from a single clone but from a mix of ~150 yeast clones (pooled into a single Illumina lane without barcoding). The goal is to find the genes causing a specific phenotype. Is this feasible, or should I redo the experiment and sequence single clones?
Thanks
03-20-2012, 03:24 AM   #4
matan8
Junior Member
 
Location: Jerusalem, Israel

Join Date: Feb 2012
Posts: 9

I am using bwa -> samtools -> GATK. I haven't verified much of my work so far, but it looks good.
03-20-2012, 09:27 AM   #5
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Quote:
Originally Posted by Noa
Thanks- can you please elaborate on that pipeline?
Also- I am working on data previously generated by the new lab I joined- the data was collected on Illumina not on a single clone but rather on a mix of ~150 yeast clones together (lumped into one single Illumina lane without barcoding). The goal is to find genes that are causing a specific phenotype. Is this feasible or should I redo the experiment and sequence single clones?
Thanks
150 clones together? Then a true mutation would be seen in < 1% of the reads. You'll need huge coverage to distinguish true rare mutations from background sequencing error, and I'm not sure offhand which software will reliably call SNPs at that frequency.
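To make the coverage problem concrete, here is a minimal binomial sketch comparing the chance of seeing a real variant carried by one clone in 150 with the chance that sequencing errors alone produce the same number of alternate reads. The 0.1% error rate, the alternate-read cutoff rule, and the genome size are hypothetical placeholders, and independent errors are an optimistic simplification (real errors cluster by context).

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def detection_tradeoff(depth, allele_fraction, error_rate, min_alt_reads):
    """Sensitivity for a real variant vs. chance that errors alone reach the cutoff at one site."""
    sensitivity = binom_sf(min_alt_reads, depth, allele_fraction)
    false_positive = binom_sf(min_alt_reads, depth, error_rate)
    return sensitivity, false_positive


if __name__ == "__main__":
    # Hypothetical: one haploid clone in 150 carries the mutation, and the
    # per-site error rate toward one specific alternate base is 0.1%.
    allele_fraction = 1 / 150
    error_rate = 0.001
    genome_size = 12_100_000  # approximate S. cerevisiae genome size
    for depth in (200, 600, 2000):
        k = max(2, round(depth * allele_fraction / 2))  # arbitrary alt-read cutoff
        sens, fp = detection_tradeoff(depth, allele_fraction, error_rate, k)
        print(f"{depth}x, >= {k} alt reads: sensitivity {sens:.2f}, "
              f"expected false-positive sites ~{fp * genome_size:,.0f}")
```

Multiplying the per-site false-positive probability by the genome size is what makes the point: even a small per-site error tail turns into thousands of spurious calls genome-wide.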

If you redid, say, 10 clones, found their mutations, and then Sanger-sequenced the candidate genes in the rest of the clones, that might work better.
03-20-2012, 10:07 AM   #6
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 499

Quote:
Originally Posted by Noa
Thanks- can you please elaborate on that pipeline?
Also- I am working on data previously generated by the new lab I joined- the data was collected on Illumina not on a single clone but rather on a mix of ~150 yeast clones together (lumped into one single Illumina lane without barcoding). The goal is to find genes that are causing a specific phenotype. Is this feasible or should I redo the experiment and sequence single clones?
Thanks
What elaboration would you like? I'm happy to answer specific questions.

Regarding the 150 pooled clones: are these merely independent segregants from the same diploid genotype, or isolates from 150 different mutant strains? If the latter, the data will be useless for identifying mutations. If the former, then you should be fine. See previous comment re: coverage.

-Harold

Last edited by HESmith; 03-20-2012 at 11:48 AM.
03-20-2012, 12:02 PM   #7
Noa
Member
 
Location: haifa israel

Join Date: Jun 2011
Posts: 60

Thanks for all your help on this. OK, here is how I understand it (and please don't ask why the experiment was done this way; I was not involved then).

We have ~200x coverage of each of the parent lines, and ~200x coverage of a pool from the 5th generation after several backcrosses to one of the parents. That pool was made by taking DNA from all of the yeast rather than from a defined number of individuals, so I don't even know whether some of the yeast are over-represented.

Then we have about 600x coverage of the 10% of the yeast that showed the phenotype of interest. This sample was made by picking ~100 individual clones, extracting their DNA, and pooling equal amounts to build an Illumina library, so each of these 100 clones is roughly equally represented.

I think the idea was something like extreme QTL analysis. Is it possible, or even likely, that many of these 100 clones share the same few mutations (since they came from the same parents and presumably got the phenotype from one parent via introgression before the backcrossing), so that the coverage would be enough to identify something?
03-20-2012, 12:26 PM   #8
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 499

The coverage should be sufficient for mutation identification using the following criteria:
1) The causative mutation should be homozygous.
2) If the parental strains used for sequencing are pre-mutagenesis, then the causative mutation should be unique (i.e., absent in the parents).
3) Variants that were preexisting in the mutagenized strain and tightly linked to the causative mutation should also be homozygous (and, conversely, unique variants from the backcross strain should be absent in this interval).
4) Variants that are unique to either parent should be heterozygous at most loci.
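The four criteria above amount to a per-variant decision rule. Here is a minimal sketch of it, assuming a single causal locus, variant calls already made for both parents, and an alternate-allele fraction measured in the phenotype-selected pool. The `Variant` record and the 0.9/0.1 cutoffs for "homozygous" and "absent" are hypothetical choices, not part of any existing tool.

```python
from dataclasses import dataclass

FIXED_AF = 0.9   # hypothetical cutoff for "homozygous" (fixed) in the pool
ABSENT_AF = 0.1  # hypothetical cutoff for "absent" from the pool


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    in_parent_a: bool  # called in the sequenced parent A
    in_parent_b: bool  # called in the sequenced parent B
    pool_af: float     # alt-allele fraction in the phenotype-selected pool


def classify(v: Variant, mutagenized: str = "A") -> str:
    """Label one variant using the four criteria (single causal locus assumed)."""
    from_mutagenized = v.in_parent_a if mutagenized == "A" else v.in_parent_b
    from_backcross = v.in_parent_b if mutagenized == "A" else v.in_parent_a
    if from_mutagenized and from_backcross:
        return "shared"  # present in both parents: uninformative
    if not from_mutagenized and not from_backcross:
        # Criteria 1 and 2: new in the mutant and fixed in the pool
        return "causal candidate" if v.pool_af >= FIXED_AF else "novel, not fixed"
    if from_mutagenized and v.pool_af >= FIXED_AF:
        return "linked (mutagenized-parent allele fixed)"  # criterion 3
    if from_backcross and v.pool_af <= ABSENT_AF:
        return "linked (backcross-parent allele absent)"  # criterion 3, converse
    return "unlinked"  # criterion 4: expected to look heterozygous (~0.5)
```

Variants labeled "linked" should cluster around the causal candidate; a candidate with no linked neighbours is worth treating with suspicion.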

Good luck,
Harold
03-20-2012, 12:44 PM   #9
Noa
Member
 
Location: haifa israel

Join Date: Jun 2011
Posts: 60

1) How can the causative mutation be homozygous if my sequencing data come from 100 strains? Can I just use allele frequency and assume that it should be much higher than in the sample from the entire generation (i.e., not only the clones with the phenotype of interest)?
2) There was no mutagenesis, so I can't tell whether a SNP arose spontaneously and was selected because it gives the phenotype, or whether the phenotype comes from one or a few genes contributed by the donor parent at the start of the introgression.
3) Same problem as in 1: how can I be sure a variant is homozygous if we are looking at a population? Can I use allele frequency?
4) I wasn't sure what you meant by #4. Why heterozygous?
03-20-2012, 01:24 PM   #10
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 499

Quote:
Originally Posted by Noa
1) how can the causative mutation be homozygous if my sequencing data is from 100 strains? can i just use allele frequency and assume that the frequency should be much higher than that sequenced in the entire generation (not looking at the clones of a particular phenotype)?
2) there was no mutagenesis so I cant know whether there was a SNP that occurred randomly and was selected for giving the particular phenotype, or whether it is one/a few genes given by the donor parent in the beginning of the introgression.
3) same problem as in 1 - how can i be sure it is homozygous if we are looking at a population? can i use allele frequency?
4) wasnt sure what you meant by #4- why heterozygous?
From the way you described the experiment, I assumed that you have a single variant locus that produces your phenotype of interest. The criteria I outlined are based on the parental and pooled data sets only. I also assumed that the pooled sample came from segregants of parent A crossed to parent B.

1) You said that you picked and pooled only those isolates that had the phenotype; each of those isolates should contain the causative mutation, which will appear as a homozygous variant in that sample (i.e., allele frequency should be 1).
2) Okay, so you can't use uniqueness as a criterion.
3 & 4) You have data from each of the parent strains. Identify all of the variants present in parent A and in parent B. Each variant will be unique to A, unique to B, or present in both. Ignore the last. Unlinked variants in your pooled sample will segregate randomly and be present in ~50% of the isolates; those will be reported as heterozygotes. Linked variants should be present or absent from all isolates for the same reason as in #1.
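Points 3 and 4 can be applied genome-wide by averaging the pool frequency of parent-specific alleles along each chromosome: unlinked regions should hover around 0.5, while the interval around the causative locus should approach 1. Here is a minimal sketch of that windowing step (the idea behind bulk-segregant or extreme-QTL analysis), assuming per-site frequencies have already been tabulated. The 20 kb window, the 0.9 cutoff and the chromosome names are arbitrary placeholders.

```python
from collections import defaultdict


def windowed_mean_af(sites, window_bp=20_000):
    """Average the pool frequency of one parent's alleles in fixed-size windows.

    sites: iterable of (chrom, pos, af), where af is the fraction of pool reads
    carrying the phenotype-donor parent's allele at a parent-specific site.
    Returns {(chrom, window_start): mean_af}.
    """
    acc = defaultdict(lambda: [0.0, 0])
    for chrom, pos, af in sites:
        key = (chrom, (pos // window_bp) * window_bp)  # window start coordinate
        acc[key][0] += af
        acc[key][1] += 1
    return {key: total / n for key, (total, n) in acc.items()}


def fixed_windows(means, min_mean=0.9):
    """Windows whose mean frequency is close to 1: the candidate linked interval."""
    return sorted(key for key, m in means.items() if m >= min_mean)
```

Averaging over windows smooths out per-site sampling noise, which matters when a pooled sample of ~100 clones is sequenced to a few hundred X.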

If the assumptions that I made were incorrect, then the analysis becomes more complicated. For example, if the phenotype results from two loci, then you'll have to look for two homozygous alleles in your pooled sample. Or, if the pooled sample was generated after five backcrosses to parent B, then you'll have to filter out the homozygous parent B variants from your pooled sample since they're a consequence of the backcrossing rather than the phenotype.
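The backcrossing effect can be quantified. Assume haploid segregants, no selection and independent assortment; these are expectations only, and individual loci will vary. Under those assumptions each backcross to parent B halves the expected share of parent A alleles at unlinked loci. Here is a short sketch of that expectation, which shows why parent B variants end up nearly homozygous in the pool for reasons unrelated to the phenotype.

```python
from fractions import Fraction


def expected_donor_fraction(n_backcrosses: int) -> Fraction:
    """Mean fraction of donor-parent (A) alleles at unlinked loci in haploid
    segregants after n backcrosses to the recurrent parent B, without selection.

    n = 0 means segregants of the original A x B cross (1/2 A alleles);
    each further backcross to B halves the expected A contribution.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be >= 0")
    return Fraction(1, 2 ** (n_backcrosses + 1))


if __name__ == "__main__":
    for n in range(6):
        f = expected_donor_fraction(n)
        print(f"{n} backcrosses: A allele ~{float(f):.3f}, B allele ~{float(1 - f):.3f}")
```

With no backcrossing this reproduces the ~50% expectation for unlinked variants described above; after several backcrosses, the baseline for unlinked parent B variants drifts toward 1, so "homozygous" alone no longer singles out the causative region.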

One more complication: since your mutation may be spontaneous, it may be a transposon insertion. Standard SNP pipelines will almost certainly not detect this type of lesion, so you'll need to screen your data by a different approach.
03-21-2012, 01:48 AM   #11
Noa
Member
 
Location: haifa israel

Join Date: Jun 2011
Posts: 60

Thanks for all your help.
One more question: you mentioned a transposon insertion. I was planning on looking for indels as well, and I assume I need a different tool for that. Any you would recommend?

And finally, one additional worry concerns which genome to map back to. So far I have been calling SNPs against the S288C reference genome, which is more or less identical to one of our parents. Our other parent is an S. cerevisiae natural isolate. What if the natural isolate carries genes that are absent from the reference? We could miss them entirely among the unmapped reads. Is it common for large regions or whole genes to go unmapped when a natural isolate is mapped to the reference genome? Should I assemble the entire parental genome, or should I BLAST contigs from a de novo assembly of the unmapped reads?
Thanks again...
03-21-2012, 06:04 AM   #12
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 499

Check the wiki for recommended software for indel/structural variant analysis. You can also use split-end reads (found here) for both transposon and indel mapping. De novo assembly of the unmapped reads might be useful in identifying novel segments of the natural isolate.
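Here is a minimal sketch of the split-read idea. Reads spanning a transposon or indel junction often align only partially and are soft-clipped (`S` in the CIGAR string), so breakpoints show up as positions where many clipped reads pile up. Reads from sequence absent from the reference stay unmapped (SAM flag 0x4) and can be collected for de novo assembly. The code parses plain SAM text such as `samtools view -h` output. The 20 bp minimum clip and the support threshold are arbitrary placeholders, and dedicated structural-variant callers do this far more carefully (mate pairs, supplementary alignments).

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR ops that advance along the reference
UNMAPPED_FLAG = 0x4


def parse_cigar(cigar):
    """'30S70M' -> [(30, 'S'), (70, 'M')]"""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def scan_sam(sam_lines, min_clip=20):
    """Count soft-clipped reads per (chrom, breakpoint) and collect unmapped read names.

    The breakpoint is the first reference base after the junction: POS for a
    leading clip, POS + aligned reference length for a trailing clip.
    """
    clips = Counter()
    unmapped = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header
        fields = line.rstrip("\n").split("\t")
        qname, flag, chrom = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & UNMAPPED_FLAG or cigar == "*":
            unmapped.append(qname)  # candidates for de novo assembly
            continue
        ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]  # ignore hard clips
        if ops[0][1] == "S" and ops[0][0] >= min_clip:
            clips[(chrom, pos)] += 1
        if ops[-1][1] == "S" and ops[-1][0] >= min_clip:
            ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
            clips[(chrom, pos + ref_len)] += 1
    return clips, unmapped


def supported_breakpoints(clips, min_support=3):
    """Breakpoints backed by at least min_support clipped reads."""
    return sorted(key for key, count in clips.items() if count >= min_support)
```

Reads clipped from the left and from the right of the same junction both map to the same breakpoint coordinate, so they add up as support for one event.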

Tags
bioinformatics, illumina, mutation, yeast

All times are GMT -8.