We want to develop a sequencing strategy for a ~50 Mbp Ascomycota genome (a plant pathogen). There is no reference genome, and the size estimate is based on other Ascomycota genome sizes. We do not know how much variation exists within the species, or its G+C content.
Can you help me develop a sequencing strategy on a budget?
Illumina GAIIx seems to be the most widely used and best supported platform, so it will likely be our platform of choice.
It seems that a single lane of data from the GAIIx will be sufficient for a draft assembly: about 96X coverage, assuming 30-50X coverage is required for assembly.
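To sanity-check that coverage figure, here is a minimal Python sketch of the arithmetic. The 4.8 Gbp lane yield is not a vendor specification; it is simply what 96X of a 50 Mbp genome implies, and the function names are made up for illustration.

```python
def coverage(total_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def bases_needed(target_coverage, genome_size):
    """Total sequenced bases required to reach a target average depth."""
    return target_coverage * genome_size


GENOME_SIZE = 50_000_000      # estimated from other Ascomycota genomes
LANE_YIELD = 4_800_000_000    # ASSUMPTION: yield implied by the 96X figure

print("Lane coverage:", coverage(LANE_YIELD, GENOME_SIZE))
for target in (30, 50):
    print(f"{target}X needs {bases_needed(target, GENOME_SIZE):,} bases")
```

Since the genome size is itself an estimate, the real coverage scales inversely with the true size: a genome twice as large would get half the depth from the same lane.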
Our goals are to: create a draft assembly and ultimately a final high-quality assembly; find microsatellite markers to identify variation within and among the species; possibly find SNPs for the same purpose or for QTL analysis; determine gene structure for later RNA-Seq or EST analysis; and compare genome-wide relationships with other fungi. Anything else?
Our ultimate goal is to understand host-pathogen relationships, which will help eliminate the pathogen in the host species.
So as far as I can tell:
#1 Isolate the genomic DNA from a single haploid culture of the fungus.
I think that sequencing a single haploid culture will help the assembly process, but will eliminate the possibility of finding SNPs. Will this also eliminate finding any microsatellites?
Should I instead combine many isolates, since a single lane from the GAIIx will yield 96X coverage?
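If pooling is an option, the question becomes how many isolates one lane can carry. A rough sketch, assuming each isolate gets its own barcoded (indexed) library so the reads can be separated afterwards, and that reads split evenly across isolates. Both are my assumptions, not something confirmed with a sequencing facility, and the function name is hypothetical.

```python
def isolates_per_lane(lane_coverage, per_isolate_coverage):
    """How many isolates fit in one lane at a given depth per isolate.

    ASSUMPTION: reads are divided evenly among barcoded libraries.
    """
    return int(lane_coverage // per_isolate_coverage)


LANE_COVERAGE = 96  # the single-lane estimate for a 50 Mbp genome

for depth in (30, 50):
    n = isolates_per_lane(LANE_COVERAGE, depth)
    print(f"At {depth}X per isolate: {n} isolate(s) per lane")
```

At the low end of the 30-50X range this allows three isolates in a lane; at the high end, only one.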
#2 Will using paired-end sequences provide a better assembly? Yes, right?
Will paired-end reads provide better microsatellite detection? Is it worth the cost for our immediate goals of microsatellite detection and determining gene structure?
#3 After you receive the sequence data, you must filter and trim the data based on quality scores. This keeps bad sequences from confusing the assembly programs, right?
Does anyone have favorite programs for this? Galaxy? FASTX?
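To make clear what I mean by trimming, here is a minimal Python sketch of 3'-end quality trimming. It is not how Galaxy or FASTX implement it. The thresholds (`min_q=20`, `min_len=30`) are arbitrary illustrative choices, and the quality offset must be checked against the actual data, since some Illumina pipeline versions encode qualities with an offset of 64 rather than 33.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove trailing bases whose Phred quality is below min_q.

    ASSUMPTION: `qual` is an ASCII-encoded quality string; `offset` is 33
    or 64 depending on the pipeline version that produced the FASTQ file.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def keep_read(seq, min_len=30):
    """Discard reads that are too short after trimming."""
    return len(seq) >= min_len


# Toy read: 'I' encodes Q40 and '#' encodes Q2 under offset 33.
seq, qual = trim_3prime("ACGTACGT", "IIIII###")
print(seq, qual, keep_read(seq, min_len=4))
```

For paired-end data, a filter like this would also need to keep mates in sync, dropping or setting aside the partner of any discarded read.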
#4 Once the sequences are “cleaned”, you must remove the repeat regions, right? This reduces the complexity for the assembly programs, right?
Does anyone have favorite programs? RepeatMasker?
Will de novo repeat finders essentially find what I am looking for, i.e. microsatellites?
Which de novo repeat finders would you recommend?
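For reference, this is roughly the kind of microsatellite search I have in mind, as a minimal Python sketch using regular expressions on assembled sequence. It finds perfect tandem repeats only. The unit sizes (2-6 bp) and the minimum of 5 repeats are illustrative assumptions, and dedicated tools handle imperfect repeats and reporting far better.

```python
import re


def is_primitive(unit):
    """True if `unit` is not itself a repeat of a shorter motif (e.g. 'ACAC')."""
    return unit not in (unit + unit)[1:-1]


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, end, unit, repeat_count) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A k-mer followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((m.start(), m.end(), unit, (m.end() - m.start()) // k))
    return sorted(hits)


# Toy contig with one (AC)6 and one (AGC)5 repeat.
contig = "GGCT" + "AC" * 6 + "TTGA" + "AGC" * 5 + "T"
for hit in find_microsatellites(contig):
    print(hit)
```

A single haploid isolate would still yield loci like these; what it cannot show is whether their repeat counts vary between isolates.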
#5 I believe that our collaborators are familiar with Velvet and ABySS, so these programs should be able to assemble the genome.
Any other favorite assemblers?
But are there better options for variant detection?
genotyping-by-sequencing?
cortex_var?
RAD-sequencing?
I know these require a different experimental design than the one being proposed, but are they cost effective?
Please correct me on any mistake in judgment. Thank you.