Best wishes from a newbie.
I would appreciate any advice from someone working on large plant genomes.
What is the standard or best approach for the following?
I have 5 lanes of Illumina paired-end data for a 300 Mb diploid plant genome, probably about 7x coverage in total according to Velvet. That is too low for a decent assembly: the result is very fragmented, with thousands of contigs from both the Velvet and CLC assemblers (probably a result of using only 200 bp inserts, plus all the repeat regions).
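To make the coverage figure concrete, here is a minimal sketch of the arithmetic behind it (coverage = total sequenced bases / genome size). The 300 Mb and 7x values are the ones above; the 100 bp read length is only an assumption for illustration, and the function names are made up.

```python
def bases_needed(genome_size_bp: int, coverage: float) -> float:
    """Total sequenced bases implied by a given average coverage."""
    return genome_size_bp * coverage


def coverage_from_reads(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average coverage = (number of reads * read length) / genome size."""
    return n_reads * read_length_bp / genome_size_bp


GENOME_SIZE_BP = 300_000_000   # 300 Mb, from the post
REPORTED_COVERAGE = 7          # rough figure reported by Velvet
ASSUMED_READ_LENGTH_BP = 100   # assumption, not stated in the post

total_bases = bases_needed(GENOME_SIZE_BP, REPORTED_COVERAGE)
n_reads = int(total_bases // ASSUMED_READ_LENGTH_BP)

print(f"Bases implied by 7x of 300 Mb: {total_bases:.3g}")
print(f"Reads of {ASSUMED_READ_LENGTH_BP} bp needed: {n_reads}")
print(f"Coverage check: {coverage_from_reads(n_reads, ASSUMED_READ_LENGTH_BP, GENOME_SIZE_BP)}x")
```

Running the same functions with the real read count and read length from the lanes would show whether the 7x estimate is in the right range.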
I have some gene sequences, and repeat markers that border the regions enclosing those genes.
I need to find the similar gene sequences in the reads, but preferably in the contigs.
The purpose is to find the differences between genes or enclosed regions that express 2 different quality traits (SNP and gene discovery).
Should I BLAST the genes with the contigs (making a BLAST database from the gene sequences)?
Or use BWA at low stringency to align the reads to the genes (making a BWA index of the genes/fragments of interest)?
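To be concrete about the two options, this is roughly what I have in mind, sketched as Python that only builds the command lines (it does not run them). All file names are placeholders, and the relaxed BWA values (`-k 15`, `-B 2`) and the BLAST e-value are guesses at "low stringency", not recommended settings.

```python
# Hypothetical file names; replace with the real FASTA/FASTQ paths.
GENES_FA = "genes.fa"
CONTIGS_FA = "contigs.fa"
READS_1 = "reads_1.fq"
READS_2 = "reads_2.fq"


def blast_commands(genes_fa: str, contigs_fa: str, evalue: str = "1e-5"):
    """Option 1: nucleotide BLAST database from the genes, contigs as query."""
    make_db = ["makeblastdb", "-in", genes_fa, "-dbtype", "nucl", "-out", "genes_db"]
    search = [
        "blastn", "-query", contigs_fa, "-db", "genes_db",
        "-evalue", evalue,
        "-outfmt", "6",          # tabular output, one hit per line
        "-out", "contig_hits.tsv",
    ]
    return [make_db, search]


def bwa_commands(genes_fa: str, reads_1: str, reads_2: str):
    """Option 2: BWA index of the genes, paired reads aligned with relaxed settings."""
    index = ["bwa", "index", genes_fa]
    # -k: minimum seed length, -B: mismatch penalty; both lowered from the
    # bwa mem defaults (19 and 4) as an illustrative guess at "low stringency".
    align = ["bwa", "mem", "-k", "15", "-B", "2", genes_fa, reads_1, reads_2]
    return [index, align]


for cmd in blast_commands(GENES_FA, CONTIGS_FA) + bwa_commands(GENES_FA, READS_1, READS_2):
    print(" ".join(cmd))
```

The `bwa mem` output goes to standard output as SAM, so in practice it would be redirected to a file before any SNP calling.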
We'll be doing a lot of this type of exercise: looking for differences between plant varieties using low coverage of large genomes without references.
Also, what is the best tool for reference-guided assembly (as in using a related or distant reference to help the assembly)? Velvet's Columbus module?