04-02-2010, 01:21 PM   #2
lh3

Firstly, please DO NOT use the toplevel file from Ensembl. It contains >1 Gbp of ambiguous bases used as placeholders. You may use the b37 reference file here instead: ftp://ftp.ncbi.nih.gov/1000genomes/f...cal/reference/
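If you want to check how much of a reference is ambiguous before indexing it, a rough sketch like the one below may help. The function name `count_ambiguous` and the toy input are made up for illustration, and it assumes that any character other than A, C, G or T counts as ambiguous.

```python
def count_ambiguous(fasta_lines):
    """Return (ambiguous, total) base counts for FASTA-formatted lines."""
    ambiguous = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        seq = line.upper()
        total += len(seq)
        # Assumption: anything that is not A/C/G/T is treated as ambiguous.
        ambiguous += len(seq) - sum(seq.count(base) for base in "ACGT")
    return ambiguous, total


# Toy input for illustration only; not taken from any real reference.
toy = [">chr_toy", "ACGTNNNNNN", "nnnnacgtAC"]
amb, tot = count_ambiguous(toy)
print(f"{amb} of {tot} bases are ambiguous ({amb / tot:.0%})")
```

For a real file you would pass an open file handle instead of the list, since the function only needs something that yields lines.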

Secondly, bwa-short does not work well for large -n. This is by design. You may consider disabling seeding with -l 10000, but I guess it will be impractically slow. What sequencing error rate are you simulating? It seems very high. The typical Illumina sequencing error rate is only ~1%, which means a 100 bp read typically has only one or two errors.
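The arithmetic behind that last point can be sketched as follows. This is only a back-of-the-envelope model: it assumes errors are independent and occur at the same rate at every position, which real Illumina reads do not strictly follow, and the function name is made up for illustration.

```python
from math import comb  # requires Python 3.8+


def error_count_probability(read_len, error_rate, k):
    """Binomial probability of exactly k errors in a read.

    Assumption: errors are independent with a uniform per-base rate.
    """
    return comb(read_len, k) * error_rate**k * (1 - error_rate) ** (read_len - k)


read_len = 100
error_rate = 0.01

# Expected number of errors per read under this model.
expected = read_len * error_rate

# Probability that a read carries at most two errors.
p_at_most_2 = sum(error_count_probability(read_len, error_rate, k) for k in range(3))

print(f"expected errors per read: {expected:.1f}")
print(f"P(at most 2 errors): {p_at_most_2:.3f}")
```

Under these assumptions the large majority of 100 bp reads carry two errors or fewer, which is why a large -n is rarely needed for real data.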