Hi
I am new to NGS and need some advice. Let me start by thanking all of you who post replies to the forums; it takes time to share, but believe me, we newbies really appreciate it! The learning curve on this stuff is steep!
I am trying to assemble a 5 Mb bacterial genome from Illumina 40 bp single-end reads. FastQC reports about 30 million reads, and the base quality across all of them is above 30, so I do not believe I need to trim or filter.
I would like to try de novo assembly, since the genes I am interested in are novel and likely to reside on a plasmid, which would make it difficult to build contigs against a reference genome. In addition, they are likely flanked by non-unique DNA. I am not computer savvy, but I do have access to Velvet and have run it a few times before. I have been using Tablet to view the .afg output.
Here are my questions:
1. Apart from changing the k-mer length, what other parameters should I manipulate to optimize the assembly?
2. On a related note, until recommendations come back from projects like GAGE, are there any tips anyone can pass along, or a post somewhere, that would help with this kind of analysis of single-read data?
3. Is there free software that can take any of Velvet's output files and calculate the N50 value, so that as we run our iterations we can figure out what works better? As I said, I am not a programmer, so I am looking for something plug and play (see the sketch after this list for what I mean by the calculation).
4. Does anyone have advice on the contig sizes that are 'normally expected' for this kind of assembly? In other words, if I get about 1,700 contigs longer than 100 bp, with a few in the 50-70 kb range, is that considered good, or do I have a long way to go with optimization?
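To be concrete about question 3, here is a rough Python sketch of the N50 calculation as I understand it, run over Velvet's contigs.fa (the output directory name here is just from my own runs and may differ). Something packaged that does the equivalent of this would be ideal:

def contig_lengths(fasta_path):
    # Yield the length of each sequence in a FASTA file.
    length = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                if length:
                    yield length
                length = 0
            else:
                length += len(line.strip())
    if length:
        yield length

def n50(lengths):
    # N50 = length of the shortest contig such that contigs of that
    # length or longer cover at least half of the total assembly.
    lengths = sorted(lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for contig_len in lengths:
        running += contig_len
        if running * 2 >= total:
            return contig_len
    return 0

print(n50(contig_lengths("velvet_out/contigs.fa")))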
Many thanks