I have made an assembly.
Here are my tasks. I do not know where to go after step 1, and I do not even know how to attempt steps 2 and 3.
1. Align assembly to reference genome.
Grab the coordinates of the sequences that align to the reference genome, and the coordinates of the sequences that DO NOT align to the reference genome.
I used MUMmer for this and got a .coords file.
2. Take the sequences that did not align and map them against a given plasmid database, to differentiate between nuclear genome and plasmids. Then take what is left over and map it against a virulence gene database to see which virulence genes are present.
I was told to use BLAST for this, but I have no idea what to do.
If there are still unaligned sequences left over, then I have to use a new reference to align the remaining sequences.
3. Gene annotation, obtain gene locations
4. SNP calling
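For step 1, one possible way to separate aligned from unaligned contigs is sketched below. It assumes the .coords file was written with `show-coords -T -H` (tab-delimited, no header), so that the third and fourth columns are the query start and end and the last column is the query (contig) ID. A file produced with other options may need different column handling. The function names are made up for illustration.

```python
from collections import defaultdict


def parse_coords(lines):
    """Collect query-side alignment intervals per contig.

    Assumes tab-delimited show-coords output (-T -H): S1, E1, S2, E2, ...,
    with the reference ID and query ID as the last two columns.
    """
    aligned = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # blank or malformed line
        try:
            start, end = int(fields[2]), int(fields[3])
        except ValueError:
            continue  # header line left in the file
        contig = fields[-1]
        # Reverse-strand hits have start > end, so store them ordered.
        aligned[contig].append((min(start, end), max(start, end)))
    return dict(aligned)


def fasta_ids(lines):
    """Return sequence IDs (first word after '>') from FASTA lines."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]


def split_contigs(all_ids, aligned):
    """Split contig IDs into those with at least one alignment and those with none."""
    hit = [c for c in all_ids if c in aligned]
    missed = [c for c in all_ids if c not in aligned]
    return hit, missed
```

Both parsers accept any iterable of lines, so an open file handle can be passed directly. Note that this treats a contig as aligned if any part of it aligns; partially aligned contigs would need the stored intervals to be compared against the contig length.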
Edit:
I have Step 4 down.
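For step 2, a rough sketch of how the BLAST results could be filtered is given below. It assumes the unaligned contigs were searched against the plasmid database with `blastn -outfmt 6`, which writes twelve tab-separated columns by default (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The identity and length cutoffs are arbitrary placeholders, not recommended values, and the function names are hypothetical.

```python
def best_hits(lines, min_identity=90.0, min_length=500):
    """Keep the highest-bitscore hit per query from BLAST tabular output.

    Assumes default -outfmt 6 columns. The thresholds are illustrative
    assumptions and should be tuned for the data set.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks and comment lines
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # not a default outfmt 6 row
        qseqid, sseqid = f[0], f[1]
        pident, length, bitscore = float(f[2]), int(f[3]), float(f[11])
        if pident < min_identity or length < min_length:
            continue
        if qseqid not in best or bitscore > best[qseqid][3]:
            best[qseqid] = (sseqid, pident, length, bitscore)
    return best


def partition(query_ids, hits):
    """Split query IDs into those with a passing hit and those without."""
    matched = [q for q in query_ids if q in hits]
    leftover = [q for q in query_ids if q not in hits]
    return matched, leftover
```

The `leftover` list from the plasmid search could then be used to select the sequences for the next search against the virulence gene database, reusing the same two functions on that second BLAST output.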