I am having difficulty with the multi-extract application that comes with Glimmer3. I have used Glimmer to predict CDSs in a couple of thousand contigs from a de novo assembly of Illumina reads from a bacterial genome. I now have coordinates for all of the predicted CDSs in the contigs, but when I run multi-extract to pull the predicted CDS sequences out of the FASTA file of contigs, some of the output is wrong. When I translate the extracted nucleotide sequences to amino acids, it is clear that not all of them come from open reading frames. I have checked the CDS coordinates and they are correct, so it is the extraction step, not the prediction, that is failing. Some of the extracted regions are correct and some are not. The bad extractions appear to occur because some of the contigs are being treated as circular DNA, even though I specified the -l (linear sequence) parameter. Does anybody have any insight into the problem, a fix, or a suggestion for an alternative extraction tool?
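For reference, the behaviour I expect from a strictly linear extraction can be sketched as follows. This is only a minimal Python sketch, not Glimmer's code: the function names `extract_linear` and `reverse_complement` are hypothetical, and it assumes 1-based inclusive coordinates where a start greater than the end means the CDS is on the reverse strand. Coordinates that fall outside the contig raise an error instead of wrapping around.

```python
# Assumption: coordinates are 1-based and inclusive, and start > end
# marks a reverse-strand feature. Nothing here wraps around the contig end.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_linear(seq, start, end):
    """Extract the region between start and end from a linear sequence."""
    lo, hi = min(start, end), max(start, end)
    if lo < 1 or hi > len(seq):
        # A linear contig has no wraparound, so this is an error.
        raise ValueError(f"coordinates {start}..{end} fall outside 1..{len(seq)}")
    region = seq[lo - 1:hi]
    return region if start <= end else reverse_complement(region)
```

Comparing the output of something like this with the multi-extract output for a few of the suspect CDSs should show whether wraparound is really what is going wrong.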
SBB