#1
Member
Location: Houston TX Join Date: Jan 2009
Posts: 22
I am having difficulty with the multi-extract application that comes with glimmer3. I used glimmer to predict CDSs in a couple thousand contigs from a de novo assembly of Illumina sequences from a bacterial genome. I now have coordinates for all of the predicted CDSs in the contigs, but when I run multi-extract to pull the predicted CDS sequences out of the FASTA file of contigs, some of the results are wrong: when I translate the extracted nucleotide sequences to amino acids, it is clear that not all of them come from open reading frames. I have checked the CDS coordinates and they are correct, so it is the extraction, not the prediction, that is failing. Some of the extracted regions are correct and some are not. The bad extractions appear to happen because some of the contigs are being treated as circular DNA, even though I specified the -l (linear sequence) parameter. Does anybody have any insight into the problem, a fix, or a suggestion for an alternative extraction tool?
SBB
#2
Member
Location: Houston TX Join Date: Jan 2009
Posts: 22
Got a solution: I needed to add the -w parameter to tell multi-extract not to WRAP around the ends of contigs when extracting CDS sequences.
SBB
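For anyone who wants an alternative extraction route, the no-wraparound behaviour can be sketched in a few lines of Python. This is only a minimal illustration, not what multi-extract does internally. It assumes 1-based inclusive coordinates and that start > end means the reverse strand; the function names (read_fasta, extract_linear) and the demo contigs are made up for the example.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping sequence ID to sequence."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Use the first word of the header as the ID.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_linear(seqs, contig, start, end):
    """Extract start..end (1-based, inclusive) from a linear contig.

    Assumption for this sketch: start > end means the reverse strand.
    Coordinates that run off either end raise an error instead of
    wrapping around as they would on a circular sequence.
    """
    seq = seqs[contig]
    lo, hi = min(start, end), max(start, end)
    if lo < 1 or hi > len(seq):
        raise ValueError(f"{contig}:{start}-{end} is outside 1..{len(seq)}")
    region = seq[lo - 1:hi]
    return region if start <= end else reverse_complement(region)


# Hypothetical demo data: the same small ORF on each strand.
contigs = read_fasta([">contig1 demo", "ATGAAATAGCC", ">contig2", "GGCTATTTCAT"])
print(extract_linear(contigs, "contig1", 1, 9))   # forward strand
print(extract_linear(contigs, "contig2", 11, 3))  # reverse strand
```

Both calls print ATGAAATAG. Raising an error on out-of-range coordinates, rather than silently wrapping, makes a bad coordinate obvious instead of producing a sequence that is not an ORF.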
#3
Junior Member
Location: NY Join Date: Mar 2010
Posts: 1
I am wondering how to run glimmer3 to get coordinates for all ORFs in a multi-FASTA file of contigs.
Should I edit g3-iterated.csh for such multiple-sequence input files? I got errors when I ran the command shown in the documentation as 'g3-iterated.csh genom.seq run3' (http://www.cbcb.umd.edu/software/gli...im302notes.pdf) on my own files.
Running 'g3-iterated.csh 454AllContigs.fna run3' printed 'Error allocating memory' to standard error (STDERR). Running 'g3-iterated.csh 454Scaffolds.fna run3' printed 'Motif length is greater then input sequence orf00685' to STDERR. Both runs also printed 'Segmentation fault' and 'Failed to create PWM' to standard output (STDOUT).
Each file is about 5M bytes. 454AllContigs.fna is a FASTA file of all the consensus basecalled contigs longer than 100 bases. 454Scaffolds.fna is a FASTA file of the concatenated contig sequences that were scaffolded as a result of Paired End analysis. The contigs are separated by a run of N's corresponding to the estimated size of the gap between them (but with a minimum of 20 N's to ensure the separation of the contigs) (http://xyala.cap.ed.ac.uk/Gene_Pool/...ls_Oct2009.pdf).
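Since the scaffolds file joins contigs with runs of at least 20 N's, one thing that could be tried is splitting each scaffold back into its pieces at those runs and dropping very short pieces before gene prediction. Whether this avoids the errors above has not been tested in this thread. The sketch below is only an illustration in Python: split_scaffold is a made-up helper, min_gap=20 comes from the description above, and min_len=100 is an arbitrary threshold chosen for the example.

```python
import re


def split_scaffold(seq, min_gap=20, min_len=100):
    """Split a scaffold at runs of at least min_gap N's.

    Pieces shorter than min_len are dropped. Shorter N runs are left
    inside the pieces, since they do not mark a scaffold gap.
    """
    gap = re.compile("[Nn]{%d,}" % min_gap)
    return [piece for piece in gap.split(seq) if len(piece) >= min_len]


# Hypothetical scaffold: two pieces joined by a 25-N gap; the second
# piece contains a short run of 5 N's that should not cause a split.
scaffold = "ACGT" * 30 + "N" * 25 + "GG" * 60 + "N" * 5 + "AC"
pieces = split_scaffold(scaffold)
print([len(p) for p in pieces])
```

Each returned piece could then be written out as its own FASTA record, so that coordinates reported afterwards refer to individual contigs rather than to the scaffold.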