SEQanswers

01-22-2010, 06:22 AM   #1
sbberes
Member
 
Location: Houston TX

Join Date: Jan 2009
Posts: 22
Help with glimmer multi-extract

I am having difficulty with the multi-extract program that comes with glimmer3. I used glimmer to predict CDSs in a couple thousand contigs from a de novo assembly of Illumina reads from a bacterial genome. I now have coordinates for all of the predicted CDSs, but when I run multi-extract to pull the CDS sequences out of the contig FASTA file, some of the output is wrong: translating the extracted nucleotide sequences to amino acids shows that not all of them are open reading frames. I have checked the CDS coordinates and they are correct, so it is the extraction, not the prediction, that is failing. Some extracted regions are correct and some are not. The bad extractions appear to happen because some contigs are being treated as circular DNA, even though I specified the -l (linear sequence) parameter. Does anybody have insight into the problem, a fix, or a suggestion for an alternative extraction tool?
SBB
01-22-2010, 08:44 AM   #2
sbberes
Member
 
Location: Houston TX

Join Date: Jan 2009
Posts: 22

Got a solution: I needed to add the -w parameter to tell multi-extract not to wrap around the ends of contigs when extracting CDS sequences.
SBB
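
For anyone who wants an alternative extraction tool, the no-wraparound behaviour described above can be sketched in a few lines of Python. This is only an illustration, not multi-extract's actual implementation. It assumes 1-based inclusive coordinates and that start > end marks a reverse-strand CDS; the function names (read_fasta, extract_linear) and the demo contigs are made up for the example.

```python
import io

# Complement table for reverse-strand extraction (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Return {sequence_id: sequence} from a multi-FASTA text stream."""
    records, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def extract_linear(seq, start, end):
    """Extract start..end (1-based, inclusive) from a linear sequence.

    Assumption: start > end means the feature is on the reverse strand.
    Coordinates outside the contig raise an error instead of wrapping.
    """
    lo, hi = sorted((start, end))
    if lo < 1 or hi > len(seq):
        raise ValueError(f"coordinates {start}..{end} fall outside 1..{len(seq)}")
    piece = seq[lo - 1:hi]
    return piece.translate(COMPLEMENT)[::-1] if start > end else piece


# Hypothetical demo data: contig2 is the reverse complement of contig1.
demo = io.StringIO(">contig1\nAAATGCCCTAAGG\n>contig2\nCCTTAGGGCATTT\n")
contigs = read_fasta(demo)
forward = extract_linear(contigs["contig1"], 3, 11)   # forward-strand CDS
reverse = extract_linear(contigs["contig2"], 11, 3)   # reverse-strand CDS
print(forward, reverse)
```

Because every coordinate pair is treated as a plain slice of the contig, a CDS that spans most of a contig cannot be swapped for the shorter path around a circle, which is the kind of error the wraparound seemed to cause here.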
03-19-2010, 02:35 PM   #3
hsusa
Junior Member
 
Location: NY

Join Date: Mar 2010
Posts: 1
glimmer3 with multi-fasta files

I am wondering how to run glimmer3 to get coordinates for all ORFs in a multi-FASTA file of contigs.
Should I edit g3-iterated.csh for such multiple-sequence input files?

I got errors when running g3-iterated.csh in the form shown in the documentation, 'g3-iterated.csh genom.seq run3' (http://www.cbcb.umd.edu/software/gli...im302notes.pdf).
Running 'g3-iterated.csh 454AllContigs.fna run3' printed 'Error allocating memory' to standard error (STDERR).
Running 'g3-iterated.csh 454Scaffolds.fna run3' printed 'Motif length is greater then input sequence orf00685' to STDERR.
Both runs also printed 'Segmentation fault' and 'Failed to create PWM' to standard output (STDOUT).
Each input file is about 5 MB:
454AllContigs.fna is a FASTA file of all the consensus basecalled contigs longer than 100 bases;
454Scaffolds.fna is a FASTA file of the concatenated contig sequences that were scaffolded as a result of Paired End analysis. The contigs are separated by a number of ‘N’ corresponding to the estimated size of the gap between them (but with a minimum of 20 N’s to ensure the separation of the contigs)
(http://xyala.cap.ed.ac.uk/Gene_Pool/...ls_Oct2009.pdf).