
  • Help with glimmer multi-extract

    I am having difficulty with the multi-extract program that comes with glimmer3. I used glimmer to predict CDSs in a couple of thousand contigs from a de novo assembly of Illumina reads from a bacterial genome. I now have coordinates for all of the predicted CDSs in the contigs, but when I run multi-extract to pull the predicted CDS sequences out of the FASTA file of contigs, some of the output is wrong: when I translate the extracted nucleotide sequences to amino acids, it is clear that not all of them come from open reading frames. I have checked the CDS coordinates and they are correct, so it is the extraction step, not the prediction, that is failing. Some of the extracted regions are correct and some are not. The bad extractions appear to happen because some of the contigs are being treated as circular DNA, even though I specified the -l (linear sequence) parameter. Does anybody have any insight into the problem, a fix, or a suggestion for an alternative extraction tool?
    SBB

  • #2
    Got a solution: I needed to add the -w parameter to tell multi-extract not to wrap around the ends of contigs when extracting CDS sequences.
    SBB
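
For anyone who wants to cross-check multi-extract output, here is a minimal Python sketch of extraction that never wraps around contig ends. It is not part of glimmer3: the names `parse_fasta` and `extract_linear` are hypothetical, and it assumes 1-based inclusive coordinates where start > end means the CDS is on the reverse strand.

```python
# Hypothetical helpers for illustration only; not part of glimmer3.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_fasta(lines):
    """Return {sequence_id: sequence} from an iterable of FASTA lines."""
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def extract_linear(contig, start, end):
    """Extract a region from a linear contig without any wraparound.

    Assumes 1-based inclusive coordinates; start > end is taken to mean
    the reverse strand, so the reverse complement is returned.
    """
    low, high = min(start, end), max(start, end)
    if low < 1 or high > len(contig):
        raise ValueError("coordinates fall outside the contig")
    region = contig[low - 1:high]
    if start <= end:
        return region
    return region.translate(_COMPLEMENT)[::-1]
```

Coordinates that run past the end of a contig raise an error here instead of silently wrapping, which makes bad coordinate lines easy to spot.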


    • #3
      glimmer3 with multi-fasta files

      I am wondering how to run glimmer3 to get coordinates for all ORFs in a multi-FASTA file of contigs.
      Should I edit g3-iterated.csh for such multiple-sequence input files?

      I got errors when running 'g3-iterated.csh genom.seq run3' as shown in the documentation (http://www.cbcb.umd.edu/software/gli...im302notes.pdf).
      Running 'g3-iterated.csh 454AllContigs.fna run3' printed 'Error allocating memory' to standard error (STDERR).
      Running 'g3-iterated.csh 454Scaffolds.fna run3' printed 'Motif length is greater then input sequence orf00685' to STDERR.
      Both runs also printed 'Segmentation fault' and 'Failed to create PWM' to standard output (STDOUT).
      Each input file is about 5 MB:
      454AllContigs.fna is a FASTA file of all the consensus basecalled contigs longer than 100 bases;
      454Scaffolds.fna is a FASTA file of the concatenated contig sequences that were scaffolded as a result of Paired End analysis. The contigs are separated by a run of 'N' characters corresponding to the estimated size of the gap between them (with a minimum of 20 N's to ensure the contigs stay separated)
      (http://xyala.cap.ed.ac.uk/Gene_Pool/...ls_Oct2009.pdf).
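
To illustrate the scaffold layout described above, here is a small Python sketch that splits a scaffold back into its contigs at runs of at least 20 N's. The function name `split_scaffold` and the `min_gap` parameter are hypothetical, and this only shows the file structure; it is not a confirmed fix for the glimmer3 errors.

```python
import re


def split_scaffold(sequence, min_gap=20):
    """Split a scaffold into contigs at runs of at least min_gap N's.

    Shorter runs of N are assumed to be ambiguous bases inside a contig
    and are left in place.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    # Drop empty pieces that appear when a scaffold starts or ends with a gap.
    return [piece for piece in gap.split(sequence) if piece]
```

Checking the lengths of the resulting pieces is a quick way to see how short the individual contigs in a scaffold file are.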
