  • trainGlimmerHMM ERROR 28: Incorrect order for exon coordinates

    I am trying to use trainGlimmerHMM to train on my data, but I get this error: ERROR 28: Incorrect order for exon coordinates: gene scaffold104. Bad line: scaffold104 20180913 20180770

    I checked this gene's coordinates (listed below) and couldn't find anything wrong:
    scaffold104 20174822 20173995
    scaffold104 20180913 20180770
    scaffold104 20183506 20183397
    scaffold104 20184035 20183599
    scaffold104 20188275 20188173
    scaffold104 20190983 20190835
    scaffold104 20191959 20191897
    scaffold104 20194303 20194184
    scaffold104 20197104 20196972
    scaffold104 20198118 20198003
    scaffold104 20217496 20217292
    scaffold104 20221755 20221627
    scaffold104 20228524 20228383
    scaffold104 20238530 20238461
    scaffold104 20239760 20239596
    scaffold104 20243877 20243754
    scaffold104 20249167 20249095
    scaffold104 20249658 20249551
    scaffold104 20260983 20260867
    scaffold104 20261895 20261756
    scaffold104 20267526 20267477
    scaffold104 20272089 20272008
    scaffold104 20289554 20289440
    scaffold104 20291318 20291164
    scaffold104 20298180 20298032
    scaffold104 20305261 20305165
    scaffold104 20308792 20308673
    scaffold104 20317857 20317717
    scaffold104 20324247 20324165
    scaffold104 20326271 20326145
    scaffold104 20327459 20327360
    scaffold104 20327666 20327577
    scaffold104 20335394 20335188
    scaffold104 20341344 20341216
    scaffold104 20404252 20404189

    Could someone please tell me what is wrong with it and how to fix it?

    Thank you very much!

  • #2
    Are you sure that you have a single gene that spans >300Kb? Otherwise you have to include newlines in between genes.

    This is directly from the glimmer manual http://ccb.jhu.edu/software/glimmerh...html#training?:
    III.3 Training GlimmerHMM
    To train GlimmerM you should run trainGlimmerM with the following command:

    trainGlimmerHMM <mfasta_file> <exon_file> [optional_parameters]

    <mfasta_file> and <exon_file> are the multi-FASTA file and the file containing the exon coordinates of the known genes, respectively.
    <mfasta_file> is a multifasta file containing the sequences for training with the usual format:

    >seq1
    AGTCGTCGCTAGCTAGCTAGCATCGAGTCTTTTCGATCGAGGACTAGACTT
    CTAGCTAGCTAGCATAGCATACGAGCATATCGGTCATGAGACTGATTGGGC
    >seq2
    TTTAGCTAGCTAGCATAGCATACGAGCATATCGGTAGACTGATTGGGTTTA
    TGCGTTA

    <exon_file> is a file with the exon coordinates relative to the sequences contained in the <mfasta_file>; different genes are separated by a blank line; I am assuming a format like below:

    seq1 5 15
    seq1 20 34

    seq1 50 48
    seq1 45 36

    seq2 17 20

    In this example seq1 has two genes: one on the direct strand and another one on the complementary strand. Here you can find a real example of fasta and exon files.
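The format described above can be sketched in a few lines of Python. This is only an illustration of the manual's description, not part of GlimmerHMM: the helper names `parse_exon_file` and `strand_of` are made up here, and the strand rule (start greater than end means complementary strand) is inferred from the manual's example.

```python
from typing import List, Tuple

Exon = Tuple[str, int, int]  # (sequence id, start, end)


def parse_exon_file(text: str) -> List[List[Exon]]:
    """Split exon-file text into genes; a blank line ends the current gene."""
    genes: List[List[Exon]] = []
    current: List[Exon] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # Blank line: close the gene collected so far, if any.
            if current:
                genes.append(current)
                current = []
            continue
        current.append((fields[0], int(fields[1]), int(fields[2])))
    if current:
        genes.append(current)
    return genes


def strand_of(exon: Exon) -> str:
    """Assumed rule: start <= end is the direct strand, otherwise complementary."""
    return "+" if exon[1] <= exon[2] else "-"


# The example exon file from the manual excerpt.
example = """\
seq1 5 15
seq1 20 34

seq1 50 48
seq1 45 36

seq2 17 20
"""

genes = parse_exon_file(example)
for gene in genes:
    print(gene[0][0], strand_of(gene[0]), len(gene), "exon(s)")
# seq1 + 2 exon(s)
# seq1 - 2 exon(s)
# seq2 + 1 exon(s)
```

Running the same parser over your own exon file is a quick way to see whether a missing blank line has merged several genes into one.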



    • #3
      Originally posted by holmrenser
      Are you sure that you have a single gene that spans >300Kb? Otherwise you have to include newlines in between genes.

      This is directly from the glimmer manual http://ccb.jhu.edu/software/glimmerh...html#training?:

      Thank you very much for your reply.
      Yes, it is a long gene. But even when I use genes of normal size, I still get this error. Here is another example:

      ERROR 28: Incorrect order for exon coordinates: gene scaffold110. Bad line: scaffold110 11235373 11235248

      The gene's coordinates are below:
      scaffold110 11235103 11232923
      scaffold110 11235373 11235248
      scaffold110 11238367 11238215
      scaffold110 11241399 11241185
      scaffold110 11245142 11245006
      scaffold110 11247079 11246901
      scaffold110 11248130 11248013
      scaffold110 11251714 11251604
      scaffold110 11252262 11252142
      scaffold110 11254761 11254594
      scaffold110 11256826 11256760

      Thank you
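One possible explanation, which nobody in this thread has confirmed: in the manual's example the complementary-strand gene lists its exons from the highest coordinate to the lowest (`seq1 50 48`, then `seq1 45 36`), whereas both lists posted here have start greater than end on every line but run from the lowest coordinate to the highest. Assuming trainGlimmerHMM expects exons in transcription order, the sketch below checks a gene and reorders it. The helper names are hypothetical and not part of GlimmerHMM.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) as written in the exon file


def is_reverse(exons: List[Exon]) -> bool:
    """Assumed rule: start > end on the first exon means complementary strand."""
    return exons[0][0] > exons[0][1]


def in_transcription_order(exons: List[Exon]) -> bool:
    """True if starts ascend on the direct strand or descend on the complementary one."""
    starts = [start for start, _ in exons]
    return starts == sorted(starts, reverse=is_reverse(exons))


def reorder(exons: List[Exon]) -> List[Exon]:
    """Return the exons sorted into the assumed transcription order."""
    return sorted(exons, key=lambda exon: exon[0], reverse=is_reverse(exons))


# First three exons of the scaffold110 gene, in the order posted above.
gene = [(11235103, 11232923), (11235373, 11235248), (11238367, 11238215)]

print(in_transcription_order(gene))  # False
fixed = reorder(gene)
for start, end in fixed:
    print("scaffold110", start, end)
# scaffold110 11238367 11238215
# scaffold110 11235373 11235248
# scaffold110 11235103 11232923
```

If the assumption holds, the first out-of-order line in the original listing is the second one, which matches the "Bad line" reported in both error messages.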
