  • trainGlimmerHMM ERROR 28: Incorrect order for exon coordinates

    I am trying to train GlimmerHMM on my data with trainGlimmerHMM, but I get this error: ERROR 28: Incorrect order for exon coordinates: gene scaffold104. Bad line: scaffold104 20180913 20180770

    I checked this gene's coordinates (listed below) and couldn't find anything wrong:
    scaffold104 20174822 20173995
    scaffold104 20180913 20180770
    scaffold104 20183506 20183397
    scaffold104 20184035 20183599
    scaffold104 20188275 20188173
    scaffold104 20190983 20190835
    scaffold104 20191959 20191897
    scaffold104 20194303 20194184
    scaffold104 20197104 20196972
    scaffold104 20198118 20198003
    scaffold104 20217496 20217292
    scaffold104 20221755 20221627
    scaffold104 20228524 20228383
    scaffold104 20238530 20238461
    scaffold104 20239760 20239596
    scaffold104 20243877 20243754
    scaffold104 20249167 20249095
    scaffold104 20249658 20249551
    scaffold104 20260983 20260867
    scaffold104 20261895 20261756
    scaffold104 20267526 20267477
    scaffold104 20272089 20272008
    scaffold104 20289554 20289440
    scaffold104 20291318 20291164
    scaffold104 20298180 20298032
    scaffold104 20305261 20305165
    scaffold104 20308792 20308673
    scaffold104 20317857 20317717
    scaffold104 20324247 20324165
    scaffold104 20326271 20326145
    scaffold104 20327459 20327360
    scaffold104 20327666 20327577
    scaffold104 20335394 20335188
    scaffold104 20341344 20341216
    scaffold104 20404252 20404189

    Could someone please tell me what is wrong with it and how to fix it?

    Thank you very much!

  • #2
    Are you sure that you have a single gene that spans >300Kb? Otherwise you have to include newlines in between genes.

    This is directly from the glimmer manual http://ccb.jhu.edu/software/glimmerh...html#training?:
    III.3 Training GlimmerHMM
    To train GlimmerM you should run trainGlimmerM with the following command:

    trainGlimmerHMM <mfasta_file> <exon_file> [optional_parameters]

    <mfasta_file> and <exon_file> are the multi-FASTA file and the file containing the exon coordinates of the known genes, respectively.
    <mfasta_file> is a multifasta file containing the sequences for training with the usual format:

    >seq1
    AGTCGTCGCTAGCTAGCTAGCATCGAGTCTTTTCGATCGAGGACTAGACTT
    CTAGCTAGCTAGCATAGCATACGAGCATATCGGTCATGAGACTGATTGGGC
    >seq2
    TTTAGCTAGCTAGCATAGCATACGAGCATATCGGTAGACTGATTGGGTTTA
    TGCGTTA

    <exon_file> is a file with the exon coordinates relative to the sequences contained in the <mfasta_file>; different genes are separated by a blank line; I am assuming a format like below:

    seq1 5 15
    seq1 20 34

    seq1 50 48
    seq1 45 36

    seq2 17 20

    In this example seq1 has two genes: one on the direct strand and another one on the complementary strand. Here you can find a real example of fasta and exon files.
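The format described in the manual excerpt can be checked with a short script. Note that in the manual's example the gene on the complementary strand lists its exons from the highest coordinates down (50 48, then 45 36), whereas the list in the first post runs from low to high. The sketch below assumes that trainGlimmerHMM expects exons in transcript order (ascending on the direct strand, descending on the complementary strand); that is an inference from the manual's example, not something confirmed in this thread. The function names `parse_exon_file` and `check_gene` are made up for illustration and are not part of GlimmerHMM.

```python
def parse_exon_file(text):
    """Split exon-file text into genes; a blank line separates genes.

    Each exon is returned as a (sequence_name, start, end) tuple.
    """
    genes, current = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # A blank line closes the current gene, if any.
            if current:
                genes.append(current)
                current = []
            continue
        current.append((fields[0], int(fields[1]), int(fields[2])))
    if current:
        genes.append(current)
    return genes


def check_gene(exons):
    """Return a list of problems for one gene (empty list if none found).

    Assumption: exons must be in transcript order, i.e. ascending on the
    direct strand (start < end) and descending on the complementary
    strand (start > end).
    """
    problems = []
    if len({seq for seq, _, _ in exons}) > 1:
        problems.append("exons refer to more than one sequence")
    # Single-base exons (start == end) do not reveal the strand, so skip them.
    strands = {"+" if start < end else "-" for _, start, end in exons if start != end}
    if len(strands) > 1:
        problems.append("exons on both strands")
        return problems
    reverse = strands == {"-"}
    for (_, _, prev_end), (seq, start, end) in zip(exons, exons[1:]):
        out_of_order = start >= prev_end if reverse else start <= prev_end
        if out_of_order:
            problems.append(f"out of order: {seq} {start} {end}")
    return problems


if __name__ == "__main__":
    # First two exon lines from the first post in this thread.
    sample = "scaffold104 20174822 20173995\nscaffold104 20180913 20180770\n"
    for gene in parse_exon_file(sample):
        print(check_gene(gene))
```

Under that assumption, the check flags the same line that ERROR 28 reports (scaffold104 20180913 20180770), while the manual's own example passes without problems.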


    • #3
      Originally posted by holmrenser
      Are you sure that you have a single gene that spans >300Kb? Otherwise you have to include newlines in between genes.

      This is directly from the glimmer manual http://ccb.jhu.edu/software/glimmerh...html#training?:

      Thank you very much for your reply.
      Yes, it is a long gene. But even when I use genes of normal size, I still get this error. Here is another example:

      ERROR 28: Incorrect order for exon coordinates: gene scaffold110. Bad line: scaffold110 11235373 11235248

      The gene's coordinates are below:
      scaffold110 11235103 11232923
      scaffold110 11235373 11235248
      scaffold110 11238367 11238215
      scaffold110 11241399 11241185
      scaffold110 11245142 11245006
      scaffold110 11247079 11246901
      scaffold110 11248130 11248013
      scaffold110 11251714 11251604
      scaffold110 11252262 11252142
      scaffold110 11254761 11254594
      scaffold110 11256826 11256760

      Thank you
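One thing to try, sketched below, is to rewrite the gene so its exons are in transcript order. Every exon in this scaffold110 gene has start > end, so the gene appears to be on the complementary strand, yet the exons are listed from low to high coordinates. The manual's complementary-strand example lists them from high to low. Whether reordering actually resolves ERROR 28 is an assumption that has not been confirmed in this thread, and `to_transcript_order` is a hypothetical helper, not a GlimmerHMM tool.

```python
def to_transcript_order(exons):
    """Sort one gene's exons into transcript order.

    Assumption: all exons of the gene are on the same strand. A gene is
    treated as complementary-strand if any exon has start > end, in which
    case exons are sorted by start coordinate in descending order;
    otherwise they are sorted in ascending order.
    """
    reverse = any(start > end for _, start, end in exons)
    return sorted(exons, key=lambda exon: exon[1], reverse=reverse)


# Exon lines for the scaffold110 gene, copied from the post above.
gene_text = """\
scaffold110 11235103 11232923
scaffold110 11235373 11235248
scaffold110 11238367 11238215
scaffold110 11241399 11241185
scaffold110 11245142 11245006
scaffold110 11247079 11246901
scaffold110 11248130 11248013
scaffold110 11251714 11251604
scaffold110 11252262 11252142
scaffold110 11254761 11254594
scaffold110 11256826 11256760
"""

exons = []
for line in gene_text.splitlines():
    name, start, end = line.split()
    exons.append((name, int(start), int(end)))

reordered = to_transcript_order(exons)
for name, start, end in reordered:
    print(name, start, end)
```

The reordered gene begins with scaffold110 11256826 11256760 and ends with scaffold110 11235103 11232923. In a full exon file, each gene would be reordered separately, with a blank line kept between genes.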
