  • CEGMA - FATAL ERROR when running local map 6400

    Hello Everyone,

    I am trying to get the CEGMA score for our new transcriptome assembly, but when I run cegma on my assembled transcripts I get the following error:

    "RUNNING: local_map -n local -f -h /sw/opt/CEGMA_v2.5/data/hmm_profiles -i KOG genome.chunks.fa 2>output.cegma.errors
    FATAL ERROR when running local map 6400: "No such file or directory"

    AND

    "genewise: error while loading shared libraries: libglib-1.2.so.0: cannot open shared object file: No such file or directory
    Can't run genewise -splice_gtag -quiet -gff -pretty -alb -hmmer /sw/opt/CEGMA_v2.5/data/hmm_profiles/KOG0002.hmm genomic12139.fa >genewise12139 "

    I am running this after putting every dependency on the path. libglib-1.2.so.0 is also on the library path; I even tried running with libglib-2.0.so.0 [as mentioned in https://gist.github.com/robsyme/1153173]. Here are the commands I use to set the paths:

    source .bashrc
    export PATH=$PATH:/sw/opt/geneid/bin/
    export PATH=$PATH:/sw/opt/blast+/bin/
    export CEGMA=/sw/opt/CEGMA_v2.5
    export PERL5LIB=/sw/opt/CEGMA_v2.5/lib/
    export PATH=$PATH:/sw/opt/wise2.4.1/src/bin/
    export WISECONFIGDIR=/sw/opt/wise2.4.1/wisecfg
    export LD_LIBRARY_PATH=/usr/lib64:${LD_LIBRARY_PATH}
    export PATH=$PATH:/sw/opt/CEGMA_v2.5/bin/
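
A quick way to confirm that exports like these took effect is to ask the shell whether each program can actually be found. The following is only a minimal sketch: `check_tools` is a hypothetical helper, and the tool names in the example call are assumptions based on the directories added to PATH above (adjust them to your own install).

```shell
#!/bin/sh
# Hypothetical helper: report which of the named programs are not on PATH.
# Prints one "MISSING: name" line per absent tool and returns 1 if any is absent.
check_tools() {
    status=0
    for tool in "$@"; do
        # `command -v` is the POSIX way to test whether a command resolves.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "MISSING: $tool"
            status=1
        fi
    done
    return $status
}

# Example call (tool names are assumptions, not a complete CEGMA dependency list):
# check_tools cegma local_map genewise geneid tblastn
```

A program being found on PATH does not mean it can start: it may still fail when its shared libraries are loaded, which is a separate check.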

    I have tried many times with small changes, but still no luck. Has anyone come across this kind of problem? Any suggestions would be very helpful.

    Many Thanks,
    Reema Singh
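
The genewise message in this post comes from the dynamic loader, so one way to narrow it down is to list the shared libraries the binary cannot resolve. Below is a minimal sketch, assuming a Linux host where `ldd` is available; `missing_libs` is a hypothetical helper that only filters `ldd` output.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the names of
# libraries that the loader reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example call (assumes genewise is on PATH):
# ldd "$(command -v genewise)" | missing_libs
```

On a cluster, the check is only meaningful on the machine where the job actually runs (for example, inside a submitted job), because a login node and an execution host can have different libraries installed.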

  • #2
    I made an amazon machine image for CEGMA - search CEGMA on AWS and you should find it. It is ami-18935a70. It's preconfigured, just upload your data and run.



    • #3
      Genewise is usually the problem step in most errant runs of CEGMA. As well as the Amazon instance that peromhc mentioned, there are many other ways of running CEGMA where you don't have to install it yourself (including a VM). Check the CEGMA FAQ:



      • #4
        Hello peromhc and kbradnam,

        Thank you very much for your replies. Sorry for getting back to you late; I wanted to run cegma on my assembly before replying. Here's the update:

        1) CEGMA now works fine on our cluster. The problem was a missing library on the execution host, which our IT people have now fixed.

        2) I would like to ask one more question: what is the best CEGMA score? I was looking at http://korflab.ucdavis.edu/Datasets/...faq.html#link4 , but couldn't relate it to our results. Our assemblies contain:

        a) Assembly 1 = 202(complete) and 229(partial)
        b) Assembly 2 = 226(complete) and 239(partial)

        Do we have a good score?

        Any explanation/suggestion would be very helpful.

        Thanks,
        Reema,



        • #5
          202 is better than 201 but not as good as 203. It's all relative. CEGMA is most useful in this regard only if you have made multiple assemblies from the same input data. This allows you to assess the relative performance of different assemblers and/or assembly parameters.

          I have previously reported on variation in many different runs of CEGMA: http://figshare.com/articles/Variati...etrics/1011961



          • #6
            Hello kbradnam

            Thanks for sharing the link. But I have one more quick question.

            The partial score is higher than the complete score for both assemblies (generated from two different samples). As far as I understand from http://korflab.ucdavis.edu/Datasets/cegma/, the partial count will be higher because it also includes the complete set. So my question is: if the partial score is higher than the complete score, does this indicate that the assembly is fragmented?
            Also, should the partial score be lower than the complete score in an ideal situation?

            Thanks,
            Reema,
            Last edited by reema; 09-12-2014, 06:52 AM.



            • #7
              Originally posted by reema
              If the partial score is higher than the complete score, does this indicate that the assembly is fragmented?
              Also, should the partial score be lower than the complete score in an ideal situation?
              Remember, these are not scores per se. 'Complete' and 'partial' refer to the number of full-length, or full-length *and* partial length core genes detected by the CEGMA pipeline.

              Our ideal (fantasy) result — for the purpose of qualifying the completeness of the gene space — is to have 248 complete proteins present. This would also give a partial figure of 248 as this category is really a superset of complete + partial.

              Note that even if CEGMA says something is 'complete', there is still the possibility that parts of the protein are missing. You have to decide on some artificial cut-off, as expecting 100% of the sequence to be present is a) unrealistic and b) not possible because you may not know what 100% means in a newly sequenced species (e.g. in that species there may have been a 3 bp insertion leading to 1 extra amino acid).


              So from CEGMA's point of view, 'complete' means about 70% present (I say 'about' because this is based on alignments to 6 different profile HMMs, which may each vary in length).

              What if you don't have 248 core genes 'completely' present? Well, the next thing is to look at the partial results: how close to 248 are they? If you have 200 (complete) and 240 (complete + partial), then this at least suggests that most of the core gene set is present in your assembly, but some genes may be split across contigs or missing from the assembly. Remember, CEGMA only looks for genes that are inside individual contigs or scaffolds. You could have an assembly that splits every gene across contigs, which might lead to a 'complete' result of zero and a partial result of 248.

              From looking at results of many different runs of CEGMA, it is common to see something like 90–95% of core genes present in the 'complete' category, and another 1–5% present as partial genes.

              On their own, the 'complete' and 'partial' figures are not that useful. But when you compare results from multiple genome assemblies (all using the same input data), then you might be able to say something about the differences.

              Update: just looking through some old CEGMA results, I also found one case where the results were 157/223. This is more unusual, suggesting that a relatively large number (27%) of the core genes were present as fragments. This might simply reflect lots of short contigs/scaffolds in the assembly. In contrast to this, one of the best results that I have seen is 245/248. It is rare to see all core genes present, even when you allow for partial matches.
              Last edited by kbradnam; 09-12-2014, 03:32 PM.
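
The arithmetic behind these figures can be sketched as a small shell function. This is an illustration only: `cegma_summary` is a hypothetical helper, not part of CEGMA, and it assumes the 248 core genes mentioned above with the 'partial' count already including the 'complete' genes.

```shell
#!/bin/sh
# Hypothetical helper: express CEGMA counts as percentages of the 248 core genes.
# $1 = number of 'complete' genes
# $2 = number of 'partial' genes (assumed to include the complete ones)
cegma_summary() {
    awk -v c="$1" -v p="$2" -v total=248 'BEGIN {
        printf "complete=%.1f%% partial=%.1f%% fragment_only=%.1f%%\n",
               100 * c / total, 100 * p / total, 100 * (p - c) / total
    }'
}

cegma_summary 157 223   # the unusual case mentioned in the update above
cegma_summary 202 229   # Assembly 1 from post #4
cegma_summary 226 239   # Assembly 2 from post #4
```

The `fragment_only` value is the share of core genes found only as partial matches, which is the quantity behind the "27%" remark for the 157/223 case.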



              • #8
                Thanks, kbradnam, for explaining this so clearly and nicely. I understand now.

                Many Thanks,
                Reema Singh
