  • how complete is the draft assembly? cegma?

    Hi guys,
    I am doing de novo assembly and was wondering how to assess the completeness of the assembly. I thought CEGMA would be a nice way of getting an idea, but I cannot get it to run properly. Are there any CEGMA users here who could help me with the error below? Do you maybe have other suggestions, or could you point me to an older thread dealing with this issue?

    This is what cegma says when running the following: cegma -g test.fa


    ********************************************************************************
    ** MAPPING PROTEINS TO GENOME (TBLASTN) **
    ********************************************************************************

    RUNNING: genome_map -n genome -p 6 -o 5000 -c 2000 -t 1 ./data/kogs.fa test.fa 2>output.cegma.errors
    FATAL ERROR when running genome_map 32512: ""

    Much obliged!

    cheers,
    Christoph

  • #2
    Hi Christoph,
    we're also looking into using Cegma as a step in a validation process of an assembly. It was not easy to get running, and I still don't have it up, but I might have some pointers.

    Have you looked at the output.cegma.errors file? I usually run into problems with missing programs; in my case it's the lack of a BLAST+ installation (I couldn't get BLAST+ to find my correct Boost library, so I am not able to install it myself).

    Sincerely,
    Ole

    • #3
      Hei Ole,

      I have managed to get cegma up and running by now. It was quite a struggle and took considerable help from Keith Bradnam. Apparently it is also much easier with version 2.4 of cegma.
      The attached file shows roughly what I did.

      By the way, I also requested the cegma installation on the UIO cluster, so with access to titan you can use it. You just have to load the appropriate modules:
      module load hmmer/3.0 ## that's important! the default is hmmer 2.3 and cegma needs v3
      module load geneid
      module load blast
      module load cegma

      and off you go...

      hope that helps!

      cheers,
      Christoph

      • #4
        Christoph,
        thanks very much for the installation guide. Last year I gave up trying to get cegma running, but using your notes I had it installed and working in <1hr
        thanks again,
        Stuart

        • #5
          FATAL ERROR when running genome_map 32512: ""

          I also had this error, which was solved (at least in my case) by fixing the fasta headers:

          >scaffold1|size2472247
          changed to
          >scaffold1-size2472247
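
A minimal sketch of that header fix, assuming a POSIX sed and that only header lines (those starting with ">") need changing. The function name fix_fasta_headers and the file names in the usage comment are made up for illustration.

```shell
# Replace every pipe character in FASTA header lines with a hyphen.
# Sequence lines (anything not starting with '>') pass through untouched.
# Reads the files given as arguments, or stdin when none are given.
fix_fasta_headers() {
    sed '/^>/ s/|/-/g' "$@"
}

# Usage (hypothetical file names):
#   fix_fasta_headers assembly.fa > assembly.fixed.fa
```

Writing to a new file rather than editing in place keeps the original assembly available in case a downstream tool expects the old names.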

          Stuart

          • #6
            I am getting this error. Please help me out.

            *******************************************************************************
            ** MAPPING PROTEINS TO GENOME (TBLASTN) **
            ********************************************************************************

            RUNNING: genome_map -n genome -p 6 -o 5000 -c 2000 -t 1 gms.prot vavaria.dna 2>vav.cegma.errors


            Building a new DB, current time: 04/25/2012 21:23:36
            New DB name: /tmp/genome4881.blastdb
            New DB title: vavaria.dna
            Sequence type: Nucleotide
            Keep Linkouts: T
            Keep MBits: T
            Maximum file size: 1073741824B
            Adding sequences from FASTA; added 61 sequences in 0.0710599 seconds.
            FATAL ERROR when running genome_map 65280: ""

            • #7
              You can get a lot more useful information out of cegma if you look in the error log (in your case, "vav.cegma.errors").
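
The number printed after genome_map can also be a hint. A rough sketch, assuming (this is not confirmed anywhere in the thread) that cegma reports the raw wait status of the child process, in which case the real exit code is the high byte. decode_status is a made-up helper name; the four values are the ones quoted in this thread.

```shell
# Assumption: the value is a raw wait status, so the child's
# exit code is stored in bits 8-15.
decode_status() {
    echo $(( ($1 >> 8) & 255 ))
}

# Decode the status values that appear in this thread.
for status in 32512 65280 512 6400; do
    echo "$status -> exit code $(decode_status "$status")"
done
```

Under that assumption, 32512 decodes to 127, which is the exit code a POSIX shell uses when a command cannot be found, so that case would point at a missing program or PATH entry. The error log remains the authoritative source.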

              • #8
                Can't download geneid

                Hey guys,

                I'm also trying to get CEGMA to work, but I run into problems much earlier than you. My problem is that I can't download geneid from geneid's webpage. It seems like a server problem to me. Is there someone who could send me a zip file with the linux64 version?

                • #9
                  Just to confirm StuartG1's advice: CEGMA will fail if any FASTA header contains a pipe character (|).

                  This is not a problem with CEGMA itself, but with the NCBI BLAST+ command that makes database files from FASTA files.

                  Keith (co-developer of CEGMA)

                  • #10
                    Hi all,

                    I followed Christoph's installation log file for cegma 2.4, which is very well explained, and tried to install cegma 2.5 on an Ubuntu machine, but got the same error of genome_map not found.

                    Note: my FASTA headers are OK (no weird characters such as "|").

                    copied to /etc/perl
                    - FAlite.pm
                    - Cegma.pm

                    installed
                    - wise
                    - wise-doc
                    - wise2.2.3-rc7
                    - geneid_v1.4.4
                    - blast+2.2.30
                    - hmmer3.0

                    also set the environment in
                    ~/.profile
                    PATH=$PATH:~/geneid/bin/./
                    export PATH

                    PATH=$PATH:~/ncbi-blast-2.2.30+/bin/./
                    export PATH

                    PATH=$PATH:~/hmmer-3.0/bin/./
                    export PATH

                    PATH=$PATH:~/CEGMA_v2.5/bin/./
                    export PATH

                    ~/.bashrc
                    export CEGMA=/home/wrparks/Applications/CEGMA_v2.5
                    export PERL5LIB="/usr/lib/perl/5.18:/home/wrparks/Applications/CEGMA_v2.5/lib"
                    export WISECONFIGDIR=/home/wrparks/Applications/wise2.2.3-rc7/wisecfg

                    So if I run cegma inside the bin folder as ./cegma, it seems to run OK, as shown below, but...
                    wrparks@wrparks-Precision-T7610:~/Applications/CEGMA_v2.5/bin$ ./cegma
                    cegma


                    PROGRAM:
                    cegma - 2.5

                    Core Eukaryotic Genes Mapping Approach

                    ----
                    ...but when I try to run it on my file, it fails and gives me the genome_map not found error, as shown below:

                    wrparks@wrparks-Precision-T7610:~/Applications/CEGMA_v2.5/bin$ ./cegma --genome Ph_HDM502AA_DNAassbly_velvetk71_filt500contigs_2905.fan --output Ph_HDM502AA_DNAassbly_velvetk71_filt500contigs_2905_cegma.output


                    ********************************************************************************
                    ** MAPPING PROTEINS TO GENOME (TBLASTN) **
                    ********************************************************************************

                    RUNNING: genome_map -n genome -p 6 -o 5000 -c 2000 -t 1 ./data/kogs.fa Ph_HDM502AA_DNAassbly_velvetk71_filt500contigs_2905.fan 2>Ph_HDM502AA_DNAassbly_velvetk71_filt500contigs_2905_cegma.output.cegma.errors
                    FATAL ERROR when running genome_map 512: ""


                    Ending CEGMA

                    wrparks@wrparks-Precision-T7610:~/Applications/CEGMA_v2.5/bin$ ls


                    any suggestions would be greatly appreciated!

                    Lili
                    Last edited by lilicano; 02-11-2015, 03:57 PM.

                    • #11
                      Can you run CEGMA with the supplied sample data to check whether the problem is with your CEGMA installation or with your sequence data file?

                      • #12
                        I think CEGMA is missing some dependencies. I got this error recently.

                        On my cluster you use the source command to get software into your path.

                        I did

                        source cegma-2.5
                        source blast-2.2.30

                        then ran my CEGMA command and got the error "FATAL ERROR when running genome_map 6400:"

                        Then I realised I was missing dependencies so I did:

                        source hmmer-3.1b1
                        source blast-2.2.29
                        source geneid-1.4.4
                        source wise-2.4.1
                        source cegma-2.5

                        wise-2.4.1 is GeneWise

                        Ran my CEGMA command with no error.

                        So it looks like you might be missing hmmer, geneid, or genewise, or some combination of those three.

                        My command is

                        cegma --genome mygenome.fa -o cegma_out
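
A quick way to check for that situation before running cegma is to ask the shell whether each program is on the PATH. This is only a sketch: the executable names in the last line (genome_map, tblastn, makeblastdb, hmmsearch, geneid, genewise) are assumptions about what the listed packages install, and missing_tools is a made-up helper name.

```shell
# Print every program in the argument list that is not on PATH.
# Returns non-zero if at least one is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Assumed executable names; adjust them to your installation.
missing_tools genome_map tblastn makeblastdb hmmsearch geneid genewise || true
```

If anything is reported as missing, load or source the corresponding package first and run the check again.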

                        • #13
                          Hi kbradnam,

                          Thanks for your suggestion. Yes, I ran sample/sample.dna and sample/sample.prot without any error.

                          Thanks also to whataBamBam for the suggestion; by testing the sample files I confirmed that the sources must be OK.

                          Then I got another suggestion from a colleague that maybe cegma was not finding my files. So I created a folder within cegma, “/Applications/CEGMAv2.5/assemblies”, put my assembly FASTA file there, and ran cegma from /Applications/CEGMAv2.5/bin. Then it was OK.

                          Thanks,

                          Lili
