SEQanswers

Old 01-06-2012, 09:05 AM   #1
chrishah
Member
 
Location: Oslo

Join Date: Jul 2011
Posts: 18
How complete is the draft assembly? CEGMA?

Hi guys,
I am doing a de novo assembly and was wondering how to assess its completeness. I thought CEGMA would be a nice way of getting an idea, but I cannot get it to run properly. Are there any CEGMA users here who could help me with the error below? Or do you have other suggestions, or could you point me to an older thread dealing with this issue?

This is what cegma says when running the following: cegma -g test.fa


********************************************************************************
** MAPPING PROTEINS TO GENOME (TBLASTN) **
********************************************************************************

RUNNING: genome_map -n genome -p 6 -o 5000 -c 2000 -t 1 ./data/kogs.fa test.fa 2>output.cegma.errors
FATAL ERROR when running genome_map 32512: ""

Much obliged!

cheers,
Christoph
Old 01-20-2012, 03:59 AM   #2
Ole
Member
 
Location: Oslo, Norway

Join Date: Oct 2011
Posts: 17

Hi Christoph,
we're also looking into using Cegma as a step in a validation process of an assembly. It was not easy to get running, and I still don't have it up, but I might have some pointers.

Have you looked at the output.cegma.errors file? I usually run into problems with missing programs; in my case it's the lack of a BLAST+ installation (I couldn't get BLAST+ to find the correct Boost library, so I am not able to install it myself).

Sincerely,
Ole
Old 01-20-2012, 05:40 AM   #3
chrishah
Member
 
Location: Oslo

Join Date: Jul 2011
Posts: 18

Hi Ole,

I have managed to get CEGMA up and running by now. It was quite a struggle and took considerable help from Keith Bradnam. Apparently it is also much easier with version 2.4 of CEGMA.
The attached file shows roughly what I did.

By the way, I also requested a CEGMA installation on the UIO cluster, so with access to titan you can use it. You just have to load the appropriate modules:
module load hmmer/3.0 ## that's important! The default is hmmer 2.3 and cegma needs v3
module load geneid
module load blast
module load cegma

and off you go...
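
To confirm which HMMER is actually picked up after loading the modules, a small check like the sketch below may help. It assumes that `hmmsearch -h` prints a banner containing "HMMER <version>" (an assumption about the banner format, not something stated in this thread), and `hmmer_major` is a hypothetical helper name, not part of CEGMA or HMMER.

```shell
# Hypothetical helper: read a HMMER help banner on stdin and print the major version.
# Assumes the banner contains a token like "HMMER 3.0" or "HMMER 2.3.2".
hmmer_major() {
    sed -n 's/.*HMMER \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Usage (only meaningful where hmmsearch is installed):
#   hmmsearch -h | hmmer_major
# A result of 2 would suggest the old default is still first on the PATH.
```

If the result is not 3, the hmmer/3.0 module was probably not loaded, or was loaded before another module that put the older version ahead of it.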

hope that helps!

cheers,
Christoph
Attached Files
File Type: txt CEGMA-install.txt (7.6 KB, 114 views)
Old 01-25-2012, 01:51 PM   #4
StuartG1
Junior Member
 
Location: Sydney, Australia

Join Date: Sep 2011
Posts: 2

Christoph,
thanks very much for the installation guide. Last year I gave up trying to get cegma running, but using your notes I had it installed and working in <1hr.
Thanks again,
Stuart
Old 01-26-2012, 12:55 AM   #5
StuartG1
Junior Member
 
Location: Sydney, Australia

Join Date: Sep 2011
Posts: 2

FATAL ERROR when running genome_map 32512: ""

I also had this error, which was solved (at least in my case) by fixing the fasta headers:

>scaffold1|size2472247
changed to
>scaffold1-size2472247

Stuart
Old 04-25-2012, 09:17 AM   #6
bhoomi_1789
Junior Member
 
Location: India

Join Date: Apr 2012
Posts: 1

I am getting this error. Please help me out.

*******************************************************************************
** MAPPING PROTEINS TO GENOME (TBLASTN) **
********************************************************************************

RUNNING: genome_map -n genome -p 6 -o 5000 -c 2000 -t 1 gms.prot vavaria.dna 2>vav.cegma.errors


Building a new DB, current time: 04/25/2012 21:23:36
New DB name: /tmp/genome4881.blastdb
New DB title: vavaria.dna
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 61 sequences in 0.0710599 seconds.
FATAL ERROR when running genome_map 65280: ""
Old 05-07-2012, 01:15 AM   #7
bryand
Junior Member
 
Location: Goettingen, Germany

Join Date: Aug 2010
Posts: 9

You can get a lot more useful information out of cegma if you look in the error log (in your case, "vav.cegma.errors").
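
The number printed after "genome_map" can also give a hint. All the values reported in this thread are multiples of 256, which is consistent with a raw wait status as returned by Perl's system(); that CEGMA prints this raw value is an assumption, not something confirmed here. Under that assumption, shifting right by 8 bits gives the exit code of the failed command. `decode_status` is a hypothetical helper name.

```shell
# Convert a raw wait status (as returned by Perl's system()) to the exit code.
decode_status() {
    echo $(( $1 >> 8 ))
}

# The four values reported in this thread:
for status in 32512 65280 512 6400; do
    echo "$status -> exit code $(decode_status "$status")"
done
```

An exit code of 127 is what a shell conventionally returns when it cannot find the command at all, which would fit a genome_map (or one of its dependencies) that is not on the PATH. The other codes are program-specific, so the error log remains the place to look.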
Old 05-24-2013, 08:09 AM   #8
Robson
Junior Member
 
Location: Germany

Join Date: Mar 2012
Posts: 2
Can't download geneid

Hey guys,

I'm also trying to get CEGMA to work, but I ran into problems much earlier than you did. My problem is that I can't download geneid from geneid's webpage. It seems like a server problem to me. Could someone send me a zip file with the linux64 version?
Old 02-03-2014, 02:29 PM   #9
kbradnam
Member
 
Location: Davis, CA

Join Date: May 2011
Posts: 53

Just to confirm StuartG1's advice: CEGMA will fail if any FASTA header contains a pipe character (|).

This is not a problem with CEGMA itself, but with the NCBI BLAST+ command that makes database files from FASTA files.
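
The header fix StuartG1 describes can be applied to a whole file in one pass. The following is a minimal sketch using sed; `fix_fasta_headers` is a hypothetical helper name and the file names are placeholders. The hyphen as replacement character simply follows StuartG1's example.

```shell
# Replace every pipe in FASTA header lines (those starting with ">") with a hyphen.
# Sequence lines are left untouched. Reads the named files, or stdin if none given.
fix_fasta_headers() {
    sed '/^>/ s/|/-/g' "$@"
}

# Usage (file names are placeholders):
#   fix_fasta_headers assembly.fa > assembly.fixed.fa
```

Writing to a new file rather than editing in place keeps the original assembly around in case other tools expect the old headers.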

Keith (co-developer of CEGMA)
Old 02-11-2015, 03:46 PM   #10
lilicano
Junior Member
 
Location: Raleigh, NC

Join Date: Feb 2015
Posts: 4

Hi all,

I followed Christoph's installation log file for CEGMA 2.4, which is very well explained.
http://korflab.ucdavis.edu/datasets/...ructions_2.txt

Following that log file, I tried to install CEGMA 2.5 on an Ubuntu machine but got the same genome_map not found error.

Note: my FASTA headers are OK (no weird characters such as "|").

copied to /etc/perl
- FAlite.pm
- Cegma.pm

installed
- wise
- wise-doc
- wise2.2.3-rc7
- geneid_v1.4.4
- blast+2.2.30
- hmmer3.0

I also set the environment in
~/.profile
PATH=$PATH:~/geneid/bin/./
export PATH

PATH=$PATH:~/ncbi-blast-2.2.30+/bin/./
export PATH

PATH=$PATH:~/hmmer-3.0/bin/./
export PATH

PATH=$PATH:~/CEGMA_v2.5/bin/./
export PATH

~/.bashrc
export CEGMA=/home/wrparks/Applications/CEGMA_v2.5
export PERL5LIB="/usr/lib/perl/5.18:/home/wrparks/Applications/CEGMA_v2.5/lib"
export WISECONFIGDIR=/home/wrparks/Applications/wise2.2.3-rc7/wisecfg
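
Settings like these can be sanity-checked by confirming that every PATH entry really exists. Note that the PATH lines above point at ~/CEGMA_v2.5/bin while CEGMA is set to /home/wrparks/Applications/CEGMA_v2.5; whether both directories exist cannot be told from the post. The sketch below uses a hypothetical helper, `missing_path_dirs`, which is not part of CEGMA.

```shell
# Hypothetical helper: print each entry of a colon-separated PATH-style string
# that is not an existing directory. Runs in a subshell so IFS changes stay local.
missing_path_dirs() (
    IFS=':'
    set -f                      # do not glob-expand entries
    for entry in $1; do
        if [ -n "$entry" ] && [ ! -d "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
)

# Usage: list entries of the current PATH that do not exist.
missing_path_dirs "$PATH"
```

Any directory it prints is a PATH entry the shell will silently skip when looking for programs such as genome_map.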

So if I run cegma inside the bin folder as ./cegma, it seems to run OK, as shown below, but...
wrparks@wrparks-Precision-T7610:~/Applications/CEGMA_v2.5/bin$ ./cegma
cegma


PROGRAM:
cegma - 2.5

Core Eukaryotic Genes Mapping Approach

----
...but when I tried to run it on my file, it failed with the genome_map not found error, as shown below:

wrparks@wrparks-Precision-T7610:~/Applications/CEGMA_v2.5/bin$ ./cegma --genome Ph_HDM502AA_DNAassbly_velvetk71_filt500contigs_2905.fan --output Ph_HDM502AA_DNAassbly_velvetk71_filt500contigs_2905_cegma.output


********************************************************************************
** MAPPING PROTEINS TO GENOME (TBLASTN) **
********************************************************************************

RUNNING: genome_map -n genome -p 6 -o 5000 -c 2000 -t 1 ./data/kogs.fa Ph_HDM502AA_DNAassbly_velvetk71_filt500contigs_2905.fan 2>Ph_HDM502AA_DNAassbly_velvetk71_filt500contigs_2905_cegma.output.cegma.errors
FATAL ERROR when running genome_map 512: ""


Ending CEGMA

wrparks@wrparks-Precision-T7610:~/Applications/CEGMA_v2.5/bin$ ls


Any suggestions would be greatly appreciated!

Lili

Last edited by lilicano; 02-11-2015 at 03:57 PM.
Old 02-11-2015, 05:10 PM   #11
kbradnam
Member
 
Location: Davis, CA

Join Date: May 2011
Posts: 53

Can you run CEGMA with the supplied sample data to check whether the problem is with your CEGMA installation or with your sequence data file?
Old 02-12-2015, 04:19 AM   #12
whataBamBam
Member
 
Location: Italy

Join Date: May 2013
Posts: 27

I think it's some missing CEGMA dependencies. I got this error recently.

On my cluster you use the source command to get software into your path.

I did

source cegma-2.5
source blast-2.2.30

then ran my CEGMA command and got the error "FATAL ERROR when running genome_map 6400:"

Then I realised I was missing dependencies so I did:

source hmmer-3.1b1
source blast-2.2.29
source geneid-1.4.4
source wise-2.4.1
source cegma-2.5

wise-2.4.1 is GeneWise

Ran my CEGMA command with no error.

So it looks like you might be missing hmmer or geneid, or genewise or some combination of those three.
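
A quick way to see which of these is missing before running CEGMA is to ask the shell whether each executable can be found. This is only a sketch: `check_tools` is a hypothetical helper, and the executable names (tblastn and makeblastdb for BLAST+, hmmsearch for HMMER, genewise for Wise2) are assumptions based on the packages named in this thread.

```shell
# Hypothetical helper: report which of the named executables are not on the PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumptions based on the packages named in this thread.
check_tools genome_map tblastn makeblastdb hmmsearch geneid genewise \
    || echo "fix the PATH (or load the modules) before running cegma"
```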

My command is

cegma --genome mygenome.fa -o cegma_out
Old 02-12-2015, 11:39 AM   #13
lilicano
Junior Member
 
Location: Raleigh, NC

Join Date: Feb 2015
Posts: 4

Hi kbradnam,

Thanks for your suggestion. Yes, I ran sample/sample.dna and sample/sample.prot without any error.

Thanks also to whataBamBam for the suggestion; testing with the sample files showed me that the sources must be OK.

Then I got another suggestion from a colleague that maybe cegma was not finding my files, so I created a folder within the CEGMA directory for my FASTA genome file:

"/Applications/CEGMAv2.5/assemblies". I put my assembly FASTA file there, ran cegma from /Applications/CEGMAv2.5/bin, and then it was OK.
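
If the problem really was that cegma could not find the input file, passing an absolute path might be an alternative to moving the assembly into the CEGMA folder. The thread does not confirm this, so treat the sketch below as an untested suggestion; `abs_path` is a hypothetical helper and the file names are placeholders.

```shell
# Hypothetical helper: print the absolute path of an existing file.
# Runs in a subshell so the caller's working directory is not changed.
abs_path() (
    cd "$(dirname -- "$1")" && printf '%s/%s\n' "$(pwd)" "$(basename -- "$1")"
)

# Usage (file and output names are placeholders), run from any directory:
#   cegma --genome "$(abs_path mygenome.fa)" -o cegma_out
```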

Thanks,

Lili