Seqanswers Leaderboard Ad

**ddgenome** · 02-03-2012, 07:06 AM

I don't recall the exact error we were seeing with quake but, like yours, it had to do with jellyfish failures. We were able to avoid it by supplying the --no_jelly command-line option (causing quake to use its own k-mer counter).

**zhongj** · 02-06-2012, 01:28 AM

My error is：
terminate called after throwing an instance of 'mapped_file::ErrorMMap'

what(): Can't open file 'file_list.dbm':

/bin/sh: line 1: 21058 Aborted /disk2/Quake/bin/jellyfish qdump -c file_list.dbm > file_list.txt.qcts

Optimization of distribution likelihood function to choose k-mer cutoff failed. Very likely you have set the value of k too high or not provided adequate coverage (>15x). Inspect the k-mer counts for a clear separation of the error and true k-mer distributions.

I will try “ --no_jelly ”.

My genome is very big, about 3G, maybe “ --no_jelly ” will take a long time.

**chrishah** · 02-06-2012, 04:22 AM

Hi,

Thanks for the suggestion. Tried --no_jelly and got the following:

Processing sequences...
....................................................................................................................................................................................................................................................................260205970 sequences processed, -1699182760 bp scanned
WARNING: Input had 11499462 non-DNA (ACGT) characters whose kmers were not counted
358433106 total distinct mers
358433106 mers occur at least 0 times
Optimization of distribution likelihood function to choose k-mer cutoff failed. Very likely you have set the value of k too high or not provided adequate coverage (>15x). Inspect the k-mer counts for a clear separation of the error and true k-mer distributions.
Guessing quality values are on ascii 64 scale

tried to run cov_model.py on the kmer.txt file I got from that:

-bash-3.2$ python ~/programmes/Quake/bin/cov_model.py kmers.txt
Optimization of distribution likelihood function to choose k-mer cutoff failed. Inspect the k-mer counts for a clear separation of the error and true k-mer distributions.

So, it looks to me as if I get accross the first step, when using the --no_jelly option, but encounter problems in the next step (the cov_model.py script).

The r.log file says:
.
.
Read 30000 items
Error in est.cov(cov) :
Cannot identify reliable trusted k-mer coverage peak. Please verify the integrity of your dataset (e.g. by running kmer_hist.r). If there is no clear trusted k-mer distribution, the coverage is too low to correct these reads. If there is, please e-mail [email protected] and choose the cutoff by hand in the meantime
Execution halted

As suggested I run the kmer_hist.r script and got the attached kmer_hist.pdf. Not quite sure how to read it, though..

Guess I should try to set the cutoff by hand as suggested. Any ideas?

Much obliged!
Christoph

Attached Files

kmer_hist.pdf (15.8 KB, 42 views)

**zhongj** · 02-06-2012, 04:50 AM

Would you tell me the genome size of your sample and the sequencing coverage?

**chrishah** · 02-06-2012, 09:05 AM

Hi,

The genome size is estimated to be about 100MB and I have about 250Million Illumnina reads (~70bp), so in theory I sould have very high coverage (>150x). What makes everything a bit difficult is that the reads contain probably quite a high percentage of contamination from the host organism. How many percent is difficult to guess.
From first assembly attempts I would say at least 25% of the reads are contamination. For the host contamination I would expect the coverage to be very low and I guess thats what causes problems for Quake..??

cheers,
Christoph

**zhongj** · 02-06-2012, 06:31 PM

I see. I think you should separate the real reads from the host contamination reads firstly. Do you have a reference genome?

**chrishah** · 02-07-2012, 02:10 AM

No, no reference genome. There is a draft genome of the host on the web, but we are reluctant to just fish out reads that map onto that draft because in that way we might loose conserved regions, if you see what I mean.

**Ole** · 02-23-2012, 12:55 AM

Hi Christoph!
At the coverage you have, I don't think that graph is that useful. If you change the kmer_hist.r script to show more of your data, I think you could make a better educated guess.

On line 51 in that script, change

Code:

hist(cov, breaks=seq(0,ceiling(cov.max)), xlim=c(0, round(1.7*cov.mean)), ylim=c(0, round(1.2*hc[cov.mean])), xlab="Coverage", main="k-mer counts")

to something like this:

Code:

hist(cov, breaks=seq(0,ceiling(cov.max)), xlim=c(0, round(8*cov.mean)), ylim=c(0, round(5*hc[cov.mean])), xlab="Coverage", main="k-mer counts")

or you could just set xlim and ylim to suitable values by hand. For example, xlim should be at least set to xlim=c(0, 200) since the coverage of your parasite is 150x.

When you've done this, post again or send the author of the program an email and I guess it will be easier to get some help.

You might end up with a cutoff at 20 or something.

Ole

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 24 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 25 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 21 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Who's running Quake?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News