SEQanswers

05-20-2014, 05:05 AM   #1
konika
Member

Location: Norway

Join Date: Sep 2010
Posts: 14

Is high k-mer coverage in PE reads good?

Hi,
I am doing a de novo assembly of a bacterial genome from paired-end reads using Velvet.
If I use a k-mer length of 75 in the k-mer coverage formula, along with my other parameters, I get:

Ck = C * (L - k + 1) / L
   = (2935775 * 250 * 2 / 2900000) * (250 - 75 + 1) / 250
   ≈ 356

where the read length L is 250, the number of read pairs is 2,935,775, and the expected genome size is 2,900,000 bp.

The Velvet manual says: if Ck is above 20, you might be "wasting" coverage. What does that mean, and what k-mer length should I choose?
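
For anyone plugging in their own numbers, here is the same calculation as a shell one-liner (a minimal sketch using the values above; awk just does the floating-point arithmetic):

    awk -v n=2935775 -v L=250 -v G=2900000 -v k=75 'BEGIN {
        C  = n * 2 * L / G           # nucleotide coverage (PE, so 2 reads per pair)
        Ck = C * (L - k + 1) / L     # k-mer coverage
        printf "C = %.1f, Ck = %.1f\n", C, Ck
    }'
    # prints: C = 506.2, Ck = 356.3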

05-20-2014, 09:03 AM   #2
Brian Bushnell
Super Moderator

Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Typically, the longer the kmer, the better the assembly, until you hit the point of too little coverage. Since you have nice long reads and fairly high coverage, you will probably get a better assembly with longer kmers, maybe K=127 or even higher. The assembly should be fast, so just try with a range of kmers and look at the L50 to see which appears to be best.
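
A minimal sketch of such a sweep (assuming Velvet was compiled with a large enough MAXKMERLENGTH; hash lengths must be odd, and the file names are the ones used later in this thread):

    for k in 75 95 115 127 151; do
        velveth asm_k$k $k -shortPaired -separate -fastq file1.fastq file2.fastq
        velvetg asm_k$k -exp_cov auto -cov_cutoff auto
        # velvetg's final log line reports the n50 for this k
    done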

However, I find that Velvet assemblies often get worse when the coverage is really high, so you may want to reduce it with normalization or subsampling.
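
For example (the tool choice is an assumption; any normalizer or subsampler will do), with BBTools:

    # normalize to a ~100x target depth:
    bbnorm.sh in1=file1.fastq in2=file2.fastq out1=norm1.fastq out2=norm2.fastq target=100
    # ...or simply keep a random fraction of the pairs:
    reformat.sh in1=file1.fastq in2=file2.fastq out1=sub1.fastq out2=sub2.fastq samplerate=0.4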
05-21-2014, 03:00 AM   #3
SES
Senior Member

Location: Vancouver, BC

Join Date: Mar 2010
Posts: 275

Your genome coverage is quite high, so I would recommend subsampling the data and then trying VelvetOptimiser to find the best assembly parameters. Subsampling usually yields a better assembly, and it also lowers the computational requirements and shortens the run times.
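
One way to do the subsampling (a sketch assuming seqtk; using the same seed with -s on both files keeps the reads paired):

    seqtk sample -s100 file1.fastq 0.4 > sub1.fastq
    seqtk sample -s100 file2.fastq 0.4 > sub2.fastq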
05-26-2014, 01:21 AM   #4
konika
Member

Location: Norway

Join Date: Sep 2010
Posts: 14

Hi, I have tried subsampling (40% of the original number of reads) and running VelvetOptimiser on it, but I get this result:
Velvet details:
Velvet version: 1.2.08
Compiled categories: 10
Compiled max kmer length: 191
Maximum number of velvet instances to run: 1
Will run velvet optimiser with the following parameters:
Velveth parameter string:
-shortPaired -fastq file1.fastq -shortPaired2 -fastq file2.fastq
Velveth start hash values: 151
Velveth end hash value: 153
Velveth hash step value: 2
Velvetg minimum coverage cutoff to use: 0

Read tracking for final assembly off.
File: file1.fastq has 1174310 reads of length 250
File: file2.fastq has 1174310 reads of length 250
Total reads: 2.3 million. Avg length: 250.0

Memory use estimated to be: 230512.9GB for 1 threads.

You probably won't have enough memory to run this job.
Try decreasing the maximum number of threads used.
(use the -t option to set max threads.)

Any ideas what should be done about this?
Thanks
05-26-2014, 01:25 AM   #5
konika
Member

Location: Norway

Join Date: Sep 2010
Posts: 14

Apart from this, I ran velvetg on the subsample for k-mer hash lengths 75-151, with a k-mer coverage of 320-370 for each k-mer length, and then compared all the results by n50. Below are the best results:
row  file                   n50    total.length  longest  ncontig
154  stats_h151_cov350.txt  20880  33577         20880    6
155  stats_h151_cov360.txt  20880  33577         20880    6
156  stats_h151_cov370.txt  20880  33577         20880    6
151  stats_h151_cov320.txt  20880  33538         20880    6
152  stats_h151_cov330.txt  20880  33518         20880    6
153  stats_h151_cov340.txt  20880  33518         20880    6
148  stats_h149_cov350.txt  20878  33569         20878    6
149  stats_h149_cov360.txt  20878  33569         20878    6
150  stats_h149_cov370.txt  20878  33569         20878    6
146  stats_h149_cov330.txt  20878  33533         20878    6
147  stats_h149_cov340.txt  20878  33508         20878    6
145  stats_h149_cov320.txt  20878  33468         20878    6


It looks like the n50 decreases with decreasing k-mer length, and the total length is very small (about 33 kb for a 2.9 Mb genome). Is something wrong here?
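
For reference, a sketch of the kind of sweep that produces these files (the file names match the table above; mapping "k-mer coverage" to velvetg's -exp_cov is my assumption):

    for k in $(seq 75 2 151); do
        velveth asm_h$k $k -shortPaired -separate -fastq sub1.fastq sub2.fastq
        for cov in 320 330 340 350 360 370; do
            velvetg asm_h$k -exp_cov $cov > stats_h${k}_cov${cov}.txt
        done
    done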

05-26-2014, 03:23 AM   #6
mastal
Senior Member

Location: uk

Join Date: Mar 2009
Posts: 667

Quote:
Originally Posted by konika View Post
Velvet details:
Velvet version: 1.2.08
Compiled categories: 10
Try recompiling velvet with 'CATEGORIES=1'. I am assuming you only have 1 set of PE reads.


Quote:
Originally Posted by konika View Post
Will run velvet optimiser with the following parameters:
Velveth parameter string:
-shortPaired -fastq file1.fastq -shortPaired2 -fastq file2.fastq
What parameters did you use to run Velvet Optimiser? It looks like you should have

'-shortPaired -separate -fastq file1.fastq file2.fastq'

if what you have is PE reads.
05-26-2014, 03:34 AM   #7
mastal
Senior Member

Location: uk

Join Date: Mar 2009
Posts: 667

Quote:
Originally Posted by konika View Post
Apart from this, I ran velvetg on the subsample for k-mer hash lengths 75-151, with a k-mer coverage of 320-370 for each k-mer length, and then compared all the results by n50. Below are the best results:
row  file                   n50    total.length  longest  ncontig
154  stats_h151_cov350.txt  20880  33577         20880    6
155  stats_h151_cov360.txt  20880  33577         20880    6
156  stats_h151_cov370.txt  20880  33577         20880    6
151  stats_h151_cov320.txt  20880  33538         20880    6
152  stats_h151_cov330.txt  20880  33518         20880    6
153  stats_h151_cov340.txt  20880  33518         20880    6
148  stats_h149_cov350.txt  20878  33569         20878    6
149  stats_h149_cov360.txt  20878  33569         20878    6
150  stats_h149_cov370.txt  20878  33569         20878    6
146  stats_h149_cov330.txt  20878  33533         20878    6
147  stats_h149_cov340.txt  20878  33508         20878    6
145  stats_h149_cov320.txt  20878  33468         20878    6


It looks like the n50 decreases with decreasing k-mer length, and the total length is very small (about 33 kb for a 2.9 Mb genome). Is something wrong here?


The differences in n50 and total length are very small; I doubt they are significant. How did you calculate the coverage? Did VelvetOptimiser calculate it? Velvet doesn't do so well with very high coverage.

05-26-2014, 03:38 AM   #8
konika
Member

Location: Norway

Join Date: Sep 2010
Posts: 14

Quote:
Originally Posted by mastal View Post
Try recompiling velvet with 'CATEGORIES=1'. I am assuming you only have 1 set of PE reads.




What parameters did you use to run Velvet Optimiser? It looks like you should have

'-shortPaired -separate -fastq file1.fastq file2.fastq'

if what you have is PE reads.
Hi, yes, I am using just one set of paired-end reads. I will try the corrected command and add 'CATEGORIES=1' there. The new command will be:


VelvetOptimiser.pl -s 75 -e 159 -t 1 -f '-shortPaired -separate -fastq file1.fastq file2.fastq' --optFuncKmer 'n50' -g 2800000 -o '-exp_cov 350' 'CATEGORIES=1'

Does that look OK?
05-26-2014, 03:41 AM   #9
konika
Member

Location: Norway

Join Date: Sep 2010
Posts: 14

Quote:
Originally Posted by mastal View Post
The differences in n50 and total length are very small, I doubt that they are significant. How did you calculate the coverage, did Velvet Optimiser calculate the coverage? Velvet doesn't do so well with very high coverage.
Hi, I describe the k-mer coverage calculation in the first post above. This result was from a subsample of the total reads. Yes, it is quite high coverage; I am waiting for VelvetOptimiser to run.

05-26-2014, 03:50 AM   #10
mastal
Senior Member

Location: uk

Join Date: Mar 2009
Posts: 667

I'm not sure you can adjust the number of categories when running VelvetOptimiser; it may just be reporting the settings your Velvet binary was compiled with.

You may need to recompile Velvet. Reducing MAXKMERLENGTH (to the longest k-mer you are actually going to use) when recompiling should also reduce the memory usage.
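
A sketch of that recompile, run in the Velvet source directory (MAXKMERLENGTH=160 is my choice here, to cover the longest hash length of 159 tried in this thread):

    make clean
    make 'CATEGORIES=1' 'MAXKMERLENGTH=160'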
05-26-2014, 03:54 AM   #11
mastal
Senior Member

Location: uk

Join Date: Mar 2009
Posts: 667

Quote:
Originally Posted by konika View Post
Hi, I describe the k-mer coverage calculation in the first post above. This result was from a subsample of the total reads. Yes, it is quite high coverage; I am waiting for VelvetOptimiser to run.
OK, if you have subsampled your data, then you are using fewer reads, so your coverage will be proportionally lower. Can VelvetOptimiser give you the correct coverage value to use?
05-26-2014, 04:46 AM   #12
konika
Member

Location: Norway

Join Date: Sep 2010
Posts: 14

I solved the memory problem: it was due to a wrong genome size. VelvetOptimiser expects the size in megabases, so it should be 2.8, not 2800000. I will post if I get a useful result from VelvetOptimiser. Thanks.
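
For the record, the corrected call implied by that fix (same parameters as before, with -g now in megabases, and without 'CATEGORIES=1', which per mastal's comment above is a compile-time setting rather than a VelvetOptimiser option):

    VelvetOptimiser.pl -s 75 -e 159 -t 1 \
        -f '-shortPaired -separate -fastq file1.fastq file2.fastq' \
        --optFuncKmer 'n50' -g 2.8 -o '-exp_cov 350'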

Tags
coverage, kmer, paired end
