SEQanswers

05-31-2011, 08:54 AM   #1
NGS_user
Junior Member
 
Location: Europe

Join Date: Nov 2010
Posts: 9
Large K-mer Velvet

Hi Folks,
I am using Velvet to assemble a number of genes where the reads are 75 bp long. An issue I am having is that some of these genes are the result of duplications, where the parent and duplicate gene are very similar. Am I right in thinking that a high k-mer length will reduce the chances of an assembly error (smaller k-mers being merged into one contig despite coming from reads generated from the duplicates)? I realize sequencing errors may be unavoidable; hopefully good coverage will help mitigate them. If longer k-mers are better for duplicates, would it be better to generate longer reads?
05-31-2011, 10:28 AM   #2
seb567
Senior Member
 
Location: Québec, Canada

Join Date: Jul 2008
Posts: 260

Quote:
Originally Posted by NGS_user
Hi Folks,
I am using Velvet to assemble a number of genes where the reads are 75 bp long.
Are the reads paired?

Quote:
Originally Posted by NGS_user

An issue I am having is that some of these genes are the result of duplications, where the parent and duplicate gene are very similar. Am I right in thinking that a high k-mer length will reduce the chances of an assembly error (smaller k-mers being merged into one contig despite coming from reads generated from the duplicates)?
This will certainly matter for assemblers that use bubble-merging or bubble-popping approaches, such as Velvet or ABySS.

In general, increasing the k-mer length increases the uniqueness of k-mers in the resulting graph.

Two things limit the use of a very large k-mer length: the first is obviously the read length, and the second is the error rate.
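To make the uniqueness point concrete, here is a toy Python sketch (illustrative only; the random sequences and the ~98%-identity figure are made up, and this is not Velvet or ABySS code). It concatenates a random "gene" with a near-identical duplicate and shows that the fraction of k-mers occurring exactly once grows with k:

```python
# Toy demonstration: larger k => more k-mers are unique, which helps
# separate a gene from a near-identical duplicated copy.
from collections import Counter
import random

random.seed(1)
gene = "".join(random.choice("ACGT") for _ in range(1000))
# Duplicate copy with ~2% of positions resampled (so roughly 98% identity).
duplicate = "".join(
    random.choice("ACGT") if random.random() < 0.02 else base for base in gene
)
sequence = gene + duplicate

def unique_fraction(seq, k):
    """Fraction of distinct k-mers that occur exactly once in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sum(1 for c in counts.values() if c == 1) / len(counts)

for k in (15, 31, 63):
    print(k, round(unique_fraction(sequence, k), 3))
```

K-mers falling entirely inside stretches shared by both copies occur twice and are ambiguous; only k-mers spanning a difference are unique, and a longer k-mer is more likely to span one.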


Quote:
Originally Posted by NGS_user
I realize sequencing errors may be unavoidable; hopefully good coverage will help mitigate them.
If sequencing errors occur randomly, they won't stack up at the same position and can therefore be weeded out to some extent. Different assemblers do this in different ways.

For example, in Ray (see http://denovoassembler.sf.net; I am the author), these errors are simply avoided, but are not removed from the graph.
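To illustrate why random errors that don't stack can be filtered by coverage, here is a toy Python sketch (an illustration with made-up numbers, not the actual algorithm of Velvet or Ray): at good coverage, true k-mers are observed many times while k-mers containing a random error stay rare, so a simple count threshold separates most of them:

```python
# Toy demonstration: at ~60x coverage, a k-mer count threshold keeps
# almost all true k-mers while discarding almost all error k-mers.
from collections import Counter
import random

random.seed(2)
genome = "".join(random.choice("ACGT") for _ in range(500))
read_length, n_reads, k = 75, 400, 21

def read_with_errors(start, error_rate=0.01):
    """A read copied from the genome with random substitutions."""
    return "".join(
        random.choice("ACGT") if random.random() < error_rate else base
        for base in genome[start:start + read_length]
    )

reads = [
    read_with_errors(random.randrange(len(genome) - read_length + 1))
    for _ in range(n_reads)
]

counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
true_kmers = {genome[i:i + k] for i in range(len(genome) - k + 1)}

kept = {kmer for kmer, c in counts.items() if c >= 3}  # coverage threshold
true_kept = len(kept & true_kmers)
errors_kept = len(kept - true_kmers)
print(true_kept, "of", len(true_kmers), "true k-mers kept;",
      errors_kept, "error k-mers kept")
```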


Quote:
Originally Posted by NGS_user
If longer k-mers are better for duplicates, would it be better to generate longer reads?
Longer reads are always better if the throughput scales as well.

This is one of the goals that Pacific Biosciences aims to achieve -- longer reads.


Maybe you can try Ray on your dataset. Ray does not merge similar paths in the assembly process, so that might help.


seb

06-01-2011, 01:51 AM   #3
NGS_user
Junior Member
 
Location: Europe

Join Date: Nov 2010
Posts: 9

The reads are single-end, but if I were to generate new data I could have paired-end reads of either 100 or 150 bp (GAII). I am just concerned that the high error rate will affect my assemblies, as I am not assembling a genome but rather a family of mammalian genes.
06-10-2011, 08:59 PM   #4
seb567
Senior Member
 
Location: Québec, Canada

Join Date: Jul 2008
Posts: 260

Quote:
Originally Posted by NGS_user
The reads are single-end, but if I were to generate new data I could have paired-end reads of either 100 or 150 bp (GAII). I am just concerned that the high error rate will affect my assemblies, as I am not assembling a genome but rather a family of mammalian genes.
Perhaps you could first perform simulations on those genes (if they are known), or on closely related or similar genes.

You can do that with Ray right away.

First, you need these packages (available in all GNU/Linux distros):

make
g++
open-mpi
git (to get the development version of Ray)
boost (to compile the read simulator shipped with Ray)


What follows is the workflow you could use.

Install Ray and VirtualNextGenSequencer

Code:
git clone https://github.com/sebhtml/ray.git
cd ray
make PREFIX=build MAXKMERLENGTH=128 VIRTUAL_SEQUENCER=y
make install

Sequence your genes in silico


Code:
N=600000 #number of pairs of reads
readLength=75
errorRate=0.005 # 0.5%
ref=~/nuccore/genes.fasta
mean=400 # average insert size
sd=40 # standard deviation

./build/VirtualNextGenSequencer $ref $errorRate \
$mean $sd $N $readLength L1_1.fasta L1_2.fasta
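For intuition about the mean/sd/readLength parameters, here is a rough Python sketch of what a paired-read simulator does (an illustration only; the variable names mirror the shell script above, and this is not the actual VirtualNextGenSequencer code):

```python
# Toy paired-read simulator: draw an insert size from a normal
# distribution, cut a fragment from the reference, and report the two
# fragment ends as a read pair (the right mate reverse-complemented).
import random

random.seed(3)
ref = "".join(random.choice("ACGT") for _ in range(5000))
read_length, mean, sd, error_rate = 75, 400, 40, 0.005
complement = str.maketrans("ACGT", "TGCA")

def with_errors(seq):
    """Apply random substitutions at the given per-base error rate."""
    return "".join(
        random.choice("ACGT") if random.random() < error_rate else base
        for base in seq
    )

def simulate_pair():
    insert = max(2 * read_length, int(random.gauss(mean, sd)))
    start = random.randrange(len(ref) - insert + 1)
    fragment = ref[start:start + insert]
    left = with_errors(fragment[:read_length])
    right = with_errors(fragment[-read_length:].translate(complement)[::-1])
    return left, right

pairs = [simulate_pair() for _ in range(3)]
for left, right in pairs:
    print(left[:30], right[:30])
```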
Build an assembly
Code:
mpirun -np 64 ./build/Ray -k 70 -p L1_1.fasta L1_2.fasta \
 -o GeneBuild