  • Bowtie alignment using de novo velvet contigs

    I'll start by saying that I am a biologist learning bioinformatics from scratch so please bear with me!

    I'm doing de novo sequencing of a complex of dsRNA viruses using MiSeq 150PE reads. Some Sanger sequencing has been done previously, so I already have some partial sequences which I'm hoping to use to validate my assemblies.

    As a quick test of my new data I aligned my reads to the Sanger assemblies using bowtie and used Tablet to visualise the alignments. I found average read depths of 3,300x to 81,000x depending on the virus, but the read density was sometimes quite patchy, producing "islands" of very high coverage (suggesting that the Sanger sequences may not be entirely correct?).

    When I produce a de novo assembly using velvet:
    Code:
    velvetg Trimmed/234/_51 -cov_cutoff 10 -ins_length 170 -exp_cov 13200 -scaffolding no -min_contig_lgth 100 -unused_reads yes -read_trkg yes &
    and then use the contigs as a reference sequence in bowtie to align the original reads, I find that I still get patchy coverage of the contigs.

    How can the coverage of my Velvet-derived contigs be patchy when the bowtie alignment uses the very same reads that produced them? And is there a better/easier way of visualising my reads on the contigs I produce (bearing in mind that my command-line skills are very basic and my programming skills are non-existent)?
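    For reference, the alignment/visualisation step I'm running looks roughly like this (file names here are placeholders rather than my real paths):
    Code:
    # build a bowtie index from the velvet contigs
    bowtie-build contigs.fa contigs_idx
    # align the original paired reads back to the contigs, writing SAM output
    bowtie -S -p 4 contigs_idx -1 reads_R1.fastq -2 reads_R2.fastq reads_vs_contigs.sam
    # load contigs.fa together with reads_vs_contigs.sam in Tablet to view the coverage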

    Thanks, Ed

  • #2
    I would wonder whether Velvet will work properly with so many reads.

    Try passing it just a fraction of your reads and see if you get better contigs. Aim for maybe 200x at most.
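    For example, something along these lines would get you to roughly 200x given the ~13,200x expected coverage you quoted (seqtk is just one option for subsampling, so treat the exact commands as a sketch):
    Code:
    # keep ~1.5% of read pairs: fraction = target / current coverage = 200 / 13200 ≈ 0.015
    # using the same seed (-s100) for both files keeps the pairs in sync
    seqtk sample -s100 reads_R1.fastq 0.015 > sub_R1.fastq
    seqtk sample -s100 reads_R2.fastq 0.015 > sub_R2.fastq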



    • #3
      Thanks for the reply. I got the same answer in a similar thread. I tried running 1/6000th of my reads yesterday and got a much better assembly. I'm looking forward to seeing what happens when I do the same with the remaining 5999/6000ths and merge it all.



      • #4
        Hi e.dobbs,

        I am also learning bioinformatics from scratch, the same as you.
        I hope you can help me out.
        I have FASTQ files from an Ion Proton instrument, with single-end reads of 50-340 bp in length.
        I don't have a reference genome.
        Here is how I did my analysis:
        1] FastQC
        2] Trimming some reads using the FASTX-Toolkit

        For de novo assembly:
        3] velveth with k-mer 31
        4] velvetg
        5] bowtie-build, to build a reference index from the contigs.fa file created by velvetg
        6] Mapping my FASTQ reads against that index with bowtie (roughly as sketched below)
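        Roughly, the commands I am running for steps 3-6 look like this (file and directory names are placeholders, and the velvetg options are left at their defaults):
        Code:
        # 3] + 4] de novo assembly with Velvet, k-mer 31
        velveth assembly_k31 31 -fastq -short reads.fastq
        velvetg assembly_k31
        # 5] build a bowtie index from the Velvet contigs
        bowtie-build assembly_k31/contigs.fa contigs_idx
        # 6] map the same reads back to the contigs, writing SAM output
        bowtie -S -p 4 contigs_idx reads.fastq reads_vs_contigs.sam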

        So my questions are:
        1] How do you select the parameters for velvetg and velveth?
        2] Which k-mer value should I select?
        3] Am I running bowtie in the correct manner?
        4] If yes, how do I confirm that the assembly created by Velvet preserves the input information, and how can I assess its accuracy?

        Thanks a bunch in advance.

        Naresh








        • #5
          Hi nareshvasani,

          As a newbie I'm probably not the best person to ask but I can certainly try to answer your questions:

          You are definitely doing everything the right way and in the right order, from what you have said.

          For Velvet, most people seem to try a range of different k-mers and then choose the one that gives the highest N50 value (the contig length at which half of the assembly is contained in contigs of that size or longer). Merging several Velvet assemblies and then further assembling the contigs with a separate program (e.g. CAP3) can often give you longer contigs.
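          A k-mer scan of that sort might look something like this (the read file name is a placeholder and the velvetg settings are only a starting point):
          Code:
          # build one assembly per odd k-mer value and compare the reported N50s
          for K in 21 25 29 33 37; do
              velveth assembly_k${K} ${K} -fastq -shortPaired reads_interleaved.fastq
              velvetg assembly_k${K} -exp_cov auto -cov_cutoff auto -min_contig_lgth 100
              # velvetg prints the n50 in its final summary line, and the Log file in
              # each assembly directory keeps a copy, so the runs are easy to compare
          done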

          When I was doing my analysis with Velvet I chose a range of k-mer values and then merged the results using Oases. I was looking for ~30 different contigs ranging in size from 500 bp to 20 kb, so the N50 wasn't the best reflection of assembly quality in my case. Oases produced thousands of contigs, so I used CAP3 to reduce the redundancy of the assemblies and give me a sensible number of final assemblies (100-250), which I could then assess using BLASTn, BLASTp and by searching for conserved domains.
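          The merging/assessment step is essentially just this (file names are placeholders, and the BLAST line assumes BLAST+ is available; it is only one way to do a first-pass check):
          Code:
          # pool the contigs to be merged (here, the per-k assemblies) and collapse redundancy
          cat assembly_k*/contigs.fa > all_contigs.fa
          cap3 all_contigs.fa > cap3_report.txt
          # CAP3 writes all_contigs.fa.cap.contigs (merged) and all_contigs.fa.cap.singlets
          cat all_contigs.fa.cap.contigs all_contigs.fa.cap.singlets > final_assemblies.fa
          # first-pass assessment against the NCBI nt database with BLAST+ (remote search)
          blastn -query final_assemblies.fa -db nt -remote -outfmt 6 -out final_vs_nt.tsv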

          In terms of validating the contigs produced I have been using PCR and Sanger sequencing but again this may not be ideal for your purposes if you are looking at a large genome.

          Good luck!

          Ed



          • #6
            Hi e.dobbs,

            Thanks for your reply.

