SEQanswers

Old 12-06-2012, 03:33 AM   #1
e.dobbs
Junior Member
 
Location: Kent

Join Date: Nov 2011
Posts: 6
Default Bowtie alignment using de novo velvet contigs

I'll start by saying that I am a biologist learning bioinformatics from scratch so please bear with me!

I'm doing de novo sequencing of a complex of dsRNA viruses using MiSeq 150PE reads. Some Sanger sequencing has been done before, so I already have some partial sequences which I'm hoping to use to validate my assemblies.

As a quick test of my new data I aligned my reads to the Sanger assemblies using Bowtie and used Tablet to visualise the alignments. I found average read depths of 3300x to 81000x depending on the virus, but sometimes the read density was quite patchy, producing "islands" of very high coverage (suggesting that the Sanger sequence may not be entirely correct?).

When I produce a de novo assembly using velvet:
Code:
velvetg Trimmed/234/_51 -cov_cutoff 10 -ins_length 170 -exp_cov 13200 -scaffolding no -min_contig_lgth 100 -unused_reads yes -read_trkg yes &
and then use the contigs as a reference sequence in bowtie to align the original reads, I find that I still get patchy coverage of the contigs.
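For reference, that mapping step can be sketched like this (a minimal sketch with hypothetical filenames, assuming Bowtie 1 and SAMtools are installed; Tablet can open a sorted, indexed BAM directly):

```shell
# Index the Velvet contigs and map the original reads back (hypothetical filenames).
bowtie-build contigs.fa contigs_index
bowtie -S contigs_index -1 reads_1.fastq -2 reads_2.fastq alignment.sam
# Sort and index so Tablet (or IGV) can display the alignment.
samtools sort -o alignment.sorted.bam alignment.sam
samtools index alignment.sorted.bam
```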

How can I get patchy coverage of my Velvet-derived contigs when the Bowtie alignment uses the very same reads that built them? Is there a better/easier way of visualising my reads on the contigs I produce (bearing in mind that my command-line skills are very basic and my programming is non-existent)?

Thanks, Ed
Old 12-06-2012, 04:02 PM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

I would wonder whether Velvet will work properly with so many reads.

Try passing it just a fraction of your reads and see if you get better contigs. Aim for maybe 200x at the most.
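A back-of-the-envelope way to pick that fraction is to divide the target coverage by the observed coverage (a sketch; the 81000x figure is from the original post, and the helper name is mine):

```python
def subsample_fraction(observed_cov, target_cov=200.0):
    """Fraction of reads to keep so mean coverage drops to target_cov."""
    return min(1.0, target_cov / observed_cov)

# At 81,000x observed depth, keep roughly 0.25% of the reads.
frac = subsample_fraction(81000)
print(round(frac, 5))  # → 0.00247
```

A tool such as seqtk can then do the actual subsampling, e.g. `seqtk sample -s100 reads_1.fastq 0.0025` (use the same seed for both files of a read pair so mates stay in sync).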
Old 12-07-2012, 12:47 AM   #3
e.dobbs
Junior Member
 
Location: Kent

Join Date: Nov 2011
Posts: 6
Default

Thanks for the reply... I got the same answer in a similar thread. I tried running 1/6000th of my reads yesterday and got a much better assembly. I'm looking forward to seeing what happens when I do the same with the remaining 5999/6000ths and merge it all.
Old 08-21-2013, 06:39 AM   #4
nareshvasani
Member
 
Location: NC

Join Date: Apr 2013
Posts: 57
Post Hi e.dobbs

Hi,

I am also learning bioinformatics from scratch, same as you.
I hope you can help me out.
I have FASTQ files from an Ion Proton instrument, containing single-end reads from 50 to 340 bp in length.
I don't have a reference genome.
Here is how I did my analysis:
1] FastQC
2] Trimming some reads using the FASTX toolkit

For de novo assembly:
3] velveth with k-mer 31
4] velvetg
5] bowtie-build to build a reference index from the contigs.fa file created by velvetg
6] Mapping my FASTQ reads against the reference index built above
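Steps 3-6 above might look like this on the command line (a sketch with hypothetical filenames; velveth takes an output directory, the k-mer size, then the input reads):

```shell
# De novo assembly at k=31, then map the reads back (hypothetical filenames).
velveth assembly_k31 31 -fastq -short reads.fastq   # hash the reads
velvetg assembly_k31 -min_contig_lgth 200           # build contigs
bowtie-build assembly_k31/contigs.fa contigs_index  # index the contigs
bowtie -S contigs_index reads.fastq mapped.sam      # single-end mapping
```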

So my questions are:
1] How do you select the parameters for velvetg and velveth?
2] Which k-mer value should I select?
3] Am I running Bowtie in the correct manner?
4] If yes, how do I confirm that the Velvet assembly preserves the input information, and how do I assess its accuracy?

Thanks a bunch in advance.

Naresh
Old 08-22-2013, 01:21 AM   #5
e.dobbs
Junior Member
 
Location: Kent

Join Date: Nov 2011
Posts: 6
Default

Hi nareshvasani,

As a newbie I'm probably not the best person to ask but I can certainly try to answer your questions:

You are definitely doing everything the right way and in the right order, from what you have said.

For Velvet, most people seem to try a range of different k-mers and then choose the one that gives the highest N50 (roughly, the contig length at which contigs of that length or longer make up half the total assembly; note it is not simply the average contig length). Often taking a number of Velvet assemblies and then further assembling the contigs with a separate program (e.g. CAP3) can give you longer contigs.
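Since assemblies in such a k-mer sweep are usually compared by N50, here is one way it can be computed from a list of contig lengths (a sketch; the function name is mine):

```python
def n50(lengths):
    """Smallest contig length L such that contigs of length >= L
    together cover at least half of the total assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly

print(n50([100, 200, 300, 400, 500]))  # → 400
```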

When I was doing my analysis with Velvet I chose a range of k-mer values and then merged the results using Oases. I was looking for ~30 different contigs ranging in size from 500 bp to 20 kb, so the N50 wasn't the best reflection of assembly quality in my case. Oases produced thousands of contigs, so I used CAP3 to reduce the redundancy of the assemblies and give me a sensible number of final assemblies (100-250), which I could then assess using BLASTn, BLASTp and by searching for conserved domains.
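The CAP3 merging step can be sketched like this (hypothetical directory names; CAP3 writes its merged contigs to `<input>.cap.contigs`):

```shell
# Pool contigs from several k-mer assemblies, then collapse redundancy.
cat assembly_k*/contigs.fa > all_contigs.fa
cap3 all_contigs.fa > cap3.log
# Merged contigs land in:      all_contigs.fa.cap.contigs
# Unmerged sequences land in:  all_contigs.fa.cap.singlets
```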

In terms of validating the contigs produced, I have been using PCR and Sanger sequencing, but again this may not be ideal for your purposes if you are looking at a large genome.

Good luck!

Ed

Old 08-22-2013, 07:12 AM   #6
nareshvasani
Member
 
Location: NC

Join Date: Apr 2013
Posts: 57
Default Hi e.dobbs,

Thanks for your reply.
