SEQanswers





01-11-2017, 11:36 AM   #1
kingcohn1
Junior Member
 
Location: Wisconsin

Join Date: Aug 2016
Posts: 2
Determining SPAdes Cx with multiple nodes

I am hoping to determine the depth of coverage for each contig, but I'm hung up on the headers. Each contig has a header line such as:

>NODE_1_length_56345_cov_1.86348

Now, I know how to get coverage for this region using: Cx=Ck*L/(L-k+1)

But I'm confused as to why there are multiple NODE_1 entries. I want the total coverage for each contig, so how can I distinguish between multiple entries with the same NODE_# number?
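A minimal Python sketch of the calculation described above, assuming SPAdes-style headers of the form NODE_<n>_length_<len>_cov_<Ck>. In the Velvet-style formula Cx = Ck*L/(L-k+1), L is the read length (not the contig length shown in the header) and k is the k-mer size. Which k to plug in is itself an assumption, since SPAdes assembles with several k values. The function names are hypothetical.

```python
import re

# Matches SPAdes-style contig headers, e.g. ">NODE_1_length_56345_cov_1.86348".
# Assumption: the header starts with NODE_<n>_length_<len>_cov_<Ck>;
# any extra fields after the coverage value are ignored.
HEADER_RE = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def parse_spades_header(header: str) -> tuple[int, int, float]:
    """Return (node number, contig length, k-mer coverage) from a header."""
    m = HEADER_RE.match(header.strip())
    if m is None:
        raise ValueError(f"not a SPAdes-style header: {header!r}")
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def kmer_to_base_coverage(ck: float, read_len: int, k: int) -> float:
    """Convert k-mer coverage Ck to base coverage Cx = Ck * L / (L - k + 1).

    L is the READ length (not the contig length) and k is the k-mer size.
    """
    if not 0 < k <= read_len:
        raise ValueError("k must be between 1 and the read length")
    return ck * read_len / (read_len - k + 1)
```

For example, `node, length, ck = parse_spades_header(">NODE_1_length_56345_cov_1.86348")` followed by `kmer_to_base_coverage(ck, read_len=150, k=77)`, where 150 and 77 are placeholder values to replace with your actual read length and the k you assume applies.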
01-11-2017, 09:10 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,404

There should only be a single node 1; if you see multiple, please report that to the devs. I don't personally recall ever seeing that happen.

Also, you'll usually get a more accurate idea of the coverage by mapping the input reads to the assembly than from SPAdes' coverage value. For one thing, SPAdes is a multi-kmer assembler, so it's not even clear what k means in this context. For another, mapping to calculate coverage is simply more accurate, even for single-kmer assemblers.
01-12-2017, 12:23 PM   #3
kingcohn1
Junior Member
 
Location: Wisconsin

Join Date: Aug 2016
Posts: 2

Hi Brian, and thanks for responding to me again! I think my egrep regex was collecting the NODE_1 headers from all of my draft assemblies. So, what should I use to map my reads to the contigs? LASTZ? The next step in my pipeline is to build larger scaffolds with a reference-guided tool (Ragout or pyscaf; do you have any recommendations?) and then compare the outputs in Mauve.

P.S. Thanks for BBMap; I recommend it when I can.
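Since the duplicate NODE_1 headers came from searching several draft assemblies at once, one way to keep them apart is to prefix each header with the name of the assembly it came from before concatenating. A minimal sketch, assuming each assembly's contigs.fasta sits in its own directory whose name identifies the assembly; the function names and the "_" separator are hypothetical choices.

```python
from pathlib import Path
from typing import Iterable, Iterator


def tag_headers(lines: Iterable[str], assembly: str) -> Iterator[str]:
    """Prefix every FASTA header with the assembly name so that NODE_1 from
    different assemblies no longer collide, e.g.
    '>NODE_1_length_...' -> '>draftA_NODE_1_length_...'."""
    for line in lines:
        if line.startswith(">"):
            yield f">{assembly}_{line[1:]}"
        else:
            yield line  # sequence lines pass through unchanged


def merge_assemblies(paths: list[Path], out_path: Path) -> None:
    """Concatenate several FASTA files into one, tagging headers with each
    file's parent directory name (assumed to name the assembly)."""
    with out_path.open("w") as out:
        for path in paths:
            with path.open() as fh:
                out.writelines(tag_headers(fh, path.parent.name))
```

For example, `merge_assemblies([Path("draftA/contigs.fasta"), Path("draftB/contigs.fasta")], Path("all_contigs.fasta"))`, with hypothetical paths; a search for `^>draftA_NODE_1_` then only matches the first assembly.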
01-12-2017, 12:25 PM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,135

Since you are familiar with BBMap, use it to map the original reads back to the contigs.
01-12-2017, 01:57 PM   #5
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,404

To capture the per-contig coverage using BBMap, just add the flag "covstats=", e.g.:

Code:
bbmap.sh in=reads.fq ref=contigs.fa covstats=covstats.txt
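To pull per-contig values out of the covstats file, a small parser is enough. A minimal sketch, assuming covstats.txt is tab-separated with a first header line that starts with "#" and contains "ID" and "Avg_fold" columns; check the header your BBMap version actually writes and adjust the column name if it differs. The function name is hypothetical.

```python
import csv
from typing import Iterable


def read_covstats(lines: Iterable[str], column: str = "Avg_fold") -> dict[str, float]:
    """Map contig name -> value of `column` from a covstats-style table.

    Assumes a tab-separated table whose first line is a header starting
    with '#', e.g. '#ID<TAB>Avg_fold<TAB>Length...'.
    """
    it = iter(lines)
    # Strip the leading '#' so the first field is named 'ID'.
    header = next(it).lstrip("#").rstrip("\n").split("\t")
    reader = csv.DictReader(it, fieldnames=header, delimiter="\t")
    return {row["ID"]: float(row[column]) for row in reader}
```

For example, `with open("covstats.txt") as fh: cov = read_covstats(fh)` gives a dictionary keyed by the contig names from contigs.fa.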
All times are GMT -8.