04-10-2015, 10:08 PM   #2
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

You should align your reads to the contigs, not the other way around. Generally, though, if you assembled the two datasets together, most of the contigs will probably contain reads from both, so I don't think you can answer the question you asked. You can, however, find out which contigs are better represented by which dataset.

Once you do the alignment (mapping each dataset to the contigs separately), you can use pileup (from BBTools) to calculate the coverage of each contig from each of the two sets: pileup.sh in=mapped.sam out=stats.txt
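To compare the two coverage tables once you have one per dataset, something like the following could work. It is only a sketch: it assumes each table is tab-delimited with a header line containing "#ID" and "Avg_fold" columns (check the header of your own stats.txt), and the function names and the 2x ratio cutoff are made up for illustration.

```python
import csv
import io


def read_avg_fold(handle):
    """Return {contig_id: average fold coverage} from a tab-delimited table.

    Assumes a header line with '#ID' and 'Avg_fold' columns (an assumption
    about the coverage table layout; adjust to match your file).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["#ID"]: float(row["Avg_fold"]) for row in reader}


def compare_coverage(cov_a, cov_b, min_ratio=2.0):
    """Label each contig by the dataset that covers it better.

    min_ratio is an arbitrary illustrative cutoff: a contig is assigned to
    one dataset only if its coverage there is at least min_ratio times the
    coverage from the other dataset.
    """
    labels = {}
    for contig in sorted(set(cov_a) | set(cov_b)):
        a = cov_a.get(contig, 0.0)
        b = cov_b.get(contig, 0.0)
        if a == 0.0 and b == 0.0:
            labels[contig] = "uncovered"
        elif a >= min_ratio * b:
            labels[contig] = "A"
        elif b >= min_ratio * a:
            labels[contig] = "B"
        else:
            labels[contig] = "both"
    return labels


if __name__ == "__main__":
    # Tiny in-memory tables standing in for the two coverage files.
    stats_a = io.StringIO(
        "#ID\tAvg_fold\tLength\n"
        "contig_1\t30.0\t5000\n"
        "contig_2\t1.5\t1200\n"
        "contig_3\t10.0\t800\n"
    )
    stats_b = io.StringIO(
        "#ID\tAvg_fold\tLength\n"
        "contig_1\t2.0\t5000\n"
        "contig_2\t12.0\t1200\n"
        "contig_3\t8.0\t800\n"
    )
    result = compare_coverage(read_avg_fold(stats_a), read_avg_fold(stats_b))
    for contig, label in result.items():
        print(contig, label)
```

Note that this compares raw fold coverage; if the two datasets differ a lot in total sequencing depth, you would probably want to normalize the values first.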

You can, alternatively, output the coverage directly from BBMap, using the "covstats" flag (e.g. covstats=stats.txt). BBMap also has a "kfilter" flag which will prevent a read from being mapped to a location unless there are at least k consecutive matching bases. This is designed to ensure that if you assembled with some kmer length K, only reads with at least K consecutive matching bases will be mapped to that location - which makes it likely that that specific read actually was used to assemble that specific contig.
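To illustrate the idea of requiring k consecutive matching bases, here is a rough sketch that applies the same kind of check to an alignment after the fact. It is not BBMap's implementation. It assumes the SAM file uses extended CIGAR strings, where "=" means match and "X" means mismatch; a plain "M" does not distinguish the two, so it is treated as a break. The function names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_match_run(cigar):
    """Return the longest run of consecutive matching bases in a CIGAR string.

    Only '=' operations count as matches. Every other operation (mismatch,
    indel, clipping, ambiguous 'M') ends the current run.
    """
    longest = 0
    current = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op == "=":
            current += int(length)
            longest = max(longest, current)
        else:
            current = 0
    return longest


def has_k_consecutive_matches(cigar, k):
    """True if the alignment contains at least k consecutive matching bases."""
    return longest_match_run(cigar) >= k


if __name__ == "__main__":
    # A 100 bp read with one mismatch in the middle.
    print(has_k_consecutive_matches("50=1X49=", 31))
    # A 62 bp read with mismatches every 20 bases.
    print(has_k_consecutive_matches("20=1X20=1X20=", 31))
```

With k set to the kmer length used for assembly, a read that fails this check shares no complete kmer with the contig at that location.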