SEQanswers > Bioinformatics > Bioinformatics

03-18-2014, 11:22 AM   #1
Location: USA

Join Date: Sep 2012
Posts: 41
Removing contaminants (bacteria, phage) in genomic DNA sequences

Hi Everyone,
I have paired-end genomic DNA sequences that I want to assemble de novo, but I want to 'clean' them up first, i.e., remove contaminants, before the assembly.
I have already trimmed low-quality reads and removed adapters. Preferably, I would map or BLAST the reads against a bacterial and/or phage database, then keep the unmapped reads for further downstream analysis.
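For the map-and-keep-unmapped route, here is a minimal Python sketch of the last step, assuming the reads have already been aligned to a combined bacterial/phage reference with a mapper that writes SAM. The function name `clean_pair_names` is hypothetical. It drops a whole pair if any alignment record for either mate is mapped (SAM FLAG bit 0x4 unset) and returns the names of the pairs you would keep:

```python
from typing import Iterable, Set

SAM_UNMAPPED = 0x4  # FLAG bit: this record's read is unmapped


def clean_pair_names(sam_lines: Iterable[str]) -> Set[str]:
    """Return read names for which no alignment record (either mate) is mapped.

    Hypothetical helper: relies on both mates sharing the same QNAME, as in SAM.
    """
    seen: Set[str] = set()
    contaminated: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@"):  # header lines (@HD, @SQ, @PG, ...)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # skip blank or malformed lines
            continue
        qname, flag = fields[0], int(fields[1])
        seen.add(qname)
        if not flag & SAM_UNMAPPED:  # this mate aligned to a contaminant reference
            contaminated.add(qname)
    return seen - contaminated
```

The returned names could then be used to pull the surviving pairs out of the original FASTQ files. Dropping the whole pair rather than only the mapped mate keeps the remaining reads properly paired for the assembler; whether that is too aggressive for your data is a judgment call.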

Could anyone please guide me on how to approach this?

I was thinking of downloading the bacterial and/or phage genomes to my local computer, but there are thousands of genomes available at present.

Ideas and suggestions will be, as always, very appreciated!
NGS_New_User
03-18-2014, 01:50 PM   #2
Carrot Scientist
Location: Madison WI USA

Join Date: Nov 2009
Posts: 42

I used DeconSeq for this, but I had to install and run it locally, since my submitted job sat in the queue for two weeks and then vanished (this was back in November).

I was lucky: there was only a very tiny trace of fungal and bacterial ribosomal genes. I was actually glad to see those, since they indicated that the filter worked.

Of course, you can only check for contamination from sequenced organisms that are in the DeconSeq database. I would be curious whether a more general filter is possible, but I can't really see how one would work for a newly assembled genome.

Edit: Oh, and if you do this with assembled contigs instead of reads, be aware that each entire sequence is flagged as either contaminated or not. I chopped my assembled scaffolds into scaftigs (the gap-free stretches between runs of N) before running it (does anyone else ever use the term scaftigs?).
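Chopping scaffolds into scaftigs amounts to cutting at runs of N. A minimal Python sketch, assuming gaps are encoded as N/n and using a hypothetical `_scaftig<i>` suffix to name the pieces:

```python
import re
from typing import List, Tuple


def scaftigs(name: str, seq: str, min_gap: int = 1) -> List[Tuple[str, str]]:
    """Split a scaffold at runs of at least `min_gap` Ns into gap-free pieces.

    The "_scaftig<i>" naming is a hypothetical convention.
    """
    gap = re.compile(f"[Nn]{{{min_gap},}}")
    # Drop empty strings produced by leading or trailing gaps.
    pieces = [p for p in gap.split(seq) if p]
    return [(f"{name}_scaftig{i + 1}", piece) for i, piece in enumerate(pieces)]


print(scaftigs("scaffold1", "ACGTNNNNGGCCnTTA"))
# [('scaffold1_scaftig1', 'ACGT'), ('scaffold1_scaftig2', 'GGCC'), ('scaffold1_scaftig3', 'TTA')]
```

Raising `min_gap` leaves short N runs inside a piece, which may be preferable if your assembler also emits isolated Ns as ambiguity calls rather than scaffold gaps.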

Last edited by dsenalik; 03-18-2014 at 01:56 PM.
dsenalik
03-18-2014, 02:53 PM   #3
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

You could BLAST a few thousand reads, see what they hit, and then download only the genomes of organisms that appear to be contaminants for your filtering. If you have a reference for your target genome, map to it first, then BLAST only the unmapped reads.
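To "see what they hit", something like the following could summarize BLAST+ tabular output (`-outfmt 6` with its default 12 columns) for the sampled reads: it keeps each query's best-scoring hit and counts how often each subject sequence appears. The function name and the e-value cutoff of 1e-10 are arbitrary assumptions, and turning subject IDs into organism names would still need taxonomy information:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def top_hit_counts(blast_tab: Iterable[str], max_evalue: float = 1e-10) -> Counter:
    """Count subjects among each query's best hit (hypothetical helper).

    Assumes default -outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best: Dict[str, Tuple[float, str]] = {}  # qseqid -> (bitscore, sseqid)
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        evalue, bitscore = float(f[10]), float(f[11])
        if evalue > max_evalue:  # ignore weak hits (cutoff is an assumption)
            continue
        if qseqid not in best or bitscore > best[qseqid][0]:
            best[qseqid] = (bitscore, sseqid)
    return Counter(sseqid for _, sseqid in best.values())
```

Calling `.most_common(20)` on the result gives a quick list of candidate contaminant references to download.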

Once you have references, bbduk or bbmap can decontaminate using k-mers or mapping, respectively, writing the clean reads to the "outu" (output unmapped/unmatched) stream.
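The k-mer idea can be illustrated in a few lines of Python. This is a toy sketch, not how bbduk is implemented: it assumes exact matching of canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement), keeps the whole reference index in a Python set, and treats a read as contaminated if it shares even one k-mer with the reference. All names are hypothetical; the unmatched list corresponds to what would go to the "outu" stream:

```python
from typing import Iterable, Iterator, List, Set, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmers(seq: str, k: int) -> Iterator[str]:
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= BASES:  # skip k-mers containing N or other ambiguity codes
            yield canonical(kmer)


def build_index(references: Iterable[str], k: int) -> Set[str]:
    index: Set[str] = set()
    for ref in references:
        index.update(kmers(ref, k))
    return index


def partition_reads(
    reads: Iterable[Tuple[str, str]], index: Set[str], k: int
) -> Tuple[List[str], List[str]]:
    """Split (name, sequence) reads into (unmatched, matched) name lists."""
    unmatched, matched = [], []
    for name, seq in reads:
        if any(km in index for km in kmers(seq, k)):
            matched.append(name)
        else:
            unmatched.append(name)
    return unmatched, matched
```

Using canonical k-mers means a read from either strand of a contaminant is caught without indexing both strands separately; a real tool would also need to handle memory for thousands of genomes, which a Python set does not.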
Brian Bushnell
03-19-2014, 06:27 AM   #4
Senior Member
Location: Ohio

Join Date: Jan 2010
Posts: 144

My approach has been to do the assembly first, then try to remove contaminant contigs. This greatly reduces the amount of computation, since there are far fewer contigs than reads to screen. Chimeric contigs joining contaminant and target sequence should be extremely rare.
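Once contaminant contigs have been identified (for example from BLAST hits of the contigs), removing them is a simple filter. A minimal Python sketch, assuming a plain FASTA assembly and a set of flagged contig IDs you have already decided on; all function names are hypothetical:

```python
from typing import Dict, Iterable, List, Optional, Set


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA into {id: sequence}; the id is the header up to the first space."""
    records: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def drop_contaminant_contigs(contigs: Dict[str, str], flagged: Set[str]) -> Dict[str, str]:
    """Keep only contigs whose id is not in the flagged set."""
    return {n: s for n, s in contigs.items() if n not in flagged}


def to_fasta(contigs: Dict[str, str], width: int = 60) -> str:
    """Write contigs back out as FASTA, wrapping sequence lines at `width`."""
    out: List[str] = []
    for name, seq in contigs.items():
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Keeping the flagging decision separate from the filtering makes it easy to review the flagged list by hand before anything is thrown away.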

If you have a microbial genome, IMG has some tools for finding contamination, e.g.:
cliffbeall

All times are GMT -8.