SEQanswers

Old 06-22-2015, 12:39 PM   #1
id0
Senior Member
 
Location: USA

Join Date: Sep 2012
Posts: 130
Checking reads for contamination

Is there a tool to check for the source of contamination in sequencing reads? I am looking for something like BLAST, but one that would summarize results across many reads.

For example, I have a FASTQ that is supposed to be human. Only 50% of the reads align to human. Where are the other reads coming from?
Old 06-22-2015, 01:41 PM   #2
skbrimer
Member
 
Location: OP Kansas

Join Date: Mar 2014
Posts: 53

Maybe this: https://github.com/blaxterlab/blobology. It looks like a good tool for a quick look.
Old 06-22-2015, 02:04 PM   #3
id0
Senior Member
 
Location: USA

Join Date: Sep 2012
Posts: 130

Quote:
Originally Posted by skbrimer
maybe this https://github.com/blaxterlab/blobology it looks like a good tool for quick looks.
It looks like it performs an ABySS assembly. That seems computationally intensive. More importantly, I am not sure how well it would do with dilute samples.
Old 06-22-2015, 02:50 PM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,780

I suggest that you use BBSplit from BBMap with a human reference, then collect the unmapped reads in a separate file for examination.
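As a rough illustration of the "collect the unmapped reads" step, here is a minimal Python sketch. It assumes you already have SAM output from whichever aligner you used, and it relies only on the standard SAM convention that bit 0x4 of the FLAG column marks an unmapped read. The function name and the example records are made up for illustration; this is not BBSplit's own mechanism.

```python
import io

def unmapped_fastq_records(sam_lines):
    """Yield 4-line FASTQ records for reads whose SAM FLAG has bit 0x4 (unmapped) set."""
    for line in sam_lines:
        if line.startswith("@"):  # SAM header lines start with '@'
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory columns
            continue
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & 0x4:
            yield f"@{name}\n{seq}\n+\n{qual}\n"

# Hypothetical two-read SAM: read1 is mapped, read2 is unmapped.
example_sam = io.StringIO(
    "@HD\tVN:1.6\n"
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n"
)
records = list(unmapped_fastq_records(example_sam))
print("".join(records), end="")
```

For a real file you would pass an open file handle instead of the in-memory example and write the records to a new FASTQ.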

Last edited by GenoMax; 06-22-2015 at 03:58 PM.
Old 06-22-2015, 03:14 PM   #5
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Both BBMap and BBDuk (and BBSplit) can output a file indicating the percent and number of reads matching a given sequence, and can do so quickly for large numbers of reads. We run all of our reads through BBDuk for screening against small synthetic contaminants (primers, spike-ins, vectors, etc.), and it does a nice job of quantifying their absolute abundance. However, it would run out of memory processing a reference as big as nt (I don't normally give BBDuk a reference bigger than 1 Gbp or so). If you follow GenoMax's advice, just grab a handful (~1000) of the reads that don't map to human and BLAST them against nt; hopefully something will turn up.
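One way to "grab a handful" of unmapped reads without bias toward the start of the file is reservoir sampling, sketched below in Python. It assumes a plain 4-line-per-record FASTQ; the function names and the tiny example are illustrative only, and the seed is fixed just to make runs repeatable.

```python
import io
import random

def read_fastq(handle):
    """Yield (name, sequence) pairs from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # skip stray blank lines
            continue
        seq = handle.readline().strip()
        handle.readline()       # '+' separator line
        handle.readline()       # quality line (not needed for FASTA)
        yield header.strip()[1:], seq

def sample_reads(records, n, seed=0):
    """Reservoir sampling: a uniform random sample of up to n records in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir

def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)

# Hypothetical three-read FASTQ; in practice use n=1000 on the unmapped reads.
example_fastq = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nTTGA\n+\nIIII\n"
    "@r3\nGGCC\n+\nIIII\n"
)
subset = sample_reads(read_fastq(example_fastq), n=2)
print(to_fasta(subset), end="")
```

The resulting FASTA can then be submitted to BLAST against nt as described above.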
Old 06-23-2015, 05:27 AM   #6
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482

If you barcode your libraries, check the reads that were not demultiplexed. Any reads carrying a barcode that you didn't use to make the libraries are contamination.
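A barcode check along these lines could be sketched as follows. This assumes Illumina-style FASTQ headers in which the index sequence is the last colon-separated field of the header line (for example `... 1:N:0:ACGTAC`); if your headers differ, the parsing needs adjusting. The function names, the example headers, and the barcodes are all hypothetical.

```python
from collections import Counter

def barcode_counts(fastq_lines):
    """Count index sequences taken from the last ':'-separated field of each header."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # every 4th line, starting at 0, is a header
            counts[line.strip().rsplit(":", 1)[-1]] += 1
    return counts

def unexpected_barcodes(counts, expected):
    """Return only the barcodes that were not used to build your libraries."""
    return {bc: n for bc, n in counts.items() if bc not in expected}

# Hypothetical undetermined reads: one carries our barcode, two carry a foreign one.
lines = [
    "@inst:1:FC:1:1101:100:200 1:N:0:ACGTAC", "AAAA", "+", "IIII",
    "@inst:1:FC:1:1101:101:201 1:N:0:TTGGCC", "CCCC", "+", "IIII",
    "@inst:1:FC:1:1101:102:202 1:N:0:TTGGCC", "GGGG", "+", "IIII",
]
counts = barcode_counts(lines)
foreign = unexpected_barcodes(counts, expected={"ACGTAC"})
print(foreign)
```

Barcodes that differ from yours by a single base are more likely sequencing errors than contamination, so the abundant, clean barcodes you never used are the ones worth chasing.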
Old 06-23-2015, 10:35 AM   #7
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 444

In my sequencing class the students take a cheek swab, do a Nextera prep and get 10M reads. The first exercise is to see what is living in their mouths. So they align to the human reference with Novoalign, then pull out the non-aligners, convert them to FASTA and submit a blastn job to see which bacteria are in there. Students report a huge increase in flossing frequency after seeing the typical results! The one-liner to find the non-aligners and make a FASTA file is:

Quote:
cat yourname_vs_hg19.align | grep NM | head -500 | cut -f 3 | awk '{print ">" $1 "\n" $1}'
This is for Novoalign, which reports 'NM' for non-aligners and has the sequence in column 3. I think you can modify it for other aligners pretty easily.

As part of our genotyping of populations we always check 1000 reads from each sample. It often explains some discordant results (lots of reads but low depth at the loci because most of the sample is something else!).
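To turn the BLAST results for those ~1000 reads into a per-source summary, something like the Python sketch below may help. It assumes BLAST+ tabular output (`-outfmt 6`) with the default twelve columns, where column 1 is the query ID, column 2 the subject ID and column 12 the bit score. It keeps the best-scoring hit per read and tallies by subject ID. The function names and the subject IDs in the example are made up.

```python
from collections import Counter

def best_hits(blast_lines):
    """Map each query read to its (subject, bitscore) with the highest bit score."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        query, subject, bitscore = f[0], f[1], float(f[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best

def summarize(blast_lines, total_reads):
    """Return [(subject, reads, percent of all sampled reads)] and the no-hit count."""
    hits = Counter(subject for subject, _ in best_hits(blast_lines).values())
    rows = [(s, n, 100.0 * n / total_reads) for s, n in hits.most_common()]
    return rows, total_reads - sum(hits.values())

# Hypothetical results for 4 sampled reads; read4 had no hit at all.
example = [
    "read1\tsubj_B\t90.0\t50\t5\t0\t1\t50\t1\t50\t1e-10\t50",
    "read1\tsubj_A\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t200",
    "read2\tsubj_A\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-45\t180",
    "read3\tsubj_B\t95.0\t80\t4\t0\t1\t80\t1\t80\t1e-20\t90",
]
rows, no_hit = summarize(example, total_reads=4)
for subject, n, pct in rows:
    print(f"{subject}\t{n}\t{pct:.1f}%")
print(f"no hit\t{no_hit}")
```

Subject IDs are accessions rather than organism names, so to group by species you would need to request taxonomy or title columns from BLAST and tally on those instead.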
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
All times are GMT -8.

