#1 |
Member
Location: France Join Date: Sep 2010
Posts: 27
Hi,
I would like to check for contaminants using both phiX and the human genome. My data are metagenomic, and I want to remove any read that maps to either phiX or the human genome. So far BBDuk handles the phiX part with ref=phiX.fa. However, to check for human contamination I would like to use the non-redundant nucleotide database. It is split into small pieces, and I usually access them through BLAST using the nt.nal reference file. Is that also feasible with BBDuk?
#2 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
I don't completely understand what you mean by "I would like to use the non-redundant nucleotide database" to remove contamination from human samples. It may still be easier to do what you have been doing (separate human reads from everything else).
You should be able to use BBSplit or Seal, both of which can accept a folder of references. Whether BBSplit can handle a folder the size of nt/nr would need to be tested.
#3 |
Member
Location: France Join Date: Sep 2010
Posts: 27
Sorry for the confusion. I was mixing this up with large BLAST databases (.nal file). BBDuk does its own indexing, so there is no way to use BLAST index databases.
Which human database do people most frequently use to discard human contamination reads from metagenomes? I thought of using the nt database (the nucleotide sequence database with entries from all traditional divisions of GenBank, EMBL, and DDBJ, excluding the bulk divisions gss, sts, pat, est, and htg).
#4 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
You can just use the human genome sequence (a multi-FASTA file with all chromosomes concatenated, from UCSC/Ensembl/NCBI/iGenomes) with BBDuk or BBSplit. BBSplit may be the better choice, since it can bin all reads that align to human into one file and capture the rest of the data in a second output file.
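
The two-step cleanup discussed in this thread (BBDuk for phiX, then BBSplit for human) could be scripted roughly as below. This is only a sketch: all file names are hypothetical placeholders, and the `k=31 hdist=1` values are commonly used BBDuk settings assumed here, not something specified in this thread. The functions only build the command lines, so they can be inspected before anything is run.

```python
import shlex


def bbduk_phix_cmd(reads, phix_ref, clean_out, matched_out):
    """Build a BBDuk command that separates reads sharing k-mers with phiX."""
    return [
        "bbduk.sh",
        f"in={reads}",
        f"ref={phix_ref}",
        f"out={clean_out}",     # reads with no phiX k-mer match
        f"outm={matched_out}",  # reads that matched phiX
        "k=31",                 # assumed k-mer length
        "hdist=1",              # assumed: allow one mismatch per k-mer
    ]


def bbsplit_human_cmd(reads, human_ref, binned_pattern, clean_out):
    """Build a BBSplit command that bins human reads and keeps the rest."""
    return [
        "bbsplit.sh",
        f"in={reads}",
        f"ref={human_ref}",
        f"basename={binned_pattern}",  # '%' is replaced by the reference name
        f"outu={clean_out}",           # reads that mapped to no reference
    ]


# Hypothetical file names, chained so step 2 consumes the output of step 1.
steps = [
    bbduk_phix_cmd("reads.fq", "phiX.fa", "no_phix.fq", "phix.fq"),
    bbsplit_human_cmd("no_phix.fq", "human.fa", "binned_%.fq", "clean.fq"),
]
for cmd in steps:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once BBTools is on the PATH.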
#5 |
Member
Location: France Join Date: Sep 2010
Posts: 27
Great, I'll work on that, in combination with BBSplit.
Thanks
#6 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
After using BBDuk for PhiX removal, the protocol JGI uses for human removal is this, with BBMap and a masked human reference. Using BBSplit is strictly better if you know your intended organism's genome. But JGI rarely knows that, which is why we are sequencing it.
You can download the masked human reference from the link provided. It constitutes around 98% of the human genome. That means some reads will intentionally slip through, in regions that are highly conserved down to early eukaryotes or that have very low complexity. But the point is to remove virtually all human contamination with no risk of false positives. If you absolutely need to remove ALL human contamination and don't know the organism's genome, you should use the unmasked reference, and you will probably get some false-positive removals.
For assembly of a new organism, I think it is best to remove human contaminants using the above very safe procedure, then assemble, then BLAST the assembly and remove anything long (say, >400 bp) that hits human with >98% identity and hits nothing else other than other primates (typically chimp, gorilla, and orangutan).
Also, note that I do not recommend using nt/nr in any primary decontamination procedure for which you know the possible contaminants (such as determining which reads are, specifically, human). They are incomplete and poorly curated, and the process becomes extremely slow because they are huge. Using the references (or masked versions of the references) will give you a better signal-to-noise ratio. nt/nr are much better for diagnosing which things may be present than for actually removing them.
Since you're doing metagenomics, using an unmasked human genome is probably fine, since humans and bacteria are very dissimilar. But unless you are working on a human-related microbiome, you might consider removing common human-associated microbes such as E. coli and Salmonella. They seem to be anywhere humans are. Masking things like ribosomal sequences is probably prudent if you do this.
There are also some others, like Delftia and Pseudomonas, that seem to be common sequencing contaminants and cause problems with metagenome analysis. They seem to show up everywhere, even when human-related DNA is not present, and even in single-cell experiments on other species. Anyway, something to consider.
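
The post-assembly BLAST screen described above (long hits to human at >98% identity, with no hits outside primates) could be sketched as follows. This assumes the BLAST results have already been parsed into records with an organism name per hit; the `Hit` type, the `PRIMATES` set, and the reading of "long" as alignment length are illustrative assumptions, not part of any tool mentioned in this thread.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class Hit(NamedTuple):
    contig: str     # query (assembled contig) name
    organism: str   # scientific name of the subject sequence
    pident: float   # percent identity of the alignment
    length: int     # alignment length in bp


# Assumed primate list; extend it to match the taxa in your BLAST results.
PRIMATES = {"Homo sapiens", "Pan troglodytes", "Gorilla gorilla", "Pongo abelii"}


def human_contigs(hits: Iterable[Hit],
                  min_len: int = 400,
                  min_ident: float = 98.0) -> Set[str]:
    """Return contigs with a long, high-identity human hit and only primate hits."""
    by_contig = defaultdict(list)
    for hit in hits:
        by_contig[hit.contig].append(hit)

    flagged = set()
    for contig, contig_hits in by_contig.items():
        strong_human = any(
            h.organism == "Homo sapiens"
            and h.length > min_len
            and h.pident > min_ident
            for h in contig_hits
        )
        only_primates = all(h.organism in PRIMATES for h in contig_hits)
        if strong_human and only_primates:
            flagged.add(contig)
    return flagged


# Made-up example records, for illustration only.
example_hits = [
    Hit("c1", "Homo sapiens", 99.5, 1200),
    Hit("c1", "Pan troglodytes", 98.7, 1150),
    Hit("c2", "Homo sapiens", 99.0, 900),
    Hit("c2", "Escherichia coli", 91.0, 300),
    Hit("c3", "Homo sapiens", 99.9, 150),
]
print(sorted(human_contigs(example_hits)))
```

In this made-up example only `c1` is flagged: `c2` also hits a non-primate, and the human hit on `c3` is too short.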
#7 |
Member
Location: France Join Date: Sep 2010
Posts: 27
Thanks Brian,
Thanks for the masked version of hg19. Do you also have a masked version of hg38? Just another quick question: have you published BBMap, or how should I cite your software?
#8 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
You can use bbmask.sh from BBMap to create a masked version of hg38.
BBMap has not been published yet. In the past @Brian has asked people to cite the project's SourceForge website (http://sourceforge.net/projects/bbmap/) in publications.
#9 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
I would not worry about HG19 versus HG38 for the purposes of contaminant removal. They mainly differ in their coordinates, not contents.