02-04-2020, 05:56 AM   #1
Location: Hanover, Germany

Join Date: Sep 2018
Posts: 15
Human read filtering with Kraken 2

Hey everyone,

I have a question about using Kraken 2 for read filtering.
I sequence/assemble viruses and inherited a pipeline in which human reads were removed by a simple Bowtie2 mapping to hg19.
While doing some metagenomic analysis of a sample with Kraken 2, I noticed how fast it was, so I built a Kraken database containing just the human genome and tried filtering my reads with that. I tested it on a sample with around 2 million reads which we had already assembled.
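For readers who want to see what this filtering step amounts to, here is a minimal sketch, not the pipeline actually used in this post. It assumes single-end, unwrapped FASTQ (four lines per read) and Kraken 2's standard tab-separated per-read output, where column 1 is `C` or `U` and column 2 is the read ID. The function names are hypothetical. Kraken 2 can also write unclassified reads itself via its `--unclassified-out` option, so this is only an illustration of the logic.

```python
from typing import Iterable, Iterator, Set


def unclassified_ids(kraken_lines: Iterable[str]) -> Set[str]:
    """Collect IDs of reads that Kraken 2 left unclassified ('U' in column 1)."""
    keep = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] == "U":
            keep.add(fields[1])
    return keep


def filter_fastq(fastq_lines: Iterable[str], keep_ids: Set[str]) -> Iterator[str]:
    """Yield the lines of every FASTQ record whose read ID is in keep_ids.

    Assumes exactly four lines per record; a trailing incomplete
    record is silently dropped.
    """
    it = iter(fastq_lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        # The read ID is the first whitespace-delimited token after '@'.
        read_id = header[1:].split()[0]
        if read_id in keep_ids:
            yield from (header, seq, plus, qual)


if __name__ == "__main__":
    # Tiny in-memory example: read1 is classified (human), read2 is not.
    kraken = ["C\tread1\t9606\t4\t9606:1\n", "U\tread2\t0\t4\t0:1\n"]
    fastq = ["@read1\n", "ACGT\n", "+\n", "IIII\n",
             "@read2\n", "TTTT\n", "+\n", "IIII\n"]
    print("".join(filter_fastq(fastq, unclassified_ids(kraken))), end="")
```

With a human-only database, "unclassified" simply means "no human k-mer hit", which is why the taxonomic detail in the other columns goes unused here.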

Filtering with Kraken was ~10x faster, and both methods removed roughly half of the reads. Kraken, however, removed ~6k more reads, 2k of which seemed to be viral (985k of the remaining 1028k reads mapped to the assembled virus consensus, compared to 987k of the remaining 1034k reads with Bowtie).
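The arithmetic behind those figures can be spelled out in a few lines. This is only a sketch using the rounded "k" values quoted above, so the results are approximate; the variable names are made up for illustration.

```python
# Rounded counts from the post (in reads); exact values are not given.
kraken_remaining = 1_028_000
bowtie_remaining = 1_034_000
kraken_viral = 985_000   # remaining reads that map to the virus consensus
bowtie_viral = 987_000

# Reads removed by Kraken but kept by Bowtie, and how many of those look viral.
extra_removed = bowtie_remaining - kraken_remaining   # about 6k
viral_lost = bowtie_viral - kraken_viral              # about 2k
nonviral_extra = extra_removed - viral_lost           # about 4k

# Fraction of the remaining reads that map to the virus, per method.
kraken_frac = kraken_viral / kraken_remaining
bowtie_frac = bowtie_viral / bowtie_remaining

print(f"extra removed by Kraken: {extra_removed}")
print(f"of which viral-looking:  {viral_lost}")
print(f"viral fraction, Kraken:  {kraken_frac:.4f}")
print(f"viral fraction, Bowtie:  {bowtie_frac:.4f}")
```

On these rounded numbers, about 4k of the ~6k extra reads Kraken removed did not map to the virus, and the viral fraction of the remaining reads is slightly higher after Kraken filtering than after Bowtie filtering.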

I looked online but found no one who (mis)uses Kraken that way. The database is even smaller than the indexed hg19 for Bowtie. Am I missing something? If anything, I feel a little bad for Kraken (if I'm allowed to anthropomorphise a bit), as it provides me with all this taxonomic data which I'm completely ignoring.

Last edited by JasperGeh; 02-04-2020 at 06:02 AM.