Hello, everyone. I am trying to establish an optimal sequence of pre-processing steps for Illumina paired-end (PE) reads, and I have come up with the following protocol so far (the tool I consider best for each step is given in parentheses):
- Initial quality control (FastQC)
- Quality- and/or length-based read discarding; trimming/discarding of N-containing reads (bbduk.sh from BBMap/BBtools package)
- Deduplication (dedupe.sh from BBMap/BBtools package)
- Adapter removal (bbduk.sh from BBMap/BBtools package)
- [optional] Error correction (ecc.sh from BBMap/BBtools package)
- Merging of PE reads (bbmerge.sh from BBMap/BBtools package)
- Hard and/or soft quality trimming (bbduk.sh from BBMap/BBtools package)
- Contamination check (FastQ Screen, maybe bbduk.sh from BBMap/BBtools package)
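To make the orderings under consideration explicit, here is a minimal Python sketch that only rearranges the step list above; it does not run any tool. The step labels are shortened from the list, and the helper `move_before` is a hypothetical name introduced purely for illustration. The two variants shown are adapter removal before deduplication, and quality trimming before merging.

```python
# Protocol steps as (step, tool) pairs, copied (with shortened labels)
# from the list above. Nothing here invokes the tools themselves.
PROTOCOL = [
    ("Initial quality control", "FastQC"),
    ("Quality/length-based read discarding", "bbduk.sh"),
    ("Deduplication", "dedupe.sh"),
    ("Adapter removal", "bbduk.sh"),
    ("Error correction (optional)", "ecc.sh"),
    ("Merging of PE reads", "bbmerge.sh"),
    ("Quality trimming", "bbduk.sh"),
    ("Contamination check", "FastQ Screen"),
]


def move_before(steps, step_name, anchor_name):
    """Return a new list with step_name placed directly before anchor_name.

    Hypothetical helper for illustration; the input list is not modified.
    """
    names = [name for name, _ in steps]
    moved = steps[names.index(step_name)]
    remaining = [s for s in steps if s[0] != step_name]
    anchor_idx = [name for name, _ in remaining].index(anchor_name)
    return remaining[:anchor_idx] + [moved] + remaining[anchor_idx:]


# Variant A: remove adapters before deduplicating.
adapters_first = move_before(PROTOCOL, "Adapter removal", "Deduplication")

# Variant B: quality-trim before merging the read pairs.
trim_first = move_before(PROTOCOL, "Quality trimming", "Merging of PE reads")

for label, variant in [
    ("current order", PROTOCOL),
    ("adapter removal before deduplication", adapters_first),
    ("quality trimming before merging", trim_first),
]:
    print(label)
    for i, (step, tool) in enumerate(variant, 1):
        print(f"  {i}. {step} ({tool})")
```

Each variant differs from the current order by a single moved step, which should make it easier to compare them on the same data set.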
I am not completely sure about the best order of several steps. For instance, which should come first, deduplication or adapter removal? Also, would it be better to perform quality trimming first and then merge the PE reads? Thank you in advance.