10-13-2021, 09:57 AM   #1
reliscu
BBDuk quality filtering not producing expected result

I'm trying to trim and filter low-quality reads from paired-end exome-seq data using BBDuk.

I used the command:

```
# $files holds the R1 FASTQ paths; each R2 path is derived by replacing R1 with R2
for ea in $files
do
    R1="$ea"
    R2="$(echo "$R1" | sed 's/R1/R2/')"
    /home/shared/programs/bbmap/bbduk.sh -Xmx1g in1="$R1" in2="$R2" \
        out1="$(echo "$R1" | sed 's/\.fastq\.gz$/_trimmed_filtered.fastq.gz/')" \
        out2="$(echo "$R2" | sed 's/\.fastq\.gz$/_trimmed_filtered.fastq.gz/')" \
        ref=/home/shared/programs/bbmap/resources/adapters.fa \
        t=10 ktrim=r k=23 kmin=11 hdist=1 maq=10 minlen=60 tpe tbo
done
```

After running FastQC on the output, I see that the R2 files still contain some reads with low quality scores (in the "Per sequence quality scores" module), and FastQC reports the overrepresented sequence "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN".

Here are some of these reads in the FASTQ file:
```
@HISEQ:525:HMFYNBCXX:1:1101:1380:2167 2:N:0:CAGATC
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@HISEQ:525:HMFYNBCXX:1:1101:1276:2219 2:N:0:CAGATC
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@HISEQ:525:HMFYNBCXX:1:1101:1238:2328 2:N:0:CAGATC
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
```
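
For reference, a quick way to count how many such all-N reads remain in an output file is sketched below. The function name `count_all_n_reads` and the file name in the usage line are placeholders, not part of BBDuk or FastQC, and the sketch assumes standard four-line FASTQ records (no wrapped sequence lines).

```shell
# Hypothetical helper: count reads whose sequence consists only of N.
# Reads an uncompressed FASTQ on stdin; assumes 4 lines per record,
# so the sequence is always the 2nd line of each record.
count_all_n_reads() {
    awk 'NR % 4 == 2 && /^N+$/ { n++ } END { print n + 0 }'
}
```

It could be called as `zcat sample_R2_trimmed_filtered.fastq.gz | count_all_n_reads`, where the file name is only an example.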

Shouldn't these reads have been filtered out?


Any help here would be much appreciated.

Last edited by reliscu; 10-13-2021 at 10:15 AM.