SEQanswers


gspirito 11-08-2019 02:53 AM

Extract reads from paired-end fastq based on specific adapters with bbduk
 
Hello everyone, I am using bbduk.sh (from the BBMap toolkit) to extract reads from paired-end fastq files based on the presence of a specific adapter at the 5' end of the sequence in the "_1" fastq file.

I am using this command:

Code:

./bbmap/bbduk.sh -Xmx1g in1=reads_1.fastq.gz in2=reads_2.fastq.gz outm1=matched1.fastq.gz outm2=matched2.fastq.gz literal=AAACCTGAGAAACCTA k=16 hdist=0 -rcomp=f

The problem is that, besides the correct reads, the output files also contain reads that do not include the adapter sequence, e.g.:

# from reads_1.fastq.gz
@SRR9262917.232075 232075/1
GCATGCGAGTAGCGGTGGTTCTTATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFF

# from reads_2.fastq.gz
@SRR9262917.232075 232075/2
AAGCAGTGGTATCAACGCAGAGTACATGGGATTCCATAGCCCTGTGGTTTTTATAGATCTTGTAAACCCCAAACCTGGGAAACCTAGTGGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF,FFFFFFFFF

Does anyone know why this may be happening and how to avoid this?

Thanks in advance.

GenoMax 11-08-2019 06:32 AM

You could add "restrictleft=N" (N = a certain number of bases) so that bbduk looks for matches only in the leftmost N bases of each read. Adding "minlength=N" will also exclude short reads like the first example. Also try setting k to something smaller (e.g. 8) so it has a better chance of matching correctly.

I hope "-rcomp=f" is a typo. There should be no "-" at the beginning.
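For illustration, the suggestions above could be combined roughly as in the sketch below. This is not the command the original poster ended up running. The values restrictleft=16, skipr2=t and maskmiddle=f are assumptions added here and should be checked against the help text printed by bbduk.sh. One possible explanation for the stray pair, not confirmed in this thread: bbduk's maskmiddle option defaults to true and treats the middle base of each kmer as a wildcard, and read 2 in the example contains AAACCTGGGAAACCTA, which differs from the literal only at that middle position. The function names and the BBDUK variable are hypothetical helpers, not part of BBTools.

```shell
#!/usr/bin/env bash
# Sketch only: option values are illustrative assumptions, not tested settings.

# Path to bbduk.sh; override with BBDUK=/path/to/bbduk.sh if it lives elsewhere.
BBDUK="${BBDUK:-./bbmap/bbduk.sh}"
ADAPTER="AAACCTGAGAAACCTA"

# Hypothetical wrapper around the command from the question.
#   rcomp=f         written without the leading "-"
#   restrictleft=16 (assumed value) look only at the leftmost 16 bases
#   skipr2=t        (assumption) do not search read 2 for the adapter
#   maskmiddle=f    (assumption) require the middle base of the kmer to match
extract_adapter_pairs() {
    "$BBDUK" -Xmx1g \
        in1=reads_1.fastq.gz in2=reads_2.fastq.gz \
        outm1=matched1.fastq.gz outm2=matched2.fastq.gz \
        literal="$ADAPTER" k=16 hdist=0 rcomp=f \
        restrictleft=16 skipr2=t maskmiddle=f
}

# Sanity check that is independent of bbduk: read FASTQ records on stdin
# and count the sequences that do NOT start with the given adapter.
count_without_prefix() {
    awk -v a="$1" 'NR % 4 == 2 && index($0, a) != 1 { n++ } END { print n + 0 }'
}
```

After running extract_adapter_pairs, something like `gzip -dc matched1.fastq.gz | count_without_prefix "$ADAPTER"` should print 0 if every extracted read 1 begins with the adapter. The check assumes four-line FASTQ records.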

gspirito 11-12-2019 07:37 AM

Thank you! That worked

