01-09-2020, 02:40 PM   #1
unionicola
Junior Member
 
Location: Wisconsin

Join Date: Feb 2009
Posts: 4
bbduk and removing adapters of varying length

I am analyzing some published Tn-seq data. The reads appear to contain residual transposon sequence, which prevents them from aligning. Unfortunately, this sequence is of variable length. Here are some example reads; in each one, the transposon sequence ends with AGTCATGC:

ATTCCGCTCTTCCGATCTAGTCATGCGCGGCCGCATAACATAACCGGTTGGATGATAAGTCCCCGGTCTATAT
ATTCTTCCCTACACGACGCTCTTCCGATCTAGTCATGCGCCGGAGCATTAGGTAACAGGTTGGATGATAAGTC
ATGCAGTCATGCAAATGATAACAGGTTGGATGATAAGTCCCCGGTCTATATTGAGAGTAACTACATTTACCGT
ATTCATCATTGCGGCAGTCATGCCTATTGTTCCTGGTGTAACAGGTTGGATGATAAGTCCCCGGTCTATATTG


I want to search for the last 8 bases of the transposon sequence (AGTCATGC) and remove everything to the left of it, while retaining the rest of the read and keeping any read that contains no transposon sequence. I've been trying bbduk with the following command:

Code:
bbduk.sh in=$in.fastq literal=AGTCATGC ktrim=l k=8 rcomp=f out=$out.fastq
But this seems to remove nearly every read: the read count drops from over 4 million to about 20,000.
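For comparison, here is a minimal sketch of the intended trimming, written as a hypothetical shell function around awk. It is not BBDuk, and the name `trim_after_marker` is made up for illustration. It assumes uncompressed four-line FASTQ and an exact, forward-strand match (the same idea as `literal=AGTCATGC rcomp=f`). Because AGTCATGC is itself transposon sequence, the sketch drops the marker along with everything to its left. Reads without the marker pass through unchanged, so the output read count should equal the input read count. That makes it a rough cross-check on how many reads actually carry the marker.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BBTools):
#   trim_after_marker MARKER < in.fastq > out.fastq
# For each 4-line FASTQ record, find the FIRST exact, forward-strand occurrence
# of MARKER in the sequence. If found, remove the marker and everything to its
# left from both the sequence and quality lines. Reads without the marker are
# printed unchanged, so no read is discarded.
trim_after_marker() {
  awk -v m="$1" '
    NR % 4 == 1 { header = $0; next }   # @read-id line
    NR % 4 == 2 { seq = $0; next }      # sequence line
    NR % 4 == 3 { plus = $0; next }     # "+" separator line
    {
      qual = $0                         # quality line completes the record
      p = index(seq, m)                 # 1-based start of first match, 0 if absent
      if (p > 0) {
        keep = p + length(m)            # first base after the marker
        seq  = substr(seq,  keep)       # keep sequence from there to the end
        qual = substr(qual, keep)       # keep qualities in sync
      }
      print header; print seq; print plus; print qual
    }'
}

# Usage (assumed file names):
# trim_after_marker AGTCATGC < reads.fastq > trimmed.fastq
```

The sketch cuts at the leftmost match. If a read could contain the marker more than once, you would need to decide which match should define the cut. A read whose marker sits at its very end becomes zero length, and such reads may need to be filtered out before alignment.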

Can anyone help me with this issue? Thanks!