bbduk and removing adapters of varying length
I am analyzing some published Tn-seq data. There appears to be residual transposon sequence in the reads, which prevents alignment. Unfortunately, this sequence is of variable length. Here are some example reads; the transposon sequence is underlined:
ATTCCGCTCTTCCGATCTAGTCATGCGCGGCCGCATAACATAACCGGTTGGATGATAAGTCCCCGGTCTATAT
ATTCTTCCCTACACGACGCTCTTCCGATCTAGTCATGCGCCGGAGCATTAGGTAACAGGTTGGATGATAAGTC
ATGCAGTCATGCAAATGATAACAGGTTGGATGATAAGTCCCCGGTCTATATTGAGAGTAACTACATTTACCGT
ATTCATCATTGCGGCAGTCATGCCTATTGTTCCTGGTGTAACAGGTTGGATGATAAGTCCCCGGTCTATATTG
I want to search for the last 8 bases of the transposon sequence (AGTCATGC) and remove any sequence to the left of it, while retaining the rest of the read and also keeping any read that contains no transposon sequence. I've been trying bbduk with the following command:
Code:
bbduk.sh in=$in.fastq literal=AGTCATGC ktrim=l k=8 rcomp=f out=$out.fastq
Can anyone help me with this issue? Thanks! |
I just tried your code on a test set with a random kmer in the 3rd read (in bold):
Code:
@NGSNJ-086:222:GW191226409th:1:1103:5556:31187 1:N:0:GAAGCGGCAC+CGGCTCTACT
Input: 3 reads 450 bases.
KTrimmed: 1 reads (33.33%) 65 bases (14.44%)
Total Removed: 0 reads (0.00%) 65 bases (14.44%)
Result: 3 reads (100.00%) 385 bases (85.56%)
cat test_out.fastq
Code:
@NGSNJ-086:222:GW191226409th:1:1103:5556:31187 1:N:0:GAAGCGGCAC+CGGCTCTACT
You could also try kmask=# to see if it is finding the kmer desired. |
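For a cross-check that does not depend on bbduk, the intended left trim can be sketched with awk. This is only a rough approximation under stated assumptions, not a description of how bbduk works internally: the function name `left_trim_fastq` is made up for illustration, the input is assumed to be plain 4-line FASTQ, only exact forward-strand matches are considered (similar in spirit to rcomp=f), the rightmost occurrence of the kmer is used, and the kmer itself is removed along with everything to its left. Reads without the kmer pass through unchanged.

```shell
#!/usr/bin/env bash
# Sketch only: left-trim FASTQ reads at a literal kmer.
# Assumptions: 4-line FASTQ records, exact matches, forward strand only.
# left_trim_fastq is a hypothetical helper name, not part of BBTools.

left_trim_fastq() {
  # $1 = literal kmer; FASTQ is read from stdin, trimmed FASTQ goes to stdout
  awk -v kmer="$1" '
    NR % 4 == 1 { header = $0; next }
    NR % 4 == 2 {
      seq = $0; cut = 0; pos = 0
      # walk through every occurrence so that cut ends up at the
      # last base of the rightmost match (0 means no match)
      while ((i = index(substr(seq, pos + 1), kmer)) > 0) {
        pos += i
        cut = pos + length(kmer) - 1
      }
      next
    }
    NR % 4 == 3 { plus = $0; next }
    NR % 4 == 0 {
      # trim the sequence and the quality string by the same amount
      print header
      print substr(seq, cut + 1)
      print plus
      print substr($0, cut + 1)
    }
  '
}

# Example: one toy record with the kmer in the middle of the read
printf '@r1\nTTTTAGTCATGCGGGG\n+\nIIIIIIIIIIIIIIII\n' | left_trim_fastq AGTCATGC
```

Unlike bbduk, this sketch does not drop reads that become very short or empty after trimming, so read counts may differ from the bbduk summary.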
Thank you for your help.
Previously, I was getting the following results summary:
Code:
Input: 29978820 reads 2188453860 bases.
Code:
bbduk.sh in=input.fastq literal=AGTCATGC,ACGTACTG,TGACTGCA,TCGACGAT,CTAGCATG,GACTGTAC ktrim=l k=8 rcomp=f outm=removed.fastq out=trimmed.fastq
Code:
Input: 29978820 reads 2188453860 bases.
Nonetheless, at least I'm getting the results I want! Thanks again for your help! |
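When several literals are passed at once, it can help to count how many reads contain each one and compare that against the KTrimmed line in the bbduk summary. The following is a minimal sketch along those lines; `count_kmer_reads` is a hypothetical helper name, the input is assumed to be 4-line FASTQ on stdin, and only exact forward-strand matches are counted, so the numbers need not agree exactly with what bbduk reports.

```shell
#!/usr/bin/env bash
# Sketch only: count reads containing each literal kmer.
# Assumptions: 4-line FASTQ on stdin, exact forward-strand matches.
# count_kmer_reads is a hypothetical helper name, not part of BBTools.

count_kmer_reads() {
  # $1 = comma-separated list of kmers, in the same form as literal=
  awk -v list="$1" '
    BEGIN { n = split(list, kmers, ",") }
    NR % 4 == 2 {
      # a read is counted once per kmer it contains
      for (j = 1; j <= n; j++)
        if (index($0, kmers[j]) > 0) hits[kmers[j]]++
    }
    END {
      # one line per kmer: the kmer, then the number of reads containing it
      for (j = 1; j <= n; j++) print kmers[j], hits[kmers[j]] + 0
    }
  '
}

# Example with three toy reads
printf '@r1\nTTAGTCATGCGG\n+\nIIIIIIIIIIII\n@r2\nACGTACTGAA\n+\nIIIIIIIIII\n@r3\nCCCCCCCC\n+\nIIIIIIII\n' |
  count_kmer_reads AGTCATGC,ACGTACTG,TGACTGCA
```

On a real file the same function could be fed with `count_kmer_reads AGTCATGC,ACGTACTG < input.fastq`.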