**BhariD** · 01-14-2014, 07:10 AM

Hi,

I need to trim full-length adapter sequences with zero mismatches. I do not want to trim reads based on any other criteria at this point.

I am using the following command line:
./skewer-0.1.99-linux-x86_64 -x ACACTCTTTCCCTACACGACGCTCTTCCGATCT -y GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG -r 0 -d 0 -o exact_trim_15 -t 8 read_1.fastq paired_read2.fastq

Log file includes:
Parameters used:
-- 3' end adapter sequence (-x): ACACTCTTTCCCTACACGACGCTCTTCCGATCT
-- paired 3' end adapter sequence (-y): GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG
-- maximum error ratio allowed (-r): 0.000
-- maximum indel error ratio allowed (-d): 0.000
-- minimum read length allowed after trimming (-l): 18
-- file format (-f): Sanger/Illumina 1.8+ FASTQ (auto detected)
-- number of concurrent threads (-t): 8
Tue Jan 14 02:18:14 2014 >> started

Tue Jan 14 02:19:33 2014 >> done (78.699s)
47656840 read pairs processed; of these:
0 ( 0.00%) short read pairs filtered out after trimming by size control
0 ( 0.00%) empty read pairs filtered out after trimming by size control
47656840 (100.00%) read pairs available; of these:
3202 ( 0.01%) trimmed read pairs available after processing
47653638 (99.99%) untrimmed read pairs available after processing

Length distribution of reads after trimming:
length count percentage
97 1 0.00%
98 4 0.00%
99 3197 0.01%
100 47653638 99.99%
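
As a quick sanity check on the log above, the three shorter length classes add up to exactly the 3202 trimmed pairs, so no read lost more than 3 nt. A minimal Python sketch of that arithmetic follows; the numbers are copied from the log, the variable names are only illustrative, and it assumes the length table has one entry per read pair, which the matching totals suggest.

```python
# Numbers copied from the skewer log above.
read_length = 100
total_pairs = 47656840
trimmed_pairs = 3202
length_counts = {97: 1, 98: 4, 99: 3197, 100: 47653638}

# Entries shorter than the full read length are the trimmed ones.
trimmed_by_length = sum(n for length, n in length_counts.items() if length < read_length)
max_bases_removed = read_length - min(length_counts)
trimmed_percent = 100 * trimmed_pairs / total_pairs

print(trimmed_by_length)          # 3202, matches the "trimmed read pairs" line
print(max_bases_removed)          # 3
print(f"{trimmed_percent:.4f}%")  # 0.0067%, which the log rounds to 0.01%
```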

My questions are:

1) Given these parameter settings, were the 3197 trimmed read pairs really trimmed based only on an exact full-length adapter sequence match? Is there any default parameter that I should be aware of?
2) What is the overlap length for adapter detection in paired-end mode? Is it something like the initial 17 bp of the total length? Is there a way I can change this?
3) How can I change the number of mismatches allowed when detecting the adapter region in a read? For example, what if I want to allow only 2 mismatches (instead of zero) in the full-length adapter sequence?
4) How can I specify multiple adapter sequences for read 1 and read 2 data files?

I would appreciate your help! Thank you!

**relipmoc** · 01-14-2014, 08:31 AM

Thank you so much for your feedback!

Quick answers to your questions:
1) The search is based on the exact full-length adapter sequence, but for the 3197 read pairs, only the last nucleotides of each read were identified as the first nucleotides of the corresponding adapter sequence. In current implementation, adapter sequence longer than 64 nt will be cut to 64 nt before processing.

2) There's no need to specify the overlap length in paired-end mode. The program knows how to do it correctly.

3) The program only provides a parameter of error ratio (by -r) and detect the most possible adapter location by a statistical scheme which takes into account the quality values. If you just want to specify the maximum number of mismatches allowed in the full-length adapter sequence, you can use fq2fa.sh to convert the FASTQ files to FASTA files and set the maximum allowed error ratio (-r) to 2/33 ≈ 0.06. For small RNA adapter trimming, the command looks something like this:
$ fq2fa.sh srnaReads.fq | skewer -x TCGTATGCCGTCTTCTGCTTGAAAAAAA -L 30 -r 0.06 -o trimmed -

4) For multiple adapter sequences, you just need to supply two FASTA files containing the adapter sequences, for example:
$ skewer -x adapters1.fa -y adapters2.fa flowcell1_lane7_pair1.fastq.gz flowcell1_lane7_pair2.fastq.gz
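
The 2/33 figure in answer 3 comes from dividing the number of tolerated mismatches by the adapter length. The sketch below redoes that arithmetic in Python for the 33 nt -x adapter from the first post. The `mismatches_allowed` helper is a naive floor model introduced here purely for illustration; it is an assumption, not skewer's statistical scheme, which the thread does not describe.

```python
import math

def error_ratio(max_mismatches: int, adapter_length: int) -> float:
    """Mismatches expressed as a fraction of the adapter length."""
    return max_mismatches / adapter_length

def mismatches_allowed(ratio: float, adapter_length: int) -> int:
    """Naive model (an assumption, not skewer's scheme): floor(ratio * length)."""
    return math.floor(ratio * adapter_length + 1e-9)

adapter = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # -x adapter from the first post
exact = error_ratio(2, len(adapter))
rounded_up = math.ceil(exact * 1000) / 1000

print(len(adapter))                                   # 33
print(round(exact, 4))                                # 0.0606
print(mismatches_allowed(0.06, len(adapter)))         # 1 under this naive model
print(mismatches_allowed(rounded_up, len(adapter)))   # 2
```

Under this naive model, rounding 2/33 down to 0.06 would admit only one mismatch, while 0.061 would admit two. Whether skewer truncates in this way is not stated in the thread, so treat the difference as something to test rather than a documented behaviour.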

Attached Files

fq2fa.zip (286 Bytes, 106 views)
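
The contents of fq2fa.sh are not shown in the thread. As a rough idea of what a FASTQ to FASTA conversion involves, here is a minimal Python sketch; the function name `fastq_to_fasta` is hypothetical, and it assumes plain four-line FASTQ records with no blank lines. Dropping the quality line is the point of the conversion: with FASTA input there are no quality values left for the trimmer to weigh.

```python
import io
from typing import Iterator, TextIO

def fastq_to_fasta(fastq: TextIO) -> Iterator[str]:
    """Yield FASTA lines for each four-line FASTQ record, dropping the qualities."""
    while True:
        header = fastq.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = fastq.readline().rstrip("\n")
        fastq.readline()  # '+' separator line, ignored
        fastq.readline()  # quality line, discarded
        if not header.startswith("@"):
            raise ValueError(f"not a FASTQ header: {header!r}")
        yield ">" + header[1:]
        yield sequence

# Tiny in-memory example with two made-up records.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n")
print("\n".join(fastq_to_fasta(example)))
```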

**BhariD** · 01-14-2014, 01:00 PM

Thank you for your prompt response!

I'm sorry, I didn't quite follow "In current implementation, adapter sequence longer than 64 nt will be cut to 64 nt before processing". I don't think I have an adapter longer than 62 bp, so then why its looking for last few nucleotides (3 I guess here?)?

**BhariD** · 01-14-2014, 02:40 PM

skewer: A fast and sensitive adapter trimmer for paired-end reads

Also, what base quality value threshold does the tool use to decide that a base counts as a mismatch? I am asking about "3) The program only provides a parameter of error ratio (by -r) and detect the most possible adapter location by a statistical scheme which takes into account the quality values".

Thanks!

**relipmoc** · 01-14-2014, 05:29 PM

As I said, "there's no need to specify the overlap length in paired-end mode". In fact, there is no parameter, and therefore no default value, for the overlap length in paired-end mode.

The 64 nt statement is irrelevant to your question. I just misunderstood your question "any default parameter that I should be aware of". ^_^

"why its looking for last few nucleotides (3 I guess here?)". Unfortunately, your guess is not correct. You got this result by chance.

**relipmoc** · 01-14-2014, 05:32 PM

There's no base quality value threshold. That's all integrated into the statistical scheme. Since we have not published the paper, I can not tell you the details at the moment. Sorry for that!

**roryk** · 02-17-2014, 09:15 AM

Hi relipmoc,

I have a couple of questions:

1) How does skewer handle partial matches? For example, if I have a read of the form SEQUENCE-ADAPTER-BARCODE and I supply only ADAPTER, will I end up with SEQUENCE?

2) Why is this sequence not being trimmed? Does skewer only match the entire adapter sequence?

@test_truseq/1
CGATGATCAAGACCCAAGTGTGAGATTACGGAGATCGGAA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@test_truseq/2
CGATGATCAAGACCCAAGTGTGAGATTACTCAGATCGGAA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

~/tmp/skewer-0.1.104-linux-x86_64 -x AGATCGGAAGAG -y AGATCGGAAGAG test_cutadapt_1.fastq test_cutadapt_2.fastq
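
To make the two questions concrete: both test reads end with AGATCGGAA, the first 9 nt of the 12 nt adapter AGATCGGAAGAG, so the adapter is only partially present at the 3' end. The Python sketch below shows what a naive exact-match trimmer would do with such reads and with a SEQUENCE-ADAPTER-BARCODE read. It is an illustration only and not skewer's algorithm; the function name `trim_3prime_adapter` and the `min_overlap` parameter are hypothetical.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Cut the read at the first position where the adapter starts (exact match).

    The adapter may run off the 3' end of the read, in which case only the
    part of the adapter that fits inside the read has to match.
    """
    for start in range(len(read)):
        tail = read[start:]
        overlap = min(len(tail), len(adapter))
        if overlap < min_overlap:
            break  # remaining tail is too short to call an adapter
        if tail[:overlap] == adapter[:overlap]:
            return read[:start]
    return read

adapter = "AGATCGGAAGAG"
read_1 = "CGATGATCAAGACCCAAGTGTGAGATTACGGAGATCGGAA"  # test_truseq/1
read_2 = "CGATGATCAAGACCCAAGTGTGAGATTACTCAGATCGGAA"  # test_truseq/2

for read in (read_1, read_2):
    kept = trim_3prime_adapter(read, adapter)
    print(kept, len(read) - len(kept))  # 9 nt removed from each read

# SEQUENCE-ADAPTER-BARCODE with a made-up insert and barcode: only SEQUENCE is kept.
print(trim_3prime_adapter("ACGTACGT" + adapter + "TTTT", adapter))  # ACGTACGT
```

A trimmer that works this way would remove 9 nt from each test read and everything from the adapter onwards in the second case. Whether skewer makes the same call for these reads is exactly what the questions above ask.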

Thanks! I've been looking around for a faster trimmer and was hoping skewer would be the solution.

**frozenlyse** · 02-17-2014, 04:59 PM

Originally posted by relipmoc:

There's no base quality value threshold. That's all integrated into the statistical scheme. Since we have not published the paper, I can not tell you the details at the moment. Sorry for that!

That's not a good way to get people to use your software!

**relipmoc** · 02-18-2014, 07:38 AM