#21 |
Senior Member
Location: US Join Date: Dec 2010
Posts: 453
Hi relipmoc,
Thanks for the tip about the barcoded adapters; it is a very nice feature. I got strange results when trimming paired-end data with the parameter "-l 20". All read pairs containing forward reads shorter than 20 bases were indeed filtered out, but not all read pairs containing reverse reads shorter than 20 bases. By the way, does skewer search for the reverse complements of the adapters by default (likely not in paired mode)?
#22 | |
Member
Location: Los Angeles, CA Join Date: Jul 2011
Posts: 58
Quote:
The answer is NO.
#23 |
Member
Location: auburn Join Date: Jan 2013
Posts: 12
Hi relipmoc,
Thank you for this software. I have run into a problem and may need your help. I am working with HiSeq 2500 data from a Nextera Mate Pair library, and these are the parameters I used:

skewer-0.1.114-linux-x86_64 -x GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -y GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -j CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG -m mp -k 9 -f sanger -l 30 -L 150 -o skewer_library1_2 1.fastq 2.fastq

-- 3' end adapter sequence (-x): GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
-- paired 3' end adapter sequence (-y): GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
-- junction adapter sequence (-j): CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG
-- maximum error ratio allowed (-r): 0.100
-- maximum indel error ratio allowed (-d): 0.030
-- minimum read length allowed after trimming (-l): 30
-- maximum read length for output (-L): 150
-- file format (-f): Sanger/Illumina 1.8+ FASTQ
-- minimum overlap length for junction adapter detection (-k): 9
Wed Jun 4 15:28:27 2014 >> started
Thu Jun 5 10:40:33 2014 >> done (69126.658s)
208936993 read pairs processed; of these:
93035 ( 0.04%) non-junction read pairs filtered out by contaminant control
29290940 (14.02%) short read pairs filtered out after trimming by size control
6182785 ( 2.96%) empty read pairs filtered out after trimming by size control
173370233 (82.98%) read pairs available; of these:
94951230 (54.77%) trimmed read pairs available after processing
78419003 (45.23%) untrimmed read pairs available after processing

The length distribution of reads after trimming reported by skewer shows that the longest reads are 150 bp. However, when I checked the result with FastQC, I found many reads longer than 150 bp (please see the attachment). I also found those "long" reads by eyeballing the result file. Have you ever seen something like this? What do you think the reason could be?
P.S. I have tried this with and without -L 150, and there are longer reads in both cases.
Thanks,
#24 | |
Member
Location: Los Angeles, CA Join Date: Jul 2011
Posts: 58
Quote:
Thank you very much for your feedback! The name of the parameter is misleading. Its actual meaning is the maximum equivalent read length. For example, if the length of trimmed read 1 is 224 and the length of trimmed read 2 is 40, then the equivalent read length is int((224 + 40) / 2) = 132. Therefore, "-L 150" cannot filter out this read pair, but "-L 120" can. For your case, you can try "-L 75", but I guess this is not what you want. We may upgrade skewer to add another parameter for clipping bases after a specified length.
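
To make the arithmetic above concrete, here is a minimal Python sketch of the "equivalent read length" check. The function names are hypothetical and this is not skewer's actual code; in particular, whether a pair whose equivalent length exactly equals the -L value is kept or dropped is an assumption.

```python
def equivalent_read_length(len_read1: int, len_read2: int) -> int:
    """Equivalent read length as described in the post: int((len1 + len2) / 2)."""
    return int((len_read1 + len_read2) / 2)


def exceeds_max_length(len_read1: int, len_read2: int, max_equivalent_length: int) -> bool:
    """True if the pair would be filtered out by a given -L value.

    Assumption: only pairs strictly above the threshold are filtered.
    """
    return equivalent_read_length(len_read1, len_read2) > max_equivalent_length


# The example from the post: reads of length 224 and 40.
print(equivalent_read_length(224, 40))   # 132
print(exceeds_max_length(224, 40, 150))  # False: "-L 150" keeps the pair
print(exceeds_max_length(224, 40, 120))  # True: "-L 120" filters it out
```

This also shows why a single read of 224 bases can survive "-L 150": the limit applies to the pair average, not to each read.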
#25 | |
Member
Location: auburn Join Date: Jan 2013
Posts: 12
Quote:
Actually, I am more curious about why skewer would produce trimmed reads longer than the original ones. If we knew that, we might avoid getting the long reads and would not need another parameter to deal with them. By the way, skewer is really fast!
#26 | ||
Member
Location: Los Angeles, CA Join Date: Jul 2011
Posts: 58
Quote:
Quote:
Otherwise, non-trimmed reads correspond to fragments that were originally equal to or longer than the read length. These read pairs can be classified into three classes:
1) junction adapters are found in the middle of both reads of the pair;
2) a junction adapter is found in the middle of one read of the pair;
3) a junction adapter is not found in either read of the pair.
For class 1), skewer just trims the junction adapters as in single-end (SE) cases. For class 2), suppose without loss of generality that read 1 contains the junction adapter while read 2 does not. Skewer searches for the best overlap between the 3' end of read 1 and the 5' end of the reverse complement of read 2. If the overlap lies after the junction adapter region of read 1, then the sub-sequence after the junction adapter region of read 1 is converted to its reverse-complemented counterpart and appended to read 2. That is why some reads are longer than the read length after adapter trimming.
Thank you for the praise!
Last edited by relipmoc; 06-14-2014 at 04:00 PM.
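
The three classes above can be illustrated with a short Python sketch. It is a simplification, not skewer's implementation: it uses exact substring matching, whereas skewer tolerates errors (-r, -d) and applies a minimum overlap (-k). The function name `classify_pair` is hypothetical; the junction adapter is the one given with -j earlier in the thread.

```python
JUNCTION_ADAPTER = "CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG"


def classify_pair(read1: str, read2: str, adapter: str = JUNCTION_ADAPTER) -> int:
    """Return the class (1, 2 or 3) of a read pair.

    1: adapter found in both reads
    2: adapter found in exactly one read
    3: adapter found in neither read
    Simplification: exact matching only, no mismatches or indels.
    """
    hits = int(adapter in read1) + int(adapter in read2)
    return {2: 1, 1: 2, 0: 3}[hits]


# Toy reads built around the adapter, for illustration only.
with_ja = "ACGTACGT" + JUNCTION_ADAPTER + "TTGCA"
without_ja = "ACGTACGTTTGCAACGT"

print(classify_pair(with_ja, with_ja))        # 1
print(classify_pair(with_ja, without_ja))     # 2
print(classify_pair(without_ja, without_ja))  # 3
```

Only class 2 triggers the transfer step described in the post, which is the case that can lengthen a read.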
#27 |
Member
Location: Los Angeles, CA Join Date: Jul 2011
Posts: 58
If you find skewer useful for your study, please kindly cite it in your paper. Thank you!
BMC Bioinformatics 2014, 15:182
DOI: 10.1186/1471-2105-15-182
URL: http://www.biomedcentral.com/1471-2105/15/182
Last edited by relipmoc; 06-13-2014 at 09:00 AM. Reason: :)
#28 |
Junior Member
Location: England Join Date: Jul 2014
Posts: 2
The source code is here: https://github.com/relipmoc/skewer
I would have thought it would be on SourceForge, but GitHub is way better. Thanks for sharing this.
Last edited by ug14cxb; 07-24-2014 at 02:13 AM.
#29 | |
Member
Location: NZ Join Date: Mar 2014
Posts: 15
Quote:
1. For the first case, does "SE trimming" mean removing the junction adapter and the following sequence up to the 5' end as well?
2. For the second case (adapter in one read only): (A) what defines "the best overlap": length? mismatches? (B) What does skewer do if there is no overlap between the reads?
3. How can I switch off trimming of the external adapters?
4. In the analysis below, it is not clear what "549499 (24.26%) untrimmed read pairs available after processing" means. How can any untrimmed reads be present in the result, rather than being counted in "5968 ( 0.20%) non-junction read pairs filtered out by contaminant control"?

skewer -m mp -t 16 -k 30 -l 40 -b S4-R1.fastq S4-R2.fastq

Parameters used:
-- 3' end adapter sequence (-x): AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
-- paired 3' end adapter sequence (-y): AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
-- junction adapter sequence (-j): CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG
-- maximum error ratio allowed (-r): 0.100
-- maximum indel error ratio allowed (-d): 0.030
-- minimum read length allowed after trimming (-l): 40
-- file format (-f): Sanger/Illumina 1.8+ FASTQ (auto detected)
-- minimum overlap length for junction adapter detection (-k): 30
-- number of concurrent threads (-t): 16
3016744 read pairs processed; of these:
5968 ( 0.20%) non-junction read pairs filtered out by contaminant control
725620 (24.05%) short read pairs filtered out after trimming by size control
20306 ( 0.67%) empty read pairs filtered out after trimming by size control
2264850 (75.08%) read pairs available; of these:
1715351 (75.74%) trimmed read pairs available after processing
549499 (24.26%) untrimmed read pairs available after processing
Barcode dispatch after trimming:
category count percentage:
X01Y01 1422074 82.90%

Thank you...
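
When reading a summary like the one above, it helps to check which total each percentage refers to. The Python sketch below recomputes the figures from the log; the variable names are my own, and the numbers are copied from the post. It suggests that the filtered and available counts are fractions of all processed pairs, while the trimmed and untrimmed counts are fractions of the available pairs.

```python
# Counts copied from the skewer summary in the post above.
total_pairs = 3016744
filtered = {
    "non-junction (contaminant control)": 5968,
    "short after trimming": 725620,
    "empty after trimming": 20306,
}
available = 2264850
trimmed = 1715351
untrimmed = 549499


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Filtered and available pairs together account for every processed pair.
assert sum(filtered.values()) + available == total_pairs
# Trimmed and untrimmed pairs together make up the available pairs.
assert trimmed + untrimmed == available

print(pct(available, total_pairs))  # relative to all processed pairs
print(pct(trimmed, available))      # relative to available pairs
print(pct(untrimmed, available))    # relative to available pairs
```

So the 24.26% of untrimmed pairs is measured against the 2264850 available pairs, not against the 3016744 input pairs.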
#30 | |||
Member
Location: Los Angeles, CA Join Date: Jul 2011
Posts: 58
Quote:
Quote:
(B) No additional action is taken in this case.
Do you mean trimming the external adapters only? For research purposes, you may use PE mode instead of MP mode, but it is not recommended.
Quote:
#31 | |
Member
Location: NZ Join Date: Mar 2014
Posts: 15
Thank you very much for the reply!
Quote:
1. "5968 ( 0.20%) non-junction read pairs filtered out by contaminant control": with MiSeq reads of 300 bp, do only fragments shorter than 300 bp without a JA belong to this group? Or only those with an overlap between R1 and R2?
2. "549499 (24.26%) untrimmed read pairs available after processing": are these almost all pairs without a detected JA? So is a fragment of the form (300)+N1+JA+N2+(300) in this group? Do you recommend excluding this group from de novo assembly?
#32 | |||
Member
Location: Los Angeles, CA Join Date: Jul 2011
Posts: 58
Quote:
Quote:
Quote:
Note that having a JA is a prerequisite for a correctly constructed MP (mate-pair) read.
#33 | ||
Member
Location: Cologne,Germany Join Date: May 2012
Posts: 12
Hi relipmoc,
I'm really new to all this stuff, so I would like some guidance. I did an MNase-seq experiment, got paired-end reads, and got the following FastQC results:
Quote:
and for the second
Quote:
In both cases FastQC reported that the per-base qualities are OK.
1. What is the best way to remove those adapters without doing any filtering of the reads, whether by per-base quality or anything else?
2. In addition, how can I chop bases from the 3' end of both files, again without any quality-control filtering?
#34 |
Member
Location: NZ Join Date: Mar 2014
Posts: 15
Sorry, one more dull question.
Is there any way to redirect the result files into a directory different from the one containing the input data? Not to stdout... it seems that the -o option is for the base name only?
#35 |
Member
Location: NZ Join Date: Mar 2014
Posts: 15
And one more.
I can easily manipulate the number of "trimmed read pairs available after processing" by changing the stringency options, but this slightly affects the real output...

default settings
Wed Aug 13 22:42:51 2014 >> done (0.139s)
1000 read pairs processed; of these:
2 ( 0.20%) degenerative read pairs filtered out
2 ( 0.20%) non-junction read pairs filtered out by contaminant control
86 ( 8.60%) short read pairs filtered out after trimming by size control
2 ( 0.20%) empty read pairs filtered out after trimming by size control
908 (90.80%) read pairs available; of these:
718 (79.07%) trimmed read pairs available after processing
190 (20.93%) untrimmed read pairs available after processing
Barcode dispatch after trimming:
category count percentage:
X01Y01 575 80.08%

relaxed settings
Wed Aug 13 22:48:16 2014 >> done (0.257s)
1000 read pairs processed; of these:
0 ( 0.00%) degenerative read pairs filtered out
5 ( 0.50%) non-junction read pairs filtered out by contaminant control
65 ( 6.50%) empty read pairs filtered out after trimming by size control
930 (93.00%) read pairs available; of these:
908 (97.63%) trimmed read pairs available after processing
22 ( 2.37%) untrimmed read pairs available after processing
Barcode dispatch after trimming:
category count percentage:
X01Y01 767 84.47%

It is also very strange that, even with k=14 (please check the settings below), plenty of adapters of 17-24 bp (without mismatches or indels) are left in the final untrimmed file. For example, of 286 untrimmed sequences (out of 1000), 95 contain a single JA. So almost all single JAs were left in the final untrimmed file. Strongly reducing the stringency to -r 0.3 -d 0.2 does not affect the result significantly. What else should I change to detect these single JAs?

Parameters used:
-- 3' end adapter sequence (-x): AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
-- paired 3' end adapter sequence (-y): GATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTAATCTCGTATGCCGTCTTCTGCTTG
-- junction adapter sequence (-j): CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG
-- maximum error ratio allowed (-r): 0.100
-- maximum indel error ratio allowed (-d): 0.030
-- minimum read length allowed after trimming (-l): 0
-- file format (-f): Sanger/Illumina 1.8+ FASTQ (auto detected)
-- minimum overlap length for junction adapter detection (-k): 14
-- number of concurrent threads (-t): 4
Last edited by MikhailFokin; 08-13-2014 at 06:57 PM. Reason: more questions and details
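
One way to quantify leftover exact adapter fragments such as the 17-24 bp pieces mentioned above is to measure the longest exact stretch a read shares with the junction adapter. The Python sketch below is a hypothetical helper for that kind of manual check, not part of skewer; it uses a plain dynamic-programming longest-common-substring search and ignores mismatches and indels.

```python
JUNCTION_ADAPTER = "CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG"


def longest_shared_run(read: str, adapter: str = JUNCTION_ADAPTER) -> int:
    """Length of the longest exact substring shared by `read` and `adapter`."""
    best = 0
    # prev[j] = length of the common run ending at the previous read base
    # and at adapter position j - 1.
    prev = [0] * (len(adapter) + 1)
    for base in read:
        cur = [0] * (len(adapter) + 1)
        for j, adapter_base in enumerate(adapter, start=1):
            if base == adapter_base:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Toy read carrying the first 19 bases of the junction adapter.
toy_read = "ACGTACGT" + JUNCTION_ADAPTER[:19] + "TTTT"
print(longest_shared_run(toy_read))  # 19
```

Counting the reads in the untrimmed output whose value falls in a range such as 17 to 24 would give a number comparable to the one reported in the post. Short shared runs of a few bases occur by chance, so a cutoff is needed.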
#36 | ||
Member
Location: Los Angeles, CA Join Date: Jul 2011
Posts: 58
Quote:
For your case, you may use '-u' instead of '-b' to filter out the so-called "undetermined mate-pair reads" (the original fragments are equal to or longer than the read length, and no JA is found in either read of the pair). It is not recommended to include only those reads in which a JA was found.
BTW: you helped me find a bug in the program. The statistics for "barcode dispatch after trimming" are not correct! They should count only those pairs that have PE adapters detected. I'll update the program and release it after thorough testing. Thanks!
Quote:
#37 |
Member
Location: Los Angeles, CA Join Date: Jul 2011
Posts: 58
Thanks for this question! The result files can now be redirected into a directory using -o. The difference is that a directory name must end with a slash '/'. I'll release the updated version soon.
Last edited by relipmoc; 08-25-2014 at 05:24 AM. Reason: typo
#38 | |
Member
Location: Los Angeles, CA Join Date: Jul 2011
Posts: 58
Quote:
#39 | |||
Member
Location: Poland Join Date: Jun 2013
Posts: 37
Hi,
First I used Quote:
Quote:
Quote:
#40 | |||||
Member
Location: Los Angeles, CA Join Date: Jul 2011
Posts: 58
Quote:
Quote:
Nevertheless, the junction adapter is what we want.