SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
paired-end adapter trimming vinay052003 Bioinformatics 16 05-02-2017 07:58 PM
Paired-end Illumina RNA-seq adapter trimming fabrice Bioinformatics 8 01-05-2015 07:48 AM
Illumina paired-end reads. More than 2 adapter sequences. RedLightPanic Illumina/Solexa 8 03-07-2013 12:27 PM
paired-end reads mapped to genome.. gene with only one direction of paired-end reads? danwiththeplan Bioinformatics 2 09-22-2011 02:06 AM
PerM is an ultra-fast and sensitive SOLiD reads mapping tool KevinLam Bioinformatics 7 06-18-2010 03:03 AM

Reply
 
Thread Tools
Old 06-04-2014, 10:09 PM   #21
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 453
Default

Hi Replimoc,

thanks for the tip with the barcoded adapters. A very nice feature.

I had the strange results when trimming paired end data using the parameter "-l 20" .
All the read pairs containing forward reads shorter than 20 bases were indeed filtered out, but not all of the read pairs containing reverse reads shorter than 20 bases.

Btw, does skewer search for the reverse complements of the adapters by default (likely not in the paired mode)?
luc is offline   Reply With Quote
Old 06-05-2014, 07:01 AM   #22
relipmoc
Member
 
Location: Los Angeles, CA

Join Date: Jul 2011
Posts: 58
Default

Quote:
Originally Posted by luc View Post
I had the strange results when trimming paired end data using the parameter "-l 20" .
All the read pairs containing forward reads shorter than 20 bases were indeed filtered out, but not all of the read pairs containing reverse reads shorter than 20 bases.
Could you show us the problematic PE reads in FASTQ format? So that I can figure out what's wrong with the program.

Quote:
Originally Posted by luc View Post
Btw, does skewer search for the reverse complements of the adapters by default (likely not in the paired mode)?
The answer is NO.
relipmoc is offline   Reply With Quote
Old 06-06-2014, 08:35 AM   #23
blsfoxfox
Member
 
Location: auburn

Join Date: Jan 2013
Posts: 12
Default trimmed reads longer than the length!

Hi Relipmoc,

Thank you for this software. I met a problem may need your help.

I am dealing with the Hiseq 2500 data with Nextra Mate Pair and following is the parameters used:

skewer-0.1.114-linux-x86_64 -x GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -y GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -j CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG -m mp -k 9 -f sanger -l 30 -L 150 -o skewer_library1_2 1.fastq 2.fastq

-- 3' end adapter sequence (-x): GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
-- paired 3' end adapter sequence (-y): GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
-- junction adapter sequence (-j): CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG
-- maximum error ratio allowed (-r): 0.100
-- maximum indel error ratio allowed (-d): 0.030
-- minimum read length allowed after trimming (-l): 30
-- maximum read length for output (-L): 150
-- file format (-f): Sanger/Illumina 1.8+ FASTQ
-- minimum overlap length for junction adapter detection (-k): 9
Wed Jun 4 15:28:27 2014 >> started

Thu Jun 5 10:40:33 2014 >> done (69126.658s)
208936993 read pairs processed; of these:
93035 ( 0.04%) non-junction read pairs filtered out by contaminant control
29290940 (14.02%) short read pairs filtered out after trimming by size control
6182785 ( 2.96%) empty read pairs filtered out after trimming by size control
173370233 (82.98%) read pairs available; of these:
94951230 (54.77%) trimmed read pairs available after processing
78419003 (45.23%) untrimmed read pairs available after processing

And the Length distribution of reads after trimming provided by skewer shows the maximum reads are 150bp.

However, when I test the result with FastQC, I found there are many reads longer than 150bp ( please see the attachment). I also found those "long" reads by eyeballing in the result file.

I would like to know have you ever experienced something like this? What would be the reason you think?

P.S I have tried this with and without -L 150, and there are longer reads in both cases.

Thanks,
Attached Images
File Type: jpg skewer fastqc.jpg (94.8 KB, 21 views)
blsfoxfox is offline   Reply With Quote
Old 06-08-2014, 09:51 AM   #24
relipmoc
Member
 
Location: Los Angeles, CA

Join Date: Jul 2011
Posts: 58
Default

Quote:
Originally Posted by blsfoxfox View Post
Hi Relipmoc,

Thank you for this software. I met a problem may need your help.

I am dealing with the Hiseq 2500 data with Nextra Mate Pair and following is the parameters used:

skewer-0.1.114-linux-x86_64 -x GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -y GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -j CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG -m mp -k 9 -f sanger -l 30 -L 150 -o skewer_library1_2 1.fastq 2.fastq

-- 3' end adapter sequence (-x): GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
-- paired 3' end adapter sequence (-y): GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
-- junction adapter sequence (-j): CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG
-- maximum error ratio allowed (-r): 0.100
-- maximum indel error ratio allowed (-d): 0.030
-- minimum read length allowed after trimming (-l): 30
-- maximum read length for output (-L): 150
-- file format (-f): Sanger/Illumina 1.8+ FASTQ
-- minimum overlap length for junction adapter detection (-k): 9
Wed Jun 4 15:28:27 2014 >> started

Thu Jun 5 10:40:33 2014 >> done (69126.658s)
208936993 read pairs processed; of these:
93035 ( 0.04%) non-junction read pairs filtered out by contaminant control
29290940 (14.02%) short read pairs filtered out after trimming by size control
6182785 ( 2.96%) empty read pairs filtered out after trimming by size control
173370233 (82.98%) read pairs available; of these:
94951230 (54.77%) trimmed read pairs available after processing
78419003 (45.23%) untrimmed read pairs available after processing

And the Length distribution of reads after trimming provided by skewer shows the maximum reads are 150bp.

However, when I test the result with FastQC, I found there are many reads longer than 150bp ( please see the attachment). I also found those "long" reads by eyeballing in the result file.

I would like to know have you ever experienced something like this? What would be the reason you think?

P.S I have tried this with and without -L 150, and there are longer reads in both cases.

Thanks,
Hi blsfoxfox,

Thank you very much for your feedback! The name of the parameter is misleading. Its actual meaning is the maximum equivalent read length. For example, if the length of trimmed read 1 is 224 and the length of trimmed read 2 is 40, then the equivalent read length is int((224 + 40) / 2) = 132. Therefore, using "-L 150" can not filter out this read pair. But if you use "-L 120", you can filter out this read pair.

For your case, you can try "-L 75". But I guess this is not what you want. we may upgrade skewer to add another parameter for clipping bases after a specified length.
relipmoc is offline   Reply With Quote
Old 06-11-2014, 09:36 PM   #25
blsfoxfox
Member
 
Location: auburn

Join Date: Jan 2013
Posts: 12
Default

Quote:
Originally Posted by relipmoc View Post
Hi blsfoxfox,

Thank you very much for your feedback! The name of the parameter is misleading. Its actual meaning is the maximum equivalent read length. For example, if the length of trimmed read 1 is 224 and the length of trimmed read 2 is 40, then the equivalent read length is int((224 + 40) / 2) = 132. Therefore, using "-L 150" can not filter out this read pair. But if you use "-L 120", you can filter out this read pair.

For your case, you can try "-L 75". But I guess this is not what you want. we may upgrade skewer to add another parameter for clipping bases after a specified length.
Thank you for the response! You're right, I would like to clip bases in each reads file.

Actually, I am more curious about why would skewer produce trimmed reads longer than original one? Then we may avoid getting the long reads and do not need another parameter to deal with it.

By the way, skewer is really fast
blsfoxfox is offline   Reply With Quote
Old 06-13-2014, 08:47 AM   #26
relipmoc
Member
 
Location: Los Angeles, CA

Join Date: Jul 2011
Posts: 58
Default

Quote:
Originally Posted by blsfoxfox View Post
Thank you for the response! You're right, I would like to clip bases in each reads file.
We will add a parameter for clipping bases in the future versions.

Quote:
Originally Posted by blsfoxfox View Post
Actually, I am more curious about why would skewer produce trimmed reads longer than original one? Then we may avoid getting the long reads and do not need another parameter to deal with it.
Good question! For Nextera long mate-pair (LMP) reads, skewer first treats them as normal paired-end (PE) reads and trims adapters from them. The trimmed reads correspond to fragments that were originally shorter than the read length. If no junction adapter was found within it, then the trimmed read pair is marked as a non-junction read pair which should be removed as it is contaminant.

Otherwise, non-trimmed reads correspond to fragments that are originally equal to or greater than the read length. These read pairs can be classified into three classes. 1) junction adapters are found in the middle of both reads of the pair; 2) junction adapter is found in the middle of one read of the pair; 3) junction adapter is not found in either read of the pair. For class 1), skewer just trims the junction adapters as in single end (SE) cases; for class 2), without loss of generality, suppose read 1 contains junction adapter while read 2 does not contain junction adapter, skewer searches the best overlap between 3' end of read 1 and 5' end of the reverse complement of read 2 , if the overlap is after the junction adapter region of read 1, then the sub-sequences after junction adapter region of read 1 is transferred to its reverse-complemented counterpart and appended to read 2. Then you can find some reads have lengths greater than read length after adapter trimming.

Quote:
Originally Posted by blsfoxfox View Post
By the way, skewer is really fast
Thank you for the praise!

Last edited by relipmoc; 06-14-2014 at 04:00 PM.
relipmoc is offline   Reply With Quote
Old 06-13-2014, 08:56 AM   #27
relipmoc
Member
 
Location: Los Angeles, CA

Join Date: Jul 2011
Posts: 58
Default skewer has been accepted as a methodology paper in BMC Bioinformatics

If you find skewer is useful for your study, please kindly cite it in your paper. Thank you!

BMC Bioinformatics.2014, 15:182
DOI: 10.1186/1471-2105-15-182
URL: http://www.biomedcentral.com/1471-2105/15/182

Last edited by relipmoc; 06-13-2014 at 09:00 AM. Reason: :)
relipmoc is offline   Reply With Quote
Old 07-14-2014, 07:40 AM   #28
ug14cxb
Junior Member
 
Location: England

Join Date: Jul 2014
Posts: 2
Default

The source code: https://github.com/relipmoc/skewer is here.
I would have thought it would be on sourceforge but github is way better.
Thanks for sharing this

Last edited by ug14cxb; 07-24-2014 at 02:13 AM.
ug14cxb is offline   Reply With Quote
Old 07-30-2014, 08:25 PM   #29
MikhailFokin
Member
 
Location: NZ

Join Date: Mar 2014
Posts: 15
Question

Quote:
Originally Posted by relipmoc View Post
Good question! For Nextera long mate-pair (LMP) reads, skewer first treats them as normal paired-end (PE) reads and trims adapters from them. The trimmed reads correspond to fragments that were originally shorter than the read length. If no junction adapter was found within it, then the trimmed read pair is marked as a non-junction read pair which should be removed as it is contaminant.

Otherwise, non-trimmed reads correspond to fragments that are originally equal to or greater than the read length. These read pairs can be classified into three classes. 1) junction adapters are found in the middle of both reads of the pair; 2) junction adapter is found in the middle of one read of the pair; 3) junction adapter is not found in either read of the pair. For class 1), skewer just trims the junction adapters as in single end (SE) cases; for class 2), without loss of generality, suppose read 1 contains junction adapter while read 2 does not contain junction adapter, skewer searches the best overlap between 3' end of read 1 and 5' end of the reverse complement of read 2 , if the overlap is after the junction adapter region of read 1, then the sub-sequences after junction adapter region of read 1 is transferred to its reverse-complemented counterpart and appended to read 2. Then you can find some reads have lengths greater than read length after adapter trimming.
Thank you much for this explanation! It is somehow strange why the SEQanswers is the only place where it was explained And there are still few questions about the way how Skewer process Nextera libraries.

1. For the 1st case does "SE trimming" mean removing junction adapter and following sequence till the 5' end as well?
2. For the second case - adaptor in a one read only
(A) what is "the best overlap" - length? mismatches?
(B) what Skewer does is there is no overlap between reads?
3. How to switch of trimming of external adaptors?
4. In the analysis below it is not clear what is "549499 (24.26%) untrimmed read pairs available after processing", how can any untrimmed reads being present in result? not removed to "5968 ( 0.20%) non-junction read pairs filtered out by contaminant control"

skewer -m mp -t 16 -k 30 -l 40 -b S4-R1.fastq S4-R2.fastq

Parameters used:
-- 3' end adapter sequence (-x): AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
-- paired 3' end adapter sequence (-y): AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
-- junction adapter sequence (-j): CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG
-- maximum error ratio allowed (-r): 0.100
-- maximum indel error ratio allowed (-d): 0.030
-- minimum read length allowed after trimming (-l): 40
-- file format (-f): Sanger/Illumina 1.8+ FASTQ (auto detected)
-- minimum overlap length for junction adapter detection (-k): 30
-- number of concurrent threads (-t): 16

3016744 read pairs processed; of these:
5968 ( 0.20%) non-junction read pairs filtered out by contaminant control
725620 (24.05%) short read pairs filtered out after trimming by size control
20306 ( 0.67%) empty read pairs filtered out after trimming by size control
2264850 (75.08%) read pairs available; of these:
1715351 (75.74%) trimmed read pairs available after processing
549499 (24.26%) untrimmed read pairs available after processing

Barcode dispatch after trimming:
category count percentage:
X01Y01 1422074 82.90%



Thank you...
MikhailFokin is offline   Reply With Quote
Old 08-04-2014, 10:17 PM   #30
relipmoc
Member
 
Location: Los Angeles, CA

Join Date: Jul 2011
Posts: 58
Default

Quote:
Originally Posted by MikhailFokin View Post
1. For the 1st case does "SE trimming" mean removing junction adapter and following sequence till the 5' end as well?
"SE trimming" means removing junction adapter and its following sequence at the 3' end.

Quote:
Originally Posted by MikhailFokin View Post
2. For the second case - adaptor in a one read only
(A) what is "the best overlap" - length? mismatches?
(B) what Skewer does is there is no overlap between reads?
(A) There may be several candidate overlap sites, the best overlap is selected according to the scoring scheme presented in the paper. The threshold for the overlap detection is proportional to the -r threshold specified by the user.
(B) no additional action for this case

Quote:
Originally Posted by MikhailFokin View Post
3. How to switch of trimming of external adaptors?
Do you mean to trim the external adapters only? For research purpose, you may use PE mode instead of MP mode. But it is not recommended.

Quote:
Originally Posted by MikhailFokin View Post
4. In the analysis below it is not clear what is "549499 (24.26%) untrimmed read pairs available after processing", how can any untrimmed reads being present in result? not removed to "5968 ( 0.20%) non-junction read pairs filtered out by contaminant control"

skewer -m mp -t 16 -k 30 -l 40 -b S4-R1.fastq S4-R2.fastq

Parameters used:
-- 3' end adapter sequence (-x): AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
-- paired 3' end adapter sequence (-y): AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
-- junction adapter sequence (-j): CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG
-- maximum error ratio allowed (-r): 0.100
-- maximum indel error ratio allowed (-d): 0.030
-- minimum read length allowed after trimming (-l): 40
-- file format (-f): Sanger/Illumina 1.8+ FASTQ (auto detected)
-- minimum overlap length for junction adapter detection (-k): 30
-- number of concurrent threads (-t): 16

3016744 read pairs processed; of these:
5968 ( 0.20%) non-junction read pairs filtered out by contaminant control
725620 (24.05%) short read pairs filtered out after trimming by size control
20306 ( 0.67%) empty read pairs filtered out after trimming by size control
2264850 (75.08%) read pairs available; of these:
1715351 (75.74%) trimmed read pairs available after processing
549499 (24.26%) untrimmed read pairs available after processing

Barcode dispatch after trimming:
category count percentage:
X01Y01 1422074 82.90%



Thank you...
It means the 3rd case which is different from the case of non-junction read pairs. For the 3rd case, we can not declare that there is no junction adapter in the fragment. However, for the non-junction read pairs, the fragment length is shorter than the read length, we can declare confidently that they do not contain junction adapters.
relipmoc is offline   Reply With Quote
Old 08-05-2014, 01:43 AM   #31
MikhailFokin
Member
 
Location: NZ

Join Date: Mar 2014
Posts: 15
Question

Thank you much for the reply!

Quote:
Originally Posted by relipmoc View Post
It means the 3rd case which is different from the case of non-junction read pairs. For the 3rd case, we can not declare that there is no junction adapter in the fragment. However, for the non-junction read pairs, the fragment length is shorter than the read length, we can declare confidently that they do not contain junction adapters.
just to be sure. correct me if i am not right. there are two cases of non-detected JA(junction adapter) in a pair
1. 5968 ( 0.20%) non-junction read pairs filtered out by contaminant control
having miseq read of 300pb, do only fragments shorter than 300 bp w/o JA belong to this group? or only that with an overlap between R1 and R2?
2. 549499 (24.26%) untrimmed read pairs available after processing
here are almost all pairs w/o detected JA?
so having the fragment of the size (300)+N1+JA+N2+(300) - is it in this group? do you recommend to exclude this group from the de-novo assembly?
MikhailFokin is offline   Reply With Quote
Old 08-07-2014, 05:37 PM   #32
relipmoc
Member
 
Location: Los Angeles, CA

Join Date: Jul 2011
Posts: 58
Default

Quote:
Originally Posted by MikhailFokin View Post
just to be sure. correct me if i am not right. there are two cases of non-detected JA(junction adapter) in a pair
1. 5968 ( 0.20%) non-junction read pairs filtered out by contaminant control having miseq read of 300pb, do only fragments shorter than 300 bp w/o JA belong to this group? or only that with an overlap between R1 and R2?
Fragments equal to or shorter than 300 bp without JA belong to this group.

Quote:
Originally Posted by MikhailFokin View Post
2. 549499 (24.26%) untrimmed read pairs available after processing
here are almost all pairs w/o detected JA?
so having the fragment of the size (300)+N1+JA+N2+(300) - is it in this group?
Fragments longer than 300 bp and none of the two paired reads having JA detected belong to this group.
Quote:
Originally Posted by MikhailFokin View Post
do you recommend to exclude this group from the de-novo assembly?
These fragments tend to have long insert sizes. They may have JA or not. For an aggressive de-novo assembly, it is recommended to include this group; otherwise, to be more accurate and conservative, it is recommended to exclude this group.

Note that, having JA is a prerequisite for a correctly constructed MP (Mate Pair) read.
relipmoc is offline   Reply With Quote
Old 08-11-2014, 02:09 AM   #33
trotos
Member
 
Location: Cologne,Germany

Join Date: May 2012
Posts: 12
Default recomended command for paired ends

Hi, replimoc

I'm realy new to all this staff so I would like a guide. I did an MNase-seq experiment I got paired end reads and I got the following fastqc results:
Quote:
Overrepresented sequences
Sequence Count Percentage Possible Source
CGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGA 9634298 3.823442763780559 Illumina Paired End PCR Primer 2 (100% over 38bp)
CAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCG 402848 0.15987322278213426 Illumina Paired End PCR Primer 2 (100% over 43bp)
CGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAA 362862 0.14400448150461417 Illumina Paired End PCR Primer 2 (100% over 49bp)
with per base sequence quality like in index1 file for the first pair

and for the second

Quote:
Overrepresented sequences
Sequence Count Percentage Possible Source
CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATC 3344349 1.3272297559829216 Illumina Single End PCR Primer 1 (100% over 38bp)
CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGGGTAGATC 857909 0.34046756266333217 Illumina Single End PCR Primer 1 (97% over 38bp)
CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGGAAGAGGGTAGATC 275742 0.10943026202535763 Illumina Single End PCR Primer 1 (96% over 30bp)
CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGGAAGAGTGTAGATC 255724 0.10148596995079658 Illumina Single End PCR Primer 1 (97% over 38bp)
with per base sequence quality like in index2 file for the second pair

in both cases fastqc said both per base pair qualities are ok

1. what is the best way to remove those adapters without doing any filtering of the reads, per base quality or anyother.
2. also, in addition how can I chop stuff from the 3' end of both files without again doing any quality control filtering.
Attached Images
File Type: png index1.png (9.3 KB, 5 views)
File Type: png index2.png (10.2 KB, 3 views)
trotos is offline   Reply With Quote
Old 08-12-2014, 11:09 PM   #34
MikhailFokin
Member
 
Location: NZ

Join Date: Mar 2014
Posts: 15
Default

Sorry, one dull question more
Is there any way to redirect result files into directory different from one with input data? Not to stdout... seems that -o option is for base name only?
MikhailFokin is offline   Reply With Quote
Old 08-13-2014, 03:07 AM   #35
MikhailFokin
Member
 
Location: NZ

Join Date: Mar 2014
Posts: 15
Question

And one more Why there is the difference in count of reads in "trimmed read pairs available after processing" and "after trimming barcode dispatch"? Should not be the same? For example - I dont' want to include reads w/o JA into further processing - and put -b option to have them in separate files? Is it right?

I can can easily manipulate number of "trimmed read pairs available after processing" by changing stringency options, but this slightly affects real output... see 2 cases below

default settings
Wed Aug 13 22:42:51 2014 >> done (0.139s)
1000 read pairs processed; of these:
2 ( 0.20%) degenerative read pairs filtered out
2 ( 0.20%) non-junction read pairs filtered out by contaminant control
86 ( 8.60%) short read pairs filtered out after trimming by size control
2 ( 0.20%) empty read pairs filtered out after trimming by size control
908 (90.80%) read pairs available; of these:
718 (79.07%) trimmed read pairs available after processing
190 (20.93%) untrimmed read pairs available after processing

Barcode dispatch after trimming:
category count percentage:

X01Y01 575 80.08%

relaxed settings

Wed Aug 13 22:48:16 2014 >> done (0.257s)
1000 read pairs processed; of these:
0 ( 0.00%) degenerative read pairs filtered out
5 ( 0.50%) non-junction read pairs filtered out by contaminant control
65 ( 6.50%) empty read pairs filtered out after trimming by size control
930 (93.00%) read pairs available; of these:
908 (97.63%) trimmed read pairs available after processing
22 ( 2.37%) untrimmed read pairs available after processing

Barcode dispatch after trimming:
category count percentage:

X01Y01 767 84.47%

And it is very strange, that even having k=14 (pls check settings below), I've got plenty of adaptors of 17-24bp (w/o mm or indels) left in the final untrimmed file... for example from 286 untrimmed sequences (from 1000) 95 contain single JA. So almost all single JA were left in the final untrimmed file. Strong reducing of stringency to -r 0.3 -d 0.2 doesn't affect the result significantly. What else should I change to detect these single JAs?


Parameters used:
-- 3' end adapter sequence (-x): AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
-- paired 3' end adapter sequence (-y): GATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTAATCTCGTATGCCGTCTTCTGCTTG
-- junction adapter sequence (-j): CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG
-- maximum error ratio allowed (-r): 0.100
-- maximum indel error ratio allowed (-d): 0.030
-- minimum read length allowed after trimming (-l): 0
-- file format (-f): Sanger/Illumina 1.8+ FASTQ (auto detected)
-- minimum overlap length for junction adapter detection (-k): 14
-- number of concurrent threads (-t): 4

Last edited by MikhailFokin; 08-13-2014 at 06:57 PM. Reason: more questions and details
MikhailFokin is offline   Reply With Quote
Old 08-24-2014, 02:04 PM   #36
relipmoc
Member
 
Location: Los Angeles, CA

Join Date: Jul 2011
Posts: 58
Default

Quote:
Originally Posted by MikhailFokin View Post
And one more Why there is the difference in count of reads in "trimmed read pairs available after processing" and "after trimming barcode dispatch"? Should not be the same? For example - I dont' want to include reads w/o JA into further processing - and put -b option to have them in separate files? Is it right?
Sorry for not giving you a quick reply! In fact, JA (Junction Adapter) has nothing to do with barcode dispatch. JA can be regarded as a marker for correctly constructed LMP pairs. However, JA may be undetectable for some correctly constructed LMP pairs. For barcode dispatch, PE adapters instead of JA are used.

For your case, you may use '-u' instead of '-b' to filter out the so-called "undetermined mate-pair reads" (The original fragments are equal to or greater than the read length, meanwhile JA is not found in either read of the pair). It is not recommended to include only those reads that have JA found. BTW: you helped me to found a bug in the program, the statistics for "barcode dispatch after trimming" is not correct! It should be those that have PE adapters detected. I'll update the program and release it after fully testing. Thanks!


Quote:
Originally Posted by MikhailFokin View Post
And it is very strange, that even having k=14 (pls check settings below), I've got plenty of adaptors of 17-24bp (w/o mm or indels) left in the final untrimmed file... for example from 286 untrimmed sequences (from 1000) 95 contain single JA. So almost all single JA were left in the final untrimmed file. Strong reducing of stringency to -r 0.3 -d 0.2 doesn't affect the result significantly. What else should I change to detect these single JAs?
Could you send me the FastQ files that cause this problem?
relipmoc is offline   Reply With Quote
Old 08-24-2014, 02:43 PM   #37
relipmoc
Member
 
Location: Los Angeles, CA

Join Date: Jul 2011
Posts: 58
Default

Quote:
Originally Posted by MikhailFokin View Post
Sorry, one dull question more
Is there any way to redirect result files into directory different from one with input data? Not to stdout... seems that -o option is for base name only?
Thanks for this question! Now the result files can be redirected into a directory using -o. The difference is that a directory name must end with a slash '/'. I'll release the updated version soon.

Last edited by relipmoc; 08-25-2014 at 05:24 AM. Reason: typo
relipmoc is offline   Reply With Quote
Old 09-27-2014, 05:55 AM   #38
relipmoc
Member
 
Location: Los Angeles, CA

Join Date: Jul 2011
Posts: 58
Default

Quote:
Originally Posted by blsfoxfox View Post
Actually, I am more curious about why would skewer produce trimmed reads longer than original one? Then we may avoid getting the long reads and do not need another parameter to deal with it.
Now skewer provides an option for it. Please download the updated version from http://sourceforge.net/projects/skewer.
relipmoc is offline   Reply With Quote
Old 09-29-2014, 12:35 AM   #39
Medhat
Member
 
Location: Poland

Join Date: Jun 2013
Posts: 37
Default

Hi,

First I used

Quote:
prinseq-lite-0.20.4/prinseq-lite -fastq R1.fq -fastq2 R2.fq -out_format 3 -out_good good_reads -out_bad bad_reads -phred64 -log 8_.log -graph_data 8_.gd -graph_stats gc,qd,ns,pt,ts,da -min_qual_mean 30 -trim_qual_right 30 -trim_qual_window 2 -trim_qual_type mean -min_len 15 > run.log 2>&1
Then I used your software as:

Quote:
skewer/skewer-0.1.118-linux-x86_64 -x GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -y GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -j CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG -m mp -t 20 good_reads_1.fastq good_reads_2.fastq > terminning-run.log 2>&1
After that I used bowtie to see how the alignment will go but;

Quote:
Time loading reference: 00:00:08
Time loading forward index: 00:00:12
Time loading mirror index: 00:00:12
Seeded quality full-index search: 00:07:55
# reads processed: 1866894
# reads with at least one reported alignment: 131512 (7.04%)
# reads that failed to align: 1735382 (92.96%)
Reported 926715 paired-end alignments to 1 output stream(s)
Time searching: 00:08:27
Overall time: 00:08:27
so did I used the parameter in wrong way? or what I shall add or change?
Medhat is offline   Reply With Quote
Old 09-29-2014, 08:48 AM   #40
relipmoc
Member
 
Location: Los Angeles, CA

Join Date: Jul 2011
Posts: 58
Default

Quote:
Originally Posted by Medhat View Post
Hi,

First I used

Quote:
prinseq-lite-0.20.4/prinseq-lite -fastq R1.fq -fastq2 R2.fq -out_format 3 -out_good good_reads -out_bad bad_reads -phred64 -log 8_.log -graph_data 8_.gd -graph_stats gc,qd,ns,pt,ts,da -min_qual_mean 30 -trim_qual_right 30 -trim_qual_window 2 -trim_qual_type mean -min_len 15 > run.log 2>&1
Then I used your software as:

Quote:
skewer/skewer-0.1.118-linux-x86_64 -x GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -y GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -j CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG -m mp -t 20 good_reads_1.fastq good_reads_2.fastq > terminning-run.log 2>&1
After that I used bowtie to see how the alignment will go but;

Quote:
Time loading reference: 00:00:08
Time loading forward index: 00:00:12
Time loading mirror index: 00:00:12
Seeded quality full-index search: 00:07:55
# reads processed: 1866894
# reads with at least one reported alignment: 131512 (7.04%)
# reads that failed to align: 1735382 (92.96%)
Reported 926715 paired-end alignments to 1 output stream(s)
Time searching: 00:08:27
Overall time: 00:08:27
so did I used the parameter in wrong way? or what I shall add or change?
Suggestion: 1) use adapter trimming before quality trimming, because quality trimming may decay the paired information which is useful for adapter detection; 2) in your case of using skewer, you may input the following command:

Quote:
skewer/skewer-0.1.118-linux-x86_64 -m mp -t 20 good_reads_1.fastq good_reads_2.fastq > terminning-run.log 2>&1
Note that the adapter sequences provided by Illumina is not appropriate for adapter trimming. In fact, there's an additional step called "Adenylate the 3' Ends" before "Ligate Paired-End Adapters" in the Paired-End Sample Preparation Protocol. Therefore, an additional leading 'A' should be added before the vendor provided sequence.

Nevertheless, the junction adapter is what we want.
relipmoc is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 07:33 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO