SEQanswers

01-12-2014, 07:25 PM   #1
BADE
Member
 
Location: Boston

Join Date: Jan 2014
Posts: 13
Trimmomatic dropping 100% sRNA reads

Hi All,

This is my first post on SEQanswers, and I am hoping for some help from senior members. I am analyzing sRNA single-end sequencing data and using Trimmomatic to trim adapters. The problem is that after the trimming process, all the reads are dropped. Here is the summary for one file:

Quote:
TrimmomaticSE: Started with arguments: -threads 52 -phred64 -trimlog C2.log.txt C2.fastq.gz C2.Processed ILLUMINACLIP:./Trimmomatic-0.32/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 CROP:24 MINLEN:21
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 39733090 Surviving: 0 (0.00%) Dropped: 39733090 (100.00%)
TrimmomaticSE: Completed successfully
As you can see, all the trimmed reads have been dropped. I believe this is because of the threshold values used: the palindrome clip threshold, LEADING, TRAILING and SLIDINGWINDOW. Should I be using the palindrome clip threshold? I would really appreciate any help with this problem.

Thanks

BADE
01-13-2014, 03:41 AM   #2
TiborNagy
Senior Member
 
Location: Budapest

Join Date: Mar 2010
Posts: 329

I guess the problem is your CROP parameter. Your adapter sequence is 34 nucleotides long, but the CROP parameter is 24.
01-13-2014, 04:08 AM   #3
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

The palindrome clip parameter will not be a problem, as you are not doing any palindrome trimming (you have SE reads, not PE, and your trimmomatic output says 'Using 0 prefix pairs').

How long are your reads? Your CROP:24 and MINLEN:21 settings are probably the reason none of your reads are surviving.

Which Illumina version are your reads from? Current Illumina versions use the phred33 quality encoding (the -phred33 option). See

http://en.wikipedia.org/wiki/FASTQ_format
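One quick way to check the encoding is to look at the range of quality characters in the file. The Python sketch below is a hypothetical heuristic (not FastQC's or Trimmomatic's code): characters below ';' (ASCII 59) never occur in +64 encodings, so their presence means phred33; characters above 'J' (ASCII 74) are assumed to lie beyond the usual Illumina 1.8+ phred33 range, so they suggest phred64. The function name and cut-offs are assumptions.

```python
from typing import Iterable, Optional


def guess_phred_offset(quality_strings: Iterable[str]) -> Optional[int]:
    """Hypothetical heuristic: guess the FASTQ quality offset (33 or 64).

    Returns None when the sample is empty or ambiguous.
    """
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return None  # nothing to inspect
    if min(codes) < 59:
        # Below ';' cannot appear in phred64 (or Solexa +64) data.
        return 33
    if max(codes) > 74:
        # Above 'J' is assumed to be outside the Illumina 1.8+ phred33 range.
        return 64
    return None  # all characters fall in the overlap of both ranges


print(guess_phred_offset(["IIIIIHHH#"]))  # 33
print(guess_phred_offset(["hhhhgfB"]))    # 64
print(guess_phred_offset(["IIII"]))       # None
```

In practice you would feed it the quality lines (every fourth line) from the first few thousand records of the FASTQ file.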
01-13-2014, 07:45 AM   #4
BADE
Member
 
Location: Boston

Join Date: Jan 2014
Posts: 13

Hi Mastal

Quote:
Originally Posted by mastal View Post
How long are your reads? Your CROP:24 and MINLEN:21 is probably the reason
none of your reads are surviving.
The reads are 50 nt long. I set MINLEN so that reads of length >= 21 after trimming are retained, because a proportion of miRNAs are 21 nt long. I crop at length 24 after trimming because sRNAs longer than 24 nt are basically not miRNAs and are not of particular interest to me. In both cases (MINLEN and CROP), I am assuming the steps are performed on the trimmed read. Am I wrong?

Quote:
What Illumina version are your reads? The current Illumina versions use the -phred33 quality encoding.
FastQC mentions Illumina version 1.9, which uses phred33 according to the wiki link you sent.

These are the results after changing the quality encoding to phred33, with different MINLEN and CROP options:

Quote:
$ ./trimmomatic.sh
TrimmomaticSE: Started with arguments: -threads 52 -phred33 C1.fastq.gz C1.Processed ILLUMINACLIP:./Trimmomatic-0.32/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 CROP:24 MINLEN:21
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 37243929 Surviving: 36503574 (98.01%) Dropped: 740355 (1.99%)
TrimmomaticSE: Completed successfully

TrimmomaticSE: Started with arguments: -threads 52 -phred33 C1fastq.gz C1.ProcessedDefault ILLUMINACLIP:./Trimmomatic-0.32/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 37243929 Surviving: 24107314 (64.73%) Dropped: 13136615 (35.27%)
TrimmomaticSE: Completed successfully
Now reads are surviving, and the number of surviving reads is high with CROP:24 and MINLEN:21. Again, am I right to assume that both of these parameters apply to the trimmed read and not to the adapter?

Best

Bade
01-13-2014, 09:22 AM   #5
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156

Quote:
Originally Posted by BADE View Post
FastQC mentiones Illumina version 1.9 which uses phred33 as per the wiki link you sent.
Using phred64 on phred33 data will usually result in little or no output, since it lowers every quality score by 31 (64 - 33), dropping them below any reasonable threshold. Since this is a common problem, the most recent version of Trimmomatic auto-detects the quality encoding.
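The arithmetic can be checked with a small Python sketch. It decodes the same quality string with both offsets and runs a simplified SLIDINGWINDOW-style check (4-base window, mean threshold 15). The cut rule here (cut at the start of the first failing window) is an assumption and may not match Trimmomatic's exact cut point; the read is hypothetical.

```python
from typing import List


def decode(qual_string: str, offset: int) -> List[int]:
    # Phred score = ASCII code minus the encoding offset (33 or 64).
    return [ord(ch) - offset for ch in qual_string]


def sliding_window_keep(quals: List[int], window: int, required: float) -> int:
    # Simplified SLIDINGWINDOW: scan 5' -> 3' and cut at the start of the first
    # window whose mean quality is below `required`; returns the bases kept.
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < required:
            return start
    return len(quals)


qual = "J" * 50  # hypothetical 50-nt read, every base Q41 when read as phred33
print(sliding_window_keep(decode(qual, 33), 4, 15))  # 50
print(sliding_window_keep(decode(qual, 64), 4, 15))  # 0: each base reads as Q10
```

Even a top-quality Q41 base decodes as Q10 under the wrong offset, so every window fails the threshold of 15, the read is trimmed to nothing, and MINLEN then drops it.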

Quote:
Originally Posted by BADE View Post
Now we have reads surviving and the number of survived reads is high with CROP:24 and MINLEN:21. Again I am assuming that both of these parameters are realted to trimmed read and not adapter?
CROP:24 cuts the read after the 24th base, but will not cause a read to be dropped.

MINLEN:21 will drop all reads shorter than 21 bases, but will not shorten or otherwise modify the reads.
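As a minimal Python sketch of the difference (function names are hypothetical, not Trimmomatic's API): a CROP-like step shortens a read but always returns it, while a MINLEN-like step leaves the read unchanged and only decides whether it is kept.

```python
from typing import Tuple


def crop(seq: str, qual: str, length: int) -> Tuple[str, str]:
    # CROP-like: keep at most the first `length` bases; never drops the read.
    return seq[:length], qual[:length]


def passes_minlen(seq: str, min_len: int) -> bool:
    # MINLEN-like: True keeps the read, False drops it; the read is not modified.
    return len(seq) >= min_len


# Hypothetical trimmed reads of 30, 22 and 18 bases.
for length in (30, 22, 18):
    seq, qual = crop("A" * length, "I" * length, 24)
    print(length, "->", len(seq), "kept" if passes_minlen(seq, 21) else "dropped")
# 30 -> 24 kept
# 22 -> 22 kept
# 18 -> 18 dropped
```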

To answer the question in the original post: Trimmomatic steps are applied to the read (or to the pair, if you are using pairs) in the order specified. The first step gets the whole read or pair, and each subsequent step works on whatever survived the previous steps.
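The ordering rule can be sketched in Python as a chain of step functions, each receiving the output of the previous one and able to drop the read by returning None. This is a hypothetical model for illustration, not Trimmomatic's internals; the step names only mirror the options used in this thread.

```python
from typing import Callable, List, Optional, Tuple

Read = Tuple[str, List[int]]             # (bases, Phred scores)
Step = Callable[[Read], Optional[Read]]  # returns None to drop the read


def trailing(min_qual: int) -> Step:
    def step(read: Read) -> Optional[Read]:
        seq, qual = read
        end = len(seq)
        while end > 0 and qual[end - 1] < min_qual:  # strip low-quality 3' bases
            end -= 1
        return seq[:end], qual[:end]
    return step


def crop(length: int) -> Step:
    def step(read: Read) -> Optional[Read]:
        seq, qual = read
        return seq[:length], qual[:length]  # shorten, never drop
    return step


def minlen(min_len: int) -> Step:
    def step(read: Read) -> Optional[Read]:
        return read if len(read[0]) >= min_len else None  # drop, never shorten
    return step


def run_pipeline(read: Read, steps: List[Step]) -> Optional[Read]:
    for step in steps:  # each step sees only what survived the earlier ones
        result = step(read)
        if result is None:
            return None
        read = result
    return read


# Hypothetical 50-nt read whose last two bases are low quality.
read = ("ACGT" * 12 + "NN", [30] * 48 + [2, 2])

for steps in ([trailing(3), crop(24), minlen(21)],
              [crop(24), minlen(30)],
              [minlen(30), crop(24)]):
    result = run_pipeline(read, steps)
    print(None if result is None else len(result[0]))
# 24
# None  (MINLEN:30 runs after CROP:24)
# 24    (MINLEN:30 runs before CROP:24)
```

The last two runs use the same two steps in opposite orders, which is why the order of options on the command line matters.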

Hope this helps,

Tony.
01-13-2014, 09:28 AM   #6
BADE
Member
 
Location: Boston

Join Date: Jan 2014
Posts: 13

Hi Tony (and All),

Thanks for confirming. I think I am on the right track then. Many thanks.

BADE
01-13-2014, 09:55 PM   #7
relipmoc
Member
 
Location: Los Angeles, CA

Join Date: Jul 2011
Posts: 58

Trimmomatic is widely used, in part because it is written in Java and runs easily on various platforms. But if you are after simplicity and efficiency, you could try skewer, which also performs well at small RNA adapter trimming.

For your case, you could run the following command:
$ skewer --min 21 --max 24 -t 8 -x TruSeq3-SE.fa C2.fastq.gz

where content of TruSeq3-SE.fa is:
>TruSeq3_IndexedAdapter
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>TruSeq3_UniversalAdapter
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
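Both tools look for the adapter sequences listed in TruSeq3-SE.fa in each read and clip from where the adapter starts. The Python sketch below is a deliberately simplified stand-in (exact matching only, no mismatches or quality-aware scoring), not the algorithm used by skewer or by Trimmomatic's ILLUMINACLIP; the function name, the min_overlap default and the insert sequence are hypothetical.

```python
def clip_3prime_adapter(read: str, adapter: str, min_overlap: int = 10) -> str:
    """Remove a 3' adapter using exact matching only (simplified sketch)."""
    for start in range(len(read)):
        tail = read[start:]
        overlap = min(len(tail), len(adapter))
        if overlap < min_overlap:
            break  # remaining tails are too short to call an adapter
        # The tail must either begin with the whole adapter, or be a prefix of
        # the adapter (the adapter runs off the 3' end of the read).
        if tail[:overlap] == adapter[:overlap]:
            return read[:start]
    return read


adapter = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # TruSeq3_IndexedAdapter from above
insert = "TGAGGTAGTAGGTTGTATAGTT"              # arbitrary 22-nt small RNA insert
read = insert + adapter[:28]                    # 50-nt read ending in partial adapter
print(clip_3prime_adapter(read, adapter))       # TGAGGTAGTAGGTTGTATAGTT
```

With 50-nt reads and ~22-nt miRNAs, most of each read is adapter, which is why adapter clipping has to run before CROP and MINLEN.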

Last edited by relipmoc; 01-14-2014 at 07:39 AM.
03-02-2016, 09:47 AM   #8
newfie
Junior Member
 
Location: Corner Brook

Join Date: Feb 2016
Posts: 8

Hi all,

This is my first post on SEQanswers. I am also working with small RNA, and I found this thread very useful for understanding the different parameters used in Trimmomatic. I have only just started learning, so I have a few questions about the parameters used by BADE (the user who opened this thread). BADE used LEADING:3 TRAILING:3.

Does it mean that the program cuts 3 bases off the start and end of the read if they fall below the threshold quality?

If so, why does it need to be 3? Is it an optimal value? Is there a rationale for choosing 3?

When I look at my quality control report, I can see that the per-base sequence quality drops towards the end of the read. Does that mean I only have to focus on the end of the read and not the start?

Also, what is the difference between SLIDINGWINDOW and LEADING? They seem the same to me.

Sorry if I am asking too many questions at once, and sorry if my questions are stupid. I am just learning.

Thanks in advance for answering
03-02-2016, 02:35 PM   #9
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

Hi,

I think you should probably read the trimmomatic manual.

http://www.usadellab.org/cms/uploads...nual_V0.32.pdf

The parameter used with LEADING: and TRAILING: is a base quality score threshold, not a number of bases. These steps were designed to remove Ns from the 3' or 5' ends of the reads.
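A minimal Python sketch of LEADING/TRAILING-style trimming (a hypothetical model, not Trimmomatic's source): bases are removed one at a time from each end for as long as their quality is below the threshold, so the number removed depends on the data and is not fixed at 3. SLIDINGWINDOW differs in that it looks at the average quality over a window of bases rather than at single bases at the ends.

```python
from typing import List, Tuple


def leading_trailing(quals: List[int], leading: int, trailing: int) -> Tuple[int, int]:
    """Return (start, end) of the bases kept, as a half-open slice.

    Strip bases from the 5' end while quality < `leading`, then from the
    3' end while quality < `trailing`.
    """
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    return start, end


quals = [2, 2, 35, 35, 12, 35, 30, 2]  # hypothetical per-base Phred scores
print(leading_trailing(quals, 3, 3))   # (2, 7)
```

Two bases go from the start and one from the end, while the Q12 base in the middle survives because these steps only look at the read ends.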
Tags: illumina sequencing, srna, trimming, trimmomatic

All times are GMT -8.

