**kga1978** · 12-08-2011, 01:51 AM

I tried running these samples through PrinSeq and cutadapt as well, with very similar results. This means that the problem isn't specific to Trimmomatic, but I'm still interested to hear whether anybody knows what is causing it. I guess it only happens on really low-quality reads?

**tonybolger** · 12-08-2011, 04:03 AM

Originally posted by kga1978 View Post

I have been using Trimmomatic to trim adapters and quality scores. In general, I have been pleased with the performance, but I just ran some low quality samples through and Trimmomatic doesn't appear to be trimming correctly based on quality?

Strange indeed.

Is your data really phred33 as suggested in the command line? Illumina 1.5 is normally phred64.

**kga1978** · 12-08-2011, 11:43 AM

To be perfectly honest, I'm not sure - the quality score thing is doing my head in (damn you, Illumina!). I assumed if it was phred64, my maximum score would be higher than 40, no? I'll try and rerun with phred64 and see what happens.

**tonybolger** · 12-09-2011, 02:29 AM

Originally posted by kga1978 View Post

To be perfectly honest, I'm not sure - the quality score thing is doing my head in (damn you, Illumina!). I assumed if it was phred64, my maximum score would be higher than 40, no? I'll try and rerun with phred64 and see what happens.

If the data really is phred-64 but trimmomatic is told that it is phred33, trimmomatic will interpret each score as 31 higher than it really is - thus not really trimming much since the quality appears 'excellent'. I really should add a warning if the quality scores are outside the expected range, as this is nearly always caused by wrong phred-33/phred-64 selection, and results in either no trimming, or almost everything trimmed, depending on the direction of the mistake.
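The offset mix-up can be illustrated with a small sketch. This is not Trimmomatic's code; the function name `decode_quals` and the quality string are made up for the example, and the string is assumed to be phred64-encoded.

```python
def decode_quals(qual_string, offset):
    """Convert a FASTQ quality string to Phred scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in qual_string]

# Hypothetical phred64-encoded quality string (assumption for illustration).
phred64_quals = "BJT^h"

# Decoded with the correct offset: the real scores.
true_scores = decode_quals(phred64_quals, 64)

# Decoded with the wrong offset: every score comes out 64 - 33 = 31 too high.
misread_scores = decode_quals(phred64_quals, 33)

print(true_scores)     # [2, 10, 20, 30, 40]
print(misread_scores)  # [33, 41, 51, 61, 71]
```

With the wrong offset even the quality-2 base looks like quality 33, so a threshold of 15 never triggers.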

In any case, you really shouldn't see a significant percentage of the reads with base calls much below the sliding window threshold - e.g. in fastQC, the yellow bars should mostly be above, but the whiskers will tend to be below. On really bad data, you might also see the yellow bars drop in the last few bases, an artefact of 'under-testing' as the sliding window runs off the end of the reads - this is to be expected.
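A simplified sketch of sliding-window trimming may help to show where the 'under-testing' at the read end comes from. It assumes the read is cut at the start of the first window whose mean quality is below the threshold, and that windows which would run past the end of the read are simply not tested. The real Trimmomatic implementation may differ in detail, and `sliding_window_cut` is a hypothetical name.

```python
def sliding_window_cut(scores, window=4, threshold=15):
    """Return how many bases to keep, scanning windows from the 5' end.

    Simplified model: cut at the start of the first full window whose
    mean quality is below `threshold`; partial windows are not tested.
    """
    for start in range(len(scores) - window + 1):
        win = scores[start:start + window]
        if sum(win) / window < threshold:
            return start
    return len(scores)

# Quality decays steadily: the read is cut once a window averages below 15.
decaying = [30, 30, 30, 30, 28, 20, 12, 8, 5, 2, 2, 2]
print(sliding_window_cut(decaying))  # 5

# Only the last two bases are bad: the final window (30, 30, 2, 2) still
# averages 16, so the low bases at the very end survive.
good_then_bad = [30] * 10 + [2, 2]
print(sliding_window_cut(good_then_bad))  # 12
```

The second case is the end-of-read effect: the last few bases are covered by fewer windows, and each of those windows is propped up by the good bases before them.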

Here's an example of some really low quality data pre/post trimming, using sliding window 4 wide, quality 15.

[Attached plots not preserved: Untrimmed Forward, Untrimmed Reverse, Trimmed Forward Paired, Trimmed Forward Unpaired, Trimmed Reverse Paired, Trimmed Reverse Unpaired]

**kga1978** · 12-09-2011, 03:42 AM

Hi Tony,

Got it. I reran some of the reads and most of them got better with phred64 (I mostly use trimming for adapters though - my aligner takes into consideration quality). However, as you said, really bad reads still fall off dramatically in the end - probably due to the sliding window. So, just to be clear - am I correct in the following?

Casava 1.3 - 1.7: Use Phred64
Casava 1.8+: Use Phred33
454 data (although Trimmomatic can't do this right now): Use Phred33

Thanks for following up.

**tonybolger** · 12-09-2011, 05:14 AM

Originally posted by kga1978 View Post

Got it. I reran some of the reads and most of them got better with phred64 (I mostly use trimming for adapters though - my aligner takes into consideration quality). However, as you said, really bad reads still fall off dramatically in the end - probably due to the sliding window.

How far in do you see the low bases, i.e. those below the threshold cut-off? Just the last few? Do your new plots look anything like the ones I posted?

Originally posted by kga1978 View Post

So, just to be clear - am I correct in the following?

Casava 1.3 - 1.7: Use Phred64
Casava 1.8+: Use Phred33
454 data (although Trimmomatic can't do this right now): Use Phred33

I believe so, though generally I verify by looking at the scores by eye, and checking here. Occasionally I've seen data in the 'wrong' phred because someone decided to be 'helpful'.
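Checking by eye can be roughly automated by looking at the range of ASCII characters in the quality lines. The sketch below is a heuristic only. The cut-offs are assumptions based on the usual ranges: characters below ';' (ASCII 59) should not occur in 64-offset data, and characters above 'J' (ASCII 74, i.e. quality 41 in phred33) are unusual for phred33 Illumina data. The function name `guess_phred_offset` is hypothetical.

```python
def guess_phred_offset(qual_strings):
    """Guess the quality offset (33 or 64) from the observed ASCII range.

    Returns None when the observed characters fit both encodings.
    """
    codes = [ord(ch) for q in qual_strings for ch in q]
    if not codes:
        return None
    if min(codes) < 59:   # too low for any 64-offset encoding
        return 33
    if max(codes) > 74:   # above 'J': unlikely for phred33 Illumina data
        return 64
    return None           # ambiguous; inspect more reads

print(guess_phred_offset(["BJT^h", "hhhgf"]))  # 64
print(guess_phred_offset(["#5?II", "IIIHG"]))  # 33
print(guess_phred_offset(["@ABCD"]))           # None
```

A handful of reads can fall in the ambiguous range, so it is worth feeding in a few thousand quality lines before trusting the answer.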

**kga1978** · 12-09-2011, 06:15 AM

Actually, it's all good - the one that had a dramatic drop-off in the end, I had forgotten to change to phred64!

This is what the data looks like now:

**aforntacc** · 08-02-2013, 02:36 AM

Originally posted by tonybolger View Post

If the data really is phred-64 but trimmomatic is told that it is phred33, trimmomatic will interpret each score as 31 higher than it really is - thus not really trimming much since the quality appears 'excellent'. I really should add a warning if the quality scores are outside the expected range, as this is nearly always caused by wrong phred-33/phred-64 selection, and results in either no trimming, or almost everything trimmed, depending on the direction of the mistake.

In any case, you really shouldn't see a significant percentage of the reads with base calls much below the sliding window threshold - e.g. in fastQC, the yellow bars should mostly be above, but the whiskers will tend to be below. On really bad data, you might also see the yellow bars drop in the last few bases, an artefact of 'under-testing' as the sliding window runs off the end of the reads - this is to be expected.

Here's an example of some really low quality data pre/post trimming, using sliding window 4 wide, quality 15.

[Attached plots not preserved: Untrimmed Forward, Untrimmed Reverse, Trimmed Forward Paired, Trimmed Forward Unpaired, Trimmed Reverse Paired, Trimmed Reverse Unpaired]

OK, I understand this part, but my question is: if I want to use TopHat for mapping, which of these files should I use? The forward paired and reverse paired ones? What about the unpaired files? I am new to Trimmomatic and TopHat, so sorry if this seems a stupid question.
Thanks in advance.

**mastal** · 08-02-2013, 03:37 AM

Trimmomatic quality trimming

I don't think TopHat and Bowtie will let you use paired reads and unpaired reads in the same run, so you would have to do two runs: one with the R1_paired.fastq and R2_paired.fastq files, and another with the R1_unpaired.fastq and R2_unpaired.fastq files.

**ebioman** · 08-13-2013, 10:39 PM

How is quality score evaluated ?

Hi,
I wondered whether anybody can explain to me how the quality scores used by the program are actually calculated.
E.g. leading trimming is often run with a value of 3. Obviously that won't be a phred score, so what is it?

**tonybolger** · 08-14-2013, 12:17 AM

Originally posted by ebioman View Post

Hi,
I wondered whether anybody can explain to me how the quality scores used by the program are actually calculated.
E.g. leading trimming is often run with a value of 3. Obviously that won't be a phred score, so what is it?

It's a phred score

Historically, the illumina pipeline occasionally created reads with one (or more rarely two) N base-calls at the start, and more often, a set of trailing B phred quality scores at the end. N-base calls are treated as zero phred score, and B are quality 2, so by trimming both ends for all scores below 3, these artefacts are removed.
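As a rough sketch of that end trimming (not Trimmomatic's code; `trim_ends` and the example read are made up, and scores are assumed to be already decoded to phred values):

```python
def trim_ends(seq, scores, leading=3, trailing=3):
    """Strip bases below the thresholds from both ends of a read.

    N base calls are treated as quality 0, as described above.
    """
    effective = [0 if base == "N" else q for base, q in zip(seq, scores)]

    start = 0
    while start < len(effective) and effective[start] < leading:
        start += 1

    end = len(effective)
    while end > start and effective[end - 1] < trailing:
        end -= 1

    return seq[start:end], scores[start:end]

# Hypothetical read: one leading N and three trailing 'B' (quality 2) bases.
seq = "NACGTACGTT"
scores = [0, 30, 32, 35, 33, 30, 28, 2, 2, 2]

trimmed_seq, trimmed_scores = trim_ends(seq, scores)
print(trimmed_seq)     # ACGTAC
print(trimmed_scores)  # [30, 32, 35, 33, 30, 28]
```

A threshold of 3 is just high enough to remove the quality-0 and quality-2 artefacts while leaving every genuinely scored base in place.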

**ebioman** · 08-14-2013, 12:20 AM

Thanks, that was as short as it was informative! I always thought it might be some other internal score and tried desperately to work out how it was calculated.

**Laine** · 03-04-2015, 10:22 AM

Originally posted by ebioman View Post

Thanks, that was as short as it was informative! I always thought it might be some other internal score and tried desperately to work out how it was calculated.

I had just the same doubt!! Very informative indeed...

**trimmoMe** · 09-07-2015, 05:36 PM

Need help with Trimmomatic

Hi everyone,

I have been having some issues with my Trimmomatic command line.

This is what I've been using:
java -jar /Users/omriadini/Desktop/Trimmomatic-0.33/trimmomatic-0.33.jar SE -threads 4 -trimlog /Users/omriadini/Desktop/156\ L001/L002.trimLog /Volumes/omri\ hard\ drive/Ally\'s\ stuff/Liron\'s\ Project/Raw\ Data/NRF2-1_S17/NRF2-1_S17_L001_R1_001.fastq trimmed.NRF2-1_S17_L001_R1_001.fastq ILLUMINACLIP:/Users/omriadini/Desktop/156\ L001/Truseq_NEBnext_adapter_sequences\ \(1\).txt:2:30:10 HEADCROP:12 MAXINFO:0:40:0.5 MINLEN:36

However, this is the response I get every time:
ILLUMINACLIP: Using 0 prefix pairs, 48 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Reads: 7877072 Surviving: 0 (0.00%) Dropped: 7877072 (100.00%)
TrimmomaticSE: Completed successfully

I am not sure why it keeps dropping all of my reads.

Any ideas?

Thanks in advance.
