  • Trimming reads messes up PE fastqs? any ideas please?

    Hi,

    I am in a bit of a pickle after trimming some adapters from a heavily contaminated PE library.
I think the problem is caused by the trimming (Cutadapt) removing some reads entirely from the R1 and R2 fastqs, meaning that there are now some unpaired reads and the two read files are out of sync...
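Roughly, the kind of per-file clipping that causes this looks like the sketch below (placeholder adapter and file names, not the exact commands); trimming R1 and R2 independently with a minimum-length cut-off is what lets the two files drift apart:

Code:
# Sketch only: placeholder adapters and file names.
# -m drops short reads from each file independently, so pairs can fall out of sync.
cutadapt -a ADAPTER_READ1 -m 35 -o R1.trimmed.fq R1.fastq
cutadapt -a ADAPTER_READ2 -m 35 -o R2.trimmed.fq R2.fastq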

The first problem this led to was bwa sampe inferring massive insert sizes, but I can turn off that behaviour by using the -A parameter.

This then leads to some weird validation warnings when using Picard to convert SAM to BAM (ignored using VALIDATION_STRINGENCY=LENIENT).
I then get the same lines erroring at each Picard step, but the lenient approach finally falls down at GATK IndelRealigner (even using -S SILENT), where I get the following message and cannot progress:

    Error caching SAM record (null), which is usually caused by malformed SAM/BAM files in which multiple identical copies of a read are present.
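For context, the downstream steps look roughly like this (a sketch only; the reference, interval and file names are placeholders):

Code:
# Sketch only: placeholder reference and paths; flags as described above.
bwa aln ref.fa R1.trimmed.fq > R1.sai
bwa aln ref.fa R2.trimmed.fq > R2.sai
bwa sampe -A ref.fa R1.sai R2.sai R1.trimmed.fq R2.trimmed.fq > aln.sam   # -A disables insert-size estimation

# Picard SAM -> sorted BAM, ignoring validation errors
java -jar SortSam.jar INPUT=aln.sam OUTPUT=aln.sorted.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT

# GATK IndelRealigner with silent validation (this is where it fails)
java -jar GenomeAnalysisTK.jar -T IndelRealigner -R ref.fa -I aln.sorted.bam \
     -targetIntervals targets.intervals -o aln.realigned.bam -S SILENT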

Without trimming, these reads go through the pipeline nicely.
With trimming, for those samples that don't throw up validation errors, I get a much better alignment, utilising more reads than the untrimmed data.

Can anyone suggest how I can get around this?
I want to keep the trimmed reads, and hopefully also use the reads that have lost their mates, but don't know how to go about that... does anyone have any ideas?

    Thanks
    Last edited by swNGS; 05-08-2012, 07:54 AM.

  • #2
If you use Trimmomatic for read trimming, your paired fastq files will stay in the right order and you will get separate files for those reads that lost their mates in the trimming.
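For example, a minimal paired-end sketch (the jar name, adapter file and step settings are placeholders; older Trimmomatic versions use the java -classpath ... org.usadellab.trimmomatic.TrimmomaticPE form rather than -jar ... PE):

Code:
# Sketch only: placeholder jar, adapter file and settings.
# Writes paired output plus separate files for reads whose mate was dropped.
java -jar trimmomatic.jar PE -phred33 \
     R1.fastq R2.fastq \
     R1.paired.fq R1.unpaired.fq R2.paired.fq R2.unpaired.fq \
     ILLUMINACLIP:adapters.fa:2:30:10 MINLEN:35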

    Comment


    • #3
      Trim Galore!, a wrapper around Cutadapt, can also take care of the out-of-sync problem of paired-end trimming in --paired mode, very similar to Trimmomatic.
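A minimal sketch of a paired run (file names are placeholders); adding --retain_unpaired also writes reads whose mate was removed to separate files:

Code:
# Sketch only: placeholder file names.
trim_galore --paired --retain_unpaired R1.fastq R2.fastq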

      Comment


      • #4
Trim Galore looks very promising... although I am always a little sceptical of tools with an exclamation mark in their name... but I am happy to let that prejudice go if it does the job!

I did have a go at Trimmomatic, but could not figure out how to run it (after following the instructions from the website). Any suggestions on that front would be helpful.

        Comment


        • #5
          So Trim Galore looks to be doing exactly what I want with one caveat...

I want to be able to trim a different adapter from each read, i.e. adapter1 from read 1 and adapter2 from read 2. I could do these two steps separately using Cutadapt on R1 then R2, but with Trim Galore I can't (as far as I can see) supply a different sequence to trim for each read.

A workaround would be to run Cutadapt separately for each read, then hack the 'trimmed' files into the trim_galore script, which is a bit beyond my meagre Perl skills.

          Does anyone have other ideas?

          Thanks

          Comment


          • #6
I can adapt it first thing tomorrow morning so that you can supply two different adapters, if that's not too late?

            By the way, did you use the standard Illumina adapters? In this case running the default option should work just fine (as the sequence on both ends starts off the same).

            Comment


            • #7
That would be truly fantastic!

I didn't use the standard Illumina adapters, since the library prep is from HaloPlex and there is some additional 'stuff' to trim off that is different at each end.

Also, I might have missed it, but how do I get it to generate a log file?
I tried piping the output to a file but only got this:

              Length cut-off for read 1: 35 bp (default)
              Length cut-off for read 2: 35 bp (default)
              Writing final adapter and quality trimmed output to /home/ngsuser/NGS/DATA/raw_data/3_demultiplexed/s_G1_L001_R1.fastq_split_2_trimmed.fq

              Length cut-off for read 1: 35 bp (default)
              Length cut-off for read 2: 35 bp (default)
              Writing final adapter and quality trimmed output to /home/ngsuser/NGS/DATA/raw_data/3_demultiplexed/s_G1_L001_R2.fastq_split_2_trimmed.fq


              I'm running trim_galore as follows:

              trim_galore --paired --retain_unpaired --quality 15 --fastqc_args "--outdir $current_directory/logs/CutAdapt/" -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT ${input_fastq_path}/${input_fastq_R1} ${input_fastq_path}/${input_fastq_R2} > $current_directory/logs/CutAdapt/$sample_name'_R1_log'

              Many thanks

              Comment


              • #8
I have now added an option '-a2/--adapter2' which only affects read 2 of paired-end pairs. The new version of Trim Galore (v0.2.2) is available here; you might have to force a cache refresh of the website if it still shows the older version.
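For example (placeholder adapter sequences and file names):

Code:
# Sketch only: -a trims read 1, -a2 trims read 2.
trim_galore --paired -a ADAPTER_READ1 -a2 ADAPTER_READ2 R1.fastq R2.fastq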

                Could you try it out and let me know if something isn't working as expected?

                Comment


                • #9
Forgot to mention that Trim Galore prints a report for each file automatically, so you don't have to redirect the output anywhere. The read 2 report also contains information about the validation step right at the end, i.e. the number of reads that failed to validate, etc.

                  And sorry about the exclamation mark, it wasn't planned originally. Do you think a question mark would be more appropriate? :P

                  Comment


                  • #10
                    This seems to work with no issues, thanks very much.

With regard to the log file, trim_galore's default behaviour is to create it in the same folder as the original fastq. Is there any way to output it anywhere else? I could fairly easily get my script to copy the trimming_report to a different location, but wondered if your script had an 'output-to' option for the report.

                    Thanks,

                    Chris

                    Comment


                    • #11
                      Hi Chris,

It doesn't have that option right now (maybe in the next release). For the moment you could just locate the following line:

                      Code:
                      open (REPORT,'>',$report) ...
                      and change it to

                      Code:
                      open (REPORT,'>',"/your/path/$report")

                      Comment


                      • #12
                        Originally posted by swNGS View Post
I did have a go at Trimmomatic, but could not figure out how to run it (after following the instructions from the website). Any suggestions on that front would be helpful.
If you're still interested in Trimmomatic, please contact me (email on the Trimmomatic site) and I will try to help get you going.

                        Comment


                        • #13
                          TrimGalore allowing mismatches

                          Hi,

                          How would one change the cutadapt parameter for allowing mismatches through the TrimGalore command-line, or is this not possible currently?

I think the cutadapt parameter is -e, which specifies the ERROR_RATE allowed in an adapter match; I would like to change that when running Trim Galore.


                          Thanks!

                          Comment


                          • #14
                            Originally posted by vkpilla View Post
                            Hi,

                            How would one change the cutadapt parameter for allowing mismatches through the TrimGalore command-line, or is this not possible currently?

I think the cutadapt parameter is -e, which specifies the ERROR_RATE allowed in an adapter match; I would like to change that when running Trim Galore.


                            Thanks!
                            Hi vkpilla,

                            I have just added an option '-e ERROR RATE' to Trim Galore which should now work in the way you need it (you can download it from here: http://www.bioinformatics.babraham.a...s/trim_galore/)
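For example, to relax the error rate from cutadapt's default of 0.1 (the value 0.2 is just a placeholder):

Code:
# Sketch only: placeholder error rate and file names.
trim_galore --paired -e 0.2 R1.fastq R2.fastq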

                            Cheers,
                            Felix
                            Last edited by fkrueger; 07-31-2012, 07:21 AM. Reason: fixed broken link

                            Comment


                            • #15
As a fairly general solution to the "out of sync" fastq issue, the following commands work okay for me. However, this requires cdbtools (cdbfasta and cdbyank). Also, these presume you do not have the /1 and /2 suffixes after the read names to denote read number, and further that each of the fields of a fastq record is on a single line.

                              Code:
# Collect read names that occur in both trimmed files (uniq -d keeps duplicated names only)
grep -hn '^@' R1_in.clipped.fq R2_in.clipped.fq                              | \
   perl -F: -ane 'print if ($F[0]-1)%4==0;'                                  | \
   perl -ane 'my ($read_name)=m{^\d+:@([^/ ]+)[/ ]}; print $read_name,"\n";' | \
   sort                                                                      | \
   uniq -d > paired.list

# Index each fastq and extract only the reads listed in paired.list
cdbfasta -Q R1_in.clipped.fq
cdbyank  R1_in.clipped.cidx < paired.list > R1_in.synced.fq

cdbfasta -Q R2_in.clipped.fq
cdbyank  R2_in.clipped.cidx < paired.list > R2_in.synced.fq
If your fastq files use /1 and /2 suffixes on the read names to denote read number, this should work:

                              Code:
grep -hn '^@' R1_in.clipped.fq R2_in.clipped.fq                              | \
   perl -F: -ane 'print if ($F[0]-1)%4==0;'                                  | \
   perl -ane 'my ($read_name)=m{^\d+:@([^/ ]+)[/ ]}; print $read_name,"\n";' | \
   sort                                                                      | \
   uniq -d > paired.list

sed -e 's%.*%&/1%' paired.list > R1_sync.list
sed -e 's%.*%&/2%' paired.list > R2_sync.list

cdbfasta -Q R1_in.clipped.fq
cdbyank  R1_in.clipped.cidx < R1_sync.list > R1_in.synced.fq

cdbfasta -Q R2_in.clipped.fq
cdbyank  R2_in.clipped.cidx < R2_sync.list > R2_in.synced.fq
I actually like fastx_clipper. It seems less convoluted than Trimmomatic, but it does require re-syncing of the PE reads after running.
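For reference, the clipping step that would produce the *_in.clipped.fq files above might look something like this (the adapter sequence and length cut-off are placeholders; -Q33 assumes Sanger/Phred+33 qualities):

Code:
# Sketch only: placeholder adapter and length cut-off; -l discards reads shorter than 35 bp.
fastx_clipper -Q33 -a AGATCGGAAGAGC -l 35 -i R1_in.fq -o R1_in.clipped.fq
fastx_clipper -Q33 -a AGATCGGAAGAGC -l 35 -i R2_in.fq -o R2_in.clipped.fq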

                              --
                              Phillip

                              Comment
