SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Trimming illumina 1.8 reads ssharma Bioinformatics 7 07-18-2013 04:33 AM
May I a question about fastqs and the script fq_all2std? cllorens Bioinformatics 4 02-28-2012 01:28 PM
trimming illumina reads empyrean Bioinformatics 5 12-20-2011 11:48 PM
trimming reads in tophat cswarth Bioinformatics 1 12-21-2010 02:00 PM
very short deletion messes up SAMtools SNP calling eyalbd Bioinformatics 8 04-19-2010 08:52 AM

Reply
 
Thread Tools
Old 05-08-2012, 07:52 AM   #1
swNGS
Member
 
Location: SW UK

Join Date: Nov 2011
Posts: 83
Default Trimming reads messes up PE fastqs? any ideas please?

Hi,

I am in a bit of a pickle after trimming some adapters from a heavily contaminated PE library.
I think the problem is caused by the trimming (CutAdapt) removing some reads entirely from R1 and R2 fastqs, meaning that there are now some unpaired reads withe th 2 read files now out of sync...

The first problem this lead to was with bwa sampe inferring massive insert sizes, but I can turn off that behaviour by using the -A parameter.

This then leads to some weird validation warnings using picard to convert sam to bam (ignored using VALIDATION_STRINGENCY=LENIENT).
I then get the same lines erroring at each picard step, but the 'leniancy-avoiding-approach' then falls down at GATK IndelRealigner (using -S SILENT) when I get the following message and cannot progress:

Error caching SAM record (null), which is usually caused by malformed SAM/BAM files in which multiple identical copies of a read are present.

Without trimming these reads, they go through the pipeline nicely.
By trimming them, for those samples that dont throw up validation errors, I get a much better alignment, utilising more reads than untrimmed.

Can anyone suggest how I can get around this ?
I want to keep the trimmed reads, and hopefully use the reads that have lost their mates, but dont know how to go about that ... does anyone have any ides?

Thanks

Last edited by swNGS; 05-08-2012 at 07:54 AM.
swNGS is offline   Reply With Quote
Old 05-08-2012, 07:57 AM   #2
lukas1848
Member
 
Location: Germany

Join Date: Jun 2011
Posts: 54
Default

If you use trimmomatic for read trimming, your paired fastq files will be in the right order and you will get seperate files for those reads that lost their mates in the trimming.
lukas1848 is offline   Reply With Quote
Old 05-08-2012, 08:11 AM   #3
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 622
Default

Trim Galore!, a wrapper around Cutadapt, can also take care of the out-of-sync problem of paired-end trimming in --paired mode, very similar to Trimmomatic.
fkrueger is offline   Reply With Quote
Old 05-08-2012, 09:24 AM   #4
swNGS
Member
 
Location: SW UK

Join Date: Nov 2011
Posts: 83
Default

TrimGalore looks very promising... Although I am always a little skeptical of tools with an exclamation mark in its name ... But I am happy to let that prejudice go if it does the job!

I did have a go at Trimmomatic, but could not figure out how to run it (afte. following instructions from the website). Any suggestions on that front would be helpful
swNGS is offline   Reply With Quote
Old 05-08-2012, 12:30 PM   #5
swNGS
Member
 
Location: SW UK

Join Date: Nov 2011
Posts: 83
Default

So Trim Galore looks to be doing exactly what I want with one caveat...

I want to be able to trim a different adapter from each read... ie adapter1 from read 1 and adapter2 from read 2. I could do these two steps separately using CutAdapt on R1 then R2, but with Trim_Galore, I cant (as far as I can see) supply a different sequence to trim for each read.

A workaround would be to run cutadapt sepqrately for each read, then hack the 'trimmed' files into the trim_galore script, which is a bit beyond my meagre perl skills

Does anyone have other ideas?

Thanks
swNGS is offline   Reply With Quote
Old 05-08-2012, 01:14 PM   #6
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 622
Default

I can adapt it so that you can supply two different adapters first thing tomorrow morning if that's not too late?

By the way, did you use the standard Illumina adapters? In this case running the default option should work just fine (as the sequence on both ends starts off the same).
fkrueger is offline   Reply With Quote
Old 05-09-2012, 01:01 AM   #7
swNGS
Member
 
Location: SW UK

Join Date: Nov 2011
Posts: 83
Default

that would be truly fantasitc

I didnt use the standard illumina adapters since the library prep is from Haloplex and there is some additional 'stuff' to trim off that is different at each end.

Alos, I might have missed it, but how do I get it to generate a log file?
i tried piping the output to a file but only got this:

Length cut-off for read 1: 35 bp (default)
Length cut-off for read 2: 35 bp (default)
Writing final adapter and quality trimmed output to /home/ngsuser/NGS/DATA/raw_data/3_demultiplexed/s_G1_L001_R1.fastq_split_2_trimmed.fq

Length cut-off for read 1: 35 bp (default)
Length cut-off for read 2: 35 bp (default)
Writing final adapter and quality trimmed output to /home/ngsuser/NGS/DATA/raw_data/3_demultiplexed/s_G1_L001_R2.fastq_split_2_trimmed.fq


I'm running trim_galore as follows:

trim_galore --paired --retain_unpaired --quality 15 --fastqc_args "--outdir $current_directory/logs/CutAdapt/" -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT ${input_fastq_path}/${input_fastq_R1} ${input_fastq_path}/${input_fastq_R2} > $current_directory/logs/CutAdapt/$sample_name'_R1_log'

Many thanks
swNGS is offline   Reply With Quote
Old 05-09-2012, 02:15 AM   #8
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 622
Default

I have now added an option '-a2/--adapter2' which only affects read 2 of paired-end pairs. The new version of Trim Galore (v0.2.2) is available here, you might have to force a cache refresh of the website if it still shows the older version.

Could you try it out and let me know if something isn't working as expected?
fkrueger is offline   Reply With Quote
Old 05-09-2012, 02:29 AM   #9
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 622
Default

Forgot to mention that Trim Galore prints a report for each file automatically, you don't have to redirect the output anywhere. The read 2 report contains information about the validation part right at the end as well, i.e. the number of reads that failed to be validated etc. .

And sorry about the exclamation mark, it wasn't planned originally. Do you think a question mark would be more appropriate? :P
fkrueger is offline   Reply With Quote
Old 05-09-2012, 02:54 AM   #10
swNGS
Member
 
Location: SW UK

Join Date: Nov 2011
Posts: 83
Default

This seems to work with no issues, thanks very much.

With regard to the log file, trim_galore's default output is to create it in the same folder as the original fastq. Is there any way to output it to anywhere else (I could fairly easily get my script to copy the trimming_report to a different location), but wondered if your script had an 'output-to' option for the report ?

Thanks,

Chris
swNGS is offline   Reply With Quote
Old 05-09-2012, 03:04 AM   #11
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 622
Default

Hi Chris,

it doesn't have that option right now (maybe in the next release). For the moment you could just locate the following line:

Code:
open (REPORT,'>',$report) ...
and change it to

Code:
open (REPORT,'>',"/your/path/$report")
fkrueger is offline   Reply With Quote
Old 05-09-2012, 04:58 AM   #12
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156
Default

Quote:
Originally Posted by swNGS View Post
I did have a go at Trimmomatic, but could not figure out how to run it (afte. following instructions from the website). Any suggestions on that front would be helpful
If you're still interested in trimmomatic, can you contact me (email on the trimmomatic site) and i will try to help get you going.
tonybolger is offline   Reply With Quote
Old 07-30-2012, 03:11 PM   #13
vkpilla
Junior Member
 
Location: Boston, MA

Join Date: Apr 2012
Posts: 2
Default TrimGalore allowing mismatches

Hi,

How would one change the cutadapt parameter for allowing mismatches through the TrimGalore command-line, or is this not possible currently?

I think the cutadapt parameter is -e which specifies the ERROR_RATE allowed in a adapter match; I would like to change that when running TrimGalore.


Thanks!
vkpilla is offline   Reply With Quote
Old 07-31-2012, 01:24 AM   #14
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 622
Default

Quote:
Originally Posted by vkpilla View Post
Hi,

How would one change the cutadapt parameter for allowing mismatches through the TrimGalore command-line, or is this not possible currently?

I think the cutadapt parameter is -e which specifies the ERROR_RATE allowed in a adapter match; I would like to change that when running TrimGalore.


Thanks!
Hi vkpilla,

I have just added an option '-e ERROR RATE' to Trim Galore which should now work in the way you need it (you can download it from here: http://www.bioinformatics.babraham.a...s/trim_galore/)

Cheers,
Felix

Last edited by fkrueger; 07-31-2012 at 07:21 AM. Reason: fixed broken link
fkrueger is offline   Reply With Quote
Old 07-31-2012, 08:58 AM   #15
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

As a fairly general solution to the "out of sync" fastq issue, the following commands work okay for me. However it requires cdbtools (cdbfasta and cdbyank). Also, these presume you do not have the /1 and /2 suffix after the read name to denote read number. Further that each of the the fields of a fastq record are on a single line.

Code:
grep -hn '^@' R1_in.clipped.fq R2_in.clipped.fq                         | \
   perl -F: -ane 'print if ($F[0]-1)%4==0;'                                   | \
   perl     -ane 'my ($read_name)=m{^\d+:@([^/ ]+)[/ ]}; print $read_name,"\n";'  | \
                                                     sort                         | \
                                                     uniq -d                        > paired.list


cdbfasta -Q R1_in.clipped.fq
cdbyank  R1_in.clipped.cidx < paired.list > R1_in.synced.fq

cdbfasta -Q R2_in.clipped.fq
cdbyank  R2_in.clipped.cidx < paired.list > R2_in.synced.fq
If your fastq file is using /1 and /2 suffixes to the read names to denote read number, this should work:

Code:
grep -hn '^@' R1_in.clipped.fq R2_in.clipped.fq                         | \
   perl -F: -ane 'print if ($F[0]-1)%4==0;'                                   | \
   perl     -ane 'my ($read_name)=m{^\d+:@([^/ ]+)[/ ]}; print $read_name,"\n";'  | \
                                                     sort                       | \
                                                     uniq -d                        > paired.list

sed -e 's%.*%&/1%' paired.list > R1_sync.list
sed -e 's%.*%&/2%' paired.list > R2_sync.list

cdbfasta -Q R1_in.clipped.fq
cdbyank  R1_in.clipped.cidx < R1_sync.list > R1_in.synced.fq

cdbfasta -Q R2_in.clipped.fq
cdbyank  R2_in.clipped.cidx < R2_sync.list > R2_in.synced.fq
I actually like fastx_clipper. Seem less convoluted than Trimmomatic. But it does require re-syncing of PE reads after running.

--
Phillip
pmiguel is offline   Reply With Quote
Old 08-01-2012, 07:08 AM   #16
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156
Default

Quote:
Originally Posted by pmiguel View Post
I actually like fastx_clipper. Seem less convoluted than Trimmomatic. But it does require re-syncing of PE reads after running.
Does not compute - how is this 'less convoluted than trimmomatic'? Avoiding this sort of thing was one big reason why i wrote trimmomatic. The other being the combination of many different trimming steps into a one-pass tool, rather than making a million and one intermediate files.

Incidentally, why do you consider trimmomatic convoluted?

Sure, it could do with a nice script to hide the 'javaisms' behind (which has been on my todo list forever), and making the adapter fasta is a PITA for new users (which i can help with), but apart from that, i open to any suggestions you might have for improving it.
tonybolger is offline   Reply With Quote
Old 08-01-2012, 08:14 AM   #17
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by tonybolger View Post
Does not compute - how is this 'less convoluted than trimmomatic'? Avoiding this sort of thing was one big reason why i wrote trimmomatic. The other being the combination of many different trimming steps into a one-pass tool, rather than making a million and one intermediate files.

Incidentally, why do you consider trimmomatic convoluted?

Sure, it could do with a nice script to hide the 'javaisms' behind (which has been on my todo list forever), and making the adapter fasta is a PITA for new users (which i can help with), but apart from that, i open to any suggestions you might have for improving it.
trimmomatic is doing a bunch of stuff under the hood while fastx_clipper is doing one thing. Most of the time I like simpler tools better. So I think I would use fastx_clipper, no matter what. But I will admit that is a matter of taste and lots of others will prefer a tool that subsumes a greater segment of the pipeline. No offense intended...

Since you are offering, I would like it better if you explained what the deal is with that "palindromic" trimming thing. I just am not getting when that should be used and when the simple trimming would be best.


--
Phillip
pmiguel is offline   Reply With Quote
Old 03-06-2013, 07:12 PM   #18
MWN
Junior Member
 
Location: CA

Join Date: Aug 2011
Posts: 8
Default

Quote:
Originally Posted by swNGS View Post
that would be truly fantasitc

I didnt use the standard illumina adapters since the library prep is from Haloplex and there is some additional 'stuff' to trim off that is different at each end.

Alos, I might have missed it, but how do I get it to generate a log file?
i tried piping the output to a file but only got this:

Length cut-off for read 1: 35 bp (default)
Length cut-off for read 2: 35 bp (default)
Writing final adapter and quality trimmed output to /home/ngsuser/NGS/DATA/raw_data/3_demultiplexed/s_G1_L001_R1.fastq_split_2_trimmed.fq

Length cut-off for read 1: 35 bp (default)
Length cut-off for read 2: 35 bp (default)
Writing final adapter and quality trimmed output to /home/ngsuser/NGS/DATA/raw_data/3_demultiplexed/s_G1_L001_R2.fastq_split_2_trimmed.fq


I'm running trim_galore as follows:

trim_galore --paired --retain_unpaired --quality 15 --fastqc_args "--outdir $current_directory/logs/CutAdapt/" -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT ${input_fastq_path}/${input_fastq_R1} ${input_fastq_path}/${input_fastq_R2} > $current_directory/logs/CutAdapt/$sample_name'_R1_log'

Many thanks
What additional adapter sequence Haloplex added? Thanks,
MWN is offline   Reply With Quote
Old 03-07-2013, 05:21 AM   #19
d1antho
Member
 
Location: Ireland

Join Date: Mar 2012
Posts: 15
Default

You could also use fastqMCF tool (http://code.google.com/p/ea-utils/wiki/FastqMcf). fastq-mcf sequence quality filter, clipping and processor.

I've used it and its pretty easy to use and maintains the order or PE data, it also removes both mates from the fastq files when one fails the quality check
d1antho is offline   Reply With Quote
Old 07-22-2013, 07:26 AM   #20
jordi
Member
 
Location: València, Spain

Join Date: Apr 2009
Posts: 48
Default

Quote:
Originally Posted by MWN View Post
What additional adapter sequence Haloplex added? Thanks,
As far as I know using HaloPlex there are no 5' adapters. Reagrding 3' adapters there are a pair of seqeunces to be considered in the trimming step prior mapping the sequences:
>1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

Could anyone says if I'm right??
jordi is offline   Reply With Quote
Reply

Tags
sam error trim

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:17 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO