SEQanswers

01-21-2015, 07:57 AM   #1
ClemBuntu
Member
Location: Lyon
Join Date: Dec 2014
Posts: 37

Trying to remove Nextera transposase sequence using cutadapt and FastQC

Hello everybody,

We recently did a NextSeq 500 run.
I analyzed it with FastQC and the adapter content module failed: the Nextera transposase sequence content looks too high (see the attached files).

So I tried to trim my reads with cutadapt. Because the data are paired-end, I used the following two commands:

Python-2.7.9/python ~/cutadapt-1.7.1/bin/cutadapt -q 30 -b CTGTCTCTTATACACATCTGACGCTGCCGACGA --minimum-length 20 --overlap=5 -o tmpl1.1.fastq --paired-output tmpl1.2.fastq myRead_S1_L001_R1_001.fastq myRead_S1_L001_R2_001.fastq

Python-2.7.9/python ~/cutadapt-1.7.1/bin/cutadapt -b CTGTCTCTTATACACATCTCCGAGCCCACGAGAC --minimum-length 20 -q 30 --overlap=5 -o myReads_S1_L001_R2_001.trimmed.fastq --paired-output myReads_S1_L001_R1_001.trimmed.fastq tmpl1.2.fastq tmpl1.1.fastq

Then I checked the results like this:

/FastQC/fastqc myReads_S1_L001_R1_001.trimmed.fastq -t 4 -o FASTQTRY/

In the end I obtained the same plots, both for adapter content and for quality!

How is that possible? How can some reads still have a quality score below 30?
Attached Images
File Type: png nextera.PNG (75.1 KB, 258 views)
File Type: png quality.PNG (44.5 KB, 164 views)

Last edited by ClemBuntu; 01-21-2015 at 08:04 AM.

01-21-2015, 08:11 AM   #2
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 611

Hmm, have you tried repeating the run with -a instead of -b? When we trim Nextera adapters we run Cutadapt with -a CTGTCTCTTATA, which does the job nicely (in fact I did this only an hour ago). If you want to use Trim Galore (a wrapper around Cutadapt), you can use the attached version (v0.3.8) with the option --nextera. Usage is simply:
Code:
trim_galore --paired --nextera myRead_S1_L001_R1_001.fastq myRead_S1_L001_R2_001.fastq
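
To illustrate why the short -a CTGTCTCTTATA prefix can be enough for a 3' adapter: once the start of the adapter is found in a read, everything from that position to the end of the read is discarded, so the rest of the adapter does not need to be matched. The following is only a minimal Python sketch of that idea using exact matching. It is not how Cutadapt is implemented (Cutadapt also tolerates mismatches and partial matches at the read end), and the helper name trim_3prime_adapter and the example read are made up for illustration.

```python
NEXTERA_PREFIX = "CTGTCTCTTATA"  # adapter start quoted in the post above


def trim_3prime_adapter(read, adapter=NEXTERA_PREFIX):
    """Cut the read at the first exact occurrence of the adapter.

    Hypothetical helper for illustration only; not Cutadapt's algorithm.
    """
    pos = read.find(adapter)
    # No adapter found: return the read unchanged.
    if pos == -1:
        return read
    # Adapter found: keep only the bases before it.
    return read[:pos]


if __name__ == "__main__":
    insert = "ACGTACGTTGCA"  # made-up genomic sequence
    # Insert followed by read-through into the adapter.
    read = insert + "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC"
    print(trim_3prime_adapter(read))
```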
Attached Files
File Type: zip trim_galore.zip (12.2 KB, 143 views)

01-22-2015, 01:39 AM   #3
ClemBuntu

Hi,
I used the -b option because it handles adapters at either the 5' or the 3' end, whereas -a handles only 3' adapters.

In any case, the adapters were removed, according to the cutadapt output:
Quote:
=== Adapter 1 ===

Sequence: CTGTCTCTTATACACATCTGACGCTGCCGACGA; Type: variable 5'/3'; Length: 33; Trimmed: 2306073 times.

=== Adapter 1 ===

Sequence: CTGTCTCTTATACACATCTCCGAGCCCACGAGAC; Type: variable 5'/3'; Length: 34; Trimmed: 2036017 times.
So the results I got from FastQC are very strange, right? Did I use it incorrectly? Maybe I forgot an option, but that seems odd.

I also tried Trim Galore. To get it working, I changed my .bashrc like this:
alias cutadapt='/home/myhome/Python-2.7.9/python /home/myhome/cutadapt-1.7.1/bin/cutadapt'
export PATH=$PATH:/home/myhome/FastQC/

And I got this error:

Quote:
>>> Now performing quality (cutoff 30) and adapter trimming in a single pass for the adapter sequence: 'CTGTCTCTTATA' from file myReads_S1_L001_R1_001.fastq <<<
Traceback (most recent call last):
File "/home/myhome/cutadapt-1.7.1/bin//cutadapt", line 9, in ?
from cutadapt.scripts import cutadapt
File "/home/myhome/cutadapt-1.7.1/cutadapt/__init__.py", line 9
except ImportError as e:
^
SyntaxError: invalid syntax
I got the same error when I tried to run cutadapt with an old Python (v2.4), but the alias in my .bashrc should fix that...

Last edited by ClemBuntu; 01-22-2015 at 02:08 AM.

01-22-2015, 02:33 AM   #4
fkrueger

What do you get when you run:
Code:
alias cutadapt='/home/myhome/Python-2.7.9/python /home/myhome/cutadapt-1.7.1/bin/cutadapt'
and then
Code:
cutadapt
You need to get this command to work, or it won't work within Trim Galore. You could also supply '/home/myhome/Python-2.7.9/python /home/myhome/cutadapt-1.7.1/bin/cutadapt' (or rather a version of it that is working) as the path to Cutadapt in one of the first lines of Trim Galore.
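
One likely reason the alias alone did not help: shell aliases exist only inside the shell that defines them and are not passed on to programs that shell starts, so a script such as Trim Galore looks up `cutadapt` through PATH instead. The traceback is consistent with the script still being run by the old interpreter, since `except ImportError as e` is only valid syntax from Python 2.6 onwards. Below is a small sketch for checking what a child process would actually find, written in Python for convenience; the helper name resolve_on_path is made up for this example.

```python
import shutil


def resolve_on_path(tool):
    """Return the executable a child process would find for `tool`, or None.

    Hypothetical helper for illustration only.
    """
    # shutil.which searches the directories listed in PATH;
    # shell aliases are not consulted.
    return shutil.which(tool)


if __name__ == "__main__":
    for tool in ("cutadapt", "fastqc", "python"):
        path = resolve_on_path(tool)
        print(tool, "->", path if path else "not found on PATH")
```

If `cutadapt` or `python` resolves to something other than the working installation (or is not found at all), giving Trim Galore the full path as described above is probably the simplest fix.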

01-22-2015, 05:43 AM   #5
ClemBuntu

Running the 'cutadapt' alias or the full path gives me the same result.

Anyway, I changed the Trim Galore source code as you suggested and now it works.

According to FastQC, the Nextera adapter was removed successfully.
Now my second question. I ran Trim Galore like this:
Quote:
~/trim_galore --paired -q 30 --nextera myReads_S1_L001_R1_001.fastq myReads_S1_L001_R2_001.fastq -o TrimGaloreTry/
Then I ran FastQC on the output files and obtained the attached quality plots.
My question is: with the "-q 30" option, all reads should have a Phred score greater than or equal to 30, but that's not what FastQC shows me. Why?

(Edit: I also ran cutadapt "manually" and FastQC gave me the same plots.)
Attached Images
File Type: png quality.PNG (51.6 KB, 79 views)
File Type: png quality2.PNG (47.4 KB, 74 views)

Last edited by ClemBuntu; 01-22-2015 at 05:56 AM.

01-22-2015, 05:58 AM   #6
fkrueger

Glad to hear that you got the adapter trimming sorted. I suppose the reason there are still qualities below 30 in the file is that Cutadapt doesn't truncate a read as soon as it hits a base below the threshold; instead it uses the following algorithm:
Code:
-q CUTOFF, --quality-cutoff=CUTOFF
                        Trim low-quality ends from reads before adapter
                        removal. The algorithm is the same as the one used by
                        BWA (Subtract CUTOFF from all qualities; compute
                        partial sums from all indices to the end of the
                        sequence; cut sequence at the index at which the sum
                        is minimal) (default: 0)
So if a read has only a single dip in quality and all base calls after it are fine again, that low-quality base may survive trimming. Does that make sense?
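
The partial-sum rule from the help text can be sketched in a few lines of Python. This is written from the description quoted above rather than from Cutadapt's source, so details may differ from the real implementation; the function name quality_trim_3prime and the example quality values are made up for illustration.

```python
def quality_trim_3prime(quals, cutoff):
    """Return how many bases to keep from the 5' end of a read.

    quals  -- list of Phred scores, one per base
    cutoff -- quality threshold (the value given to -q)

    Hypothetical helper illustrating the BWA-style rule: subtract the
    cutoff from every quality, sum from each position to the end of the
    read, and cut where that sum is smallest.
    """
    best_sum = 0        # sum of the empty suffix, i.e. trim nothing
    keep = len(quals)
    running = 0
    # Walk from the 3' end towards the 5' end, accumulating (q - cutoff).
    for i in range(len(quals) - 1, -1, -1):
        running += quals[i] - cutoff
        if running < best_sum:
            best_sum = running
            keep = i
    return keep


if __name__ == "__main__":
    # A single dip (12) followed by good bases: nothing is trimmed.
    print(quality_trim_3prime([35, 35, 12, 36, 36, 36], 30))
    # Quality decaying towards the 3' end: the tail is trimmed.
    print(quality_trim_3prime([35, 35, 35, 20, 15, 10], 30))
```

In the first example the good bases after the dip outweigh it, so the read keeps its base with quality 12 even though the cutoff is 30. The rule trims read ends; it does not guarantee that every remaining base is at or above the cutoff.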

01-22-2015, 06:05 AM   #7
ClemBuntu

OK, that makes sense, thanks.
But do you think it's "normal" that the boxplot whiskers go this low, i.e. down to 14 for positions 84-150 bp?
I have used cutadapt before and the boxplots I obtained were much better than these, but this is my first NextSeq run, so maybe the quality is lower than on HiSeq and MiSeq?

01-22-2015, 06:09 AM   #8
fkrueger

Hmm, hard to tell. But reads that long always show a similar decline towards the 3' end; I don't think it's much different for MiSeq, to be honest. I would just go ahead with your analysis and see how that goes. You could always come back and do something more stringent afterwards.

12-29-2015, 01:29 PM   #9
apredeus
Senior Member
Location: Bioinformatics Institute, SPb
Join Date: Jul 2012
Posts: 150

Your read qualities are fine. They could be a lot worse. There were some systematic problems with R2 qualities due to NextSeq reagent problems; you can search these forums for more detail.

You are getting this much transposase though because your insert size is too small. You are essentially sequencing all the way to the other end's adaptor. You need larger fragments for sure.
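
As a rough back-of-the-envelope sketch of this read-through effect: when the read is longer than the insert, the extra cycles run into the adapter on the other end. The 150 bp read length below is taken from the position range mentioned earlier in the thread; the insert sizes are hypothetical, and the helper name adapter_bases_in_read is made up for this example.

```python
def adapter_bases_in_read(read_length, insert_size):
    """Number of bases at the 3' end of a read that fall past the insert.

    Hypothetical helper: simple arithmetic, ignoring indels and the
    finite length of the adapter itself.
    """
    return max(0, read_length - insert_size)


if __name__ == "__main__":
    read_length = 150  # assumed from the plots discussed in this thread
    for insert_size in (80, 120, 150, 300):  # hypothetical fragment sizes
        n = adapter_bases_in_read(read_length, insert_size)
        print("insert", insert_size, "bp ->", n, "bases past the insert")
```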

And most people don't use read trimming anymore because most modern aligners do read soft-clipping.

04-26-2017, 08:34 AM   #10
sagarutturkar
Member
Location: Tennessee, USA
Join Date: Sep 2010
Posts: 61

Quote:
Originally Posted by fkrueger View Post
Hmm, have you tried repeating the run with -a instead of -b? When we trim Nextera adapters we run Cutadapt with -a CTGTCTCTTATA which does the job just nicely (in fact I have done this only an hour ago). If you wanted to use Trim Galore (a wrapper around Cutadapt) you can use the version attached (v0.3.8) with the option --nextera. Usage is simply:
Code:
trim_galore --paired --nextera myRead_S1_L001_R1_001.fastq myRead_S1_L001_R2_001.fastq
I second this comment: Trim Galore successfully removed the Nextera adapters (which could not be removed by cutadapt).

Tags
cutadapt, fastqc, nextera, nextseq, transposase
