SEQanswers





Post #1, 02-07-2014, 02:50 PM
MafaldaSF (Junior Member, Portugal; joined Jan 2014; 1 post)
Adapter trimming: Cutadapt vs Trimmomatic

Hi,

I'm new here, and first of all I would like to thank everyone for this wonderful forum. This has probably been discussed here before, but I've read a lot, and after a while I felt lost, so I decided to post my exact questions.

I have RNA-seq data from an Illumina HiSeq, but the library prep was done with the SureSelect Strand-Specific mRNA library prep protocol, so I have paired-end, strand-specific data.

While trimming the adapters, I ran into the following questions:

1. I don't have the exact SureSelect adapter sequences, but from a quick analysis with a script I have that counts duplicates, and from the FastQC results, I can tell that they are similar to the TruSeq adapters, except for the indexes. Can anyone confirm this? Can I use the TruSeq sequences to trim the SureSelect adapters? What other solutions are there?

2. I know that, due to read-through, I should expect to find the indexed adapter at the 3' end of read 1 and the reverse complement of the universal adapter at the 3' end of read 2. But because of adapter dimers, aren't other combinations possible? For example, if a fragment carries two indexed adapters, couldn't read 2 contain the reverse complement of the indexed adapter at its 5' end? In your experience, does this happen often enough to justify trimming for it, given the potential loss of good data?

3. What is the best way to deal with paired-end data? So far I have been using Cutadapt's -b option to remove the adapters without considering the pairing, and I'm getting what I think are nice results (but I don't have much experience). I'm considering Cutadapt's paired-end options, but I don't really understand how Cutadapt uses the paired-end information. Would Trimmomatic, for example, be better because it actually aligns the two reads and uses them to work out where the real sequence ends and the adapter begins (palindrome mode)?
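To make the read-through and palindrome ideas in questions 2 and 3 concrete, here is a minimal Python sketch. It assumes TruSeq-style read-through sequences (R1_ADAPTER and R2_ADAPTER are illustrative names; compare PE/2_rc and PE/1_rc in the adapter file quoted later in this thread) and uses exact matching only. It illustrates the idea behind Trimmomatic's palindrome mode and is not its actual algorithm, which scores alignments and tolerates mismatches.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

# Assumed TruSeq-style read-through sequences (compare PE/2_rc and PE/1_rc
# in the adapter file quoted later in the thread).
R1_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
R2_ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"

def sequence_pair(insert: str, read_len: int) -> tuple[str, str]:
    """Simulate R1/R2 for one fragment. When the insert is shorter than the
    read length, each read runs through the insert into its adapter."""
    r1 = (insert + R1_ADAPTER)[:read_len]
    r2 = (revcomp(insert) + R2_ADAPTER)[:read_len]
    return r1, r2

def find_insert_length(r1: str, r2: str, min_len: int = 8) -> int | None:
    """Palindrome-style check: the insert length L is where R1[:L] is the
    reverse complement of R2[:L] and whatever follows in R1 is adapter."""
    for length in range(min_len, len(r1)):
        mates_agree = r1[:length] == revcomp(r2[:length])
        tail_is_adapter = R1_ADAPTER.startswith(r1[length:])
        if mates_agree and tail_is_adapter:
            return length
    return None  # no read-through detected
```

Once the insert length L is known, both mates can be cut to r1[:L] and r2[:L], which removes the adapter even where only a few of its bases were read.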

Thanks a lot in advance
Mafalda
Post #2, 03-15-2014, 03:55 PM
arcolombo698 (Senior Member, Los Angeles; joined Nov 2013; 141 posts)
Trimming Illumina Adapter Sequences

Hello.

I have a question about how you used Cutadapt to trim your adapter sequences.

I am interested in trimming the 3' adapter sequences from read 1, so I need to use Cutadapt's -a option.

However, I have many adapters (roughly 27 of them). What is the command line to trim all of these adapter sequences?
Post #3, 03-15-2014, 06:18 PM
Brian Bushnell (Super Moderator, Walnut Creek, CA; joined Jan 2014; 2,668 posts)

I can't help you with Cutadapt, but if you want to use BBDuk, the command is:

bbduk.sh in1=read1.fq in2=read2.fq out1=trimmed1.fq out2=trimmed2.fq ref=adapters.fa ktrim=r k=25 mink=12 hdist=1

That will trim adapters toward the 3' end: the read is cut at the first instance of any 25-mer from the reference, and everything to its right is removed. If there are no 25-mer matches, it will try to match as few as 12 bp from the adapter end to the read's 3' end. 'hdist=1' allows a Hamming distance of 1 (1 mismatch). 'adapters.fa' should be a valid FASTA file containing all the adapters (so each adapter's name line needs to start with a '>' symbol).
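A rough Python sketch of the k-mer trimming described above, for intuition only; it is not BBDuk's code. The function names are hypothetical, mismatches (hdist=1) are applied only to full-length k-mers, and the short mink matches are assumed to compare the read's 3' tip with the beginning of an adapter, exactly.

```python
from __future__ import annotations

def hamming_neighbors(kmer: str) -> set[str]:
    """The k-mer plus every variant with one substituted base (hdist=1)."""
    variants = {kmer}
    for i, base in enumerate(kmer):
        for alt in "ACGT":
            if alt != base:
                variants.add(kmer[:i] + alt + kmer[i + 1:])
    return variants

def build_index(adapters: list[str], k: int) -> set[str]:
    """All adapter k-mers, each expanded to its Hamming-distance-1 neighbours."""
    index: set[str] = set()
    for adapter in adapters:
        for start in range(len(adapter) - k + 1):
            index |= hamming_neighbors(adapter[start:start + k])
    return index

def ktrim_right(read: str, adapters: list[str], k: int = 25, mink: int = 12) -> str:
    """Simplified ktrim=r: cut at the first adapter k-mer, drop everything to its right."""
    index = build_index(adapters, k)
    for start in range(len(read) - k + 1):
        if read[start:start + k] in index:
            return read[:start]
    # No full k-mer hit: try shorter exact matches (k-1 down to mink bases)
    # between the read's 3' tip and the start of an adapter.
    for length in range(min(k - 1, len(read)), mink - 1, -1):
        tip = read[-length:]
        if any(adapter.startswith(tip) for adapter in adapters):
            return read[:-length]
    return read
```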

If you only have single-end reads, you can omit in2 and out2. But with paired reads it's best to trim both reads at the same time, or else they may lose sync (if some reads are discarded because they are entirely adapter). And at least in our lab, both reads get the same TruSeq adapters on the 3' end in a normal library.
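The sync problem can be checked directly. This hypothetical Python helper compares read IDs record by record in two FASTQ files (given as lists of lines, 4 lines per record), assuming mate names differ only by a /1 or /2 suffix or by the comment after the first space.

```python
from __future__ import annotations

def read_ids(fastq_lines: list[str]) -> list[str]:
    """Read names from FASTQ lines (4 per record), without '@', comments or /1, /2."""
    ids = []
    for header in fastq_lines[0::4]:
        name = header[1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        ids.append(name)
    return ids

def first_desync(r1_lines: list[str], r2_lines: list[str]) -> int | None:
    """Index of the first record where R1 and R2 stop describing the same fragment."""
    ids1, ids2 = read_ids(r1_lines), read_ids(r2_lines)
    for i, (a, b) in enumerate(zip(ids1, ids2)):
        if a != b:
            return i
    if len(ids1) != len(ids2):
        return min(len(ids1), len(ids2))  # one file has extra records
    return None  # files are in sync
```

Dropping one record from R1 only, as independent single-end trimming might, shifts every later pair.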
Post #4, 03-15-2014, 08:13 PM
Zapages (Member, NJ; joined Oct 2012; 94 posts)

I found Trimmomatic very easy to use for trimming RNA-seq data.

Anyway, try the following: add the overrepresented k-mers or sequences to a FASTA list. Then run Trimmomatic with it and check Trimmomatic's output file with FastQC. Then run Trimmomatic again, and repeat until you are satisfied.
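The loop above starts from a list of overrepresented sequences. Here is a minimal Python sketch for building that FASTA list from reads, assuming exact whole-read counting and the 0.1% level at which FastQC starts flagging a sequence (function names are hypothetical). In RNA-seq, highly expressed transcripts can also pass this threshold, so it is worth checking hits (e.g. with BLASTN, as suggested later in the thread) before treating them as adapters.

```python
from __future__ import annotations
from collections import Counter

def overrepresented(reads: list[str], min_fraction: float = 0.001) -> list[tuple[str, int]]:
    """Sequences making up more than `min_fraction` of all reads, most common first."""
    if not reads:
        return []
    counts = Counter(reads)
    total = len(reads)
    return [(seq, n) for seq, n in counts.most_common() if n / total > min_fraction]

def to_fasta(hits: list[tuple[str, int]], prefix: str = "overrep") -> str:
    """Write hits as FASTA records (>overrep_1, >overrep_2, ...) for an adapter file."""
    return "".join(f">{prefix}_{i}\n{seq}\n" for i, (seq, _) in enumerate(hits, start=1))
```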

Also, I know for sure that Trimmomatic takes both forward and reverse reads into account, which is exactly what you want when trimming paired reads. I am not too sure about Cutadapt, though.
Post #5, 03-16-2014, 05:17 PM
arcolombo698 (Senior Member, Los Angeles; joined Nov 2013; 141 posts)
Cutadapt and Trimmomatic

Yes, I use Trimmomatic and it is great. However, I am now also using Cutadapt as a second tool for cross-validation.

1) With Trimmomatic, I do have the adapter sequences in a FASTA file. Using grep, I can see that the raw data contains the adapters; after running Trimmomatic, grep shows that they have been removed successfully.
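The grep check can be reproduced in Python. This sketch (function name hypothetical) counts reads whose sequence line contains a given adapter, using AGATCGGAAGAGC, the 13-base start of TruSeq read-through seen in both reads. Unlike plain grep -c it ignores header and quality lines. Note that a zero count for a full-length adapter after trimming does not rule out shorter partial adapters left at read ends.

```python
def count_reads_with(adapter: str, fastq_text: str) -> int:
    """Number of reads whose sequence line contains `adapter`.

    Like `grep -c`, but only sequence lines (every 4th line, starting at the
    2nd) are searched, so headers and quality strings cannot match.
    """
    sequence_lines = fastq_text.splitlines()[1::4]
    return sum(adapter in seq for seq in sequence_lines)

TRUSEQ_START = "AGATCGGAAGAGC"
```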

2) However, I am not sure whether Cutadapt accepts an input file of adapters. cutadapt --help says the adapters are given as SPECIFIC SEQUENCES, which implies that I would have to enter ALL the sequences I want cut. Is that right?

Has anyone used Cutadapt to cut out the universal adapter AND all the TruSeq adapter indices?

Thank you so much again for your input.
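Cutadapt does accept -a several times, so one option is to generate the command from a FASTA file. Below is a minimal Python sketch, with hypothetical helper names, that only builds the argument list (-a for read 1 adapters, -A and -p for read 2 in paired-end mode); recent Cutadapt versions also accept -a file:adapters.fa directly, which is worth checking in your version's documentation. For the TruSeq indices, the index sits downstream of the shared adapter start, so read-through in read 1 reaches the common part first and one shared sequence should cover all indexed adapters.

```python
from __future__ import annotations

def parse_fasta(text: str) -> dict[str, str]:
    """Minimal FASTA reader: {name: sequence}, name = first word after '>'."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records

def cutadapt_args(r1_adapters: dict[str, str], in1: str, out1: str,
                  r2_adapters: dict[str, str] | None = None,
                  in2: str | None = None, out2: str | None = None) -> list[str]:
    """Build a cutadapt command line with one -a per read-1 adapter
    (and one -A per read-2 adapter in paired-end mode)."""
    args = ["cutadapt"]
    for seq in r1_adapters.values():
        args += ["-a", seq]
    if r2_adapters is None:
        return args + ["-o", out1, in1]
    if in2 is None or out2 is None:
        raise ValueError("paired-end mode needs in2 and out2")
    for seq in r2_adapters.values():
        args += ["-A", seq]
    return args + ["-o", out1, "-p", out2, in1, in2]
```

subprocess.run(args, check=True) would then run the command, assuming cutadapt is on the PATH.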
Post #6, 03-19-2014, 02:04 PM
sprocha (Junior Member, Spain; joined Mar 2014; 2 posts)
Adaptor sequences

Hi,
I may have a very stupid question here; I'm new to this.
I just got my reads and found with FastQC that the "overrepresented sequences" include full adaptors (universal and indexed, 100% identity over 50 bp, so I assume it's the complete adaptor).

I see that the adaptor sequences people usually use are shorter, inner fragments of these, and when I use only those, the "overrepresented" sequences do disappear. I ran Trimmomatic with ILLUMINACLIP:Cadaptor31.fa:1:30:10:8:true, using this adaptor file:

>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT # this being the 3' end of the TruSeq Universal adaptor (5'-3')
>PrefixPE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC # this being the reverse complement of the 5' end of the TruSeq Indexed adaptor
>PE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT # this being the 3' end of the TruSeq Universal adaptor (5'-3')
>PE/1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
>PE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC # this being the reverse complement of the 5' end of the TruSeq Indexed adaptor
>PE/2_rc
GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
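The _rc entries in this file can be checked mechanically. A short Python sketch, assuming the # notes above were added for the post and are not in the real file (a FASTA reader would otherwise treat them as sequence); the function name is hypothetical.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

# Sequences from the adapter file above, with the '#' notes left out.
ADAPTERS = {
    "PrefixPE/1": "TACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "PrefixPE/2": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC",
    "PE/1": "TACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "PE/1_rc": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA",
    "PE/2": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC",
    "PE/2_rc": "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
}

def check_rc_entries(adapters: dict[str, str]) -> dict[str, bool]:
    """For each NAME_rc entry, is it the reverse complement of NAME?"""
    return {
        name: revcomp(adapters[name[: -len("_rc")]]) == seq
        for name, seq in adapters.items()
        if name.endswith("_rc")
    }
```

Both _rc entries do match. PE/2 ends ...CCGATC, one base shorter than PE/1's ...CCGATCT, which is why PE/2_rc starts GATCGG rather than AGATCGG.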


What exactly is Trimmomatic doing? When it detects an adaptor, does it remove everything from there to the 3' end of the read? If not, I should still be seeing the remaining parts as overrepresented, right?

Should I instead give it an adaptor file with the full sequences?

Many thanks in advance,
sara
Post #7, 03-19-2014, 02:17 PM
arcolombo698 (Senior Member, Los Angeles; joined Nov 2013; 141 posts)
Response

> I may have a very stupid question here; I'm new to this.
> I just got my reads and found with FastQC that the "overrepresented sequences" include full adaptors (universal and indexed, 100% identity over 50 bp, so I assume it's the complete adaptor).

What is your data? Often you need BLASTN to identify what the overrepresented sequences are. Be warned that for mRNA the libraries are smaller, so you should expect sequences to repeat.

> I see that the adaptor sequences people usually use are shorter, inner fragments of these, and when I use only those, the "overrepresented" sequences do disappear. I ran Trimmomatic with ILLUMINACLIP:Cadaptor31.fa:1:30:10:8:true, using this adaptor file:
> [adaptor file as in post #6]
> What exactly is Trimmomatic doing? When it detects an adaptor, does it remove everything from there to the 3' end of the read? If not, I should still be seeing the remaining parts as overrepresented, right?

I am not an expert, but what Trimmomatic trims depends on the parameters you give it, such as the numbers right after the adapter file in the ILLUMINACLIP step (a SLIDINGWINDOW step, if you add one, is separate quality trimming). It removes the sequences listed in the adapter file, and when it detects matches to the reverse complements, it removes those too. You can check this with grep: the adapters appear when you grep the raw data file, but they are gone when you grep the trimmed file.

> Should I instead give it an adaptor file with the full sequences?

The current behaviour is to retain from the start of the read to the base before the adapter starts.

It is also possible, if you have short adapter sequences and a liberal match threshold, that false positives are causing a problem.
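The numbers after the adapter file can be decoded with the scoring rule from the Trimmomatic manual: each matching base adds just over 0.6 (assumed here to be log10(4), about 0.602) and each mismatch subtracts Q/10. This is a hedged Python sketch with hypothetical function names; the fields follow the manual's order (adapter FASTA, seed mismatches, palindrome clip threshold, simple clip threshold, then the optional minAdapterLength and keepBothReads), the dictionary keys are illustrative, and a colon inside the file path would break the naive split.

```python
from __future__ import annotations
import math

# "Just over 0.6" per matching base; log10(4) is an assumption.
MATCH_SCORE = math.log10(4)

def clip_score(read: str, adapter: str, quals: list[int]) -> float:
    """Score an ungapped read/adapter alignment: +MATCH_SCORE per match, -Q/10 per mismatch."""
    score = 0.0
    for base, ref, q in zip(read, adapter, quals):
        score += MATCH_SCORE if base == ref else -q / 10
    return score

def min_perfect_match(threshold: float) -> int:
    """Bases of perfect match needed to exceed a clip threshold."""
    return math.floor(threshold / MATCH_SCORE) + 1

def parse_illuminaclip(step: str) -> dict[str, object]:
    """Split an ILLUMINACLIP step into named fields (the last two are optional)."""
    fields = step.split(":")
    if fields[0] != "ILLUMINACLIP" or len(fields) < 5:
        raise ValueError(f"not an ILLUMINACLIP step: {step}")
    parsed: dict[str, object] = {
        "fastaWithAdapters": fields[1],
        "seedMismatches": int(fields[2]),
        "palindromeClipThreshold": int(fields[3]),
        "simpleClipThreshold": int(fields[4]),
    }
    if len(fields) > 5:
        parsed["minAdapterLength"] = int(fields[5])
    if len(fields) > 6:
        parsed["keepBothReads"] = fields[6].lower() == "true"
    return parsed
```

By this rule, a simple clip threshold of 10 needs about 17 perfectly matching bases and a palindrome threshold of 30 about 50, consistent with the manual's own examples (12 bases score just over 7, 25 bases about 15).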
Post #8, 03-19-2014, 02:19 PM
arcolombo698 (Senior Member, Los Angeles; joined Nov 2013; 141 posts)

This implies (needs confirmation) that the entire adapter sequence does not have to be specified, because Trimmomatic will detect the start of the adapter and cut out the remaining junk.
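The "cut from the adapter start onward" behaviour can be illustrated with a minimal Python sketch using exact matching only. Trimmomatic itself uses seeds and score-based thresholds, so this is a simplification, and simple_clip and min_overlap are hypothetical names. Only the adapter's leading bases need to be listed: the read is kept up to the match start, and anything after it, including bases not in the adapter file, is discarded.

```python
def simple_clip(read: str, adapter: str, min_overlap: int = 8) -> str:
    """Keep read[:start] at the leftmost exact adapter match.

    Near the 3' end only the adapter's first bases can be present, so the
    comparison shrinks to the bases left in the read, down to min_overlap.
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break
        if read[start:start + overlap] == adapter[:overlap]:
            return read[:start]  # drop the adapter and everything after it
    return read
```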

You should read the paper "Adapters Demystified" and learn how reads are constructed.
Post #9, 03-20-2014, 06:16 AM
sprocha (Junior Member, Spain; joined Mar 2014; 2 posts)

Right, I am only using that ILLUMINACLIP step (to trim the primers), nothing else. But thanks for the grep suggestion; I will confirm with that.

Also, an additional question: Trimmomatic (and other tools, I guess) outputs forward and reverse sets of both paired and unpaired reads.

Am I correct in assuming that I should use both the paired and unpaired reads for the assembly? (Unpaired reads are those where only one mate was kept and the other was dropped.) It seems that some people usually discard the "unpaired" files. I can see why: they are little data, and many of them were probably "borderline" to being discarded, so not very informative. But that doesn't necessarily have to be the case, right? Should one use these unpaired reads as well, or not?
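How paired and unpaired outputs arise can be sketched in Python. This assumes a hypothetical length filter (min_len=36, in the spirit of a MINLEN step) applied after trimming; the 1P/1U/2P/2U labels follow Trimmomatic's -baseout naming. Unpaired reads are ordinary reads whose mate was dropped, so they carry no pairing information; whether to feed them in as extra single-end input depends on the assembler.

```python
from __future__ import annotations

def route_pairs(pairs: list[tuple[str, str]], min_len: int = 36) -> dict[str, list[str]]:
    """Sort trimmed mates into paired (1P/2P) and unpaired (1U/2U) outputs."""
    out: dict[str, list[str]] = {"1P": [], "2P": [], "1U": [], "2U": []}
    for r1, r2 in pairs:
        keep1, keep2 = len(r1) >= min_len, len(r2) >= min_len
        if keep1 and keep2:
            out["1P"].append(r1)  # both mates survive: stay paired, same order
            out["2P"].append(r2)
        elif keep1:
            out["1U"].append(r1)  # mate dropped: R1 becomes an orphan
        elif keep2:
            out["2U"].append(r2)
        # if neither mate survives, the pair is discarded entirely
    return out
```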

Many thanks in advance again,

Tags
adapters, cutadapt, rna-seq, trimming





All times are GMT -8.

