SEQanswers > RNA Sequencing > Trimming adapters with Cutadapt

Elfangor 07-29-2016 02:29 AM

Trimming adapters with Cutadapt
Hi everyone, I'm having trouble figuring out which adapter sequence I should give cutadapt or trimmomatic to trim adapters from my fastq files.

I have a set of fastq files, each containing 51 bp reads that start with an N followed by the letters of the read itself. I also have the index sequence for each fastq (after demultiplexing) and two sequences for the primers that were used. For instance, these are some of the read headers from one of my fastq files:


@700470R:449:HVHH7BCXX:2:1101:1406:1948 1:N:0:GTGAAA
@700470R:449:HVHH7BCXX:2:1101:1814:1992 1:N:0:GTGAAA
@700470R:449:HVHH7BCXX:2:1101:2184:1885 1:N:0:GTGAAA
The index sequence is, as determined in the header, GTGAAA. I also have information about the SR primer, which is:


and the Index primer, which is:


Substituting the NNNNNN with the index sequence provided in the header of the corresponding fastq, I would obtain the barcoded adapter used for sequencing, if I'm not wrong.
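For anyone who wants to script this per file, here is a minimal Python sketch that pulls the index out of a header line like the ones quoted above. It assumes the header has the two-part layout shown in the post (a space, then colon-separated fields ending in the index); the function name `index_from_header` is made up for illustration and is not part of cutadapt or any other tool.

```python
def index_from_header(header: str) -> str:
    """Return the last colon-separated field of the header's second part.

    Assumes a header shaped like the ones in this thread:
    "@<instrument fields> <read>:<filtered>:<control>:<index>".
    """
    if not header.startswith("@"):
        raise ValueError("not a FASTQ header line: %r" % header)
    # Split once on the first space: read identifier, then the comment part.
    parts = header.rstrip().split(" ", 1)
    if len(parts) != 2:
        raise ValueError("header has no second part: %r" % header)
    # The index (barcode) is the final field of the comment part.
    return parts[1].split(":")[-1]


header = "@700470R:449:HVHH7BCXX:2:1101:1406:1948 1:N:0:GTGAAA"
print(index_from_header(header))  # prints GTGAAA
```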

So here is where I start getting lost. After running FastQC, the overrepresented sequences list contained a bunch of sequences flagged as Illumina Multiplexing PCR Primer, as if there were several different adapters within the same fastq file.

So, here is my question:

What sequence should I give cutadapt to trim in this case? Should I include more than one? In my opinion I should use the Index primer sequence with the NNNNNN replaced by the index sequence (barcode) of each fastq, but I'm not sure whether this is correct, or whether I should include more sequences. I'm also not sure which parameters I should use to run cutadapt. Do I need both -a and -g so that the adapter is trimmed from both ends, or would -a alone work? What about the error tolerance (-e) for matching adapters (I don't know what the default value is if nothing is specified)? Should I use wildcards (NNNNNN) as a universal adapter, or list the exact barcode of each sample's fastq as the adapter? Would quality trimming be useful, even though the average base call quality in each read is very high (over 30)? And should I use the --trim-n option to trim possible flanking Ns from my reads?

As you all see... quite lost I am...

dovah 08-03-2016 09:05 AM


In my experience, using trimmomatic, you can use the information about your platform to remove universal adapters from your reads, no need to know the exact index sequence.

These universal adapters ship with the trimmomatic package, or they can be downloaded from here. Note that the adapter file to specify in your trimming procedure depends on the combination of platform and type of sequencing reads (paired or single end). You can find Trimmomatic usage info here. It's very clearly explained and quite self-explanatory, but write back here if you still have issues.

ronaldrcutler 08-03-2016 11:22 AM

Also consider doing a quality analysis of your fastq files before doing any trimming or proceeding in the pipeline. Use FastQC for the quality analysis and then use Trim_galore, which trims adapters from the reads and also does general quality trimming.

kerplunk412 08-10-2016 11:26 AM

I think this is what you need:

cutadapt -a AGATCGGAAGAG -o YOUR_FILE.trim1.fq --minimum-length 15 YOUR_FILE.fastq.gz

You don't need to put in the index sequence, as cutadapt will remove the adapter and anything 3' of it, unless you specify otherwise. The --minimum-length option will throw out any reads shorter than the specified value. I think the default allowed error rate is 0.1, which should be fine.
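To make the behaviour described here concrete, below is a deliberately simplified Python sketch. It is not cutadapt's algorithm: cutadapt does error-tolerant alignment and also catches partial adapters at the very end of a read, while this stand-in only finds exact, full-length matches. The function names are invented for illustration. The `allowed_errors` helper reflects my reading of the cutadapt documentation, which describes the error limit as the matched length times the error rate, rounded down.

```python
ADAPTER = "AGATCGGAAGAG"  # the 12 nt adapter prefix used in the command above


def allowed_errors(match_length, error_rate=0.1):
    # Matched length times the error rate, rounded down
    # (as I understand the rule described in the cutadapt docs).
    return int(match_length * error_rate)


def trim_3prime(read, adapter=ADAPTER, minimum_length=15):
    """Exact-match stand-in for `-a ADAPTER --minimum-length N`.

    Cuts the read at the first adapter occurrence, dropping the adapter and
    everything 3' of it (index included). Returns None when the trimmed read
    is shorter than minimum_length, i.e. the read would be discarded.
    """
    pos = read.find(adapter)
    trimmed = read if pos == -1 else read[:pos]
    return trimmed if len(trimmed) >= minimum_length else None


insert = "ACGTACGTACGTACGTACGT"                   # made-up 20 nt insert
print(trim_3prime(insert + ADAPTER + "GTGAAAATCT"))  # adapter and tail removed
print(trim_3prime("ACGT" + ADAPTER))                 # too short after trimming
print(allowed_errors(len(ADAPTER)))                  # errors tolerated at -e 0.1
```

With a full 12 nt match and the 0.1 rate, one mismatch or indel would be tolerated, which is why the index downstream of the adapter never needs to be spelled out.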

It does look like you can use the --trim-n option to remove the first N.
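As a rough illustration of what removing flanking Ns means for a read like yours (51 bp starting with an N), here is a small Python sketch. It is only a model of the idea, not cutadapt's code, and `trim_flanking_n` is a hypothetical name. Note that the quality string has to be cut at the same positions as the sequence.

```python
def trim_flanking_n(seq, qual):
    """Strip N bases from both ends of a read, keeping qualities in sync.

    Ns inside the read are left alone; only leading and trailing runs go.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    start = len(seq) - len(seq.lstrip("N"))  # number of leading Ns
    end = len(seq.rstrip("N"))               # index just past the last non-N base
    if start >= end:                         # the read was all Ns (or empty)
        return "", ""
    return seq[start:end], qual[start:end]


# Made-up read: one leading N with a low-quality '#' score.
print(trim_flanking_n("NACGTACGT", "#IIIIIIII"))
```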

It probably isn't necessary to quality trim, although you may want to quality filter before the adapter trimming. There is also probably no need for the -g option, unless this was a particular kind of library where you expect to see adapter sequence at the 5' end of the read.
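If you end up running this over many fastq files, one possible way to keep the options in one place is a small Python helper that assembles the command line. This is just a sketch: `cutadapt_command` is a made-up helper, the file names are the placeholders from the command above, and the optional switches map to the cutadapt options discussed in this thread (`--trim-n`, `-e`, and `-q`, which trims low-quality ends rather than filtering whole reads). Nothing is executed here; the list could be passed to `subprocess.run(cmd, check=True)` on a machine with cutadapt installed.

```python
import shlex


def cutadapt_command(fastq_in, fastq_out, adapter="AGATCGGAAGAG",
                     minimum_length=15, trim_n=False,
                     quality_cutoff=None, error_rate=None):
    """Build an argv list for a single-end cutadapt run (nothing is executed)."""
    cmd = ["cutadapt", "-a", adapter, "--minimum-length", str(minimum_length)]
    if trim_n:
        cmd.append("--trim-n")               # drop flanking Ns
    if quality_cutoff is not None:
        cmd += ["-q", str(quality_cutoff)]   # quality-trim read ends
    if error_rate is not None:
        cmd += ["-e", str(error_rate)]       # override the default error rate
    cmd += ["-o", fastq_out, fastq_in]
    return cmd


cmd = cutadapt_command("YOUR_FILE.fastq.gz", "YOUR_FILE.trim1.fq", trim_n=True)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```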
