
**dpryan** · 12-13-2013, 07:57 AM

Cutadapt (or trim_galore, which I find to be a nice wrapper) can do that.

**mastal** · 12-13-2013, 08:39 AM

Originally posted by milesgr:

I am pretty sure the rest of the 3' end is artifact,

I have not worked with Illumina miRNA reads, but in general the Illumina adapters are close to 60 bp long. After that you do indeed get unpredictable sequences from the flow cell or other 'artifacts', but what you are seeing is probably just more of the adapter, possibly including a multiplex barcode if one was used on your samples.

See this webpage from U Texas at Austin:

Illumina - all flavors - Genomic Sequencing and Analysis Facility User Support Wiki - UT Austin Wikis

https://wikis.utexas.edu/display/GSAF/Illumina+-+all+flavors

**milesgr** · 12-13-2013, 10:10 AM

Thanks for the info - it was very helpful. As a follow-up, I used the following command:
cutadapt -e 0.05 -a TGGAATTCTCGGGTGCCAAGG 001.fastq > 001_CLIPPED.fastq

I found that the output still retained some adapter sequence. For instance, one read left in the output still ends with the first 15 bases of the adapter (TGGAATTCTCGGGTG):
GAGACCGCCTGGGAATACCGGGTGCTGTAGGCTTTGGAATTCTCGGGTG

This sequence is 49 bases (remember, read lengths were 50 bp), making me think that the trimmer removed the last base and missed the big picture. In another read, a single-base deletion seems to have ruined the trimming (the adapter's TGGAATTCTCGGG appears as TGGAATTCCGGG, with the T missing):

CCCCCCACTGCTAACTTTGACTGGCTTTGGAATTCCGGGTGCAAGGAAC

I wanted to allow for some error (an error rate of 0.05 allows one error across the 21 bases of the adapter sequence), but cutadapt seems to be missing a lot of adapters, which reduces my miRNA coverage significantly. Any suggestions would be greatly appreciated. Thanks in advance.
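A quick way to check the arithmetic in the post above: as far as I know, cutadapt applies the error rate to the length of the matched part of the adapter and rounds down. Assuming that rule (floor(rate × overlap length)), here is a minimal sketch of what -e 0.05 permits for full and partial matches of this 21-base adapter. The function name `allowed_errors` is made up for illustration and is not part of cutadapt.

```python
import math

ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # adapter passed to -a in the command above
ERROR_RATE = 0.05                   # value passed to -e in the command above


def allowed_errors(overlap_len: int, error_rate: float = ERROR_RATE) -> int:
    """Errors tolerated for a match covering overlap_len adapter bases.

    Assumption: the tolerance is floor(error_rate * overlap_len).
    """
    return math.floor(error_rate * overlap_len)


# 15 = adapter bases visible at the end of the first example read,
# 19 = two bases short of the full adapter, 21 = the full adapter.
for n in (15, 19, len(ADAPTER)):
    print(f"overlap {n:2d} bases -> {allowed_errors(n)} error(s) allowed")
```

Under that assumption, only a full 21-base match tolerates an error at all; a partial match at the end of a read (such as the 15 adapter bases in the first example) must be exact.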

**mastal** · 12-13-2013, 11:50 AM

I use trimmomatic, but I find that it doesn't recognize adapters with indels either.

If the value of -e that you are using allows for only 1 mismatch out of 21 bases, it's possible that the adapter sequence you are giving cutadapt is too short, so the score is not high enough for it to recognize the adapters in your reads. Maybe you should try allowing a higher error rate.
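To illustrate why an indel hurts so much more than a mismatch, here is a rough sketch (not how cutadapt or trimmomatic are implemented) that compares the adapter with the second example read from earlier in the thread in two ways: position by position (mismatches only), and with a small dynamic-programming alignment that also allows insertions and deletions. The function names are hypothetical, and the alignment assumes the whole adapter is present in the read, so it does not handle adapters cut off by the end of the read.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
# Second example read from the thread (the one with the missing T).
READ = "CCCCCCACTGCTAACTTTGACTGGCTTTGGAATTCCGGGTGCAAGGAAC"


def mismatches(adapter: str, window: str) -> int:
    """Mismatch-only comparison: base i is compared only with base i."""
    return sum(a != b for a, b in zip(adapter, window))


def min_edits_anywhere(adapter: str, read: str) -> int:
    """Fewest substitutions, insertions and deletions needed to match the
    whole adapter against some stretch of the read (free start and end)."""
    prev = [0] * (len(read) + 1)  # the match may start anywhere in the read
    for i, a in enumerate(adapter, start=1):
        cur = [i]  # i adapter bases with nothing in the read to pair with
        for j, r in enumerate(read, start=1):
            cur.append(min(prev[j - 1] + (a != r),  # match or substitution
                           prev[j] + 1,             # adapter base missing from read
                           cur[j - 1] + 1))         # extra base in the read
        prev = cur
    return min(prev)  # the match may end anywhere in the read


start = READ.find(ADAPTER[:8])  # where the adapter visibly begins
window = READ[start:start + len(ADAPTER)]
print("mismatch-only errors:", mismatches(ADAPTER, window))
print("errors with indels allowed:", min_edits_anywhere(ADAPTER, READ))
```

With indels allowed, this read needs two edits rather than one: besides the missing T, a C from the adapter's GCC also appears to be missing (the read has GCAAGG where the adapter has GCCAAGG). After the first deletion everything downstream is shifted by one base, which is why the mismatch-only count is so much higher. If the tolerance is computed as floor(rate × 21), two edits over the full adapter would need a rate of at least 2/21 (about 0.095), which is consistent with the suggestion to raise -e.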

**tonybolger** · 12-14-2013, 02:21 AM

Originally posted by mastal:

I use trimmomatic, but I find that it doesn't recognize adapters with indels either.

You are correct - right now, trimmomatic doesn't perform matching with indels, since it is relatively rare to find them in standard Illumina datasets, and trimmomatic was very much designed to meet our own requirements rather than to cover all possible tasks.

That said, we are currently evaluating what additional alignment (or other) features are needed for more specialized applications, so if anyone has any suggestions, please let us know (the email address is on the trimmomatic web page).

Thanks,

Tony.

**mastal** · 12-14-2013, 04:08 AM

Hi Tony,

I am finding occasional 1-base insertions or deletions in the Illumina adapter sequences. In the case of the insertions, it is a sort of homopolymer effect: the inserted base is almost always the same as the previous base (on the 5' side) in the sequence.
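For anyone who wants to check that observation on their own reads, a minimal sketch: given the expected adapter and an observed copy that is exactly one base longer, it reports whether the extra base repeats the base on its 5' side. The function name and the two observed sequences in the example are hypothetical, made up only to exercise the check; they are not reads from this thread.

```python
def insertion_repeats_previous_base(reference: str, observed: str):
    """Classify a single-base insertion.

    Returns True if the inserted base equals the base on its 5' side,
    False if it does not, and None if observed is not reference plus
    exactly one inserted base.
    """
    if len(observed) != len(reference) + 1:
        return None
    # Scan from the 3' end: inside a run of identical bases the position of
    # the insertion is ambiguous, so place it as far 3' as possible, i.e.
    # directly after its identical neighbour.
    for i in range(len(observed) - 1, -1, -1):
        if observed[:i] + observed[i + 1:] == reference:
            return i > 0 and observed[i] == observed[i - 1]
    return None


ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
# Hypothetical observed adapters, invented for illustration:
extra_t = "TGGAATTTCTCGGGTGCCAAGG"  # extra T after the TT run
extra_a = "TGGAATTCATCGGGTGCCAAGG"  # extra A after a C
print(insertion_repeats_previous_base(ADAPTER, extra_t))
print(insertion_repeats_previous_base(ADAPTER, extra_a))
```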

By the way, I think trimmomatic is great, even if it took me a while to understand how palindrome clipping works.

Best wishes,
Maria

**relipmoc** · 12-15-2013, 01:39 PM

Try skewer.

If you put all the sequences in test.fasta as below:
>1
TAGTAGGTTGCATAGTTTGGAATTCTCGGGTGCCAAGGAACTCCAG
>2
GAGACCGCCTGGGAATACCGGGTGCTGTAGGCTTTGGAATTCTCGGGTG
>3
CCCCCCACTGCTAACTTTGACTGGCTTTGGAATTCCGGGTGCAAGGAAC

and use the following command:
$ skewer -r 0.2 -d 0.06 -x TGGAATTCTCGGGTGCCAAGG test.fasta -1 -l 16 2>/dev/null

you should get output like the following:
>1
TAGTAGGTTGCATAGTT
>2
GAGACCGCCTGGGAATACCGGGTGCTGTAGGCTT
>3
CCCCCCACTGCTAACTTTGACTGGCTT

In your case, the error rate is higher than in the usual case, so a higher error rate (-r 0.2) and a higher indel error rate (-d 0.06) were chosen.

BTW: indel errors do occur in Illumina reads, though they are pretty rare.

Trimming adapter sequence in the middle of a read
