SEQanswers

Old 08-15-2011, 02:44 AM   #41
mghita
Member
 
Location: Cambridge

Join Date: Aug 2011
Posts: 10
Default

I have a problem using cutadapt. The output file still contains the adapter. What am I doing wrong?

cutadapt -a N ~/Desktop/pr/reads1.fq > ~/Desktop/pr/reads.fq


When I use the program, it removes the adapter, but when I save it in an output file, it has the adapter.

Last edited by mghita; 08-15-2011 at 02:47 AM.
Old 08-15-2011, 11:17 PM   #42
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275
Default

Quote:
Originally Posted by mghita View Post
I have a problem using cutadapt. The output file still contains the adapter. What am I doing wrong?
cutadapt -a N ~/Desktop/pr/reads1.fq > ~/Desktop/pr/reads.fq
When I use the program, it removes the adapter, but when I save it in an output file, it has the adapter.
"-a N" means your adaptor sequence is a single "N" letter? Is that what you intended?
Old 08-16-2011, 01:28 AM   #43
mghita
Member
 
Location: Cambridge

Join Date: Aug 2011
Posts: 10
Default

Quote:
Originally Posted by Torst View Post
"-a N" means your adaptor sequence is a single "N" letter? Is that what you intended?

Initially, that was what I intended, but I have discovered that other reads include a series of N's, so now I am interested in removing the reads that have N's inside. cutadapt did not work for that either.

Also, I would like to know if anyone has any method of eliminating the first (few) and last (some) bases from the reads?
Old 08-17-2011, 01:10 AM   #44
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275
Default

Quote:
Originally Posted by mghita View Post
Initially, that was what I intended, but I have discovered that other reads include a series of N's, so now I am interested in removing the reads that have N's inside. cutadapt did not work for that either.
Also, I would like to know if anyone has any method of eliminating the first (few) and last (some) bases from the reads?
You should be able to write a trivial Perl/Python/C program to remove reads with N in them.

If not, perhaps the FASTX toolkit could be used: http://hannonlab.cshl.edu/fastx_toolkit/
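
Along the lines of the "trivial program" suggested above, a minimal Python sketch could look like the following. It assumes plain four-line FASTQ records (no wrapped sequence lines). The names `read_fastq`, `clean_reads`, `write_fastq` and the `head`/`tail` parameters are made up for this illustration and are not part of cutadapt or the FASTX toolkit. It drops every read that contains an N and can optionally clip a fixed number of bases from each end, which also covers the earlier question about removing the first and last bases.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line): stop
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def clean_reads(handle: TextIO, head: int = 0, tail: int = 0) -> Iterator[Record]:
    """Skip reads containing N; clip `head` bases from the start and `tail` from the end."""
    for header, seq, qual in read_fastq(handle):
        if "N" in seq.upper():
            continue  # discard the whole read
        end = max(head, len(seq) - tail)  # never let the slice wrap around
        yield header, seq[head:end], qual[head:end]


def write_fastq(records: Iterable[Record], handle: TextIO) -> None:
    """Write records back out in four-line FASTQ format."""
    for header, seq, qual in records:
        handle.write(f"{header}\n{seq}\n+\n{qual}\n")


# Small in-memory demo; real use would open the FASTQ files instead.
demo = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACNTACGT\n+\nIIIIIIII\n")
out = io.StringIO()
write_fastq(clean_reads(demo, head=2, tail=1), out)
print(out.getvalue(), end="")
```

In the demo, read `r2` is dropped because it contains an N, and `r1` loses two bases at the start and one at the end.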
Old 10-06-2011, 03:46 AM   #45
lmilne
Junior Member
 
Location: UK

Join Date: Apr 2009
Posts: 8
Default

Is it possible to have an option added to cutadapt to mask adapter sequences rather than to discard or trim reads?
Old 10-06-2011, 05:19 AM   #46
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

What would the masking look like? I imagine lowercase vs. uppercase would be an option. Could you please add this as a feature request to the issue tracker on http://cutadapt.googlecode.com/ ?
Old 10-06-2011, 07:05 AM   #47
lmilne
Junior Member
 
Location: UK

Join Date: Apr 2009
Posts: 8
Default

Masking with 'N' would work for me.
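
To make the request concrete, here is a minimal sketch of what N-masking could look like. It is only an illustration, not how cutadapt works: it uses exact string matching (no error tolerance, no partial adapters at the read end), and it assumes that "masking" means overwriting everything a 3' trim would remove, so the read keeps its original length. The function name `mask_3prime_adapter` is hypothetical.

```python
def mask_3prime_adapter(read: str, adapter: str, mask_char: str = "N") -> str:
    """Overwrite the adapter and everything after it with mask_char.

    Exact matching only; reads without the adapter are returned unchanged.
    """
    start = read.find(adapter)
    if start == -1:
        return read
    return read[:start] + mask_char * (len(read) - start)


masked = mask_3prime_adapter("ACGTTTGCAAGG", "TTGC")
print(masked)  # ACGTNNNNNNNN
```

The quality string would stay untouched in this scheme, since the read length does not change.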
Old 04-30-2012, 10:32 AM   #48
kbhit
Junior Member
 
Location: Philadelphia

Join Date: Sep 2011
Posts: 9
Default

What's the best way to cutadapt Illumina reads where there are 2 adaptors such as:
5' - adaptor1 - sequence - adaptor2 - 3'

I want to remove the adaptors on both sides. What I have been doing is running cutadapt twice:
a. Run cutadapt with the -a flag and adaptor2
b. Feed the above output to another cutadapt run with the -g flag and adaptor1

Is that the best method to handle cutting both the adaptors?

Many thanks,
Phillipe
Old 06-09-2012, 01:48 AM   #49
chjiao
Junior Member
 
Location: China

Join Date: Aug 2011
Posts: 2
Default

A very good and valuable tool for cutting adaptors from SOLiD sequencing data, but I wish you could add 5' adaptor trimming for color-space data, since this function is necessary when adaptors are connected to other adaptors.
Old 06-18-2012, 03:03 AM   #50
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Quote:
Originally Posted by chjiao View Post
but I wish you could add 5' adaptor trimming for color-space data, since this function is necessary when adaptors are connected to other adaptors.
I have just released cutadapt 1.1, which adds this feature and is also 30% faster than before. See the release announcement at http://code.google.com/p/cutadapt/
Old 06-18-2012, 03:05 AM   #51
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Quote:
Originally Posted by kbhit View Post
What's the best way to cutadapt Illumina reads where there are 2 adaptors such as:
5' - adaptor1 - sequence - adaptor2 - 3'

I want to remove the adaptors on both sides. What I have been doing is running cutadapt twice:
a. Run cutadapt with the -a flag and adaptor2
b. Feed the above output to another cutadapt run with the -g flag and adaptor1

Is that the best method to handle cutting both the adaptors?
Yes, that is currently the best way to handle this kind of data. There's also an open feature request regarding some kind of 'paired adapter' mode, but this isn't implemented yet.
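
For readers who want to see what the two passes do to a single read, here is a toy Python sketch. It is not cutadapt's algorithm: it uses exact matching only, and the function names and example sequences are invented. The first function mimics the effect of a pass with -a (remove the 3' adaptor and everything after it), the second mimics a pass with -g (remove the 5' adaptor and everything before it).

```python
def trim_3prime(read: str, adapter: str) -> str:
    """Drop the adapter and everything after it (the effect of a -a pass)."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


def trim_5prime(read: str, adapter: str) -> str:
    """Drop the adapter and everything before it (the effect of a -g pass)."""
    pos = read.find(adapter)
    return read if pos == -1 else read[pos + len(adapter):]


# Hypothetical read laid out as 5' - adaptor1 - sequence - adaptor2 - 3'
adaptor1, insert, adaptor2 = "AAGGCC", "CATGCATG", "TTGGAA"
read = adaptor1 + insert + adaptor2

after_first_pass = trim_3prime(read, adaptor2)               # step a
after_second_pass = trim_5prime(after_first_pass, adaptor1)  # step b
print(after_second_pass)  # CATGCATG
```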
Old 06-19-2012, 01:21 AM   #52
chjiao
Junior Member
 
Location: China

Join Date: Aug 2011
Posts: 2
Default

Quote:
Originally Posted by mmartin View Post
I have just released cutadapt 1.1, which adds this feature and is also 30% faster than before. See the release announcement at http://code.google.com/p/cutadapt/
Thanks, the new version of cutadapt is challenging when dealing with color-space data.
Old 08-07-2012, 09:22 PM   #53
kerhard
Member
 
Location: Oakland

Join Date: Feb 2011
Posts: 27
Default cutadapt trimming length issue

Hi all,

I just used cutadapt to process some libraries I made and got a result that I don't quite understand. It appears that cutadapt trimmed more than 21 bases (the length of the adapter sequence) from many of the reads. I would think that the longest length that cutadapt could trim from the reads would be 21 bp if given a 21 bp-long adapter. Here is my output:

$ cutadapt -q 20 --quality-base=64 -a TCGTATGCCGTCTTCTGCTTG wt.fastq -o wt_adapttrim.fastq
cutadapt version 1.1
Command line parameters: -q 20 --quality-base=64 -a TCGTATGCCGTCTTCTGCTTG wt.fastq -o wt_adapttrim.fastq
Maximum error rate: 10.00%
Processed reads: 54310589
Trimmed reads: 8911163 ( 16.4%)
Total basepairs: 2111179389 (2111.2 Mbp)
Trimmed basepairs: 100971255 (101.0 Mbp) (4.78% of total)
Too short reads: 0 ( 0.0% of processed reads)
Too long reads: 0 ( 0.0% of processed reads)
Total time: 1918.18 s
Time per read: 0.04 ms

=== Adapter 1 ===

Adapter 'TCGTATGCCGTCTTCTGCTTG', length 21, was trimmed 8911163 times.

Lengths of removed sequences
length count expected
3 1088578 848603.0
4 735568 212150.7
5 672468 53037.7
6 645706 13259.4
7 570424 3314.9
8 507225 828.7
9 449549 207.2
10 387252 51.8
11 320108 12.9
12 298280 3.2
13 277658 0.8
14 277521 0.2
15 244603 0.1
16 239725 0.0
17 202783 0.0
18 205865 0.0
19 202092 0.0
20 233860 0.0
21 208748 0.0
22 136163 0.0
>=23 1006987 0.0


Any answers would be greatly appreciated!

This tool is super and very intuitive, thank you for it.

Last edited by kerhard; 08-07-2012 at 09:23 PM. Reason: no title
Old 08-07-2012, 11:40 PM   #54
kerhard
Member
 
Location: Oakland

Join Date: Feb 2011
Posts: 27
Default

Sorry, I wasn't reading the description of the options carefully enough. I think I understand now what the above results indicate. Using the -a option and given an adapter 21 bp in length, a longer length than 21 bp would be trimmed by cutadapt from a given read if the adapter was found in the 5' end of the read, followed by more sequence.

If someone could verify if this is the case, or whether I am missing some other reason, it would be much appreciated.
Old 08-08-2012, 01:33 AM   #55
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Quote:
Originally Posted by kerhard View Post
Sorry, I wasn't reading the description of the options carefully enough. I think I understand now what the above results indicate. Using the -a option and given an adapter 21 bp in length, a longer length than 21 bp would be trimmed by cutadapt from a given read if the adapter was found in the 5' end of the read, followed by more sequence.

If someone could verify if this is the case, or whether I am missing some other reason, it would be much appreciated.
Yes, that is correct. The column indicates the length of the removed sequence, which includes the bases after the adapter if there are any. It used to indicate the length of the matching adapter in earlier cutadapt versions, but I think that was less helpful.
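
A small worked example may help here. The sketch below uses exact matching only (cutadapt itself allows errors and partial matches), and the read is invented: six bases of insert, the 21 bp adapter from the command above, then five extra bases. The helper name `removed_length` is hypothetical.

```python
def removed_length(read: str, adapter: str) -> int:
    """Number of bases a 3' trim removes: the adapter plus whatever follows it."""
    pos = read.find(adapter)
    return 0 if pos == -1 else len(read) - pos


adapter = "TCGTATGCCGTCTTCTGCTTG"        # the 21 bp adapter from the report
read = "ACGTAC" + adapter + "AAAAA"      # adapter followed by 5 more bases
print(len(adapter), removed_length(read, adapter))  # 21 26
```

So a read like this one would be counted in the ">=23" row of the table, even though the adapter itself is only 21 bp long.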
Old 08-08-2012, 02:02 AM   #56
kerhard
Member
 
Location: Oakland

Join Date: Feb 2011
Posts: 27
Default

Quote:
Originally Posted by mmartin View Post
Yes, that is correct. The column indicates the length of the removed sequence, which includes the bases after the adapter if there are any. It used to indicate the length of the matching adapter in earlier cutadapt versions, but I think that was less helpful.

Thanks for the confirmation. I'm glad I tried out cutadapt, as I was assuming my libraries were free of adapter sequences, which turns out not to be true at all.

I suppose I still don't understand how some of these reads can have sequence AFTER the 3' adapters (e.g., adapters found in the middle of the read). Searching for the full adapter sequence in the raw read files by hand, I notice that many times the sequence found after the 3' adapter is a string of A's. For example:

AGTCTADAPTERAAAAAAAAAAAA

TGCGTACGRACTADAPTERAAAAA

Any ideas as to what that means and how that may happen? Are these from the sequencing reactions on the Illumina machines or are these from library constructions?
Old 08-08-2012, 03:14 AM   #57
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Quote:
Originally Posted by kerhard View Post
Any ideas as to what that means and how that may happen? Are these from the sequencing reactions on the Illumina machines or are these from library constructions?
I cannot really tell because I haven’t seen that before - but I mostly work with SOLiD data, so that doesn’t mean anything. You could look at the qualities that were assigned to the "A"s. I would guess that they are from the library construction if the qualities are high, but that they are artifacts from the sequencing process if the qualities are low.

On further thought, I would also expect that artifacts from the sequencing process (the basecaller calling a base although there really is none) would lead to a sequence of random nucleotides, but I’m only speculating here.
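
A quick way to look at those qualities is sketched below. It assumes the quality characters are offset by 64, as in the --quality-base=64 run earlier in the thread, and it finds the adapter by exact match only. The function name `tail_qualities` and the example read and quality string are invented for illustration.

```python
from statistics import mean
from typing import List


def tail_qualities(seq: str, qual: str, adapter: str, quality_base: int = 64) -> List[int]:
    """Return the quality scores of the bases that follow the adapter."""
    pos = seq.find(adapter)
    if pos == -1:
        return []
    start = pos + len(adapter)
    return [ord(ch) - quality_base for ch in qual[start:]]


adapter = "TCGTATGCCGTCTTCTGCTTG"
seq = "AGTCT" + adapter + "AAAA"
qual = "h" * (5 + len(adapter)) + "BBBB"   # made-up qualities: high, then low

tail = tail_qualities(seq, qual, adapter)
print(tail, mean(tail))
```

Running this over all reads that contain the adapter and comparing the tail qualities with the qualities of the rest of the read would show whether the trailing A's are called with high or low confidence.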
Old 08-23-2012, 01:52 AM   #58
rimpi
Junior Member
 
Location: austria

Join Date: Apr 2012
Posts: 2
Default

Can cutadapt give the results in csfasta format?
I have SOLiD data and I used cutadapt to remove the adapters in colorspace, but the only option seems to be to remove the adapters and get the resulting file in FASTQ format. Can't I get it in csfasta format?
Here is the result of cutadapt (is it correct?):
Command line parameters: -c -e 0.12 -a 330201030313112312 -x abc: --maq -o output.fastq /home/rimpi/solid/hubert/mergecs/1A.csfasta /home/rimpi/solid/hubert/mergecs/1A_QV.qual
Maximum error rate: 12.00%
Processed reads: 35052447
Trimmed reads: 23518246 ( 67.1%)
Total basepairs: 2593881078 (2593.9 Mbp)
Trimmed basepairs: 881159025 (881.2 Mbp) (33.97% of total)
Too short reads: 0 ( 0.0% of processed reads)
Too long reads: 0 ( 0.0% of processed reads)
Total time: 3386.48 s
Time per read: 0.10 ms

=== Adapter 1 ===

Adapter '330201030313112312', length 18, was trimmed 23518246 times.

Lengths of removed sequences
length count expected
3 294756 547694.5
4 122284 136923.6
5 155337 34230.9
6 196812 8557.7
7 260390 2139.4
8 318801 534.9
9 772603 133.7
10 419888 33.4
11 140579 8.4
12 214331 2.1
13 229820 0.5
14 456111 0.1
15 120289 0.0
16 107822 0.0
17 169000 0.0
18 235124 0.0
19 206157 0.0
>=20 19098142 0.0
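
One way to sanity-check a report like the one above is to recompute its summary numbers. The sketch below does that for the percentages. It also reproduces the "expected" column under the assumption that it is simply the number of processed reads times 0.25 to the power of the length. That formula is inferred from the numbers in the two reports in this thread, not taken from the cutadapt documentation. The helper names are hypothetical.

```python
def percent(part: int, total: int, digits: int) -> float:
    """Percentage of part in total, rounded to the given number of digits."""
    return round(100.0 * part / total, digits)


def expected_random_matches(n_reads: int, length: int) -> float:
    """Assumed model: chance matches of `length` random bases across n_reads reads."""
    return round(n_reads * 0.25 ** length, 1)


processed_reads = 35052447
trimmed_reads = 23518246
total_bp = 2593881078
trimmed_bp = 881159025

print(percent(trimmed_reads, processed_reads, 1))   # 67.1
print(percent(trimmed_bp, total_bp, 2))             # 33.97
print(expected_random_matches(processed_reads, 3))  # 547694.5
print(expected_random_matches(processed_reads, 5))  # 34230.9
```

All four values agree with the report, so the summary is at least internally consistent. Whether 67% trimmed reads is plausible for the library is a separate question that the arithmetic cannot answer.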
Old 08-23-2012, 02:16 AM   #59
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Sorry, csfasta/qual output is not supported at the moment. You will have to use a separate program to convert colorspace FASTQ to csfasta/qual (I think someone posted one in the forum). Don't use the --maq option if you do that. Which read mapper do you use that does not support colorspace FASTQ?
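
In case no ready-made converter turns up, a conversion along these lines is short to write. The sketch below is hedged on several assumptions that should be checked against the actual files: that the FASTQ sequence line still holds the primer base followed by the colors (so no --maq), that the qualities are ASCII-encoded with an offset of 33, and that the qual file wants space-separated integers. The function name `fastq_record_to_csfasta` is made up.

```python
from typing import Tuple


def fastq_record_to_csfasta(header: str, seq: str, qual: str,
                            quality_base: int = 33) -> Tuple[str, str]:
    """Turn one colorspace FASTQ record into a csfasta entry and a qual entry."""
    name = header[1:]  # drop the leading '@'
    scores = " ".join(str(ord(ch) - quality_base) for ch in qual)
    csfasta_entry = f">{name}\n{seq}\n"
    qual_entry = f">{name}\n{scores}\n"
    return csfasta_entry, qual_entry


cs, qv = fastq_record_to_csfasta("@read1", "T0120", "5678")
print(cs, end="")
print(qv, end="")
```

Applied to every record of the trimmed FASTQ file, the first strings would be written to the .csfasta file and the second strings to the matching .qual file.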
Old 08-27-2012, 02:03 AM   #60
minoru_harvest
Junior Member
 
Location: China

Join Date: Aug 2012
Posts: 5
Default

A silly question:
How can I use cutadapt in a pipeline with other programs? cutadapt does not seem to be able to read from STDIN, so I cannot use "|"?

Tags
adapter trimming, color space, microrna, mirna, solid




All times are GMT -8.

