SEQanswers

Old 12-27-2010, 07:18 PM   #21
naluru
Member
 
Location: Woods Hole, Massachusetts

Join Date: Jul 2010
Posts: 16
Default cutadapt 0.8

Hi Martin,

I got a chance to use the new version and have a couple of questions about the output. I am posting the output below, and my questions are at the bottom.

OUTPUT:

Maximum error rate: 12.00%
Processed reads: 8847400
Trimmed reads: 5519819 ( 62.4%)
Too short reads: 0 ( 0.0% of processed reads)
Total time: 775.97 s
Time per read: 0.09 ms
=== Adapter 1 ===
Adapter '330201030313112312', length 18, was trimmed 5519819 times.
Histogram of adapter lengths
length count
3 279189
4 472804
5 658292
6 662309
7 516419
8 294151
9 287150
10 245650
11 474506
12 697873
13 112506
14 39956
15 33765
16 35982
17 29564
18 679703
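
As a quick sanity check on the report above, the histogram counts should add up to the number of trimmed reads. A minimal sketch, with the numbers copied from the output; the function name `parse_histogram` is only illustrative and not part of cutadapt:

```python
# Histogram rows (adapter match length, count) copied from the report above.
HISTOGRAM = """\
3 279189
4 472804
5 658292
6 662309
7 516419
8 294151
9 287150
10 245650
11 474506
12 697873
13 112506
14 39956
15 33765
16 35982
17 29564
18 679703
"""

PROCESSED_READS = 8847400  # "Processed reads" line of the report


def parse_histogram(text):
    """Turn 'length count' lines into a {length: count} dictionary."""
    rows = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        length, count = line.split()
        rows[int(length)] = int(count)
    return rows


hist = parse_histogram(HISTOGRAM)
trimmed = sum(hist.values())
print(trimmed)                                     # 5519819
print(f"{100 * trimmed / PROCESSED_READS:.1f}%")   # 62.4%
```

Both values agree with the "Trimmed reads" line of the report.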

My questions are
1. Does the output file contain only trimmed reads (5519819 reads) or all the reads?
2. Is it possible to write the reads without the adaptor sequence to a file?
3. Do you normally keep the reads without the adaptor sequence for further analysis?

Thank you for all your help.

Happy Holidays,

Neel
Old 01-04-2011, 02:34 AM   #22
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Hello Neel,
thanks for trying out the new version.

Quote:
Originally Posted by naluru View Post
1. Does the output file contain only trimmed reads (5519819 reads) or all the reads?
The output file contains all reads.

Quote:
2. Is it possible to write the reads without the adaptor sequence to a file?
This is currently not possible, but someone has already sent me a patch implementing this feature. I will add it to the next version.

Quote:
3. Do you normally keep the reads without the adaptor sequence for further analysis?
For small RNA sequencing, we did keep those reads initially. This was mainly done out of curiosity and because we wanted to know what else we had sequenced besides small RNA. For the final expression profile, we counted only reads mapping to small RNA anyway so it did not make a difference.
Old 01-10-2011, 02:31 AM   #23
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default cutadapt v0.9 released

Hi, and thanks to all who use this tool, especially those who have given me feedback. I have released cutadapt 0.9, which adds some small but nice-to-have features:

* Use --too-short-output and --untrimmed-output to redirect too short or untrimmed reads to a separate file, based on patch by Paul Ryvkin (thanks!).
* With --maximum-length, reads longer than a specified length can be discarded.
* Added the --length-tag option, which helps to fix read lengths in FASTA/Q comment lines (e.g., 'length=123' becomes 'length=58' after trimming) (requested by Paul Ryvkin)
* Added -q/--quality-cutoff option for trimming low-quality ends (uses the same algorithm as BWA, but works also in color space)
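
To illustrate what the --length-tag feature in the list above is meant to do, here is a minimal sketch of rewriting a 'length=...' field in a header line after trimming. This is not cutadapt's own code; the function name `fix_length_tag` and the example header are made up for illustration.

```python
import re


def fix_length_tag(header, new_length, tag="length="):
    """Replace the number after `tag` in a FASTA/Q header line.

    Only the first occurrence is changed; headers without the tag are
    returned unmodified.
    """
    pattern = re.compile("(" + re.escape(tag) + r")\d+")
    return pattern.sub(lambda m: m.group(1) + str(new_length), header, count=1)


# Hypothetical header of a read that was 123 bases long before trimming.
print(fix_length_tag(">read1 length=123 region=7", 58))
# >read1 length=58 region=7
```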

cutadapt is now in the Python Package Index. You should be able to simply install it with "easy_install cutadapt" or (if you prefer pip) with "pip install cutadapt".
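
For those wondering what BWA-style quality trimming (the -q option mentioned above) does, here is a minimal sketch of the idea as I understand it: walk in from the 3' end, sum up (cutoff - quality), and cut where that running sum is largest. It works on a plain list of Phred scores and is an illustration, not a copy of the code in cutadapt or BWA.

```python
def quality_trim_index(qualities, cutoff):
    """Return the position at which to cut the 3' end of a read.

    `qualities` is a list of integer Phred scores. The read should be
    kept up to (but not including) the returned index.
    """
    running_sum = 0
    best_sum = 0
    cut_at = len(qualities)
    for i in reversed(range(len(qualities))):
        running_sum += cutoff - qualities[i]
        if running_sum < 0:
            # Good bases outweigh the bad ones seen so far: stop.
            break
        if running_sum > best_sum:
            best_sum = running_sum
            cut_at = i
    return cut_at


print(quality_trim_index([40, 40, 40, 2, 2], cutoff=20))   # 3
print(quality_trim_index([30, 30, 5, 30, 2], cutoff=20))   # 2
print(quality_trim_index([40, 40], cutoff=20))             # 2
```

The second example shows that a single good base inside a poor-quality tail does not stop the trimming, while the third shows that a read with good qualities is left untouched.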
Old 03-18-2011, 12:37 PM   #24
naluru
Member
 
Location: Woods Hole, Massachusetts

Join Date: Jul 2010
Posts: 16
Default cutadapt - loss of first base in SOLiD reads

Hi Martin,

I am not quite sure whether my issue is related to cutadapt, but I thought I would check with you since I used it to trim the adapter.

I trimmed the adapter with cutadapt 0.8 and then used fastqfilter (you sent me this a while ago) to trim reads with negative qualities.
Then I used BWA to align the reads to the genome.

What I noticed is that a lot of reads are missing their first base. I am not sure whether this happened during adapter trimming or during filtering with fastqfilter. I noticed it because when I use other software (CLC Bio Genomics Workbench), I see the first base. The missing base has always been the first base of the mature miRNA.

Have you ever noticed it in your analysis or has anyone reported this issue?
I didn't use any quality control option while aligning the reads with BWA.

Any suggestions or comments will be highly appreciated. If you need any further details, I will be happy to provide them.

Thank you for all your help.

Neel
Old 03-18-2011, 02:18 PM   #25
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Hello Neel,

This is actually a limitation of BWA (which it has "inherited" from MAQ). BWA cannot make use of the primer base (a "T" at the beginning of all SOLiD reads that I have seen), and it also cannot use the first color. The script solid2fastq.pl that is included with MAQ and BWA therefore removes the first two characters of each read, and cutadapt does the same if you use the --bwa option.

We do have some code in our group to "reattach" that missing nucleotide, perhaps I can polish that and make it usable for others. I'll check that next week.
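
To make the above concrete: in the usual SOLiD two-base encoding, a color is the XOR of the 2-bit codes of two neighbouring bases (A=0, C=1, G=2, T=3), so the primer base and the first color together determine the first nucleotide of the read. A minimal sketch under that assumption follows. It is not the code from our group, the function names are invented for this example, and it assumes the first color is a digit 0-3 rather than a "." for a missing call.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def decode_first_base(cs_read):
    """Return the first nucleotide implied by the primer base and first color."""
    primer, first_color = cs_read[0], int(cs_read[1])
    return BASES[BASES.index(primer) ^ first_color]


def strip_primer_and_first_color(cs_read):
    """Drop the two leading characters, as described above for solid2fastq.pl."""
    return cs_read[2:]


read = "T3201"  # hypothetical color-space read: primer base T, then colors
print(decode_first_base(read))             # A
print(strip_primer_and_first_color(read))  # 201
```

The stripped read is what the aligner gets to see, which is why the first nucleotide goes missing unless it is decoded and stored separately.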

Marcel
Old 03-19-2011, 07:51 PM   #26
robs
Senior Member
 
Location: San Diego, CA

Join Date: May 2010
Posts: 116
Default

Can the tool be used on data that is not generated with SOLiD machines (such as 454 and Illumina)? If yes, how does it compare to alternatives such as SeqTrim, SeqClean, or TagCleaner?
Old 03-20-2011, 07:26 AM   #27
naluru
Member
 
Location: Woods Hole, Massachusetts

Join Date: Jul 2010
Posts: 16
Default

Thank you, Marcel. It would be really helpful if you could provide that code. Do you know how SHRiMP and the Mosaik assembler deal with this?

Thank you,
Neel
Old 03-21-2011, 03:47 AM   #28
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

robs: cutadapt was developed for SOLiD and 454 data and also works with Illumina reads.

cutadapt is focused on command-line users who have a data file from a second-generation sequencing machine and simply want to remove one or more known adapter sequences from that file. There is probably some overlap in functionality with the tools you mention. TagCleaner and SeqClean were published after I had implemented cutadapt, and SeqTrim was unknown to me. Also, SeqClean and SeqTrim seem to be primarily for the analysis of Sanger sequencing data. I cannot say how easy it is to get them to work with second-generation data. SeqTrim, for example, does not seem to be able to cope with FASTQ files.
Old 03-21-2011, 08:54 AM   #29
ttnguyen
Member
 
Location: Ireland

Join Date: Mar 2010
Posts: 41
Default

I have no experience with adapter removal and am just wondering whether cutadapt can predict the adapter sequences from a FASTQ file.
Old 03-21-2011, 09:05 AM   #30
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Quote:
Originally Posted by ttnguyen View Post
I have no experience with adapter removal and am just wondering whether cutadapt can predict the adapter sequences from a FASTQ file.
No, it cannot. I think TagCleaner may have that ability.
Old 03-22-2011, 04:19 AM   #31
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Quote:
Originally Posted by naluru View Post
It would be really helpful if you could provide that code. Do you know how SHRiMP and the Mosaik assembler deal with this?
Dear Neel,
I'm sorry to say that the code I had in mind was only for the old MAQ output files and it won't work with SAM files. I looked at it a bit, but it is not straightforward to get it to work with SAM files. As far as I know, SHRiMP is much better at dealing with colorspace. I cannot say anything about Mosaik.
Old 03-22-2011, 10:35 AM   #32
nntao
Junior Member
 
Location: Mid-west

Join Date: Jan 2010
Posts: 4
Default consideration of paired ends

Nice job creating this tool!
One suggestion: it would be nice to be able to process paired-end reads together to improve accuracy, especially for reads that contain short partial adaptors, since if one read contains adaptor sequence, the other read should contain it as well.
Old 03-22-2011, 12:31 PM   #33
naluru
Member
 
Location: Woods Hole, Massachusetts

Join Date: Jul 2010
Posts: 16
Default

Thanks, Marcel. No problem. I will try to work around it.

Neel
Old 03-22-2011, 01:55 PM   #34
robs
Senior Member
 
Location: San Diego, CA

Join Date: May 2010
Posts: 116
Default

Quote:
Originally Posted by mmartin View Post
robs: cutadapt was developed for SOLiD and 454 data and also works with Illumina reads.

cutadapt is focused on command-line users who have a data file from a second-generation sequencing machine and simply want to remove one or more known adapter sequences from that file. There is probably some overlap in functionality with the tools you mention. TagCleaner and SeqClean were published after I had implemented cutadapt, and SeqTrim was unknown to me. Also, SeqClean and SeqTrim seem to be primarily for the analysis of Sanger sequencing data. I cannot say how easy it is to get them to work with second-generation data. SeqTrim, for example, does not seem to be able to cope with FASTQ files.
Thanks for the answer. It might be useful to perform some run-time comparisons. Also, I think none of the previous tools were designed for paired-end read data, and I agree with nntao that this would be a nice feature to see in cutadapt.
Old 03-23-2011, 03:12 AM   #35
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

This is interesting. I have only some experience with paired-end data. What would the data look like? Would one expect the adapter to start at the same position in both reads?
Old 03-25-2011, 08:16 AM   #36
nntao
Junior Member
 
Location: Mid-west

Join Date: Jan 2010
Posts: 4
Default Trimming adaptors from paired-end reads

It depends on how the paired-end sequencing libraries are prepared, in particular on whether adaptors/primers are introduced in the process. In the general case, you'll have molecules like the following that are fed to the sequencer (sometimes the adaptors come after the sequencing primers, other times there could be just a sequencing primer):


----adaptor1SAMPLEDNA2rotpada
--->adaptor1SAMPLEDNA2rotpadaremirp # F read obtained with 2 bp adaptor underlined
----adaptor1SAMPLEDNA2rotpada<- # R read with a 2 bp from adaptor too


---primerSAMPLEDNAremirp
-------->SAMPLEDNAremirp # read obtained



When trimming short adaptor matches down to 2 bp, you may over-trim reads whose ends happen to look like the adaptor/primer (roughly a 1/16 chance). But if both the forward and the reverse read contain the 2 bp of adaptor, the match is likely real adaptor sequence, because a short SAMPLEDNA fragment means both reads would run into the adaptors.
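
The 1/16 figure can be worked out in a few lines. This is only a back-of-the-envelope sketch: it assumes all four bases are equally likely and independent, and that the two reads of a pair are independent of each other, which real libraries will only approximate.

```python
from fractions import Fraction


def chance_random_match(k):
    """Chance that k random bases equal a given k-base adaptor prefix."""
    return Fraction(1, 4) ** k


single = chance_random_match(2)
print(single)       # 1/16
print(single ** 2)  # 1/256
```

So under these assumptions, requiring the 2 bp match in both reads of a pair lowers the chance of a purely random hit from 1 in 16 to 1 in 256.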

Again, thanks for your hard and nice work!
Old 03-26-2011, 03:57 PM   #37
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Thanks for the explanation ("rotpada" is nice :-) ). I cannot promise that I'll implement this, but I've added it as an enhancement request to the issue tracker. It would also be helpful if someone could provide me with a few actual paired-end reads with adapters. An SRA accession number would also be ok.
Old 07-20-2011, 08:22 AM   #38
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Hello,

I'd like to announce that cutadapt 0.9.5 has been released. Please see the cutadapt homepage for the release announcement and the changelog. Please also note that the alignment algorithm was improved slightly in this release.
Old 07-20-2011, 08:45 AM   #39
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

Thanks, this has been a wonderful tool by the way.
Old 08-15-2011, 02:03 AM   #40
mghita
Member
 
Location: Cambridge

Join Date: Aug 2011
Posts: 10
Default

Hi, I am trying to use cutadapt to trim the adapter from some Illumina reads. The adapter is at the first position of the read, but the output still contains it. What am I doing wrong?

cutadapt -a N ~/Desktop/pr/reads1.fq -o ~/Desktop/pr/reads.fq

Also, is there any option in bwa that trims the first base of the read (or an adapter)?
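
If the goal is only to drop a fixed first base from every read, that can also be done outside of cutadapt and bwa (as far as I know, -a describes an adapter at the 3' end of the read, so it would not be expected to match something at the very start). A minimal sketch, assuming plain four-line FASTQ records; the function name `cut_leading_bases` and the example record are made up:

```python
import io


def cut_leading_bases(fastq_lines, n=1):
    """Yield (header, sequence, plus, qualities) with the first n bases removed.

    Assumes every record is exactly four lines: header, sequence,
    '+' line, qualities.
    """
    lines = iter(fastq_lines)
    for header in lines:
        seq = next(lines).rstrip("\n")
        plus = next(lines).rstrip("\n")
        qual = next(lines).rstrip("\n")
        # Sequence and qualities must be shortened together.
        yield header.rstrip("\n"), seq[n:], plus, qual[n:]


example = io.StringIO("@r1\nNACGT\n+\n#IIII\n")  # hypothetical read
for record in cut_leading_bases(example):
    print("\n".join(record))
```

The same function works on an open file object in place of the `io.StringIO` example.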

Tags
adapter trimming, color space, microrna, mirna, solid
