**mmartin** · 08-27-2012, 02:10 AM

Quoting from the first lines that appear when you run "cutadapt --help":

Usage: cutadapt [options] <FASTA/FASTQ FILE> [<QUALITY FILE>]

Reads a FASTA or FASTQ file, finds and removes adapters,
and writes the changed sequence to standard output.
When finished, statistics are printed to standard error.

Use a dash "-" as file name to read from standard input
(FASTA/FASTQ is autodetected).

**flobpf** · 09-13-2012, 12:19 PM

cutadapt for solid reads

Hi,

I'm using cutadapt tool v1.1 to trim adapters from my SOLiD colorspace reads. The tool does trim the adapters out, however, I haven't been able to get my reads back in colorspace. cutadapt has been converting them to basespace by default. Wonder if I'm missing something? Have you seen this before?

My command line is as follows:

cutadapt-1.1/bin/cutadapt --colorspace -e 0.12 -a 2130003001020221302222 -a 201122001 -a 2132113130301020331 -x out_ --bwa -o out.fastq --untrimmed-output=out.fastq.untrimmed --double-encode filename.csfasta filename.qual

I have tried changing output to --maq but still no effect.

I was also wondering if the --double-encode option is required. I'm an absolute beginner to SOLiD reads (mostly I do Illumina), so I ask - aren't all recent SOLiD reads double-encoded? I may be wrong about this, though.

Other than this, I have found this tool perfect for my purposes!
Thanks!

**mmartin** · 09-14-2012, 12:53 AM

Hello,

I’ll answer your questions below, but I have also updated the cutadapt README section on colorspace reads. Perhaps that also helps a bit.

Originally posted by flobpf

Hi,
I'm using cutadapt tool v1.1 to trim adapters from my SOLiD colorspace reads. The tool does trim the adapters out, however, I haven't been able to get my reads back in colorspace.

I'm not sure what kind of output you would like to have. If you want a pair of csfasta/qual files, then this was answered a few messages ago (in short: it’s currently not supported). If you want FASTQ files that contain colorspace reads in which the colors are encoded as numbers 0, 1, 2, 3, then this is possible, simply don’t use --maq, --bwa or --double-encode (see also the README file I linked to above).

cutadapt has been converting them to basespace by default. Wonder if I'm missing something? Have you seen this before?

Cutadapt never converts reads to basespace since that should be done after read mapping.

My command line is as follows:

cutadapt-1.1/bin/cutadapt --colorspace -e 0.12 -a 2130003001020221302222 -a 201122001 -a 2132113130301020331 -x out_ --bwa -o out.fastq --untrimmed-output=out.fastq.untrimmed --double-encode filename.csfasta filename.qual

I have tried changing output to --maq but still no effect.

Please read through the descriptions of the --maq, --bwa and --double-encode options that are shown when running cutadapt -h. In summary: --maq is the same as --bwa. Both options imply --colorspace and --double-encode - you can simply leave them out.

I was also wondering if the --double-encode option is required. I'm an absolute beginner to SOLiD reads (mostly I do Illumina), so I ask - aren't all recent SOLiD reads double-encoded? I may be wrong about this, though.

Hm, I haven’t seen recent SOLiD files for a few months, but I guess they are not double-encoded. I guess the term is confusing. I’ll just copy the text from the README section I have just written. I hope that helps.

Double-encoding, BWA and MAQ

The read mappers MAQ and BWA (and possibly others) need their colorspace input reads to be in a so-called "double encoding". This simply means that they cannot deal with the characters 0, 1, 2, 3 in the reads, but require that the letters A, C, G, T be used for colors. For example, the colorspace sequence 0011321 would be AACCTGC in double-encoded form. This is not the same as conversion to basespace! The read is still in colorspace, only letters are used instead of digits. If that sounds confusing, that is because it is.

Note that MAQ is unmaintained and should not be used in new projects.

BWA’s colorspace support was dropped in versions more recent than 0.5.9, but that version works well.

When you want to trim reads that will be mapped with BWA or MAQ, you can use the --bwa option, which enables colorspace mode (-c), double-encoding (-d) and primer trimming (-t), all of which are required for BWA, in addition to some other useful options.

There is also the --maq option, which is simply another name for the --bwa option.
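To make the double-encoding idea concrete, here is a minimal Python sketch of the digit-to-letter mapping described in the README text above. The helper name `double_encode` is made up for illustration and is not part of cutadapt; the sketch only shows that the read stays in colorspace while the alphabet changes.

```python
# Illustrative only: map colorspace digits to the letters that BWA/MAQ expect.
# The mapping 0->A, 1->C, 2->G, 3->T matches the README example above.
DOUBLE_ENCODING = str.maketrans("0123", "ACGT")


def double_encode(colors: str) -> str:
    """Return the double-encoded form of a colorspace sequence (hypothetical helper)."""
    return colors.translate(DOUBLE_ENCODING)


print(double_encode("0011321"))  # AACCTGC, as in the README example
```

Characters other than 0 to 3 pass through unchanged in this sketch; how cutadapt itself treats them is not covered here.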

**flobpf** · 09-14-2012, 07:11 AM

If you want FASTQ files that contain colorspace reads in which the colors are encoded as numbers 0, 1, 2, 3, then this is possible, simply don’t use --maq, --bwa or --double-encode

That helps. Thanks

For example, the colorspace sequence 0011321 would be AACCTGC in double-encoded form. This is not the same as conversion to basespace! The read is still in colorspace, only letters are used instead of digits.

OK. That's better. I had misunderstood what double-encoding is.

Thanks for your help!

**flobpf** · 10-18-2012, 06:55 AM

Hi Marcel,

As I said above, cutadapt worked well to trim the adapters. I used the following command line

Code:

cutadapt-1.1/bin/cutadapt --colorspace --trim-primer -e 0.12 -a 2130003001020221302222 -a 201122001 -a 2132113130301020331 -g 2130003001020221302222 -g 201122001 -g 2132113130301020331 -m 20 -q 20 -O 5 -x proc_ -o INPFILE.trim.fastq --untrimmed-output=INPFILE.untrim.fastq INPFILE.csfasta INPFILE.qual > INPFILE.stats

Cutadapt produced the two expected output files. However, when I run BFAST using these files, it gives me the following errors:
With trimmed file

*** glibc detected *** bfast: malloc(): memory corruption: 0x00000000023f7e80 **

With untrimmed file

bfast: ../bfast/RGMatch.c:154: RGMatchPrint: Assertion `m->qualLength > 0' failed.

BFAST ran fine with the original, "un-cutadapted" FASTQ file created by merging CSFASTA+QUAL using solid2fastq.pl in BFAST. No errors there.

My BFAST command is

Code:

bfast easyalign -f ../_MOUSEGENOME/Mus_musculus.GRCm38.68.dna_rm.toplevel.fa -r INPFILE.trim.fastq -A 1 -n 4 > INPFILE.trim.fastq.easyalign

Have you seen this before? Anything wrong with my cutadapt command line? I'd appreciate any suggestions on how to fix this problem.

Thanks

**mmartin** · 10-18-2012, 07:37 AM

I haven't used BFAST in a while, but I think it requires that the primer base is still in the read. Could you try leaving out the --trim-primer option?
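For readers unsure what the primer base is: a read in a .csfasta file typically starts with one nucleotide letter (the last base of the sequencing primer) followed by color digits. According to the cutadapt help text, `--trim-primer` removes that letter together with the first color, which encodes the transition from the primer base into the read. A rough Python sketch of that step, using a hypothetical helper `trim_primer` and a made-up read:

```python
def trim_primer(cs_read: str) -> str:
    """Drop the primer base and the first color from a colorspace read.

    Hypothetical helper for illustration; this is not cutadapt's actual code.
    """
    if cs_read and cs_read[0] in "ACGT":
        return cs_read[2:]
    return cs_read  # no leading primer base, leave the read untouched


print(trim_primer("T0120313"))  # 120313
```

A mapper that expects the primer base to be present would need the read in its original form, that is, without this step applied.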

**Braganca** · 11-06-2012, 05:07 AM

Hi Flobpf,

I'm having trouble determining which adapter sequences to use, as the SOLiD preparation guide is not clear on this. Furthermore, those oligos that appear are in basespace.

Can you please inform me how you obtained the oligo sequences, and how you managed to get them in colourspace?

I have SOLiD 4 data, but I was not involved in preparing or sequencing the data.

Regards,
Craig

**mmartin** · 11-06-2012, 05:59 AM

Originally posted by Braganca

Can you please inform me how you obtained the oligo sequences, and how you managed to get them in colourspace?

I don't know the oligo sequences, but if you want to use them with cutadapt, then there is no need to convert them to colorspace: Since version 1.1, you can give the adapter in basespace and cutadapt converts it for you.
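For anyone curious what such a conversion looks like, below is a small Python sketch based on the standard SOLiD dinucleotide color code, in which each color describes a pair of adjacent bases. The function name `to_colorspace` is an assumption made for illustration; cutadapt's own conversion may differ in details such as how the ends of the adapter are handled.

```python
# Standard SOLiD dinucleotide color code: with A=0, C=1, G=2, T=3,
# the color of two adjacent bases is the XOR of their codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colorspace(bases: str) -> str:
    """Convert a basespace sequence to its colors (hypothetical helper).

    n bases yield n - 1 colors, one per adjacent pair.
    """
    codes = [BASE_CODE[b] for b in bases.upper()]
    return "".join(str(x ^ y) for x, y in zip(codes, codes[1:]))


print(to_colorspace("ACGT"))  # 131
```

Note that the result is one character shorter than the input, because each color covers two neighbouring bases.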

**Braganca** · 11-07-2012, 01:09 AM

Originally posted by mmartin

I don't know the oligo sequences, but if you want to use them with cutadapt, then there is no need to convert them to colorspace: Since version 1.1, you can give the adapter in basespace and cutadapt converts it for you.

Thanks Martin,

That helped; I managed to locate the oligo sequences.

Regards,
Craig

**ELoomis** · 11-28-2012, 11:09 AM

Filtering out reads lacking adapters

I'm trying to use cutadapt to remove an adapter sequence from my reads but I'd also like to discard any sequences that do not have an adapter (the adapter was added in an enrichment step prior to library construction, so reads lacking the adapter could be artifacts or contaminants). The --discard option seems to do the opposite of that. Would that be easy to change in cutadapt, is there a different tool that does something like this, or should I use a more roundabout option (based on identifying the reads not discarded by not having an adapter)?

Thanks,
Erick

**mmartin** · 11-29-2012, 05:19 AM

There was a patch by James Casbon, which implements such an option for cutadapt. I have now integrated his work into cutadapt. That is, the most recent version of cutadapt, which you can get from https://github.com/marcelm/cutadapt , has a "--discard-untrimmed" option.

**ELoomis** · 11-29-2012, 10:17 AM

Thanks! I downloaded and installed v1.2 and get an error when I try --discard-untrimmed. It doesn't seem to recognize that option, but I do see "--untrimmed-output=FILE" which accomplishes the same goal (and is actually even better).

**mmartin** · 11-29-2012, 10:57 AM

Great that it worked for you! For those who really want the --discard-untrimmed option, you would need to get the version from GitHub (directly from version control). cutadapt 1.2rc2, available on Google Code, does not have the option. I'll make a release soon to remedy this.
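The difference between the two options discussed here is only in what happens to reads without an adapter: `--discard-untrimmed` drops them, while `--untrimmed-output=FILE` writes them to a separate file. The Python sketch below illustrates that split with a hypothetical helper `split_by_adapter` and made-up reads. It uses exact substring matching of a 3' adapter, whereas cutadapt tolerates errors and partial matches, so treat it as a conceptual illustration only.

```python
def split_by_adapter(reads, adapter):
    """Split reads into (trimmed, untrimmed) lists (hypothetical helper).

    Exact matching only; cutadapt itself allows mismatches and partial
    adapter occurrences, which this sketch does not attempt.
    """
    trimmed, untrimmed = [], []
    for read in reads:
        pos = read.find(adapter)
        if pos >= 0:
            trimmed.append(read[:pos])  # keep only the part before the adapter
        else:
            untrimmed.append(read)
    return trimmed, untrimmed


trimmed, untrimmed = split_by_adapter(["ACGTTTAGGC", "CCCCGGGG"], "TTAGGC")
print(trimmed)    # ['ACGT']
print(untrimmed)  # ['CCCCGGGG']
```

In this picture, the second list is what would be discarded or written to the untrimmed output file, and the first list is what ends up in the regular output.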

**mmartin** · 11-30-2012, 03:01 AM

Hello, I've just released cutadapt 1.2. As always, get it from http://code.google.com/p/cutadapt/ or simply via "easy_install cutadapt". This is a copy of the list of changes:

At least 25% faster processing of .csfasta/.qual files due to faster parser.
Between 10% and 30% faster writing of gzip-compressed output files.
Support 5' adapters in color space, even when no primer trimming is requested.
The "--info-file" option has been added. Use this to write further information about the found adapters in each read to a separate file.
Named adapters are now possible. Use "-a My_Adapter=ACCGTA" to assign the name "My_Adapter" to an adapter.
Improved the alignment algorithm for better poly-A trimming when there are sequencing errors. Previously, the longest possible poly-A tail was not always trimmed.
James Casbon contributed the --discard-untrimmed option.
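As an illustration of the named-adapter syntax from the list above, this short Python sketch splits an argument of the form `name=sequence` into its two parts. The helper `parse_adapter` is hypothetical and only mirrors the syntax shown in the change list; it is not taken from cutadapt's source.

```python
def parse_adapter(spec: str):
    """Split an adapter argument into (name, sequence) (hypothetical helper).

    The name is None when the argument has no "name=" prefix.
    """
    name, sep, seq = spec.partition("=")
    if not sep:
        return None, spec
    return name, seq


print(parse_adapter("My_Adapter=ACCGTA"))  # ('My_Adapter', 'ACCGTA')
print(parse_adapter("ACCGTA"))             # (None, 'ACCGTA')
```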

**carmeyeii** · 12-13-2012, 04:52 PM

Hello,

Does cutadapt have an option to simply trim a user-specified number of bases from the 5' or 3' end?

I do not wish to remove adaptors, but to remove bases from the reads due to quality concerns. Is there any tool, if cutadapt is not suitable, that will do it for both .csfasta and .qual files?

Thanks a lot,

carmen
