**Adrian_H** · 07-23-2010, 08:22 AM

fastx_clipper seemed to work fine for me. Are you sure you're running it with the parameters you want? (I did have some issues with clipping paired-end reads, where fastx_clipper would blow away one of the sequences if it was too short after clipping, leading to unmatched pairs, and had to modify fastx_clipper to leave sequences in even if they were completely clipped)

**maubp** · 07-23-2010, 08:54 AM

Personally I use Biopython with some simple adaptor matching.

There ought to be a tool in EMBOSS to do this too...
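
A minimal sketch of what simple adaptor matching could look like. It uses plain Python string methods rather than Biopython, handles exact matches only (no mismatches or indels), and `trim_adaptor` and `min_overlap` are hypothetical names chosen for illustration:

```python
def trim_adaptor(seq, qual, adaptor, min_overlap=5):
    """Clip a read (and its quality string) at an exact 3' adaptor match.

    Looks for the full adaptor first, then for an adaptor prefix of at
    least min_overlap bases hanging off the 3' end of the read.
    """
    pos = seq.find(adaptor)
    if pos == -1:
        # Try progressively shorter adaptor prefixes at the end of the read.
        longest = min(len(adaptor) - 1, len(seq))
        for n in range(longest, min_overlap - 1, -1):
            if seq.endswith(adaptor[:n]):
                pos = len(seq) - n
                break
    if pos == -1:
        return seq, qual  # no adaptor found, leave the read untouched
    # Trim sequence and quality together so they stay the same length.
    return seq[:pos], qual[:pos]


adaptor = "AAGCAGTGGTATCAACGCAGAGTACGCGGG"
read = "ACGTACGT" + adaptor + "TTT"
print(trim_adaptor(read, "I" * len(read), adaptor))
```

Trimming the sequence and quality string together keeps each FASTQ record consistent, so no round trip through fasta+qual is needed.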

**raela** · 07-23-2010, 09:15 AM

With some of the tools in fastx_toolkit, I've found redirection and flags act differently. Most give the option of using `-i infile -o outfile` or `cat infile | tool > outfile`. If you're using -i and -o, try `cat infile | tool > outfile` and see if that works better. I forget which tool usually gives me issues with it.

**bioinfosm** · 07-23-2010, 09:49 AM

Did you try Novoalign's adapter trimming? That ought to help.

**Gianza** · 07-23-2010, 10:05 AM

I'm using this command.
I want to clarify I'm not a bioinformatician.

fastx_clipper -a AAGCAGTGGTATCAACGCAGAGTACGCGGG -i 1M_1.txt -M 20 -n -o output.txt

For Adrian_H: could you please give me your modified version? You just made me remember that the aligner I will use (Mosaik) doesn't accept missing paired-end reads.

For raela: Sorry, I didn't get your point with cat and pipe. Could you please write how I should type it? Thanks!

However, I found that I have more than one problem: while blowing away tons of reads for no reason (other than a regular adaptor match, I guess), it also leaves tons of other adapters in the output sequences (e.g. sequences heavily trimmed, and the little remaining stretch... is an adaptor!!)

If anybody can provide me with a working utility, they will be my idol.
(I played so well with SeqClean and cln2qual... why so many formats in this world??)

**Gianza** · 07-23-2010, 10:08 AM

Novoalign seems to have an internal-only trimming pipeline.
Looking at the manual, it doesn't seem it will just return a trimmed fastq, only the final alignment... Am I wrong?

I would like to avoid conversion to fasta+qual, especially because I'm dealing with several datasets (I would need to carry the clipping over to the qual files).

Mosaik only accepts perfectly matched paired sequences (a single missing one and it will stop, so I'm also concerned about this issue). Has anybody ever dealt with this kind of issue, i.e. keeping zero-length sequences?

**john_mu** · 07-23-2010, 10:38 AM

http://wiki.bioinformatics.ucdavis.edu/index.php/Data_Analysis

There are some perl scripts there, which might help.

**Jose Blanca** · 07-23-2010, 12:35 PM

We have built a pipeline, and the first step (the read cleaning) takes care of that. You can take a look at:

http://bioinf.comav.upv.es/ngs_backbone/

**Zigster** · 07-23-2010, 12:52 PM

Originally posted by Adrian_H:

fastx_clipper seemed to work fine for me. Are you sure you're running it with the parameters you want? (I did have some issues with clipping paired-end reads, where fastx_clipper would blow away one of the sequences if it was too short after clipping, leading to unmatched pairs, and had to modify fastx_clipper to leave sequences in even if they were completely clipped)

Yes, I have written to the fastx guy about orphaned pairs; he said the next version might have some solution. At any rate, I think the fastx clipping is too aggressive.
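
One possible workaround, until the tool handles this itself, is to re-sync the two files after clipping. A rough sketch, assuming 4-line FASTQ records, read names ending in `/1` and `/2`, and a second file small enough to hold in memory; `read_fastq`, `pair_key` and `keep_matched_pairs` are hypothetical helpers, not part of fastx_toolkit:

```python
import io


def read_fastq(handle):
    """Yield (name, seq, qual) tuples from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual  # drop the leading '@'


def pair_key(name):
    """Strip a trailing /1 or /2 so both mates share the same key."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def keep_matched_pairs(reads1, reads2):
    """Return (read1, read2) tuples for reads present in both inputs."""
    mates = {pair_key(rec[0]): rec for rec in reads2}  # held in memory
    matched = []
    for rec in reads1:
        mate = mates.get(pair_key(rec[0]))
        if mate is not None:
            matched.append((rec, mate))
    return matched


# Tiny in-memory example: r1 lost its mate and r3 lost its first read.
file1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n")
file2 = io.StringIO("@r2/2\nTTAA\n+\nIIII\n@r3/2\nCCGG\n+\nIIII\n")
pairs = keep_matched_pairs(read_fastq(file1), read_fastq(file2))
print(pairs)
```

Orphaned reads are simply dropped here; writing them to a separate file for single-end alignment would be a small extension.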

**robs** · 07-25-2010, 10:10 PM

Did you try a tool called TagCleaner?

TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets - BMC Bioinformatics

http://www.biomedcentral.com/1471-2105/11/341

Background Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Furthermore, the tag sequence may be unavailable or incorrectly reported. Because of the potential for downstream inaccuracies introduced by unwanted sequence contaminations, it is important to use reliable tools for pre-processing sequence data. Results TagCleaner is a web application developed to automatically identify and remove known or unknown tag sequences allowing insertions and deletions in the dataset. TagCleaner is designed to filter the trimmed reads for duplicates, short reads, and reads with high rates of ambiguous sequences. An additional screening for and splitting of fragment-to-fragment concatenations that gave rise to artificial concatenated sequences can increase the quality of the dataset. Users may modify the different filter parameters according to their own preferences. Conclusions TagCleaner is a publicly available web application that is able to automatically detect and efficiently remove tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface facilitates export functionality for subsequent data processing, and is available at http://edwards.sdsu.edu/tagcleaner .

TagCleaner @ SourceForge.net

http://tagcleaner.sourceforge.net/

It's a web-based tool, but I heard you can contact them if your files are large and they will process them offline for you.

**raela** · 07-26-2010, 04:47 AM

Not sure if it'll work in your case, but try running it as:
cat 1M_1.txt | fastx_clipper -a AAGCAGTGGTATCAACGCAGAGTACGCGGG -M 20 -n > output.txt

**vgrubor** · 07-28-2010, 06:49 AM

You can use Genome Analysis Toolkit (GATK) to do this. http://www.broadinstitute.org/gsa/wi.../Read_Clipping

You can configure it to mask your adapter sequences with Ns so you don't end up with an empty sequence, which can cause trouble with aligners when aligning in paired-end mode.
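
The masking idea can be illustrated without GATK. The sketch below is plain Python, not GATK's implementation; it only handles exact adaptor matches, and `mask_adaptor` is a hypothetical helper:

```python
def mask_adaptor(seq, adaptor):
    """Replace the adaptor and everything after it with Ns.

    The read keeps its original length, so the quality string still
    lines up and the read never becomes empty.
    """
    pos = seq.find(adaptor)
    if pos == -1:
        return seq  # no exact adaptor match, nothing to mask
    return seq[:pos] + "N" * (len(seq) - pos)


adaptor = "AAGCAGTGGTATCAACGCAGAGTACGCGGG"
read = "ACGTACGT" + adaptor + "TT"
masked = mask_adaptor(read, adaptor)
print(masked)
```

Because the length never changes, both mates of a pair stay in their files, which sidesteps the orphaned-pair problem at the cost of aligning reads with long runs of N.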

**Zigster** · 07-30-2010, 01:24 PM

oops nevermind I see it

Can someone familiar with FASTX explain to me which 14 nt are aligning here? It seems way too aggressive.

Code:

```
> cat myseq.fq
@HWI-EASXXX/1
AACGCGATGCCTCCATTGCTGGTGCAACTGAGCCTGGATATCGGCAGTGCGATCCTCATGGACTTGGATCTGGGTT
+HWI-EASXXX/1
`_bb_b_bbYbb^bbbaaXbbb`b_a[S``[[MWO`\``]b_bbJ\^Z\J`Y^a[`^[b_bF^b_BBBBBBBBBBB

> cat myseq.fq | fastx_clipper -a AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG -M 14
@HWI-EASXXX/1
AACGCGATGCCTCCATTGCTGGTGCAACTGAGCCTGG
+HWI-EASXXX/1
`_bb_b_bbYbb^bbbaaXbbb`b_a[S``[[MWO`\
```

**Adrian_H** · 07-30-2010, 01:48 PM

If you dig into the fastx_clipper source code, you can see what it's doing (I agree with you that I'm not at all sure that it's the right thing to do though!).

```c
if ( alignment_size > 5
     && alignment_results.target_start == 0
     && (alignment_results.matches * 100 / alignment_size) >= 75 ) {
    //printf("--2\n");
    return alignment_results.query_start;
}
```

I think that this is what is aligning:

```
ATATCGGCAGTGCGAT
: ::::: :: ::: :
AGATCGGAAGAGCGGT
```

and then it is cutting off everything that follows.
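
To check that guess against the condition quoted from the source, here is a small sketch. It assumes the ungapped 16-base alignment shown above is the one the clipper found (the real aligner may score things differently), and it reads `target_start == 0` as "the match begins at the first base of the adapter"; `passes_clipper_check` is a hypothetical helper, not a function from fastx_clipper:

```python
READ = "AACGCGATGCCTCCATTGCTGGTGCAACTGAGCCTGGATATCGGCAGTGCGATCCTCATGGACTTGGATCTGGGTT"
ADAPTER = "AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG"
READ_SEGMENT = "ATATCGGCAGTGCGAT"  # the stretch of the read in the alignment above


def passes_clipper_check(read_segment, adapter):
    """Apply the size and identity tests from the quoted condition."""
    alignment_size = len(read_segment)
    adapter_prefix = adapter[:alignment_size]  # adapter aligned from its first base
    matches = sum(r == a for r, a in zip(read_segment, adapter_prefix))
    percent = matches * 100 // alignment_size  # integer division, as in the C code
    accepted = alignment_size > 5 and percent >= 75
    return matches, percent, accepted


matches, percent, accepted = passes_clipper_check(READ_SEGMENT, ADAPTER)
query_start = READ.find(READ_SEGMENT)  # where the alignment starts in the read
clipped = READ[:query_start]           # everything from there on is cut off
print(matches, percent, accepted, query_start)
print(clipped)
```

Under these assumptions, 12 of the 16 aligned positions match, which is exactly 75%, so the alignment only just passes the threshold. Clipping at its start leaves the 37-base read shown in Zigster's output.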

Fastq adaptors removal/stripping/cleaning
