SEQanswers

Old 07-23-2010, 07:56 AM   #1
Gianza
Junior Member
 
Location: Italy

Join Date: Jan 2010
Posts: 7
Fastq adaptors removal/stripping/cleaning

Hi guys,
I'm facing a rather dumb problem.
I cannot find a tool that simply strips adaptors off Illumina Fastq files. I have contamination from library synthesis adaptors (SMART).
-Seqclean only works with fasta.
-Lucy2: the libgtk1.2 libraries are no longer supported in my Linux distro (and I don't even know if it handles fastq).
-fastx_clipper from FastX-toolkit: makes a big mess, because it doesn't only strip the adaptor but blows away the whole sequence (it's not supposed to behave like this); it results in the loss of more than 1/3 of the dataset.

Other solutions are integrated into assemblers or aligners, but I need a crude trimmed fastq as output.

Does anybody know something which might be helpful to me?

Thanks in advance!!

Davide
Old 07-23-2010, 08:22 AM   #2
Adrian_H
Member
 
Location: Cambridge, MA

Join Date: Feb 2010
Posts: 10

fastx_clipper seemed to work fine for me. Are you sure you're running it with the parameters you want? (I did have some issues with clipping paired-end reads, where fastx_clipper would blow away one of the sequences if it was too short after clipping, leading to unmatched pairs, and had to modify fastx_clipper to leave sequences in even if they were completely clipped)
Old 07-23-2010, 08:54 AM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Personally I use Biopython with some simple adaptor matching.

There ought to be a tool in EMBOSS to do this too...
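For readers wondering what "simple adaptor matching" could look like, here is a minimal sketch in plain Python (no Biopython dependency, and not the script referred to above). It assumes the adaptor sits toward the 3' end of the read, tolerates no mismatches, and cuts sequence and qualities at the same position. The function names and the `min_overlap` parameter are made up for illustration.

```python
def trim_adapter(seq, qual, adapter, min_overlap=10):
    """Trim `adapter`, or a prefix of it running off the 3' end, from a read.

    Exact matching only: no mismatches or indels are tolerated.
    Returns the (possibly shortened) sequence and quality strings.
    """
    # Case 1: the full adapter occurs somewhere inside the read.
    pos = seq.find(adapter)
    if pos == -1:
        # Case 2: the read ends with the first k bases of the adapter.
        # Try the longest possible prefix first, down to min_overlap bases.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                pos = len(seq) - k
                break
    if pos == -1:
        return seq, qual          # no adapter found, leave the read untouched
    return seq[:pos], qual[:pos]  # cut sequence and qualities at the same place


def trim_fastq(lines, adapter, min_overlap=10):
    """Yield trimmed 4-line FASTQ records (as strings) from an iterable of lines."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        seq, qual = trim_adapter(seq, qual, adapter, min_overlap)
        yield "\n".join([header.rstrip("\n"), seq, plus, qual])
```

Note that a read can come back empty if the adaptor starts at its first base, so anything downstream that expects intact pairs still has to cope with zero-length records.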
Old 07-23-2010, 09:15 AM   #4
raela
Member
 
Location: Ithaca, NY

Join Date: Apr 2010
Posts: 39

With some of the tools in fastx_toolkit, I've found that redirection and flags act differently. Most give the option of using either -i infile -o outfile or `cat infile | tool > outfile`. If you're using -i and -o, try `cat infile | tool > outfile` instead and see if that works better. I forget which tool usually gives me issues with it.
Old 07-23-2010, 09:49 AM   #5
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Did you try Novoalign's adapter trimming? That ought to help.
__________________
--
bioinfosm
Old 07-23-2010, 10:05 AM   #6
Gianza
Junior Member
 
Location: Italy

Join Date: Jan 2010
Posts: 7

I'm using this command.
I want to clarify I'm not a bioinformatician.

fastx_clipper -a AAGCAGTGGTATCAACGCAGAGTACGCGGG -i 1M_1.txt -M 20 -n -o output.txt

For Adrian_H: could you please give me your modified version? You just reminded me that the aligner I will use (Mosaik) doesn't accept missing paired-end reads.

For raela: Sorry, I didn't get your point about cat and the pipe. Could you please write out how I should type it? Thanks!

However, I found that I have more than one problem: while blowing away tons of reads for no reason (other than a regular adaptor match, I guess), it also leaves tons of other adaptors in the output sequences (e.g. sequences heavily trimmed, and the little remaining stretch... is an adaptor!!)

If anybody can provide me with a working utility, they will be my idol.
(I got on so well with SeqClean and cln2qual... why so many formats in this world??)
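One way to put numbers on the "adaptors left in the output" observation is to count how many reads still contain the start of the adaptor before and after clipping. The sketch below is only an illustration: the function name and the `seed_len` parameter are hypothetical, it assumes strict 4-line FASTQ records (no wrapped sequences), and it looks for exact matches only.

```python
def count_adapter_hits(lines, adapter, seed_len=12):
    """Count FASTQ reads that contain the first `seed_len` bases of `adapter`.

    Returns (reads_with_seed, total_reads). Assumes 4 lines per record.
    """
    seed = adapter[:seed_len]
    total = hits = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:        # the second line of each record is the sequence
            total += 1
            if seed in line:  # exact substring match, no mismatches allowed
                hits += 1
    return hits, total
```

Passing an open file handle for 1M_1.txt and then for output.txt, with the adaptor from the command above, gives a rough before/after comparison. A short seed also catches partial adaptors, at the cost of some chance matches.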
Old 07-23-2010, 10:08 AM   #7
Gianza
Junior Member
 
Location: Italy

Join Date: Jan 2010
Posts: 7

Novoalign seems to have an internal-only trimming pipeline.
Looking at the manual, it doesn't seem that it will return just a trimmed fastq, only the final alignment... Am I wrong?

I would like to avoid conversion to fasta+qual, especially because I'm dealing with several datasets (I would need to carry the clipping over to the qual files).

Mosaik only accepts perfectly matched paired sequences (a single missing mate and it will stop, so I'm also concerned about this issue). Has anybody ever dealt with this kind of issue, i.e. keeping zero-length sequences?

Last edited by Gianza; 07-23-2010 at 10:21 AM.
Old 07-23-2010, 10:38 AM   #8
john_mu
Member
 
Location: Stanford, CA

Join Date: May 2010
Posts: 88

http://wiki.bioinformatics.ucdavis.e.../Data_Analysis

There are some perl scripts there, which might help.
__________________
SpliceMap: De novo detection of splice junctions from RNA-seq
Old 07-23-2010, 12:35 PM   #9
Jose Blanca
Member
 
Location: Valencia, Spain

Join Date: Aug 2009
Posts: 70

We have built a pipeline, and its first step (the read cleaning) takes care of that. You can take a look at:
http://bioinf.comav.upv.es/ngs_backbone/
Old 07-23-2010, 12:52 PM   #10
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116

Quote:
Originally Posted by Adrian_H View Post
fastx_clipper seemed to work fine for me. Are you sure you're running it with the parameters you want? (I did have some issues with clipping paired-end reads, where fastx_clipper would blow away one of the sequences if it was too short after clipping, leading to unmatched pairs, and had to modify fastx_clipper to leave sequences in even if they were completely clipped)
Yes, I have written to the fastx guy about orphaned pairs; he said the next version might have some solution. At any rate, I think the fastx clipping is too aggressive.
__________________
--
Jeremy Leipzig
Bioinformatics Programmer
--
Old 07-25-2010, 10:10 PM   #11
robs
Senior Member
 
Location: San Diego, CA

Join Date: May 2010
Posts: 116

Did you try a tool called TagCleaner?

http://www.biomedcentral.com/1471-2105/11/341
http://tagcleaner.sourceforge.net/

It's a web-based tool, but I heard you can contact them if your files are large and they will process them offline for you.
Old 07-26-2010, 04:47 AM   #12
raela
Member
 
Location: Ithaca, NY

Join Date: Apr 2010
Posts: 39

Not sure if it'll work in your case, but try running it as:
cat 1M_1.txt | fastx_clipper -a AAGCAGTGGTATCAACGCAGAGTACGCGGG -M 20 -n > output.txt
Old 07-28-2010, 06:49 AM   #13
vgrubor
Junior Member
 
Location: North Carolina

Join Date: Sep 2009
Posts: 6

You can use Genome Analysis Toolkit (GATK) to do this. http://www.broadinstitute.org/gsa/wi.../Read_Clipping

You can configure it to mask your adapter sequences with Ns so you don't end up with an empty sequence, which can cause trouble with aligners when aligning in paired-end mode.
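The masking idea can be sketched independently of GATK. The snippet below is not GATK's implementation, just an illustration under stated assumptions: exact adaptor matching, masking from the start of the match to the end of the read, and a caller-chosen low-quality character (`"!"` is the lowest score in Phred+33 files; Phred+64 data would need a different character). The function name is hypothetical.

```python
def mask_adapter(seq, qual, adapter, low_qual="!"):
    """Replace the adapter and everything after it with Ns.

    The read keeps its original length, so no record ever becomes empty and
    paired files stay in step. Exact matching only.
    """
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual                    # nothing to mask
    n_masked = len(seq) - pos
    masked_seq = seq[:pos] + "N" * n_masked
    masked_qual = qual[:pos] + low_qual * n_masked  # mark masked bases as unreliable
    return masked_seq, masked_qual
```

Because every record survives with its full length, mate 1 and mate 2 files keep the same number of reads in the same order. Whether a given aligner handles all-N reads gracefully is a separate question to check.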
Old 07-30-2010, 01:24 PM   #14
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116

Oops, never mind, I see it.
Quote:
Can someone familiar with FASTX explain to me which 14 nt are aligning here? It seems way too aggressive.
Code:
cat myseq.fq 
@HWI-EASXXX/1
AACGCGATGCCTCCATTGCTGGTGCAACTGAGCCTGGATATCGGCAGTGCGATCCTCATGGACTTGGATCTGGGTT
+HWI-EASXXX/1
`_bb_b_bbYbb^bbbaaXbbb`b_a[S``[[MWO`\``]b_bbJ\^Z\J`Y^a[`^[b_bF^b_BBBBBBBBBBB

>cat myseq.fq | fastx_clipper -a AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG -M 14
@HWI-EASXXX/1
AACGCGATGCCTCCATTGCTGGTGCAACTGAGCCTGG
+HWI-EASXXX/1
`_bb_b_bbYbb^bbbaaXbbb`b_a[S``[[MWO`\
__________________
--
Jeremy Leipzig
Bioinformatics Programmer
--

Last edited by Zigster; 07-30-2010 at 01:36 PM.
Old 07-30-2010, 01:48 PM   #15
Adrian_H
Member
 
Location: Cambridge, MA

Join Date: Feb 2010
Posts: 10

If you dig into the fastx_clipper source code, you can see what it's doing (I agree with you that I'm not at all sure that it's the right thing to do though!).

Code:
if ( alignment_size > 5
     && alignment_results.target_start == 0
     && (alignment_results.matches * 100 / alignment_size ) >= 75 ) {
    //printf("--2\n");
    return alignment_results.query_start ;
}

I think that this is what is aligning:

ATATCGGCAGTGCGAT
: ::::: :: ::: :
AGATCGGAAGAGCGGT


and then it is cutting off everything that follows.
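The arithmetic behind that condition can be checked by hand for the alignment shown above. The sketch below only re-implements the acceptance test, not the aligner that produces the alignment, and it assumes an ungapped alignment in which "target" is the adaptor and "query" is the read (as the returned query_start suggests). The function name and parameter names are hypothetical.

```python
def passes_clipper_rule(read_segment, adapter_segment, target_start,
                        min_size=5, min_identity=75):
    """Apply the quoted acceptance rule to one ungapped alignment.

    Returns (matches, alignment_size, accepted).
    """
    alignment_size = len(read_segment)
    # Count positions where the read and the adapter agree.
    matches = sum(a == b for a, b in zip(read_segment, adapter_segment))
    accepted = (alignment_size > min_size
                and target_start == 0  # alignment begins at the adapter's first base
                and matches * 100 // alignment_size >= min_identity)  # integer division, as in C
    return matches, alignment_size, accepted


# The 16-base alignment from the post: 12 matching positions out of 16.
result = passes_clipper_rule("ATATCGGCAGTGCGAT", "AGATCGGAAGAGCGGT", 0)
```

Here 12 * 100 / 16 is exactly 75, so the alignment just clears the threshold and the read is cut where the alignment starts. A single additional mismatch (11 of 16, i.e. 68 after integer division) would have left the read untouched.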
Old 01-03-2012, 09:20 PM   #16
James Hane
Member
 
Location: Perth, Australia

Join Date: Apr 2010
Posts: 11

cutadapt works brilliantly
Old 01-04-2012, 06:01 AM   #17
jordi
Member
 
Location: València, Spain

Join Date: Apr 2009
Posts: 48

What about prinseq?
http://edwards.sdsu.edu/prinseq_beta/