SEQanswers

Old 12-06-2011, 12:51 PM   #1
mgg
Member
 
Location: London, UK

Join Date: Nov 2011
Posts: 12
Default seeking advice on fastx_clipper

I'm looking for some help with fastx_clipper. Despite my best (if inexperienced) efforts, it's not doing what I want. So far it's little better than random. Worse, actually, since it's not clipping at the adapter sequence provided, but at other sequences entirely.

I've analysed my Illumina data using FastQC. This showed contamination with indexed TruSeq adapters. The universal adapter did not show up (below the 0.1% threshold, I guess). It also showed fairly high levels of sequence redundancy in the libraries (minimum 30%, mostly >60%).

As part of a pipeline I used fastx_clipper to remove adapter-containing reads entirely (-C option). This removed huge numbers of reads (e.g. ~20% of all reads even for TruSeq Universal adapter - which did not show up in the FastQC analysis).

I've spent the rest of the afternoon away from the pipeline, using the command line to try to figure out what it's actually doing. For this I used a single FASTQ file (2.fq) as a test case.


I used grep -c on the FASTQ file (2.fq) to count the instances of the Illumina universal adapter. This showed there were some, mostly at the start of reads; certainly nothing like the number fastx_clipper removed.

Code:
grep -ce AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT 2.fq
6903  
# total  instances
grep -ce ^AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT 2.fq
6329  
# almost all of them at start of reads (i.e. no insert) 
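
The same count can be sketched in Python so that only the sequence line of each record is inspected (grep -c counts every matching line in the file, whichever part of the record it belongs to). This is a minimal sketch that assumes plain four-line FASTQ records with unwrapped sequences; the function name `count_adapter_reads` is made up for illustration.

```python
ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"


def count_adapter_reads(fastq_lines, adapter=ADAPTER):
    """Return (reads containing the adapter, reads starting with the adapter).

    Assumes four-line FASTQ records: header, sequence, '+', qualities.
    """
    anywhere = 0
    at_start = 0
    for index, line in enumerate(fastq_lines):
        if index % 4 != 1:  # only the second line of each record is sequence
            continue
        sequence = line.strip()
        if adapter in sequence:
            anywhere += 1
        if sequence.startswith(adapter):
            at_start += 1
    return anywhere, at_start


# Toy records standing in for a real file such as 2.fq
toy = [
    "@r1", ADAPTER + "AC", "+", "I" * 60,
    "@r2", "GG" + ADAPTER, "+", "I" * 60,
    "@r3", "ACGT", "+", "IIII",
]
print(count_adapter_reads(toy))
```

On a real file the function can be given an open file handle, for example `count_adapter_reads(open("2.fq"))`.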

I ran fastx_clipper manually (path specifications removed):
Code:
fastx_clipper -Cva AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -i 2.fq -o 2.noUniv.fq
Min. Length: 5
Clipped reads - discarded.
Input: 4959421 reads.
Output: 3937769 reads.
discarded 55558 too-short reads.
discarded 26892 adapter-only reads.
discarded 936303 clipped reads.      # this and the adapter-only line total 963195 ...
discarded 2899 N reads.              # ... far more (>>>>) than the 6903 from grep


Since I couldn't understand what was going on, I repeated this but used option -c to retain only those sequences that had been clipped (i.e. those reads that had originally had the adapter). Numerically, the outcome was comparable to the first run.
Code:
fastx_clipper -cva AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -i 2.fq -o 2_clippedonly
Min. Length: 5
Non-Clipped reads - discarded.
Input: 4959421 reads.
Output: 936149 reads.         # that's 936149 reads that originally HAD adapter
discarded 55558 too-short reads.
discarded 26892 adapter-only reads.
discarded 3940668 non-clipped reads.
discarded 154 N reads.
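
As a sanity check on the two reports, the bookkeeping can be verified with a few lines of Python: in each run the output plus all discard categories should add up to the input. The numbers are copied from the reports above; the dictionary keys and the helper name `check_report` are labels invented for this sketch.

```python
def check_report(total_input, output, discarded):
    """Return True when output plus all discard categories equals the input."""
    return output + sum(discarded.values()) == total_input


# Run with -C (clipped reads discarded)
run_discard_clipped = {
    "too_short": 55558,
    "adapter_only": 26892,
    "clipped": 936303,
    "with_N": 2899,
}
# Run with -c (non-clipped reads discarded)
run_keep_clipped = {
    "too_short": 55558,
    "adapter_only": 26892,
    "non_clipped": 3940668,
    "with_N": 154,
}

print(check_report(4959421, 3937769, run_discard_clipped))  # True
print(check_report(4959421, 936149, run_keep_clipped))      # True

# Share of input reads reported as clipped or adapter-only in the first run,
# and how many times larger that is than the exact grep count of 6903
flagged = 936303 + 26892
print(flagged / 4959421, flagged / 6903)
```

Both reports are internally consistent, so the discrepancy with grep lies in what fastx_clipper counts as an adapter hit rather than in the totals.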

I extracted at random a single sequence (TGGTATTTTATTTTTCTACCTAAATTT) from this file, and grepped back into the file to recover all reads terminating with the sequence.
Code:
grep TGGTATTTTATTTTTCTACCTAAATTT 2_clippedonly | sort | uniq -c | sort -k1,1n > 2_clipped_TGGTATTTTATTTTTCTACCTAAATTT

Then I did the same with the original file (2.fq) so I had a set of clipped and original unclipped sequences I could compare.
Code:
grep TGGTATTTTATTTTTCTACCTAAATTT 2.fq | sort | uniq -c | sort -k1,1n > 2_original_TGGTATTTTATTTTTCTACCTAAATTT

Comparing the two on alternating lines below (first line clipped, second line original, and so on) shows that the sequence removed by fastx_clipper is not the one supplied as the -a parameter string. If I BLAST the removed sequence, it is contiguous with the original randomly chosen sequence (TGGTATTTTATTTTTCTACCTAAATTT); it is not the adapter sequence supplied on the command line.


CTTGGTATTTTATTTTTCTACCTAAATTT
CTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAATAAAGAAAAAAGTCAGAAAATGATATTGCTACCTAATTTA
TCTTGGTATTTTATTTTTCTACCTAAATTT
TCTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAATAAAGAAAAAAGTCAGAAAATGATATTGCTACCTAATTT
TTTCTTGGTATTTTATTTTTCTACCTAAATTT
TTTCTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAATAAAGAAAAAAGTCAGAAAATGATATTGCTACCTAAT
TTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTT
TTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAATAAAGAAAAAAGTCAGAAAATGATATTGC
ATATTTTTTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTT
ATATTTTTTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAATAAAGAAAAAAGTCAGAAAATG
AAAAAACAATAGTAATAGCCATATTTTTTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTT
AAAAAACAATAGTAATAGCCATATTTTTTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAATA
GGAAAAAACAATAGTAATAGCCATATTTTTTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTT
GGAAAAAACAATAGTAATAGCCATATTTTTTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAA
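
To look at exactly what was removed, the clipped read can be subtracted from its original and the removed stretch compared base by base with the start of the adapter. The sketch below uses the first pair of lines above; `removed_suffix` and `prefix_matches` are hypothetical helper names, and the comparison is only a diagnostic, not a description of how fastx_clipper decides where to clip.

```python
ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"


def removed_suffix(clipped, original):
    """Return the part of the original read that is missing from the clipped read."""
    if not original.startswith(clipped):
        raise ValueError("clipped read is not a prefix of the original read")
    return original[len(clipped):]


def prefix_matches(removed, adapter, length):
    """Count positions at which the first `length` bases of both strings agree."""
    return sum(1 for r, a in zip(removed[:length], adapter[:length]) if r == a)


clipped = "CTTGGTATTTTATTTTTCTACCTAAATTT"
original = (
    "CTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAATAAAG"
    "AAAAAAGTCAGAAAATGATATTGCTACCTAATTTA"
)

removed = removed_suffix(clipped, original)
# Show the first 10 removed bases next to the first 10 adapter bases
print(removed[:10], ADAPTER[:10], prefix_matches(removed, ADAPTER, 10))
```

The removed stretch begins AAATCG while the adapter begins AGATCG, so the clip point sits at a partial, inexact resemblance to the adapter start. Whether that resemblance is what triggers the clipping is not established by this check alone.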

I've since repeated this with 4 additional sequences from the fastx_clipper output, with the same result. This has left me baffled. I would welcome someone pointing out my simple error (?)!

M

Last edited by mgg; 12-07-2011 at 01:27 AM. Reason: more informative title, early precis of problem
Old 12-08-2011, 05:13 AM   #2
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415
Default

I used fastx_clipper like this:

Code:
zcat sequences.gz | fastx_clipper -v -l 20 -M 15 -a GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG | other_steps >filterered.fastq
I am also wondering whether this is actually doing what I asked of it. When I had the program report the 'adaptor-only' sequences, they had the adaptor at the beginning, followed by other bases. So, not adaptor-only.

Your post makes me wonder if the program is very buggy, and perhaps should not be used. Have you contacted the author(s)?
Old 12-08-2011, 01:41 PM   #3
Emilie
Member
 
Location: Toronto

Join Date: Nov 2010
Posts: 21
Default

Hi,

Did you try Cutadapt?
http://code.google.com/p/cutadapt/

Emilie
Old 05-22-2012, 10:33 AM   #4
sudders
Member
 
Location: Sheffield, UK

Join Date: Dec 2011
Posts: 32
Default

Hi,

I've also had problems with fastx_clipper. It seems that not only is it not doing the trimming I expect, but something is also a bit odd about what it outputs.

I tested this by passing it a single read, which was filtered out, giving an empty output. If I then pass the same read along with two others, that read is retained, and one of the others that should be returned is not.

I'm currently in the process of switching to cutadapt in my pipeline. Will report back on my results.
Old 08-08-2012, 02:13 PM   #5
JueFish
Member
 
Location: Connecticut

Join Date: May 2010
Posts: 42
Default

I have had some similar issues with figuring out my fastx output. Has anyone spoken with the authors of the program? I haven't been able to glean much from their website or manual on the details underlying this program.
Old 08-09-2012, 12:45 AM   #6
sudders
Member
 
Location: Sheffield, UK

Join Date: Dec 2011
Posts: 32
Default

I've been using cutadapt with more success.
Old 10-22-2012, 01:58 AM   #7
bernatgel
Junior Member
 
Location: Barcelona

Join Date: Jul 2011
Posts: 1
Default

I too had problems with fastx_clipper's lack of specificity, and I wrote to the authors.

They wrote back and told me that it was indeed a limitation of the program. The fact is that fastx_clipper was designed for small RNA experiments and so it's tweaked to be very sensitive and not specific at all, clipping anything that resembles an adapter and any nucleotides after that.

As a consequence it's not the best tool to clip adapters for general experiments and it's only suitable for the small RNA ones.
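
The trade-off between sensitivity and specificity can be illustrated with a toy matcher. This is only a sketch under stated assumptions: `first_adapter_like_position` is a made-up helper that counts mismatches against the adapter start, and it is not the algorithm used by fastx_clipper or any other real tool.

```python
def first_adapter_like_position(read, adapter, min_overlap, max_mismatch_fraction):
    """Return the first read position whose remainder resembles the adapter start.

    Hypothetical helper for illustration only. A position qualifies when the
    number of mismatches over the compared window does not exceed
    max_mismatch_fraction times the window length.
    """
    for start in range(len(read) - min_overlap + 1):
        window = min(len(read) - start, len(adapter))
        mismatches = sum(
            1
            for r, a in zip(read[start:start + window], adapter[:window])
            if r != a
        )
        if mismatches <= max_mismatch_fraction * window:
            return start
    return None


adapter = "ACGT"     # toy adapter
read = "TTTTACGA"    # contains ACGA, one base away from the toy adapter

print(first_adapter_like_position(read, adapter, 4, 0.0))   # None: no exact match
print(first_adapter_like_position(read, adapter, 4, 0.25))  # 4: one mismatch tolerated
```

With a strict threshold the adapter-free read is left alone; with a tolerant one it is clipped at position 4. A matcher tuned for sensitivity will behave like the second call far more often on long genomic reads than on short small-RNA inserts.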
Old 10-22-2012, 04:32 AM   #8
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381
Default

Use Trimmomatic....better software that is mate-pair aware
Old 10-22-2012, 06:43 AM   #9
micans
Junior Member
 
Location: Cambridge, UK

Join Date: Oct 2012
Posts: 3
Cool

We (the Enright lab) have developed reaper and tally. The first is for demultiplexing, stripping, trimming adapter, and filtering of various sorts, the second for deduplicating sequence data. They can work in conjunction or apart, and allow handling of paired-end files. Adapter stripping is handled by Smith-Waterman local alignments, and highly customisable. Manual: http://www.ebi.ac.uk/~stijn/reaper/reaper.html, download: http://www.ebi.ac.uk/~stijn/reaper/s...per-12-205.tgz. These are parts of a larger exciting pipeline, to be published imminently, for comprehensive analysis of small-RNA experiments or clean-up and QC of paired-end sequencing data in general.
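
For readers unfamiliar with the term, a Smith-Waterman local alignment score can be sketched in a few lines. This is a generic textbook version with linear gap penalties, not reaper's implementation; the scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Generic textbook recurrence with a linear gap penalty; only the score is
    computed, not the alignment itself.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


print(smith_waterman_score("TTACGTTT", "ACG"))  # 6: three matching bases
print(smith_waterman_score("AAAA", "TTTT"))     # 0: nothing aligns
```

An adapter trimmer built on this idea would accept a hit only when the score (or the implied identity and length) clears a configurable threshold, which is what makes the stringency tunable.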
Old 10-23-2012, 01:10 AM   #10
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415
Default

Unfortunate overlap - namewise - with REAPR, http://www.sanger.ac.uk/resources/software/reapr/ ...
Old 06-13-2013, 06:19 AM   #11
earonesty
Member
 
Location: United States of America

Join Date: Mar 2011
Posts: 52
Default

Parameter-sweep - comparison of clipping programs:

http://benthamscience.com/open/opena...7/1TOBIOIJ.htm
All times are GMT -8.

