#1 |
Member
Location: London, UK Join Date: Nov 2011
Posts: 12
I'm looking for some help with fastx_clipper. Despite my best (if inexperienced) efforts, it's not doing what I want. So far it's little better than random. Worse, actually, since it's not clipping at the adapter sequence provided, but at other sequences entirely.
I've analysed my Illumina data using FastQC. This showed contamination with indexed TruSeq adapters. The universal adapter did not show up (below the 0.1% threshold, I guess). It also showed fairly high levels of sequence redundancy in the libraries (minimum 30%, mostly >60%).

As part of a pipeline I used fastx_clipper to remove adapter-containing reads entirely (-C option). This removed huge numbers of reads (e.g. ~20% of all reads even for the TruSeq Universal adapter, which did not show up in the FastQC analysis).

I've spent the rest of the afternoon away from the pipeline, using the command line to try to figure out what it's actually doing. For this I used a single fastq file (2.fq) as a test case. I used grep -c on the .fastq file (2.fq) to count the instances of the Illumina universal adapter. This showed there were some, mostly at the start of inserts; certainly nothing like the number fastx_clipper removed. [code listing not preserved]
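As a rough illustration of this kind of count (not the poster's original command), the sketch below counts exact adapter occurrences in the sequence lines only of a 4-line-per-record FASTQ file, and how many of them start at the first base of the read. The function names and the demo adapter string are hypothetical. An exact-match count like this misses adapters that are truncated at the read end or carry sequencing errors, so it is best read as a lower bound.

```python
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4) from 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def count_adapter_reads(lines: Iterable[str], adapter: str) -> dict:
    """Count reads containing the exact adapter string, and those starting with it."""
    counts = {"reads": 0, "with_adapter": 0, "at_start": 0}
    for seq in fastq_sequences(lines):
        counts["reads"] += 1
        pos = seq.find(adapter)
        if pos >= 0:
            counts["with_adapter"] += 1
            if pos == 0:
                counts["at_start"] += 1
    return counts


if __name__ == "__main__":
    # Made-up two-read example; ADAPTER is a placeholder, not a real TruSeq sequence.
    ADAPTER = "ACGTACGT"
    demo = [
        "@read1", "ACGTACGTTTTT", "+", "IIIIIIIIIIII",
        "@read2", "GGGGCCCCAAAA", "+", "IIIIIIIIIIII",
    ]
    print(count_adapter_reads(demo, ADAPTER))
```

For a real file, an open file handle can be passed directly, e.g. `count_adapter_reads(open("2.fq"), adapter)`, since the function only iterates over lines.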
I ran fastx_clipper manually (path specifications removed): [code listing not preserved]
Since I couldn't understand what was going on, I repeated this but used option -c to retain only those sequences that had been clipped (i.e. those reads that had originally contained the adapter). Numbers-wise, the outcome was comparable to the first run. [code listing not preserved]
I extracted a single sequence at random (TGGTATTTTATTTTTCTACCTAAATTT) from this file, and grepped back into the file to recover all reads terminating with that sequence. [code listing not preserved]
Then I did the same with the original file (2.fq), so I had a set of clipped and original unclipped sequences I could compare. [code listing not preserved]
Comparison of the two on alternating lines below (first line clipped, second line original, etc.) shows that the sequences removed by fastx_clipper are not those supplied as the -a parameter string. If I BLAST the removed sequence, it turns out to be contiguous with the original random sequence (TGGTATTTTATTTTTCTACCTAAATTT), not the sequence string supplied on the command line.

CTTGGTATTTTATTTTTCTACCTAAATTT
CTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAATAAAGAAAAAAGTCAGAAAATGATATTGCTACCTAATTTA
TCTTGGTATTTTATTTTTCTACCTAAATTT
TCTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAATAAAGAAAAAAGTCAGAAAATGATATTGCTACCTAATTT
TTTCTTGGTATTTTATTTTTCTACCTAAATTT
TTTCTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAATAAAGAAAAAAGTCAGAAAATGATATTGCTACCTAAT
TTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTT
TTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAATAAAGAAAAAAGTCAGAAAATGATATTGC
ATATTTTTTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTT
ATATTTTTTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAATAAAGAAAAAAGTCAGAAAATG
AAAAAACAATAGTAATAGCCATATTTTTTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTT
AAAAAACAATAGTAATAGCCATATTTTTTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAATA
GGAAAAAACAATAGTAATAGCCATATTTTTTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTT
GGAAAAAACAATAGTAATAGCCATATTTTTTGTTGTATTTCTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAA

I've since repeated this with 4 additional sequences from the fastx_clipper output, with the same result. This has left me baffled. I would welcome someone pointing out my simple error (?)!

M

Last edited by mgg; 12-07-2011 at 01:27 AM. Reason: more informative title, early precis of problem
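The by-eye comparison above can also be done programmatically. The following is a minimal sketch, assuming each clipped read is an exact prefix of its original read; the function names and the placeholder adapter are hypothetical, and the real string passed to -a would need to be substituted.

```python
def removed_tail(clipped: str, original: str) -> str:
    """Return the part of `original` that was cut off to produce `clipped`."""
    if not original.startswith(clipped):
        raise ValueError("clipped read is not a prefix of the original read")
    return original[len(clipped):]


def tail_starts_with_adapter(tail: str, adapter: str) -> bool:
    """True if the removed tail begins with the adapter, or, when the tail is
    shorter than the adapter, with the matching prefix of the adapter."""
    n = min(len(tail), len(adapter))
    return n > 0 and tail[:n] == adapter[:n]


if __name__ == "__main__":
    # First clipped/original pair quoted in the post.
    clipped = "CTTGGTATTTTATTTTTCTACCTAAATTT"
    original = (
        "CTTGGTATTTTATTTTTCTACCTAAATTTAAATCGTAGGTTAGCATTAAGTGTTTTTACTATGAAT"
        "AAAGAAAAAAGTCAGAAAATGATATTGCTACCTAATTTA"
    )
    # Placeholder only: replace with the adapter actually given to -a.
    adapter = "ACGTACGTACGT"
    tail = removed_tail(clipped, original)
    print(tail)
    print(tail_starts_with_adapter(tail, adapter))
```

This only tests for an exact match at the clip point; a tail that matches the adapter with a few mismatches would need an alignment-based check instead.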
#2 |
Moderator
Location: Oslo, Norway Join Date: Nov 2008
Posts: 415
I used fastx_clipper like this:
Code:
zcat sequences.gz | fastx_clipper -v -l 20 -M 15 -a GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG | other_steps >filterered.fastq

Your post makes me wonder if the program is very buggy, and perhaps should not be used. Have you contacted the author(s)?
#3 |
Member
Location: Toronto Join Date: Nov 2010
Posts: 21
#4 |
Member
Location: Sheffield, UK Join Date: Dec 2011
Posts: 32
Hi,
I've also had problems with fastx_clipper. It seems that not only is it not doing the trimming I expect, but something is also a bit odd about what it outputs. I tested this by passing it a single read, which was filtered out, giving an empty output. If I then pass the same read along with two others, that read is retained, and one of the others that should be returned is not. I'm currently in the process of switching my pipeline over to cutadapt. Will report back on my results.
#5 |
Member
Location: Connecticut Join Date: May 2010
Posts: 42
I have had some similar issues with figuring out my fastx output. Has anyone spoken with the authors of the program? I haven't been able to glean much from their website or manual about the details underlying this program.
#6 |
Member
Location: Sheffield, UK Join Date: Dec 2011
Posts: 32
I've been using cutadapt with more success.
#7 |
Junior Member
Location: Barcelona Join Date: Jul 2011
Posts: 1
I too had problems with fastx_clipper's lack of specificity, and I wrote to the authors.
They wrote back and told me that it is indeed a limitation of the program. fastx_clipper was designed for small RNA experiments, so it is tuned to be very sensitive and not specific at all, clipping anything that resembles an adapter and any nucleotides after it. As a consequence, it is not the best tool for clipping adapters in general experiments and is only suitable for small RNA ones.
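To make the sensitivity versus specificity trade-off concrete, here is a toy sketch. It is not fastx_clipper's actual matching rule, which is not described in this thread; it simply clips a read wherever the first `min_match` bases of the adapter occur. The function name, adapter string, and reads are all made up for illustration.

```python
def clip_at_adapter(read: str, adapter: str, min_match: int) -> str:
    """Toy rule: clip `read` at the first exact occurrence of the first
    `min_match` bases of `adapter`; return the read unchanged if none."""
    if min_match < 1:
        raise ValueError("min_match must be at least 1")
    seed = adapter[:min_match]
    pos = read.find(seed)
    return read if pos < 0 else read[:pos]


if __name__ == "__main__":
    adapter = "AATGATACGG"              # hypothetical adapter
    genomic = "CCTTAATGCCGGTTAA"        # no adapter, but contains "AATG" by chance
    with_adapter = "CCTTGGAATGATACGGTT"  # contains the full adapter

    # A very permissive setting clips the adapter-free read as well.
    print(clip_at_adapter(genomic, adapter, 4))
    # Requiring the full adapter leaves the adapter-free read intact.
    print(clip_at_adapter(genomic, adapter, len(adapter)))
    print(clip_at_adapter(with_adapter, adapter, len(adapter)))
```

A short seed makes sense when most reads are expected to run into the adapter, as with small RNA inserts, but on longer genomic inserts the same setting removes real sequence.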
#8 |
Senior Member
Location: Halifax, Nova Scotia Join Date: Mar 2009
Posts: 381
Use Trimmomatic: better software that is mate-pair aware.
#9 |
Junior Member
Location: Cambridge, UK Join Date: Oct 2012
Posts: 3
We (the Enright lab) have developed reaper and tally. The first is for demultiplexing, stripping and trimming adapters, and filtering of various sorts; the second is for deduplicating sequence data. They can work in conjunction or apart, and allow handling of paired-end files. Adapter stripping is handled by Smith-Waterman local alignments and is highly customisable. Manual: http://www.ebi.ac.uk/~stijn/reaper/reaper.html, download: http://www.ebi.ac.uk/~stijn/reaper/s...per-12-205.tgz. These are parts of a larger exciting pipeline, to be published imminently, for comprehensive analysis of small-RNA experiments or clean-up and QC of paired-end sequencing data in general.
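For readers unfamiliar with the approach, the sketch below shows a textbook Smith-Waterman local alignment used to locate an adapter in a read. It is not reaper's implementation; the scoring values (match 2, mismatch -1, gap -2), the `min_score` threshold, and all names are assumptions chosen for illustration.

```python
def smith_waterman(read: str, adapter: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2):
    """Return (best_score, read_start, read_end) for the best local alignment
    of `adapter` against `read`; read_end is exclusive."""
    rows, cols = len(read) + 1, len(adapter) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == adapter[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,  # align two bases
                              score[i - 1][j] + gap,      # gap in adapter
                              score[i][j - 1] + gap)      # gap in read
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_cell
    end = i
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if read[i - 1] == adapter[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, end


def clip_with_alignment(read: str, adapter: str, min_score: int) -> str:
    """Clip the read where the aligned adapter starts, if the score is high enough."""
    best, start, _ = smith_waterman(read, adapter)
    return read[:start] if best >= min_score else read


if __name__ == "__main__":
    adapter = "ACGGATCC"  # hypothetical adapter
    print(clip_with_alignment("TTTTTACGGATCC", adapter, min_score=12))
    print(clip_with_alignment("TTACGGTT", adapter, min_score=12))
```

The score threshold is what controls specificity here: a chance partial match scores too low to trigger clipping, while a full or near-full adapter match passes.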
#10 |
Moderator
Location: Oslo, Norway Join Date: Nov 2008
Posts: 415
Unfortunate overlap - namewise - with REAPR, http://www.sanger.ac.uk/resources/software/reapr/ ...
#11 |
Member
Location: United States of America Join Date: Mar 2011
Posts: 52
Parameter-sweep - comparison of clipping programs:
http://benthamscience.com/open/opena...7/1TOBIOIJ.htm |