SEQanswers

Old 10-04-2011, 12:10 PM   #1
liu_xt005
Member
 
Location: Iowa City, IA

Join Date: Jun 2011
Posts: 24
Default FastX-toolkit

I am going to clean some exome sequence data (paired-end) generated by Illumina using FastX-toolkit. Could you please suggest a good procedure, for instance,
1. fastx_clipper
2. fastx_quality_filter
3. fastx_quality_trimmer
...?
I am confused about the steps as well as the order.
Thanks very much!
Old 03-08-2012, 08:42 PM   #2
odoyle81
Member
 
Location: United States

Join Date: Aug 2011
Posts: 31
Default

I'm actually confused about this too...
Is it better to do clipper first, and then quality filter?
When determining where to clip, do you look at the FastQC results? For example, if I look at my graph of quality scores across all bases and see that quality falls below 20 after position 85, should I just clip all reads at 85?
Thanks!
Old 03-09-2012, 05:17 AM   #3
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

Quote:
Originally Posted by odoyle81 View Post
I'm actually confused about this too...
Is it better to do clipper first, and then quality filter?
When determining where to clip, do you look at the FastQC results? For example, if I look at my graph of quality scores across all bases and see that quality falls below 20 after position 85, should I just clip all reads at 85?
Thanks!
I would not hard clip based on FastQC alone. Remember, it's just showing you the distribution of quality scores, and you will have plenty of reads that have good quality all the way through. As for whether to trim the reads at all, that depends. Could you provide more details about your library and what you plan to do with the reads?
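
To illustrate the difference between hard clipping and per-read trimming, here is a minimal sketch. It assumes Phred+33 quality encoding (older Illumina pipelines used Phred+64), and the function names are made up for this example; it is not how any particular FASTX tool is implemented.

```python
def phred33(qual):
    """Decode a Phred+33 quality string into integer scores (assumed encoding)."""
    return [ord(c) - 33 for c in qual]


def hard_clip(seq, qual, length):
    """Cut every read at the same fixed position, whatever its quality."""
    return seq[:length], qual[:length]


def trim_tail(seq, qual, threshold=20):
    """Drop bases from the 3' end until one reaches the threshold."""
    scores = phred33(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]


good = ("ACGTACGTAC", "IIIIIIIIII")  # Q40 throughout
bad = ("ACGTACGTAC", "IIIIII####")   # last four bases are Q2

for seq, qual in (good, bad):
    # Hard clipping shortens both reads; tail trimming only shortens the bad one.
    print(hard_clip(seq, qual, 6)[0], trim_tail(seq, qual)[0])
```

With a fixed clip position the good read loses four perfectly usable bases, while the per-read trim leaves it untouched.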
Old 03-09-2012, 09:37 AM   #4
odoyle81
Member
 
Location: United States

Join Date: Aug 2011
Posts: 31
Default

We have a couple different projects:

1. A population of mutants segregating for a phenotype... we want to locate the deletion, so I want to use one of these programs to do that (pindel, svseq, cortex (just learned about that one today)).
2. We also want to do a reference alignment with another sample.. I was going to use BWA..

We have Illumina 100bp PE reads.

If I trim then quality filter, I keep 74% of reads
If I just quality filter then I keep 70% of reads
I was trimming to 89bp and quality filter q=20 p=80

I thought it was really important to QC the reads before further processing?
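
For readers unfamiliar with the q=20 p=80 setting above: as I understand it, the filter keeps a read only if at least 80% of its bases have a quality score of 20 or more. A rough sketch of that criterion follows, assuming Phred+33 encoding; the function name and the exact handling of boundary cases are assumptions, not taken from the FASTX source.

```python
def passes_filter(qual, min_q=20, min_percent=80, offset=33):
    """Return True if at least min_percent of bases have quality >= min_q."""
    if not qual:
        return False
    scores = [ord(c) - offset for c in qual]
    good = sum(1 for s in scores if s >= min_q)
    # Integer comparison avoids floating-point rounding at the boundary.
    return good * 100 >= min_percent * len(scores)


print(passes_filter("IIIIIIII##"))  # 8 of 10 bases are Q40
print(passes_filter("IIIIIII###"))  # 7 of 10 bases are Q40
```

This would also be consistent with the numbers above: trimming the last bases first removes part of the low-quality tail, so more reads can reach the 80% mark.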

Last edited by odoyle81; 03-09-2012 at 09:42 AM.
Old 03-09-2012, 01:57 PM   #5
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

Quote:
Originally Posted by odoyle81 View Post
I thought it was really important to QC the reads before further processing?
It is important to QC the reads, but it might not be necessary to trim the reads based on quality. Most aligners are aware of the quality of the bases and will take that into account when mapping. BWA is a good example since it can soft clip reads. If you do trim off low quality tails with PE data and map with BWA, you might even get worse results than if you just map the reads without trimming them. BWA can have a hard time determining the size distribution of the insert if you do quality trimming.
Old 03-09-2012, 07:38 PM   #6
odoyle81
Member
 
Location: United States

Join Date: Aug 2011
Posts: 31
Default

Thanks for that perspective!
So after quality filtering, I will probably lose some of the reads from pairs. I've been reading about how to remove the orphaned reads. Does everyone do this with custom scripts, or is there a tool for this?
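
For reference, a custom script for this can be quite short. The sketch below is only one way to do it: it assumes four-line FASTQ records, assumes mates share a read name once a trailing /1 or /2 (or a comment after whitespace) is stripped, and holds the second file in memory, so it is not suited to very large files. All function names are invented for the example.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def read_id(header):
    """Strip '@', any comment after whitespace, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def split_pairs(handle1, handle2):
    """Return (paired1, paired2, orphans), keeping the order of file 1."""
    mates = {read_id(rec[0]): rec for rec in read_fastq(handle2)}
    paired1, paired2, orphans = [], [], []
    for rec in read_fastq(handle1):
        mate = mates.pop(read_id(rec[0]), None)
        if mate is None:
            orphans.append(rec)
        else:
            paired1.append(rec)
            paired2.append(mate)
    orphans.extend(mates.values())  # reads left over from file 2
    return paired1, paired2, orphans


r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nACGT\n+\nIIII\n")
r2 = io.StringIO("@b/2\nTTTT\n+\nIIII\n@c/2\nGGGG\n+\nIIII\n")
p1, p2, orphans = split_pairs(r1, r2)
print([read_id(r[0]) for r in p1], [read_id(r[0]) for r in orphans])
```

Only read b has both mates in the toy input, so a and c end up in the orphan list.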
Old 03-09-2012, 10:58 PM   #7
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 308
Default

Two things:
1) The best way to decide how to trim is to trim a couple of different ways and see which aligns the best. While it is probably too time consuming to do this for all data sets, it's informative to get an idea of what each step is doing.
2) You might want to take a look at Trimmomatic. It is way faster than the FASTX Toolkit.
__________________
--------------
Ethan
Old 03-13-2012, 02:56 AM   #8
Zam
Member
 
Location: Oxford

Join Date: Apr 2010
Posts: 51
Default

Just a comment for odoyle81 about using Cortex - you should not need to pre-quality filter the reads for Cortex (unless you have massive massive coverage, in which case it will do no harm I guess). Just use the inbuilt error-cleaning mechanisms, and it should work just fine.
Old 05-14-2014, 02:49 AM   #9
vivi7
Member
 
Location: Aarhus, Denmark

Join Date: Mar 2014
Posts: 10
Smile fastx_barcodes_splitter issue with the run

Hi,

I saw the post and I hope maybe some of you can help me.

When I run fastx_barcode_splitter.pl with this command

/usr/local/bin/fastx_barcode_splitter.pl --bcfile ./Barcodes9nt.txt --prefix ./Rescued9nt --suffix .fq --bol

On the command line it looks like it is running (no error message, no > sign); see attachment for screenshot.
However, it is not running at all: I can see with top that it is not using any memory or CPU, and it has been 'running' for days on a very small file without producing any results.
The input file is in the STDIN folder, as it is supposed to be.

I would be very grateful if you could suggest what might be wrong.
Thanks in advance
Vivi
Old 05-14-2014, 12:26 PM   #10
odoyle81
Member
 
Location: United States

Join Date: Aug 2011
Posts: 31
Default

Unfortunately I can't advise on why that isn't working for you, but I would recommend you just write your own script, or try to find one on the internet. Most of the FASTX tools are out of date and don't work that well. For example, this looks like one that might work:
https://pypi.python.org/pypi/paired_sequence_utils/0.1
If you google, you should be able to find a bunch, as it is a pretty simple operation that needs to be done.
I can't offer much support, and maybe this isn't the most efficient way to do it (it is kinda slow), but the one I wrote is here:
https://bitbucket.org/odoyle81/pytho...3.py?at=master
In any case, learning to write your own will allow you to adapt to your specific needs.

hope that helps.
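
As a rough idea of what a home-made barcode splitter could look like, here is a minimal sketch. It assumes a barcode file with one "name barcode" pair per line, barcodes at the start of each read, and exact matches only (no mismatches allowed, unlike the FASTX script). The function names and the toy barcodes are made up for the example.

```python
import io
from collections import defaultdict


def load_barcodes(handle):
    """Parse 'name<whitespace>barcode' lines, skipping blanks and '#' comments."""
    barcodes = {}
    for line in handle:
        line = line.strip()
        if line and not line.startswith("#"):
            name, barcode = line.split()[:2]
            barcodes[barcode.upper()] = name
    return barcodes


def split_by_barcode(fastq, barcodes):
    """Group FASTQ records by an exact barcode match at the start of the read."""
    bins = defaultdict(list)
    while True:
        record = [fastq.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:
            break
        seq = record[1].upper()
        name = next((n for bc, n in barcodes.items() if seq.startswith(bc)),
                    "unmatched")
        bins[name].append(record)
    return bins


qual = "I" * 17
barcode_file = io.StringIO("BC1 GATCTACGA\nBC2 TTAGGCATC\n")
reads = io.StringIO(
    f"@r1\nGATCTACGAACGTACGT\n+\n{qual}\n"
    f"@r2\nTTAGGCATCACGTACGT\n+\n{qual}\n"
    f"@r3\nCCCCCCCCCACGTACGT\n+\n{qual}\n"
)
bins = split_by_barcode(reads, load_barcodes(barcode_file))
print({name: len(records) for name, records in bins.items()})
```

Note that this sketch is handed its reads explicitly. As far as I know, fastx_barcode_splitter.pl instead expects the FASTQ data on standard input (piped or redirected), and a program waiting on standard input with nothing supplied will sit idle without any error, which may be worth checking for the command in post #9.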
Old 05-15-2014, 01:01 AM   #11
vivi7
Member
 
Location: Aarhus, Denmark

Join Date: Mar 2014
Posts: 10
Default

Thank you very much!!!
Old 10-10-2014, 09:22 PM   #12
luofastx
Junior Member
 
Location: tianjin

Join Date: Oct 2014
Posts: 2
Default

fastx_trimmer: input file (/BJPROJ/Data_production/HiseqX/140807_ST-E00142_0036_BH04CYALXX/DHE00358/DHE00358_L5_2.fq.gz) has unknown file format (not FASTA or FASTQ), first character = ^_ (31) ????
Old 10-10-2014, 09:29 PM   #13
luofastx
Junior Member
 
Location: tianjin

Join Date: Oct 2014
Posts: 2
Default if fastx-toolkit can read gzip file?

fastx_trimmer: input file (/BJPROJ/Data_production/HiseqX/140807_ST-E00142_0036_BH04CYALXX/DHE00358/DHE00358_L5_2.fq.gz) has unknown file format (not FASTA or FASTQ), first character = ^_ (31)
What is the reason for this error?
Old 10-11-2014, 05:52 AM   #14
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,079
Default

See: https://www.biostars.org/p/83237/
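
Some background on the error itself: the reported first character, ^_ (31), is byte 0x1f, which is the first byte of the gzip magic number (0x1f 0x8b). That suggests the tool was handed the compressed .fq.gz file directly and did not recognise it as FASTA or FASTQ. A small Python sketch of how one might detect and read such a file; the helper names are invented for the example.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(first_bytes):
    """True if the data starts with the two-byte gzip magic number."""
    return first_bytes[:2] == GZIP_MAGIC


def open_fastq(path):
    """Open a FASTQ file as text, decompressing on the fly if it is gzipped."""
    with open(path, "rb") as probe:
        head = probe.read(2)
    if is_gzipped(head):
        return gzip.open(path, "rt")
    return open(path, "r")


payload = gzip.compress(b"@r1\nACGT\n+\nIIII\n")
print(payload[0], is_gzipped(payload))  # the first byte is 31, as in the error
print(gzip.decompress(payload).decode().splitlines()[0])
```

Once decompressed, the first character is '@', which is what a FASTQ reader expects to see.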