SEQanswers

Old 05-25-2011, 07:08 AM   #1
Himalaya
Member
 
Location: uk

Join Date: Jun 2010
Posts: 38
454 data cleaning

Has anyone tried any software for 454 data cleaning, i.e. removing the poor-quality reads? And has anyone tried installing HyPhy and 454HIV without problems? I need help.
Old 05-25-2011, 01:19 PM   #2
proteasome
Member
 
Location: Wisconsin

Join Date: Jul 2009
Posts: 22
Default

If you're generally looking to clean up 454 data, I would suggest using Galaxy (http://main.g2.bx.psu.edu/) to convert your SFF files to FASTQ, and then using the FASTQ filters and tools to remove short or low-quality reads. You can also mask low-quality bases (such as in homopolymers) to Ns without losing reads.
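The approach above (filter on length and mean quality, then mask individual bases) can be sketched in a few lines of Python. This is an illustration only; the clean_fastq helper and its thresholds are invented for the example and are not Galaxy's actual tools:

```python
# Minimal sketch of the cleanup described above: drop short or low-quality
# reads, then mask remaining low-quality bases to N. Thresholds are
# illustrative defaults, not Galaxy's.

def clean_fastq(records, min_len=50, min_mean_qual=20, mask_below=15, offset=33):
    """records: iterable of (header, seq, qual) tuples; yields cleaned ones."""
    for header, seq, qual in records:
        scores = [ord(c) - offset for c in qual]  # decode Phred+33 qualities
        if len(seq) < min_len:
            continue                              # read too short
        if sum(scores) / len(scores) < min_mean_qual:
            continue                              # overall quality too low
        # Mask individual low-quality bases (e.g. in homopolymer runs) to N
        masked = ''.join('N' if q < mask_below else b
                         for b, q in zip(seq, scores))
        yield header, masked, qual

reads = [("@read1", "ACGTACGTACGT", "IIIIIIIIII#I")]
print(list(clean_fastq(reads, min_len=10)))
# → [('@read1', 'ACGTACGTACNT', 'IIIIIIIIII#I')]
```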
Old 05-26-2011, 12:09 AM   #3
Himalaya
Member
 
Location: uk

Join Date: Jun 2010
Posts: 38
Default

Thanks for the reply, proteasome. Galaxy is a great online tool; the problem is uploading files of huge size.
Old 05-27-2011, 04:42 AM   #4
essvee
Member
 
Location: Guelph

Join Date: Apr 2011
Posts: 11
Default

I suggest trying SeqTrim.
You can set minimum quality based on a defined window size, minimum length, etc.
You can also run it from the command line, or online.
www.scbi.uma.es/seqtrim/
Old 05-28-2011, 06:40 PM   #5
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177
Default

Hi,

Check out the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) and SolexaQA (http://solexaqa.sourceforge.net/). Both have simple but neat scripts for read trimming.

Douglas
www.contigexpress.com

Old 05-31-2011, 10:54 PM   #6
Jose Blanca
Member
 
Location: Valencia, Spain

Join Date: Aug 2009
Posts: 70
Default

We have built our own read-cleaning pipeline. It works for us, so we have made it available in case it is of use to other people. It is called clean_reads.
Old 06-06-2011, 02:07 PM   #7
robs
Senior Member
 
Location: San Diego, CA

Join Date: May 2010
Posts: 116
Default

I like PRINSEQ (http://prinseq.sourceforge.net/). It comes as a web version and a standalone version, and does all the QC and data pre-processing that you need.

The application note also contains a short comparison with similar tools (http://bioinformatics.oxfordjournals.../27/6/863.long).
Old 06-08-2011, 02:39 AM   #8
Himalaya
Member
 
Location: uk

Join Date: Jun 2010
Posts: 38
Default

Quote:
Originally Posted by Jose Blanca View Post
We have built our own read-cleaning pipeline. It works for us, so we have made it available in case it is of use to other people. It is called clean_reads.
Hi Jose Blanca, I installed clean_reads with Biopython and psubprocess preinstalled, as required, but running it resulted in a segmentation fault. Have you run the program without problems? Please advise me about the fault. Thank you.
Old 06-08-2011, 02:53 AM   #9
Jose Blanca
Member
 
Location: Valencia, Spain

Join Date: Aug 2009
Posts: 70
Default

I would need more information. A segmentation fault is quite a strange error in a Python program. Could you send me the output?
Old 06-08-2011, 03:30 AM   #10
Himalaya
Member
 
Location: uk

Join Date: Jun 2010
Posts: 38
Default

Quote:
Originally Posted by Jose Blanca View Post
I would need more information. A segmentation fault is quite a strange error in a Python program. Could you send me the output?
Hi Jose
I am using Mac OS X Snow Leopard. My command line is: clean_reads -i Pair01.fastq -o ./clean_reads/output_q20_len50_only3end -p 454 -f fastq -g fastq -qual_threshold 20 -only_3_end True -min_len 50. It gave me only the one-line error 'segmentation fault', and a separate window said Python quit unexpectedly, with a long error report. The last part of the report is below:
0x7fff8507b000 - 0x7fff85131fff libobjc.A.dylib 227.0.0 (compatibility 1.0.0) <1960E662-D35C-5D98-EB16-D43166AE6A22> /usr/lib/libobjc.A.dylib
0x7fff85288000 - 0x7fff85446fff libicucore.A.dylib 40.0.0 (compatibility 1.0.0) <3D9313BF-97A4-6B65-E583-F6173E64C3C2> /usr/lib/libicucore.A.dylib
0x7fff8643f000 - 0x7fff86461ff7 libexpat.1.dylib 7.2.0 (compatibility 7.0.0) <7D173736-CBDF-F02F-2D07-B38F565D5ED4> /usr/lib/libexpat.1.dylib
0x7fff86462000 - 0x7fff864aaff7 libvDSP.dylib 268.0.1 (compatibility 1.0.0) <98FC4457-F405-0262-00F7-56119CA107B6> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libvDSP.dylib
0x7fff87df1000 - 0x7fff87df1ff7 com.apple.Accelerate 1.6 (Accelerate 1.6) <15DF8B4A-96B2-CB4E-368D-DEC7DF6B62BB> /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate
0x7fff8846a000 - 0x7fff88544fff com.apple.vImage 4.0 (4.0) <B5A8B93B-D302-BC30-5A18-922645DB2F56> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vImage.framework/Versions/A/vImage
0x7fff88545000 - 0x7fff88d4ffe7 libBLAS.dylib 219.0.0 (compatibility 1.0.0) <2F26CDC7-DAE9-9ABE-6806-93BBBDA20DA0> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
0x7fffffe00000 - 0x7fffffe01fff libSystem.B.dylib ??? (???) <40DA878D-6D69-FEA3-398B-BBD80C9BFF46> /usr/lib/libSystem.B.dylib

Then I tried to run clean_reads on Ubuntu with the same command, and it gave me this error:
IOError: [Errno 2] No such file or directory: 'ual_threshold'
I did not specify any file called 'ual_threshold'; that was the '-qual_threshold' option I specified.
Any advice, please?
Old 06-08-2011, 04:41 AM   #11
Jose Blanca
Member
 
Location: Valencia, Spain

Join Date: Aug 2009
Posts: 70
Default

On Mac it won't work, because the binaries shipped inside clean_reads are Linux-only.
Regarding the Linux problem, it's a malformed command line. It should be:
--qual_threshold
instead of:
-qual_threshold
Regards.
Old 06-08-2011, 04:59 AM   #12
Himalaya
Member
 
Location: uk

Join Date: Jun 2010
Posts: 38
Default

Quote:
Originally Posted by Jose Blanca View Post
On Mac it won't work, because the binaries shipped inside clean_reads are Linux-only.
Regarding the Linux problem, it's a malformed command line. It should be:
--qual_threshold
instead of:
-qual_threshold
Regards.
Hi Jose, thanks a lot. On Linux it seems to work now. For the same command, it now gives the output "parameter qual_threshold is incompatible with platform long_with_quality". I tried --qual_threshold values from 10 to 100 and got the same message each time.
Any advice on this? Thanks for helping me get the program running.
Old 06-08-2011, 10:31 PM   #13
Jose Blanca
Member
 
Location: Valencia, Spain

Join Date: Aug 2009
Posts: 70
Default

You cannot use the --qual_threshold parameter for long reads (Sanger or 454); I should explain that better in the documentation. For long reads we trim the bad-quality regions using lucy, so the parameter to change would be lucy_error. qual_threshold is used by the short-read quality trimmer.
Old 06-09-2011, 04:54 AM   #14
Himalaya
Member
 
Location: uk

Join Date: Jun 2010
Posts: 38
Default

Quote:
Originally Posted by Jose Blanca View Post
You cannot use the --qual_threshold parameter for long reads (Sanger or 454); I should explain that better in the documentation. For long reads we trim the bad-quality regions using lucy, so the parameter to change would be lucy_error. qual_threshold is used by the short-read quality trimmer.
Hi Jose
I am trying to quality-trim and filter 454 reads. The adaptors, primers and barcode sequences are already removed. I am not allowed to specify a minimum quality threshold to clean bad-quality reads, and I don't understand why. How does it do the quality trimming? Sorry, I could not find the clean_reads documentation. Also, when I specify the option -only_3_end True, it gives me an error saying it is not compatible with the platform. Does that mean it trims from both the 5' and 3' ends?

thnx
Old 06-09-2011, 05:22 AM   #15
Jose Blanca
Member
 
Location: Valencia, Spain

Join Date: Aug 2009
Posts: 70
Default

Sorry, I have not explained myself well enough.
clean_reads uses two different algorithms for quality trimming: one for long reads (lucy) and a different one for short reads. If you're cleaning long reads, the applicable parameters are the lucy parameters: lucy_error, lucy_window and lucy_bracket. These are the parameters you should tweak to modify the cleaning behaviour when dealing with 454 and Sanger reads.

For Illumina and SOLiD we didn't manage to use lucy, so we implemented a sliding-window trimming function. Its parameters are qual_window, qual_threshold and only_3_end. That's why those parameters can only be used with short reads.
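The short-read sliding-window trimming described above can be sketched roughly like this in Python. This is an illustration of the algorithm only, not clean_reads' actual code; the function name is made up, though the parameter names mirror the options mentioned above:

```python
# Rough sketch of sliding-window 3'-end quality trimming (illustrative
# only; not the actual clean_reads implementation).

def sliding_window_trim(seq, quals, qual_window=4, qual_threshold=20):
    """Scan 5'->3' and cut the read at the first window whose mean
    quality drops below the threshold, discarding the 3' tail."""
    for i in range(len(quals) - qual_window + 1):
        window = quals[i:i + qual_window]
        if sum(window) / qual_window < qual_threshold:
            return seq[:i], quals[:i]
    return seq, quals                 # no bad window: keep the whole read

seq, quals = sliding_window_trim("ACGTACGTAC", [40] * 6 + [5] * 4,
                                 qual_window=2, qual_threshold=20)
print(seq)  # → ACGTAC (the low-quality 3' tail is removed)
```

Trimmomatic's SLIDINGWINDOW step works on the same principle for Illumina reads.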
Old 06-09-2011, 05:59 AM   #16
Himalaya
Member
 
Location: uk

Join Date: Jun 2010
Posts: 38
Default

Quote:
Originally Posted by Jose Blanca View Post
Sorry, I have not explained myself well enough.
clean_reads uses two different algorithms for quality trimming: one for long reads (lucy) and a different one for short reads. If you're cleaning long reads, the applicable parameters are the lucy parameters: lucy_error, lucy_window and lucy_bracket. These are the parameters you should tweak to modify the cleaning behaviour when dealing with 454 and Sanger reads.

For Illumina and SOLiD we didn't manage to use lucy, so we implemented a sliding-window trimming function. Its parameters are qual_window, qual_threshold and only_3_end. That's why those parameters can only be used with short reads.
Hi Jose
I am following your explanation. I issued the following command:
clean_reads -i Pair01_fastq_format.fastq -o ./clean_reads/output_q20_len_50 -p 454 -f fastq -g fastq -min_length 20 --lucy_error 0.025,0.02 --lucy_bracket 10,0.02 --lucy_window 1,0.02

It seems to run fine, but there is no output. The clean_reads.error file says the input file "Pair01_fastq_format.fastq" is not found ("no such file or directory"), yet the file is in the same directory from which I issued the command.
Is the command itself wrong?
I really appreciate your help. Thnx
Old 06-09-2011, 06:42 AM   #17
Jose Blanca
Member
 
Location: Valencia, Spain

Join Date: Aug 2009
Posts: 70
Default

Are you sure the file is there? Could it be a problem with the letter case? In Unix, case matters: Pair01_fastq_format.fastq and pair01_fastq_format.fastq would be different files.
Can you run the following command ok?
head Pair01_fastq_format.fastq
Old 06-10-2011, 04:36 AM   #18
Himalaya
Member
 
Location: uk

Join Date: Jun 2010
Posts: 38
Default

Hi Jose
It's fine now; the command works. But I have a question: if I want to clean using a minimum quality-score threshold, how can I do that? Since --qual_threshold does not work for 454, how is clean_reads working without quality-threshold information?
Thanks
Old 06-10-2011, 04:58 AM   #19
Jose Blanca
Member
 
Location: Valencia, Spain

Join Date: Aug 2009
Posts: 70
Default

Take a look at the lucy documentation, because you have to use its parameters.
Old 06-11-2011, 06:20 AM   #20
Pedro
Junior Member
 
Location: Portugal

Join Date: Dec 2008
Posts: 6
Default

Hi Jose,

Another question for you. I've been testing clean_reads and it works quite nicely. However, when I try to use multiple threads I get errors which I believe are related to psubprocess. Could you help me with this, since it would speed up the work? The error output is as follows:

"clean_reads -i mp1_M1.fastq -o mp1_M1cr1.fastq -p illumina -t 4 -a adaptors.fasta
The command was:
/usr/local/bin/clean_reads -i mp1_M1.fastq -o mp1_M1cr1.fastq -p illumina -t 4 -a adaptors.fasta
/usr/local/bin/clean_reads version: 0.2.1
Running pipeline illumina with the following parameters:
--platform: illumina
--seq_in: mp1_M1.fastq
--seq_out: mp1_M1cr1.fastq
--adaptors_file: adaptors.fasta
--threads: 4
--disable_quality_trimming: False
--qual_threshold: 20
--qual_window: 1
--only_3_end: False
--filter_identity: 95.0
--filter_length_percentage: 75.0
--error_log: clean_reads.error
An unexpected error happened.
The clean_reads developers would appreciate your feedback
Please send them the error log and take a look at it: clean_reads.error

[Errno 2] No such file or directory: '/tmp/tmpTYEr3e/tmpKIFhW2/tmpGHXvok'/usr/local/lib/python2.6/dist-packages/franklin/utils/cgitb.py:245: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
value = pydoc.text.repr(getattr(evalue, name))
Traceback (most recent call last):
File "/usr/local/bin/clean_reads", line 857, in <module>
main(stdout, stderr)
File "/usr/local/bin/clean_reads", line 840, in main
processes=threads)
File "/usr/local/lib/python2.6/dist-packages/franklin/pipelines/pipelines.py", line 339, in seq_pipeline_runner
processes)
File "/usr/local/lib/python2.6/dist-packages/franklin/pipelines/pipelines.py", line 287, in _parallel_process_sequences
retcode = process.wait()
File "/usr/local/lib/python2.6/dist-packages/psubprocess/prunner.py", line 374, in wait
self._collect_output_streams()
File "/usr/local/lib/python2.6/dist-packages/psubprocess/prunner.py", line 407, in _collect_output_streams
joiner(out_file, part_out_fnames)
File "/usr/local/lib/python2.6/dist-packages/psubprocess/prunner.py", line 490, in default_cat_joiner
in_fhand = open(in_file_, 'r')
IOError: [Errno 2] No such file or directory: '/tmp/tmpTYEr3e/tmpKIFhW2/tmpGHXvok' "

Thanks,
P