Has anyone tried any software for cleaning 454 data, i.e. removing the poor-quality reads? And has anyone tried installing HyPhy and 454HIV and managed it without problems? I need help.
-
If you're generally looking to clean up 454 data, I would suggest using Galaxy (http://main.g2.bx.psu.edu/) to convert your sff files to fastq, and then using the fastq filters and tools to remove short or low-quality reads. You can also mask low-quality bases (such as those in homopolymers) to Ns without losing reads.
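As a rough illustration of the two operations mentioned above (dropping short or low-quality reads, and masking low-quality bases to N instead of discarding the read), here is a minimal Python sketch. It is not what Galaxy runs internally. The function names, the default thresholds, and the assumption that the fastq uses Sanger-style Phred+33 qualities in well-formed 4-line records are all mine.

```python
def phred_scores(qual_line, offset=33):
    """Decode a fastq quality string into integer Phred scores.

    Assumes Sanger-style encoding (ASCII offset 33).
    """
    return [ord(ch) - offset for ch in qual_line]


def mask_low_quality(seq, qual_line, min_q=20):
    """Replace every base whose quality is below min_q with 'N'."""
    scores = phred_scores(qual_line)
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, scores))


def keep_read(seq, qual_line, min_len=50, min_mean_q=20):
    """Return True if the read is long enough and its mean quality is high enough."""
    if not seq or len(seq) < min_len:
        return False
    scores = phred_scores(qual_line)
    return sum(scores) / len(scores) >= min_mean_q


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line fastq records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


if __name__ == "__main__":
    example = ["@read1\n", "ACGT\n", "+\n", "II!I\n"]
    for header, seq, qual in read_fastq(example):
        if keep_read(seq, qual, min_len=4, min_mean_q=20):
            print(header, mask_low_quality(seq, qual, min_q=20))
```

Masking keeps the read length and coordinates intact, which is why it can be preferable to trimming when only a few bases inside a read are poor.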
-
I suggest trying SeqTrim.
You can set minimum quality based on a defined window size, minimum length, etc.
You can also run it from the command line or online.
-
Hi,
Check out the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) and SolexaQA (http://solexaqa.sourceforge.net/). Both have simple but neat scripts for read trimming.
Douglas
-
We have done our own read cleaning pipeline. It works for us, so we have made it available just in case it could be of any use to other people. It is called clean_reads.
-
I like PRINSEQ (http://prinseq.sourceforge.net/). It comes in web and standalone versions and does all the QC and data pre-processing that you need.
The application note also contains a short comparison with similar tools (http://bioinformatics.oxfordjournals.../27/6/863.long).
-
Originally posted by Jose Blanca:
We have done our own read cleaning pipeline. It works for us, so we have made it available just in case it could be of any use to other people. It is called clean_reads.
-
Originally posted by Jose Blanca:
I would need more information. A segmentation fault is quite a strange error in a Python program. Could you send me the output?
I am using Mac OS X Snow Leopard. My command line is: clean_reads -i Pair01.fastq -o ./clean_reads/output_q20_len50_only3end -p 454 -f fastq -g fastq -qual_threshold 20 -only_3_end True -min_len 50. It only gave me a one-line error, 'segmentation fault', and a separate window reported that Python quit unexpectedly, with a long error report. The last part of that report is below:
0x7fff8507b000 - 0x7fff85131fff libobjc.A.dylib 227.0.0 (compatibility 1.0.0) <1960E662-D35C-5D98-EB16-D43166AE6A22> /usr/lib/libobjc.A.dylib
0x7fff85288000 - 0x7fff85446fff libicucore.A.dylib 40.0.0 (compatibility 1.0.0) <3D9313BF-97A4-6B65-E583-F6173E64C3C2> /usr/lib/libicucore.A.dylib
0x7fff8643f000 - 0x7fff86461ff7 libexpat.1.dylib 7.2.0 (compatibility 7.0.0) <7D173736-CBDF-F02F-2D07-B38F565D5ED4> /usr/lib/libexpat.1.dylib
0x7fff86462000 - 0x7fff864aaff7 libvDSP.dylib 268.0.1 (compatibility 1.0.0) <98FC4457-F405-0262-00F7-56119CA107B6> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libvDSP.dylib
0x7fff87df1000 - 0x7fff87df1ff7 com.apple.Accelerate 1.6 (Accelerate 1.6) <15DF8B4A-96B2-CB4E-368D-DEC7DF6B62BB> /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate
0x7fff8846a000 - 0x7fff88544fff com.apple.vImage 4.0 (4.0) <B5A8B93B-D302-BC30-5A18-922645DB2F56> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vImage.framework/Versions/A/vImage
0x7fff88545000 - 0x7fff88d4ffe7 libBLAS.dylib 219.0.0 (compatibility 1.0.0) <2F26CDC7-DAE9-9ABE-6806-93BBBDA20DA0> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
0x7fffffe00000 - 0x7fffffe01fff libSystem.B.dylib ??? (???) <40DA878D-6D69-FEA3-398B-BBD80C9BFF46> /usr/lib/libSystem.B.dylib
Then I tried to run clean_reads on Ubuntu with the same command, and it gave me this error:
IOError: [Errno 2] No such file or directory: 'ual_threshold'
I did not specify any file called 'ual_threshold'. What I specified was the option '-qual_threshold'.
Any advice, please?
-
Originally posted by Jose Blanca:
On Mac it won't work, because the binaries shipped inside clean_reads are only for Linux.
Regarding the linux problem, it's a malformed command line. It should be:
--qual_threshold
instead of:
-qual_threshold
Regards.
Any advice on this? Thanks for helping me get the program running.
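To see why a single dash produces the odd 'ual_threshold' file name, here is a small sketch using Python's standard argparse module. It assumes, purely for illustration, that the tool has a short option -q that takes a file name (the option names -q/--qual_file are hypothetical, and I do not know which option parser clean_reads actually uses). With such a parser, -qual_threshold is read as -q followed by the value ual_threshold.

```python
import argparse


def build_parser():
    """Build a toy parser; '-q/--qual_file' is a hypothetical option for this demo."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("-q", "--qual_file", help="hypothetical quality file option")
    parser.add_argument("--qual_threshold", type=int, help="quality threshold")
    return parser


def parse(argv):
    """Return (qual_file, qual_threshold, leftover arguments) for a given argv list."""
    namespace, extras = build_parser().parse_known_args(argv)
    return namespace.qual_file, namespace.qual_threshold, extras


if __name__ == "__main__":
    # Single dash: '-q' is matched and the rest of the token becomes its value.
    print(parse(["-qual_threshold", "20"]))
    # Double dash: the long option is matched as intended.
    print(parse(["--qual_threshold", "20"]))
```

In the single-dash case the threshold is never set, and the program then tries to open a file literally named ual_threshold, which matches the IOError reported above.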
-
You cannot use the --qual_threshold parameter for long reads (Sanger or 454). I need to explain that better in the documentation. For long reads we trim the bad-quality regions using lucy, so the parameter to change would be lucy_error. qual_threshold is used by the short-read quality trimmer.
-
Originally posted by Jose Blanca:
You cannot use the --qual_threshold parameter for long reads (Sanger or 454). I need to explain that better in the documentation. For long reads we trim the bad-quality regions using lucy, so the parameter to change would be lucy_error. qual_threshold is used by the short-read quality trimmer.
I am trying to do quality trimming and filtering of 454 reads. The adaptor, primer and barcode sequences have already been removed. I am not allowed to specify a minimum quality threshold to clean out bad-quality reads, and I don't understand why. How does it do the quality trimming? Sorry, I could not find the documentation for clean_reads. Also, when I specify the option -only_3_end True, it gives me an error saying it is not compatible with the platform. Does that mean it trims from both the 5' and 3' ends?
Thanks.
-
Sorry, I have not explained myself well enough.
clean_reads uses two different algorithms for quality trimming: one for long reads (lucy) and a different one for short reads. If you're cleaning long reads, the applicable parameters are the lucy parameters: lucy_error, lucy_window and lucy_bracket. These are the parameters that you should tweak to modify the cleaning behaviour when dealing with 454 and Sanger reads.
For Illumina and SOLiD we didn't manage to use lucy, so we implemented a sliding-window trimming function. Its parameters are qual_window, qual_threshold, and only_3_end. That's why these parameters can only be used with short reads.
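For readers unfamiliar with the idea, here is a minimal Python sketch of what a sliding-window quality trimmer can look like. It is only an illustration and not the clean_reads implementation: the parameter names are borrowed from the post above, but the exact rule used here (keep everything between the first and last window whose mean quality reaches the threshold) and the default values are my assumptions.

```python
def sliding_window_trim(quals, qual_window=5, qual_threshold=20, only_3_end=False):
    """Return (start, end) so that read[start:end] is the region to keep.

    quals is a list of integer Phred scores, one per base.
    A window is 'good' when its mean quality is >= qual_threshold.
    Returns (0, 0) when no window is good or the read is shorter than the window.
    """
    n = len(quals)
    if qual_window <= 0 or n < qual_window:
        return (0, 0)

    # Mark each window position as good or bad by its mean quality.
    good = [
        sum(quals[i:i + qual_window]) / qual_window >= qual_threshold
        for i in range(n - qual_window + 1)
    ]
    if not any(good):
        return (0, 0)

    # 5' side: leave untouched when only the 3' end should be trimmed.
    start = 0 if only_3_end else good.index(True)
    # 3' side: keep up to the end of the last good window.
    last_good = len(good) - 1 - good[::-1].index(True)
    return (start, last_good + qual_window)


if __name__ == "__main__":
    quals = [5, 5, 30, 30, 30, 30, 30, 5, 5]
    print(sliding_window_trim(quals, qual_window=3))
    print(sliding_window_trim(quals, qual_window=3, only_3_end=True))
```

Averaging over a window makes the trimmer tolerant of a single poor base inside an otherwise good region, at the cost of sometimes keeping one low-quality base at each boundary.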