  • Himalaya
    Member
    • Jun 2010
    • 38

    454 Data cleaning

    Has anyone tried any software for cleaning 454 data, i.e. removing poor-quality reads? And has anyone installed HyPhy and 454HIV without problems? I need help.
  • proteasome
    Member
    • Jul 2009
    • 22

    #2
    If you're looking to clean up 454 data in general, I would suggest using Galaxy (http://main.g2.bx.psu.edu/) to convert your SFF files to FASTQ, and then using the FASTQ filters and tools to remove short or low-quality reads. You can also mask low-quality bases (such as those in homopolymers) to Ns without losing reads.
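For anyone who prefers to do the filtering step locally rather than uploading to Galaxy, here is a minimal pure-Python sketch of the idea: drop reads that are too short or have a low mean quality, and mask individual low-quality bases to N. It is not what Galaxy runs internally. The function names, the default thresholds and the Phred+33 quality encoding are all assumptions made for illustration.

```python
from statistics import mean

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger-style) FASTQ quality encoding


def parse_fastq(lines):
    """Yield (name, sequence, quality_string) from simple 4-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def phred_scores(quality_string):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_string]


def mask_low_quality(seq, quals, min_base_qual):
    """Replace bases scoring below min_base_qual with N, keeping the read length."""
    return "".join(b if q >= min_base_qual else "N" for b, q in zip(seq, quals))


def filter_and_mask(records, min_len=50, min_mean_qual=20, min_base_qual=10):
    """Drop short or low-mean-quality reads, then mask the remaining weak bases."""
    for name, seq, qual in records:
        quals = phred_scores(qual)
        if not quals or len(seq) < min_len or mean(quals) < min_mean_qual:
            continue  # read is removed entirely
        yield name, mask_low_quality(seq, quals, min_base_qual), qual
```

Usage would look like `list(filter_and_mask(parse_fastq(open("reads.fastq"))))`, where `reads.fastq` is a placeholder file name. Masking keeps the read and only hides the doubtful bases, which is the "without losing reads" point above.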

    • Himalaya
      Member
      • Jun 2010
      • 38

      #3
      Thanks, proteasome, for the reply. Galaxy is a great online tool; the problem is uploading very large files.

      • essvee
        Member
        • Apr 2011
        • 11

        #4
        I suggest trying SeqTrim.
        You can set minimum quality based on a defined window size, minimum length, etc.
        You can also run it from the command line, or online.

        • DZhang
          Senior Member
          • Jun 2010
          • 177

          #5
          Hi,

          Check out the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) and SolexaQA (http://solexaqa.sourceforge.net/). Both have simple but neat scripts for read trimming.

          Douglas
          Last edited by DZhang; 05-28-2011, 06:41 PM. Reason: correction

          • Jose Blanca
            Member
            • Aug 2009
            • 70

            #6
            We have built our own read-cleaning pipeline. It works for us, so we have made it available in case it is of use to other people. It is called clean_reads.

            • robs
              Senior Member
              • May 2010
              • 116

              #7
              I like PRINSEQ (http://prinseq.sourceforge.net/). It comes in web and standalone versions and does all the QC and data pre-processing that you need.

              The application note also contains a short comparison with similar tools (http://bioinformatics.oxfordjournals.../27/6/863.long).

              • Himalaya
                Member
                • Jun 2010
                • 38

                #8
                Originally posted by Jose Blanca
                We have built our own read-cleaning pipeline. It works for us, so we have made it available in case it is of use to other people. It is called clean_reads.
                Hi Jose Blanca. I installed clean_reads, with Biopython and psubprocess preinstalled as required, but running it resulted in a segmentation fault. Have you run the program yourself? Please advise me about the fault if it runs cleanly for you. Thank you.

                • Jose Blanca
                  Member
                  • Aug 2009
                  • 70

                  #9
                  I would need more information. A segmentation fault is quite a strange error in a Python program. Could you send me the output?

                  • Himalaya
                    Member
                    • Jun 2010
                    • 38

                    #10
                    Originally posted by Jose Blanca
                    I would need more information. A segmentation fault is quite a strange error in a Python program. Could you send me the output?
                    Hi Jose,
                    I am using Mac OS X Snow Leopard. My command line is:
                    clean_reads -i Pair01.fastq -o ./clean_reads/output_q20_len50_only3end -p 454 -f fastq -g fastq -qual_threshold 20 -only_3_end True -min_len 50
                    It gave me only a one-line error, 'segmentation fault', and a separate window with a long error report saying that Python quit unexpectedly. The last part of the error report is below:
                    0x7fff8507b000 - 0x7fff85131fff libobjc.A.dylib 227.0.0 (compatibility 1.0.0) <1960E662-D35C-5D98-EB16-D43166AE6A22> /usr/lib/libobjc.A.dylib
                    0x7fff85288000 - 0x7fff85446fff libicucore.A.dylib 40.0.0 (compatibility 1.0.0) <3D9313BF-97A4-6B65-E583-F6173E64C3C2> /usr/lib/libicucore.A.dylib
                    0x7fff8643f000 - 0x7fff86461ff7 libexpat.1.dylib 7.2.0 (compatibility 7.0.0) <7D173736-CBDF-F02F-2D07-B38F565D5ED4> /usr/lib/libexpat.1.dylib
                    0x7fff86462000 - 0x7fff864aaff7 libvDSP.dylib 268.0.1 (compatibility 1.0.0) <98FC4457-F405-0262-00F7-56119CA107B6> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libvDSP.dylib
                    0x7fff87df1000 - 0x7fff87df1ff7 com.apple.Accelerate 1.6 (Accelerate 1.6) <15DF8B4A-96B2-CB4E-368D-DEC7DF6B62BB> /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate
                    0x7fff8846a000 - 0x7fff88544fff com.apple.vImage 4.0 (4.0) <B5A8B93B-D302-BC30-5A18-922645DB2F56> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vImage.framework/Versions/A/vImage
                    0x7fff88545000 - 0x7fff88d4ffe7 libBLAS.dylib 219.0.0 (compatibility 1.0.0) <2F26CDC7-DAE9-9ABE-6806-93BBBDA20DA0> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
                    0x7fffffe00000 - 0x7fffffe01fff libSystem.B.dylib ??? (???) <40DA878D-6D69-FEA3-398B-BBD80C9BFF46> /usr/lib/libSystem.B.dylib

                    Then I tried to run clean_reads on Ubuntu with the same command, and it gave me this error:
                    IOError: [Errno 2] No such file or directory: 'ual_threshold'
                    I did not specify any file named 'ual_threshold'. That was the option '-qual_threshold' that I specified.
                    Any advice, please?

                    • Jose Blanca
                      Member
                      • Aug 2009
                      • 70

                      #11
                      On Mac it won't work, because the binaries shipped inside clean_reads are only for Linux.
                      Regarding the Linux problem, it's a malformed command line. It should be:
                      --qual_threshold
                      instead of:
                      -qual_threshold
                      Regards.
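A likely reason the error mentions a file called 'ual_threshold': many option parsers read a single-dash argument as a short option followed by its value, so `-qual_threshold` becomes `-q` with the value `ual_threshold`. The sketch below reproduces that behaviour with Python's standard `optparse` module. The option definitions are hypothetical stand-ins, not the actual clean_reads parser; in particular, the `-q`/`--qual_file` option is an assumption inferred from the error message.

```python
from optparse import OptionParser


def build_parser():
    """Build a toy parser; both option definitions are illustrative assumptions."""
    parser = OptionParser()
    # Hypothetical short option that expects a file name.
    parser.add_option("-q", "--qual_file", dest="qual_file")
    # Hypothetical long-only option that expects an integer.
    parser.add_option("--qual_threshold", dest="qual_threshold", type="int")
    return parser


def parse(argv):
    """Return (qual_file, qual_threshold, leftover_positional_arguments)."""
    options, leftover = build_parser().parse_args(argv)
    return options.qual_file, options.qual_threshold, leftover


# Single dash: parsed as -q with the attached value "ual_threshold".
print(parse(["-qual_threshold", "20"]))   # ('ual_threshold', None, ['20'])
# Double dash: parsed as the intended long option.
print(parse(["--qual_threshold", "20"]))  # (None, 20, [])
```

With the single dash, a program that then tries to open the `-q` value as a file would fail with exactly the kind of "No such file or directory: 'ual_threshold'" error reported above.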

                      • Himalaya
                        Member
                        • Jun 2010
                        • 38

                        #12
                        Originally posted by Jose Blanca
                        On Mac it won't work, because the binaries shipped inside clean_reads are only for Linux.
                        Regarding the Linux problem, it's a malformed command line. It should be:
                        --qual_threshold
                        instead of:
                        -qual_threshold
                        Regards.
                        Hi Jose, thanks a lot. On Linux it seems to work now. With the same command, though, it gives me the output "parameter qual_threshold is incompatible with platform long_with_quality". I tried --qual_threshold values from 10 to 100 and it gave the same output every time.
                        Any advice on this? Thanks for helping me get the program running.

                        • Jose Blanca
                          Member
                          • Aug 2009
                          • 70

                          #13
                          You cannot use the --qual_threshold parameter for long reads (Sanger or 454). I need to explain that better in the documentation. For long reads we trim the bad-quality regions using lucy, so the parameter to change would be lucy_error. qual_threshold is used by the short-read quality trimmer.

                          • Himalaya
                            Member
                            • Jun 2010
                            • 38

                            #14
                            Originally posted by Jose Blanca
                            You cannot use the --qual_threshold parameter for long reads (Sanger or 454). I need to explain that better in the documentation. For long reads we trim the bad-quality regions using lucy, so the parameter to change would be lucy_error. qual_threshold is used by the short-read quality trimmer.
                            Hi Jose,
                            I am trying to quality-trim and filter 454 reads. The adaptor, primer and barcode sequences have already been removed. I am not allowed to specify a minimum quality threshold to clean bad-quality reads, and I don't understand why. How does it do the quality trimming? Sorry, I could not find the documentation for clean_reads. Also, when I specify the option -only_3_end True, it gives me an error saying it is not compatible with the platform. Does that mean it trims from both the 5' and 3' ends?

                            Thanks

                            • Jose Blanca
                              Member
                              • Aug 2009
                              • 70

                              #15
                              Sorry, I did not explain myself well enough.
                              clean_reads uses two different algorithms for quality trimming: one for long reads (lucy) and a different one for short reads. If you're cleaning long reads, the applicable parameters are the lucy parameters: lucy_error, lucy_window and lucy_bracket. These are the parameters you should tweak to modify the cleaning behaviour when dealing with 454 and Sanger reads.
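To make the "incompatible with platform" message easier to picture, here is a small sketch of a platform/parameter check. It is only an illustration built from the parameter names in this thread (the short-read names come from the next paragraph); the function, the set names and the lowercase platform labels are assumptions, not clean_reads code.

```python
# Platform groups as described in this thread (labels are illustrative).
LONG_READ_PLATFORMS = {"454", "sanger"}
SHORT_READ_PLATFORMS = {"illumina", "solid"}

# Quality-trimming parameters that apply to each group.
LONG_READ_PARAMS = {"lucy_error", "lucy_window", "lucy_bracket"}
SHORT_READ_PARAMS = {"qual_window", "qual_threshold", "only_3_end"}


def incompatible_params(platform, params):
    """Return the sorted parameters that do not apply to the given platform."""
    if platform in LONG_READ_PLATFORMS:
        rejected = SHORT_READ_PARAMS
    elif platform in SHORT_READ_PLATFORMS:
        rejected = LONG_READ_PARAMS
    else:
        raise ValueError(f"unknown platform: {platform}")
    # Parameters outside both groups (e.g. min_len) are never rejected here.
    return sorted(set(params) & rejected)
```

For example, `incompatible_params("454", ["qual_threshold", "only_3_end", "min_len"])` flags the two short-read options, matching the errors reported earlier in the thread for a 454 run.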

                              For Illumina and SOLiD we didn't manage to use lucy, so we implemented a sliding-window trimming function. Its parameters are qual_window, qual_threshold and only_3_end. That's why these parameters can only be used with short reads.
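For readers unfamiliar with the idea, here is a generic sliding-window quality trimmer written as a minimal sketch. It borrows the three parameter names mentioned above, but the exact rule (keep from the first to the last window whose mean quality reaches the threshold) and the default values are assumptions; the real clean_reads function may behave differently.

```python
def sliding_window_trim(quals, qual_window=5, qual_threshold=20, only_3_end=False):
    """Return (start, end) slice bounds of the part of the read to keep.

    quals is a list of per-base Phred scores. (0, 0) means nothing is kept.
    """
    n = len(quals)
    if n < qual_window:
        return (0, 0)  # read is shorter than one window

    def window_ok(i):
        # Mean quality of the window starting at position i.
        return sum(quals[i:i + qual_window]) / qual_window >= qual_threshold

    last = n - qual_window

    # 3' end: scan windows from right to left until one passes.
    end = 0
    for i in range(last, -1, -1):
        if window_ok(i):
            end = i + qual_window
            break
    if end == 0:
        return (0, 0)  # no window passed anywhere

    # 5' end: scan from left to right, unless only the 3' end should be trimmed.
    start = 0
    if not only_3_end:
        for i in range(0, last + 1):
            if window_ok(i):
                start = i
                break
    return (start, end)
```

The returned bounds can be applied to both the sequence and its quality string, e.g. `seq[start:end]`. With `only_3_end=True` the 5' end is left untouched, which is what the flag name suggests.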
