  • FastX-toolkit

    I am going to use FastX-toolkit to clean some paired-end exome sequence data generated by Illumina. Could you please suggest a good procedure? For instance:
    1. fastx_clipper
    2. fastq_quality_filter
    3. fastq_quality_trimmer
    ...?
    I am confused about both the steps and their order.
    Thanks very much!

  • #2
    I'm actually confused about this too...
    Is it better to do clipper first, and then quality filter?
    When determining where to clip, do you look at the FastQC results? For example, if I look at my graph of quality scores across all bases and see that the quality falls below 20 after position 85, should I just clip all reads at 85?
    Thanks!


    • #3
      Originally posted by odoyle81:
      I'm actually confused about this too...
      Is it better to do clipper first, and then quality filter?
      When determining where to clip, do you look at the FastQC results? For example, if I look at my graph of quality scores across all bases and see that the quality falls below 20 after position 85, should I just clip all reads at 85?
      Thanks!
      I would not hard clip based on FastQC alone. Remember that it only shows you the distribution of quality scores, and you will have plenty of reads that have good quality all the way through. As for whether to trim the reads at all, that depends. Could you provide more details about your library and what you plan to do with the reads?


      • #4
        We have a couple different projects:

        1. A population of mutants segregating for a phenotype. We want to locate the deletion, so I want to use one of these programs to do that: Pindel, SVseq, or Cortex (I just learned about that one today).
        2. We also want to do a reference alignment with another sample. I was going to use BWA.

        We have Illumina 100bp PE reads.

        If I trim and then quality filter, I keep 74% of reads.
        If I only quality filter, I keep 70% of reads.
        I was trimming to 89 bp and quality filtering with q=20, p=80.

        I thought it was really important to QC the reads before further processing?
        Last edited by odoyle81; 03-09-2012, 09:42 AM.
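
As a rough illustration of what those settings do, here is a minimal Python sketch (not FASTX code) of a fixed-length trim to 89 bases followed by a filter that keeps a read only if at least 80% of its bases have quality 20 or higher, which is how the q and p options of fastq_quality_filter are documented. The function names are made up for this example, and the Phred+33 offset is an assumption; older Illumina pipelines used Phred+64.

```python
def trim_read(seq, qual, last=89):
    """Keep bases 1..last (1-based, inclusive), i.e. a fixed-length trim."""
    return seq[:last], qual[:last]


def passes_filter(qual, min_q=20, min_percent=80, offset=33):
    """Return True if at least min_percent of the bases have Phred quality >= min_q.

    offset=33 assumes Sanger / Illumina 1.8+ encoding (an assumption here).
    """
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_q)
    # Integer comparison avoids floating-point rounding at the threshold.
    return good * 100 >= min_percent * len(qual)
```

Because trimming removes the low-quality tail before the percentage is computed, more reads can pass the filter after trimming, which is consistent with the 74% versus 70% figures above.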


        • #5
          Originally posted by odoyle81:
          I thought it was really important to QC the reads before further processing?
          It is important to QC the reads, but it might not be necessary to trim the reads based on quality. Most aligners are aware of the quality of the bases and will take that into account when mapping. BWA is a good example since it can soft clip reads. If you do trim off low quality tails with PE data and map with BWA, you might even get worse results than if you just map the reads without trimming them. BWA can have a hard time determining the size distribution of the insert if you do quality trimming.


          • #6
            Thanks for that perspective!
            So after quality filtering, I will probably lose some of the reads from pairs. I've been reading about how to remove the orphaned reads. Does everyone do this with custom scripts, or is there a tool for this?
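
For reference, a custom script for this can be quite short. The sketch below is only an illustration in Python, with made-up function names: it assumes four-line FASTQ records, assumes that mates share a read name once a trailing /1 or /2 (or anything after the first space) is stripped, and holds the second file in memory, so it is not suited to very large files.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def read_id(header):
    """Drop the leading '@', any text after the first space, and a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def split_pairs(reads1, reads2):
    """Return (paired1, paired2, orphans), keeping pairs in the order of reads1."""
    by_id2 = {read_id(r[0]): r for r in reads2}
    paired1, paired2, orphans = [], [], []
    for r in reads1:
        mate = by_id2.pop(read_id(r[0]), None)
        if mate is None:
            orphans.append(r)      # mate was lost during filtering
        else:
            paired1.append(r)
            paired2.append(mate)
    orphans.extend(by_id2.values())  # reads left over from the second file
    return paired1, paired2, orphans
```

The two paired lists come out in the same order, which is what paired-end aligners generally expect from the two input files.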


            • #7
              Two things:
              1) The best way to find a good trimming strategy is to trim a couple of different ways and see which aligns best. While it is probably too time-consuming to do this for every data set, it is informative to see what each step is actually doing.
              2) You might want to take a look at Trimmomatic. It is way faster than the FASTX Toolkit.
              --------------
              Ethan


              • #8
                Just a comment for odoyle81 about using Cortex - you should not need to pre-quality filter the reads for Cortex (unless you have massive massive coverage, in which case it will do no harm I guess). Just use the inbuilt error-cleaning mechanisms, and it should work just fine.


                • #9
                  fastx_barcode_splitter issue with the run

                  Hi,

                  I saw the post and I hope some of you can help me.

                  When I run fastx_barcode_splitter.pl with this command:

                  /usr/local/bin/fastx_barcode_splitter.pl --bcfile ./Barcodes9nt.txt --prefix ./Rescued9nt --suffix .fq --bol

                  On the command line it looks like it is running (no error message, no > sign); see the attachment for a screenshot.
                  However, it is not running at all: I can see with top that it is not using any memory or CPU, and it has been 'running' for days on a very small file without producing any results.
                  The input file is in the STDIN folder, as it is supposed to be.

                  I would be very grateful if you could suggest what might be wrong.
                  Thanks in advance
                  Vivi


                  • #10
                    Unfortunately I can't advise on why that isn't working for you, but I would recommend you just write your own script or try to find one on the internet. Most of the FASTX tools are no longer updated and don't work that well. For example, this looks like one that might work:
                    The Python Package Index (PyPI) is a repository of software for the Python programming language.

                    If you google, you should be able to find a bunch, as it is a pretty simple operation that needs to be done.
                    I can't offer much support, and maybe this isn't the most efficient way to do it (it is kinda slow), but the one I wrote is here:

                    In any case, learning to write your own will allow you to adapt to your specific needs.

                    Hope that helps.
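
To give an idea of how simple the operation is, here is a minimal Python sketch of splitting reads by a barcode at the start of the sequence. It is an illustration only, not the FASTX implementation: the function names are made up, it matches barcodes exactly (no mismatches allowed), and it assumes a barcode file with a name and a barcode on each line, plus four-line FASTQ records read from a file handle such as standard input.

```python
def load_barcodes(lines):
    """Parse 'name<whitespace>barcode' lines into a {barcode: name} dict."""
    barcodes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, barcode = line.split()[:2]
        barcodes[barcode.upper()] = name
    return barcodes


def assign(seq, barcodes):
    """Return the name whose barcode is an exact prefix of seq, else 'unmatched'."""
    seq = seq.upper()
    for barcode, name in barcodes.items():
        if seq.startswith(barcode):
            return name
    return "unmatched"


def split_fastq(handle, barcodes):
    """Group 4-line FASTQ records from handle by the barcode at the read start."""
    bins = {}
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[3]:
            break  # end of input (or a truncated final record)
        sample = assign(record[1].strip(), barcodes)
        bins.setdefault(sample, []).append("".join(record))
    return bins
```

A script built on this could pass `sys.stdin` as the handle and write each bin to its own file. Reading from standard input means the data has to be piped or redirected into the program; a program that reads standard input with nothing supplied will simply sit and wait, using no CPU.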


                    • #11
                      Thank you very much!!!


                      • #12
                        fastx_trimmer: input file (/BJPROJ/Data_production/HiseqX/140807_ST-E00142_0036_BH04CYALXX/DHE00358/DHE00358_L5_2.fq.gz) has unknown file format (not FASTA or FASTQ), first character = ^_ (31) ????


                        • #13
                          Can fastx-toolkit read gzip files?

                          fastx_trimmer: input file (/BJPROJ/Data_production/HiseqX/140807_ST-E00142_0036_BH04CYALXX/DHE00358/DHE00358_L5_2.fq.gz) has unknown file format (not FASTA or FASTQ), first character = ^_ (31)
                          What is the reason for this error?


                          • #14
                            See: https://www.biostars.org/p/83237/
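
A note on the error message itself: the reported first character, ^_ (31), is byte 0x1f, the first of the two gzip magic bytes (0x1f 0x8b), which suggests the tool was handed compressed data it could not parse. Decompressing the file before passing it on is the usual way around this. The Python sketch below shows the same check for anyone writing their own script; the helper name is made up for illustration.

```python
import gzip

# gzip files start with these two bytes; 0x1f is decimal 31, displayed as ^_.
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(path):
    """Open path as text, decompressing if it starts with the gzip magic bytes."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "r")
```

Checking the leading bytes rather than the .gz extension also catches compressed files that have been renamed.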
