
  • pre-filtering before mapping?

    Hi,
    We have some mated paired end sequences (50bp) to be mapped to the reference genome for SNP (and maybe indel) discovery.
    The data I have are csfasta and qual files.
    Could anyone tell me whether I need to pre-filter the sequences before mapping them with any software?
    If I use bowtie, I should remove the orphan reads (and maybe try to map the orphan reads using a different parameter set).
    If I use BFAST, should I do the same?

    If I do need to filter the sequences by quality score, what cut-off threshold do people normally use? An average of Q10?
    How do I translate the quality score into a % error rate, as with a Phred score?

    Thanks!
    Nan
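
On the last question above: assuming the SOLiD quality values (QVs) are on the usual Phred scale, QV = -10 * log10(p), the conversion to an error probability can be sketched as follows. The function names are illustrative only and do not come from any SOLiD tool.

```python
import math

def qv_to_error_prob(qv: float) -> float:
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-qv / 10)

def error_prob_to_qv(p: float) -> float:
    """Convert an error probability (0 < p <= 1) back to a Phred-scaled value."""
    return -10 * math.log10(p)

# Print the error rate for a few common cut-offs.
for qv in (10, 20, 30):
    print(f"Q{qv}: {qv_to_error_prob(qv) * 100:.1f}% error")
```

Under that assumption, Q10 corresponds to a 10% chance that the call is wrong, Q20 to 1%, and Q30 to 0.1%.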

  • #2
    The csfasta_quality_filter.pl script can be used for quality control of SOLiD data.
    You can find the script by searching Google.



    • #3
      Hi,

      I am at the indexing step, but I get an error:

      In function "RGIndexLayoutCreate": Fatal Error[OutOfRange]. Message: Layout must begin with a one.

      Could someone help me with this problem???



      • #4
        Originally posted by fenciso
        Hi,

        I am at the indexing step, but I get an error:

        In function "RGIndexLayoutCreate": Fatal Error[OutOfRange]. Message: Layout must begin with a one.

        Could someone help me with this problem???

        Do you mean indexing the reference genome with bowtie?

        Below is the command I use (assuming bowtie-build and all_reference.fa are in the same folder)

        ./bowtie-build -C -f all_reference.fa reference_Color



        • #5
          Originally posted by fanyucai1
          The csfasta_quality_filter.pl script can be used for quality control of SOLiD data.
          You can find the script by searching Google.
          Thanks!
          I downloaded the program. It has many parameters for trimming and filtering. Can anyone tell me what values they normally use (if any)? Thanks!

          1. num_colors_to_hard_trim
          2. min_median_qv
          3. max_bad_colors_in_first_ten
          4. max_number_bad_colors
          5. num_consec_colors_to_trim
          6. trim_terminal_bad_colors
          7. min_read_length



          • #6
            SOLiD does not supply recommended values for these parameters, so you have to choose them yourself. I advise you to compute statistics on the raw data before running the script.



            • #7
              Originally posted by fanyucai1
              SOLiD does not supply recommended values for these parameters, so you have to choose them yourself. I advise you to compute statistics on the raw data before running the script.
              Thanks! I am new to this area and would like some advice on choosing values for those parameters, such as the minimum QV for a single color.

              As for the statistics, besides the median/average quality score, what else should I look at?

              Thanks!



              • #8
                There is a paper called "Analysis of quality raw data of second generation sequencers with Quality Assessment Software". You can find it through Google. It is very simple and will help you.
                Some statistics you should consider: min, max, Q20, mean, and median.
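
The statistics listed above can be computed per read with a few lines of Python. This is only a rough sketch: it assumes a .qual file with a `>read_id` header line followed by one line of space-separated integer QVs, with `#` lines as comments, and it reads "Q20" as the fraction of colors with QV >= 20. `parse_qual` and `read_stats` are hypothetical helper names, not part of any SOLiD software.

```python
import statistics

def parse_qual(lines):
    """Yield (read_id, [qv, ...]) from .qual-style lines; '#' lines are skipped."""
    read_id = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            read_id = line[1:]
        elif read_id is not None:
            yield read_id, [int(x) for x in line.split()]
            read_id = None

def read_stats(qvs, threshold=20):
    """Summarise one read: min, max, mean, median, fraction of QVs >= threshold."""
    return {
        "min": min(qvs),
        "max": max(qvs),
        "mean": statistics.mean(qvs),
        "median": statistics.median(qvs),
        "frac_ge_threshold": sum(q >= threshold for q in qvs) / len(qvs),
    }

if __name__ == "__main__":
    example = ["# toy data", ">1_2_3_F3", "25 30 8 22 5"]
    for read_id, qvs in parse_qual(example):
        print(read_id, read_stats(qvs))
```

Looking at the distribution of these values over all reads before filtering gives some basis for choosing thresholds such as min_median_qv.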



                • #9
                  I downloaded the csfasta_quality_filter.pl script and ran it on my SOLiD 5500 csfasta (and qual) data to trim a fixed number of colors (command below).

                  perl csfasta_quality_filter.pl -f F5.csfasta -q F5.QV.qual -o 5_bases_trimmed_F5.csfasta --num_colors_to_hard_trim 5

                  I was wondering about the output file. According to the manual, "Filtered and trimmed reads are output in csfasta format to a user-specified filename". But where is the trimmed version of the associated QV.qual file? Any ideas? I cannot supply TopHat or any other alignment software with a trimmed csfasta file and a full-length quality file.



                  • #10
                    Originally posted by paolo.kunder
                    I downloaded the csfasta_quality_filter.pl script and ran it on my SOLiD 5500 csfasta (and qual) data to trim a fixed number of colors (command below).

                    perl csfasta_quality_filter.pl -f F5.csfasta -q F5.QV.qual -o 5_bases_trimmed_F5.csfasta --num_colors_to_hard_trim 5

                    I was wondering about the output file. According to the manual, "Filtered and trimmed reads are output in csfasta format to a user-specified filename". But where is the trimmed version of the associated QV.qual file? Any ideas? I cannot supply TopHat or any other alignment software with a trimmed csfasta file and a full-length quality file.
                    This script does not output a .qual file after quality control. You can extract the matching quality records from the raw .qual file according to the reads in the filtered csfasta file.
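
One way to do that extraction is sketched below, under several assumptions: read headers are identical in the csfasta and .qual files, each record is a header line followed by one data line, every csfasta read starts with a single primer base followed by its colors, and the whole .qual file fits in memory. Which end of the read `--num_colors_to_hard_trim` removes is not stated in this thread, so the `keep` argument is left to the user. `read_records` and `sync_qual` are hypothetical names, not part of csfasta_quality_filter.pl.

```python
def read_records(lines):
    """Yield (header, data_line) pairs from csfasta/.qual-style lines."""
    header = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            header = line[1:]
        elif header is not None:
            yield header, line
            header = None

def sync_qual(csfasta_lines, qual_lines, keep="head"):
    """Build .qual lines that match the reads kept in a filtered csfasta.

    keep="head" keeps the first n QVs of each read, keep="tail" the last n,
    where n is the number of colors left in the filtered read.
    """
    quals = dict(read_records(qual_lines))
    out = []
    for header, seq in read_records(csfasta_lines):
        n_colors = len(seq) - 1  # assumption: first character is the primer base
        qvs = quals[header].split()
        if keep == "head":
            kept = qvs[:n_colors]
        else:
            kept = qvs[len(qvs) - n_colors:]
        out.append(f">{header}")
        out.append(" ".join(kept))
    return out

if __name__ == "__main__":
    filtered = [">r1", "T0123", ">r3", "T321"]
    raw_qual = [">r1", "10 20 30 40 50 60", ">r2", "1 2 3", ">r3", "5 6 7 8"]
    print("\n".join(sync_qual(filtered, raw_qual)))
```

Reads that the filter dropped (r2 in the toy data) are simply absent from the output, so the two files stay in step.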
