  • getting reads not filtered?

    Hi all,
    do you know if there is a way to get the sequences before all the filters are applied?

    I know that I can modify the stringency of quality filtering for TrimBack Valley and quality score trimming (new analysis using a custom template.xml file), but I'm interested in getting sequences even before the dots filtering.

    The reason is that for our last run (4 regions - cDNA) we have 1 region with good results and the 3 others with a high amount of rejected sequences, especially due to dots filter.

    The documentation says that dots can be due to poor chemistry, but in our case region 3 has ~48800 dots whereas regions 1 and 2 have more than 130000 (cf. attached spreadsheet, extracted from the Filters menu of gsRunBrowser). If the problem came from the chemistry, I think we would expect similar dot counts across the entire PTP?

    Do you know of other possible causes for these dots?

    So we would like to have the unfiltered sequences in order to try to understand what happened.

    thank you very much for your help!!

  • #2
    Strange behavior for this run indeed. Can you rule out a copy-per-bead ratio that was too high? Did you do a titration?

    To get untrimmed reads, you could use the sfffile command with the -tr option (check the Data processing Manual, page 214 in the October 2008 version). It resets the trim points and creates new sff files with reads that are not trimmed at all.

    Good luck!


    • #3
      Thank you for your reply!!!
      I discussed this with my colleagues (I'm a computer scientist): they did do the titration.
      For region 1, it seems that the result was not so good, so they had to incorporate much more DNA.
      For the 3 other regions (2-3-4), the results were the same, though not as high as expected, but the quality of the reads was not the same (region 3 was higher in quality).

      The DNA concentration of the 4 samples was under 1 µg/µL...

      If it's not clear, I will ask my colleagues to answer your questions.


      I managed to get the untrimmed sequences, thank you very much for your answer!!

      See you

      Gérald


      • #4
        You gave me the right way to get untrimmed sequences for the reads which passed all quality filters, and it helped us, but what I really wanted to do was generate the reads with less stringent filters.

        I managed this by using the runAnalysisFilter program with a custom filterTemplate.xml file:
        runAnalysisFilter --pipe=filterTemplate.xml --reg=2 R_2009_03_13_12_07_02_XXXX/D_2009_03_16_14_42_39_XXXX_signalProcessing/

        I hope this helps.
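
The command above covers region 2 only. A minimal sketch of how the same call could be prepared for each of the 4 regions of the run, using only the `--pipe` and `--reg` options shown in the post. Whether runAnalysisFilter has to be run once per region is an assumption, and the function name `build_filter_commands` and the placeholder path are hypothetical.

```python
import shlex


def build_filter_commands(data_dir, template="filterTemplate.xml", regions=(1, 2, 3, 4)):
    """Return one runAnalysisFilter argument list per PTP region.

    Only --pipe and --reg are used, exactly as in the command quoted in the post.
    """
    return [
        ["runAnalysisFilter", f"--pipe={template}", f"--reg={region}", data_dir]
        for region in regions
    ]


if __name__ == "__main__":
    # Placeholder path: substitute the real D_..._signalProcessing directory.
    for cmd in build_filter_commands("D_signalProcessing/"):
        print(shlex.join(cmd))
```

Each argument list can be handed to `subprocess.run(cmd, check=True)` on a machine where the 454 software is installed. Printing the commands first makes it easy to check them before anything is executed.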


        • #5
          Hi Gérald,

          I tried runAnalysisFilter with an edited filterTemplate.xml file. The parameters were changed to either "false" or the least stringent value, but without success. Something went wrong and the resulting .fna file is empty.

          I would like to see how you managed it.


          • #6
            The current pipeline does not output the reads as they were before filtering (including the dot filter). This is inconvenient in my opinion as well.


            One way around this is that the current runAnalysisFilter does allow filter parameters to be changed. This works for the dot filter too.

            I used the line below to change the dot filter:
            <dotFlowFractionCutoff>0.1</dotFlowFractionCutoff>

            This changed the dot filter cutoff to 0.1%.

            By keeping all other parameters the same and changing only the dot filter, one can tell what is going on in the sequences with respect to dots.


            • #7
              > I tried runAnalysisFilter with edited filterTemplate.xml file. The parameters were changed to either "false" or least stringent, but no success. Something wrong happened and the resulting .fna file is empty.
              >
              > Would like to see how you managed it.

              Do you have errors in the gsRunProcessor_err.log or gsRunProcessor.log files, in the D_ directory?

              Could you please attach your xml file?

              > One way to get around this issue that current runAnalysisFilter does allow filter parameter to be changed. This works too for dot filter.
              >
              > I used below to change the dot filter:
              > <dotFlowFractionCutoff>0.1</dotFlowFractionCutoff>
              >
              > This changed dot filter into 0.1%.

              OK...I didn't find it in the manual. I didn't find the dotFlowFractionCutoff option in the filterTemplate.xml file either (generated as described on page 57 of the Data Analysis Software manual, Oct 2008).

              I added this xml node to my filterTemplate.xml file, before the <doValleyFilterTrimBack> node. I ran the runAnalysisFilter command....and it is now running...waiting for the results....
              Is that the correct thing to do?

              Could you please tell me where you found the dotFlowFractionCutoff option?

              Thank you

              Gérald
              Last edited by gerald2545; 04-28-2009, 08:19 AM. Reason: add a question


              • #8
                Yes, it is undocumented, but most of the quality filter parameters can be modified with runAnalysisFilter.

                The dot parameter can be put anywhere inside the quality filter block.

                Just remember that, as I found out, the dot "fraction" really means dot percentage. The word "fraction" is probably a misnomer in the code.
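
A minimal sketch of adding or updating the dot parameter in a filter template with Python's standard library, assuming the template is well-formed XML. The tag name `qualityFilter` used for the quality filter block in the toy template is hypothetical (the thread does not give the real name), and `set_filter_param` is an illustrative helper, not part of the 454 software.

```python
import xml.etree.ElementTree as ET


def set_filter_param(xml_text, block_tag, name, value):
    """Set <name>value</name> inside the first <block_tag>, adding it if absent."""
    root = ET.fromstring(xml_text)
    block = root if root.tag == block_tag else root.find(f".//{block_tag}")
    if block is None:
        raise ValueError(f"no <{block_tag}> block found")
    node = block.find(name)
    if node is None:
        # Position inside the block should not matter according to the post above.
        node = ET.SubElement(block, name)
    node.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Toy template; "qualityFilter" is a hypothetical block name.
    toy = (
        "<pipeline><qualityFilter>"
        "<doValleyFilterTrimBack>true</doValleyFilterTrimBack>"
        "</qualityFilter></pipeline>"
    )
    # The value is read as a percentage, per the post above.
    print(set_filter_param(toy, "qualityFilter", "dotFlowFractionCutoff", 0.1))
```

Note that ElementTree drops XML comments and the XML declaration when it rewrites a file, so it is worth keeping a copy of the original template.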


                The full parameter list can be found in the history.xml file under the regions directory.
                Just go to D_*/regions/

                Then do this:
                unzip 1.cwf

                Then find the history.xml file, which includes the full set of quality filter parameters, most of which can be modified with runAnalysisFilter.
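
The same lookup can be sketched in Python without unpacking the archive by hand. This assumes only what the post states, namely that a .cwf file can be opened as a zip archive and contains a history.xml file. The helper name `read_history_params` is hypothetical, and the structure of history.xml is not assumed: the sketch just collects every leaf element that has a text value.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_history_params(cwf):
    """Return {tag: text} for all leaf elements of history.xml inside a .cwf archive.

    `cwf` may be a path or a file-like object. If a tag occurs several times,
    the last value wins.
    """
    with zipfile.ZipFile(cwf) as archive:
        names = [n for n in archive.namelist() if n.lower().endswith("history.xml")]
        if not names:
            raise FileNotFoundError("no history.xml in archive")
        root = ET.fromstring(archive.read(names[0]))
    params = {}
    for elem in root.iter():
        if len(elem) == 0 and elem.text and elem.text.strip():
            params[elem.tag] = elem.text.strip()
    return params


if __name__ == "__main__":
    # Toy archive standing in for D_*/regions/1.cwf; its contents are made up.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("history.xml", "<history><dotFlowFractionCutoff>5</dotFlowFractionCutoff></history>")
    buf.seek(0)
    for tag, value in sorted(read_history_params(buf).items()):
        print(tag, value)
```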


                • #9
                  OK....thank you for your precious help.


                  • #10
                    The error message I got from runAnalysisFilter:

                    BaseCaller[Error] External program "ScaleMultiMers_mProc2_0c" returned an error code. (1)

                    Thank you for your help, hlu and gerald.
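
To collect every such message in one pass, here is a small sketch that scans the two log files named earlier in the thread (gsRunProcessor_err.log and gsRunProcessor.log) for lines containing `[Error]`, the marker seen in the message above. That the logs sit directly in the D_ directory and that all errors carry this marker are assumptions. `find_error_lines` is a hypothetical helper.

```python
from pathlib import Path

# Log file names as mentioned earlier in the thread.
LOG_NAMES = ("gsRunProcessor_err.log", "gsRunProcessor.log")


def find_error_lines(d_dir, marker="[Error]"):
    """Return (file name, line number, line) for each log line containing `marker`."""
    hits = []
    for name in LOG_NAMES:
        path = Path(d_dir) / name
        if not path.is_file():
            continue  # a missing log is skipped silently
        with path.open(errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if marker in line:
                    hits.append((name, lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Placeholder path: substitute the real D_ directory of the run.
    for name, lineno, line in find_error_lines("D_signalProcessing/"):
        print(f"{name}:{lineno}: {line}")
```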


                    • #11
                      Could you please attach your template xml file?
                      If you generate the filterOnly xml file and execute runAnalysisFilter with this file, do you get the same error?

                      Could you also post the command line you executed?


                      • #12
                        On page 69 of the Data Analysis Software Manual (Oct. 2008) there is a table listing all possible output files and how they are generated. At the bottom it shows that SFF files can be generated for the control reads and all failed reads. It would then be a simple matter of using sffinfo to dump the sequences from these files. It seems like this feature may address Gerald's original query.

                        The note for this feature states that these additional SFF files may be generated by the gsRunProcessor during the Signal Processing step by adjusting the pipeline configuration file. However, I cannot find any documentation on how to properly adjust this file (presumably signalProcessing.xml or filterTemplate.xml). By dumping a filtering template (gsRunProcessor --template=filterOnly), I see this in the <basecaller> block:

                        <generateControlKeySffFiles>false</generateControlKeySffFiles>

                        which is obviously how you turn on or off the control read SFF. I don't see anything to turn on the failed read SFF. I presume it is just another key with a True/False value but would need to know the exact key name.

                        Does anyone know how to access this feature?
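
One way to hunt for a candidate key is to list every true/false switch in a dumped template (or in a history.xml file) and narrow the list by name. The sketch below does only that. It does not know the real key for failed-read SFF files, and `boolean_switches` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def boolean_switches(xml_text, name_contains=""):
    """Return {tag: bool} for leaf elements whose text is true/false.

    `name_contains` is an optional case-insensitive filter on the tag name.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        text = (elem.text or "").strip().lower()
        if len(elem) == 0 and text in ("true", "false"):
            if name_contains.lower() in elem.tag.lower():
                found[elem.tag] = text == "true"
    return found


if __name__ == "__main__":
    # Toy template built around the single line quoted in the post.
    toy = (
        "<pipeline><basecaller>"
        "<generateControlKeySffFiles>false</generateControlKeySffFiles>"
        "</basecaller></pipeline>"
    )
    print(boolean_switches(toy, "sff"))
```

Run against the real template and against history.xml, this would show at a glance whether any other switch with "Sff" in its name exists.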
                        Last edited by kmcarr; 05-07-2009, 12:50 PM.


                        • #13
                          Thank you for your message. I found the information on page 69 of the documentation about generating the sff files for failed reads. I also couldn't find a way to extract them, even in the history.xml file contained in the cwf file....

                          Won't be here for the next 3 weeks...
