  • View all Raw reads?

    I have experience with re-processing 454 data using the custom xml file, but I'm wondering if there's any relatively easy way to pass ALL of the raw well reads into an sfffile without any filtering or trimming. Does anyone have experience doing this? Is there any other software package that can read the cwf file that's generated in the R-directory?

    Simon

  • #2
    Do you have the software manual? It tells you how you can turn off all of the filters and reprocess the data if that is what you are after.



    • #3
      This thread seems vaguely familiar....

      I've been through and through SeqAnswers and various blogs looking for this; my boss keeps asking me for it. I tried the method in the thread above and it didn't work for me. This is with 16S amplicon data.

      Roche technical assistance hasn't responded, and it's been months. Any Roche people out there? I'll bake you cookies...

      We have had some success changing emulsion PCR conditions: cutting primer concentration and using fewer template molecules per bead. At least we're up to ~40% passing filter. I would be interested to know how others are doing. This is Titanium with the HMP primers.



      • #4
        Originally posted by cliffbeall (post #3 above)
        Cliff,

        Our lab has banged its collective forehead against the wall trying to improve the output of 16S amplicon libraries on the 454. The HMP amplicons you mentioned are especially problematic. Frankly, I would be ecstatic with 40% passing for HMP amplicons, so maybe you should be the one advising the rest of us!

        We have tried all of the recommendations from 454: reduced cpb, reduced primer, and extended emPCR cycles. These work...sometimes...sort of. The most frustrating aspect of the whole thing is that nothing seems to be consistent or reproducible.

        We did get a protocol from Roche to identify "hidden" short fragments within amplicon libraries and rescue the libraries by a few additional rounds of PCR. This procedure so far has shown the most consistent good results. Have you seen this document from Roche? If not I'll see if I can find a copy. (I'm the computer guy, not the lab guy so I don't know the details off the top of my head.)



        • #5
          Indeed my previous thread was similar, but in that case I was eventually able to turn off most of the filters and find my missing A reads (they were chimeric products generated with one of the Roche amplification primers).

          My current issue is that, even when turning off the filters, I don't recover substantially more reads. Maybe it's the wrong combination of filter settings in the xml file. The combinations that I've tried sometimes give me more reads, but then truncate all the reads to an unacceptably short length.

          I've been tempted to write a script to iterate through each possible xml filter combination, let it run for 5 days, and then see which settings give me the most reads and the longest reads. For my application, I don't care if the reads have homopolymers or other errors in them; I just need to see what's being generated by the machine for troubleshooting.
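A minimal sketch of what such an iteration script might start from, in Python. It only enumerates on/off combinations and renders the tag lines; it does not run the pipeline. The four tag names are the ones quoted from the software manual later in this thread. The function names, and the idea of recording a (read count, mean length) pair per run, are assumptions made for illustration.

```python
from itertools import product

# Boolean filter tags as quoted from the software manual later in this thread.
FILTER_TAGS = [
    "doClassifierCheck",   # keypass filter
    "doDotCheck",          # dot filter
    "doMixedCheck",        # mixed filter
    "doShortSignalCheck",  # signal intensity filter
]


def filter_combinations(tags=FILTER_TAGS):
    """Yield one dict per on/off combination of the given boolean tags."""
    for values in product((True, False), repeat=len(tags)):
        yield dict(zip(tags, values))


def render_fragment(settings):
    """Render a settings dict as XML lines such as <doDotCheck>false</doDotCheck>."""
    return "\n".join(
        f"<{tag}>{str(value).lower()}</{tag}>" for tag, value in settings.items()
    )


def best_run(results, min_mean_length):
    """Return the label of the run with the most reads among acceptable runs.

    `results` maps a run label to a (read_count, mean_length) tuple that you
    record yourself after each reprocessing run (hypothetical bookkeeping).
    Runs whose mean length is below `min_mean_length` are ignored, so that a
    combination that yields many badly truncated reads cannot win.
    Returns None if no run is acceptable.
    """
    eligible = {k: v for k, v in results.items() if v[1] >= min_mean_length}
    if not eligible:
        return None
    return max(eligible, key=lambda label: eligible[label][0])
```

Four boolean tags give 16 combinations, so the search space for these flags alone is small; it is the numeric parameters that would make a full sweep expensive.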

          I too have asked Roche about this and not gotten any response. My impression is that their 454 informatics unit is tiny and overworked, but still...

          BTW, we are using Titanium on the GS-Jr, and non-16S amplicon sequencing. Typical filterpass %s are around 25-40%.



          • #6
            It's impossible to completely turn off the filters. The pipeline is built so that quality filtering is not optional. Each individual filter may or may not be completely bypassed. At the basecalling step, the passed reads and bases are used to generate thresholds for basecalling and to do one final scaling of the data (to make it finally line up properly on the n-mer scale defined by the thresholds). Therefore, when you relax the quality filtering as much as humanly possible, the end result suffers because you have a lot of lower-quality data going into this last thresholding and scaling part. That is, any given read may be less accurately basecalled when you relax the quality filtering, because you're allowing more dubious data to influence the final steps.

            That said, if you look in the software manual for the latest version, you can find on page 25 the following XML to turn off most of the filters:

            Keypass filter:
            <doClassifierCheck>false</doClassifierCheck>
            Dot filter:
            <doDotCheck>false</doDotCheck>
            Mixed filter:
            <doMixedCheck>false</doMixedCheck>
            Signal intensity filter:
            <doShortSignalCheck>false</doShortSignalCheck>

            However, these are not all of the filters.
            You should also set:
            <filterToUse>false</filterToUse>

            Another one that isn't listed is the quality score trimming, which happens during basecalling, and so needs to be included in the basecalling block of the xml, not the quality filtering block:
            <doQScoreTrim>false</doQScoreTrim>

            Moving back to the quality filtering block proper, the valley filter also needs to be addressed. Unfortunately it's complicated (when will they redesign it already??) and can't just be turned off completely. The specific changes also probably depend on whether it is in trimBack mode (i.e. shotgun pipeline) or not (i.e. amplicon pipeline).

            Parameters to adjust would be:
            <vfTrimBackScaleFactor>: make it as low as possible.
            <vfLastFlowToTest>: make it as low as possible.
            <vfScanLimit>: make it as low as possible.
            <vfBadFlowThreshold>: make it as high as possible.

            Unfortunately I don't know the definition of "high (or low) as possible".

            Finally, there are at least 3 different parameters that affect the minimum trimmed length allowed before a read is discarded. All of them should be set to 1 (the unit is flows or bases, depending on the parameter).
            <minLength>
            <vfTrimBackMinimumLength>
            <QScoreTrimMinLength> (This one goes in the basecalling block, not the quality filtering block)
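To keep track of which setting goes in which block, the tags listed in this post can be gathered in one place. The Python sketch below does only that bookkeeping and is not an actual pipeline file. The wrapper names `qualityFilterBlock` and `baseCallingBlock` are hypothetical placeholders, not real pipeline element names, and the valley-filter (vf*) scale and threshold tags are left out because no concrete values are known for them.

```python
# Tags and values as listed in the post above.
# Valley-filter tuning tags (vfTrimBackScaleFactor, vfLastFlowToTest,
# vfScanLimit, vfBadFlowThreshold) are omitted on purpose: the post does not
# say what "as low (or high) as possible" means, so no value is guessed here.
QUALITY_FILTER_SETTINGS = {
    "doClassifierCheck": "false",
    "doDotCheck": "false",
    "doMixedCheck": "false",
    "doShortSignalCheck": "false",
    "filterToUse": "false",
    "minLength": "1",
    "vfTrimBackMinimumLength": "1",
}

# These two belong in the basecalling block, not the quality filtering block.
BASECALLING_SETTINGS = {
    "doQScoreTrim": "false",
    "QScoreTrimMinLength": "1",
}


def render_block(wrapper_name, settings):
    """Render settings as indented XML lines inside a wrapper element.

    `wrapper_name` is a placeholder chosen by the caller; the real element
    names must be taken from your own pipeline file.
    """
    lines = [f"<{wrapper_name}>"]
    lines += [f"  <{tag}>{value}</{tag}>" for tag, value in settings.items()]
    lines.append(f"</{wrapper_name}>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical wrapper names, used only so the output is readable.
    print(render_block("qualityFilterBlock", QUALITY_FILTER_SETTINGS))
    print(render_block("baseCallingBlock", BASECALLING_SETTINGS))
```

The printed lines are meant to be copied by hand into the matching sections of an existing pipeline file rather than used as a file on their own.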

            If anyone has success with this, posting the actual pipeline file would be most appreciated since obviously communicating and making all these edits is error-prone and the pipeline itself would be the best documentation.



            • #7
              kmcarr-
              We did try that detection protocol - it's in Roche Technical Bulletin TCB No. 2011-002, but we didn't find short fragments in our failing library.

              Maven-
              Thanks for the suggestions. I have tried the first four in your list (that was RCJK's suggestion from the earlier thread), but I don't think I have tried the <filterToUse> tag.
