
  • View all Raw reads?

    I have experience with re-processing 454 data using the custom xml file, but I'm wondering if there's any relatively easy way to pass ALL of the raw well reads into an sfffile without any filtering or trimming. Does anyone have experience doing this? Is there any other software package that can read the cwf file that's generated in the R-directory?

    Simon

  • #2
    Do you have the software manual? It tells you how you can turn off all of the filters and reprocess the data if that is what you are after.



    • #3
      This thread seems vaguely familiar....

      I've been through and through SeqAnswers and various blogs looking for this; my boss keeps asking me for it. I tried the method in the thread above and it didn't work for me. This is with 16S amplicon data.

      Roche technical assistance hasn't responded, it's been months. Any Roche people out there? I'll bake you cookies...

      We have had some success changing emulsion PCR conditions: cutting primer concentration and using fewer template molecules per bead. At least we're up to ~40% passing filter. I would be interested to know how others are doing. This is Titanium with the HMP primers.



      • #4
        Originally posted by cliffbeall
        This thread seems vaguely familiar....

        I've been through and through SeqAnswers and various blogs looking for this, my boss keeps asking me for it. I tried the method in the thread above and it didn't work for me, this is with 16S amplicon data.

        Roche technical assistance hasn't responded, it's been months. Any Roche people out there? I'll bake you cookies...

        We have had some success changing emulsion PCR conditions- cutting primer concentration and using fewer template molecules per bead. At least we're up to ~40% passing filter. I would be interested to know how others are doing - this is Titanium with the HMP primers.
        Cliff,

        Our lab has banged its collective forehead against the wall trying to improve the output of 16S amplicon libraries on the 454; the HMP amplicons you mentioned are especially problematic. Frankly, I would be ecstatic with 40% passing HMPs, so maybe you should be the one advising the rest of us!

        We have tried all of the recommendations from 454: reduced cpb, reduced primer, and extended emPCR cycles. These work...sometimes...sort of. The most frustrating aspect of the whole thing is that nothing seems to be consistent or reproducible.

        We did get a protocol from Roche to identify "hidden" short fragments within amplicon libraries and rescue the libraries by a few additional rounds of PCR. This procedure has so far given the most consistently good results. Have you seen this document from Roche? If not, I'll see if I can find a copy. (I'm the computer guy, not the lab guy, so I don't know the details off the top of my head.)



        • #5
          Indeed my previous thread was similar, but in that case I was eventually able to turn off most of the filters and find my missing A reads (they were chimeric products generated with one of the Roche amplification primers).

          My current issue is that, even when turning off the filters, I don't recover substantially more reads. Maybe it's the wrong combination of filter settings in the xml file. The combinations that I've tried sometimes give me more reads, but then truncate all the reads to an unacceptably short length.

          I've been tempted to write a script to iterate through each possible xml filter combination, let it run for 5 days, and then see which settings give me the most reads and the longest reads. For my application, I don't care if the reads have homopolymers or other errors in them; I just need to see what's being generated by the machine for troubleshooting.
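A minimal sketch of the enumeration half of such a script, in Python. It only lists the on/off combinations and renders them as XML element lines; launching the reprocessing run for each combination is left out because the command for that isn't given in this thread. The four tag names are the boolean switches quoted from the software manual later in the thread, and the helper names `filter_combinations` and `to_xml_lines` are made up for illustration.

```python
from itertools import product

# Tag names are the four boolean switches quoted from the software manual
# later in this thread; the list is illustrative, not exhaustive.
FILTER_TAGS = [
    "doClassifierCheck",
    "doDotCheck",
    "doMixedCheck",
    "doShortSignalCheck",
]


def filter_combinations(tags):
    """Yield one dict per on/off combination of the given tags."""
    for values in product(("true", "false"), repeat=len(tags)):
        yield dict(zip(tags, values))


def to_xml_lines(settings):
    """Render one combination as a list of XML element lines."""
    return [f"<{tag}>{value}</{tag}>" for tag, value in settings.items()]


combos = list(filter_combinations(FILTER_TAGS))
print(len(combos))  # 2**4 = 16 combinations for four switches
print("\n".join(to_xml_lines(combos[-1])))  # the last combination has every switch off
```

The count doubles with every extra boolean switch, so adding numeric parameters to the sweep would need a chosen set of candidate values per parameter rather than a full enumeration.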

          I too have asked Roche about this and not gotten any response. My impression is that their 454 informatics unit is tiny and overworked, but still...

          BTW, we are using Titanium on the GS-Jr, and non-16S amplicon sequencing. Typical filterpass %s are around 25-40%.



          • #6
            It's impossible to completely turn off the filters. The pipeline is built so that quality filtering is not optional. Each individual filter may or may not be completely bypassed. At the basecalling step, the passed reads and bases are used to generate thresholds for basecalling and do one final scaling of the data (to make it finally line up properly on the n-mer scale defined by the thresholds). Therefore, when you relax the quality filtering as much as humanly possible, the end result suffers because you have a lot of lower quality data going into this last thresholding and scaling part. I.e., any given read may be less accurately basecalled when you relax the quality filtering, because you're allowing more dubious data to influence the final steps.

            That said, if you look in the software manual for the latest version, you can find on page 25 the following XML to turn off most of the filters:

            Keypass filter:
            <doClassifierCheck>false</doClassifierCheck>
            Dot filter:
            <doDotCheck>false</doDotCheck>
            Mixed filter:
            <doMixedCheck>false</doMixedCheck>
            Signal intensity filter:
            <doShortSignalCheck>false</doShortSignalCheck>

            However, these are not all of the filters.
            You should also set:
            <filterToUse>false</filterToUse>

            Another one that isn't listed is the quality score trimming, which happens during basecalling, and so needs to be included in the basecalling block of the xml, not the quality filtering block:
            <doQScoreTrim>false</doQScoreTrim>
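To keep track of which switch goes in which block, the settings so far can be grouped in a small Python sketch. The tag names and the value `false` come from this post; the dictionary keys are plain labels, not real element names from the pipeline file (the thread doesn't give those), and `render_disabled` is a made-up helper.

```python
# Switches named in the post, grouped by the block the post says they belong in.
# The dict keys are plain labels, not real element names from the pipeline file.
SWITCHES_BY_BLOCK = {
    "quality filtering block": [
        "doClassifierCheck",   # keypass filter
        "doDotCheck",          # dot filter
        "doMixedCheck",        # mixed filter
        "doShortSignalCheck",  # signal intensity filter
        "filterToUse",
    ],
    "basecalling block": [
        "doQScoreTrim",        # quality score trimming
    ],
}


def render_disabled(switches_by_block):
    """Return text listing each switch set to false under its block label."""
    lines = []
    for block, tags in switches_by_block.items():
        lines.append(f"<!-- {block} -->")
        lines.extend(f"<{tag}>false</{tag}>" for tag in tags)
    return "\n".join(lines)


print(render_disabled(SWITCHES_BY_BLOCK))
```

The output is a checklist of lines to paste into the matching blocks by hand, not a complete pipeline file.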

            Moving back to the quality filtering block proper, the valley filter also needs to be addressed. Unfortunately it's complicated (when will they redesign it already??) and can't just be turned off completely. The specific changes also probably depend on whether it is in trimBack mode (i.e. shotgun pipeline) or not (i.e. amplicon pipeline).

            Parameters to adjust would be:
            <vfTrimBackScaleFactor> make it as low as possible.
            <vfLastFlowToTest> make it as low as possible.
            <vfScanLimit> make it as low as possible.
            <vfBadFlowThreshold> make it as high as possible.

            Unfortunately I don't know the definition of "high (or low) as possible".

            Finally, there are at least 3 different parameters that affect the minimum trimmed length allowed before a read is discarded. All of them should be set to 1 (the unit is flows or bases, depending on the parameter).
            <minLength>
            <vfTrimBackMinimumLength>
            <QScoreTrimMinLength> (This one goes in the basecalling block, not the quality filtering block)
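Since hand-editing these values is error-prone, here is a hedged Python sketch that sets the three minimum-length tags to 1 in an XML string and reports any tag it could not find. The `SAMPLE` document, its wrapper element names, and its starting values are invented stand-ins for a real pipeline file, and `apply_overrides` is a made-up helper; only the three tag names and the value 1 come from the post.

```python
import xml.etree.ElementTree as ET

# Minimum-length tags from the post, each set to 1 as suggested.
MIN_LENGTH_OVERRIDES = {
    "minLength": "1",
    "vfTrimBackMinimumLength": "1",
    "QScoreTrimMinLength": "1",
}


def apply_overrides(xml_text, overrides):
    """Set the text of every matching element; return (new_xml, missing_tags)."""
    root = ET.fromstring(xml_text)
    found = set()
    for elem in root.iter():
        if elem.tag in overrides:
            elem.text = overrides[elem.tag]
            found.add(elem.tag)
    missing = sorted(set(overrides) - found)
    return ET.tostring(root, encoding="unicode"), missing


# Made-up stand-in for a pipeline file; the wrapper element names and the
# starting values are hypothetical.
SAMPLE = (
    "<pipeline>"
    "<filtering><minLength>50</minLength>"
    "<vfTrimBackMinimumLength>84</vfTrimBackMinimumLength></filtering>"
    "<basecalling></basecalling>"
    "</pipeline>"
)

new_xml, missing = apply_overrides(SAMPLE, MIN_LENGTH_OVERRIDES)
print(new_xml)
print("not found:", missing)
```

Tags that are absent are reported rather than inserted, because where a new element belongs depends on the block, and the layout of the real file isn't shown in this thread.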

            If anyone has success with this, posting the actual pipeline file would be most appreciated since obviously communicating and making all these edits is error-prone and the pipeline itself would be the best documentation.



            • #7
              kmcarr-
              We did try that detection protocol - it's in Roche Technical Bulletin TCB No. 2011-002, but we didn't find short fragments in our failing library.

              Maven-
              Thanks for the suggestions. I have tried the first four in your list (that was RCJK's suggestion from the earlier thread), but I don't think I have tried the <filterToUse> tag.
