  • Re-implementing the Chastity Filter algorithm from SCS 2.6 on fastq files?

    We have some older runs done with SCS 2.6 and CASAVA 1.6. We no longer have the qseq files or most of the other run output, but we do have the FASTQs. Unfortunately, the FASTQs were never filtered to remove reads that failed the chastity filter, and the pass/fail information is not recorded in the FASTQ files themselves.

    Does anyone know where I can find documentation of the criteria SCS 2.6 used to decide whether a read passed filter, so that I can write a script to re-evaluate the reads in the FASTQ files? Or does anyone know of another way to make these FASTQ files usable again?

    Thanks for the help.

  • #2
    Chastity (and Purity) filters are based on relative fluorescent signal intensities of each of the 4 bases at each cycle (well, up to cycle 25) of each cluster. Unless you have the original intensity data there is no way to do Chastity filtering at this point.
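For reference, the chastity criterion is usually described in Illumina's documentation as: chastity = brightest intensity / (brightest + second-brightest intensity) at a given cycle, and a cluster passes filter if no more than one of its first 25 cycles has chastity below 0.6. The Python sketch below applies that rule to per-cycle intensity values. The thresholds are assumptions to check against the SCS 2.6 documentation, and the function names are hypothetical; it only helps if the raw intensities still exist.

```python
from typing import Sequence

# Assumed pass-filter criteria (as commonly described by Illumina for this era);
# verify against the SCS 2.6 documentation before relying on them.
CHASTITY_THRESHOLD = 0.6  # minimum acceptable chastity per cycle
CYCLES_CHECKED = 25       # only the first 25 cycles are evaluated
MAX_LOW_CYCLES = 1        # at most one cycle may fall below the threshold


def chastity(intensities: Sequence[float]) -> float:
    """Brightest intensity divided by (brightest + second brightest) for one cycle."""
    brightest, second = sorted(intensities, reverse=True)[:2]
    total = brightest + second
    return brightest / total if total > 0 else 0.0


def passes_chastity(cycles: Sequence[Sequence[float]]) -> bool:
    """cycles[i] holds the four per-channel intensities of one cluster at cycle i."""
    low = sum(1 for cyc in cycles[:CYCLES_CHECKED] if chastity(cyc) < CHASTITY_THRESHOLD)
    return low <= MAX_LOW_CYCLES


# A cluster with one dominant channel per cycle passes; a mixed cluster
# (two channels of similar brightness every cycle) fails.
clean = [[100.0, 5.0, 3.0, 2.0]] * 30
mixed = [[50.0, 48.0, 1.0, 1.0]] * 30
print(passes_chastity(clean), passes_chastity(mixed))  # True False
```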

    Failing that, I would suggest standard Q-score-based trimming/filtering with something like Trimmomatic.
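A minimal Python sketch of the idea behind quality-window trimming, assuming Phred+64 quality encoding (typical for CASAVA 1.6-era output). It is a simplified approximation in the spirit of Trimmomatic's SLIDINGWINDOW and MINLEN steps, not Trimmomatic's actual implementation; the function names and the default window size, mean-quality threshold and minimum length are illustrative.

```python
from typing import List, Optional, Tuple

# Assumption: CASAVA 1.6-era FASTQs use Phred+64 quality encoding.
PHRED_OFFSET = 64


def phred_scores(qual: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(c) - offset for c in qual]


def sliding_window_trim(
    seq: str,
    qual: str,
    window: int = 4,
    min_mean: float = 20.0,
    min_len: int = 36,
) -> Optional[Tuple[str, str]]:
    """Cut the read at the start of the first window whose mean quality is below min_mean.

    Returns the trimmed (seq, qual), or None if fewer than min_len bases remain.
    """
    scores = phred_scores(qual)
    cut = len(scores)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean:
            cut = start
            break
    if cut < min_len:
        return None  # too short after trimming: discard the read
    return seq[:cut], qual[:cut]
```

In practice Trimmomatic itself (with its `-phred64` option for files of this vintage) is the better choice; the sketch only shows what the window step does.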


    • #3
      Ah, good to know. We do not have the original intensity data anymore, so I'll just use Trimmomatic.

      Thanks.


      • #4
        There used to be a flag in the old FASTQs that indicated pass/fail for chastity filtering (I believe it was the last field of the identifier). At one point it was 1/0, then changed to N/Y (although my recollection is that Y meant it FAILED chastity). You should be able to tell by examining reads with all Q-scores = 0 (those invariably fail chastity).
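For files converted with CASAVA 1.8 or later, the pass/fail flag does live in the header, in the second whitespace-separated field (`<read>:<is filtered>:<control number>:<index>`), where `Y` means the read was filtered out. A small Python sketch that reads that flag and returns `None` when the header has no such field (as with older `#0/1`-style names); the function name is hypothetical.

```python
import re
from typing import Optional

# CASAVA 1.8+ header comment field: <read>:<is filtered>:<control number>:<index>
_CASAVA18_COMMENT = re.compile(r"^(\d+):([YN]):(\d+):(\S*)$")


def failed_filter(header: str) -> Optional[bool]:
    """True if the read is flagged as filtered (failed), False if it passed,
    or None if the header carries no recognizable filter flag."""
    parts = header.strip().split(None, 1)
    if len(parts) < 2:
        return None  # e.g. pre-1.8 names such as '@...#0/1' have no comment field
    match = _CASAVA18_COMMENT.match(parts[1])
    if match is None:
        return None
    return match.group(2) == "Y"
```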


        • #5
          I don't think there is such a flag in these files. An example from one of them:

          @NA328MB44S:1:1:1036:18513#0/1
          CACACCATTAGTTAATGCCACATCTCCCACACCTACAATTTAGAGATCGGAAGAGCGGTTCAGCAGGAACGACGA
          +
          a\_aaaaaaYXY]OX]UU]UL[YY[_``Y_]]R`]``\_````Q_LYY[T\S\P]]_WSM``\`BBBBBBBBBBB


          If I'm reading the documentation correctly, the header line only gives instrument and cluster-position information, followed by the multiplex index and mate-pair information. I've compared these against other projects where we still have the original files, but I don't think any trace of the flag survives here because of how the person who did these runs handled the conversions (these reads are from 2010, and he no longer works here).
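The older (pre-CASAVA 1.8) read name can be split into its fields with a regular expression. In the commonly documented layout, the fields are instrument name, lane, tile, x and y coordinates, then `#` followed by the index (0 for non-multiplexed runs) and `/` followed by the read number (1 or 2 for paired reads). A Python sketch, with hypothetical names:

```python
import re
from typing import NamedTuple, Optional


class LegacyReadId(NamedTuple):
    instrument: str
    lane: int
    tile: int
    x: int
    y: int
    index: str        # multiplex index; "0" when the run was not indexed
    read_number: int  # 1 or 2 for paired-end / mate-pair reads


# Pre-CASAVA-1.8 style: @<instrument>:<lane>:<tile>:<x>:<y>#<index>/<read number>
_LEGACY = re.compile(r"^@([^:]+):(\d+):(\d+):(\d+):(\d+)#([^/]+)/(\d+)$")


def parse_legacy_header(header: str) -> Optional[LegacyReadId]:
    """Split an old-style read name into fields; None if it does not match."""
    match = _LEGACY.match(header.strip())
    if match is None:
        return None
    inst, lane, tile, x, y, index, read = match.groups()
    return LegacyReadId(inst, int(lane), int(tile), int(x), int(y), index, int(read))
```

On the header above, this yields lane 1, tile 1, x=1036, y=18513, index "0" and read number 1, with no field left over for a filter flag.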


          • #6
            You could ask the facility that ran the original sequencing whether they kept a backup of the raw run data (we do). Then you would be all set.

            Edit: Perhaps you *are* in the facility that did the original runs.
            Last edited by GenoMax; 07-24-2013, 10:23 AM.


            • #7
              Originally posted by sperez317:
              I don't think there are in this one. An example from the file:

              @NA328MB44S:1:1:1036:18513#0/1
              ...

              The last character of the identifier (after '/') is the chastity flag.


              • #8
                If that's the case (I thought it was pair membership), then something went wrong in our initial conversion, since every header ends in 1, even for reads such as:

                @NA328MB44S:1:1:991:17986#0/1
                NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                +
                BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

                which are obviously not good reads.
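Since the chastity values can't be recovered, one stopgap is a heuristic that drops reads like this one before trimming: every base is `N`, or every quality character is `B` (Q2 in Phred+64, which as I understand it Illumina 1.5+ pipelines used to mark unreliable read segments). This is not a reconstruction of the chastity filter, just a cheap pre-filter; the function names and the `reads.fastq` path in the comment are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str, str]  # header, sequence, '+' line, quality


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group a 4-line FASTQ stream into records (an incomplete final record is dropped)."""
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")


def is_obviously_bad(seq: str, qual: str) -> bool:
    """Heuristic: every base is N, or every quality character is 'B'."""
    return set(seq) <= {"N"} or set(qual) <= {"B"}


def drop_obvious_failures(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Yield only records that do not trip the heuristic above."""
    return (rec for rec in records if not is_obviously_bad(rec[1], rec[3]))


# Usage (hypothetical path):
# with open("reads.fastq") as fh:
#     for rec in drop_obvious_failures(read_fastq(fh)):
#         print("\n".join(rec))
```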

                Trimmomatic is making them usable for our purposes, so it's working out.


                • #9
                  Okay, that's good to know.
