  • Demultiplexing Casava 1.8 reads

    Hi all,

    My most recent data set consists of fastq files that have already been demultiplexed by Casava 1.8. However, my sequencing core told me that this demultiplexing only allows for a single-bp mismatch in the index. Because my barcodes are rather long, many reads end up in the unknown indices. I designed the barcodes to allow for 5 errors between them, and I would now like to make use of this. I can easily look at the fastq files and see the index in the read name header, but I cannot find a program that will demultiplex based on this format. My core doesn't keep the original reads, so I am stuck having to demultiplex reads that have already been demultiplexed by Casava. If anyone knows of a program that can accomplish this, that would be great.

  • #2
    Hi Metz,
    Try fastx_barcode_splitter.pl from the FASTX toolkit:
    http://hannonlab.cshl.edu/fastx_toolkit/
    You can specify the number of mismatches, and for a one-off analysis it is reasonably quick.


    • #3
      Thanks for the response, Frank, but I've already tried fastx. Unless I'm mistaken, it requires that the barcode be part of the read, not in the read name. I've thought about just writing a script to reintroduce the barcode into the read, but that also requires adding 'mock' quality scores for those bases and making other changes to the read name. My Perl is several years out of use, and I'm trying to avoid reinventing the wheel.
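
The reintroduction step described above can be sketched roughly as follows, in Python rather than Perl. This is a minimal sketch under stated assumptions, not a tested pipeline: it assumes the index is the text after the last colon of the CASAVA 1.8 header, that qualities are Phred+33, and that the header can be left unchanged. The function names and the placeholder quality character `I` (Q40) are hypothetical choices made for illustration.

```python
import gzip

MOCK_QUALITY = "I"  # assumption: Phred+33 'I' (Q40) as an arbitrary placeholder


def prepend_index(header, seq, plus, qual, mock_quality=MOCK_QUALITY):
    """Move a copy of the header index to the 5' end of the read.

    Assumes a CASAVA 1.8 header whose last colon-separated field is the index.
    The header itself is returned unchanged.
    """
    index = header.rstrip().rsplit(":", 1)[-1]
    return header, index + seq, plus, mock_quality * len(index) + qual


def rewrite_fastq(in_path, out_path):
    """Rewrite a gzipped fastq so every read starts with its index."""
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        while True:
            # A fastq record is assumed to span exactly four lines.
            record = [fin.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:
                break
            fout.write("\n".join(prepend_index(*record)) + "\n")
```

The output would then have the barcode at the start of each read, which is where fastx_barcode_splitter.pl looks when run with `--bol`. The prepended bases would still need to be trimmed after splitting.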


      • #4
        I assume you are looking to parse reads from the "Undetermined" reads file, which would contain the reads with more than 1 mismatch.

        Rather than re-introducing the tags into the reads, it would be more efficient to enumerate all the "tags" that are in your "Undetermined" file and decide which ones you want to keep/extract.


        • #5
          Do you know of a program that can do something along those lines?


          • #6
            The following should work for CASAVA v1.8 fastq files (the grep expression may need to be modified to match your machine name):

            zcat name.of.fastq.gz | grep '^@HWI' | cut -d : -f 10 | sort | uniq -c | sort -nr > indices.txt
            Last edited by HESmith; 01-25-2012, 08:33 AM. Reason: clarification
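
For anyone who would rather not depend on the machine name in the grep pattern, the same tally can be sketched in Python by reading every fourth line as a header. This is only an illustrative sketch: it assumes a well-formed four-line-per-record fastq and that the index is the last colon-separated field of the header, and the function names are hypothetical.

```python
import gzip
from collections import Counter


def index_from_header(header_line):
    """Return the text after the last colon of a CASAVA 1.8 header line."""
    return header_line.rstrip().rsplit(":", 1)[-1]


def count_indices(fastq_gz_path):
    """Count how often each index appears in a gzipped fastq file."""
    counts = Counter()
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Lines 0, 4, 8, ... are headers in a four-line-per-record fastq.
            if line_number % 4 == 0:
                counts[index_from_header(line)] += 1
    return counts


# Usage (file name is a placeholder):
# for index, n in count_indices("name.of.fastq.gz").most_common():
#     print(n, index)
```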


            • #7
              Thanks for the code. It works for giving me a list of barcodes and their counts, and I can definitively tell which barcode the most abundant ones belong to. However, I'm not sure how to proceed from here. I just don't understand enough Perl to move forward quickly. If there isn't another solution, I guess that is the way to go. I just can't believe that nobody else has had this problem in the last year.
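
One possible way forward from the index counts, sketched in Python rather than Perl, is to assign each read to the expected barcode that is closest to its header index. This is a minimal sketch and not an existing tool: the function names, the `unassigned` label, the output file naming, and the `barcodes` mapping (sample name to expected barcode) are all hypothetical. It assumes the index is the last colon-separated field of the header, counts an `N` in the index as a mismatch, and leaves a read unassigned when two barcodes tie for closest. The mismatch limit is left to the user, since it depends on how far apart the barcodes really are.

```python
import gzip


def hamming(a, b):
    """Mismatch count between two strings; a length difference adds to it."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_sample(index, barcodes, max_mismatches):
    """Return the sample whose barcode is the unique closest match, else None."""
    scored = sorted((hamming(index, bc), name) for name, bc in barcodes.items())
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two barcodes are equally close
    return scored[0][1]


def demultiplex(in_path, barcodes, max_mismatches, out_prefix):
    """Split a gzipped fastq into one gzipped file per sample; return read counts."""
    handles, counts = {}, {}
    try:
        with gzip.open(in_path, "rt") as fin:
            while True:
                record = [fin.readline() for _ in range(4)]
                if not record[0]:
                    break
                index = record[0].rstrip().rsplit(":", 1)[-1]
                sample = assign_sample(index, barcodes, max_mismatches) or "unassigned"
                if sample not in handles:
                    handles[sample] = gzip.open(f"{out_prefix}{sample}.fastq.gz", "wt")
                handles[sample].write("".join(record))
                counts[sample] = counts.get(sample, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts


# Usage (all names and values are placeholders):
# demultiplex("Undetermined.fastq.gz", {"sampleA": "ACGTACGT"}, 2, "demux_")
```

Comparing the resulting per-sample counts with the index tally is a quick sanity check on the chosen mismatch limit.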
