  • Demultiplexing Casava 1.8 reads

    Hi all,

    My most recent data set consists of fastq files that have already been demultiplexed by Casava 1.8. However, my sequencing core told me that Casava only allows a single-base mismatch in the index. Because my barcodes are rather long, many reads end up in the unknown (undetermined) indices. I designed the barcodes to allow for 5 errors between them, and I would now like to take advantage of this. I can easily see the index in each read's header line in the fastq files, but I cannot find a program that will demultiplex based on this format. My core doesn't keep the original reads, so I am stuck demultiplexing reads that Casava has already demultiplexed. If anyone knows of a program that can do this, that would be great.
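A minimal sketch of the matching step being asked for, written in Python rather than Perl: each index taken from a read header is assigned to the uniquely closest expected barcode by Hamming distance, and is left unassigned if it is too far from every barcode or equally close to two of them. The function names `hamming`, `min_pairwise_distance` and `assign_barcode`, and the placeholder barcodes, are hypothetical. If "5 errors between them" means a minimum pairwise Hamming distance of 5, then at most (5 - 1) // 2 = 2 substitutions can be corrected without ambiguity.

```python
"""Assign an observed index to the closest expected barcode (sketch)."""
from __future__ import annotations

from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: dict[str, str]) -> int:
    """Smallest Hamming distance between any two expected barcodes."""
    return min(hamming(a, b) for a, b in combinations(barcodes.values(), 2))


def assign_barcode(tag: str, barcodes: dict[str, str], max_mismatches: int) -> str | None:
    """Return the sample whose barcode is uniquely closest to `tag`,
    or None if nothing is within `max_mismatches` or the best match is tied."""
    ranked = sorted((hamming(tag, bc), sample) for sample, bc in barcodes.items())
    best_dist, best_sample = ranked[0]
    if best_dist > max_mismatches:
        return None
    if len(ranked) > 1 and ranked[1][0] == best_dist:
        return None  # ambiguous: two barcodes are equally close
    return best_sample


if __name__ == "__main__":
    # Hypothetical placeholder barcodes, not a real design.
    barcodes = {"sampleA": "ACGTACGTAC", "sampleB": "TGCATGCATG"}
    limit = (min_pairwise_distance(barcodes) - 1) // 2  # 4 for these two
    print(assign_barcode("ACGTACGTTT", barcodes, limit))  # prints sampleA
```

The threshold is derived from the barcode set itself, so a tag is only ever assigned when no other barcode could plausibly have produced it.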

  • #2
    Hi Metz,
    Try the fastx_barcode_splitter.pl from the FASTX toolkit:
    http://hannonlab.cshl.edu/fastx_toolkit/
    You can specify the number of mismatches and for a one off analysis it is reasonably quick.

    • #3
      Thanks for the response, Frank, but I've already tried fastx. Unless I'm mistaken, it requires the barcode to be part of the read, not the read name. I've thought about writing a script to reintroduce the barcode into the read, but that also requires adding 'mock' quality scores for those bases and other changes to the read name. My Perl is several years out of practice, and I'm trying to avoid reinventing the wheel.
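If the barcode does go back into the read so that fastx_barcode_splitter.pl can see it, the rewrite is mechanical. This sketch assumes the Casava 1.8 header layout `@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>`, so the index is the last `:`-separated field, and pads the quality line with an arbitrary mock quality character ('I', Q40 in Phred+33). The function name `prepend_index` is hypothetical, and the header is left unchanged.

```python
"""Prepend the header index to each read, with mock qualities (sketch)."""
from __future__ import annotations

import gzip
import sys
from typing import Iterable, Iterator

MOCK_QUAL = "I"  # arbitrary mock quality (Q40 in Phred+33); an assumption


def prepend_index(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with each header's index prepended to its read."""
    it = iter(lines)
    # Group four lines per record; a truncated final record is dropped.
    for header, seq, plus, qual in zip(it, it, it, it):
        index = header.rstrip("\n").rsplit(":", 1)[-1]
        yield header
        yield index + seq
        yield plus
        yield MOCK_QUAL * len(index) + qual


if __name__ == "__main__" and len(sys.argv) > 1:
    with gzip.open(sys.argv[1], "rt") as fh:
        sys.stdout.writelines(prepend_index(fh))
```

The prepended bases would still need to be trimmed off after splitting, since they are not part of the original insert.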

      • #4
        I assume you are looking to parse reads from the "Undetermined" reads file, which would contain the reads whose index had more than 1 mismatch.

        Rather than reintroducing the tags into the reads, it would be more efficient to enumerate all the tags in your "Undetermined" file and decide which ones you want to keep/extract.

        • #5
          Do you know of a program that can do something along those lines?

          • #6
            The following should work for CASAVA v1.8 fastq files (the grep expression may need to be modified to match your machine name):

            zcat name.of.fastq.gz | grep '^@HWI' | cut -d : -f 10 | sort | uniq -c | sort -nr > indices.txt
            Last edited by HESmith; 01-25-2012, 08:33 AM. Reason: clarification
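A Python equivalent of the pipeline above, as a sketch: instead of grepping for the instrument name, it treats the first line of every 4-line record as the header, so it should not need editing per machine. It assumes standard 4-line records (which Casava 1.8 writes) and takes the index as the last `:`-separated header field, the same field `cut -d : -f 10` selects. `count_indices` is a hypothetical name.

```python
"""Count index tags in a (optionally gzipped) Casava 1.8 FASTQ file."""
from __future__ import annotations

import gzip
import sys
from collections import Counter
from typing import Iterable


def count_indices(lines: Iterable[str]) -> Counter:
    """Tally the index (last ':' field) of each record's header line."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:  # line 1 of each 4-line FASTQ record
            counts[line.rstrip("\n").rsplit(":", 1)[-1]] += 1
    return counts


def main(path: str) -> None:
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        # Most frequent first, like `sort | uniq -c | sort -nr`.
        for tag, n in count_indices(fh).most_common():
            print(n, tag)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be along the lines of `python count_indices.py name.of.fastq.gz > indices.txt` (script name hypothetical).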

            • #7
              Thanks for the code. It works for giving me a list of barcodes and their counts, and I can definitively tell which barcode the most abundant ones belong to. However, I'm not sure how to proceed from here; I just don't understand enough Perl to move forward quickly. If there isn't another solution, I guess that is the way to go. I just can't believe that nobody else has had this problem in the last year.
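One way to proceed without Perl, sketched in Python under stated assumptions: single-index reads, the index as the last `:`-separated header field, and a hypothetical sample sheet (`BARCODES`) and mismatch limit (`MAX_MISMATCHES`) that must be replaced with real values. Each record is streamed once and written to `<prefix><sample>.fastq`; tags that are too far from every barcode, or equally close to two, go to `unknown`. In the spirit of working per distinct tag rather than per read, the assignment is cached with `functools.lru_cache`, so distances for each distinct tag are computed only once, however many reads carry it.

```python
"""Re-demultiplex Casava 1.8 FASTQ reads by the index in their headers (sketch)."""
from __future__ import annotations

import gzip
import sys
from functools import lru_cache
from typing import Iterable, Iterator

# Hypothetical sample sheet; replace with your own barcodes.
BARCODES = {"sampleA": "ACGTACGTAC", "sampleB": "TGCATGCATG"}
MAX_MISMATCHES = 2  # e.g. (minimum pairwise distance - 1) // 2


def hamming(a: str, b: str) -> int:
    """Mismatches between two tags; any length difference counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


@lru_cache(maxsize=None)
def sample_for(tag: str) -> str:
    """Uniquely closest sample within MAX_MISMATCHES, otherwise 'unknown'."""
    ranked = sorted((hamming(tag, bc), sample) for sample, bc in BARCODES.items())
    best_dist, best_sample = ranked[0]
    if best_dist > MAX_MISMATCHES:
        return "unknown"
    if len(ranked) > 1 and ranked[1][0] == best_dist:
        return "unknown"  # tie between two barcodes
    return best_sample


def assign_records(lines: Iterable[str]) -> Iterator[tuple[str, tuple[str, ...]]]:
    """Yield (sample, 4-line record) pairs from FASTQ lines."""
    it = iter(lines)
    for record in zip(it, it, it, it):
        tag = record[0].rstrip("\n").rsplit(":", 1)[-1]
        yield sample_for(tag), record


def main(path: str, prefix: str) -> None:
    opener = gzip.open if path.endswith(".gz") else open
    handles = {}
    try:
        with opener(path, "rt") as fh:
            for sample, record in assign_records(fh):
                if sample not in handles:  # open each output file lazily
                    handles[sample] = open(f"{prefix}{sample}.fastq", "w")
                handles[sample].writelines(record)
    finally:
        for handle in handles.values():
            handle.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "demux_")
```

A run might look like `python demux.py Undetermined.fastq.gz demux_` (script and file names hypothetical).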
