  • Help with De-Multiplexing MiSeq Data

    Hello,

    I cannot find a decent program or script to de-multiplex data where I have three FASTQ files: XXX_R1.fastq, XXX_R2.fastq, and XXX_I.fastq. The index file has the same structure as a FASTQ file and shares all the read names ("hashes"), but contains only the barcode. I want to split the R1 and R2 files based on this barcode.

    Any Suggestions?

    Thanks.

  • #2
    I think you will find there is a reference to the barcode/sample at the end of the read name for each read. That might help.
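A minimal Python sketch of reading that tag, assuming the Illumina 1.8+ header layout (`@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>`). In the MiSeq headers quoted further down this thread the last field is `0`, so on this output it may hold a sample number rather than the barcode sequence. The function name `parse_illumina_header` is hypothetical.

```python
from typing import Tuple


def parse_illumina_header(header: str) -> Tuple[str, str]:
    """Split a header into (read name, last field of the comment).

    Assumed layout (Illumina 1.8+ style):
    @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
    """
    name, _, comment = header.strip().partition(" ")
    if not name.startswith("@") or not comment:
        raise ValueError(f"unexpected header: {header!r}")
    # The last colon-separated field is the index sequence or, on some
    # MiSeq output, a sample number.
    return name[1:], comment.split(":")[-1]


name, tag = parse_illumina_header("@M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0")
print(name)  # M00511:27:000000000-A1F08:1:1:17545:1321
print(tag)   # 0
```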
    Last edited by Bukowski; 08-10-2012, 10:24 AM.



    • #3
      This thread may help:
      Bridged amplification & clustering followed by sequencing by synthesis. (Genome Analyzer / HiSeq / MiSeq)



      • #4
        Also, it looks like Picard can do this:



        • #5
          Originally posted by celzinga
          also it looks like picard can do this:
          http://picard.sourceforge.net/comman...luminaBarcodes

          Um. I don't see how that tool has anything to do with this problem. I don't need to extract the barcodes at all. I have three FASTQ files. The first one already contains the barcodes, i.e.:

          Code:
          @M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0
          AACCGAGA
          +
          ?AAAAAAB
          @M00511:27:000000000-A1F08:1:1:16720:1322 1:N:0:0
          AACCGAGA
          +
          ???A?@@B
          @M00511:27:000000000-A1F08:1:1:17118:1322 1:N:0:0
          AACCGAGA
          +
          A?AAAAAA
          @M00511:27:000000000-A1F08:1:1:17183:1322 1:N:0:0
          AAACATCA
          +
          AAAAABBB
          Then there are the two files for the paired ends, i.e.:

          Code:
          @M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0
          NCGGGCACGACCATCACCATCATCATACGACGAACCAACGGGCATTATTCTGGTCGTTCGTCCTGATTGCGACGTTCATGGTCGTCGAAGTCATCGGCGGATTATGGACGAACAGTTTTGCGCTCTTGTCGGACGCCGGGCATATGCTTAG
          +
          #5<???AADDEEEDDDGGGGGGIIIIIIIIHHHHHHIIHHHHHHIIIIIIIIHIIHHHIHHHHHIIIIHHHHHHHHFHHHHHHGGFGGGGGGGGGGEGGG'.8:C*CCCD4A''*1CE*0:8'4C.:*:?)''.'.'.''2'**0*1:?:1
          @M00511:27:000000000-A1F08:1:1:16720:1322 1:N:0:0
          NCATACGTACCACCGATGACACCACCGACAAGCGGAACCATCTTCCCAAGATTAACGACCCCCGTATTCCCGAACTTCGTCAATAAGCGGAATCCGACTTTCTGATTGATTTTTTTGATGGTCGATCCAGGAATCTTCTTAATCATATTGA
          +
          #5<???BBDDDDDEDDFEFFFFIIIHHHHHHHIHHEHHIHIIIIIIIIIIIIIIIIHHHHHHHHDCFHHFHHHEHFDFH?DF;DFFDFEE=EFFA?A@BAEEFFEEEF=ABA?:8>DACAECEDD8A8*?*0:CCA0*::C*:ACA*:E:*
          @M00511:27:000000000-A1F08:1:1:17118:1322 1:N:0:0
          NTCCGCGTGACGGCGATGCCAGAGCGACGGGCCGCCTCGACGTTCGAGCCGACGTAATAAAACTCACGTCCTGTCTTCGAATACGTCAAAAACAGATGCGCCCCGGCGAAGAACAGAAGCATCAAGATGGCGACGAACGGGACAGGTCCGT
          +
          #5<???@@DDDDDDDDEEEFFFHHIHHHHHHHHHHHHHHHHHHHHEFHHHHEFFEFFEFFEEFFFFFFFFEEFFFFFFFEFFEFFFEE8A:CEEFEFEFDEADD?DDD'8>8?C:?E:*?:CAE0?::**:2'8;>2>').?8A))1*0'*
          @M00511:27:000000000-A1F08:1:1:17183:1322 1:N:0:0
          NATCGGAAGAGCACACGTCTGAACTCCAGTCACAAACATCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAGACAGAACGAGACAAAAGAAGCACAAATCCGTAATCGATGAGACTTAATGCGAGATCATGACACCATTGTAA
          +
          #5<???AAEDEDDDDDGGGGGGIIIIIIIIIIIIIIIIIIIIIIIIHHIIIIHHHIIIIIIIIHHIIIIHHHHHD4)42**,,,,,,***3*,4,,,*4,,,3,0****)0*))*)0.************)).'0*1******)*******
          and the corresponding mates of all of those.

          I do not want the three files as they are. I know which barcodes go with which hashes.

          RUN1_I1.fastq
          RUN1_R1.fastq
          RUN1_R2.fastq

          Need to be converted into...

          RUN1_R1_AACCGAGA.fastq
          RUN1_R2_AACCGAGA.fastq
          RUN1_R1_AAACATCA.fastq
          RUN1_R2_AAACATCA.fastq

          etc etc.
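One way to get exactly that layout is to walk the three files in lockstep, since record N of I1 belongs to record N of R1 and R2. The Python sketch below does this under stated assumptions: unwrapped 4-line records, in the same order in all three files. `demultiplex`, `fastq_records` and the `undetermined` bucket are illustrative names, not an existing tool. Without `expected`, it opens two output files per distinct index sequence, so every index read containing an error gets its own pair of files; on real data, pass the known barcodes.

```python
import itertools
from pathlib import Path
from typing import Dict, Iterator, Optional, Set, TextIO, Tuple

Record = Tuple[str, ...]  # (header, sequence, "+" line, quality)


def fastq_records(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return  # clean end of file
        if not lines[3]:
            raise ValueError("truncated FASTQ record")
        yield tuple(line.rstrip("\n") for line in lines)


def read_id(header: str) -> str:
    """Read name without the leading '@' and without the ' 1:N:0:0' comment."""
    return header[1:].split(" ", 1)[0]


def demultiplex(prefix: str, outdir: str = ".",
                expected: Optional[Set[str]] = None) -> Dict[str, int]:
    """Split <prefix>_R1.fastq and <prefix>_R2.fastq by the sequence in <prefix>_I1.fastq.

    Writes <name>_R1_<barcode>.fastq and <name>_R2_<barcode>.fastq into outdir
    and returns the number of read pairs per barcode. If `expected` is given,
    every other index sequence is pooled under "undetermined".
    """
    name = Path(prefix).name
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    handles: Dict[str, Tuple[TextIO, TextIO]] = {}
    counts: Dict[str, int] = {}
    try:
        with open(f"{prefix}_I1.fastq") as fi, open(f"{prefix}_R1.fastq") as f1, \
                open(f"{prefix}_R2.fastq") as f2:
            triples = itertools.zip_longest(
                fastq_records(fi), fastq_records(f1), fastq_records(f2))
            for idx, r1, r2 in triples:
                if idx is None or r1 is None or r2 is None:
                    raise ValueError("I1, R1 and R2 contain different numbers of reads")
                if not (read_id(idx[0]) == read_id(r1[0]) == read_id(r2[0])):
                    raise ValueError(f"files out of sync at {idx[0]}")
                barcode = idx[1]
                if expected is not None and barcode not in expected:
                    barcode = "undetermined"
                if barcode not in handles:
                    handles[barcode] = (
                        open(out / f"{name}_R1_{barcode}.fastq", "w"),
                        open(out / f"{name}_R2_{barcode}.fastq", "w"),
                    )
                h1, h2 = handles[barcode]
                h1.write("\n".join(r1) + "\n")
                h2.write("\n".join(r2) + "\n")
                counts[barcode] = counts.get(barcode, 0) + 1
    finally:
        # Close every per-barcode output file, even if a check above failed.
        for h1, h2 in handles.values():
            h1.close()
            h2.close()
    return counts
```

For example, `demultiplex("RUN1", "output", expected={"AACCGAGA", "AAACATCA"})` would write `output/RUN1_R1_AACCGAGA.fastq`, `output/RUN1_R2_AACCGAGA.fastq`, and so on, plus an `undetermined` pair for everything else.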

          Personally, I am beyond flabbergasted that the output of this damnable thing is not the same as the HiSeq's. I just want the FASTQs sorted by barcode; having the barcode/hash pairs in a separate file does nothing for me as the user.



          • #6
            Did you get this run at a core facility? I am not sure why that facility did not do the de-multiplexing for you. It should be trivial for them to do this since they would have access to the raw data folder and CASAVA pipeline.



            • #7
              Hi,

              I've attached my approach to demultiplexing the MiSeq files. Note that it names the output files after the MiSeq-assigned sample index, NOT the barcode. This means you get all reads for the sample, including those with a mismatch in the barcode. It outputs three files per sample: forward reads, reverse reads, and interlaced reads. We use the interlaced reads in Galaxy to start batch workflows.

              For files:
              RUN1_I1.fastq
              RUN1_R1.fastq
              RUN1_R2.fastq

              Run as:
              perl demultiplex_miseq.pl RUN1

              Output will be in 'output/' folder. It will also create a file containing all barcodes used per sample, and print the read count per sample.
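For a quick look at which index sequences are present before splitting, a tally of the sequence line of each I1 record is enough. A Python sketch, assuming unwrapped 4-line records; `count_barcodes` is a hypothetical helper, and the example input is the four index records quoted earlier in this thread.

```python
from collections import Counter
from typing import Iterable


def count_barcodes(lines: Iterable[str]) -> Counter:
    """Tally the sequence line (2nd line of every 4-line record) of an index FASTQ."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:
            counts[line.strip()] += 1
    return counts


example = [
    "@M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0", "AACCGAGA", "+", "?AAAAAAB",
    "@M00511:27:000000000-A1F08:1:1:16720:1322 1:N:0:0", "AACCGAGA", "+", "???A?@@B",
    "@M00511:27:000000000-A1F08:1:1:17118:1322 1:N:0:0", "AACCGAGA", "+", "A?AAAAAA",
    "@M00511:27:000000000-A1F08:1:1:17183:1322 1:N:0:0", "AAACATCA", "+", "AAAAABBB",
]
print(count_barcodes(example).most_common())  # [('AACCGAGA', 3), ('AAACATCA', 1)]
```

On a real file, `count_barcodes(open("RUN1_I1.fastq")).most_common(10)` would list the ten most frequent index sequences.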



              • #8


                Or try Galaxy, a community-driven web-based analysis platform for life science research.

                Look under NGS Toolbox Beta > NGS: QC and manipulation for the Barcode Splitter and other FASTQ manipulation tools.
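As #7 points out, grouping by sample index also keeps reads whose barcode has a mismatch. If you split on the index sequence yourself, one common way to recover such reads is to assign each observed index to the closest expected barcode within a small Hamming distance, and to reject reads that match nothing or tie between two barcodes. A Python sketch of that rule; the one-mismatch default and the function names are assumptions, not the Galaxy tool's documented behaviour.

```python
from typing import Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed: str, expected: Iterable[str],
                   max_mismatches: int = 1) -> Optional[str]:
    """Return the closest expected barcode, or None if none is close enough
    or two barcodes are equally close (ambiguous)."""
    scored = sorted((hamming(observed, bc), bc) for bc in expected)
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # tie between two barcodes
    return scored[0][1]


print(assign_barcode("AACCGANA", ["AACCGAGA", "AAACATCA"]))  # AACCGAGA
```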



                • #9
                  What is an interlaced read?
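An interlaced (more often called interleaved) FASTQ file holds both mates of each pair in one file, alternating: record 1 from R1, record 1 from R2, record 2 from R1, and so on. A Python sketch that builds one from two mate files, assuming unwrapped 4-line records in matching order; `interleave` and the file names in the usage comment are hypothetical.

```python
import itertools
from typing import Iterable, Iterator


def interleave(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> Iterator[str]:
    """Yield lines alternating one 4-line record from R1, then one from R2."""
    it1, it2 = iter(r1_lines), iter(r2_lines)
    while True:
        rec1 = list(itertools.islice(it1, 4))
        rec2 = list(itertools.islice(it2, 4))
        if not rec1 and not rec2:
            return  # both files exhausted together
        if len(rec1) != 4 or len(rec2) != 4:
            raise ValueError("R1 and R2 differ in length or contain a truncated record")
        yield from rec1
        yield from rec2


# Usage (hypothetical file names):
# with open("RUN1_R1.fastq") as f1, open("RUN1_R2.fastq") as f2, \
#         open("RUN1_interleaved.fastq", "w") as out:
#     out.writelines(interleave(f1, f2))
```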
