
  • Stop automatic demultiplexing on MiSeq

    Hey guys,

    Our lab does not have any sequencers, but we have access to a sequencing core. They recently acquired a couple of MiSeqs, and thus far they are not able to give out the index read information (the machine demultiplexes the runs). As we don't have the MiSeq, I have no way of knowing exactly how it or, more importantly, the data processing works, and I have no idea if there is a workaround. I know they plan to try to address this starting next week, but I'm curious whether anybody here has any experience with this. I want the indexing reads along with the sequencing reads, not just the sequencing reads in different files.

  • #2
    Not sure why your core couldn't give you the index reads as well... I can give you some snippets of the fastqs later, but they follow the Illumina conventions as far as I can tell (see the Wikipedia page on FASTQ).

    We demultiplex after the fact on our own, since the secondary analysis on the MiSeq is pretty broken for us currently. I'll see if I can dig up a script if you need one.


    • #3
      R1.fastq:
      Code:
      @M00182:8:000000000-A0833:1:12:15161:29056 1:N:0:1
      sequencehere
      +
      +114=??0)):??@@@/,6;;=00&55-&)&&0&)&00(+((+8&&)(((+3(((+(+55007&))0(((++()&)&)&)&)&&&&(+((((((((4+((((+4+((+((((++4:((,((+(((+((((++(((+((&+((+((((+((+
      R2.fastq:
      Code:
      @M00182:8:000000000-A0833:1:12:15161:29056 2:N:0:1
      sequencehere
      +
      +8+==22+=@?;+A+<+CA4A+3<A?<<<BCCB@E3)11:?DFGHICHFDHIHGHGEH@FGHIHIEGGGHFEGBC?CDCBB9;ACC@A>CBBDDCDDEEACDDBDDDDDDD@CCDDDDDDDDCC>A@<BCC(9A@AACCDDDDDCC4>A@A
      I1.fastq:
      Code:
      @M00182:8:000000000-A0833:1:12:15161:29056 1:N:0:1
      CTTGTA
      +
      ?@@DDD
      Here are file snippets of the two read files and the index reads. This is NOT using CASAVA, but the MiSeqReporter, to generate fastqs (which are not demultiplexed).
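For working with the three files together, here is a minimal Python sketch (not the script mentioned in this thread). It walks R1, R2 and I1 in lockstep and checks that the part of each header before the space is identical across the files. The helper names `read_fastq`, `read_name` and `iter_triples` are made up for illustration, and the sketch assumes plain 4-line FASTQ records.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from an open 4-line FASTQ file."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record or not record[0]:
            return  # end of file (or a trailing blank line)
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = record
        yield header, seq, qual


def read_name(header):
    """Return the header text before the first space (instrument:run:...:x:y)."""
    return header.split(" ", 1)[0]


def iter_triples(r1, r2, i1):
    """Yield (read1, read2, index) records, checking that the names agree."""
    for rec1, rec2, idx in zip(read_fastq(r1), read_fastq(r2), read_fastq(i1)):
        names = {read_name(rec1[0]), read_name(rec2[0]), read_name(idx[0])}
        if len(names) != 1:
            raise ValueError(f"files out of sync: {sorted(names)}")
        yield rec1, rec2, idx
```

Note that `zip` stops at the shortest file, so a file that is simply missing records at the end would not be flagged by this sketch.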


      • #4
        Thanks, Eric, I will pass this information along. Perhaps they are using CASAVA and not the MiSeqReporter. They just acquired the MiSeqs recently so things are for sure still being figured out.
        Last edited by Heisman; 11-24-2011, 11:09 AM.


        • #5
          Originally posted by ECO
          We demultiplex after the fact on our own, the secondary analysis on the MiSeq is pretty broken for us currently, I'll see if I can dig up a script if you need one.
          Hi ECO,
          would be great if you could provide the script you use to demultiplex the MiSeq FASTQ. We just got a MiSeq and I am not keen to use Illumina provided software for any of the analysis downstream of FASTQ generation (bad experience from the GAIIx). Therefore I'll have to demultiplex myself, but as far as I know scripts like the one in the fastx toolkit don't work.
          Any help would be greatly appreciated.
          Cheers
          Seb


          • #6
            Seb,

            It's an internal script at my company written by a colleague; I'll see what his feelings are on sharing it. It's modularized inside a bunch of code that interfaces with our LIMS and instruments, so unless you're pretty competent at Python it will be difficult to use in its current form. I'll post back if I can share it.

            I am still dumbfounded that this is a problem on Illumina machines after what...almost 7 years? I still can't access live run metrics outside of some proprietary binary format...so many simple things that have to be reinvented at every customer site.

            </rant>


            • #7
              Originally posted by NextGenSeb
              Hi ECO,
              would be great if you could provide the script you use to demultiplex the MiSeq FASTQ. We just got a MiSeq and I am not keen to use Illumina provided software for any of the analysis downstream of FASTQ generation (bad experience from the GAIIx). Therefore I'll have to demultiplex myself, but as far as I know scripts like the one in the fastx toolkit don't work.
              Any help would be greatly appreciated.
              Cheers
              Seb
              Ironic, as I started this thread, but tell me what format you need the reads/indexes in and I can almost certainly give you a series of Linux commands that will convert the MiSeq output to the format that you need.


              • #8
                Thanks for the replies guys

                @ECO:
                My Python is not brilliant, but I am happy to work on that. Would in general be interested in the LIMS integration anyway, as we use the same system and hope to tie it in the workflow. So any tips in that regard are highly appreciated as well. However I understand the complications of internal politics, so please don't rub any noses

                As for Illumina, I have the feeling that they made things even more complicated from the GAIIx to the MiSeq. I just complained to their tech specialist that the only halfway convenient method to view the run quality data in real time (or after the run, in fact) is now their Windows (!!) based SAV. Moreover, there is no way to generate an automated run quality report. Definitely room for improvement there.

                @Heisman:
                All I want is to generate separate fastq files based on index from the one that the MiSeq spits out. After that I can take it through my pipeline. Rumour has it that the new MiSeq reporter software version is able to provide fastq files split by index, although I'd rather not rely on that... So a series of commands would take me a long way.

                Cheers
                Seb


                • #9
                  Do you have a script to do this for the data if it was in a different format? If so, it would be easier to convert it to that format. If not, I have an idea in mind that will work and can be put into a bash script: paste the read 1, read 2, and index reads together, along with the qualities, put a unique character next to the index, then grep out that index from the full file and split the reads into separate files. It would not take long to write, but it wouldn't have the functionality that nicer scripts have (i.e., allowing 1 bp mismatches).
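As a rough illustration of the "allow 1 bp mismatches" feature mentioned above, here is a small Python sketch that bins reads by index read using Hamming distance. It is not any poster's script. The function names, the sample names and the second barcode (`ATCACG`) are assumptions for the example, and only `CTTGTA` comes from the snippets earlier in the thread.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_seq, barcodes, max_mismatches=1):
    """Return the sample whose barcode is within max_mismatches of index_seq.

    Returns None when nothing matches or when more than one barcode matches.
    """
    hits = [
        sample
        for barcode, sample in barcodes.items()
        if len(barcode) == len(index_seq)
        and hamming(barcode, index_seq) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(records, barcodes, max_mismatches=1):
    """Group (header, seq, qual, index_seq) records into FASTQ text per sample."""
    bins = {}
    for header, seq, qual, index_seq in records:
        sample = assign_sample(index_seq, barcodes, max_mismatches) or "undetermined"
        bins.setdefault(sample, []).append(f"{header}\n{seq}\n+\n{qual}\n")
    return bins


# Hypothetical sample sheet: barcode -> sample name.
barcodes = {"CTTGTA": "sampleA", "ATCACG": "sampleB"}
print(assign_sample("CTTGTT", barcodes))  # one mismatch away from CTTGTA
```

Each list in the returned dictionary can then be written to its own file, one per sample plus one for undetermined reads.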


                  • #10
                    No, I don't have any script yet, and I think fastq is generally a good place to start. I also think the handling of base-calling errors is important, especially as the number of multiplexed samples rises... However, appending the index to the front of the read might be feasible, as one could use other existing scripts like the one in the fastx toolkit from there... Do you have something along these lines already?
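On the point about base-calling errors and growing sample counts: one way to judge how many mismatches a given barcode set can tolerate is to look at the smallest Hamming distance between any two barcodes. With a minimum distance d, a read with at most (d - 1) // 2 errors in the index is still closer to its own barcode than to any other. A short Python sketch follows, where the function names and the example barcodes other than `CTTGTA` are illustrative assumptions.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    barcodes = list(barcodes)
    if len(barcodes) < 2:
        raise ValueError("need at least two barcodes")
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def max_safe_mismatches(barcodes):
    """Largest mismatch count that can never make a read ambiguous."""
    return (min_pairwise_distance(barcodes) - 1) // 2


# Example barcode set (only CTTGTA appears in this thread; the rest are made up).
example = ["CTTGTA", "ATCACG", "CGATGT"]
print(min_pairwise_distance(example), max_safe_mismatches(example))
```

As more samples are pooled, the minimum distance tends to shrink, which is why the allowed mismatch count should be checked against the actual sample sheet.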


                    • #11
                      Originally posted by NextGenSeb
                      No, I don't have any script yet, and I think fastq is generally a good place to start. I also think the handling of base-calling errors is important, especially as the number of multiplexed samples rises... However, appending the index to the front of the read might be feasible, as one could use other existing scripts like the one in the fastx toolkit from there... Do you have something along these lines already?
                      Try something like this (I'm sure expert linux users may have a better way, but this will work):

                      sed -n '2,${p;n;}' [index_read_1] | sed 's,$,^,' indexes | tr '^' '\n' > indexes_changed_1

                      paste [read_1] [indexes_changed_1] | sed 's,[tab_key_here (press "ctrl + v", then tab)],,' > read_1_done

                      And now you're good to go with the index at the end of the read. Then specify that correctly with the barcode splitter (and enter the complement/reverse complement barcodes if necessary for it to work... not sure what is needed).
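If the barcode splitter turns out to need the complement or reverse complement of the barcodes, they can be generated with a few lines of Python. This is a generic sketch with made-up function names. Which orientation the splitter expects is not settled in this thread.

```python
# Translation table for DNA bases (upper and lower case, N left as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement(seq):
    """Return the base-by-base complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return complement(seq)[::-1]


# The index read shown earlier in the thread.
print(complement("CTTGTA"))          # GAACAT
print(reverse_complement("CTTGTA"))  # TACAAG
```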


                      • #12
                        We have a very similar problem! We are getting to grips with a new MiSeq and the data outputs in fastq. We don't really have any way of combining the Read1, Read2 and Index read fastq files for complete analysis. The idea of pasting the Index read to the end of R1 or R2 would be useful. I have very little experience with Linux - what does the first line of script actually do?

                        sed -n '2,${p;n;}' [index_read_1] | sed 's,$,^,' indexes | tr '^' '\n' > indexes_changed_1

                        I've tried it with some sample reads, but got an error about not finding 'indexes'

                        Is galaxy any good for de-multiplexing based on the indexes or are there better scripts to use?


                        • #13
                          Originally posted by smallcompany
                          We have a very similar problem! We are getting to grips with a new MiSeq and the data outputs in fastq. We don't really have any way of combining the Read1, Read2 and Index read fastq files for complete analysis. The idea of pasting the Index read to the end of R1 or R2 would be useful. I have very little experience with Linux - what does the first line of script actually do?

                          sed -n '2,${p;n;}' [index_read_1] | sed 's,$,^,' indexes | tr '^' '\n' > indexes_changed_1

                          I've tried it with some sample reads, but got an error about not finding 'indexes'

                          Is galaxy any good for de-multiplexing based on the indexes or are there better scripts to use?
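                          Dammit, "indexes" should not be there at all, and the "^" needs to go at the start of each line, not the end. Try:
                          Code:
                          sed -n '2,${p;n;}' [index_read_1] | sed 's,^,^,' | tr '^' '\n' > indexes_changed_1
                          What that does: with the [index_read_1] input file, it first prints every other line, starting with the second line, which keeps the index sequence and its quality line and drops the header and "+" lines. Then it adds a "^" character to the start of each line. In the sed expression the first "^" is the start-of-line anchor and the second is a literal character; "^" works as a placeholder because it should not show up in the index sequences or in these quality strings. Finally, the "tr" command replaces each "^" with a newline character. That puts an empty line in front of every index line, so that after the paste step the index lands on the sequence and quality lines of each read rather than on the header and "+" lines.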
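The same index-appending step can also be sketched in Python, which avoids having to keep the pasted lines aligned by hand. This is only an illustration with hypothetical function names, and it assumes both files are plain 4-line FASTQ with records in the same order.

```python
from itertools import islice


def records(handle):
    """Yield 4-line FASTQ records as lists of stripped lines."""
    while True:
        chunk = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(chunk) < 4:
            return  # end of file or an incomplete trailing record
        yield chunk


def append_index(read_handle, index_handle, out_handle):
    """Write reads with the index sequence and quality appended to each record.

    Returns the number of records written.
    """
    count = 0
    for read, index in zip(records(read_handle), records(index_handle)):
        header, seq, plus, qual = read
        _, idx_seq, _, idx_qual = index
        out_handle.write(f"{header}\n{seq}{idx_seq}\n{plus}\n{qual}{idx_qual}\n")
        count += 1
    return count
```

It could be called with three open file objects, for example the read 1 file, the index read file, and an output file opened for writing.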


                          • #14
                            Originally posted by Heisman
                            Hey guys,

                            Our lab does not have any sequencers but we have access to a sequencing core. They recently acquired a couple of MiSeqs and thus far they are not able to give out the index read information (the machine demultiplexes the runs). As we don't have the MiSeq I have no way of knowing how exactly it or more importantly the data processing works, and I have no idea if there is a work around. I know they plan to try to address this starting next week but I'm curious if anybody here has any experience with this? I want the indexing reads along with the sequencing reads, not just the sequencing reads in different files.
                            Hello,

                            If all the multiplexed data belongs to your group, you can ask them for the whole run, which includes the .cif (intensity) files, .bcl (base call) files and .clocs (cluster location) files.

                            From these, you can generate not-demultiplexed fastq files with CASAVA and then demultiplex the files later.
                            You can also do base-calling with All-Your-Base.




                            For example, this will convert .cif files/.bcl files/.clocs files to fastq files (for dual indexes):

                            Code:
                            sequenceWorld=/rap/nne-790-ab/Instruments/Illumina_HiSeq_1000_Hellbound
                            run=111207_SNL131_0065_AC0947ACXX
                            NSLOTS=8
                            
                            
                            configureBclToFastq.pl \
                            --input-dir $sequenceWorld/$run/Data/Intensities/BaseCalls \
                            --output-dir  $sequenceWorld/$run/Fastq-Sequences \
                            --use-bases-mask Y*,Y*,Y*,Y*
                            
                            cd $sequenceWorld/$run/Fastq-Sequences
                            
                            make -j $NSLOTS



                            Then, you can demultiplex sequences.
                            We use FastDemultiplexer, which allows more mismatches than CASAVA 1.8.2.


                            Code:
                            FastDemultiplexer.py ../../SampleSheet-Nextera.csv \
                            Project_redacted/Sample_lane1 Demultiplexed > stat.txt

                            Sébastien Boisvert


                            • #15
                              Offline processing of MiSeq data

                              We have implemented the software to demux and convert bcl to fastq offline, which is necessary when the number of samples gets too high. Does anyone know how to toggle a MiSeq between doing all the processing during a typical run (a few small genomes) and generating only bcl files during a highly multiplexed run? Beyond a certain number of samples, the MiSeq chokes after the index reads, and it takes hours after the run is complete before the fastq files show up in the run folder in MiSeqOutput. Often it fails to even transfer all the fastq files into the run folder of MiSeqOutput. Has anyone else encountered this problem?
