
  • Retrieve MiSeq Data still containing index primers etc

    Hi all

    We have a pipeline that we have developed that currently works on both 454 and Ion Torrent data.

    The pipeline is always run on multiplexed data. Its input is currently a single fastq file that contains all the information needed for demultiplexing, and all subsequent analysis is then run separately on the data from each MID.

    Collaborators have now generated similar data on an Illumina MiSeq; however, the data they sent us is already demultiplexed, with the tags etc. stripped.

    What I want to know is: is there any way to create a single fastq/sff/etc. file from a MiSeq run (from the output data, or during data generation) that still has the MIDs etc. on the reads and holds all the data together in one file?

    I've done extensive reading on this, and it seems that the best way to do it is to convert the multiple .bcl files to fastq?

    Is there a better/easier way to do this?

    Thanks!

  • #2
    One can potentially create a single file from a multiplexed run by running the CASAVA pipeline with a single barcode like (NNNNNN-NNNNN). This way all the data ends up in the "undetermined" file, with all the tag information intact in the header. You would need to write a script to demultiplex this data or reformat it the way your pipeline expects. This could potentially work on the MiSeq itself (for analysis), though we have not tried it.
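A minimal sketch of the kind of demultiplexing script mentioned above, in Python. It assumes CASAVA 1.8-style headers like the ones quoted later in this thread, with the index sequence as the last colon-separated field of the read ID. The function names, index sequences, and reads below are invented for illustration and are not part of CASAVA.

```python
import io
from collections import defaultdict


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the iteration
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def index_from_header(header):
    """Return the last colon-separated field of the read ID.

    Assumption: the header looks like
    '@<instrument>:...:<x>:<y> <read>:<filtered>:<control>:<index>'.
    """
    return header.rsplit(":", 1)[1]


def group_by_index(handle):
    """Collect FASTQ records into lists keyed by the index found in each header."""
    groups = defaultdict(list)
    for record in read_fastq(handle):
        groups[index_from_header(record[0])].append(record)
    return groups


# Invented example records; the index sequences are placeholders.
example = io.StringIO(
    "@M02143:21:000000000-A8YDD:1:1101:16271:1876 1:N:0:ACGTAC\n"
    "ACGTACGTAC\n+\nIIIIIIIIII\n"
    "@M02143:21:000000000-A8YDD:1:1101:18009:1813 1:N:0:TTGCAA\n"
    "TTTTGGGGCC\n+\nIIIIIIIIII\n"
    "@M02143:21:000000000-A8YDD:1:1101:17000:1900 1:N:0:ACGTAC\n"
    "GGGGCCCCAA\n+\nIIIIIIIIII\n"
)

groups = group_by_index(example)
for index, records in sorted(groups.items()):
    print(index, len(records))
```

Each group could then be written to its own file, or the records rewritten into whatever layout the existing pipeline expects.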

    If you are looking to get the tag reads in a separate file (e.g. for QIIME), then you can reanalyze the data using MiSeq Reporter after making a change to the "MiSeqReporterConfig" file. Unless you own the MiSeq (or have direct access to it), this may not be an option for you.
    Last edited by GenoMax; 06-05-2014, 05:21 AM.



    • #3
      Thanks for your reply, very helpful!

      I hadn't realised that it was possible to output the index information in the header. In the data I've just received, the last element of the descriptor is a number rather than the index sequence. This may be because both the i5 and i7 indexes were used?


      The number appears to correspond to the order in which the samples are listed in the SampleSheet.csv file:

      e.g.
      1st @M02143:21:000000000-A8YDD:1:1101:16271:1876 2:N:0:1 used N701 and S501
      7th @M02143:21:000000000-A8YDD:1:1101:18009:1813 1:N:0:7 used N701 and S502

      So that would mean we could use the SampleSheet.csv file to 'demultiplex' the data if people upload all the data in a single fastq file (or two in the case of paired-end sequencing).
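A rough Python sketch of that lookup, under some assumptions: the sample sheet has a [Data] section whose header row includes I7_Index_ID and I5_Index_ID columns, and the number at the end of the read ID is the 1-based position of the sample in that section. Only rows 1 and 7 below come from the example headers above; the other rows, the sample IDs, the [Header] line, and the function names are invented placeholders.

```python
import csv
import io


def load_data_rows(handle):
    """Return the rows of the [Data] section as dicts, in sheet order."""
    lines = [line.rstrip("\r\n") for line in handle]
    # The section marker may carry trailing commas, so match on the prefix only.
    start = next(i for i, line in enumerate(lines) if line.startswith("[Data]")) + 1
    data_lines = []
    for line in lines[start:]:
        # Stop at an empty line, a comma-only line, or the next section.
        if not line.strip().strip(",") or line.startswith("["):
            break
        data_lines.append(line)
    return list(csv.DictReader(data_lines))


def sample_number(header):
    """Return the number in the last colon-separated field of the read ID."""
    return int(header.rsplit(":", 1)[1])


def indexes_for_read(header, rows):
    """Look up the (i7, i5) index IDs for a read via its sample number."""
    number = sample_number(header)
    if number < 1 or number > len(rows):
        raise ValueError(f"no sample sheet row for sample number {number}")
    row = rows[number - 1]
    return row["I7_Index_ID"], row["I5_Index_ID"]


# Invented sample sheet: rows 2 to 6 and all Sample_ID values are placeholders.
sheet = io.StringIO(
    "[Header]\n"
    "Workflow,GenerateFASTQ\n"
    "\n"
    "[Data]\n"
    "Sample_ID,I7_Index_ID,I5_Index_ID\n"
    "s1,N701,S501\n"
    "s2,N702,S501\n"
    "s3,N703,S501\n"
    "s4,N704,S501\n"
    "s5,N705,S501\n"
    "s6,N706,S501\n"
    "s7,N701,S502\n"
)

rows = load_data_rows(sheet)
first = indexes_for_read("@M02143:21:000000000-A8YDD:1:1101:16271:1876 2:N:0:1", rows)
seventh = indexes_for_read("@M02143:21:000000000-A8YDD:1:1101:18009:1813 1:N:0:7", rows)
print(first, seventh)
```

It would be worth checking the mapping against a few reads with known indexes before relying on it, since the numbering convention here is inferred from the two headers above.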

      Thanks again, you helped me find the missing clue to solve the puzzle!!



      • #4
        I always forget that the actual sequence tag information is included in the header ONLY if the data was processed offline by CASAVA (i.e. not on the MiSeq). So for the first option I mentioned, the analysis will have to be done offline (it would not work on the MiSeq) to get the tag information in the read IDs.
        Last edited by GenoMax; 06-05-2014, 05:24 AM.
