  • Template-specific bidirectional demultiplexing of SFF files from 454

    Hey guys,

    I'm looking to demultiplex a single SFF file into ~480 samples using a fairly complex primer design. We have 3 pools of samples with plate- and template-specific sequences (6 template-specific sequences: F + R × 3 pools), as well as a set of 13 454 MIDs used on both the forward and the reverse primers. It looks very similar to this:

    Forward primer (Primer A-Key):

    5’ - CGTATCGCCTCCCTCGCGCCATCAG - {MID} - {template-specific-sequence} - 3’

    Reverse primer (Primer B-Key):

    5’ - CTATGCGCCTTGCCAGCCCGCTCAG - {MID} - {template-specific-sequence} - 3’
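    As a quick sanity check on the design: with 13 MIDs available on each primer there are 13 × 13 = 169 F-MID/R-MID pairs, and if the 3 pools are told apart by their template-specific sequences (an assumption based on the description above), that gives 3 × 169 = 507 distinct labels, enough for ~480 samples. A minimal Python sketch, with hypothetical labels standing in for the real MID sequences:

```python
from itertools import product

# Assumed from the post: 13 MIDs usable on each primer and 3 sample pools.
# "MID1"... and "pool1"... are hypothetical placeholder labels, not sequences.
MIDS = [f"MID{i}" for i in range(1, 14)]
POOLS = ["pool1", "pool2", "pool3"]

def sample_keys(pools, mids):
    """Every (pool, forward MID, reverse MID) combination the design can label."""
    return list(product(pools, mids, mids))

keys = sample_keys(POOLS, MIDS)
print(len(keys))          # 507, i.e. 3 * 13 * 13
print(len(keys) >= 480)   # True: enough labels for ~480 samples
```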


    Now, we've set up the samples so that they carry different F-MID/R-MID combinations (using the 13 MIDs available), but we are having difficulty assigning the reverse reads, whereas the forward reads demultiplex nicely. I suspect that we are demultiplexing in the wrong order: we chose to demultiplex on the MIDs first and then on the template-specific sequence. If I were to demultiplex on a template-specific sequence first, would that trim off the MID identifier and leave me unable to demultiplex on MIDs?
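    One way to see the ordering problem is the sketch below. It assumes reads start directly with the MID (key already clipped) and that a template-first trimming step discards everything up to the end of the template-specific primer; both are assumptions, and the sequences are made-up placeholders rather than real Roche MIDs or your primers. Identifying every tag on the untrimmed read and trimming only at the end keeps the MID; cutting at the template-specific primer first throws it away.

```python
# Placeholder sequences (NOT real Roche MIDs or your template-specific primers).
MIDS = {"MID_A": "ACACACACAC", "MID_B": "TGTGTGTGTG"}
TEMPLATE_F = {"pool1_F": "GGATCCTTAG", "pool2_F": "CCTAGGAATC"}

def classify_then_trim(read):
    """Identify the MID and template primer on the untrimmed read, then trim."""
    for mid_name, mid in MIDS.items():
        if not read.startswith(mid):
            continue
        rest = read[len(mid):]
        for tpl_name, tpl in TEMPLATE_F.items():
            if rest.startswith(tpl):
                # Both labels are known; only now are the tags removed.
                return mid_name, tpl_name, rest[len(tpl):]
    return None

def trim_template_first(read):
    """Hypothetical trimmer: drop everything up to and including the template primer."""
    for tpl in TEMPLATE_F.values():
        pos = read.find(tpl)
        if pos != -1:
            return read[pos + len(tpl):]
    return read

read = "ACACACACAC" + "GGATCCTTAG" + "TTTTGGGGCCCCAAAA"
print(classify_then_trim(read))   # ('MID_A', 'pool1_F', 'TTTTGGGGCCCCAAAA')
print(trim_template_first(read))  # TTTTGGGGCCCCAAAA (the MID is gone)
```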

    Does anyone have experience with, or advice on, demultiplexing a design like this?
    Last edited by jimmybee; 04-03-2012, 09:08 PM.

  • #2
    Are you trying to use the Roche off-instrument application suite to do the demultiplexing? There are other options if you program and want to work directly with the SFF files, e.g. Biopython and BioHaskell.



    • #3
      Yeah, I'm using the command-line tools in a couple of scripts. I'm not so worried about the tools as about the process at the moment. I just wanted some feedback and advice from others who have run into this and found a way around it.

      I've always wanted to look into Biopython but never found the right project. I might spend a bit of time with it on this.



      • #4
        Originally posted by jimmybee
        ...but we are having difficulty assigning the reverse reads, whereas the forward reads demultiplex nicely
        Jimmybee,

        I'm a bit confused as to what you mean by reverse read and forward read. The Roche/454 sequencer is only capable of generating a single, unidirectional read for any given DNA molecule. There are no 'forward' and 'reverse' reads in 454 sequencing. For both of your MID tags to be useful, you would have to make sure that the PCR product generated by your fusion primers is short enough that the entire length of the amplicon can be sequenced on the GS-FLX (Titanium, FLX+ or whatever). This would result in barcode A at the 5' end of your read and barcode B (reverse-complemented) at the 3' end.
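        To make the layout concrete, here is a minimal Python sketch of a read that spans the whole amplicon, assuming the key and the 3' adapter have already been clipped; the sequences are placeholders. The reverse primer's template-specific sequence and MID show up reverse-complemented at the 3' end, so matching the reverse MID as written, rather than its reverse complement, will miss it.

```python
COMP = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMP)[::-1]

def expected_read(mid_f, tpl_f, insert, tpl_r, mid_r):
    """Full-length read after key and 3' adapter clipping (assumed layout)."""
    # The forward primer is read as written. The reverse primer is incorporated
    # into the opposite strand, so on the sequenced strand its tags appear
    # reverse-complemented at the 3' end.
    return mid_f + tpl_f + insert + revcomp(tpl_r) + revcomp(mid_r)

# Placeholder sequences, not real Roche MIDs or primers.
mid_f, tpl_f = "ACACACACAC", "GGATCCTTAG"
mid_r, tpl_r = "TGCATGCATG", "CAGTCAGTCA"
read = expected_read(mid_f, tpl_f, "A" * 20, tpl_r, mid_r)

print(read.startswith(mid_f))           # True: barcode A at the 5' end
print(read.endswith(revcomp(mid_r)))    # True: barcode B at the 3' end, reverse-complemented
print(revcomp(read).startswith(mid_r))  # True: after reverse-complementing, barcode B leads
```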

        Since dual indexing is not supported by Roche/454, their tools won't help with the second index. You will have to do the sorting in two stages. Assuming your amplicon is sized so that the reads span its entire length, through the second barcode and into the key tag and primer B, the 454 runProcessor should have recognized the key tag and primer at the 3' end and trimmed them, leaving your barcode at the 3' end of your reads. You could use the SFF tools to sort on the 5' barcode, then feed the resulting FASTA files into one of the various barcode-sorting tools available (no specific recommendation; check the SeqAnswers Software Wiki) to subdivide each one further. In my experience, most barcode-sorting scripts expect the tag to be at the 5' end of the read, which means you would need to reverse-complement the reads after the first step.
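        The two-stage sort described above can be sketched in plain Python. It works on (read ID, sequence) pairs, for example as exported from the SFF file; Biopython can read SFF directly via `SeqIO.parse(path, "sff-trim")`, but it is not used here. The one-mismatch tolerance, the tie rejection, and the MID names are illustrative choices, not Roche's or any tool's settings, and Hamming distance ignores the homopolymer indels typical of 454 reads.

```python
from collections import defaultdict

COMP = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMP)[::-1]

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def match_prefix(read, barcodes, max_mismatches=1):
    """Name of the unique best barcode at the read's 5' end, or None."""
    hits = []
    for name, bc in barcodes.items():
        if len(read) >= len(bc):
            d = hamming(read[:len(bc)], bc)
            if d <= max_mismatches:
                hits.append((d, name))
    if not hits:
        return None
    hits.sort()
    # Reject ties so an ambiguous read is not forced into a sample.
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None
    return hits[0][1]

def two_stage_sort(reads, mids):
    """Stage 1: 5' MID. Stage 2: reverse-complement, then match the 3' MID at the new 5' end."""
    bins = defaultdict(list)
    for read_id, seq in reads:
        fwd = match_prefix(seq, mids)
        rev = match_prefix(revcomp(seq), mids) if fwd is not None else None
        bins[(fwd, rev)].append(read_id)
    return dict(bins)

# Placeholder MIDs and reads for illustration only.
MIDS = {"MID_A": "ACACACACAC", "MID_B": "TGCATGCATG"}
reads = [
    ("r1", "ACACACACAC" + "GGGGTTTT" + revcomp("TGCATGCATG")),
    ("r2", "TTTTTTTTTTGGGG"),
]
print(two_stage_sort(reads, MIDS))
# {('MID_A', 'MID_B'): ['r1'], (None, None): ['r2']}
```

        Keeping both stages in one pass means each read's (5' MID, 3' MID) pair is decided on the same sequence, which can then be looked up against a sample sheet together with the template-specific sequence.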



        • #5
          Originally posted by kmcarr
          ...There are no 'forward' and 'reverse' reads in 454 sequencing. ... This would result in barcode A at the 5' end of your read and barcode B at the 3' end.
          That's fine: we have only a 215 bp fragment, so we're definitely catching the 3' barcode. Sorry, I didn't literally mean reads (more the forward and reverse directions).
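          Whether a run really catches the 3' barcode can be checked against the observed read lengths. A minimal sketch, assuming the 215 bp is the insert between the template-specific primers and that lengths are counted from the first base after the key; the 10-nt MID and 20-nt primer lengths and the read lengths in the example are hypothetical:

```python
def fraction_reaching_mid_b(read_lengths, insert_len, mid_len, tpl_f_len, tpl_r_len):
    """Fraction of reads long enough to contain the complete reverse (3') MID.

    Assumes the read layout MID-F + template-F + insert + template-R(rc) + MID-R(rc),
    with lengths counted from the first base after the key.
    """
    needed = mid_len + tpl_f_len + insert_len + tpl_r_len + mid_len
    if not read_lengths:
        return 0.0
    return sum(n >= needed for n in read_lengths) / len(read_lengths)

# Hypothetical: 10-nt MIDs, 20-nt template primers, 215-nt insert (275 bases needed).
print(fraction_reaching_mid_b([180, 260, 290, 400, 450], 215, 10, 20, 20))  # 0.6
```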

          Originally posted by kmcarr
          ...Since dual indexing is not supported by Roche/454, their tools won't help with the second index. You will have to do the sorting in two stages...
          Thanks for the advice. I think I'll use either BioPerl or Biopython to build something solid now, as we're planning similar projects in the future. I've heard that the Roche tools are no help with this sort of setup.
