SEQanswers

Old 04-03-2012, 09:03 PM   #1
jimmybee
Senior Member
 
Location: Adelaide, Australia

Join Date: Sep 2010
Posts: 119
Template-specific bidirectional demultiplexing of sff files from 454

Hey guys,

I'm looking to demultiplex a single SFF file into ~480 samples using a fairly complex primer design. We have 3 pools of samples with plate- and template-specific sequences (6 template-specific sequences: F + R x 3 pools), as well as 13 454 MIDs used on both the forward and reverse primers. The design looks very similar to this:

Forward primer (Primer A-Key):

5’ - CGTATCGCCTCCCTCGCGCCATCAG - {MID} - {template-specific-sequence} - 3’

Reverse primer (Primer B-Key):

5’ - CTATGCGCCTTGCCAGCCCGCTCAG - {MID} - {template-specific-sequence} - 3’


Now, we've set up the samples so that each one has a different F-MID/R-MID combination (using the 13 MIDs available), but we are having difficulty assigning the reverse read, whereas the forward read demultiplexes nicely. I suspect that we are demultiplexing in the wrong order. We chose to demultiplex by MID first and then by template-specific sequence. If I were to demultiplex on the template-specific sequence first, would that trim off the MID identifier and leave me unable to demultiplex on the MIDs?

Has anyone had experience with, or advice on, demultiplexing this kind of project design?
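
For reference, here is a minimal sketch of what a full-length read from the primer layout above should look like once the adaptor and key have been trimmed. It assumes the read starts at the forward primer and runs through the whole amplicon, so the reverse primer's template-specific sequence and MID show up reverse-complemented at the 3' end. The function names and all sequences are hypothetical placeholders, not the actual MIDs or primers used in this project.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def expected_read(f_mid, f_template, insert, r_template, r_mid):
    """Build the sequence expected for a read that starts at the forward
    primer and spans the whole amplicon (adaptor/key already trimmed).

    The reverse primer is written 5'->3' on the opposite strand, so its
    parts appear reverse-complemented, and in reverse order, at the 3' end.
    """
    return (
        f_mid
        + f_template
        + insert
        + reverse_complement(r_template)
        + reverse_complement(r_mid)
    )


# Placeholder sequences for illustration only.
read = expected_read(
    f_mid="ACGAGTGCGT",
    f_template="GGATTA",
    insert="TTTTCCCC",
    r_template="CCGTAA",
    r_mid="ACGCTCGACA",
)
print(read)
```

The point of the sketch is that the 3' MID has to be searched for as its reverse complement (or the read has to be reverse-complemented first), which is one possible reason a 5'-only barcode matcher would miss it.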

Last edited by jimmybee; 04-03-2012 at 09:08 PM.
Old 04-04-2012, 02:51 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

Are you trying to work with the Roche off-instrument application suite to do the demultiplexing? There are other options if you program and want to work directly with the SFF files, e.g. Biopython and BioHaskell.
Old 04-04-2012, 12:59 PM   #3
jimmybee
Senior Member
 
Location: Adelaide, Australia

Join Date: Sep 2010
Posts: 119

Yeah, I'm using the command-line tools in a couple of scripts. I'm not so worried about the tools at the moment, just the process. I wanted some feedback and advice on what others have run into and how they got around it.

I've always wanted to look into Biopython but never found the right project. I might spend a bit of time with it on this.
Old 04-04-2012, 02:20 PM   #4
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,150

Quote:
Originally Posted by jimmybee View Post
...but we are having difficulties assigning the reverse read, compared to the forward read which we can demultiplex nicely
Jimmybee,

I'm a bit confused as to what you mean by reverse read and forward read. The Roche/454 sequencer is only capable of generating a single, unidirectional read for any given DNA molecule. There are no 'forward' and 'reverse' reads in 454 sequencing. For both of your MID tags to be useful you would have to make sure that the PCR product generated by your fusion primers is short enough that the entire length of the amplicon can be sequenced on the GS-FLX (Titanium, + or whatever). This would result in barcode A at the 5' end of your read and barcode B at the 3' end.

Since dual indexing is not supported by Roche/454, their tools won't help with the second index. You will have to do the sorting in two stages. Assuming your amplicon is sized such that the reads span its entire length, through the second barcode and into the key tag and primer B, the 454 runProcessor should have recognized the key tag and primer at the 3' end and trimmed them, leaving your barcode as the 3' end of your reads. You could use the SFF tools to sort based on the 5' barcode. Then feed the resultant FASTA files into one of the various barcode sorting tools available (no specific recommendation, check the SeqAnswers Software Wiki) to sub-divide each further. It's also my experience that most barcode sorting scripts expect the tag to be at the 5' end of the read, which means you would need to reverse-complement the reads after the first step.
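
The two-stage sort described above can be sketched roughly as follows. This is an illustration only, not the Roche SFF tools or any particular barcode sorter: it works on plain sequence strings, uses exact matching with no allowance for sequencing errors, and the `assign_sample` function, the MID names, and the sequences are all hypothetical placeholders.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def assign_sample(read, f_mids, r_mids):
    """Assign a read to an (F-MID, R-MID) pair, or return None.

    f_mids and r_mids map MID names to sequences written 5'->3' as they
    appear in the forward and reverse primers respectively.
    """
    # Stage 1: look for a forward MID at the 5' end of the read.
    f_hit = next(
        (name for name, mid in f_mids.items() if read.startswith(mid)), None
    )
    if f_hit is None:
        return None

    # Stage 2: reverse-complement the read so that the 3' barcode sits at
    # the 5' end, then look for a reverse MID in the same way.
    flipped = reverse_complement(read)
    r_hit = next(
        (name for name, mid in r_mids.items() if flipped.startswith(mid)), None
    )
    if r_hit is None:
        return None

    # Strip only the two MIDs; the template-specific sequences stay in place.
    inner = read[len(f_mids[f_hit]): len(read) - len(r_mids[r_hit])]
    return f_hit, r_hit, inner


# Placeholder MIDs and read for illustration only.
mids = {"MID_A": "ACGAGTGCGT", "MID_B": "ACGCTCGACA"}
example_read = "ACGAGTGCGT" + "GGATTATTTTCCCCTTACGG" + "TGTCGAGCGT"
result = assign_sample(example_read, mids, mids)
print(result)
```

Because only the MIDs are stripped here, the template-specific sequences remain available for a later sort. A real script would also need to tolerate mismatches and homopolymer errors in the barcodes.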
Old 04-04-2012, 03:22 PM   #5
jimmybee
Senior Member
 
Location: Adelaide, Australia

Join Date: Sep 2010
Posts: 119

Quote:
Originally Posted by kmcarr View Post
Jimmybee,

I'm a bit confused as to what you mean by reverse read and forward read. The Roche/454 sequencer is only capable of generating a single, unidirectional read for any given DNA molecule. There are no 'forward' and 'reverse' reads in 454 sequencing. For both of your MID tags to be useful you would have to make sure that the PCR product generated by your fusion primers is short enough that the entire length of the amplicon can be sequenced on the GS-FLX (Titanium, + or whatever). This would result in barcode A at the 5' end of your read and barcode B at the 3' end.
This is fine; we have only a 215 bp fragment length, so we're definitely catching the 3' barcode. Sorry, I didn't exactly mean read (more the forward and reverse directions).

Quote:
Originally Posted by kmcarr View Post
Since dual indexing is not supported by Roche/454, their tools won't help with the second index. You will have to do the sorting in two stages. Assuming your amplicon is sized such that the reads span its entire length, through the second barcode and into the key tag and primer B, the 454 runProcessor should have recognized the key tag and primer at the 3' end and trimmed them, leaving your barcode as the 3' end of your reads. You could use the SFF tools to sort based on the 5' barcode. Then feed the resultant FASTA files into one of the various barcode sorting tools available (no specific recommendation, check the SeqAnswers Software Wiki) to sub-divide each further. It's also my experience that most barcode sorting scripts expect the tag to be at the 5' end of the read, which means you would need to reverse-complement the reads after the first step.
Thanks for the advice. I think I'll use either BioPerl or Biopython to build something solid now, as we expect to be doing similar projects in the future. I've heard that the Roche tools are no help with this sort of setup.

All times are GMT -8.