SEQanswers

04-04-2012, 01:16 AM   #1
GraemeFox
Member
 
Location: Manchester

Join Date: Oct 2011
Posts: 14
Analysis of A-4 amplicon produced by Roche HLA Primer Kit

I have been running the Roche HLA kits (both high and medium resolution) on a GS Junior and getting good results, with the exception of one amplicon.

The A-4 amplicon (~746 bp) is too long to be sequenced in one continuous read.

I understand that the ends will only be sequenced in one direction but the middle section should be sequenced in both, meaning that the correct sequence can be assembled using AVA.

How can AVA be sure that the 2 sequences it uses to produce the consensus sequence are the correct ones?

If variation occurs outside of that middle region which is sequenced in both directions it cannot be verified that the start and end of the consensus sequence belong together, can it?

At the moment HLA genotyping is being done in-house (not using Conexio, as Roche recommends). Because the A-4 sequences do not have MIDs on both ends of the sequence straight out of the .sff file, they are being missed by the software.

Assembling the sequences seems the obvious thing to do but I'm unsure about the validity of results produced by AVA.

Anybody have any thoughts?
04-04-2012, 01:55 PM   #2
proteasome
Member
 
Location: Wisconsin

Join Date: Jul 2009
Posts: 21

I cannot comment on the use of AVA, since I find it too difficult to use. I use Galaxy instead to define custom workflows for our HLA analysis. I can say from experience that assembly will be difficult, since you won't have long enough reads of high enough quality (unless you used FLX+ and got exceptionally good reads).

Our strategy (which is also not based on the Conexio software) is to split the reads into forward and reverse sequences, then trim them so that each read group abuts (but does not overlap) the reads from the other direction. In your case that would mean trimming the reads to ~373 bp. We then align each read against every possible reference allele using the alignment program BLAT with 100% stringency. Unlike BLAST, BLAT runs quickly enough that this is feasible (aligning thousands of reads against thousands of reference sequences). If you're computationally limited, you could reduce your reference set to include only the A-4 region you're interested in.
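
As a rough illustration of the trimming arithmetic (a sketch only, not the actual Galaxy workflow): assuming each read starts at the first base of its end of the amplicon and is stored in the direction it was sequenced, cutting every read to half the amplicon length makes the forward and reverse groups abut without overlapping. The function names are made up for this example.

```python
def abutting_lengths(amplicon_len):
    """Split an amplicon length into two read-group lengths that abut
    but do not overlap (the reverse group takes the odd base, if any)."""
    forward_len = amplicon_len // 2
    reverse_len = amplicon_len - forward_len
    return forward_len, reverse_len


def trim_read(seq, max_len):
    """Keep at most the first max_len bases of a read, as sequenced.
    Reads shorter than max_len are returned unchanged."""
    return seq[:max_len]


# For the ~746 bp A-4 amplicon mentioned in this thread:
fwd_len, rev_len = abutting_lengths(746)
print(fwd_len, rev_len)  # 373 373
```

Reads shorter than the target length would still need to be handled (dropped or aligned as they are); that choice is left out here.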

We take that output and see which alleles matched each read group (typically between 15 and 100 per group). Then we do an inner join on the two datasets to eliminate alleles with improper SNPs. In your case you could then take those alleles and perform another inner join against your A2 and A3 matches.
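
The inner join can be pictured as a plain set intersection. This is a minimal sketch, assuming the perfect-match allele names for each read group have already been pulled out of the BLAT output; the allele labels below are placeholders, not real results.

```python
def inner_join(*hit_lists):
    """Return the allele names present in every hit list, sorted."""
    if not hit_lists:
        return []
    common = set(hit_lists[0])
    for hits in hit_lists[1:]:
        common &= set(hits)
    return sorted(common)


# Placeholder allele labels, for illustration only.
forward_hits = ["allele_A", "allele_B", "allele_C"]
reverse_hits = ["allele_B", "allele_C", "allele_D"]

print(inner_join(forward_hits, reverse_hits))  # ['allele_B', 'allele_C']
```

Passing further hit lists (for example, the matches from other amplicons) to the same function narrows the candidate set again.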
04-30-2012, 03:11 AM   #3
Sheila
Member
 
Location: Europe

Join Date: Jun 2009
Posts: 17

Hi there,
How do you obtain the two sequences from both ends of the amplicon in separate files? How do you split them? Could you share the tool and parameters you use for this purpose?
Thanks in advance

S.
04-30-2012, 09:54 AM   #4
proteasome
Member
 
Location: Wisconsin

Join Date: Jul 2009
Posts: 21

We utilize the sfffile utility (a command line tool included with the Roche software) to split the original sff file first by MID, and then by primer sequence.

The first step is to do the primary splitting: `sfffile -s [MIDset_Name] -mcf [MIDconfig.parse] -o [output_folder] [inputSff]`

Note that you need to give the location of the MIDconfig.parse file as an argument. If you're using the default Roche MID set, you can use "GSMIDs" as the [MIDset_Name]. The documentation for how to do this is in the Roche software manual, but I can give you more detailed instructions if you need them.

This first command will create a group of sff files split by MID.
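
If you want to drive the split from a script, here is a minimal sketch that only assembles the argument list for the command shown above. The file and folder names are hypothetical, and nothing is executed; handing the list to `subprocess.run` would do that on a machine where the Roche tools are installed.

```python
def build_split_command(mid_set, mid_config, output_folder, input_sff):
    """Assemble the sfffile call from this post as an argument list.
    Only the options quoted in the post (-s, -mcf, -o) are used."""
    return ["sfffile", "-s", mid_set, "-mcf", mid_config,
            "-o", output_folder, input_sff]


# Hypothetical paths, for illustration only.
cmd = build_split_command("GSMIDs", "MIDConfig.parse", "split_by_mid", "run1.sff")
print(" ".join(cmd))
# sfffile -s GSMIDs -mcf MIDConfig.parse -o split_by_mid run1.sff
```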

Next, we modify the MIDconfig.parse file to include a new set of "pseudo-MIDs" which correspond to the primers we're using. The format of the MID sets and primer sequences is obvious once you look at the MIDconfig.parse file.
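
A small sketch of how such a pseudo-MID set could be generated as text, following the `mid = "name", "sequence", errors;` layout used in MIDconfig.parse. The set name, entry names and sequences below are made-up placeholders (not the real HLA kit primers), and allowing 2 errors simply mirrors the default MID entries; pick whatever suits your primers.

```python
def format_mid_set(set_name, entries, errors=2):
    """Render a MID set block: the set name, an open brace, one
    'mid = ...;' line per (name, sequence) pair, and a close brace."""
    lines = [set_name, "{"]
    for name, seq in entries:
        lines.append(f'mid = "{name}", "{seq}", {errors};')
    lines.append("}")
    return "\n".join(lines)


# Placeholder names and sequences, for illustration only.
block = format_mid_set("HLA_A4_primers",
                       [("A4_fwd", "ACGTACGTAC"), ("A4_rev", "TGCATGCATG")])
print(block)
```

The printed block would be pasted into the user-defined section of a copy of the MIDconfig.parse file.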

You re-run the command above, but give the program your unique primer set as the [MIDset_Name] parameter, and one of your primary split sff files as the [inputSff].

The program will then create unique sff files for each direction located in the [output_folder] directory.

If you're working with a lot of different MIDs, it is useful to write a basic script wrapper for recursively splitting each of the MID-specific sff files. I have a wrapper written in Perl that does this. Contact me individually if you'd like me to share it with you.
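
For anyone who prefers Python, a wrapper along these lines might look like the sketch below. It is not the Perl script mentioned above, just an illustration: it lists the second-pass sfffile commands for every .sff file in a folder, and assumes (my choice, not a requirement) one output folder per MID-specific file. The folder and set names are hypothetical, and the commands are only built, not run.

```python
from pathlib import Path


def second_pass_commands(split_dir, primer_set, mid_config, out_root):
    """Build one sfffile command per MID-specific .sff file in split_dir,
    splitting each by the primer pseudo-MID set."""
    commands = []
    for sff in sorted(Path(split_dir).glob("*.sff")):
        out_dir = Path(out_root) / sff.stem  # e.g. out_root/<name of sff file>
        commands.append(["sfffile", "-s", primer_set, "-mcf", str(mid_config),
                         "-o", str(out_dir), str(sff)])
    return commands


# Hypothetical folder and set names, for illustration only.
for cmd in second_pass_commands("split_by_mid", "HLA_A4_primers",
                                "MIDConfig.parse", "split_by_primer"):
    print(" ".join(cmd))
```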

Hope this helps!

Simon
07-24-2012, 04:42 AM   #5
jmrosa
Junior Member
 
Location: Madrid

Join Date: Sep 2010
Posts: 3

Hi,

Could you please give us an example of the MIDconfig.parse?

We analyse GS Junior data and all we get as input is the .sff file.

Cheers!
07-25-2012, 09:23 AM   #6
proteasome
Member
 
Location: Wisconsin

Join Date: Jul 2009
Posts: 21

This is the default MIDconfig.parse file that's included with the software:

/*
**
** MIDConfig.parse
**
** This file contains the multiplex sequences used by the Genome Sequence
** MID library kits, and may contain user-defined sets of multiplex
** identifiers. This file is used by the post-run applications to access
** the defined MID sets.
**
** To use your own MID set, you can either copy this file to a local
** directory, add or edit your own sets (see below), then use the
** "-mcf" option of the mapper and assembler to specify the MID
** configuration file. Or, you can edit and save this file, to have
** your MID sets accessed by default by the post-run applications.
**
** To create a new MID set, copy the examples at the end of the file into
** the top section, then edit the text as follows:
**
** * The name of the MID set should begin the group (appear above the
** open brace '{')
**
** * Each line in the MID set should contain three values after the
** equals sign:
** - A name for the specific MID sequence
** - The DNA sequence of the MID
** - The number of errors allowed in matching to the sequence
**
** * The syntax of the line must be preserved (the "mid = " beginning,
** the semi-colon at the end of the line, the open and close braces
** for the set.
**
**
** Note: The names below use a combination of uppercase and lowercase
** characters, but all matching to the names is insensitive to
** case (so, for example "gsmids" will match the MID set below).
**
*******************************************************************************

/*
** User-defined MID sets.
*/





/*
** IMPORTANT: DO NOT EDIT BELOW THIS LINE.
**
** Genome Sequencer MID sets.
*/

GSMIDs
{
mid = "MID1", "ACGAGTGCGT", 2;
mid = "MID2", "ACGCTCGACA", 2;
mid = "MID3", "AGACGCACTC", 2;
mid = "MID4", "AGCACTGTAG", 2;
mid = "MID5", "ATCAGACACG", 2;
mid = "MID6", "ATATCGCGAG", 2;
mid = "MID7", "CGTGTCTCTA", 2;
mid = "MID8", "CTCGCGTGTC", 2;
mid = "MID9", "TAGTATCAGC", 2;
mid = "MID10", "TCTCTATGCG", 2;
mid = "MID11", "TGATACGTCT", 2;
mid = "MID12", "TACTGAGCTA", 2;
mid = "MID13", "CATAGTAGTG", 2;
mid = "MID14", "CGAGAGATAC", 2;
}
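
If you want to read these entries from a script (for instance to check which MIDs a set defines), here is a minimal sketch. It assumes entries always follow the one-per-line `mid = "name", "sequence", errors;` layout shown above; the function name is made up.

```python
import re

# Matches lines such as: mid = "MID1", "ACGAGTGCGT", 2;
MID_LINE = re.compile(r'mid\s*=\s*"([^"]+)"\s*,\s*"([ACGTacgt]+)"\s*,\s*(\d+)\s*;')


def parse_mid_entries(text):
    """Return (name, sequence, allowed_errors) for every mid line in text."""
    return [(name, seq, int(errors))
            for name, seq, errors in MID_LINE.findall(text)]


sample = '''GSMIDs
{
mid = "MID1", "ACGAGTGCGT", 2;
mid = "MID2", "ACGCTCGACA", 2;
}'''

print(parse_mid_entries(sample))
# [('MID1', 'ACGAGTGCGT', 2), ('MID2', 'ACGCTCGACA', 2)]
```

Note that this collects entries from all sets in the text without tracking which set each one belongs to.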
07-27-2012, 04:12 AM   #7
jmrosa
Junior Member
 
Location: Madrid

Join Date: Sep 2010
Posts: 3

Many thanks, I'll test it.