**proteasome** · 04-04-2012, 12:55 PM

I can't comment on AVA, since I find it too difficult to use. I use Galaxy instead to define custom workflows for our HLA analysis. I can say from experience that assembly will be difficult, because your reads won't be long enough or of high enough quality (unless you used FLX+ and got exceptionally good reads).

Our strategy (which is also not based on the Conexio software) is to split the reads into forward and reverse sequences, then trim them so that each read group abuts (but does not overlap) the reads from the other direction. In your case that would mean trimming the reads to ~373 bp. We then align each read against every possible reference allele using the alignment program BLAT with 100% stringency. Unlike BLAST, BLAT runs quickly enough to make this feasible (aligning 1,000s of reads against 1,000s of reference sequences). If you're computationally limited, you could reduce your reference set to include only the A-4 region you're interested in.
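As a rough illustration of the "100% stringency" step, the sketch below filters BLAT's default PSL output down to hits where the whole read matches a reference allele with no mismatches or gaps. That reading of "100% stringency" is an assumption, not a description of the actual Galaxy workflow, and `perfect_hits` is a hypothetical helper name. The column positions follow the standard 21-column PSL layout.

```python
def perfect_hits(psl_lines):
    """Yield (read_name, allele_name) for full-length, gap-free, mismatch-free hits.

    Assumes standard 21-column PSL rows as written by BLAT.
    """
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the PSL header block and anything that is not a data row.
        if len(fields) != 21 or not fields[0].isdigit():
            continue
        matches = int(fields[0])
        mismatches = int(fields[1])
        rep_matches = int(fields[2])
        query_gap_bases = int(fields[5])
        target_gap_bases = int(fields[7])
        read_name = fields[9]
        read_length = int(fields[10])
        allele_name = fields[13]
        # Assumption: "100% stringency" means every base of the read aligns
        # identically, with no insertions or deletions on either side.
        if (
            matches + rep_matches == read_length
            and mismatches == 0
            and query_gap_bases == 0
            and target_gap_bases == 0
        ):
            yield read_name, allele_name
```

Feeding this the lines of a `.psl` file gives one `(read, allele)` pair per perfect hit, which can then be collected into one allele set per read direction.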

We take that output and see which alleles matched each read group (typically between 15 and 100 per group). Then we do an inner join on the two datasets to eliminate alleles with improper SNPs, keeping only alleles matched by both the forward and the reverse reads. In your case you could then take those alleles and perform another inner join against your A2 and A3 matches.
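The inner join amounts to a set intersection on allele names. Here is a minimal sketch, assuming each read group has already been reduced to a set of matched alleles. `inner_join` is a hypothetical helper and the allele names are made up purely for illustration.

```python
def inner_join(*allele_sets):
    """Return only the alleles that appear in every input set."""
    if not allele_sets:
        return set()
    return set.intersection(*(set(alleles) for alleles in allele_sets))


# Illustrative inputs only; real groups would hold 15 to 100 alleles each.
forward_matches = {"A*02:01", "A*02:05", "A*03:01"}
reverse_matches = {"A*02:01", "A*03:01", "A*11:01"}

# Alleles consistent with both directions of the A-4 amplicon.
a4_matches = inner_join(forward_matches, reverse_matches)
print(sorted(a4_matches))  # ['A*02:01', 'A*03:01']

# A further join against the (hypothetical) A2/A3 amplicon matches.
a2_a3_matches = {"A*02:01", "A*24:02"}
print(sorted(inner_join(a4_matches, a2_a3_matches)))  # ['A*02:01']
```

An allele that carries a SNP incompatible with either direction never appears in that direction's set, so it drops out of the intersection.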

**Sheila** · 04-30-2012, 02:11 AM

Hi there,
How do you obtain the two sequences from both ends of the amplicon in separate files? How do you split them? Could you share the tool and parameters you use for this purpose?
Thanks in advance

S.

**proteasome** · 04-30-2012, 08:54 AM

We utilize the sfffile utility (a command line tool included with the Roche software) to split the original sff file first by MID, and then by primer sequence.

The first step is to do the primary splitting: `sfffile -s [MIDset_Name] -mcf [MIDconfig.parse] -o [output_folder] [inputSff]`

Note that you need to give the location of the MIDconfig.parse file as an argument. If you're using the default Roche MID set, you can use "GSMIDs" as the [MIDset_Name]. The documentation for how to do this is in the Roche software manual, but I can give you more detailed instructions if you need them.

This first command will create a group of sff files split by MID.

Next, we modify the MIDconfig.parse file to include a new set of "pseudo-MIDs" that correspond to the primers we're using. The format of the MID set and primer sequences is obvious once you look at the MIDconfig.parse file.
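For anyone who prefers to generate the pseudo-MID set rather than type it by hand, here is a small sketch that prints a block in the same `mid = "name", "sequence", errors;` layout as the sets in the default MIDconfig.parse. The set name `HLAPrimers`, the entry names, and the sequences are placeholders, not the real kit primers, and `format_mid_set` is a hypothetical helper.

```python
def format_mid_set(set_name, entries, max_errors=2):
    """Build a MID set block in the MIDconfig.parse layout.

    entries is a list of (name, dna_sequence) pairs; max_errors is the
    number of mismatches allowed when matching each sequence.
    """
    lines = [set_name, "{"]
    for name, sequence in entries:
        sequence = sequence.upper()
        if not sequence or set(sequence) - set("ACGT"):
            raise ValueError(f"{name}: sequence must contain only A, C, G, T")
        lines.append(f'mid = "{name}", "{sequence}", {max_errors};')
    lines.append("}")
    return "\n".join(lines)


# Placeholder sequences: substitute your own forward and reverse primers.
print(format_mid_set(
    "HLAPrimers",
    [("A4_forward", "ACGTACGTAC"), ("A4_reverse", "TGCATGCATG")],
))
```

The printed block would then be pasted into the user-defined section of a local copy of MIDconfig.parse.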

You re-run the command above, but give the program your unique primer set as the [MIDset_Name] parameter, and one of your primary split sff files as the [inputSff].

The program will then create unique sff files for each direction located in the [output_folder] directory.

If you're working with a lot of different MIDs, it is useful to write a basic script wrapper that repeats the primer split for each of the MID-specific sff files. I have a wrapper written in Perl that does this. Contact me individually if you'd like me to share it with you.
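For a rough idea of what such a wrapper looks like, here is a minimal Python sketch (not the Perl script mentioned above). It uses only the `sfffile` options from the command shown earlier, and the function names and directory layout are assumptions. In particular, how your `sfffile` version interprets the value given to `-o` is worth checking against the Roche manual before running it for real.

```python
import subprocess
from pathlib import Path


def build_split_command(mid_set, mid_config, output_target, input_sff):
    """Assemble the sfffile call using the argument order from the post."""
    return [
        "sfffile",
        "-s", mid_set,
        "-mcf", str(mid_config),
        "-o", str(output_target),
        str(input_sff),
    ]


def split_all(primary_dir, primer_set, mid_config, output_root, dry_run=True):
    """Run the primer split once per MID-specific sff file in primary_dir.

    With dry_run=True the commands are only built and returned, not executed.
    """
    commands = []
    for sff in sorted(Path(primary_dir).glob("*.sff")):
        # Assumption: one output target per input file, named after the file.
        target = Path(output_root) / sff.stem
        command = build_split_command(primer_set, mid_config, target, sff)
        commands.append(command)
        if not dry_run:
            Path(output_root).mkdir(parents=True, exist_ok=True)
            subprocess.run(command, check=True)
    return commands
```

Calling `split_all("mid_split", "HLAPrimers", "MIDconfig.parse", "primer_split")` with the default `dry_run=True` returns the command lists so you can inspect them first; the directory and set names here are placeholders.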

Hope this helps!

Simon

**jmrosa** · 07-24-2012, 03:42 AM

Hi,

Could you please give us an example of the MIDconfig.parse?

We analyse junior data and all we get as input is the .sff file.

Cheers!

**proteasome** · 07-25-2012, 08:23 AM

This is the default MIDconfig.parse file that's included with the software:

/*
**
** MIDConfig.parse
**
** This file contains the multiplex sequences used by the Genome Sequence
** MID library kits, and may contain user-defined sets of multiplex
** identifiers. This file is used by the post-run applications to access
** the defined MID sets.
**
** To use your own MID set, you can either copy this file to a local
** directory, add or edit your own sets (see below), then use the
** "-mcf" option of the mapper and assembler to specify the MID
** configuration file. Or, you can edit and save this file, to have
** your MID sets accessed by default by the post-run applications.
**
** To create a new MID set, copy the examples at the end of the file into
** the top section, then edit the text as follows:
**
** * The name of the MID set should begin the group (appear above the
** open brace '{')
**
** * Each line in the MID set should contain three values after the
** equals sign:
** - A name for the specific MID sequence
** - The DNA sequence of the MID
** - The number of errors allowed in matching to the sequence
**
** * The syntax of the line must be preserved (the "mid = " beginning,
** the semi-colon at the end of the line, the open and close braces
** for the set.
**
**
** Note: The names below use a combination of uppercase and lowercase
** characters, but all matching to the names is insensitive to
** case (so, for example "gsmids" will match the MID set below).
**
*******************************************************************************

/*
** User-defined MID sets.
*/

/*
** IMPORTANT: DO NOT EDIT BELOW THIS LINE.
**
** Genome Sequencer MID sets.
*/

GSMIDs
{
mid = "MID1", "ACGAGTGCGT", 2;
mid = "MID2", "ACGCTCGACA", 2;
mid = "MID3", "AGACGCACTC", 2;
mid = "MID4", "AGCACTGTAG", 2;
mid = "MID5", "ATCAGACACG", 2;
mid = "MID6", "ATATCGCGAG", 2;
mid = "MID7", "CGTGTCTCTA", 2;
mid = "MID8", "CTCGCGTGTC", 2;
mid = "MID9", "TAGTATCAGC", 2;
mid = "MID10", "TCTCTATGCG", 2;
mid = "MID11", "TGATACGTCT", 2;
mid = "MID12", "TACTGAGCTA", 2;
mid = "MID13", "CATAGTAGTG", 2;
mid = "MID14", "CGAGAGATAC", 2;
}

**jmrosa** · 07-27-2012, 03:12 AM

Many thanks, I'll test it.


Analysis of A-4 amplicon produced by Roche HLA Primer Kit
