
  • Analysis of A-4 amplicon produced by Roche HLA Primer Kit

    I have been running the Roche HLA kits (both high and medium resolution) on a GS Junior and getting good results, with the exception of one amplicon.

    The A-4 amplicon is too long to be sequenced in one continuous read (~746bp).

    I understand that the ends will only be sequenced in one direction but the middle section should be sequenced in both, meaning that the correct sequence can be assembled using AVA.

    How can AVA be sure that the two sequences it uses to produce the consensus sequence are the correct ones?

    If variation occurs outside the middle region that is sequenced in both directions, it cannot be verified that the start and end of the consensus sequence belong together, can it?

    At the moment HLA genotyping is being done in-house (not using Conexio, as Roche recommends). Because the A-4 reads straight out of the .sff file do not have MIDs on both ends, they are being missed by the software.

    Assembling the sequences seems the obvious thing to do but I'm unsure about the validity of results produced by AVA.

    Anybody have any thoughts?

  • #2
    I cannot comment on the use of AVA, since I find it too difficult to use. I use Galaxy instead to define custom workflows for our HLA analysis. I can say from experience that assembly will be difficult, since you won't have long enough reads of high enough quality (unless you used FLX+ and got exceptionally good reads).

    Our strategy (which is also not based on the Conexio software) is to split the reads into forward and reverse sequences, then trim them so that each read group abuts (but does not overlap) with the reads from the other direction. In your case that would mean trimming the reads to ~373 bp. We then align each read against every possible reference allele using the alignment program BLAT with 100% stringency. Unlike BLAST, BLAT runs quickly enough that this is feasible to do (align 1,000s of reads against 1,000s of reference sequences). If you're computationally limited you could reduce your reference set to only include the A-4 region you're interested in.
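
    As a rough illustration of the trimming step only (a sketch, not the actual Galaxy workflow), the abutting read lengths can be derived from the amplicon length. The function names here are hypothetical, and the sketch assumes the reads have already been separated by direction and that reads shorter than the target length are simply left as they are.

```python
def abutting_lengths(amplicon_len):
    """Return (forward_len, reverse_len) so the two read groups meet
    at the middle of the amplicon without overlapping."""
    forward_len = amplicon_len // 2
    reverse_len = amplicon_len - forward_len
    return forward_len, reverse_len


def trim_reads(reads, max_len):
    """Trim each read to at most max_len bases, counted from its 5' end.

    Assumption: reads shorter than max_len are kept unchanged.
    """
    return [read[:max_len] for read in reads]


if __name__ == "__main__":
    # The A-4 amplicon in this thread is ~746 bp.
    fwd_len, rev_len = abutting_lengths(746)
    print(fwd_len, rev_len)
```

    For a 746 bp amplicon this gives 373 bases per direction, matching the figure quoted above.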

    We take that output and see which alleles matched each read group (typically between 15 and 100 per group). Then we do an inner join on the two datasets to eliminate alleles with improper SNPs. In your case you could then take those alleles and perform another inner join against your A2 and A3 matches.
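
    The inner join can be thought of as a set intersection on allele names. The sketch below assumes the BLAT output has already been reduced to one set of matching allele names per read group; the allele names are placeholders, not real results, and `inner_join` is a hypothetical helper.

```python
def inner_join(*allele_sets):
    """Keep only the allele names present in every read group."""
    sets = [set(s) for s in allele_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


if __name__ == "__main__":
    # Placeholder allele names, for illustration only.
    forward_matches = {"allele_1", "allele_2", "allele_3"}
    reverse_matches = {"allele_2", "allele_3", "allele_4"}

    # Alleles consistent with both halves of the amplicon.
    candidates = inner_join(forward_matches, reverse_matches)
    print(sorted(candidates))

    # A further join against matches from another amplicon narrows the list.
    other_amplicon_matches = {"allele_3", "allele_5"}
    print(sorted(inner_join(candidates, other_amplicon_matches)))
```

    Each additional amplicon joined this way can only shrink the candidate list, which is why joining against further amplicons helps narrow the call.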



    • #3
      Hi there,
      How do you obtain the sequences from each end of the amplicon in separate files? How do you split them? Could you share the tool and parameters you use for this purpose?
      Thanks in advance

      S.



      • #4
        We utilize the sfffile utility (a command line tool included with the Roche software) to split the original sff file first by MID, and then by primer sequence.

        The first step is to do the primary splitting: `sfffile -s [MIDset_Name] -mcf [MIDconfig.parse] -o [output_folder] [inputSff]`

        Note that you need to give the location of the MIDconfig.parse file as an argument. If you're using the default Roche MID set, you can use "GSMIDs" as the [MIDset_Name]. The documentation for how to do this is in the Roche software manual, but I can give you more detailed instructions if you need them.

        This first command will create a group of sff files split by MID.

        Next, we modify the MIDconfig.parse file to include a new set of "pseudo-MIDs" which correspond to the primers we're using. The format of the MID set and primer sequences is obvious once you look at the MIDconfig.parse file.
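
        A minimal sketch of generating such a pseudo-MID set, assuming it follows the same `mid = "name", "sequence", errors;` syntax as the default sets in MIDconfig.parse. The set name, entry names, and sequences below are placeholders and must be replaced with your own primer sequences; `format_mid_set` is a hypothetical helper, not part of the Roche software.

```python
def format_mid_set(set_name, entries, max_errors=2):
    """Render a MID set block in the MIDconfig.parse style.

    entries: list of (name, sequence) pairs.
    max_errors: number of mismatches allowed when matching each sequence.
    """
    lines = [set_name, "{"]
    for name, sequence in entries:
        lines.append(f'mid = "{name}", "{sequence}", {max_errors};')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder sequences only; substitute the real primer sequences.
    block = format_mid_set(
        "MyPrimerSet",
        [("A4_forward", "AAAAAAAAAA"), ("A4_reverse", "CCCCCCCCCC")],
    )
    print(block)
```

        The printed block would be pasted into the user-defined section of a local copy of MIDconfig.parse, which is then passed with `-mcf`.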

        You re-run the command above, but give the program your unique primer set as the [MIDset_Name] parameter, and one of your primary split sff files as the [inputSff].

        The program will then create unique sff files for each direction located in the [output_folder] directory.

        If you're working with a lot of different MIDs, it is useful to write a basic script wrapper for recursively splitting each of the MID-specific sff files. I have a wrapper written in Perl that does this. Contact me individually if you'd like me to share it with you.
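
        For readers who prefer Python, a wrapper along these lines might look like the sketch below. It is not the Perl script mentioned above: it only assembles the `sfffile` command shown earlier for every MID-specific sff file in a directory. The function names and directory layout are assumptions, and the commands are printed rather than executed.

```python
from pathlib import Path


def build_sfffile_cmd(mid_set, mid_config, output, input_sff):
    """Assemble the sfffile call in the form given earlier in this thread."""
    return [
        "sfffile",
        "-s", str(mid_set),
        "-mcf", str(mid_config),
        "-o", str(output),
        str(input_sff),
    ]


def plan_second_pass(split_dir, primer_set, mid_config, out_root):
    """Build one command per MID-specific sff file found in split_dir.

    Assumption: the first-pass files end in .sff, and each gets its own
    output location named after the input file.
    """
    commands = []
    for sff in sorted(Path(split_dir).glob("*.sff")):
        output = Path(out_root) / sff.stem
        commands.append(build_sfffile_cmd(primer_set, mid_config, output, sff))
    return commands


if __name__ == "__main__":
    # Hypothetical paths and set name, for illustration only.
    for cmd in plan_second_pass("mid_split", "MyPrimerSet",
                                "MIDConfig.parse", "primer_split"):
        print(" ".join(cmd))
        # To actually run it: subprocess.run(cmd, check=True)
```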

        Hope this helps!

        Simon



        • #5
          Hi,

          Could you please give us an example of the MIDconfig.parse?

          We analyse GS Junior data and all we get as input is the .sff file.

          Cheers!



          • #6
            This is the default MIDconfig.parse file that's included with the software:

            /*
            **
            ** MIDConfig.parse
            **
            ** This file contains the multiplex sequences used by the Genome Sequence
            ** MID library kits, and may contain user-defined sets of multiplex
            ** identifiers. This file is used by the post-run applications to access
            ** the defined MID sets.
            **
            ** To use your own MID set, you can either copy this file to a local
            ** directory, add or edit your own sets (see below), then use the
            ** "-mcf" option of the mapper and assembler to specify the MID
            ** configuration file. Or, you can edit and save this file, to have
            ** your MID sets accessed by default by the post-run applications.
            **
            ** To create a new MID set, copy the examples at the end of the file into
            ** the top section, then edit the text as follows:
            **
            ** * The name of the MID set should begin the group (appear above the
            ** open brace '{')
            **
            ** * Each line in the MID set should contain three values after the
            ** equals sign:
            ** - A name for the specific MID sequence
            ** - The DNA sequence of the MID
            ** - The number of errors allowed in matching to the sequence
            **
            ** * The syntax of the line must be preserved (the "mid = " beginning,
            ** the semi-colon at the end of the line, the open and close braces
            ** for the set.
            **
            **
            ** Note: The names below use a combination of uppercase and lowercase
            ** characters, but all matching to the names is insensitive to
            ** case (so, for example "gsmids" will match the MID set below).
            **
            *******************************************************************************

            /*
            ** User-defined MID sets.
            */





            /*
            ** IMPORTANT: DO NOT EDIT BELOW THIS LINE.
            **
            ** Genome Sequencer MID sets.
            */

            GSMIDs
            {
            mid = "MID1", "ACGAGTGCGT", 2;
            mid = "MID2", "ACGCTCGACA", 2;
            mid = "MID3", "AGACGCACTC", 2;
            mid = "MID4", "AGCACTGTAG", 2;
            mid = "MID5", "ATCAGACACG", 2;
            mid = "MID6", "ATATCGCGAG", 2;
            mid = "MID7", "CGTGTCTCTA", 2;
            mid = "MID8", "CTCGCGTGTC", 2;
            mid = "MID9", "TAGTATCAGC", 2;
            mid = "MID10", "TCTCTATGCG", 2;
            mid = "MID11", "TGATACGTCT", 2;
            mid = "MID12", "TACTGAGCTA", 2;
            mid = "MID13", "CATAGTAGTG", 2;
            mid = "MID14", "CGAGAGATAC", 2;
            }
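
            To check an edited copy of this file before passing it to `-mcf`, the `mid = ...;` lines can be read back with a small parser. This is only a sketch based on the syntax visible above; `parse_mid_entries` is a hypothetical helper and does not reproduce how the Roche tools parse the file.

```python
import re

# Matches lines such as: mid = "MID1", "ACGAGTGCGT", 2;
MID_LINE = re.compile(
    r'mid\s*=\s*"([^"\n]+)"\s*,\s*"([A-Za-z]+)"\s*,\s*(\d+)\s*;'
)


def parse_mid_entries(text):
    """Return (name, sequence, allowed_errors) for each mid line in text."""
    return [
        (name, sequence.upper(), int(errors))
        for name, sequence, errors in MID_LINE.findall(text)
    ]


if __name__ == "__main__":
    sample = '''GSMIDs
{
mid = "MID1", "ACGAGTGCGT", 2;
mid = "MID2", "ACGCTCGACA", 2;
}'''
    for entry in parse_mid_entries(sample):
        print(entry)
```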



            • #7
              Many thanks, I'll test it.
