  • analysis amplicon miseq

    Dear,
    I sequenced 5 genes using the TruSeq Custom Amplicon kit on a MiSeq, but when I process the data with bwa + samtools the results do not look good.
    Does anyone have a suggestion for an analysis pipeline (alignment and SNP calling)?

    Thank you a lot
    ME

  • #2
    Dear Elena,
    Actually, I'm also analysing MiSeq FASTQ files from the TruSeq Custom Amplicon kit.
    Illumina suggests using the Smith-Waterman algorithm, which is included in the BWA tool (use BWA-SW, or better, BWA-MEM). I also use Bowtie2 to compare the results. Samtools is one of the best callers, but I strongly suggest using GATK in addition. Other callers can be used if you are looking for somatic mutations (MuTect, LoFreq...).
    However, it seems that neither BWA nor Bowtie2 generates exactly the same alignment as Illumina MiSeq Reporter, which leads to some miscalls. I'm in contact with Illumina to try to understand their workflow.
    Why do you think your data are not good? Did you open the BAM files in IGV (or another viewer) to check the alignment?
    Ciao,
    Antony
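
As a rough illustration of the BWA-MEM plus samtools route mentioned above, here is a minimal sketch that only builds the command lines, so they can be reviewed before anything is run. The file names and the helper `build_alignment_commands` are hypothetical, and the `samtools sort -o` form assumes samtools 1.x. This is not Illumina's workflow.

```python
import shlex


def build_alignment_commands(reference, fastq_r1, fastq_r2, sample):
    """Return shell command strings for a basic BWA-MEM + samtools alignment.

    Nothing is executed here; the strings are meant to be inspected first.
    """
    ref, r1, r2 = (shlex.quote(p) for p in (reference, fastq_r1, fastq_r2))
    bam = shlex.quote(f"{sample}.sorted.bam")
    return [
        f"bwa index {ref}",                                   # build the BWA index once
        f"bwa mem {ref} {r1} {r2} | samtools sort -o {bam} -",  # align and coordinate-sort
        f"samtools index {bam}",                              # index so IGV can open the BAM
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in build_alignment_commands(
        "amplicons.fa", "S1_R1.fastq.gz", "S1_R2.fastq.gz", "S1"
    ):
        print(cmd)
```

The sorted and indexed BAM is what you would load into IGV to inspect the alignments.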



    • #3
      Hi lebechec,

      have you received more information from Illumina regarding the internal analysis pipeline that is used in MiSeq? I would also like to be able to reproduce the results coming from Illumina and am therefore interested in what algorithms are applied by them.

      Regards,

      Cindy



      • #4
        Some details (though not the exact settings) are available in this note: http://supportres.illumina.com/docum...15042314-b.pdf

        Most Illumina analytical pipelines write detailed log files (that generally include the full command lines used for the programs). We do not use MiSeq Reporter, but if you have access to an analysis folder, look for the "AnalysisLog.txt" file.
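
If the analysis folder is large, a small script can help locate the log and pull out the lines that mention a given tool. The sketch below assumes only that the file is called "AnalysisLog.txt" and is plain text; the layout of the folder and the contents of the log are not known here, so the keyword search is a guess at what is useful. The function name `find_log_lines` is hypothetical.

```python
import os


def find_log_lines(analysis_dir, keywords, log_name="AnalysisLog.txt"):
    """Yield (path, line) for log lines that mention any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    for root, _dirs, files in os.walk(analysis_dir):
        if log_name not in files:
            continue
        path = os.path.join(root, log_name)
        # errors="replace" keeps the scan going if the log has odd bytes
        with open(path, errors="replace") as handle:
            for line in handle:
                if any(k in line.lower() for k in lowered):
                    yield path, line.rstrip("\n")
```

For example, `list(find_log_lines("/path/to/run", ["bwa", "gatk"]))` would return every matching line together with the log it came from.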



        • #5
          Hi all,

          GenoMax, you're right! Some notes are available on the Illumina website. However, some information is missing.
          The same goes for "AnalysisLog.txt". Some steps are well described, especially those from third-party tools, but nothing appears when they use their own tools, including slightly modified tools/scripts (e.g. demultiplexing, the CASAVA tool BCL2FASTQ with small changes...).

          CindyF, I think it is impossible to exactly reproduce the data from the Illumina pipeline. I suggest using your own pipeline, maybe one close to Illumina's.
          Typically: CASAVA for demultiplexing, Cutadapt for trimming/clipping, BWA (aln or sw) for alignment, GATK for BAM sorting/indexing/local realignment, GATK/MuTect for variant calling, ANNOVAR for variant annotation, and a spreadsheet for variant prioritization.
          Then compare the differences (especially the overlap). If you really want to know in detail what they are doing, ask them for a specific answer.
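
One simple way to look at the overlap between two call sets (for example your own pipeline versus the MiSeq Reporter output) is to reduce each VCF to a set of (chromosome, position, ref, alt) keys and compare the sets. The sketch below assumes plain-text, tab-separated VCF lines and does no indel normalization, so equivalent indels written differently will count as different calls. The function names are hypothetical.

```python
def read_variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # one key per alternate allele
            keys.add((chrom, int(pos), ref, allele))
    return keys


def compare_call_sets(a, b):
    """Summarize how two sets of variant keys overlap."""
    return {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }
```

Typical use would be `compare_call_sets(read_variant_keys(open("mine.vcf")), read_variant_keys(open("illumina.vcf")))`, with both file names being placeholders.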

          GenoMax, what is your pipeline? In particular, I have trouble clipping the primers/adapters specific to each amplicon.
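
To make the primer-clipping problem concrete, here is a minimal sketch of the idea: compare the start of each read against a table of known amplicon primers and remove the primer when it matches within a small number of mismatches. The primer table, the mismatch threshold, and the function names are all assumptions for illustration. A dedicated tool such as Cutadapt handles indels, reverse complements, and quality far better.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def clip_primer(seq, qual, primers, max_mismatches=1):
    """Remove a matching primer from the 5' end of a read.

    primers maps a (hypothetical) primer name to its sequence.
    Returns (clipped_seq, clipped_qual, primer_name), with primer_name
    set to None and the read unchanged when no primer matches.
    """
    for name, primer in primers.items():
        n = len(primer)
        if len(seq) >= n and hamming(seq[:n], primer) <= max_mismatches:
            return seq[n:], qual[n:], name
    return seq, qual, None
```

Because the function also reports which primer matched, the same pass can be used to assign each read to its amplicon.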

          Recently, I talked with the BioGenouest platform (http://www.biogenouest.org/en/conten...ouest-genomics). They have developed a powerful pipeline for amplicons on Illumina.

          Best,

          Antony



          • #6
            Thanks lebechec and GenoMax for the information. I'll dive deeper into the topic and see how well I can actually apply the algorithms used by MiSeq.

            Best,

            CindyF



            • #7
              Hi,

              we are seeing in IGV some variants called by samtools that seem odd for amplicons, especially at the ends of the MiSeq reads.
              We have also noticed that reads are split across amplicons. I believe (although I might be wrong) that since BWA concatenates the reference sequences into one long string for fast alignment, alignment is done across different amplicons, and I get variants at the ends of the amplicons due to chimeric alignments. I am trying to figure out how to avoid this at the alignment step, but I think the only way is to filter after alignment, and I am not sure how to do that.
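
One possible post-alignment filter, sketched under assumptions: in the SAM format, flag bit 0x800 marks a supplementary alignment and the optional `SA:Z:` tag lists the other parts of a chimeric alignment. The function below drops every record that carries either marker, which removes the whole chimeric read, not just its extra segment. Whether your aligner and settings emit these markers is something to check on your own data; the function name is hypothetical.

```python
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary (chimeric) alignment


def drop_chimeric(sam_lines):
    """Yield header lines and records that are not part of a chimeric alignment."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep SAM header lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # optional tags start at the 12th column
        has_sa_tag = any(f.startswith("SA:Z:") for f in fields[11:])
        if flag & SUPPLEMENTARY or has_sa_tag:
            continue
        yield line
```

The input can be any iterable of SAM text lines, for example the output of `samtools view -h` piped into the script.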

              Also, I am wondering whether it is possible to avoid mismatches at the end of the read, say in the last 1-5 bases of the read.
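
One blunt way to keep the last few bases out of variant calling is to trim them from the reads before alignment. The sketch below assumes standard four-line FASTQ records and simply removes a fixed number of bases (and their qualities) from the 3' end of every read. The function name and the default of 5 bases are illustrative choices, not a recommendation.

```python
def trim_read_ends(fastq_lines, n_bases=5):
    """Yield FASTQ lines with the last n_bases removed from each read.

    Assumes four lines per record: header, sequence, '+', qualities.
    """
    lines = [line.rstrip("\n") for line in fastq_lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        keep = max(len(seq) - n_bases, 0)  # never go below an empty read
        yield from (header, seq[:keep], plus, qual[:keep])
```

Trimming shortens every read regardless of quality, so it trades some coverage at the amplicon ends for fewer end-of-read artefacts.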

              Any thoughts will be much appreciated.

              Dave



              • #8
                Hi everyone,

                @dnusol: Unfortunately I cannot help you with your issue because I am not yet familiar with amplicon analysis. :/

                @all: I finally found out that a somatic variant caller is used in MiSeq's Amplicon Workflow. Since that variant caller is proprietary Illumina software, I was wondering which alternative variant caller I could use that gives results close to those of Illumina's somatic variant caller. All the information I could find about their somatic caller is its technical note (http://res.illumina.com/documents/pr...ant_caller.pdf) and the following excerpt:

                "Developed by Illumina, the somatic variant caller identifies variants present at low frequency in the DNA sample and minimizes false positives. For SNP calling, the somatic variant caller considers each position in the reference genome separately, starting with the bases of aligned reads, and assigns a variant score measuring the accuracy of the call for the SNP. Variant scores are computed based on a Poisson model that excludes the SNP if the SNP has a quality score below Q20, which is a 1/100 chance of being a false positive.
                For indels, the somatic variant caller analyzes how many alignments covering a given position include a particular indel compared to the overall coverage at that position. The somatic variant caller does not perform an indel re-alignment step included in other variant callers, such as GATK"
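
The excerpt gives no formula, so the following is only a generic sketch of how a Poisson-based score and a Q20 cutoff can fit together, not Illumina's actual model. It assumes a fixed per-base error rate (`base_error`, an invented parameter), treats the number of erroneous reads at a position as Poisson with mean depth times error rate, and converts the tail probability to a Phred-style score. The indel function mirrors the description of counting indel-supporting alignments against total coverage.

```python
import math


def phred_to_error_prob(q):
    """Phred score to probability, e.g. Q20 -> 0.01 (1 in 100)."""
    return 10 ** (-q / 10)


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly for numerical stability."""
    if lam <= 0:
        return 1.0 if k == 0 else 0.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > 1e-300 and i < k + 10000:
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def variant_qscore(variant_reads, depth, base_error=0.01):
    """Phred-scaled chance that sequencing errors alone explain the variant reads."""
    p = poisson_tail(variant_reads, depth * base_error)
    return float("inf") if p <= 0 else -10 * math.log10(p)


def indel_frequency(indel_reads, coverage):
    """Fraction of alignments covering a position that carry the indel."""
    return indel_reads / coverage if coverage else 0.0
```

Under these assumptions a call would be kept when `variant_qscore(...)` is at least 20, which corresponds to the 1/100 false-positive chance quoted above.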

                Best,

                Cindy



                • #9
                  Hi everyone,

                  so finally I got in contact with someone from Illumina to get more information about their Amplicon Analysis Workflow, only to find out that alignment and variant calling are carried out by Illumina's own tools, which is where the documentation ends. You were totally right, @lebechec.

                  So OK, if I try to "rebuild" the pipeline, I have the problem that Illumina uses a manifest file for targeted alignment and variant calling. The results for BWA/GATK against the whole reference genome are indeed different.
                  Nevertheless, I have no clue how this can be achieved with tools like BWA or GATK. My approach now would be to adapt the reference FASTA file that is used in alignment and variant calling, but I do not know how to transform the given manifest file (find it here: http://supportres.illumina.com/docum...15032433_b.txt) into the appropriate reference FASTA format.

                  Any ideas here?

                  Best,

                  Cindy



                  • #10
                    You can get the positional information for the genes from the manifest file you included by looking at columns 1, 6, 7, and 8. You can check to see how much sequence you want to retrieve. Column 17 has the expected amplified region size. You can use a BED format file to retrieve the sequences from the Table Browser at UCSC.
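
A minimal sketch of the manifest-to-BED step follows. It assumes the manifest rows are tab-separated and that columns 1, 6, 7, and 8 hold the region name, chromosome, start, and end in that order. That mapping is a guess based on the column numbers above and should be checked against the actual file. It also assumes the manifest coordinates are 1-based, so the start is shifted by one for BED. The function name is hypothetical.

```python
def manifest_rows_to_bed(rows, name_col=1, chrom_col=6, start_col=7, end_col=8):
    """Convert tab-separated manifest rows into BED lines (column numbers are 1-based)."""
    bed = []
    needed = max(name_col, chrom_col, start_col, end_col)
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < needed:
            continue  # section markers and short lines
        start, end = fields[start_col - 1], fields[end_col - 1]
        if not (start.isdigit() and end.isdigit()):
            continue  # column-header rows and anything non-numeric
        chrom, name = fields[chrom_col - 1], fields[name_col - 1]
        # BED is 0-based, half-open: shift the (assumed 1-based) start
        bed.append(f"{chrom}\t{int(start) - 1}\t{int(end)}\t{name}")
    return bed
```

The resulting BED lines can be uploaded to the UCSC Table Browser as described, or, if bedtools is available locally, passed to `bedtools getfasta -fi genome.fa -bed amplicons.bed -fo amplicons.fa` (file names are placeholders) to produce an amplicon-only reference FASTA.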

                    Illumina has open source versions of their aligner/variant caller available here: https://github.com/sequencing/

                    Illumina also has a separate package called "strelka" for somatic SNVs and indels, available here: https://github.com/genome-vendor/strelka

                    Hopefully this should get you closer to your goal.
                    Last edited by GenoMax; 01-16-2014, 11:44 AM.

