**lebechec** · 08-14-2013, 05:26 AM

Dear Elena,
Actually, I'm also analysing MiSeq FASTQ files from the TruSeq Custom Amplicon.
Illumina suggests using the Smith-Waterman algorithm, which is included in the BWA tool (use BWA-SW, or better, BWA-MEM). I also use Bowtie2 to compare the results. Samtools is one of the best callers, but I strongly suggest using GATK in addition. Other callers can be used if you are looking for somatic mutations (MuTect, LoFreq...).
However, it seems that neither BWA nor Bowtie2 generates exactly the same alignment as Illumina MiSeq Reporter, which leads to some miscalls. I'm in contact with Illumina to try to understand their workflow.
Why do you think your data are not good? Did you open the BAM files with IGV (or another viewer) to check the alignment?
Ciao,
Antony

**CindyF** · 12-02-2013, 06:07 AM

Hi lebechec,

have you received more information from Illumina regarding the internal analysis pipeline that is used in MiSeq? I would also like to be able to reproduce the results coming from Illumina and am therefore interested in what algorithms are applied by them.

Regards,

Cindy

**GenoMax** · 12-02-2013, 06:23 AM

Some details (though not the exact settings) are available in this note: http://supportres.illumina.com/docum...15042314-b.pdf

Most Illumina analytical pipelines write detailed log files (that generally include the full command lines used for the programs). We do not use MiSeq Reporter, but if you have access to an analysis folder, look for the "AnalysisLog.txt" file.

**lebechec** · 12-02-2013, 07:16 AM

Hi all,

GenoMax, you're right! Some notes are available on the Illumina website. However, some information is missing.
The same goes for "AnalysisLog.txt". Some steps are well described, especially those from third-party tools, but nothing appears when they use their own tools, including lightly modified tools/scripts (e.g. demultiplexing, the CASAVA tool BCL2FASTQ with small changes...).

CindyF, I think it's impossible to reproduce the output of the Illumina pipeline exactly. I suggest using your own pipeline, perhaps one close to Illumina's.
Typically: CASAVA for demultiplexing, Cutadapt for trimming/clipping, BWA (aln or sw) for alignment, GATK for BAM sorting/indexing/local realignment, GATK/MuTect for variant calling, ANNOVAR for variant annotation, and a spreadsheet for variant prioritization.
Then compare the differences (especially the overlap). If you really want to know in detail what they are doing, ask them for a specific answer.
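
For the comparison step, here is a minimal sketch in Python, assuming both pipelines produce plain-text VCF files. The function names `read_variants` and `compare_callsets` are made up for illustration, and variants are matched on exact (chromosome, position, REF, ALT), so indels written differently by two callers would not be counted as shared.

```python
def read_variants(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines."""
    variants = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # one key per alternate allele
            variants.add((chrom, pos, ref, alt))
    return variants


def compare_callsets(calls_a, calls_b):
    """Return the variants shared by both call sets and those unique to each."""
    return {
        "shared": calls_a & calls_b,
        "only_a": calls_a - calls_b,
        "only_b": calls_b - calls_a,
    }
```

In practice you would pass two open file handles (for example your own calls and the calls exported by MiSeq Reporter) to `read_variants` and then inspect the `only_a` and `only_b` sets in IGV.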

GenoMax, what is your pipeline? In particular, I have some trouble clipping the primers/adapters specific to each amplicon.
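
To make the clipping problem concrete, here is a rough Python sketch, assuming each read starts with the primer of its amplicon and that the primer sequences are known. The function `clip_primer` and the primer dictionary are hypothetical, and only exact prefix matches are handled; a dedicated tool such as Cutadapt also tolerates mismatches.

```python
def clip_primer(seq, qual, primers):
    """Clip a known primer from the 5' end of a read.

    primers maps a (hypothetical) amplicon name to its primer sequence.
    Returns (amplicon_name, clipped_seq, clipped_qual); the name is None
    and the read is returned unchanged when no primer matches exactly.
    """
    # Try longer primers first so that a primer which is a prefix of
    # another one does not win by accident.
    for name, primer in sorted(primers.items(), key=lambda kv: -len(kv[1])):
        if seq.startswith(primer):
            return name, seq[len(primer):], qual[len(primer):]
    return None, seq, qual
```

The returned amplicon name could also be used to check later that a read aligns to the amplicon its primer belongs to.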

Recently, I talked with the BioGenouest platform (http://www.biogenouest.org/en/conten...ouest-genomics). They have developed a powerful pipeline for amplicons on Illumina.

Best,

Antony

**CindyF** · 12-03-2013, 12:14 AM

Thanks lebechec and GenoMax for the information. I'll dive deeper into the topic and see how well I can actually apply the algorithms used by MiSeq.

Best,

CindyF

**dnusol** · 12-03-2013, 04:48 AM

Hi,

we are seeing in IGV some variants called by samtools that seem odd for amplicons, especially at the ends of the MiSeq reads.
We have also noticed that reads are split across amplicons. I believe (although I might be wrong) that since BWA merges the reference sequences into one long string for quick alignment, alignment is done across different amplicons and I get variants at the ends of the amplicons due to chimeric alignments. I am trying to figure out how to avoid this at the alignment step, but I think the only way is to filter after alignment, which I am not sure how to do.
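
One possible way to filter after alignment is sketched below in Python, assuming the aligner reports split reads the way the SAM specification describes: a supplementary record carries flag bit 0x800, and every part of a chimeric alignment carries an `SA:Z:` tag. The function names are made up, and if the aligner was run with an option that marks split hits as secondary instead, the flag test alone would not catch them.

```python
SUPPLEMENTARY = 0x800  # SAM flag bit for a supplementary (split) alignment


def is_split_alignment(sam_line):
    """True if a SAM record is part of a split (chimeric) alignment."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    # Optional tags start at column 12; SA lists the other parts of the read.
    has_sa_tag = any(tag.startswith("SA:Z:") for tag in fields[11:])
    return bool(flag & SUPPLEMENTARY) or has_sa_tag


def drop_split_alignments(sam_lines):
    """Yield header lines and records that are not split across references."""
    for line in sam_lines:
        if line.startswith("@") or not is_split_alignment(line):
            yield line
```

This works on uncompressed SAM text, so a BAM file would first have to be converted (for example with `samtools view -h`).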

Also, I am wondering whether it is possible to avoid mismatches at the end of the read, say the last 1-5 bases.
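
One simple option would be to trim those bases before alignment. The Python sketch below assumes standard four-line FASTQ records; `read_fastq` and `trim_tail` are illustrative names, not part of any existing tool.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    # zip over the same iterator four times groups the lines in fours
    # and stops cleanly if the last record is incomplete.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def trim_tail(record, n_bases):
    """Remove the last n_bases from the sequence and quality strings."""
    header, seq, qual = record
    if n_bases <= 0:
        return record
    return header, seq[:-n_bases], qual[:-n_bases]
```

Trimming removes the bases entirely, so coverage at the amplicon ends drops slightly; whether that is acceptable depends on where the targets sit within the amplicons.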

Any thoughts will be much appreciated.

Dave

**CindyF** · 01-06-2014, 04:48 AM

Hi everyone,

@dnusol: Unfortunately I cannot help you with your issue because I am not yet familiar with amplicon analysis. :/

@all: I finally found out that a somatic variant caller is used in MiSeq's Amplicon Workflow. Since that variant caller is proprietary Illumina software, I was wondering which alternative variant caller would give results close to those of Illumina's somatic variant caller. All the information I could find about their somatic caller is its technical note (http://res.illumina.com/documents/pr...ant_caller.pdf) and the following excerpt:

"Developed by Illumina, the somatic variant caller identifies variants present at low frequency in the DNA sample and minimizes false positives. For SNP calling, the somatic variant caller considers each position in the reference genome separately, starting with the bases of aligned reads, and assigns a variant score measuring the accuracy of the call for the SNP. Variant scores are computed based on a Poisson model that excludes the SNP if the SNP has a quality score below Q20, which is a 1/100 chance of being a false positive.
For indels, the somatic variant caller analyzes how many alignments covering a given position include a particular indel compared to the overall coverage at that position. The somatic variant caller does not perform an indel re-alignment step included in other variant callers, such as GATK"
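
The excerpt does not give the exact formula, so the following Python sketch only illustrates the general idea and is not Illumina's implementation. It assumes that sequencing errors at a position follow a Poisson distribution with mean `coverage * error_rate`, where the 1% error rate is an arbitrary assumption, and converts the probability of seeing at least the observed number of variant reads by chance into a Phred-scaled score. All function names and defaults are hypothetical.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def variant_qscore(variant_reads, coverage, error_rate=0.01, max_q=100):
    """Phred-scaled chance that the variant reads are explained by noise."""
    p_noise = poisson_tail(variant_reads, coverage * error_rate)
    if p_noise <= 0.0:
        return max_q
    return min(max_q, -10.0 * math.log10(p_noise))


def indel_fraction(indel_reads, coverage):
    """Share of the alignments covering a position that carry the indel."""
    return indel_reads / coverage if coverage else 0.0
```

Under these assumptions a SNP would be kept when `variant_qscore(...) >= 20`, mirroring the Q20 cutoff in the excerpt, and an indel would be judged by `indel_fraction` against a threshold of your choosing.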

Best,

Cindy

**CindyF** · 01-16-2014, 09:44 AM

Hi everyone,

so finally I got in contact with someone from Illumina to get some more information about their Amplicon Analysis Workflow, only to find out that alignment and variant calling are carried out by Illumina's own tools, which is where the documentation ends. You were totally right, @lebechec.

So ok, if I try to "rebuild" the pipeline, I have the problem that Illumina is using a manifest file for targeted alignment and variant calling. The results for BWA/GATK against the whole reference genome are indeed different.
However, I have no clue how this can be achieved with tools like BWA or GATK. My approach now would be to adapt the reference FASTA file that is used in alignment and variant calling, but I do not know how to transform the given manifest file (find it here: http://supportres.illumina.com/docum...15032433_b.txt) into the appropriate reference FASTA format.

Any ideas here?

Best,

Cindy

**GenoMax** · 01-16-2014, 11:33 AM

You can get the positional information for the genes from the manifest file you included by looking at columns 1, 6, 7 and 8. You can check to see how much sequence you want to retrieve. Column 17 has the expected amplified region size. You can use a BED-format file to retrieve the sequence from the Table Browser at UCSC.
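
A minimal Python sketch of that conversion follows, assuming the manifest is tab-separated and that columns 1, 6, 7 and 8 hold the target name, chromosome, start and end. The column meanings and the 1-based start coordinate are assumptions to check against your own file, and the function names are made up for illustration.

```python
def manifest_rows_to_bed(lines, name_col=1, chrom_col=6, start_col=7,
                         end_col=8, one_based=True):
    """Turn tab-separated manifest rows into (chrom, start, end, name) tuples.

    Column numbers are 1-based. Rows whose start/end fields are not
    integers (section markers, column headers) are skipped.
    """
    needed = max(name_col, chrom_col, start_col, end_col)
    bed = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < needed:
            continue
        start_text, end_text = fields[start_col - 1], fields[end_col - 1]
        if not (start_text.isdigit() and end_text.isdigit()):
            continue
        # BED starts are 0-based; shift only if the manifest is 1-based.
        start = int(start_text) - (1 if one_based else 0)
        bed.append((fields[chrom_col - 1], start, int(end_text),
                    fields[name_col - 1]))
    return bed


def format_bed(rows):
    """Render the tuples as BED text, one tab-separated line per region."""
    return "\n".join("\t".join(str(value) for value in row) for row in rows)
```

The resulting BED text can be uploaded to the UCSC Table Browser to fetch the sequences, which could then serve as an amplicon-only reference.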

Illumina has open source versions of their aligner/variant caller available here: https://github.com/sequencing/

Illumina also has a separate package called "strelka" for somatic SNVs and indels available here: https://github.com/genome-vendor/strelka

Hopefully this should get you closer to your goal.
