SEQanswers

Go Back   SEQanswers > Sequencing Technologies/Companies > Ion Torrent



Similar Threads
Thread Thread Starter Forum Replies Last Post
ion torrent PGM 318v2Chip running problem zcm2403 Ion Torrent 1 08-02-2017 06:47 AM
ion torrent proton QC problem zcm2403 Ion Torrent 7 02-09-2017 07:15 AM
Ion Torrent PGM Data Analysis problem!! wizard_ofchaos Ion Torrent 9 06-24-2014 02:40 AM
Ion Torrent and Whole Exome Sequence alexdem Ion Torrent 19 04-17-2014 09:09 AM
Human Exome Sequencing using Ion torrent PGM Nilaksha Ion Torrent 2 03-17-2014 07:37 PM

Reply
 
Thread Tools
Old 02-01-2017, 07:32 AM   #1
zhoujiayi
Member
 
Location: Canada

Join Date: Sep 2013
Posts: 12
Default A problem about Ion Torrent Whole Exome Sequencing

First, I'd like to say Ion Torrent really sucks!
We used to use Illumina data generated by a third party for our research. Last year, my boss decided to buy our own sequencing machine. Although everyone suggested him to use Illumina, due to cost consideration, we switched to Ion Torrent platform...That's where the nightmare began.

We have a pipeline on our cluster for Illumina data which works fine. It reads the vcf files and then output a filtered annotated variants lists. The filter options are really simple, just based on MAF and inheritance type. Since we switched to Ion Torrent platform, we need a pipeline like this as well.

Basically, the vcf files are the same. The pipeline should also work for Ion Torrent vcf files. GATK can call multiple samples together. So you can get genotype of every sample for a variant. Ion Torrent can only call variant for one sample. So if the sample doesn't have this variant, the vcf file won't have it. But you don't really know if the genotype at that position is not observed or same as the reference without looking into the bam files. I asked the Ion Torrent support people, they said if the variant is listed in the vcf, you can consider it same as reference.
Fine. Let's go with this. Then, we lots of cases as the following one:
In a proband, there is a variant like this:
chr2 202006095 . AG A,AA 1/1
If you look for this variant in the unaffected samples, they don't have it. OK, this looks promising, a deletion that proband has but none of the unaffected samples have it.
Then I notice in the vcf files of all the unaffected samples, they have the following variant:
chr2 202006096 . G A 0/1.

Ok, now I believe the proband should have the same variant G->A as other samples at position chr2 202006096. But Ion Torrent identified it as an INDEL. We found lots of cases like this. I believe, in the vcf files of Illumina data, it will at least list this 2 variants, 1 as indel and 1 as a SNP. With genotype of other samples. You can easily recoginize it. But for Ion Torrent, if we don't look into it very carefully, we won't notice this.

I believe everyone here are using Ion Torrent. How do you overcome this problem? Is there a way to overcome this problem without checking the variant manually?
zhoujiayi is offline   Reply With Quote
Old 02-01-2017, 10:41 AM   #2
jhalpin
Member
 
Location: Atlanta, GA

Join Date: Jan 2015
Posts: 15
Default

I don't know the solution, but I can tell you Ion Torrent has a lot of deletion errors that are sequencing artifacts when sequencing through a long (>5bp) homopolymer region. If this is where your variants fall, you're going to have a bad time.
jhalpin is offline   Reply With Quote
Old 02-05-2017, 05:39 AM   #3
mlariviere
Junior Member
 
Location: Montreal, Canada

Join Date: Feb 2017
Posts: 1
Default

Hi Jiayi Zhou,

Using AmpliSeq Exome on Proton must not be that bad. It allowed one of researchers at your institution to find mutation causing hearing loss in Newfoundland population:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5215284/

Have you contacted your local Ion Torrent support regarding your issue?
mlariviere is offline   Reply With Quote
Old 02-05-2017, 08:30 AM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Hi zhoujiayi,

I don't really like the practice of combining indels and SNPs (or, generally, different variants) on the same line in VCF, as it makes the format even more confusing and difficult to read. So, BBMap's CallVariants tool puts each variant on a unique line, which might be what you're looking for. You use it like this:

callvariants.sh in=sample1.sam,sample2.sam ref=ref.fa out=vars.vcf multisample ploidy=2 prefilter

I'd be interested in hearing how well it performs on Ion Torrent data; I've only been testing it on Illumina so far.
Brian Bushnell is offline   Reply With Quote
Old 02-06-2017, 05:37 AM   #5
zhoujiayi
Member
 
Location: Canada

Join Date: Sep 2013
Posts: 12
Default

Quote:
Originally Posted by mlariviere View Post
Hi Jiayi Zhou,

Using AmpliSeq Exome on Proton must not be that bad. It allowed one of researchers at your institution to find mutation causing hearing loss in Newfoundland population:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5215284/

Have you contacted your local Ion Torrent support regarding your issue?
Yes. I am one of the author of this paper. The performance of detecting SNPs seems good for Ion Torrrent. But we are having troubles with INDELs. We are now working on improving the performance of detecting INDELs.
zhoujiayi is offline   Reply With Quote
Old 02-06-2017, 05:45 AM   #6
zhoujiayi
Member
 
Location: Canada

Join Date: Sep 2013
Posts: 12
Default

Quote:
Originally Posted by Brian Bushnell View Post
Hi zhoujiayi,

I don't really like the practice of combining indels and SNPs (or, generally, different variants) on the same line in VCF, as it makes the format even more confusing and difficult to read. So, BBMap's CallVariants tool puts each variant on a unique line, which might be what you're looking for. You use it like this:

callvariants.sh in=sample1.sam,sample2.sam ref=ref.fa out=vars.vcf multisample ploidy=2 prefilter

I'd be interested in hearing how well it performs on Ion Torrent data; I've only been testing it on Illumina so far.
Thank you for your suggestion. But this software cannot help for my case. Because Ion Torrent detected the variants as a INDEL chr2 202006095 . AG A,AA 1/1, even I use the software to split it into 2 line. I got chr2 202006095 . AG A and chr2 202006095 . AG AA. Then, if I use software to check if the parents have these 2 variants, the parents won't have them. But in fact, chr2 202006095 . AG AA should be chr2 202006096 G A. The parents have this variant. So it is not the variant we are looking for. For sure, we can check this kind of variants manually. But we have lots of variants like this. It is impossible to filter them manually.
zhoujiayi is offline   Reply With Quote
Old 02-06-2017, 09:59 AM   #7
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Actually, in this case, BBMap would produce two lines that look like this:

Code:
chr2 202006095 . AG A
chr2 202006096 . G A
Brian Bushnell is offline   Reply With Quote
Old 02-07-2017, 05:10 AM   #8
zhoujiayi
Member
 
Location: Canada

Join Date: Sep 2013
Posts: 12
Default

Quote:
Originally Posted by Brian Bushnell View Post
Actually, in this case, BBMap would produce two lines that look like this:

Code:
chr2 202006095 . AG A
chr2 202006096 . G A
Thank you so much. I'll try it.
zhoujiayi is offline   Reply With Quote
Old 02-07-2017, 07:26 AM   #9
r.rosati
Member
 
Location: Brazil

Join Date: Aug 2015
Posts: 38
Default

Premise: I don't hold a grudge towards semiconductor sequencing. It's pretty good for some applications. It's great for germline data on gene panels. It's ok for investigating by exome analysis a case where there are a few candidate genes and you're not going to order a tailored panel for that specific case.
But it's a nightmare when you want to perform a broader analysis.

I agree with you that Ion Torrent default algorithms are very far from perfect. I've been in your situation too, and it's so unnerving.

Quote:
Originally Posted by zhoujiayi View Post
Ion Torrent can only call variant for one sample. So if the sample doesn't have this variant, the vcf file won't have it. But you don't really know if the genotype at that position is not observed or same as the reference without looking into the bam files. I asked the Ion Torrent support people, they said if the variant is listed in the vcf, you can consider it same as reference.
Here you probably mean "if the variant is not listed in the vcf, then it's equal to the reference".
If this is what the support people told you, I wholeheartedly disagree with them. It's just plainly not true.
Just take a look at the VariantCaller parameters and you'll see a list of things that will result in a variant not being called, which are there in the first place to avoid the sequencer calling too many of the errors it makes.
For example with default parameters, a region with homozygous SNV with < 5 total reads is not called; for indels, you need 10+ reads, and at least 5 of them on either strand. This might work ok in optimal conditions, i.e. high coverage and high quality, end-to-end reads; but in ordinary conditions, where some amplicons have consistently low coverage, you do get some trouble from it.
One example is trying to compare a trio: you'll see a lot of de novo mutations in the child and when you look it up on the parents, you'll see that the child has 4 of 6 reads with the variant (was called), and one parent shows it in 2 of 4 total reads (not called); and that's where the "de novo" variant came from. If you're only interested on a small set of genes, this is hardly a problem because you can verify manually. But broaden up your analysis, and you have to sort 200 de novo variants. Yeah sure. Either they're not de novo, or they're so much "de novo" that they started existing today in the sequencer.

Another example of bad behavior that results in a variant not being listed consistently, which happened in some exomes I've ran, is the fact that the Torrent Suite sometimes aligns a deletion on the 5' side, and sometimes on the 3' side of a repeated element. This results in (a) two possible variants describing the same deletion on a series of patients, and (b) the frequency of the deletion being split between 2 positions, possibly resulting in the variant not being called at all.

What I did once, was to take all "de novo" variants in a trio, make them into a hotspot .bed file to be fed to VariantCaller, and then run the VariantCaller plugin again. Every entry of a hotspot must be listed in the final vcf; if it was not called, then the reason must be stated, so you'd spot the risky calls.
A warning though: the default parameters for hotspot regions in Variant Caller are more stringent, so if you keep the default parameters, you will end up with some variants that will not be called anymore even in the sample that had them in the first place.
Another thing I did in that case was to write a simple Python script to go check every "de novo" variant (from the VCF) on the parents' BED files, and see if they actually, honestly did not have it.
Also if you have trios, IonReporter has a specific analysis option for that, and it might be helpful. If it wasn't so, so slow.

Anyways. I feel your pain too. If BBmap helps, let us know.

EDIT: why wait. I'll try BBMap on that data of mine.

Last edited by r.rosati; 02-07-2017 at 07:46 AM.
r.rosati is offline   Reply With Quote
Old 02-07-2017, 12:27 PM   #10
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Quote:
Originally Posted by r.rosati View Post
EDIT: why wait. I'll try BBMap on that data of mine.
Thanks, I'd like to see the results! Probably I need to do some calibration to get rid of the most obvious IonTorrent-specific error modes.
Brian Bushnell is offline   Reply With Quote
Old 02-08-2017, 06:31 AM   #11
zhoujiayi
Member
 
Location: Canada

Join Date: Sep 2013
Posts: 12
Default

Quote:
Originally Posted by r.rosati View Post
Premise: I don't hold a grudge towards semiconductor sequencing. It's pretty good for some applications. It's great for germline data on gene panels. It's ok for investigating by exome analysis a case where there are a few candidate genes and you're not going to order a tailored panel for that specific case.
But it's a nightmare when you want to perform a broader analysis.

I agree with you that Ion Torrent default algorithms are very far from perfect. I've been in your situation too, and it's so unnerving.



Here you probably mean "if the variant is not listed in the vcf, then it's equal to the reference".
If this is what the support people told you, I wholeheartedly disagree with them. It's just plainly not true.
Just take a look at the VariantCaller parameters and you'll see a list of things that will result in a variant not being called, which are there in the first place to avoid the sequencer calling too many of the errors it makes.
For example with default parameters, a region with homozygous SNV with < 5 total reads is not called; for indels, you need 10+ reads, and at least 5 of them on either strand. This might work ok in optimal conditions, i.e. high coverage and high quality, end-to-end reads; but in ordinary conditions, where some amplicons have consistently low coverage, you do get some trouble from it.
One example is trying to compare a trio: you'll see a lot of de novo mutations in the child and when you look it up on the parents, you'll see that the child has 4 of 6 reads with the variant (was called), and one parent shows it in 2 of 4 total reads (not called); and that's where the "de novo" variant came from. If you're only interested on a small set of genes, this is hardly a problem because you can verify manually. But broaden up your analysis, and you have to sort 200 de novo variants. Yeah sure. Either they're not de novo, or they're so much "de novo" that they started existing today in the sequencer.

Another example of bad behavior that results in a variant not being listed consistently, which happened in some exomes I've ran, is the fact that the Torrent Suite sometimes aligns a deletion on the 5' side, and sometimes on the 3' side of a repeated element. This results in (a) two possible variants describing the same deletion on a series of patients, and (b) the frequency of the deletion being split between 2 positions, possibly resulting in the variant not being called at all.

What I did once, was to take all "de novo" variants in a trio, make them into a hotspot .bed file to be fed to VariantCaller, and then run the VariantCaller plugin again. Every entry of a hotspot must be listed in the final vcf; if it was not called, then the reason must be stated, so you'd spot the risky calls.
A warning though: the default parameters for hotspot regions in Variant Caller are more stringent, so if you keep the default parameters, you will end up with some variants that will not be called anymore even in the sample that had them in the first place.
Another thing I did in that case was to write a simple Python script to go check every "de novo" variant (from the VCF) on the parents' BED files, and see if they actually, honestly did not have it.
Also if you have trios, IonReporter has a specific analysis option for that, and it might be helpful. If it wasn't so, so slow.

Anyways. I feel your pain too. If BBmap helps, let us know.

EDIT: why wait. I'll try BBMap on that data of mine.
I installed bbmap on my cluster. Follow the instructions here: http://jgi.doe.gov/data-and-tools/bb...llation-guide/ . However, if I run $ (installation directory)/stats.sh in=(installation directory)/resources/phix174_ill.ref.fa.gz, I got error as command not found. But I can run bash $ (installation directory)/stats.sh in=(installation directory)/resources/phix174_ill.ref.fa.gz to get the same result as mentioned. So I think I have installed the bbmap correctly.

Then I tried to run your command, this is the command I ran:
bash bbmap/callvariants.sh in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam ref=/IonTorrent/HG19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1 > /home/aiden/result.txt 2> /home/aiden/errorbb.txt &
But I cannot get the command run. The error message in the errrorbb.txt is as following:
java -ea -Xmx46006m -Xms46006m -cp /home/aiden/bbmap/current/ var2.CallVariants in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam ref=/IonTorrent/HG19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1
Executing var2.CallVariants2 [in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam, ref=/IonTorrent/HG19/hg19.fasta, out=vars.vcf, multisample, ploidy=2, prefilter, 1]

Unknown parameter 1
Exception in thread "main" java.lang.AssertionError: Unknown parameter 1
at var2.CallVariants2.<init>(CallVariants2.java:199)
at var2.CallVariants2.main(CallVariants2.java:49)
at var2.CallVariants.main(CallVariants.java:48)

Do you know what I have done wrong?
Thank you.
zhoujiayi is offline   Reply With Quote
Old 02-08-2017, 06:35 AM   #12
r.rosati
Member
 
Location: Brazil

Join Date: Aug 2015
Posts: 38
Default

Quote:
Originally Posted by zhoujiayi View Post
bash bbmap/callvariants.sh in=/home/aiden/data/IonXpress_016.sam,/home/aiden/data/IonXpress_015.sam,/home/aiden/data/IonXpress_016.sam ref=/IonTorrent/HG19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1 > /home/aiden/result.txt 2> /home/aiden/errorbb.txt &
I reckon the "1" that I underlined above might be the "parameter 1" that is giving the error. If you look at Brian Bushnell's message, there's no "1" in that position.
r.rosati is offline   Reply With Quote
Old 02-08-2017, 06:42 AM   #13
zhoujiayi
Member
 
Location: Canada

Join Date: Sep 2013
Posts: 12
Default

Quote:
Originally Posted by r.rosati View Post
I reckon the "1" that I underlined above might be the "parameter 1" that is giving the error. If you look at Brian Bushnell's message, there's no "1" in that position.
If you want to save the stdout and stderr of a command, 1 is for output the stdout to file result.txt and 2 is for output the stderr to errorbbmap.txt.This is just Linux command.
zhoujiayi is offline   Reply With Quote
Old 02-08-2017, 07:06 AM   #14
r.rosati
Member
 
Location: Brazil

Join Date: Aug 2015
Posts: 38
Default

Quote:
Originally Posted by zhoujiayi View Post
If you want to save the stdout and stderr of a command, 1 is for output the stdout to file result.txt and 2 is for output the stderr to errorbbmap.txt.This is just Linux command.
Oh, my bad. Then the space must be the culprit, i.e. "1>" and not "1 >"
r.rosati is offline   Reply With Quote
Old 02-09-2017, 05:49 AM   #15
zhoujiayi
Member
 
Location: Canada

Join Date: Sep 2013
Posts: 12
Default

I have tried bbmap. I got 123057 snps and 51208 indels called by bbmap. Just based on the figures, it seems too many indels has been called. Ion Torrent called 46821 SNPs, 372 MNPs and 2187 indels. So I guess Ion Torrent did have some tricks in their algorithm to optimize the result of their sequencing data.
I also tried to use GATK or samtools or freebayes to call the Ion Torrent data before, all of them have the same problem as bbmap. Too many false positives have been called.

I tried to use GATK to find the concordance between these 2 results. It seems due to the format of bbmap's result(<ID=FAIL,Description="Fail">), GATK cannot get through.

I think we will keep using the TVC to call the variants, otherwise we need to start from the beginning to find a way to filter the results of other software.
zhoujiayi is offline   Reply With Quote
Old 02-09-2017, 08:50 AM   #16
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Thanks for the feedback! Looks like Ion Torrent variant calling needs custom settings. I'm not sure what's wrong with ##FORMAT=<ID=FAIL,Number=1,Type=String,Description="Fail">, though... it's a perfectly valid header line.

Last edited by Brian Bushnell; 02-09-2017 at 08:53 AM.
Brian Bushnell is offline   Reply With Quote
Reply

Tags
ion torrent

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 11:51 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO