![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Indel detection in high coverage amplicons sequenced by MiSeq | lpalacios | Bioinformatics | 2 | 02-24-2015 07:04 AM |
find areas of very low and extra high coverage in NGS data | willMD | Bioinformatics | 4 | 04-04-2013 11:33 AM |
Variant calling on custom amplicons with extremely high coverage | memento | Bioinformatics | 2 | 02-18-2013 09:38 PM |
Low frequency indel detection | NestorNotabilis | Bioinformatics | 3 | 05-31-2012 12:25 PM |
MAQ and indel detection | fadista | Bioinformatics | 1 | 09-03-2008 01:07 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Junior Member
Location: Boston, MA Join Date: Feb 2015
Posts: 6
|
![]()
I am attempting to detect indels from a panel of clones resulting from CRISPR targeted deletion. Regions around the target were PCR amplified to produce a roughly 160bp amplicon, which was then sequenced with as a PE150 run.
I've been banging my head against finding a tool that can: 1) Detect which clones have indels 2) Identify the location of these indels, ideally in a VCF or similar file such that the full panel of 96 clones can visualized (i.e. in IGV). 3) Provide annotation details about the read depth at the indel position and percentage of the sequences that contain an indel. 4) Is a stand-alone tool that I can install on my Unix box. This question is very similar to one asked back in 2012, which was never sufficiently answered. http://seqanswers.com/forums/showthread.php?t=25190 So far, I have tried a number of methods, each of which have some key failings. CRISPR Genome Analyzer: crispr-ga.net Doesn't localize the indel detected. Only available as a web app; I need something that I can install locally and hammer on. Outknocker: Doesn't give localizations. Web based. Pindel: Pindel ends up missing a number of indels that are clearly there when viewing the reads from the BAM file. Also, pindel format is not good for looking across many clones, and the conversion pindel-to-vcf loses any depth and read-percentage information. GATK HaplotypeCaller: Misses a lot of very obvious indels. The caller appears to not be optimized for non-tiled non-fragmented reads. GASVPRO and SVseq: Simply don't detect any indels where they clearly exist. Again, the callers seem non-optimized for amplicon sequencing. XHMM: Refuses to do the PCA normalization with fewer targets than samples CONTRA: Doesn't deal well with small analysis windows for detecting indels shorter than 11bp. CRISPR-GA uses a BLAT based alignment for their backend, but I can't find any additional information on how I would do further indel detection after a basic BLAT alignment. Any ideas on a BLAT based indel calling solution? This seems like it should be a dead simple problem, something that was solved back in the 80's, but I can't find a good reference to a tool to use. Any help is appreciated. Thanks |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: Berlin, Germany Join Date: Jan 2015
Posts: 137
|
![]()
I have the same experience as you - all popular variant calling tools out there are optimized for shotgun libraries. For targeted resequencing with PCR amplicons I do a pileup with samtools and parse the output, calling variants at suitable thresholds.
|
![]() |
![]() |
![]() |
#3 |
Senior Member
Location: amsterdam Join Date: Jun 2009
Posts: 133
|
![]()
Pindel is able to call indels from many samples at the same time and give you indication of which samples are the carriers. The new pindel2vcf gives you a clean way to view the result with the numbers of reads supporting ref and alt alleles.
you could put a list of bam files in the config file and run all samples together. |
![]() |
![]() |
![]() |
#4 |
Member
Location: New Zealand Join Date: Mar 2014
Posts: 20
|
![]()
Just wondering if you've tried Geneious? It's commercial software, so maybe that doesn't meet your requirements, and I'm biased since I wrote the variant caller in it, but I've never seen it fail to call an obvious variant.
If you're willing to share your data along with the locations of some obvious indels that aren't getting called, then I can run it through Geneious for you and let you know how it goes. |
![]() |
![]() |
![]() |
#5 |
Junior Member
Location: Boston, MA Join Date: Feb 2015
Posts: 6
|
![]()
Well, after much frustration, I stumbled onto the package freebayes "Bayesian haplotype-based polymorphism discovery and genotyping"
https://github.com/ekg/freebayes The caller seems to work well on amplicon data and ended up being the cleanest and most complete VCF file (with ref and alt allele frequencies). |
![]() |
![]() |
![]() |
#6 |
Junior Member
Location: Boston, MA Join Date: Feb 2015
Posts: 6
|
![]()
sorry, double post
Last edited by alexholman; 02-24-2015 at 07:03 AM. Reason: double post |
![]() |
![]() |
![]() |
#7 |
Member
Location: New Zealand Join Date: Mar 2014
Posts: 20
|
![]()
Alex shared some of his data with me to run through Geneious and we corresponded a bit by email, so I thought I'd share the results with anyone else who's interested.
Geneious called the indels although it split them into multiple adjacent indels in some situations which isn't ideal. I hope to improve this soon. I also ran his data through FreeBayes, which also found the obvious indels he expected, but it didn't find other 'obvious' indels he wasn't aware of. The main problem was that the data was poorly aligned. For example, in one sample, one allele had a 29bp deletion and the other allele a 44bp deletion in the same region. The alignment created using BWA mem had failed to span the 44bp deletion, so no neither Geneious nor FreeBayes would call this indel from this alignment. I generated a better alignment using Geneious, and then Geneious called both indels, although split it a way that made it difficult to infer the two alleles. FreeBayes still failed to identify the two alleles in this case even when provided with an improved alignment. For indels like this, I recommend aligning using either Geneious, or BBMap which both successfully span large indels. Or maybe other aligners have settings to tweak that will improve results around indels. And for variant calling on this type of data, both Geneious or FreeBayes do OK, although neither works perfectly on the data Alex provided even when I generated a better alignment. |
![]() |
![]() |
![]() |
#8 |
Senior Member
Location: Berlin, Germany Join Date: Jan 2015
Posts: 137
|
![]()
That's my experience with Freebayes as well, I haven't found a good set of parameters to work with the amplicon data I have. I typically use GMAP (when I have Sanger data) and GSNAP (for Illumina data) to align this kind of data, and with my own custom caller on pileups from samtools am quite happy...
|
![]() |
![]() |
![]() |
#9 |
Senior Member
Location: Berlin, Germany Join Date: Jan 2015
Posts: 137
|
![]()
One of my colleagues has a Geneious license, so I might try it. However, I'm a command line guy - is there a way to automate Geneious for amplicon data? (I typically have thousands of samples with custom inline barcoding schemes)
|
![]() |
![]() |
![]() |
#10 | |
Member
Location: New Zealand Join Date: Mar 2014
Posts: 20
|
![]() Quote:
Or if that's not sufficient you can put together workflows with optional custom code fragments. See https://www.youtube.com/watch?v=uvgB2_YBmD4 for a short demo of workflows. Also, one limitation of Geneious is that you can't yet export to VCF format so you'll have to settle for CSV export for now. |
|
![]() |
![]() |
![]() |
#11 | |
Senior Member
Location: Berlin, Germany Join Date: Jan 2015
Posts: 137
|
![]() Quote:
VCF would be nice but is not a must. If the CSV format contain enough data I could genereate VCF from it where needed. First I'd like to compare the aligner/caller to our current amplicon re-sequencing pipeline. |
|
![]() |
![]() |
![]() |
#12 | |
Member
Location: New Zealand Join Date: Mar 2014
Posts: 20
|
![]() Quote:
It's probably best you just try it with a sample or two to start with to see if Geneious gives acceptable results on your data. |
|
![]() |
![]() |
![]() |
#13 |
Member
Location: London Join Date: Feb 2015
Posts: 18
|
![]()
You may also find Scalpel useful (http://scalpel.sourceforge.net/) which uses an assembly step during indel calling (http://www.ncbi.nlm.nih.gov/pubmed/25128977 ) that may help with some of the alignment-derived false negatives.
|
![]() |
![]() |
![]() |
#14 | |
Junior Member
Location: RUSSIA MOSCOW Join Date: Mar 2015
Posts: 3
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#15 | |
Member
Location: US Join Date: Nov 2009
Posts: 10
|
![]() Quote:
1) I used bwa-mem as an aligner, and which one you used? 2) my freebayes command line: freebayes -f /home/db/chr5.fa --region chr5:112818960-112819204 my_sorted.bam > results.vcf |
|
![]() |
![]() |
![]() |
#16 |
Junior Member
Location: Boston, MA Join Date: Feb 2015
Posts: 6
|
![]()
There are two flags that I'm using that are not in your command.
--no-snps suppresses calling SNPs, because for CRISPR analysis you don't really care about them and --use-duplicate-reads I have a feeling that this one is what you need. When next-gen sequencing amplicons most of your reads are going to be duplicates, simply because you are sequencing identical amplicons. You need to keep these in to get proper depth for the analysis. Make sure your alignment pipeline is not removing duplicates, and use the above flag in freebayes to make sure the indel analysis is using them. |
![]() |
![]() |
![]() |
#17 | |
Member
Location: US Join Date: Nov 2009
Posts: 10
|
![]() Quote:
here is the samtools flaystat info: -bash-4.1$ samtools flagstat test_clean.sorted.bam 18255 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 18243 + 0 mapped (99.93%:-nan%) 18255 + 0 paired in sequencing 9104 + 0 read1 9151 + 0 read2 0 + 0 properly paired (0.00%:-nan%) 18231 + 0 with itself and mate mapped 12 + 0 singletons (0.07%:-nan%) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5) you can see there is no properly paired in the sample. Thanks R |
|
![]() |
![]() |
![]() |
#18 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
|
![]()
You have a different number of read 1 and read 2. It looks like the pairing got corrupted in preprocessing, which would explain why none are proper pairs. You should reprocess the raw reads using both files at the same time, and only pair-aware tools such as BBDuk, to keep the order intact. Then remap. And if you are interested in indels, I suggest using BBMap for mapping. Alternately, if the pairs are all supposed to overlap, you can get more accurate indel calls by merging them first and mapping the merged reads rather than mapping them as pairs.
|
![]() |
![]() |
![]() |
#19 | |
Member
Location: US Join Date: Nov 2009
Posts: 10
|
![]() Quote:
You suggested to use "merged" two reads into one fastq file and map them as single-end reads. I am not sure that freebayes software will work with the single-end reads file. Thanks R |
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|