#1
Member
Location: USA
Join Date: Apr 2013
Posts: 42

Hi all, I've sequenced the mitochondrial genome of an individual using Illumina (100bp PE, insert size 150-180nt). So far, I've been able to determine that the mitochondria carry a heteroplasmic ~230bp deletion (estimated at 30-40%) relative to the reference.
I've yet to find an indel discovery program that also quantifies the relative allele frequency. Standard methods of indel/CNV quantification don't work here, because mitochondrial indel/CNV heteroplasmy doesn't occur in the simple 50% increments you would see for a nuclear genomic deletion/CNV.

What I've done so far is map reads to the reference while allowing large gaps (thereby splitting reads at the deletion) and calculate the fraction of split reads as a percentage of coverage. However, many reads do not split across the deletion and instead have unmapped free ends, because the portion of the read beyond the junction is too short for the mapping software to call it as a deletion read. The deletion frequency is therefore underestimated. Here's an example of the mapping output showing reads mapping to the reference, split reads, and reads with unaligned free ends on either side of the deletion (which should really be contributing to the deletion read pool):

[alignment screenshot]

I've also considered simply counting the raw reads that correspond to either the deletion or the reference and calculating the fraction that are deletion reads, e.g.

reference: AAAAABBBBBCCCCC
deletion read: AAAAACCCCC
non-deletion read: AAAAABBBBB

So, for example, if I counted 2000 AAAAACCCCC (deletion) reads and 5000 AAAAABBBBB (non-deletion) reads, the overall deletion frequency would be 2000/7000 = 29%. I'm not sure whether this is an oversimplification, but it would remove the bias caused by unaligned free ends. (A rough sketch of this counting approach is at the end of this post.)

This topic has been visited before but I don't think a satisfactory answer has been found. So my overall question is: is it possible to accurately quantify deletions larger than your read length/insert size in a population without a fixed ploidy?
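Concretely, the raw-count idea would be something like this. It's only a sketch: merged.fq and the two short sequences are placeholders for the real read file and ~20-30bp of sequence on either side of the junction, and reverse-complement matches are ignored for simplicity.

Code:
# Count reads containing the deletion junction vs. reads containing the
# undeleted reference sequence, then report the deletion fraction.
DEL=$(grep -c "AAAAACCCCC" merged.fq)   # deletion-junction reads (placeholder sequence)
REF=$(grep -c "AAAAABBBBB" merged.fq)   # undeleted reads (placeholder sequence)
echo "deletion frequency: $(echo "scale=3; $DEL/($DEL+$REF)" | bc)"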

#2
Super Moderator
Location: Walnut Creek, CA
Join Date: Jan 2014
Posts: 2,707

Since your reads overlap, I suggest you first merge them with BBMerge to get longer reads (a smaller fraction of each read will be near the tips, which aids indel detection), then map with BBMap, which will easily span those gaps. That takes 4 steps:
Code:
bbmerge.sh in=read1.fq in2=read2.fq out=merged.fq outu=unmerged.fq vloose
bbmap.sh k=10 ref=mito.fa
bbmap.sh k=10 in=merged.fq maxindel=400 tipsearch=300 slow out=mapped_merged.sam
bbmap.sh k=10 in=unmerged.fq maxindel=400 tipsearch=300 slow out=mapped_unmerged.sam

Last edited by Brian Bushnell; 04-28-2015 at 02:34 PM.

#3
Member
Location: USA
Join Date: Apr 2013
Posts: 42

Hi Brian,
Thanks for the reply. I used BBMap according to your suggestions: BBMerge merged ~95% of my reads, and I then mapped them to my reference using k=10 (props for making the program very intuitive and easy to run). It definitely decreased the number of unmapped free ends around the deletion. However, the program I've been using to call the haplotype frequency (Geneious) now calls the deletion as multiple smaller deletions rather than one single large one. This is a known issue with Geneious and I don't think it has been solved yet.

Do you happen to know of any indel calling method that would give me a deletion rate from the alignment file produced by BBMap?

Cheers,
Jo

#4
Super Moderator
Location: Walnut Creek, CA
Join Date: Jan 2014
Posts: 2,707

Hi Jo,
Sorry, I don't really do variant calling anymore, so I have no specific suggestions. You might try mpileup or GATK with indel realignment disabled.

To calculate it semi-manually, you could compare coverage in that region with and without reads containing deletions. BBMap's "pileup.sh" script generates per-base coverage information, and it considers bases under deletion events to be covered. So run it twice: once on a sam file with all mapped reads and once on a sam file containing only deletion-free reads (you can run BBMap with the "delfilter=0" flag to ban alignments containing any deletions). The coverage ratio before vs. after in that region should be a good approximation of the haplotype frequency. Actually, you can output coverage directly from BBMap using the "basecov" flag instead of generating a sam file.
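Something like the following, as a sketch: mito.fa, merged.fq, and the 8000-8230 coordinates are placeholders, and the basecov column layout should be checked against the header of the output file.

Code:
# All alignments (deletions allowed) vs. deletion-free alignments only.
bbmap.sh ref=mito.fa in=merged.fq k=10 maxindel=400 tipsearch=300 slow basecov=cov_all.txt
bbmap.sh ref=mito.fa in=merged.fq k=10 maxindel=400 tipsearch=300 slow delfilter=0 basecov=cov_nodel.txt
# Mean coverage over the deleted region from each run (assumes
# name/position/coverage columns in the basecov output; check the header).
awk '$2>=8000 && $2<=8230 {s+=$3; n++} END {print "all reads:", s/n}' cov_all.txt
awk '$2>=8000 && $2<=8230 {s+=$3; n++} END {print "deletion-free:", s/n}' cov_nodel.txt
# Deletion haplotype frequency is roughly 1 - (deletion-free / all reads).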

#5
Registered Vendor
Location: New Zealand
Join Date: Jun 2011
Posts: 29

Hi Jo,
You can try the caller from RTG Core, which calls haplotypes over the region as a whole and includes an allelic depth (AD) attribute in the VCF that should contain exactly the counts you want. Regarding the unaligned free ends specifically: the complex haplotype caller internally realigns each read against the candidate haplotypes, and since those reads would presumably align best against the deletion haplotype, they will contribute to the AD count for the deletion allele. (Our caller is primarily set up for haploid and diploid calling rather than heteroplasmy, but if you tell it the MT is diploid it'll probably do OK.)

Cheers,
Len.
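Once you have the VCF, the deletion allele fraction comes straight from the AD values. A quick sketch, where the file name, the position, and AD being the second FORMAT field are assumptions to adjust for your actual VCF:

Code:
# Print alt/(ref+alt) from the AD field of the record at the deletion
# position. Assumes a single-sample VCF with a FORMAT column like GT:AD:DP.
grep -v "^#" calls.vcf | awk -v pos=8000 '$2 == pos {
    split($10, f, ":");      # sample column
    split(f[2], ad, ",");    # AD = ref_depth,alt_depth
    print "deletion allele fraction:", ad[2] / (ad[1] + ad[2])
}'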

#6
Member
Location: USA
Join Date: Apr 2013
Posts: 42

Hi all,
Here's an update. I used the pipeline Brian suggested to allow large gaps in BBMap (k=10 maxindel=400 tipsearch=300 slow), then tried various callers to see what they would do with the ~220bp deletion:

CLC Genomics Workbench: called it as a deletion at 19% frequency.
VarScan indel: called it at 23%, but missed a smaller, obvious indel downstream.
SNVer: called it as a large substitution at 34% frequency.
Geneious: had a difficult time with the large deletion and instead called it as multiple smaller deletions. I believe this is a known problem with Geneious variant calling.

Then I used Brian's second suggestion, pileup.sh from BBMap. I compared the region per site using "k=10 maxindel=400 tipsearch=300 slow" in one run and "delfilter=0" in the second. This gave a reduction in coverage of about 38%, indicating a deletion frequency of approximately 38%.

So that's roughly a 20-percentage-point spread between calling methods. I'll give Len's suggestion a try next. Thanks all.

Best,
Josiah

Last edited by jwag; 05-06-2015 at 11:08 AM.

#7
Member
Location: New Zealand
Join Date: Mar 2014
Posts: 20

Hi Josiah,
Prior to variant calling, I recommend trying a de novo assembly of all the reads that map, then mapping those contigs (along with any unassembled reads) to the reference sequence. That way the reads in the contigs should be correctly aligned around large indels, which will give you a better estimate of the deletion frequency.

As for the issue where Geneious sometimes splits the deletion into multiple smaller deletions, I've fixed this for the next Geneious release. If you're interested in seeing the results sooner, you could share your data with me and I'll run it through Geneious and send you back the results. But even with the split deletion, the deletion frequency you get should still be accurate.
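As a sketch of the assemble-then-map idea: tadpole.sh from BBTools is just one arbitrary choice of assembler (not necessarily the one intended above), and the file names are placeholders.

Code:
# Assemble the mapped reads de novo, then map the contigs back to the
# reference with large deletions allowed.
tadpole.sh in=mapped_reads.fq out=contigs.fa
bbmap.sh ref=mito.fa in=contigs.fa k=10 maxindel=400 tipsearch=300 slow out=contigs_vs_ref.sam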

#8
Member
Location: USA
Join Date: Apr 2013
Posts: 42

Hi Matt,
I will give that a try. I'm currently using Geneious for SNP calling -- it seems to be working pretty well. Do you know whether people typically remove duplicate reads from an alignment prior to SNP calling with the Geneious caller? Thanks.

#9
Member
Location: New Zealand
Join Date: Mar 2014
Posts: 20

Geneious only provides the ability to remove duplicates prior to alignment. It's probably a good idea to do so, but I don't know whether or not people typically do that.
I can see from your screenshot that you're using quite an old version of Geneious. Geneious variant calling has had a few improvements since then, so you may want to upgrade.

#10
Member
Location: USA
Join Date: Apr 2013
Posts: 42

Hi again, I wanted to post a progress update for anyone still following the thread. I've been playing with BBMap, which appears to be doing very well with gapped alignments. I looked into the RTG Core caller but haven't used it yet, because according to the manual it does not call indels larger than 50bp (please correct me if I'm wrong, Len). Here's the pipeline that I think gives a good representation of both the indels and SNPs in a single alignment, in my case:
1. Map merged reads (>150bp after merging) to the mitochondrial genome with BBMap using k=10, maxindel=400, tipsearch=300, ambiguous=toss, minratio=0.7. I'm still debating whether to keep or toss ambiguously mapping reads; since there are some highly repetitive regions, I thought it might be best to toss them.

2. Remove duplicate mapped reads with Picard. This seems to be common practice (it's done, for example, in the GATK variant calling pipeline), although people seem to get a bit fuzzy on whether it should be done when coverage is saturated (as is the case with my extremely deep sequencing of a single mitochondrial genome). A rough command sketch of steps 1-2 is shown after this list.

3. Trim the mapped reads by 10bp on each end in the deduped bam file. I found an older thread where Brian suggested this can improve deletion calling (not sure if this is still your opinion, Brian). It seemed to help mask the few remaining unmapped free ends around deletion sites. The other option would be indel realignment with something like the GATK pipeline, but I think trimming works fine too.

4. SNP/variant call with Geneious and SNVer. Geneious actually did pretty well calling the deletions in the BBMap gapped alignment -- hopefully I can find funds for the new version, because even my older version seems to work better than most of the callers I've tried.

If anyone sees a problem with this pipeline please let me know. Cheers!
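For reference, steps 1-2 look roughly like this; the file names are placeholders, and samtools/Picard are just one way to do the sorting and deduplication.

Code:
# Step 1: map merged reads with the flags listed above.
bbmap.sh ref=mtgenome.fa in=merged.fq k=10 maxindel=400 tipsearch=300 ambiguous=toss minratio=0.7 out=mapped.sam
# Step 2: coordinate-sort, index, and remove duplicates.
samtools sort -o mapped_sorted.bam mapped.sam
samtools index mapped_sorted.bam
picard MarkDuplicates I=mapped_sorted.bam O=dedup.bam M=dup_metrics.txt REMOVE_DUPLICATES=true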

#11
Super Moderator
Location: Walnut Creek, CA
Join Date: Jan 2014
Posts: 2,707

[reply quoting the post above; text not preserved]

#12
Member
Location: USA
Join Date: Apr 2013
Posts: 42

Thanks for the reply, Brian. The DNA was prepared with a Nextera kit, which I believe involves a few cycles of PCR. So just to double-check: I shouldn't remove duplicates, since I'm trying to quantify allele frequencies?
For the post-alignment trim, I set the first 10 and last 10 bases of the aligned reads to N's -- I will have to look into how this would change the CIGAR string.

#13
Super Moderator
Location: Walnut Creek, CA
Join Date: Jan 2014
Posts: 2,707

I should really modify BBDuk to be able to trim mapped reads. Maybe I'll do that today.

#14
Member
Location: USA
Join Date: Apr 2013
Posts: 42

If BBDuk could trim mapped reads and also update the CIGAR string at the same time, that would be super useful. I've yet to find anything that can do that.

Last edited by jwag; 05-13-2015 at 11:29 AM.

#15
Registered Vendor
Location: New Zealand
Join Date: Jun 2011
Posts: 29

Jo,
The 1-50bp note is more about the default RTG mapping settings (you'd have to adjust the aligner settings to look for longer indels, at some speed penalty). The variant caller will happily process anything you throw at it.

The mappings in your original screenshot looked fine: there were reads spanning the deletion plus some hanging into it, and all of that gets correctly taken into account by the caller during its local realignment against the indel hypotheses. So I would say do *not* trim the ends of the reads after alignment.