SEQanswers

Old 10-02-2013, 12:36 PM   #1
dnajuice
Junior Member
 
Location: USA

Join Date: Aug 2012
Posts: 8
targeted mutagenesis sequencing

I would like to sequence a pool (hundreds to thousands) of mutagenized variants, each carrying 3 point mutations at varying loci of the same ~1.5 kb parental sequence, to see which variant dominates the mutagenized population. Can you recommend an NGS platform that fits this application? What is the strategy for creating the sequencing library? Thanks!
Old 10-02-2013, 03:18 PM   #2
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 416

There are a few problems you will face with this project. The first is whether you need haplotype information: do you need to know which 3 variants occur together in each particular fragment, or just the allele frequency at each position? The second is detecting low-frequency mutations. Identifying variants at less than 1% is typically considered problematic because the error rate of most platforms is around that level, so even if the nucleotide at a position is always C, 1000 reads will still show a background of roughly 10 miscalled bases spread across A, T, and G.
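
A minimal back-of-the-envelope sketch of that background, assuming a uniform 1% per-base error rate split evenly across the three wrong bases (illustrative numbers, not a platform spec):

```python
# Expected error background at a single position under a uniform per-base error model.
depth = 1000          # reads covering the position
error_rate = 0.01     # ~1% per-base error rate (assumed, typical order of magnitude)

total_miscalls = depth * error_rate   # ~10 wrong calls among 1000 reads
per_base = total_miscalls / 3         # ~3-4 each of A, T, G if the true base is C

print(f"~{total_miscalls:.0f} miscalls total, ~{per_base:.1f} per wrong base")
# A real variant present below ~1% is hard to distinguish from this background.
```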

If you need a haplotype, it gets complicated with short reads. Maybe ultra-high-depth PacBio reads? I haven't seen how good the consensus sequences can be for that. If you don't, then some approaches have been to tag each fragment with a random-mer sequence, or to carry out paired-end sequencing of short fragments so that the two reads confirm each other. My academic lab gets good allele detection at about 1 in 5,000 using the paired-end, low-error approach.
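
A rough sketch of the paired-end overlap idea (a generic illustration, not our exact pipeline): keep only the bases on which the two reads of a short fragment agree, so a miscall has to occur identically in both reads to survive.

```python
def overlap_consensus(read1, read2):
    """Keep a base only where both reads of an overlapping pair agree;
    mask disagreements with 'N'. Assumes the reads are already aligned
    to the same strand and coordinates (a simplification)."""
    return "".join(a if a == b else "N" for a, b in zip(read1, read2))

# Two reads of the same short fragment, one of them carrying a single error:
print(overlap_consensus("ACGTTGCAGT", "ACGATGCAGT"))   # -> ACGNTGCAGT

# With ~1% error per read, a concordant miscall needs the same wrong base in
# both reads (~0.01 * 0.01 / 3 per position), which is why allele detection
# around 1 in 5,000 becomes feasible.
```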
__________________
Providing nextRAD genotyping services. http://snpsaurus.com
Old 10-02-2013, 03:44 PM   #3
dnajuice
Junior Member
 
Location: USA

Join Date: Aug 2012
Posts: 8

You are right. I need haplotypes: I must capture the 3 point mutations from each of the fragments. PacBio might work in this case because of the long read length, but I am concerned about the high error rate. Paired-end sequencing seems to be a good approach. Should I use 454 for this paired-end approach? The other NGS platforms apparently don't sequence reads longer than ~750 bp. Can I use a paired-end library protocol to barcode or tag each of the fragments?
Old 10-02-2013, 03:57 PM   #4
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 416

I added the PacBio bit at the last second, but it would have to be the circular consensus (CCS) style so that the re-sequencing happens on the same fragment; otherwise the noise would drown everything out. It would get pretty pricey to do that on a 1.5 kb fragment while trying to get 5-7 passes off of it to make a consensus.
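
Rough arithmetic behind that (the pass count is an assumption and adapter overhead is ignored; per-run yields are left out since they vary by chemistry):

```python
# Back-of-the-envelope cost of circular consensus on a 1.5 kb insert.
insert_len = 1500   # bp, the mutagenized fragment
passes = 6          # ~5-7 passes assumed for a usable consensus

raw_bases_per_molecule = insert_len * passes   # ~9,000 bp of polymerase read needed
print(f"Each consensus read consumes ~{raw_bases_per_molecule / 1000:.1f} kb of raw read length")
# Only molecules whose polymerase reads are that long yield a consensus,
# which is what drives up the per-fragment cost.
```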

I hadn't tried the PE overlap on 454, but it could work, I suppose. Hmm... interesting problem overall!
Old 10-03-2013, 09:40 AM   #5
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 309

This would be relatively straightforward on PacBio, if I understand correctly that you are looking for the most abundant phased variants in a 1.5 kb fragment that you are deep sequencing. I don't see a need for CCS if you are not looking for extremely rare events. PacBio errors are random, so it will be easy to show an underlying pattern of phased variation in the raw reads.
Full disclosure, I work at PacBio.
Old 10-03-2013, 10:15 AM   #6
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 416

Good point. I've been on the rare variant path recently, so I read into the initial question that the allelic frequencies might be sub 1%. Depending on the project, that doesn't have to be the case! Of course, if they are that high, sequencing 24 clones by Sanger would also give the answer.
Old 10-03-2013, 10:24 AM   #7
dnajuice
Junior Member
 
Location: USA

Join Date: Aug 2012
Posts: 8

Right - I am not looking for extremely rare events. On the contrary, I am more interested in the variants that occur at high frequency in the pool. I have two questions about PacBio: 1) Would CCS give sufficient coverage to show most of the variants? 2) If I only look at the raw reads, how can I distinguish a random sequencing error from a true variant? It does not seem easy to me to find an underlying pattern if sequencing errors are random.
Old 10-03-2013, 10:44 AM   #8
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 416

If some 3-mutation variant made up 30% of the population, then it would show up by consensus. What do you think the frequency of the most abundant variant will be? I suppose there are additional complications if several variants share one mutation but not the others, since a simple consensus wouldn't make much sense then.
Old 10-03-2013, 11:18 AM   #9
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 309

1. The problem with CCS would definitely be throughput; feasibility would depend on the frequency of the variant and the size of the pool.
2. I have a background in signal processing and am now doing bioinformatics, so it is probably just me who thinks this is an easy problem. Actually, I'm not sure how off-the-shelf the solution would be; it would be relatively novel. But conceptually, if you have an alignment of even very noisy data, the signal (the variants) is uncorrelated with the noise (http://en.wikipedia.org/wiki/Signal_averaging), and having 3 phased variants just makes things simpler. The strength of the statistics comes from looking at the entire alignment, not just single reads.
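
A toy simulation of that idea (not PacBio's actual analysis; the error rate, pool fraction, and positions are made-up numbers): draw noisy reads from a pool in which 30% of molecules carry three linked mutations, then compare how often the three alternate bases co-occur in one read with what independence would predict.

```python
import random

random.seed(0)
ref = list("ACGT" * 20)              # 80 bp toy reference
mut_pos, alt = [10, 40, 70], "T"     # three phased point mutations (hypothetical)
hap_freq, err = 0.30, 0.15           # 30% variant haplotype, 15% random per-base noise

def noisy_read():
    """One read drawn from either the parental or the 3-mutation haplotype, plus random errors."""
    seq = ref[:]
    if random.random() < hap_freq:
        for p in mut_pos:
            seq[p] = alt
    return [random.choice("ACGT") if random.random() < err else b for b in seq]

reads = [noisy_read() for _ in range(2000)]
triple = sum(all(r[p] == alt for p in mut_pos) for r in reads) / len(reads)
single = sum(r[mut_pos[0]] == alt for r in reads) / len(reads)

print(f"alt at one site: {single:.2f}")           # ~0.3, close to the true haplotype frequency
print(f"all three sites together: {triple:.2f}")  # far above the unlinked expectation below
print(f"expected if unlinked: {single**3:.3f}")
# Random errors almost never hit all three positions with the same base in one read,
# so the phased haplotype stands out even in individual noisy reads.
```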