SEQanswers

Old 10-02-2013, 12:36 PM   #1
dnajuice
Junior Member
 
Location: USA

Join Date: Aug 2012
Posts: 8
targeted mutagenesis sequencing

I would like to sequence a pool (hundreds to thousands) of mutagenized variants, each carrying 3 point mutations at varying loci of the same ~1.5 kb parental sequence, to see which variant dominates the mutagenized population. Can you recommend an NGS platform that fits this application? What is the strategy for creating the sequencing library? Thanks!
Old 10-02-2013, 03:18 PM   #2
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 416

There are a few problems you will face with this project. The first is whether you need haplotype information: do you need to know which 3 variants occur together in each particular fragment, or just the allele frequency at each position? The second is detecting low-frequency mutations. Identifying variants at less than 1% is typically considered problematic because the error rate of most platforms is around that level, so even if the nucleotide at a position is always C, 1000 reads will still show a background of roughly 10 miscalled bases spread across A, T, and G.
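
A minimal back-of-the-envelope sketch of that background, assuming a uniform 1% per-base error rate split evenly across the three wrong bases (illustrative numbers, not a platform spec):

```python
# Expected error background at a single position under a uniform per-base error model.
depth = 1000          # reads covering the position
error_rate = 0.01     # ~1% per-base error rate (assumed, typical order of magnitude)

total_miscalls = depth * error_rate   # ~10 wrong calls among 1000 reads
per_base = total_miscalls / 3         # ~3-4 each of A, T, G if the true base is C

print(f"~{total_miscalls:.0f} miscalls total, ~{per_base:.1f} per wrong base")
# A real variant present below ~1% is hard to distinguish from this background.
```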

If you need a haplotype, it gets complicated with short reads. Maybe ultra-high-depth PacBio reads? I haven't seen how good the consensus sequences can be for that. If you don't, then some approaches have been to tag each fragment with a random-mer sequence, or to carry out paired-end sequencing of short fragments so that the two reads confirm each other. My academic lab gets good allele detection at about 1 in 5,000 using the paired-end, low-error approach.
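
A rough sketch of the paired-end overlap idea (a generic illustration, not our exact pipeline): keep only the bases on which the two reads of a short fragment agree, so a miscall has to occur identically in both reads to survive.

```python
def overlap_consensus(read1, read2):
    """Keep a base only where both reads of an overlapping pair agree;
    mask disagreements with 'N'. Assumes the reads are already aligned
    to the same strand and coordinates (a simplification)."""
    return "".join(a if a == b else "N" for a, b in zip(read1, read2))

# Two reads of the same short fragment, one of them carrying a single error:
print(overlap_consensus("ACGTTGCAGT", "ACGATGCAGT"))   # -> ACGNTGCAGT

# With ~1% error per read, a concordant miscall needs the same wrong base in
# both reads (~0.01 * 0.01 / 3 per position), which is why allele detection
# around 1 in 5,000 becomes feasible.
```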
__________________
Providing nextRAD genotyping services. http://snpsaurus.com
Old 10-02-2013, 03:44 PM   #3
dnajuice
Junior Member
 
Location: USA

Join Date: Aug 2012
Posts: 8

You are right. I need haplotypes: I must capture the 3 point mutations from each of the fragments. PacBio might work in this case because of the long read length, but I am concerned about the high error rate. Paired-end sequencing seems to be a good approach. Should I use 454 for this paired-end approach? The other NGS platforms apparently don't sequence reads longer than ~750 bp. Can I use a paired-end library protocol to barcode or tag each of the fragments?
Old 10-02-2013, 03:57 PM   #4
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 416

I added the PacBio bit at the last second, but it would have to be the circular consensus (CCS) style so that the re-sequencing happens on the same fragment; otherwise the noise would drown everything out. It would get pretty pricey to do that on a 1.5 kb fragment while trying to get 5-7 passes off of it to make a consensus.
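
Rough arithmetic behind that (the pass count is an assumption and adapter overhead is ignored; per-run yields are left out since they vary by chemistry):

```python
# Back-of-the-envelope cost of circular consensus on a 1.5 kb insert.
insert_len = 1500   # bp, the mutagenized fragment
passes = 6          # ~5-7 passes assumed for a usable consensus

raw_bases_per_molecule = insert_len * passes   # ~9,000 bp of polymerase read needed
print(f"Each consensus read consumes ~{raw_bases_per_molecule / 1000:.1f} kb of raw read length")
# Only molecules whose polymerase reads are that long yield a consensus,
# which is what drives up the per-fragment cost.
```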

I hadn't tried the PE overlap on 454, but it could work, I suppose. Hmm... interesting problem overall!
Old 10-03-2013, 09:40 AM   #5
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 309

This would be relatively straightforward on PacBio, if I understand correctly that you are looking for the most abundant phased variants in a 1.5 kb fragment that you are deep sequencing. I don't see a need for CCS if you are not looking for extremely rare events. PacBio errors are random, so it will be easy to show an underlying pattern of phased variation in the raw reads.
Full disclosure, I work at PacBio.
Old 10-03-2013, 10:15 AM   #6
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 416

Good point. I've been on the rare variant path recently, so I read into the initial question that the allelic frequencies might be sub 1%. Depending on the project, that doesn't have to be the case! Of course, if they are that high, sequencing 24 clones by Sanger would also give the answer.
Old 10-03-2013, 10:24 AM   #7
dnajuice
Junior Member
 
Location: USA

Join Date: Aug 2012
Posts: 8

Right - I am not looking for extremely rare events. On the contrary, I am more interested in the variants that occur at high frequency in the pool. I have two questions about PacBio: 1) Would CCS give sufficient coverage to show most of the variants? 2) If I only look at the raw reads, how can I distinguish a random sequencing error from a true variant? It does not seem easy to me to find an underlying pattern if sequencing errors are random.
Old 10-03-2013, 10:44 AM   #8
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 416

If some 3-mutation variant made up 30% of the population, then it would show up by consensus. What do you think the frequency of the most abundant variant will be? I suppose there are additional complications if several variants share one mutation but not the others, since a simple consensus wouldn't make much sense then.
Old 10-03-2013, 11:18 AM   #9
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 309

1. The problem with CCS would definitely be throughput; feasibility would depend on the frequency of the variant and the size of the pool.
2. I have a background in signal processing and am now doing bioinformatics, so it is probably just me who thinks this is an easy problem. Actually, I'm not sure how off-the-shelf the solution would be; it would be relatively novel. But conceptually, if you have an alignment of even very noisy data, the signal (the variants) is uncorrelated with the noise (http://en.wikipedia.org/wiki/Signal_averaging), and having 3 phased variants just makes things simpler. The strength of the statistics comes from looking at the entire alignment, not just single reads.
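
A toy simulation of that idea (not PacBio's actual analysis; the error rate, pool fraction, and positions are made-up numbers): draw noisy reads from a pool in which 30% of molecules carry three linked mutations, then compare how often the three alternate bases co-occur in one read with what independence would predict.

```python
import random

random.seed(0)
ref = list("ACGT" * 20)              # 80 bp toy reference
mut_pos, alt = [10, 40, 70], "T"     # three phased point mutations (hypothetical)
hap_freq, err = 0.30, 0.15           # 30% variant haplotype, 15% random per-base noise

def noisy_read():
    """One read drawn from either the parental or the 3-mutation haplotype, plus random errors."""
    seq = ref[:]
    if random.random() < hap_freq:
        for p in mut_pos:
            seq[p] = alt
    return [random.choice("ACGT") if random.random() < err else b for b in seq]

reads = [noisy_read() for _ in range(2000)]
triple = sum(all(r[p] == alt for p in mut_pos) for r in reads) / len(reads)
single = sum(r[mut_pos[0]] == alt for r in reads) / len(reads)

print(f"alt at one site: {single:.2f}")           # ~0.3, close to the true haplotype frequency
print(f"all three sites together: {triple:.2f}")  # far above the unlinked expectation below
print(f"expected if unlinked: {single**3:.3f}")
# Random errors almost never hit all three positions with the same base in one read,
# so the phased haplotype stands out even in individual noisy reads.
```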