I would like to sequence a pool (hundreds to thousands) of mutagenized variants to see which variant dominates the mutagenesis population. Each variant carries 3 point mutations at varying loci in the same ~1.5 kb parental sequence. Can you recommend an NGS platform that fits this application? What is the strategy for creating the sequencing library? Thanks!
-
There are a few problems you will face with this project. The first is whether you need haplotype information: do you need to know which 3 variants occur together in each particular fragment, or just the allele frequency at each position? The second is detecting low-frequency mutations. Identifying variants at less than 1% is typically considered problematic because the error rate of most platforms is around that level, so 1000 reads will carry a background of roughly 10 miscalls (spread across A, T and G) even if the nucleotide at that position is always C.
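The background arithmetic can be sketched roughly as follows. This assumes a flat 1% per-base error rate split evenly over the three wrong bases and independent errors between reads, which is a simplification; `binom_tail` and `min_supporting_reads` are illustrative names, not part of any sequencing toolkit.

```python
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

def min_supporting_reads(depth, error_rate, alpha=1e-3):
    """Smallest read count for one specific wrong base that sequencing
    error alone would reach with probability <= alpha (assumed model)."""
    p_specific = error_rate / 3  # assumption: errors spread evenly over 3 bases
    k = 0
    while binom_tail(depth, p_specific, k) > alpha:
        k += 1
    return k

depth = 1000
error_rate = 0.01  # assumed, "around 1%" as in the post
expected_miscalls = depth * error_rate
threshold = min_supporting_reads(depth, error_rate)
print(f"expected miscalls at one position: {expected_miscalls:.0f}")
print(f"reads needed to stand out from error: {threshold}")
```

Under this model, a variant supported by fewer reads than `threshold` at a given depth is hard to tell apart from noise, which is why sub-1% alleles need either lower-error chemistry or some form of read confirmation.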
If you need haplotypes, it gets complicated with short reads. Maybe ultra-high depth PacBio reads? I haven't seen how good the consensus sequences can be for that. If you don't, then some approaches have been to tag each fragment with a randomer sequence, or to carry out paired-end sequencing of short fragments so that the two reads confirm each other. My academic lab gets good allele detection at 1 in 5000 or so using the paired-end, low-error approach.
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
-
You are right. I need haplotypes, and I must capture the 3 point mutations from each of the fragments. PacBio might work in this case because of the long read length, but I am concerned about the high error rate. Paired-end sequencing seems to be a good approach. Should I use 454 for this paired-end approach? All the other NGS platforms apparently don't sequence >750 bp. Can I use a paired-end library protocol to barcode or tag each of the fragments?
-
I added the PacBio bit at the last second, but it would have to be the circular consensus style so that the re-sequencing is done on the same fragment; otherwise the noise would drown everything out. It would get pretty pricey to do that on a 1.5 kb fragment while trying to get 5-7 reads off of it to make a consensus.
I haven't tried the PE overlap on 454, but it could work, I suppose. Hmm... interesting problem overall!
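To put rough numbers on the circular consensus idea, here is a minimal sketch. The 50 bp adapter length and the 15% per-pass error rate are placeholder assumptions rather than PacBio specifications, and the majority-vote figure is only a crude upper bound for substitution-type errors at a single position.

```python
from math import comb

def required_read_length(insert_bp, passes, adapter_bp=50):
    """Polymerase read length needed to cover the insert `passes` times
    (adapter_bp is a hypothetical hairpin adapter length)."""
    return passes * (insert_bp + adapter_bp)

def majority_vote_error(per_pass_error, passes):
    """Probability that more than half of the passes are wrong at one
    position, assuming independent errors (upper bound on a wrong call)."""
    need = passes // 2 + 1
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(need, passes + 1)
    )

for passes in (5, 7):
    length = required_read_length(1500, passes)
    err = majority_vote_error(0.15, passes)  # 0.15 is an assumed raw error rate
    print(f"{passes} passes: ~{length} bp of read, consensus error <= {err:.4f}")
```

The point of the sketch is the trade-off: each extra pass lowers the consensus error but demands a longer continuous read from the same molecule, which is where the cost comes from.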
-
This would be relatively straightforward on PacBio, if I understand correctly that you are looking for the most abundant phased variants in a 1.5 kb fragment that you are deep sequencing. I don't see a need for CCS if you are not looking for extremely rare events. PacBio errors are random, so it will be easy to show an underlying pattern of phased variation in raw reads.
Full disclosure, I work at PacBio.
-
Good point. I've been on the rare variant path recently, so I read into the initial question that the allelic frequencies might be below 1%. Depending on the project, that doesn't have to be the case! Of course, if they are that high, sequencing 24 clones by Sanger would also give the answer.
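Whether 24 Sanger clones are enough depends on the variant frequency. A quick binomial check, assuming clones are picked independently and at random from the pool (`prob_at_least` is just an illustrative helper):

```python
from math import comb

def prob_at_least(n_clones, freq, min_hits):
    """P(a variant at frequency `freq` shows up in at least `min_hits`
    of `n_clones` randomly picked clones)."""
    miss = sum(
        comb(n_clones, k) * freq**k * (1 - freq) ** (n_clones - k)
        for k in range(min_hits)
    )
    return 1.0 - miss

for freq in (0.30, 0.10, 0.01):
    print(f"freq {freq:.2f}: seen at least twice with p = {prob_at_least(24, freq, 2):.3f}")
```

A variant at tens of percent is very likely to appear repeatedly among 24 clones, while one near 1% will usually be missed or seen only once.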
-
Right - I am not looking for extremely rare events. On the contrary, I am more interested in variants that show up at high frequency on the PacBio. I have two questions here regarding PacBio: 1) Would CCS have sufficient coverage to show most of the variants? 2) If I only look at the raw reads, how can I distinguish a random sequencing error from a true variant? To me, it does not seem easy to find an underlying pattern if sequencing errors are random.
-
If some 3-mutation variant were 30% of the population, then that would show up in the consensus. What do you think the frequency of the most abundant variant will be? I suppose there are additional complications if several variants share one mutation but not the others, as the consensus wouldn't make much sense.
-
1. The problem with CCS would definitely be throughput; feasibility would depend on the frequency of the variant and the size of the pool.
2. I have a background in signal processing and am now doing bioinformatics, so it is probably just me who thinks this is an easy problem. Actually, I'm not sure how off-the-shelf the solution would be; this would be relatively novel. But conceptually, if you have an alignment of even very noisy data, the signal (the variants) is uncorrelated with the noise (see http://en.wikipedia.org/wiki/Signal_averaging), and having 3 phased variants just makes things simpler. The strength of the statistics comes from looking at the entire alignment, not just single reads.
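The signal-averaging argument can be illustrated with a toy simulation. This is a sketch under strong assumptions, not a real pipeline: reads are taken as already aligned, errors are substitutions only at an assumed 15% rate (real raw reads also carry indels), the fragment is shortened to 300 bp for speed, and one variant is set to 30% of the pool. All function names are made up for the example.

```python
import random
from collections import Counter

BASES = "ACGT"

def simulate_reads(parent, variant, variant_freq, n_reads, error_rate, rng):
    """Draw noisy reads from a two-haplotype pool (substitution errors only)."""
    reads = []
    for _ in range(n_reads):
        template = variant if rng.random() < variant_freq else parent
        read = [
            rng.choice([b for b in BASES if b != base])
            if rng.random() < error_rate else base
            for base in template
        ]
        reads.append("".join(read))
    return reads

def top_alt_columns(parent, reads, n_top):
    """Rank alignment columns by the frequency of their commonest non-parent base."""
    scores = []
    for pos, ref in enumerate(parent):
        counts = Counter(read[pos] for read in reads)
        alt, alt_count = max(
            ((b, c) for b, c in counts.items() if b != ref),
            key=lambda bc: bc[1], default=(None, 0),
        )
        scores.append((alt_count / len(reads), pos, alt))
    return sorted(scores, reverse=True)[:n_top]

rng = random.Random(0)
parent = "".join(rng.choice(BASES) for _ in range(300))
true_sites = [40, 150, 260]  # hypothetical mutation positions
variant = list(parent)
for pos in true_sites:
    variant[pos] = BASES[(BASES.index(parent[pos]) + 1) % 4]
variant = "".join(variant)

reads = simulate_reads(parent, variant, 0.30, 1000, 0.15, rng)
top = top_alt_columns(parent, reads, 3)
called = sorted(pos for _, pos, _ in top)

# Phasing check: reads carrying all three calls vs. what independence predicts.
joint = sum(all(r[pos] == alt for _, pos, alt in top) for r in reads) / len(reads)
independent = 1.0
for freq, _, _ in top:
    independent *= freq
print("called sites:", called)
print(f"joint frequency {joint:.3f} vs independent expectation {independent:.3f}")
```

In this toy setting the three true sites rise well above the per-column error background, and the three variant bases co-occur on the same reads far more often than independent errors would predict, which is the phased pattern the post refers to.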