Problem:
I am having trouble confirming rare SNPs that I found by sequencing a large pool of individuals.
Details:
I have DNA from 4,800 patient samples (96 wells x 50 plates; diploid). I want to screen a few genes (say 5) in all 4,800 samples. Assume the SNPs are rare enough that any one SNP should be present in only 1-2 samples.
The obvious (pricey and laborious) approach would be to run 4,800 samples x 5 genes = 24,000 PCR reactions, shear, tag with 4,800 barcodes (Illumina barcodes/index reads, etc.), and sequence away.
Instead, I pooled the 4,800 samples into pools of 96 samples, which leaves 50 pools to PCR, shear, tag with 50 barcodes, and sequence. However, once I have confirmed that (say) pool #35 carries the SNP I want, I have to deconvolve that pool by Sanger sequencing/RFLP. This is where I am having trouble.
When I go back to the 96 samples that should carry the mutation, I cannot confirm it in any of them (perhaps it was a PCR error in the original amplicon PCR). Also, most of the SNPs being called seem to be of poor quality, and I don't know why...
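To make the pooling arithmetic concrete, here is a minimal sketch of what a positive pool implies. It assumes each carrier is heterozygous (one alternate copy out of 2) and, as a hypothetical layout not stated in the post, that pool N was built from all 96 wells of plate N; the well naming (`plate35_A1` and so on) is illustrative only.

```python
from fractions import Fraction
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]  # rows A-H of a 96-well plate
COLS = range(1, 13)         # columns 1-12


def pooled_allele_fraction(carriers, pool_size=96, ploidy=2):
    """Fraction of alleles in a pool that carry the SNP, assuming every
    carrier is heterozygous (one alternate copy each)."""
    return Fraction(carriers, pool_size * ploidy)


def candidate_wells(pool_number):
    """Hypothetical layout: pool N was made from every well of plate N,
    so a positive pool narrows the search to that plate's 96 wells."""
    return [f"plate{pool_number:02d}_{row}{col}" for row in ROWS for col in COLS]


print(pooled_allele_fraction(1))  # one het carrier in a pool of 96
print(pooled_allele_fraction(2))  # two het carriers in the same pool
wells = candidate_wells(35)
print(len(wells), wells[0], wells[-1])
```

With one heterozygous carrier the variant sits at 1/192 (about 0.5%) of the alleles in the pool, which is the signal every later step has to preserve.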
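One way to see why base quality matters so much here is to compare the expected fraction of reads showing the real variant with the fraction showing that same base purely by sequencing error. This is a simplified sketch: it uses the standard Phred definition Q = -10 log10(p_error) and assumes substitution errors are spread evenly over the three other bases, which real error profiles do not strictly follow.

```python
def phred_to_error(q):
    """Phred scale: Q = -10 * log10(p_error)."""
    return 10 ** (-q / 10)


def signal_to_noise(q, allele_fraction=1 / 192):
    """Ratio of true-variant reads to reads showing the same base by error.

    Simplifying assumption: a base-calling error at quality Q is equally
    likely to produce each of the 3 wrong bases.
    """
    specific_error = phred_to_error(q) / 3
    return allele_fraction / specific_error


for q in (20, 30, 40):
    print(f"Q{q}: signal/noise = {signal_to_noise(q):.1f}")
```

Under this model, at Q20 a specific error base shows up almost as often as a single heterozygous carrier's allele in a pool of 96, so calls supported mainly by low-quality bases are hard to distinguish from noise; at Q30 the margin is roughly 15-fold.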
Does anybody have any suggestions?
If the goal were just to find any SNP in the 4,800 samples, we would use a large library of random barcodes; however, in the end our goal is to trace each SNP back to the original sample.
Maybe amplifying 250 bp amplicons, so that read pairs overlap completely on a MiSeq 2x250 run, would increase the quality of the called SNPs.
Maybe Nextera XT could do all the "tagmentation" for us, but starting from 1 ng of material and amplifying again makes me worry that the 1 in 192 alleles that may carry the SNP would get lost (192 alleles = 96 samples per pool x 2). Then we would use 50 indices...
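The overlap idea can be sketched as follows. Overlap length is 2 x read length minus insert length (capped at the insert), and if the two reads' errors are independent, a false variant needs both reads to err to the same base. The independence and even-substitution assumptions are simplifications, not measured MiSeq behaviour.

```python
def overlap_length(insert_len, read_len=250):
    """Bases of the insert covered by both reads of a pair (0 if they don't meet)."""
    return max(0, min(insert_len, 2 * read_len - insert_len))


def concordant_specific_error(q_read1, q_read2):
    """Probability that both reads show the same specific wrong base.

    Assumes independent errors in the two reads, each spread evenly over
    the 3 wrong bases: (e1 / 3) * (e2 / 3).
    """
    e1 = 10 ** (-q_read1 / 10)
    e2 = 10 ** (-q_read2 / 10)
    return (e1 / 3) * (e2 / 3)


for insert in (250, 400, 600):
    print(f"{insert} bp insert: {overlap_length(insert)} bp overlap with 2x250")
print(f"Q20 + Q20 concordant error: {concordant_specific_error(20, 20):.2e}")
```

Requiring agreement in the overlap only suppresses sequencing errors. An error introduced during the original amplicon PCR is copied into the template molecule itself, so both reads of a pair will report it.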
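Whether a 1-in-192 allele can "get lost" depends mostly on how many template molecules it is sampled from. The sketch below estimates copy numbers from mass (about 650 g/mol per double-stranded base pair, and an assumed haploid human genome of roughly 3.1 Gb). It then gives the binomial probability of seeing at least k variant reads at a given depth. All inputs are illustrative assumptions, not measurements from this experiment.

```python
from math import comb

AVOGADRO = 6.022e23
G_PER_MOL_PER_BP = 650   # approximate mass of one double-stranded base pair
HAPLOID_GENOME_BP = 3.1e9  # assumed haploid human genome size


def molecules_in(ng, length_bp):
    """Approximate number of double-stranded DNA molecules in `ng` nanograms."""
    return ng * 1e-9 / (length_bp * G_PER_MOL_PER_BP) * AVOGADRO


def p_at_least(k, depth, allele_fraction=1 / 192):
    """Binomial probability of observing >= k variant reads at a given depth."""
    below_k = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1 - below_k


amplicons = molecules_in(1, 250)
genomes = molecules_in(1, HAPLOID_GENOME_BP)
print(f"1 ng of 250 bp amplicon: ~{amplicons:.2e} molecules, ~{amplicons / 192:.2e} variant")
print(f"1 ng of genomic DNA: ~{genomes:.0f} haploid copies, ~{genomes / 192:.1f} variant")
print(f"P(>=5 variant reads at 1000x) = {p_at_least(5, 1000):.2f}")
```

Under these assumptions, 1 ng of amplicon still holds millions of variant copies, whereas 1 ng of genomic DNA holds only a handful. The template input into the original pooled PCR is therefore the step where a rare allele is most likely to be under-sampled.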
Any help would be greatly appreciated,
A_Shah