Hey Everyone,
I'm using a targeted capture approach to sequence the variable regions in Ig; we're trying to develop a basis for Ig genotyping experiments. We're going with HiSeq because we have too many samples for PacBio. Ig is hard because of the large amount of structural variation that can occur (frequent insertions, deletions, duplications, and 'complex' events), which makes sequencing with NGS difficult.
Here's my plan: design an enzyme cocktail that cuts at defined sites in the Ig region of one sample, producing ~1kb+ segments with the majority in the 1kb-20kb range. Size select (I'm proposing <1kb, 1kb-5kb, 5kb-15kb, 15kb+), then fragment and index each size pool separately. Then pool everything together (including a non-restricted, regular library-prepped genome), and do the capture, amplification, and sequencing.
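As a rough illustration of the digest and size-selection step, here is a minimal Python sketch of an in silico digest that bins the resulting fragments into the four proposed size pools. The recognition sites passed in are placeholders, not a proposed cocktail, and the code makes a simplifying assumption: each enzyme is treated as cutting at the start of its recognition site, ignoring the real cut offset and any methylation sensitivity.

```python
import re
from collections import Counter

# Size pools from the plan: (label, lower bound inclusive, upper bound exclusive), in bp.
SIZE_POOLS = [
    ("<1kb", 0, 1_000),
    ("1kb-5kb", 1_000, 5_000),
    ("5kb-15kb", 5_000, 15_000),
    ("15kb+", 15_000, float("inf")),
]


def cut_positions(sequence, recognition_sites):
    """Return sorted cut coordinates.

    Assumption: every enzyme cuts at the start of its recognition site.
    """
    cuts = set()
    for site in recognition_sites:
        # A lookahead pattern also finds overlapping occurrences of the site.
        for match in re.finditer(f"(?={re.escape(site)})", sequence):
            cuts.add(match.start())
    cuts.discard(0)  # a cut at position 0 would create an empty fragment
    return sorted(cuts)


def fragment_lengths(sequence, recognition_sites):
    """Lengths of the fragments left after cutting at every site."""
    bounds = [0] + cut_positions(sequence, recognition_sites) + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def size_pool(length):
    """Map a fragment length in bp to its size-pool label."""
    for label, low, high in SIZE_POOLS:
        if low <= length < high:
            return label
    raise ValueError(f"invalid fragment length: {length}")


def pool_summary(sequence, recognition_sites):
    """Count how many fragments fall into each size pool."""
    return Counter(size_pool(n) for n in fragment_lengths(sequence, recognition_sites))
```

Running `pool_summary` on a candidate reference for the region with different site combinations would give a quick look at how the fragments spread across the pools before any enzymes are ordered.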
The thought is that the extra information (which size pool a read came from: <1kb, 1kb-5kb, 5kb-15kb, or 15kb+) will help differentiate reads during alignment, indel calling, and read depth analysis. For example: reads 1-2-3-4-5 all align to the same area; reads 1-2-5 are 15kb-indexed and reads 3-4 are 1kb-indexed, so it is likely that 1-2-5 and 3-4 come from separate areas of the region.
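The read example above can be sketched in a few lines of Python. This is only an illustration of the idea, not a real pipeline: the `(read_id, locus, pool_label)` tuples and the `locusA` name are hypothetical stand-ins for what would be pulled out of the alignments and the per-pool indexes.

```python
from collections import defaultdict


def split_by_pool(reads):
    """Group reads by aligned locus, then by size-pool index.

    reads: iterable of (read_id, locus, pool_label) tuples (hypothetical format).
    Returns {locus: {pool_label: [read_ids]}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for read_id, locus, pool in reads:
        grouped[locus][pool].append(read_id)
    return {locus: dict(pools) for locus, pools in grouped.items()}


def mixed_pool_loci(grouped):
    """Loci whose reads carry more than one size-pool index.

    These are candidates for reads that come from separate copies or
    areas of the region but collapse onto one spot in the reference.
    """
    return [locus for locus, pools in grouped.items() if len(pools) > 1]


# The example from the post: five reads aligning to the same area.
example_reads = [
    (1, "locusA", "15kb"),
    (2, "locusA", "15kb"),
    (3, "locusA", "1kb"),
    (4, "locusA", "1kb"),
    (5, "locusA", "15kb"),
]

grouped = split_by_pool(example_reads)
print(grouped)
print(mixed_pool_loci(grouped))
```

Here reads 1-2-5 and 3-4 end up in separate groups at the same locus, which is the signal that they probably originate from different fragments.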
I haven't been able to find anything very similar. I've mostly been going off RADseq papers, but since I still want to sequence the whole region, I'm just size selecting and indexing separately and then pooling, so nothing from the region gets thrown out (assuming everything in the region is capturable by our custom capture). I've also considered doing PacBio on 5-10% of the samples for de novo assembly and then aligning the HiSeq data to that reference (hg19 is a very poor reference for the variable regions in Ig).
Any thoughts/advice?