SEQanswers

Thread URL: http://seqanswers.com/forums/showthread.php?t=11739 (Genomic Resequencing forum)

Sam64 06-01-2011 06:47 AM

Coverage of reads not homogeneous (SureSelect target enrichment)
 
Hi all,

I have successfully run a targeted enrichment with SureSelect (Agilent protocol), and now I am trying to analyze the data. I need your help interpreting the sequencing results. :D

The problem is that read coverage is not homogeneous across many regions.

For example, take exon 34 of the NOTCH1 gene. In the middle of the exon, there is a small region with a very low read density, and I see it in both the tumor and the germline samples.
I can immediately rule out a deletion, because we had already sequenced this gene and found only one mutation and no deletion. In other words, this gene serves as a control to check the results.
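One way to pinpoint such a dip is to scan per-base depth along the exon and report contiguous runs below a threshold. The sketch below assumes depth lines in the three-column `chrom pos depth` format printed by `samtools depth -a` for a single region; the function names `parse_depth_lines` and `low_coverage_runs` and the 10x threshold are hypothetical choices for illustration, and the toy data is made up.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_depth_lines(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (position, depth) from tab-separated 'chrom<TAB>pos<TAB>depth' lines.

    Assumption: a single chromosome/region, e.g. `samtools depth -a -r chr:start-end`.
    """
    for line in lines:
        if not line.strip():
            continue
        _chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        yield int(pos), int(depth)


def low_coverage_runs(depths: Iterable[Tuple[int, int]],
                      min_depth: int = 10) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) intervals where depth < min_depth.

    Positions must be sorted; a jump in position closes the current run.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    prev: Optional[int] = None
    for pos, depth in depths:
        if depth < min_depth:
            if start is not None and pos == prev + 1:
                prev = pos  # extend the current low-coverage run
                continue
            if start is not None:
                runs.append((start, prev))  # gap in positions: close the old run
            start = prev = pos  # open a new run
        elif start is not None:
            runs.append((start, prev))  # coverage recovered: close the run
            start = prev = None
    if start is not None:
        runs.append((start, prev))
    return runs


# Toy data (made up): a dip at positions 101-103 and a single low base at 105.
example = ["chr9\t100\t50", "chr9\t101\t8", "chr9\t102\t3",
           "chr9\t103\t9", "chr9\t104\t40", "chr9\t105\t2"]
print(low_coverage_runs(parse_depth_lines(example)))  # [(101, 103), (105, 105)]
```

Running it on both the tumor and germline depth files would show whether the same interval drops out in each, which is what you would expect from a capture effect rather than a somatic event.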

Another hypothesis is the alignment. If you think that is the problem, which parameters should I change? (Alignment software: BFAST, with 4 mismatches allowed.)

Another possibility: was the hybridization between the baits (synthesized according to our design) and the DNA not specific enough? Or maybe there was a problem during bait synthesis...

What is your opinion of this result?
Have you encountered this kind of problem before?

Thank you for your help :)


Here are the sequencing parameters:
- Sequencing protocol: paired-end
- Bait length: 120 bp
- Bait tiling frequency: 1X
- Sequencer: Illumina GA II
- Read length: 2 x 150 bp

Sam64 06-02-2011 02:53 AM

I have just found a thread that gave me a possible answer: NGSfan's thread titled "Agilent SureSelect - coverage of high GC regions".

Apparently, as shown by Tewhey et al. (2009, Genome Biology), regions of high GC content are difficult to capture.
So I checked my sample by calculating the GC% of regions with low versus high read density.
Indeed, I obtained around 65-70% GC in the regions with low read density and 45-50% GC in the regions with normal or high read density. So there is a real difference.
Based on this result, the problem seems to come from poor hybridization between the baits and my DNA because of the high GC content.
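The GC comparison above can be sketched as follows, assuming the reference sequence of each interval has already been extracted (for example with `samtools faidx ref.fa chr:start-end`). The function names are hypothetical, ambiguous bases such as N are simply ignored, and the average is unweighted across regions; the toy sequences are made up.

```python
import math
from typing import Iterable


def gc_percent(seq: str) -> float:
    """GC% over unambiguous bases (A/C/G/T); N and other IUPAC codes are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    total = gc + at
    return 100.0 * gc / total if total else math.nan


def mean_gc(seqs: Iterable[str]) -> float:
    """Unweighted average GC% across regions, skipping regions with no A/C/G/T."""
    values = [v for v in (gc_percent(s) for s in seqs) if not math.isnan(v)]
    return sum(values) / len(values) if values else math.nan


# Toy sequences (made up), standing in for low- and normal-coverage intervals.
low_cov = ["GCGCGGCCAT", "GGCCGCGCTA"]
normal_cov = ["ATGCATATGC", "TTAGCAATGC"]
print(mean_gc(low_cov), mean_gc(normal_cov))  # 80.0 40.0
```

Weighting by interval length (pooling all bases before dividing) would be a reasonable alternative if the low-coverage intervals differ a lot in size.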

Sam

czhang 04-14-2017 09:37 AM

I have a similar issue with the Clinical Research Exome. Below is the Picard hybrid-selection metrics (HsMetrics) output:
sample Sur1CD138 Sur1PBMC Sur4CD138 Sur4PBMC 80CD138 80PBMC 79CD138 79PBMC
GENOME_SIZE 3095693983 3095693983 3095693983 3095693983 3095693983 3095693983 3095693983 3095693983
BAIT_TERRITORY 54098923 54098923 54098923 54098923 54098923 54098923 54098923 54098923
TARGET_TERRITORY 54098923 54098923 54098923 54098923 54098923 54098923 54098923 54098923
BAIT_DESIGN_EFFICIENCY 1 1 1 1 1 1 1 1
TOTAL_READS 122302992 63912464 132320260 63343432 124354098 68929068 111766878 61086712
PF_READS 122302992 63912464 132320260 63343432 124354098 68929068 111766878 61086712
PF_UNIQUE_READS 65758604 47188078 71359006 41929232 63771538 44979246 81305218 51978018
PCT_PF_READS 1 1 1 1 1 1 1 1
PCT_PF_UQ_READS 0.53767 0.738324 0.53929 0.661935 0.512822 0.652544 0.727454 0.850889
PF_UQ_READS_ALIGNED 62990152 45491928 68250730 40276082 60535187 42684839 78521776 50351126
PCT_PF_UQ_READS_ALIGNED 0.9579 0.964056 0.956442 0.960573 0.949251 0.94899 0.965766 0.9687
PF_UQ_BASES_ALIGNED 4728064586 3415679484 5121777453 3025329378 4494520530 3151628260 5902647790 3786743621
ON_BAIT_BASES 2598046261 1914306672 2735769937 1788234387 2034203798 1475715186 3601924964 2334328319
NEAR_BAIT_BASES 865049095 618460185 943952969 550366284 736580126 573835232 1076180151 664925629
OFF_BAIT_BASES 1264969230 882912627 1442054547 686728707 1723736606 1102077842 1224542675 787489673
ON_TARGET_BASES 2598046261 1914306672 2735769937 1788234387 2034203798 1475715186 3601924964 2334328319
PCT_SELECTED_BASES 0.732455 0.741512 0.718446 0.773007 0.61648 0.650315 0.792543 0.79204
PCT_OFF_BAIT 0.267545 0.258488 0.281554 0.226993 0.38352 0.349685 0.207457 0.20796
ON_BAIT_VS_SELECTED 0.750209 0.755816 0.743472 0.76466 0.734162 0.720019 0.769954 0.778303
MEAN_BAIT_COVERAGE 48.023992 35.385301 50.569767 33.054898 37.601558 27.278088 66.580345 43.149257
MEAN_TARGET_COVERAGE 77.49201 65.917215 77.536289 70.424682 53.048056 46.272356 110.164486 86.094231
PCT_USABLE_BASES_ON_BAIT 0.281239 0.396554 0.273733 0.373751 0.216695 0.283749 0.426687 0.505942
PCT_USABLE_BASES_ON_TARGET 0.281239 0.396554 0.273733 0.373751 0.216695 0.283749 0.426687 0.505942
FOLD_ENRICHMENT 31.443645 32.070358 30.565273 33.823705 25.898851 26.793964 34.91863 35.274871
ZERO_CVG_TARGETS_PCT 0.459094 0.545919 0.423864 0.611789 0.356305 0.476303 0.47652 0.582538
FOLD_80_BASE_PENALTY ? ? ? ? 53.048056 46.272356 ? ?
PCT_TARGET_BASES_2X 0.306677 0.24914 0.338453 0.21699 0.477449 0.435653 0.29477 0.224048
PCT_TARGET_BASES_10X 0.13243 0.122595 0.134753 0.122725 0.151498 0.170015 0.131616 0.122126
PCT_TARGET_BASES_20X 0.120161 0.112346 0.121305 0.112283 0.123854 0.116528 0.119661 0.112129
PCT_TARGET_BASES_30X 0.114073 0.108954 0.114794 0.108733 0.116306 0.105001 0.113736 0.108653
HS_LIBRARY_SIZE 28162978 31706518 29925871 23817041 21100054 19256316 57962030 64088611
HS_PENALTY_10X -1 -1 -1 -1 -1 -1 -1 -1
HS_PENALTY_20X -1 -1 -1 -1 -1 -1 -1 -1
HS_PENALTY_30X -1 -1 -1 -1 -1 -1 -1 -1
AT_DROPOUT 0.412187 0.490799 0.472367 0.408114 12.07872 34.30034 0.557644 0.616652
GC_DROPOUT 16.061497 18.99645 15.736789 18.875212 1.592581 0.569414 19.477861 20.993818
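Picard describes AT_DROPOUT and GC_DROPOUT roughly as follows: for each GC bin, compare the percentage of target territory with the percentage of aligned reads, and sum the shortfalls over the AT-rich (0-50% GC) or GC-rich (50-100% GC) bins. The sketch below is a simplified re-implementation of that idea, not Picard's code; the input dictionaries (counts per integer GC bin), the sign convention, and counting the 50% bin on both sides are assumptions. Read this way, GC_DROPOUT values of about 16-21 would suggest that a comparable percentage of reads is missing from GC-rich targets.

```python
from typing import Dict, Tuple


def dropout(territory: Dict[int, float], reads: Dict[int, float]) -> Tuple[float, float]:
    """Return (at_dropout, gc_dropout) in percent.

    territory / reads: amount of target territory and number of aligned reads
    per integer GC-percent bin (0-100). Simplified sketch, not Picard's code.
    """
    t_total = sum(territory.values())
    r_total = sum(reads.values())
    at = gc = 0.0
    for bin_gc in range(101):
        pct_t = 100.0 * territory.get(bin_gc, 0) / t_total
        pct_r = 100.0 * reads.get(bin_gc, 0) / r_total
        deficit = max(0.0, pct_t - pct_r)  # bin under-represented in the reads
        if bin_gc <= 50:  # assumption: 50% GC counts toward both metrics
            at += deficit
        if bin_gc >= 50:
            gc += deficit
    return at, gc


# Toy example (made up): half the territory is 70% GC but gets only 20% of reads.
print(dropout({40: 50, 70: 50}, {40: 80, 70: 20}))  # (0.0, 30.0)
```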

The average coverage is normal, but the coverage is not very uniform.
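Uniformity is what FOLD_80_BASE_PENALTY is meant to capture: Picard describes it as the fold over-coverage needed to bring 80% of target bases up to the mean coverage. A minimal sketch, assuming the common formulation "mean depth / 20th-percentile depth" with a nearest-rank percentile over all target bases (Picard's exact implementation may differ): under this formula a perfectly uniform capture scores 1.0, and when at least 20% of target bases have zero depth the ratio is undefined, which could explain the "?" entries in the table.

```python
import math
from typing import Sequence


def fold_80_penalty(depths: Sequence[int]) -> float:
    """Mean depth divided by the 20th-percentile depth over target bases.

    Uses a nearest-rank percentile; returns math.inf when that percentile is 0.
    """
    if not depths:
        raise ValueError("no target bases")
    ordered = sorted(depths)
    mean = sum(ordered) / len(ordered)
    # Nearest rank: smallest value with at least 20% of bases at or below it.
    rank = max(1, math.ceil(0.2 * len(ordered)))
    p20 = ordered[rank - 1]
    return mean / p20 if p20 > 0 else math.inf


print(fold_80_penalty([30] * 10))             # 1.0 (perfectly uniform)
print(fold_80_penalty([10, 20, 30, 40, 50]))  # 3.0
print(fold_80_penalty([0, 0, 50, 60, 90]))    # inf
```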

At which step do you think I might have made a mistake?

Thanks.


All times are GMT -8.