  • Coverage of reads not homogeneous (SureSelect target enrichment)

    Hi all,

    I have successfully run a targeted enrichment with SureSelect (Agilent protocol), and I am now trying to analyze the data. I need your help interpreting the sequencing results.

    The problem is that read coverage is not homogeneous across many regions.

    For example, take exon 34 of the NOTCH1 gene. In the middle of the exon there is a small region with a very low read density. I see this region in both the tumor and the germline sample.
    I can immediately rule out a deletion, because we had already sequenced this gene and found only one mutation and no deletion. In other words, this gene serves as a control to check the results.

    Another hypothesis is the alignment. If you think that is the problem, which parameters should I change? (Alignment software: BFAST; setting: 4 mismatches.)

    Another possibility is that the hybridization between the baits (manufactured after the design) and the DNA was not specific enough. Maybe there was a problem during bait synthesis...

    What is your opinion about this result?
    Have you run into this kind of problem before?

    Thank you for your help...


    Here are the sequencing parameters:
    - Sequencing protocol: paired-end
    - Bait length: 120 bp
    - Bait tiling frequency: 1X
    - Sequencer: Illumina GA II
    - Read length: 2 x 150 bp

  • #2
    I have just found a thread that gives me a possible answer:
    NGSfan's thread entitled "Agilent SureSelect - coverage of high GC regions".

    Apparently, as shown in the paper by Tewhey et al. (2009, Genome Biol), regions of high GC content are difficult to capture.
    So I checked my sample by calculating the GC% in regions with low and with high read density.
    Indeed, I obtained around 65-70% GC in the regions with low read density and 45-50% GC in the regions with normal or high read density. So there is a real difference.
    Given this result, the problem seems to come from poor hybridization between the baits and my DNA because the GC% is too high.
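For anyone who wants to repeat this kind of check, here is a minimal sketch of a GC% calculation in plain Python. The function names and the two example sequences are made up for illustration and are not from the NOTCH1 data; the default window of 120 simply mirrors the 120 bp bait length mentioned above, and the step of 60 is an arbitrary choice.

```python
def gc_fraction(seq):
    """Return the G+C fraction of a DNA string, ignoring N and other non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc(seq, window=120, step=60):
    """GC fraction in sliding windows; window=120 is an assumption (the bait length)."""
    last_start = max(len(seq) - window, 0)
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, last_start + 1, step)]


# Hypothetical sequences, invented for illustration only.
low_cov_region = "GCGGCCGCGGGCACCGCGCC"
high_cov_region = "ATGCATTAGCATCGATTACA"
print(f"low-coverage region GC:  {gc_fraction(low_cov_region):.0%}")
print(f"high-coverage region GC: {gc_fraction(high_cov_region):.0%}")
```

In practice the sequences would be extracted from the reference for the low- and high-coverage intervals, and the windowed version could be compared against a per-base depth track to see whether the coverage dip follows the GC peak.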

    Sam

    • #3
      I have a similar issue with the Clinical Research Exome. Below is the Picard analysis:
      sample Sur1CD138 Sur1PBMC Sur4CD138 Sur4PBMC 80CD138 80PBMC 79CD138 79PBMC
      GENOME_SIZE 3095693983 3095693983 3095693983 3095693983 3095693983 3095693983 3095693983 3095693983
      BAIT_TERRITORY 54098923 54098923 54098923 54098923 54098923 54098923 54098923 54098923
      TARGET_TERRITORY 54098923 54098923 54098923 54098923 54098923 54098923 54098923 54098923
      BAIT_DESIGN_EFFICIENCY 1 1 1 1 1 1 1 1
      TOTAL_READS 122,302,992.00 63,912,464.00 132,320,260.00 63,343,432.00 124,354,098.00 68,929,068.00 111,766,878.00 61,086,712.00
      PF_READS 122,302,992.00 63,912,464.00 132,320,260.00 63,343,432.00 124,354,098.00 68,929,068.00 111,766,878.00 61,086,712.00
      PF_UNIQUE_READS 65758604 47188078 71359006 41929232 63771538 44979246 81305218 51978018
      PCT_PF_READS 1 1 1 1 1 1 1 1
      PCT_PF_UQ_READS 0.53767 0.738324 0.53929 0.661935 0.512822 0.652544 0.727454 0.850889
      PF_UQ_READS_ALIGNED 62990152 45491928 68250730 40276082 60535187 42684839 78521776 50351126
      PCT_PF_UQ_READS_ALIGNED 0.9579 0.964056 0.956442 0.960573 0.949251 0.94899 0.965766 0.9687
      PF_UQ_BASES_ALIGNED 4728064586 3415679484 5121777453 3025329378 4494520530 3151628260 5902647790 3786743621
      ON_BAIT_BASES 2598046261 1914306672 2735769937 1788234387 2034203798 1475715186 3601924964 2334328319
      NEAR_BAIT_BASES 865049095 618460185 943952969 550366284 736580126 573835232 1076180151 664925629
      OFF_BAIT_BASES 1264969230 882912627 1442054547 686728707 1723736606 1102077842 1224542675 787489673
      ON_TARGET_BASES 2598046261 1914306672 2735769937 1788234387 2034203798 1475715186 3601924964 2334328319
      PCT_SELECTED_BASES 0.732455 0.741512 0.718446 0.773007 0.61648 0.650315 0.792543 0.79204
      PCT_OFF_BAIT 0.267545 0.258488 0.281554 0.226993 0.38352 0.349685 0.207457 0.20796
      ON_BAIT_VS_SELECTED 0.750209 0.755816 0.743472 0.76466 0.734162 0.720019 0.769954 0.778303
      MEAN_BAIT_COVERAGE 48.023992 35.385301 50.569767 33.054898 37.601558 27.278088 66.580345 43.149257
      MEAN_TARGET_COVERAGE 77.49201 65.917215 77.536289 70.424682 53.048056 46.272356 110.164486 86.094231
      PCT_USABLE_BASES_ON_BAIT 0.281239 0.396554 0.273733 0.373751 0.216695 0.283749 0.426687 0.505942
      PCT_USABLE_BASES_ON_TARGET 0.281239 0.396554 0.273733 0.373751 0.216695 0.283749 0.426687 0.505942
      FOLD_ENRICHMENT 31.443645 32.070358 30.565273 33.823705 25.898851 26.793964 34.91863 35.274871
      ZERO_CVG_TARGETS_PCT 0.459094 0.545919 0.423864 0.611789 0.356305 0.476303 0.47652 0.582538
      FOLD_80_BASE_PENALTY ? ? ? ? 53.048056 46.272356 ? ?
      PCT_TARGET_BASES_2X 0.306677 0.24914 0.338453 0.21699 0.477449 0.435653 0.29477 0.224048
      PCT_TARGET_BASES_10X 0.13243 0.122595 0.134753 0.122725 0.151498 0.170015 0.131616 0.122126
      PCT_TARGET_BASES_20X 0.120161 0.112346 0.121305 0.112283 0.123854 0.116528 0.119661 0.112129
      PCT_TARGET_BASES_30X 0.114073 0.108954 0.114794 0.108733 0.116306 0.105001 0.113736 0.108653
      HS_LIBRARY_SIZE 28162978 31706518 29925871 23817041 21100054 19256316 57962030 64088611
      HS_PENALTY_10X -1 -1 -1 -1 -1 -1 -1 -1
      HS_PENALTY_20X -1 -1 -1 -1 -1 -1 -1 -1
      HS_PENALTY_30X -1 -1 -1 -1 -1 -1 -1 -1
      AT_DROPOUT 0.412187 0.490799 0.472367 0.408114 12.07872 34.30034 0.557644 0.616652
      GC_DROPOUT 16.061497 18.99645 15.736789 18.875212 1.592581 0.569414 19.477861 20.993818
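As a sanity check on how the rows of this table relate to each other, the sketch below recomputes a few of the derived values from the raw base counts in the Sur1CD138 column. The formulas are the relationships the posted numbers appear to follow (for example, on-bait plus near-bait plus off-bait bases adding up to PF_UQ_BASES_ALIGNED); treat them as assumptions and check them against the Picard documentation. The function name is made up for illustration.

```python
def derived_metrics(on_bait, near_bait, off_bait, bait_territory):
    """Recompute derived rows from raw base counts (assumed relationships)."""
    aligned = on_bait + near_bait + off_bait   # all aligned bases
    selected = on_bait + near_bait             # bases on or near a bait
    return {
        "PF_UQ_BASES_ALIGNED": aligned,
        "PCT_SELECTED_BASES": selected / aligned,
        "PCT_OFF_BAIT": off_bait / aligned,
        "ON_BAIT_VS_SELECTED": on_bait / selected,
        "MEAN_BAIT_COVERAGE": on_bait / bait_territory,
    }


# Values copied from the Sur1CD138 column of the table above.
sur1_cd138 = derived_metrics(
    on_bait=2_598_046_261,
    near_bait=865_049_095,
    off_bait=1_264_969_230,
    bait_territory=54_098_923,
)
for name, value in sur1_cd138.items():
    print(f"{name}\t{value:.6f}" if isinstance(value, float) else f"{name}\t{value}")
```

For this column the recomputed values agree with the reported rows to the printed precision, so the table is at least internally consistent for these metrics.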

      The mean coverage looks normal, but the coverage is not very uniform.

      At which step do you think I might have made a mistake?

      Thanks.
