  • Exome Sequencing reads alignment outside capture regions

    For Illumina paired-end exome sequencing data (single lane) prepared with the Agilent SureSelect protocol, does anyone know what fraction of reads is expected to come from the non-exonic part of the genome? I found that a large portion, about 30% of reads, aligned outside the capture region. If you have worked with exome sequencing, is this a reasonable number?

    Thanks

  • #2
    That is reasonable; we see ~40% off target. Perhaps this number changes with their latest 50 Mb capture kit!
    --
    bioinfosm



    • #3
      Same here. Using SOLiD, 65-70% of bases (of mappable reads) are on target.



      • #4
        Are the reads aligned outside the capture region really from non-capture regions, or are they just misaligned to those places?



        • #5
          Out of curiosity, how are you defining 'off target' vs. 'on target' reads? Do you mean reads that fall entirely within the capture sequence (i.e. the boundaries of each probe), or overlap the target region by at least one base, or are within distance X of a target, etc.?



          • #6
            I counted the individual bases that hit inside the capture region and outside it. I suppose I could also just count read start positions; it wouldn't make much difference.



            • #7
              I pad the target regions with ~80-100bp and then do the intersection with my bam files using bedtools:

              Code:
              intersectBed -wa -abam alignments.bam -b target_regions.100bp_padded.bed > alignments.ontarget.bam
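
The padded BED file used above has to be built first. Here is a minimal Python sketch of one way to do it, assuming BED-style 0-based half-open coordinates; the function name `pad_and_merge` and the example intervals are illustrative only, not the actual script used. bedtools also provides `slop` and `merge` for the same job.

```python
from collections import defaultdict


def pad_and_merge(intervals, pad=100):
    """Pad (chrom, start, end) intervals by `pad` bases and merge overlaps.

    Coordinates are 0-based, half-open, as in BED. Starts are clamped at 0;
    ends are NOT clamped to the chromosome length, because no genome file
    is assumed here.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((max(0, start - pad), end + pad))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlapping or book-ended: extend the current block.
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


# Hypothetical example: three 120 bp probes, two of them adjacent.
targets = [
    ("chr7", 158829397, 158829517),
    ("chr7", 158829517, 158829637),
    ("chr7", 158835676, 158835796),
]
padded = pad_and_merge(targets, pad=100)
for chrom, start, end in padded:
    print(chrom, start, end, sep="\t")
```

Merging after padding matters: otherwise a read that spans two neighbouring padded probes can be counted against both.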



              • #8
                Thanks zee. That's what I was wondering. Without the padding it seems like you could substantially overestimate the rate of off-target hits. And since the hybridization is going to grab DNA fragments that only partially overlap the target regions, you expect a fair amount of off-target sequence. When evaluating the success of the enrichment, aligned bases within ~80-100 bases of the target region are quite different from bases that are very distant from it.
                Last edited by malachig; 09-16-2010, 10:24 PM.



                • #9
                  I picked up this hint from the Bainbridge et al., 2010 paper on whole exome capture sequencing. There is a section in the Materials and Methods which contains:

                  Target exons were padded to a minimum length of 80 bp, and consolidated to remove redundant overlaps.
                  It makes sense to allow for some padding around the target region. Even with their protocol they only recovered at most 51% for SOLiD and 78% with Illumina PE.



                  • #10
                    Originally posted by malachig
                    Out of curiosity, how are you defining 'off target' vs. 'on target' reads? Do you mean reads that fall entirely within the capture sequence (i.e. the boundaries of each probe), or overlap the target region by at least one base, or are within distance X of a target, etc.?
                    We do not use "on target reads" but "on target bases". The percentage of bases "on target" is determined over all mapped reads. So if 10% of a read overlaps the target region, it contributes 10% of its bases, and if it overlaps almost completely, at 95%, it contributes 95%. That seems a more honest definition of "on target" to me than some cutoff (overlap or distance)...
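
To make the difference between the two definitions concrete, here is a small Python sketch that computes both the base-level fraction described above and a read-level fraction with a one-base overlap cutoff. The function names and the toy reads are made up for illustration, reads are treated as ungapped (chrom, start, end) intervals, and the targets are assumed to be already merged so no base is counted twice.

```python
from collections import defaultdict


def _index_targets(targets):
    """Group non-overlapping target intervals by chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))
    return by_chrom


def _overlap_bases(read, targets_by_chrom):
    """Number of bases of one read that fall inside any target."""
    chrom, start, end = read
    total = 0
    for t_start, t_end in targets_by_chrom.get(chrom, []):
        overlap = min(end, t_end) - max(start, t_start)
        if overlap > 0:
            total += overlap
    return total


def on_target_base_fraction(reads, targets):
    """Fraction of all aligned bases that lie inside the targets."""
    idx = _index_targets(targets)
    total = sum(end - start for _chrom, start, end in reads)
    hit = sum(_overlap_bases(read, idx) for read in reads)
    return hit / total if total else 0.0


def on_target_read_fraction(reads, targets, min_overlap=1):
    """Fraction of reads overlapping the targets by >= min_overlap bases."""
    idx = _index_targets(targets)
    hits = sum(1 for read in reads if _overlap_bases(read, idx) >= min_overlap)
    return hits / len(reads) if reads else 0.0


# Toy data: one 100 bp target and four 50 bp reads.
targets = [("chr1", 100, 200)]
reads = [
    ("chr1", 90, 140),   # 40 of 50 bases on target
    ("chr1", 150, 200),  # 50 of 50 bases on target
    ("chr1", 195, 245),  # 5 of 50 bases on target
    ("chr2", 100, 150),  # different chromosome, 0 bases on target
]
base_fraction = on_target_base_fraction(reads, targets)
read_fraction = on_target_read_fraction(reads, targets)
print(base_fraction, read_fraction)
```

On this toy input the read-level number comes out well above the base-level one, because a read with only 5 bases on target counts as a full hit. The linear scan over targets is only suitable for small examples; a real exome would need an interval index or bedtools.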



                    • #11
                      Percentage of coverage for each chromosome - Agilent design

                      Hi there,
                      Does anyone have some numbers on % of coverage for each individual chromosome? I got some weird results for chr7. Looking at the file Agilent provided us with the probe design (based on hg18) I saw that the last coordinates for chr7 were
                      chr7 158828612 158828732
                      chr7 158829397 158829517
                      chr7 158829517 158829637
                      chr7 158835676 158835796
                      chr7 158835748 158835868
                      chr7 158851160 158851280
                      chr7 158896436 158896556
                      chr7 158902496 158902616
                      chr7 158935127 158935247
                      chr7 158937377 158937497

                      whereas the length of chr7 in hg18 is 158821424.
                      Was the whole-exome capture kit (the 38 Mb one) designed on the hg18 or the hg19 human reference? We were told it was hg18.
                      I'm very confused...

                      Regards,

                      S.



                      • #12
                        Originally posted by Sheila
                        Hi there,
                        chr7 158937377 158937497

                        whereas the length of chr7 in hg18 is 158821424.
                        Was the whole-exome capture kit (the 38 Mb one) designed on the hg18 or the hg19 human reference? We were told it was hg18.
                        It is designed against hg19. From my hg19 annotation file:
                        chr7 158937377 158937497 A_36_B106723 1000 +

                        You can get the original file on the agilent earray site
                        earray.chem.agilent.com/earray/
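
A quick sanity check for this kind of build mismatch is to compare the interval ends in the design file against the chromosome lengths of each candidate build. Below is a minimal Python sketch; `compatible_builds` is a made-up helper name, the hg18 chr7 length is the one quoted above, and the hg19 chr7 length is the value I recall from the UCSC chrom.sizes file, so please verify it before relying on it.

```python
# chr7 lengths per build. hg18 is the value quoted in this thread; hg19 is
# recalled from the UCSC chrom.sizes file and should be double-checked.
CHROM_LENGTHS = {
    "hg18": {"chr7": 158821424},
    "hg19": {"chr7": 159138663},
}


def compatible_builds(intervals, lengths_by_build):
    """Return the builds in which every interval fits inside its chromosome.

    Fitting is necessary but not sufficient: a file can fit within a build's
    chromosome lengths and still have been designed on another build.
    """
    compatible = []
    for build, lengths in lengths_by_build.items():
        if all(
            chrom in lengths and end <= lengths[chrom]
            for chrom, _start, end in intervals
        ):
            compatible.append(build)
    return compatible


# First and last of the chr7 probe coordinates listed in the question.
probes = [
    ("chr7", 158828612, 158828732),
    ("chr7", 158937377, 158937497),
]
builds = compatible_builds(probes, CHROM_LENGTHS)
print(builds)
```

With these probes hg18 is ruled out because the coordinates run past the end of its chr7, which is consistent with the file being an hg19 design.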



                        • #13
                          Originally posted by Sheila
                          Hi there,
                          Does anyone have some numbers on % of coverage for each individual chromosome? I got some weird results for chr7. Looking at the file Agilent provided us with the probe design (based on hg18) I saw that the last coordinates for chr7 were
                          chr7 158828612 158828732
                          chr7 158829397 158829517
                          ...

                          S.
                          Hi Sheila,

                          I contacted Agilent, but they didn't send me the reference file for the exome. From your post it seems they did send it to you, so could you please send me your copy? As I understand it, you used the whole-exome capture kit (the 38 Mb one) in your analysis.

                          Thanks

