  • Where did my off target reads go?

    I have a custom-designed SureSelect assay and have calculated my % on target as around 45% using Picard CalculateHsMetrics. While this is helpful, it would be really interesting to know where the other 55% has aligned.
    According to Picard, roughly another 20% is near my target regions, so what about the rest?
    Is there a way of finding out where the rest has aligned?
    The sort of thing I'm interested in is whether there are pseudogenes, or regions homologous to my targets, with hot-spots of 'mis-aligned' reads...
    Thanks

    Chris

  • #2
    What I did once was take all of my bait sequences and BLAST them against the genome. I then took all of the hits where at least 50 of the 120bp matched identically (not necessarily as a consecutive stretch of 50) and looked at how much of my data aligned there. It was only about 10% (with about 45% of my data aligning on or near target). I think the rest was just random, but you can try something similar.
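A minimal Python sketch of the filtering step described above, assuming the BLAST results were written in the default 12-column tabular format (`-outfmt 6`). The function name `filter_bait_hits` and the example rows are made up for illustration. The default tabular output has no column for the number of identical bases, so it is estimated here from percent identity times alignment length.

```python
import csv
import io

def filter_bait_hits(blast_tsv, min_identical=50):
    """Yield (bait, chrom, start, end) for hits with enough identical bases.

    Expects BLAST tabular output (-outfmt 6) with the default 12 columns.
    """
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        bait, chrom = row[0], row[1]
        pident, aln_len = float(row[2]), int(row[3])
        sstart, send = int(row[8]), int(row[9])
        # Estimate identical bases from percent identity and alignment length.
        identical = round(pident * aln_len / 100.0)
        if identical >= min_identical:
            # Subject coordinates are reversed for minus-strand hits.
            yield bait, chrom, min(sstart, send), max(sstart, send)

# Hypothetical example rows, not real data.
example = io.StringIO(
    "bait1\tchr1\t100.00\t120\t0\t0\t1\t120\t5000\t5119\t1e-60\t222\n"
    "bait1\tchr7\t90.00\t50\t5\t0\t1\t50\t900\t851\t1e-10\t60\n"
    "bait2\tchr2\t85.00\t70\t10\t1\t1\t70\t300\t369\t1e-15\t80\n"
)
hits = list(filter_bait_hits(example))
```

Each bait will also hit its own target, so those hits would still need to be dropped before counting how many reads fall in the remaining regions.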


    • #3
      I was looking for something a bit more systematic, but can see your logic.
      The sort of thing I was after was an approach that, say, scanned the whole genome for regions where the depth of aligned reads is over a specific threshold, excluding my targeted regions.

      I haven't done any RNA-Seq, but I was wondering if I could treat my reads as cDNA and do something a bit like expression analysis (I have probably got my terminology wrong, as I've only done re-sequencing experiments thus far)...

      Am I on the right track? Should I be looking at a TopHat and Bowtie approach?

      Chris


      • #4
        I like the idea of determining coverage everywhere and going from there. You can probably use GATK's DepthOfCoverage to calculate average coverage in intervals of, say, 100bp. Make sure your intervals are not end to end, as GATK will forcibly merge them: chr1:1-100, chr1:101-200 will not work; you'll want chr1:1-100, chr1:102-201, etc. Make sure your intervals cover the whole genome excluding your target regions, and then determine which intervals have over, say, 5x average coverage.
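One way to build such an interval list is sketched below in Python. It tiles a chromosome with 100bp windows separated by a 1bp gap and skips the target regions. The function name `tile_off_target`, its parameters, and the example coordinates are hypothetical; the targets are assumed to be 1-based inclusive (start, end) pairs on a single chromosome.

```python
def tile_off_target(chrom, chrom_len, targets, window=100, gap=1):
    """Return 1-based inclusive windows that avoid targets and never abut.

    targets: list of (start, end) pairs, 1-based inclusive, on this chromosome.
    """
    intervals = []
    pos = 1
    # A sentinel "target" just past the chromosome end closes the last stretch.
    for t_start, t_end in sorted(targets) + [(chrom_len + 1, chrom_len + 1)]:
        # Tile the free stretch that lies before this target.
        while pos < t_start:
            end = min(pos + window - 1, t_start - 1)
            intervals.append(f"{chrom}:{pos}-{end}")
            pos = end + 1 + gap  # leave `gap` bases between windows
        # Resume after the target (handles overlapping targets too).
        pos = max(pos, t_end + 1)
    return intervals

# Hypothetical 450bp chromosome with one target at 250-300.
windows = tile_off_target("chr1", 450, [(250, 300)])
```

For the example call, the first two windows are chr1:1-100 and chr1:102-201, matching the spacing suggested above. Running this per chromosome and concatenating the results gives a whole-genome list.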


        • #5
          That sounds like a good idea.
          What's the best way of going about constructing a consecutive intervals file for the whole genome? or can I configure the DepthOfCoverage walker to do this sort of assessment with a switch?
          Chris


          • #6
            Originally posted by swNGS
            That sounds like a good idea.
            What's the best way of going about constructing a consecutive intervals file for the whole genome? or can I configure the DepthOfCoverage walker to do this sort of assessment with a switch?
            Chris
            I don't think there's an easy way with DepthOfCoverage. I would write a script to generate the file. Instead of using 100bp intervals, it may even be easier to use intervals that are as large as possible while staying outside your targeted regions, and have the coverage output for each base. That will be a large file with almost 3 billion lines, but you only have to do it once, and then you can use a simple awk command to take the subset of bases that have at least X coverage and go from there.
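As an alternative to the awk one-liner, here is a rough Python sketch of the same filtering step that also merges consecutive high-depth bases into regions, which may be easier to inspect than individual positions. It assumes the per-base output has already been parsed into (chrom, pos, depth) tuples sorted by position; the names `high_depth_regions` and `_close` and the example values are made up for illustration.

```python
def _close(region):
    """Turn [chrom, start, end, depth_sum] into (chrom, start, end, mean_depth)."""
    chrom, start, end, depth_sum = region
    return chrom, start, end, depth_sum / (end - start + 1)

def high_depth_regions(per_base, min_depth=5):
    """Merge consecutive positions with depth >= min_depth into regions.

    per_base: iterable of (chrom, pos, depth), sorted by chrom then pos.
    Returns a list of (chrom, start, end, mean_depth), 1-based inclusive.
    """
    regions = []
    current = None  # [chrom, start, end, depth_sum] of the open region
    for chrom, pos, depth in per_base:
        extends = (
            current is not None
            and depth >= min_depth
            and chrom == current[0]
            and pos == current[2] + 1
        )
        if extends:
            current[2] = pos
            current[3] += depth
            continue
        # This base does not extend the open region, so close it.
        if current is not None:
            regions.append(_close(current))
            current = None
        if depth >= min_depth:
            current = [chrom, pos, pos, depth]
    if current is not None:
        regions.append(_close(current))
    return regions

# Hypothetical per-base depths, not real data.
depths = [
    ("chr1", 10, 2), ("chr1", 11, 6), ("chr1", 12, 8),
    ("chr1", 13, 1), ("chr2", 5, 9), ("chr2", 7, 9),
]
regions = high_depth_regions(depths)
```

Because it streams over the input one base at a time, the same function should cope with a very large per-base file as long as the lines are fed in through a generator rather than loaded into a list.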


            • #7
              It looks like bedtools will at least generate a BED file of intervals, which I could easily convert into an interval list for GATK.
              See this link:


              What about the idea of treating this as an RNA-Seq experiment, with the reads as 'expressed' RNA?
              Last edited by swNGS; 03-18-2012, 11:13 AM. Reason: Update


              • #8
                I don't do RNA-Seq, but if that makes sense and is doable then go for it. If you can make bedtools work, that's great too; remember that the intervals for GATK cannot be end to end. They have to have a 1bp space in between, or else GATK will merge them.
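A possible conversion from BED windows to spaced GATK-style intervals is sketched below in Python. BED coordinates are 0-based and half-open, while `chr:start-end` interval strings are 1-based and inclusive, so back-to-back BED windows become abutting intervals unless one base is dropped between them. The function name `bed_to_spaced_intervals` and the example lines are hypothetical, and the input is assumed to be sorted by chromosome and start.

```python
def bed_to_spaced_intervals(bed_lines, gap=1):
    """Convert sorted BED lines to 'chrom:start-end' strings that never abut."""
    intervals = []
    prev_chrom, prev_end = None, 0
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if len(fields) < 3 or line.startswith("#") or fields[0] in ("track", "browser"):
            continue
        chrom = fields[0]
        start = int(fields[1]) + 1  # BED start is 0-based
        end = int(fields[2])        # BED end is exclusive, i.e. the 1-based inclusive end
        if chrom == prev_chrom:
            # Push the start forward so at least `gap` bases separate intervals.
            start = max(start, prev_end + 1 + gap)
        if start > end:
            continue  # window was too short to survive the trimming
        intervals.append(f"{chrom}:{start}-{end}")
        prev_chrom, prev_end = chrom, end
    return intervals

# Hypothetical back-to-back 100bp BED windows.
bed = ["chr1\t0\t100", "chr1\t100\t200", "chr1\t200\t300", "chr2\t0\t100"]
spaced = bed_to_spaced_intervals(bed)
```

The cost is one skipped base at the start of each abutting window, which should not matter when looking for broad off-target hot-spots.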
