
  • Where did my off target reads go?

    So, I have a custom-designed SureSelect assay and have calculated my % on target as around 45% using Picard CalculateHsMetrics. While this is helpful, it would be really interesting to know where the other 55% has aligned.
    According to Picard, another 20% or so is near my target region, so what about the rest?
    Is there a way of finding out where the rest has aligned?
    The sort of thing I would be interested in is any pseudogenes or regions homologous to my targeted regions with hot-spots of 'misaligned' reads...
    Thanks

    Chris

  • #2
    What I did once was take all of my bait sequences and BLAST them against the genome. I then took all of the hits where at least 50 of the 120 bp matched identically (not necessarily a consecutive stretch of 50) and looked at how much of my data aligned there. It was only about 10% (with about 45% of my data aligning on or near target). I think the rest was just random, but you can try something similar.
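A minimal sketch of the hit-filtering step described above, assuming the BLAST results were saved in the standard 12-column tabular format (`-outfmt 6`). The function name `homologous_hits` and the example rows are made up for illustration. The number of identical bases is estimated as percent identity times alignment length, which does not require the matches to be consecutive.

```python
import csv
import io

def homologous_hits(blast_tsv, min_identical=50):
    """Yield (bait, chrom, start, end, n_identical) for hits with enough identical bases.

    Assumes BLAST tabular columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        bait, chrom = row[0], row[1]
        pident, length = float(row[2]), int(row[3])
        sstart, send = int(row[8]), int(row[9])
        # identical bases in the alignment, not necessarily consecutive
        n_identical = round(pident * length / 100.0)
        if n_identical >= min_identical:
            # minus-strand hits are reported with sstart > send
            yield bait, chrom, min(sstart, send), max(sstart, send), n_identical

# Made-up example rows for one 120 bp bait
example = io.StringIO(
    "bait1\tchr1\t100.00\t120\t0\t0\t1\t120\t5000\t5119\t1e-60\t222\n"
    "bait1\tchr7\t90.00\t60\t6\t0\t1\t60\t900\t841\t1e-10\t80\n"
    "bait1\tchr9\t95.00\t40\t2\t0\t1\t40\t300\t339\t1e-5\t50\n"
)
hits = list(homologous_hits(example))
```

Note that the bait's own target will show up as a perfect hit (the chr1 row here), so the on-target hits would still need to be excluded before counting reads in the remaining regions.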



    • #3
      I was looking for something a bit more systematic, but can see your logic.
      The sort of thing I was after was an approach that, say, scanned the whole genome for aligned reads with read depth over a specific threshold, while excluding my targeted regions.

      I haven't done any rnaSeq but I was wondering if I could treat my reads as cDNA and do something a bit like expression analysis (probably have got my terminology wrong as I've only done re-sequencing experiments thus far)...

      Am I on the right track? Should I be looking at a TopHat and Bowtie approach?

      Chris



      • #4
        I like the idea of determining coverage everywhere and going from there. You can probably use GATK's DepthOfCoverage and calculate average coverage in intervals of, say, 100 bp. Make sure that your intervals are not end to end, as GATK will forcibly merge them: chr1:1-100, chr1:101-200 will not work, so you'll want chr1:1-100, chr1:102-201, etc. Make sure your intervals cover the whole genome excluding target regions, and then determine which intervals have an average coverage over, say, 5x.
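One way to build such an interval list is sketched below. It assumes the target regions for a chromosome are available as a sorted, non-overlapping list of 1-based inclusive (start, end) pairs; the function name `gapped_windows` and the toy coordinates are made up for illustration. Windows are written as `chrom:start-end` strings with a 1 bp gap between neighbours so that they do not abut.

```python
def gapped_windows(chrom, chrom_len, targets, size=100, gap=1):
    """Return off-target windows as 'chrom:start-end' strings (1-based, inclusive).

    targets: sorted, non-overlapping (start, end) pairs on this chromosome.
    Consecutive windows are separated by `gap` bases so they are never end to end.
    """
    windows = []
    pos = 1
    # A sentinel "target" just past the chromosome end flushes the final stretch
    for t_start, t_end in list(targets) + [(chrom_len + 1, chrom_len + 1)]:
        while pos < t_start:
            end = min(pos + size - 1, t_start - 1)  # never run into the target
            windows.append(f"{chrom}:{pos}-{end}")
            pos = end + 1 + gap                     # leave a gap before the next window
        pos = max(pos, t_end + 1)                   # resume after the target
    return windows

# Toy example: a 450 bp "chromosome" with one target at 250-300
windows = gapped_windows("chr1", 450, [(250, 300)])
# ['chr1:1-100', 'chr1:102-201', 'chr1:203-249', 'chr1:301-400', 'chr1:402-450']
```

The chromosome lengths would have to come from the reference (for example its sequence dictionary or .fai index), and the function would be called once per chromosome.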



        • #5
          That sounds like a good idea.
          What's the best way of going about constructing a consecutive intervals file for the whole genome? Or can I configure the DepthOfCoverage walker to do this sort of assessment with a switch?
          Chris



          • #6
            I don't think there's an easy way with DepthOfCoverage. I would write a script to generate the file. Instead of using 100 bp intervals, it may even be easier to use intervals as large as possible that lie outside of your targeted regions and have the coverage output for each base. That will be a large file with almost 3 billion lines, but you only have to do it once, and then you can use a simple awk command to take the subset of bases that have at least X coverage and go from there.
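The thresholding step could look something like the sketch below, written in Python rather than awk so that adjacent bases can also be merged into regions. It assumes a three-column per-base layout of chromosome, position and depth separated by whitespace, which is an assumption about the file and not necessarily what DepthOfCoverage writes; the function name `hotspots` and the threshold of 5 are placeholders.

```python
def hotspots(lines, min_depth=5):
    """Merge consecutive positions with depth >= min_depth into (chrom, start, end).

    lines: iterable of 'chrom pos depth' records, sorted by chromosome and position.
    """
    regions = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, pos, depth = line.split()[:3]
        pos, depth = int(pos), int(depth)
        if depth < min_depth:
            continue
        last = regions[-1] if regions else None
        if last and last[0] == chrom and pos - last[2] <= 1:
            last[2] = pos                      # extend the current region
        else:
            regions.append([chrom, pos, pos])  # start a new region
    return [tuple(r) for r in regions]

# Made-up per-base records
records = [
    "chr1\t10\t2", "chr1\t11\t7", "chr1\t12\t9",
    "chr1\t13\t1", "chr1\t14\t6", "chr2\t15\t8",
]
regions = hotspots(records)
# [('chr1', 11, 12), ('chr1', 14, 14), ('chr2', 15, 15)]
```

Because it reads one line at a time, the same function can be fed an open file handle, so the multi-billion-line file never has to be held in memory.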



            • #7
              It looks like bedtools will at least generate a BED file of intervals, which I could easily convert into an interval list for GATK.
              See this link:


              What about the idea of treating this as an rnaSeq experiment, with the reads treated as 'expressed' RNA?
              Last edited by swNGS; 03-18-2012, 11:13 AM. Reason: Update



              • #8
                I don't do rnaSeq but if that makes sense and is doable then go for it. If you can make BEDtools work, that's great too; remember that the intervals for GATK cannot be end to end, they have to have 1bp space in between, or else GATK will merge them.
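A rough sketch of that conversion, assuming the windows are in plain BED (0-based start, exclusive end) and the output is a list of `chrom:start-end` strings with 1-based inclusive coordinates. The function name `bed_to_gatk` is made up, and trimming one base from the end of every window is just one simple way of guaranteeing the 1 bp space.

```python
def bed_to_gatk(bed_lines, gap=1):
    """Convert BED records to 'chrom:start-end' strings that never abut.

    BED is 0-based with an exclusive end; the output is 1-based inclusive.
    Each interval is shortened by `gap` bases at its end, so windows that were
    end to end in the BED file end up separated by `gap` bases.
    """
    intervals = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue  # skip headers, comments and malformed lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        first = start + 1   # 0-based start -> 1-based start
        last = end - gap    # exclusive end equals 1-based inclusive end; trim the gap
        if last >= first:
            intervals.append(f"{chrom}:{first}-{last}")
    return intervals

# Two back-to-back 100 bp BED windows
intervals = bed_to_gatk(["chr1\t0\t100", "chr1\t100\t200"])
# ['chr1:1-99', 'chr1:101-199']
```

Position 100 is left uncovered here, so the two intervals are not merged; the small amount of sequence lost at each window boundary should not matter for spotting off-target hot-spots.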
