  • How to call "gaps" from spliced read alignment?

    Hi all,

    I have aligned my long 454 reads to the reference genome using GMAP, a spliced read aligner. Now I want to extract the best set of intronic regions (gaps) from the alignments. Is there any tool that can generate such information from a SAM file?

    I feel this task is somewhat similar to indel calling if you consider intron regions as huge deletions. But I'm not sure whether I can use indel calling tools for my needs.

    A simple (and naive) way I can imagine is counting the coverage of the gaps: chromosome regions where many gaps are mapped should be reliable intronic regions. For this purpose, is there a convenient way to calculate the coverage of gaps from a SAM file?
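    One way to get gap coverage straight from a SAM file is to read each alignment's CIGAR, treat every `N` operation as a gap, and sweep over +1/-1 events at the gap boundaries to get runs of constant depth. The sketch below assumes plain-text SAM (tab-separated, using the standard FLAG, RNAME, POS and CIGAR columns), skips unmapped (0x4) and secondary (0x100) records, and reports 1-based inclusive coordinates; the function names are hypothetical, not part of any existing tool.

```python
import re
from collections import defaultdict

# CIGAR operations per the SAM spec; M, D, N, = and X consume reference bases.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def gap_intervals(pos, cigar):
    """Yield (start, end) for each N (skipped region), 1-based and inclusive."""
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "N":
            yield ref, ref + n - 1
        if op in REF_CONSUMING:
            ref += n


def gap_coverage(sam_lines):
    """Return {chrom: [(start, end, depth), ...]} runs where gap depth > 0."""
    # events[chrom][p] holds the net change in gap depth at position p.
    events = defaultdict(lambda: defaultdict(int))
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom = int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x104 or cigar == "*":  # skip unmapped (0x4) and secondary (0x100)
            continue
        for start, end in gap_intervals(pos, cigar):
            events[chrom][start] += 1
            events[chrom][end + 1] -= 1
    coverage = {}
    for chrom, delta in events.items():
        runs, depth, prev = [], 0, None
        for p in sorted(delta):
            if depth > 0:  # the segment [prev, p - 1] had constant depth
                runs.append((prev, p - 1, depth))
            depth += delta[p]
            prev = p
        coverage[chrom] = runs
    return coverage
```

    Since it accepts any iterable of lines, an open file handle works directly, e.g. `gap_coverage(open("alignments.sam"))` (file name hypothetical). Regions with high depth are the candidate introns.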

    Thanks a lot,
    Shuli

  • #2
    Are you familiar with R/Bioconductor at all? It should be pretty straightforward to make a table of how many times each intron is observed using the GenomicRanges package.
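    The suggestion above is for R/Bioconductor, but the counting idea itself is simple: extract each intron as (chromosome, start, end) and count identical keys. The sketch below is plain Python, not the GenomicRanges API; it assumes alignments are already available as (chromosome, 1-based position, CIGAR) tuples and ignores strand. The helper names are hypothetical.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def introns(chrom, pos, cigar):
    """Return (chrom, start, end) for every N gap, 1-based and inclusive."""
    out, ref = [], pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "N":
            out.append((chrom, ref, ref + n - 1))
        if op in "MDN=X":  # reference-consuming operations
            ref += n
    return out


def tabulate_introns(alignments):
    """Count how many alignments support each distinct intron."""
    counts = Counter()
    for chrom, pos, cigar in alignments:
        counts.update(introns(chrom, pos, cigar))
    return counts
```

    `tabulate_introns(alignments).most_common(10)` then lists the ten best-supported introns, which is one way to pick a "best set".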

    • #3
      Thanks. It seems this package doesn't output the gap regions directly:

      The genomic intervals between the "start" and "end" of the query that are "covered" by the alignment. Saying that the full [start,end] interval is covered is the same as saying that the alignment has no gap (no N in the CIGAR). It is then considered a simple alignment. Note that a simple alignment can have mismatches or deletions (in the reference). In other words, a deletion, encoded with a D, is NOT considered a gap.

      So I still need to calculate the gap regions from the CIGAR strings...
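      Following the manual's definition quoted above, computing the gap regions from a CIGAR string means splitting the reference span at every `N` while keeping `D` inside the current covered block. A minimal Python sketch of that split (hypothetical function name; 1-based inclusive coordinates assumed):

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def blocks_and_gaps(pos, cigar):
    """Split an alignment into covered blocks and N gaps (1-based, inclusive).

    Only N opens a gap; a D (deletion) consumes reference bases but stays
    inside the current covered block, as in the quoted definition.
    """
    blocks, gaps = [], []
    ref = block_start = pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "N":
            if ref > block_start:  # close the block that precedes the gap
                blocks.append((block_start, ref - 1))
            gaps.append((ref, ref + n - 1))
            ref += n
            block_start = ref
        elif op in "MD=X":
            ref += n
        # I, S, H and P do not consume reference bases
    if ref > block_start:
        blocks.append((block_start, ref - 1))
    return blocks, gaps
```

      For example, `10M5D10M` at position 100 yields a single block (100, 124) and no gaps, while `5S10M50N8M2I4M` yields blocks (100, 109) and (160, 171) around the gap (110, 159).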

      And I have just found a solution this afternoon. BEDTools (http://code.google.com/p/bedtools/) could calculate the coverage for each nucleotide with/without intronic regions counted. Then the difference is the coverage/depth in the intronic regions.
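      The subtraction idea can be illustrated without BEDTools: compute per-base depth once over each alignment's whole reference span (N included) and once over only its non-N bases, then subtract. This is a sketch of the arithmetic, not BEDTools' implementation or command line; alignments are assumed to be (chromosome, 1-based position, CIGAR) tuples, the function names are hypothetical, and D bases are deliberately counted in both passes so that only N gaps survive in the difference.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def per_base_depth(alignments, count_gaps):
    """Per-base depth keyed by (chrom, position).

    count_gaps=True counts the whole reference span, N included.
    count_gaps=False counts only non-N reference bases (a "split" view).
    D bases are counted in both views, so they cancel in the difference.
    """
    depth = Counter()
    for chrom, pos, cigar in alignments:
        ref = pos
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op in "MDN=X":  # reference-consuming operations
                if op != "N" or count_gaps:
                    for p in range(ref, ref + n):
                        depth[(chrom, p)] += 1
                ref += n
    return depth


def intron_depth(alignments):
    """Depth inside gaps = unsplit depth minus split depth."""
    full = per_base_depth(alignments, count_gaps=True)
    split = per_base_depth(alignments, count_gaps=False)
    return full - split  # Counter subtraction keeps only positive counts
```

      The per-base loop is fine for a small test but slow for long introns; an event-based sweep over gap boundaries gives the same depths more efficiently.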

      • #4
        Tabulating gaps in BAM files using Bioconductor's GenomicRanges IS POSSIBLE

        Originally posted by sulicon
        Thanks. It seems this package doesn't output the gap regions directly:
        ...
        I think you may have misunderstood the manual. In any case, the key word in your comment is perhaps "directly".

        I've just blogged on using GenomicRanges for this exact purpose here. It works quite nicely.

        Originally posted by sulicon
        And I have just found a solution this afternoon. BEDTools (http://code.google.com/p/bedtools/) could calculate the coverage for each nucleotide with/without intronic regions counted. Then the difference is the coverage/depth in the intronic regions.
        Hmmm. I wondered about this approach too. At the very least it requires you to know beforehand the location of all your introns. Let us know how you fare if you take this route...
        Last edited by malcook; 06-09-2011, 08:31 AM.
