  • How to call "gaps" from spliced read alignment?

    Hi all,

    I have aligned my long 454 reads to the reference genome using GMAP, a spliced read aligner. Now I want to extract the best set of intronic regions (gaps) in the alignments. Is there any tool that can generate such information from a SAM file?

    I feel this task is somewhat similar to indel calling, if you consider intronic regions as huge deletions. But I'm not sure whether indel calling tools can be adapted to my needs.

    A simple (and naive) approach I can imagine is counting the coverage of the gaps: chromosome regions spanned by many gaps should be reliable intronic regions. For this purpose, is there a convenient way to calculate the coverage of gaps from a SAM file?

    Thanks a lot,
    Shuli
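A minimal Python sketch of the counting idea, assuming plain-text SAM input and the CIGAR operations defined in the SAM specification: each `N` operation is a skipped reference region (an intron candidate), so tallying identical `(chromosome, start, end)` gaps gives the number of reads supporting each one. A `D` is a deletion, not a gap, and only advances the reference position. The function names `gaps_from_cigar` and `tally_gaps`, the skipping of unmapped/secondary/supplementary records, and the `min_mapq` filter are illustrative choices, not part of any existing tool.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an op code (SAM specification).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases: M, D, N, = and X.
REF_CONSUMING = set("MDN=X")


def gaps_from_cigar(pos, cigar):
    """Yield (start, end) of every N (skipped) region, 1-based and inclusive,
    for an alignment whose leftmost reference base is at 1-based `pos`."""
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            yield (ref, ref + length - 1)
        if op in REF_CONSUMING:
            ref += length


def tally_gaps(sam_lines, min_mapq=0):
    """Count how many alignments support each (chrom, start, end) gap."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        mapq, cigar = int(fields[4]), fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x904 or cigar == "*" or mapq < min_mapq:
            continue
        for start, end in gaps_from_cigar(pos, cigar):
            counts[(rname, start, end)] += 1
    return counts


# Toy records: all three reads skip the same 200 bp region; r3 also carries
# a 5 bp deletion (D), which is not reported as a gap.
EXAMPLE_SAM = [
    "@SQ\tSN:chr1\tLN:10000",
    "r1\t0\tchr1\t100\t60\t50M200N50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t120\t60\t30M200N50M\t*\t0\t0\t*\t*",
    "r3\t0\tchr1\t100\t60\t20M5D25M200N50M\t*\t0\t0\t*\t*",
]

if __name__ == "__main__":
    for (chrom, start, end), n in tally_gaps(EXAMPLE_SAM).most_common():
        print(chrom, start, end, n)  # chr1 150 349 3
```

On real GMAP output you would pass an open file instead, e.g. `tally_gaps(open("gmap.sam"))` (the file name is hypothetical); BAM input would first need converting to SAM, for example with `samtools view -h`. Gaps supported by many reads are then the candidates for reliable introns.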

  • #2
    Are you familiar with R/Bioconductor at all? It should be pretty straightforward to make a table of how many times each intron is observed using the GenomicRanges package.



    • #3
      Thanks. It seems this package doesn't output the gap regions directly:

      The genomic intervals between the "start" and "end" of the query that are "covered" by the alignment. Saying that the full [start,end] interval is covered is the same as saying that the alignment has no gap (no N in the CIGAR). It is then considered a simple alignment. Note that a simple alignment can have mismatches or deletions (in the reference). In other words, a deletion, encoded with a D, is NOT considered a gap.

      So I still need to calculate the gap regions from the CIGAR strings...

      And I have just found a solution this afternoon. BEDTools (http://code.google.com/p/bedtools/) can calculate the coverage for each nucleotide with or without intronic regions counted. The difference between the two is then the coverage/depth in the intronic regions.
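The difference-of-coverages idea can be sketched in Python: compute per-base depth twice, once counting each alignment over its whole span (including `N` gaps) and once counting only the bases it actually covers, then subtract. This is a naive in-memory sketch for illustration, not how BEDTools implements it; the function names and the `(chrom, pos, cigar)` input tuples are hypothetical. Deletions (`D`) are counted in both passes, so they cancel and only `N` spans remain in the difference.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an op code (SAM specification).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def covered_positions(pos, cigar, count_gaps):
    """Return the 1-based reference positions an alignment is counted over.
    M, D, = and X bases always count; N (skipped) bases count only if
    `count_gaps` is True. I, S, H and P do not move along the reference."""
    ref = pos
    covered = []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "MD=X" or (op == "N" and count_gaps):
            covered.extend(range(ref, ref + length))
        if op in "MDN=X":
            ref += length
    return covered


def gap_depth(alignments):
    """alignments: iterable of (chrom, pos, cigar) for mapped reads.
    Returns {(chrom, position): depth} for positions spanned by at least one
    N gap, where depth = span coverage minus aligned-base coverage."""
    with_gaps = Counter()
    without_gaps = Counter()
    for chrom, pos, cigar in alignments:
        with_gaps.update((chrom, p) for p in covered_positions(pos, cigar, True))
        without_gaps.update((chrom, p) for p in covered_positions(pos, cigar, False))
    # Counter subtraction keeps only positive counts, i.e. gap-only positions.
    return dict(with_gaps - without_gaps)


if __name__ == "__main__":
    reads = [("chr1", 100, "50M200N50M"), ("chr1", 120, "30M200N50M")]
    depth = gap_depth(reads)
    print(len(depth), depth[("chr1", 150)], depth[("chr1", 349)])  # 200 2 2
```

Because gapped positions from different reads are pooled into one depth profile, this reports depth over gapped regions rather than a list of distinct intron boundaries.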



      • #4
        Tabulating gaps in BAM files using Bioconductor's GenomicRanges IS POSSIBLE

        Originally posted by sulicon
        Thanks. It seems this package doesn't output the gap regions directly:
        ...
        I think you may have misunderstood the manual. In any case, the key word in your comment is perhaps "directly".

        I've just blogged on using GenomicRanges for this exact purpose here. It works quite nicely.

        Originally posted by sulicon
        And I have just found a solution this afternoon. BEDTools (http://code.google.com/p/bedtools/) can calculate the coverage for each nucleotide with or without intronic regions counted. The difference between the two is then the coverage/depth in the intronic regions.
        Hmmm. I wondered about this approach too. At the very least it requires you to know beforehand the location of all your introns. Let us know how you fare if you take this route....
        Last edited by malcook; 06-09-2011, 08:31 AM.
