
  • generating iit file for GSNAP

    gsnap can use information about known splice sites if you give it a .iit file with this format:


    >NM_004448.ERBB2.exon1 17:35110090..35110091 donor 6678
    >NM_004448.ERBB2.exon2 17:35116768..35116769 acceptor 6678
    >NM_004448.ERBB2.exon2 17:35116920..35116921 donor 1179
    >NM_004448.ERBB2.exon3 17:35118099..35118100 acceptor 1179
    >NM_004449.ERG.exon1 21:38955452..38955451 donor 783
    >NM_004449.ERG.exon2 21:38878740..38878739 acceptor 783
    >NM_004449.ERG.exon2 21:38878638..38878637 donor 360
    >NM_004449.ERG.exon3 21:38869542..38869541 acceptor 360

    The number at the end is the length of the intron, and it is optional. In this example (which comes from the gmap/gsnap README file), each donor is spliced to the acceptor that follows it. But I wonder whether this is a requirement. If one exon can be spliced to three different acceptors, do you have to list that exon's donor site three times, once followed by acceptor 1, once by acceptor 2, and once by acceptor 3? Or can I just list all the donor sites and all the acceptor sites without regard to how they are matched up?

    Thank you.

    Eric
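
For anyone who wants to generate lines like the ones in the post above, here is a minimal Python sketch. It is not taken from the gmap/gsnap distribution. The function name `splice_site_lines` and its parameters are hypothetical, and the coordinate convention (donor = last exon base..first intron base, acceptor = last intron base..first exon base, with descending coordinates on the minus strand) is inferred only from the example lines quoted above. It pairs each donor with one acceptor, so it does not settle whether GSNAP requires that pairing.

```python
def splice_site_lines(name, chrom, junctions, strand="+", with_length=True):
    """Format splice junctions as donor/acceptor lines like the quoted example.

    junctions: list of (donor_exon_last_base, acceptor_exon_first_base) pairs,
    in 1-based genomic coordinates, listed in transcript order.
    """
    # On the minus strand the transcript runs toward lower coordinates.
    step = 1 if strand == "+" else -1
    lines = []
    for i, (donor, acceptor) in enumerate(junctions, start=1):
        # Optional trailing number: intronic bases strictly between the exons.
        suffix = f" {abs(acceptor - donor) - 1}" if with_length else ""
        lines.append(
            f">{name}.exon{i} {chrom}:{donor}..{donor + step} donor{suffix}"
        )
        lines.append(
            f">{name}.exon{i + 1} {chrom}:{acceptor - step}..{acceptor} acceptor{suffix}"
        )
    return lines


if __name__ == "__main__":
    # Exon boundaries read off the ERBB2 lines in the post.
    erbb2 = [(35110090, 35116769), (35116920, 35118100)]
    print("\n".join(splice_site_lines("NM_004448.ERBB2", "17", erbb2)))
```

With this pair-based input, a donor that joins several acceptors would simply be emitted once per junction. Whether a single listing would be enough is the open question from the post.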

  • #2
    It would be nice to have a PSL to IIT program (one that answered your questions by producing pristine output). Does anybody know where one is?
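
I don't know of an existing PSL to IIT converter, but the first step could be sketched roughly as below: pulling candidate splice junctions out of one PSL alignment line. This is only an illustration under stated assumptions, not a tested tool. The function name `psl_junctions` and the `min_intron` cutoff of 20 bases are hypothetical, and it assumes the standard 21-column, tab-separated, untranslated PSL layout (single-character strand, 0-based `tStarts` on the target's plus strand) with a sense-orientation mRNA as the query.

```python
def psl_junctions(psl_line, min_intron=20):
    """Return (chrom, strand, donor_exon_last_base, acceptor_exon_first_base)
    tuples in transcript order, using 1-based genomic coordinates.

    Gaps shorter than min_intron (an arbitrary, hypothetical cutoff) are
    treated as indels rather than introns.
    """
    f = psl_line.rstrip("\n").split("\t")
    strand, chrom = f[8], f[13]
    # blockSizes and tStarts are comma-terminated lists.
    sizes = [int(x) for x in f[18].split(",") if x]
    t_starts = [int(x) for x in f[20].split(",") if x]

    junctions = []
    for k in range(len(sizes) - 1):
        left_end = t_starts[k] + sizes[k]   # 1-based last base of left block
        right_start = t_starts[k + 1] + 1   # 1-based first base of right block
        if right_start - left_end - 1 < min_intron:
            continue  # short gap: assume an insertion/deletion, not an intron
        if strand == "+":
            junctions.append((chrom, strand, left_end, right_start))
        else:
            # Minus strand: the upstream exon lies at the higher coordinates.
            junctions.append((chrom, strand, right_start, left_end))
    if strand == "-":
        junctions.reverse()  # report in transcript order
    return junctions
```

Turning each tuple into donor and acceptor lines in the format shown in the first post is then a matter of string formatting. The resulting text would still need to go through whatever indexing step the gmap/gsnap README describes.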
