  • gsnap complains about iit file

    I am trying to run gsnap with RNA seq data. It starts to look like it's working, but then it dies with this complaint:

    IIT file splice_sites_102311_1.iit appears to have an offset that is too large (offset after sigmas 6929586440, filesize 5395915 ...

    Does anyone know what I am doing wrong? The lines in my iit file look like this:


    >FBgn0034742_CG4294 2R:18492684..18492683 acceptor
    >FBgn0034742_CG4294 2R:18490231..18490230 donor
    >FBgn0034742_CG4294 2R:18490174..18490173 acceptor
    >FBgn0034742_CG4294 2R:18489794..18489793 donor
    >FBgn0034742_CG4294 2R:18489732..18489731 acceptor
    >FBgn0034742_CG4294 2R:18489453..18489452 donor
    >FBgn0034742_CG4294 2R:18489321..18489320 acceptor
    >FBgn0034742_CG4294 2R:18489035..18489034 donor
    >FBgn0034742_CG4294 2R:18488876..18488875 acceptor
    >FBgn0034742_CG4294 2R:18488718..18488717 donor
    >FBgn0034742_CG4294 2R:18488653..18488652 acceptor
    >FBgn0034742_CG4294 2R:18488870..18488869 acceptor
    >FBgn0034742_CG4294 2R:18488718..18488717 donor
    >FBgn0034742_CG4294 2R:18488653..18488652 acceptor
    >FBgn0034742_CG4294 2R:18488301..18488300 donor
    >FBgn0032640_Sgt 2L:17473468..17473469 acceptor
    >FBgn0032640_Sgt 2L:17471852..17471853 donor
    >FBgn0032640_Sgt 2L:17473560..17473561 acceptor
    >FBgn0259204_CG42308 X:6905441..6905442 donor
    >FBgn0259204_CG42308 X:6905526..6905527 acceptor
    >FBgn0259204_CG42308 X:6905789..6905790 donor
    >FBgn0259204_CG42308 X:6905850..6905851 acceptor
    >FBgn0259204_CG42308 X:6905522..6905523 acceptor
    >FBgn0259204_CG42308 X:6905789..6905790 donor

    Thank you.

    Eric

  • #2
    For what it's worth, this was my command:

    gsnap --gunzip -d from_Flybase_5p41 -D /home/efoss/gene_databases/GMAP_GSNAP_db -t 10 --format=sam -N 1 -s splice_sites_102311_1.iit D09NJACXX_s8_1_illumina12index_7_SL7776.fastq.gz D09NJACXX_s8_2_illumina12index_7_SL7776.fastq.gz > D09NJACXX_s8_illumina12index_7_SL7776.sam



    • #3
      Hi efoss

      I'm having the same error... did you find a solution?

      I'm using:

      gsnap -d hg19_12k-mer ../SJ.15M.mut -N 1 -B 5 -t 8 -O -m 0.06 -A sam -k 12 -s ~/db/transcriptome/hg19/Gene_models/gencode.v9.annotation.splice_sites_gsnap

      I tried running GSNAP with a smaller iit file, but the result is the same.


      cheers,
      Last edited by geparada; 12-23-2011, 10:49 AM.



      • #4
        Hi Geparada,

        Yes, I figured out what I was doing wrong. The file I showed above was plain text, and that was my mistake: an iit file is a binary file, not a text file. I used the iit_store utility that comes with gsnap to convert the text file to an iit file like this:

        cat splice_sites.txt | iit_store -o splice_sites

        Then things worked.
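
A minimal sketch of the fix described above, for anyone unsure whether a splice-site file has already been converted. It assumes, as a heuristic only, that an unconverted text file starts with a '>' header like the lines shown in the first post. The function names looks_like_text_splicesites and convert_if_text are hypothetical helpers; only the iit_store -o call is taken from this thread.

```shell
# Hypothetical helper: succeeds if the first byte of the file is '>',
# which suggests it is still the plain-text listing, not a binary iit file.
# This is a heuristic, not a real check of the iit format.
looks_like_text_splicesites() {
    [ "$(head -c 1 "$1")" = ">" ]
}

# Hypothetical wrapper: $1 is the input file, $2 is the output name
# passed to iit_store -o (the same invocation used earlier in this thread).
convert_if_text() {
    if looks_like_text_splicesites "$1"; then
        cat "$1" | iit_store -o "$2"
    else
        echo "$1 does not start with '>'; not converting" >&2
    fi
}
```

With these defined, running convert_if_text splice_sites.txt splice_sites performs the same conversion as the command above, and skips files that do not look like the text listing.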

        Good luck!

        Best wishes,

        Eric



        • #5
          I didn't know that iit files are binary.
          Now it works!

          Cheers!
          Last edited by geparada; 12-29-2011, 12:45 PM.
