  • GSNAP gives Bus Error: 10

    Hello everypony!

    I am using GSNAP to map my RNA-seq paired-end reads to a reference genome. It used to run normally (a few months ago), but now that I need to remap some data using the exact same command line as before, GSNAP no longer works.
    It starts the alignment normally and then, after a short while, fails with "Bus error: 10".
    This is what it looks like:

    Code:
    gsnap -d oregonR_reference --quality-protocol illumina -N 1 -s /Volumes/Temp/Anna/reference/oregonR_reference/oregonR_reference.maps/dmel-all-transcript-r5.49-Parent1.iit -t 20 -A sam --split-output dmel_oregonR_t13_rep1 /Volumes/Temp/Anna/reads/trimmed_reads/oregon/dmel_oregonR_t13_rep1_1  /Volumes/Temp/Anna/reads/trimmed_reads/oregon/dmel_oregonR_t13_rep1_2 1 > log13r1.txt 2 > err13r1.txt
    GSNAP version 2014-02-28 called with args: gsnap -D /Volumes/Temp/Anna/reference/ -d oregonR_reference --quality-protocol illumina -N 1 -s /Volumes/Temp/Anna/reference/oregonR_reference/oregonR_reference.maps/dmel-all-transcript-r5.49-Parent1.iit -t 20 -A sam --split-output dmel_oregonR_t13_rep1 /Volumes/Temp/Anna/reads/trimmed_reads/oregon/dmel_oregonR_t13_rep1_1 /Volumes/Temp/Anna/reads/trimmed_reads/oregon/dmel_oregonR_t13_rep1_2 1 2
    Checking compiler assumptions for popcnt: 000041A7 clz=17 clz=0 popcount=7 
    Checking compiler assumptions for SSE2: 000041A7 10D63AF1 xor=10D67B56
    Checking compiler assumptions for SSE4.1: -89 -15 max=241
    Novel splicing (-N) and known splicing (-s) both turned on => assume reads are RNA-Seq
    Note: >1 sequence detected, so index files are being memory mapped.
      GSNAP can run slowly at first while the computer starts to accumulate
      pages from the hard disk into its cache.  To copy index files into RAM
      instead of memory mapping, use -B 3, -B 4, or -B 5, if you have enough RAM.
      For more speed, also try multiple threads (-t <int>), if you have multiple processors or cores.
    Pre-loading compressed genome (oligos).....,...,...,...,...,...,...,...,..done (63,276,204 bytes, 15449 pages, 0.17 sec)
    Pre-loading compressed genome (bits).....,...,...,...,...,...,...,...,..done (63,276,204 bytes, 15449 pages, 0.16 sec)
    Pre-loading suffix array...............................................................................................................................,............................................................................................................................................done (674,946,152 bytes)
    Looking for index files in directory /Volumes/Temp/Anna/reference//oregonR_reference
      Pointers file is oregonR_reference.ref12153bitpackptrs
      Offsets file is oregonR_reference.ref12153bitpackcomp
      Positions file is oregonR_reference.ref153positions
    Offsets compression type: bitpack
    Allocating memory for ref offset pointers, kmer 15, interval 3...done (134,217,736 bytes, 1.45 sec)
    Allocating memory for ref offsets, kmer 15, interval 3...done (226,957,088 bytes, 2.48 sec)
    Pre-loading ref positions, kmer 15, interval 3........................................................................................done (215,791,212 bytes, 52684 pages, 0.60 sec)
    Reading splicing file /Volumes/Temp/Anna/reference/oregonR_reference/oregonR_reference.maps/dmel-all-transcript-r5.49-Parent1.iit locally...found donor and acceptor tags, so treating as splicesites file
    splice distances present...37770 unique splicesites...
    Non-standard nucleotide N near splice site YHet_Parent1:291284.  Discarding...
    37769 splicesites are valid...splicetrie_obs has 37773 entries...splicetrie_max has 3858412 entries...done
    GMAP modes: pairsearch, indel_knownsplice, terminal, improvement
    Starting alignment
    Bus error: 10
    Does anybody know what could be wrong this time and how to fix it?

    Thanks in advance!

    Ana Marija
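A side note on the command line in the first post: because of the spaces in `1 > log13r1.txt 2 > err13r1.txt`, the shell passes `1` and `2` to gsnap as two extra arguments (they appear at the end of the "called with args" line). In bash, both `>` then redirect standard output, so the last one wins, log13r1.txt stays empty, and standard error still goes to the terminal. Whether this has anything to do with the crash is not established in this thread. The sketch below shows the difference using a hypothetical stand-in function, `fake_gsnap`, rather than gsnap itself:

```shell
#!/bin/sh
# fake_gsnap is a hypothetical stand-in, not gsnap: it prints each argument
# on stdout and one progress line on stderr.
fake_gsnap() {
  printf 'arg: %s\n' "$@"
  echo "progress message" >&2
}

# As typed in the post: "1" and "2" become arguments; in bash both ">"
# redirect stdout (the last one wins) and stderr stays on the terminal.
fake_gsnap reads_1 reads_2 1 > log_spaced.txt 2 > err_spaced.txt

# Corrected: no space between the descriptor number and ">".
fake_gsnap reads_1 reads_2 1> log_fixed.txt 2> err_fixed.txt

# For the real run, the command would then end in:
#   ... dmel_oregonR_t13_rep1_1 dmel_oregonR_t13_rep1_2 > log13r1.txt 2> err13r1.txt
```

Note that zsh's MULTIOS option handles repeated stdout redirections differently (it copies output to every file), so the "last one wins" behaviour above is specific to bash and POSIX sh.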

  • #2
    So, for anyone who ever comes across this very uninformative error message, here is how I found its cause:

    All my fastq files except one mapped without errors, and all the standard fastq checks gave no clue about what was wrong, so I assumed the problem was not in the mapper and ran a binary search through the problematic fastq file.
    The way I did this "binary search" was to split my file(s) in half and rerun the mapping on both halves; whichever half gave the error, I split in half again, and I repeated the procedure until only two reads were left in the final fastq file.
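The halving search described above can be automated. This is a minimal POSIX shell sketch, not the procedure used in this thread; it assumes plain 4-line FASTQ records with mates in the same order in both files, and the function name `bisect_pairs` and the output names `bisect_r1.fq`/`bisect_r2.fq` are hypothetical. The third argument is any command, run as `CHECK r1.fq r2.fq`, that exits non-zero when the problem reproduces:

```shell
#!/bin/sh
# bisect_pairs R1 R2 CHECK (hypothetical helper)
# Repeatedly halves both mate files at the same record boundary and keeps
# whichever half still makes CHECK fail. The smallest failing pair is left
# in bisect_r1.fq / bisect_r2.fq in the current directory.
bisect_pairs() {
  cp "$1" bisect_r1.fq
  cp "$2" bisect_r2.fq
  while :; do
    # Lines in the first half: half the records (rounded down) times 4.
    n=$(( $(wc -l < bisect_r1.fq) / 4 / 2 * 4 ))
    # Fewer than two records left: nothing more to split.
    [ "$n" -gt 0 ] || break
    head -n "$n" bisect_r1.fq > a_r1.fq
    head -n "$n" bisect_r2.fq > a_r2.fq
    tail -n +"$((n + 1))" bisect_r1.fq > b_r1.fq
    tail -n +"$((n + 1))" bisect_r2.fq > b_r2.fq
    if ! "$3" a_r1.fq a_r2.fq; then
      mv a_r1.fq bisect_r1.fq; mv a_r2.fq bisect_r2.fq
    elif ! "$3" b_r1.fq b_r2.fq; then
      mv b_r1.fq bisect_r1.fq; mv b_r2.fq bisect_r2.fq
    else
      # Neither half fails on its own, so the trigger needs reads from both.
      break
    fi
  done
  rm -f a_r1.fq a_r2.fq b_r1.fq b_r2.fq
}
```

For the situation in the post, CHECK could be a small wrapper script that runs gsnap with the original options on the two files it is given; a crash such as the bus error makes the command exit with a non-zero status. Each round halves the number of reads, so the number of rounds grows only logarithmically with the file size.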
    After 24 iterations, I got a tiny fastq file (which was still giving me the Bus error: 10) containing 2 reads, one of which looked normal, and another which looked like a microsatellite read.
    So I took the microsat read, remapped it by itself, and this time it gave a different error:
    Code:
    Paired-end accessions FCD20FCACXX:2:1302:15509:87068#ATCACGAT/2 and FCD20FCACXX:2:1302:15509:87068#ATCACGAT/1 do not match
    When I remapped the other "normal" read, it mapped normally, with no errors.
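To look for read-name problems across a whole pair of files, a quick check can compare the headers record by record. This is a minimal sketch under stated assumptions (plain 4-line records, mates in the same order in both files, the read name in the first whitespace-separated word of the header); `report_pair_ids` is a hypothetical helper, and it is not a description of how GSNAP itself compares accessions. It reports records whose names differ once a trailing `/1` or `/2` is removed, and records where the `/1` and `/2` suffixes appear to be in the wrong files:

```shell
#!/bin/sh
# report_pair_ids R1 R2 (hypothetical helper)
# Assumes plain 4-line FASTQ records with mates in the same order.
report_pair_ids() {
  # First word of every header line (lines 1, 5, 9, ...) of each file.
  awk 'NR % 4 == 1 { print $1 }' "$1" > ids_1.tmp
  awk 'NR % 4 == 1 { print $1 }' "$2" > ids_2.tmp
  paste ids_1.tmp ids_2.tmp | awk -F '\t' '{
    a = $1; b = $2
    # Compare the names with any trailing /1 or /2 removed.
    sub(/\/[12]$/, "", a); sub(/\/[12]$/, "", b)
    if (a != b)
      print "record " NR ": names differ: " $1 " vs " $2
    else if ($1 ~ /\/2$/ || $2 ~ /\/1$/)
      print "record " NR ": mate suffixes swapped: " $1 " vs " $2
  }'
  rm -f ids_1.tmp ids_2.tmp
}
```

An empty output means every pair of headers agrees under these two rules; it does not rule out other problems with the reads themselves.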

    So obviously, the microsat read was the one causing the problem.
    I then remapped it after removing the first nucleotide (and its quality value) from one read of the pair, which made the two read sequences complementary again. After this, the mapping worked perfectly, with no errors.
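The fix described above, dropping the first base and its quality character from one mate, can be applied to a single record with awk. A minimal sketch, assuming plain 4-line records; `trim_first_base` is a hypothetical helper that matches the record by the first word of its header line (including the leading `@`) and writes the whole file to standard output:

```shell
#!/bin/sh
# trim_first_base FILE HEADER (hypothetical helper)
# Copies FILE to stdout, removing the first character of the sequence line
# and of the quality line of the record whose header's first word is HEADER.
trim_first_base() {
  awk -v id="$2" '
    # Header line: remember whether this record is the one to trim.
    NR % 4 == 1 { hit = ($1 == id) }
    # Sequence (line 2) and quality (line 4) of the matching record.
    hit && (NR % 4 == 2 || NR % 4 == 0) { $0 = substr($0, 2) }
    { print }
  ' "$1"
}
```

For example, `trim_first_base reads_2.fq '@READ_ID/2' > reads_2.trimmed.fq`, where both file names and the read ID are placeholders. Trimming the sequence and quality lines together keeps them the same length, which a valid FASTQ record requires.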

    So there is a weird issue in GSNAP version 2014-02-28 with the complementarity of microsatellite paired reads.

    I have no idea what the reason for it is, or why GSNAP gives two different error messages depending on whether the read is mapped together with other reads or by itself.
    But at least this could be a hint for someone else out there who has the same problem I had.

    To halve my fastq files, I just used:
    Code:
    split -l n dmel_oregonR_t13_rep1_1 splitrep1_1 
    split -l n dmel_oregonR_t13_rep1_2 splitrep1_2
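    #the output is two files per input, with the suffixes aa and ab, e.g. splitrep1_1aa & splitrep1_1ab
    where n is half the number of lines in the fastq file, rounded up to a multiple of 4 so that no 4-line fastq record is split between the two halves (the same n is used for both mate files, so the pairs stay in sync).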
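The halving step can also be written so that n is computed automatically and always falls on a record boundary shared by both mates. A minimal sketch, assuming plain 4-line FASTQ records; `halve_pair` is a hypothetical helper that wraps the same `split -l` call, with n set to half the number of records (rounded up) times 4:

```shell
#!/bin/sh
# halve_pair R1 R2 PREFIX1 PREFIX2 (hypothetical helper)
# Splits both mate files at the same record boundary with split -l,
# producing PREFIX1aa/PREFIX1ab and PREFIX2aa/PREFIX2ab.
halve_pair() {
  records=$(( $(wc -l < "$1") / 4 ))
  # Half the records, rounded up, times 4 lines per record.
  n=$(( (records + 1) / 2 * 4 ))
  split -l "$n" "$1" "$3"
  split -l "$n" "$2" "$4"
}
```

For example, `halve_pair dmel_oregonR_t13_rep1_1 dmel_oregonR_t13_rep1_2 splitrep1_1 splitrep1_2` would produce splitrep1_1aa/splitrep1_1ab and splitrep1_2aa/splitrep1_2ab. Rounding up keeps the output at two pieces per file; rounding down would leave a third, one-record piece whenever the number of records is odd.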

    And that's it!

    Cheers,

    Ana Marija
