GSNAP gives Bus Error: 10

unicornich

Member

Join Date: Apr 2014
Posts: 10


08-11-2014, 04:31 AM

Hello everypony!

I am using GSNAP to map my RNA-seq paired-end reads to a reference genome. It ran normally a few months ago, but I needed to remap some data using the exact same command line as before, and now GSNAP no longer works.
It starts the alignment normally and then, after a short while, gives Bus error: 10.
This is what it looks like:

Code:

gsnap -d oregonR_reference --quality-protocol illumina -N 1 -s /Volumes/Temp/Anna/reference/oregonR_reference/oregonR_reference.maps/dmel-all-transcript-r5.49-Parent1.iit -t 20 -A sam --split-output dmel_oregonR_t13_rep1 /Volumes/Temp/Anna/reads/trimmed_reads/oregon/dmel_oregonR_t13_rep1_1  /Volumes/Temp/Anna/reads/trimmed_reads/oregon/dmel_oregonR_t13_rep1_2 1 > log13r1.txt 2 > err13r1.txt
GSNAP version 2014-02-28 called with args: gsnap -D /Volumes/Temp/Anna/reference/ -d oregonR_reference --quality-protocol illumina -N 1 -s /Volumes/Temp/Anna/reference/oregonR_reference/oregonR_reference.maps/dmel-all-transcript-r5.49-Parent1.iit -t 20 -A sam --split-output dmel_oregonR_t13_rep1 /Volumes/Temp/Anna/reads/trimmed_reads/oregon/dmel_oregonR_t13_rep1_1 /Volumes/Temp/Anna/reads/trimmed_reads/oregon/dmel_oregonR_t13_rep1_2 1 2
Checking compiler assumptions for popcnt: 000041A7 clz=17 clz=0 popcount=7 
Checking compiler assumptions for SSE2: 000041A7 10D63AF1 xor=10D67B56
Checking compiler assumptions for SSE4.1: -89 -15 max=241
Novel splicing (-N) and known splicing (-s) both turned on => assume reads are RNA-Seq
Note: >1 sequence detected, so index files are being memory mapped.
  GSNAP can run slowly at first while the computer starts to accumulate
  pages from the hard disk into its cache.  To copy index files into RAM
  instead of memory mapping, use -B 3, -B 4, or -B 5, if you have enough RAM.
  For more speed, also try multiple threads (-t <int>), if you have multiple processors or cores.
Pre-loading compressed genome (oligos).....,...,...,...,...,...,...,...,..done (63,276,204 bytes, 15449 pages, 0.17 sec)
Pre-loading compressed genome (bits).....,...,...,...,...,...,...,...,..done (63,276,204 bytes, 15449 pages, 0.16 sec)
Pre-loading suffix array...............................................................................................................................,............................................................................................................................................done (674,946,152 bytes)
Looking for index files in directory /Volumes/Temp/Anna/reference//oregonR_reference
  Pointers file is oregonR_reference.ref12153bitpackptrs
  Offsets file is oregonR_reference.ref12153bitpackcomp
  Positions file is oregonR_reference.ref153positions
Offsets compression type: bitpack
Allocating memory for ref offset pointers, kmer 15, interval 3...done (134,217,736 bytes, 1.45 sec)
Allocating memory for ref offsets, kmer 15, interval 3...done (226,957,088 bytes, 2.48 sec)
Pre-loading ref positions, kmer 15, interval 3........................................................................................done (215,791,212 bytes, 52684 pages, 0.60 sec)
Reading splicing file /Volumes/Temp/Anna/reference/oregonR_reference/oregonR_reference.maps/dmel-all-transcript-r5.49-Parent1.iit locally...found donor and acceptor tags, so treating as splicesites file
splice distances present...37770 unique splicesites...
Non-standard nucleotide N near splice site YHet_Parent1:291284.  Discarding...
37769 splicesites are valid...splicetrie_obs has 37773 entries...splicetrie_max has 3858412 entries...done
GMAP modes: pairsearch, indel_knownsplice, terminal, improvement
Starting alignment
Bus error: 10

Does anybody know what could be wrong this time and how to fix it?

Thanks in advance!

Ana Marija


unicornich

#2

08-20-2014, 07:53 AM

So, for those of you who ever come across this very uninformative error message, here is how I found the cause of it:

All but one of my fastq files mapped without any error messages, so I assumed the problem was in that one file and not in the mapper. Since the standard fastq checks gave no clue as to what was wrong, I did a binary search through the problematic fastq file.
The way I did this "binary search" was to split my file(s) in half and rerun the mapping on both halves. Whichever half gave the error, I split again and repeated the procedure, until finally I had only two reads left in my final fastq file.
After 24 iterations, I got a tiny fastq file (which still gave me the Bus error: 10) containing 2 reads, one of which looked normal and another which looked like a microsatellite read.
So I took the microsat read, remapped it by itself, and this time it gave a different error:

Code:

Paired-end accessions FCD20FCACXX:2:1302:15509:87068#ATCACGAT/2 and FCD20FCACXX:2:1302:15509:87068#ATCACGAT/1 do not match

When I remapped the other "normal" read, it mapped normally, with no errors.
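
For anyone who wants to rule out a naming problem between the two files before bisecting, here is a minimal sketch of a pair-name check. It is not something from my original run: the function name `first_name_mismatch` is made up for illustration, and it assumes plain 4-line fastq records whose names end in /1 and /2, as in the message above. It only compares names line by line and will not notice extra reads at the end of the second file.

```shell
# Hypothetical helper: print the first record whose read names differ between
# two paired fastq files and return 1; return 0 if all compared names agree.
# Assumes 4-line records and names ending in /1 and /2.
first_name_mismatch() {
    awk -v mate="$2" '
        {
            # Read the corresponding line of the mate file in lockstep.
            if ((getline other < mate) <= 0) other = ""
        }
        NR % 4 == 1 {
            a = $0; b = other
            sub(/\/[12]$/, "", a)   # strip the /1 or /2 suffix
            sub(/\/[12]$/, "", b)
            if (a != b) {
                rec = (NR + 3) / 4
                print "record " rec ": " $0 " vs " other
                bad = 1
                exit
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

It would be called as `first_name_mismatch reads_1 reads_2`, where both file names are placeholders.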

So obviously, the microsat read was the one causing the problem.
I tried remapping it again after removing the first nucleotide of one of the paired reads, together with its quality value, so that both read sequences were complementary again. After doing this, the mapping worked perfectly, with no errors.
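
The edit described above (dropping the first base and its quality character) can be sketched with awk instead of doing it by hand. This is only an illustration: `trim_first_base` is a made-up name, and it assumes a 4-line fastq file that contains nothing but the read(s) to be trimmed.

```shell
# Hypothetical helper: print a fastq file with the first base and the first
# quality character removed from every record. Assumes 4-line records.
trim_first_base() {
    awk '
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, 2); next }  # sequence and quality lines
        { print }                                                  # name and "+" lines unchanged
    ' "$1"
}
```

The output goes to standard output, so it would be redirected into a new file, for example `trim_first_base one_read_2 > one_read_2.trimmed` (both names are placeholders).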

So there is a weird issue in GSNAP-2014-02-28 with complementarity of microsat paired reads.

I have no idea what the reason for it is, or why GSNAP gives two different error messages depending on whether the read is mapped together with other reads or by itself.
But at least this could be a hint for someone else out there who has the same problem I had.

To halve my fastq files I just used

Code:

split -l n dmel_oregonR_t13_rep1_1 splitrep1_1
split -l n dmel_oregonR_t13_rep1_2 splitrep1_2
# the output is 2 files per input, with aa and ab suffixes: splitrep1_1aa & splitrep1_1ab

where n is the number of lines of the fastq file divided by 2.
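
The whole split-and-remap procedure can also be sketched as a loop. This is not what I actually ran, just a rough outline under some assumptions: the fastq files have 4-line records, so n is rounded to a multiple of 4 to avoid cutting a read in half; `halve_fastq` and `bisect_pair` are made-up names; and the check command passed in is a hypothetical wrapper that exits non-zero when the mapper crashes. Unlike my manual search, which ended at two reads, this version keeps going until a single read pair is left.

```shell
# Hypothetical helper: split a fastq file into two halves on a record boundary.
# Writes ${prefix}aa and ${prefix}ab, like the split commands above.
halve_fastq() {
    src=$1; prefix=$2
    lines=$(( $(wc -l < "$src") ))
    records=$((lines / 4))
    n=$(( (records + 1) / 2 * 4 ))   # lines in the first half, a multiple of 4
    split -l "$n" "$src" "$prefix"
}

# bisect_pair R1 R2 CHECK...
# CHECK is a command that receives the two files as its last arguments and
# exits non-zero if mapping them fails. Prints the paths of the smallest
# failing pair that was found.
bisect_pair() {
    r1=$1; r2=$2; shift 2
    work=$(mktemp -d)
    step=0
    while [ "$(( $(wc -l < "$r1") ))" -gt 4 ]; do
        step=$((step + 1))
        halve_fastq "$r1" "$work/${step}_1_"
        halve_fastq "$r2" "$work/${step}_2_"
        if ! "$@" "$work/${step}_1_aa" "$work/${step}_2_aa"; then
            half=aa
        elif ! "$@" "$work/${step}_1_ab" "$work/${step}_2_ab"; then
            half=ab
        else
            # The failure may depend on reads from both halves.
            echo "neither half fails on its own; stopping" >&2
            break
        fi
        r1="$work/${step}_1_$half"; r2="$work/${step}_2_$half"
    done
    echo "$r1 $r2"
}
```

A possible call would be `bisect_pair dmel_oregonR_t13_rep1_1 dmel_oregonR_t13_rep1_2 run_mapper`, where `run_mapper` is a hypothetical shell function that runs the gsnap command from the first post on the two files given as "$1" and "$2" and returns its exit status.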

And that's it!

Cheers,

Ana Marija