**atma_weapon** · 12-04-2015, 02:48 PM

Originally posted by hzhou

I've just tried running GSNAP version 2014-12-29 with -N to predict novel splice junctions. My input files are paired-end FASTQ files with 75 bp reads.

gsnap -m 5 --gunzip -A sam -N 1 -D gmapdb/ -d human $FASTQ_DIR/A549.nucleus.polyA/test.1.fastq.gz $FASTQ_DIR/A549.nucleus.polyA/test.2.fastq.gz > NJ_test.sam

I set -m to 5, following the suggestion in the help text:

-m, --max-mismatches=FLOAT Maximum number of mismatches allowed (if not specified, then defaults to the ultrafast level of ((readlength+index_interval-1)/kmer - 2))...Otherwise, treated as an integral number of mismatches (including indel and splicing penalties). For RNA-Seq, you may need to increase this value slightly to align reads extending past the ends of an exon.
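
For 75 bp reads, the default quoted in the help text can be worked out directly. This is a minimal sketch, assuming the usual GSNAP index settings of a 15-mer (`kmer = 15`) and an index interval of 3. Both values depend on how the index in `gmapdb/` was built with gmap_build, so treat them as assumptions. The helper name is hypothetical. The help text does not say how GSNAP rounds the result, so the sketch returns the unrounded value.

```python
def default_max_mismatches(readlength: int, kmer: int = 15, index_interval: int = 3) -> float:
    """Ultrafast default from the -m help text: ((readlength+index_interval-1)/kmer - 2).

    Hypothetical helper: kmer=15 and index_interval=3 are assumed index
    settings, not values read from the gmapdb/ index used in this thread.
    Rounding is left to the caller because the help text does not specify it.
    """
    return (readlength + index_interval - 1) / kmer - 2


if __name__ == "__main__":
    # For 75 bp reads: (75 + 3 - 1) / 15 - 2 = 77/15 - 2, roughly 3.13,
    # i.e. about 3 mismatches, lower than the -m 5 used in the command above.
    print(round(default_max_mismatches(75), 2))
```

With `index_interval = 3`, the expression reduces to the `(readlength+2)/kmer - 2` form quoted later in the thread.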

However, it looks like I get *a lot* of mismatches in my mapping; cf. the attached screenshot from IGV. The reads are consistent with each other, in the sense that they assemble into a longer stretch of RNA, but that stretch doesn't align at all to the location reported in the genome.

I suspect I'm missing something obvious here...

I have the same problem.

**Brian Bushnell** · 12-04-2015, 02:55 PM

The image looks like the coordinates are wrong, or the reference is wrong. For example, the data was mapped against reference A, and is being displayed in IGV using reference B. There's no way the reads would actually align like that (with ~75% of the bases mismatching). Are you sure you're using the same reference?
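
One quick way to test the wrong-reference hypothesis is to compare the reference names and lengths in the SAM header (the `@SQ` lines written by the aligner) with the FASTA index (`.fai`, as produced by `samtools faidx`) of the genome loaded in IGV. This is a minimal sketch, assuming you have such a `.fai` file for the IGV genome. The function names and toy inputs are hypothetical. If the lengths disagree, the reads were aligned to a different assembly than the one being displayed.

```python
def parse_sq_lines(sam_header: str) -> dict[str, int]:
    """Map reference name -> length from SAM @SQ header lines (SN: and LN: tags)."""
    refs = {}
    for line in sam_header.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each tag is TAG:VALUE; split on the first colon only (values such as UR: may contain colons).
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        refs[fields["SN"]] = int(fields["LN"])
    return refs


def parse_fai(fai_text: str) -> dict[str, int]:
    """Map reference name -> length from a .fai index (first two tab-separated columns)."""
    refs = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            refs[name] = int(length)
    return refs


def compare_references(sam_refs: dict[str, int], fai_refs: dict[str, int]):
    """Return (names absent from the .fai, shared names whose lengths disagree)."""
    missing = sorted(set(sam_refs) - set(fai_refs))
    mismatched = sorted(n for n in sam_refs.keys() & fai_refs.keys() if sam_refs[n] != fai_refs[n])
    return missing, mismatched


if __name__ == "__main__":
    # Toy inputs: the SAM header uses hg19 chr1/chr2 lengths, the .fai uses the hg38 chr1 length.
    header = "@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chr2\tLN:243199373\n"
    fai = "chr1\t248956422\t6\t60\t61\n"
    missing, mismatched = compare_references(parse_sq_lines(header), parse_fai(fai))
    print("missing from .fai:", missing)     # ['chr2']
    print("length disagrees:", mismatched)  # ['chr1']
```

Comparing lengths rather than names alone catches the common case where two assemblies share chromosome names but not coordinates.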

**atma_weapon** · 12-04-2015, 03:01 PM

I ran gsnap with the default mismatch setting (my reads are 75 bp), so the limit should be ((readlength+2)/kmer - 2) mismatches.
But sometimes I get up to 15 mismatches per read, and they are usually clustered in the same place.
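
To check how many mismatches each alignment carries and whether they really cluster, the `MD:Z` tag of each SAM record can be decoded into mismatch positions. This is a minimal sketch, assuming the records carry a standard `MD:Z` tag; check your output first. The function names, toy record, and 10 bp window are hypothetical. Positions are offsets along the aligned reference bases, including deletions. Soft clips and insertions are not represented, and for spliced reads the intron (`N` in the CIGAR) does not appear in MD either.

```python
import re
from typing import List, Optional

# MD tokens: a run of matching bases, a deletion (^ followed by reference bases), or one mismatched base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def md_from_sam_line(line: str) -> Optional[str]:
    """Return the MD:Z value from a SAM record, or None if the tag is absent."""
    for field in line.rstrip("\n").split("\t")[11:]:  # optional tags follow the 11 mandatory fields
        if field.startswith("MD:Z:"):
            return field[5:]
    return None


def md_mismatch_positions(md: str) -> List[int]:
    """0-based reference offsets (from alignment start, introns excluded) of mismatched bases."""
    positions, pos = [], 0
    for match in MD_TOKEN.finditer(md):
        run, deleted, mismatch = match.groups()
        if run is not None:
            pos += int(run)
        elif deleted is not None:
            pos += len(deleted)  # deleted reference bases advance the reference offset
        else:
            positions.append(pos)
            pos += 1
    return positions


def max_in_window(positions: List[int], window: int) -> int:
    """Largest number of (sorted) mismatch positions that fit inside any `window`-base stretch."""
    best, start = 0, 0
    for end, pos in enumerate(positions):
        while pos - positions[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


if __name__ == "__main__":
    # Toy 75M record with four adjacent mismatches.
    record = "\t".join(["read1", "0", "chr1", "100", "60", "75M", "*", "0", "0",
                        "A" * 75, "I" * 75, "NM:i:4", "MD:Z:10A0C0G0T61"])
    mism = md_mismatch_positions(md_from_sam_line(record))
    print(f"mismatches: {len(mism)}, max in 10 bp window: {max_in_window(mism, 10)}")
```

Running this over every record and tabulating both numbers shows whether the high-mismatch alignments are dominated by one dense block, which would fit the wrong-reference explanation or a misplaced alignment better than scattered sequencing errors would.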

Attached Files

Screenshot from 2015-12-04 14:59:42.png (3.3 KB)

GSNAP: (too many?) mismatches in -N mode