I've just tried running GSNAP version 2014-12-29 to predict novel splice junctions using -N. My input files are paired-end fast files, 75t.
gsnap -m 5 --gunzip -A sam -N 1 -D gmapdb/ -d human $FASTQ_DIR/A549.nucleus.polyA/test.1.fastq.gz $FASTQ_DIR/A549.nucleus.polyA/test.2.fastq.gz > NJ_test.sam
I set -m to 5 following the suggestions in help:
-m, --max-mismatches=FLOAT Maximum number of mismatches allowed (if not specified, then defaults to the ultrafast level of (readlength+index_interval-1)/kmer - 2))...Otherwise, treated as an integral number of mismatches (including indel and splicing penalties). For RNA-Seq, you may need to increase this value slightly to align reads extending past the ends of an exon.
However, it looks like I get *a lot* of mismatches in my mapping, c.f. the attached screenshot from IGV. The reads are consistent enough in the sense that they're assembled into a longer stretch of RNA, but it doesn't align at all to the suggested location in the genome.
I suspect I'm missing something obvious here...
gsnap -m 5 --gunzip -A sam -N 1 -D gmapdb/ -d human $FASTQ_DIR/A549.nucleus.polyA/test.1.fastq.gz $FASTQ_DIR/A549.nucleus.polyA/test.2.fastq.gz > NJ_test.sam
I set -m to 5 following the suggestions in help:
-m, --max-mismatches=FLOAT Maximum number of mismatches allowed (if not specified, then defaults to the ultrafast level of (readlength+index_interval-1)/kmer - 2))...Otherwise, treated as an integral number of mismatches (including indel and splicing penalties). For RNA-Seq, you may need to increase this value slightly to align reads extending past the ends of an exon.
However, it looks like I get *a lot* of mismatches in my mapping, c.f. the attached screenshot from IGV. The reads are consistent enough in the sense that they're assembled into a longer stretch of RNA, but it doesn't align at all to the suggested location in the genome.
I suspect I'm missing something obvious here...
Comment