
**twu** · 01-10-2013, 09:32 AM

Meaning of -n and -Q

The -n flag limits the number of alignments reported for a read, not the number of alignments that GSNAP finds. Suppose you specify "-n 1", and GSNAP finds no alignments to the genome. Then the result would contain 0 hits, obviously. Likewise, if GSNAP finds 1 hit in the genome, the result would contain that one hit. But if GSNAP finds multiple hits, then the "-n 1" flag constrains the result to a single one.

To find uniquely matching hits, you would need to add the "-Q" or "--quiet-if-excessive" flag. With "-n 1" and that flag, if GSNAP found multiple hits, then it would pretend that it really found no (unique) hits to the genome, and report no hits.
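The rule described above can be written down as a small toy model. This is only a sketch of the reporting behaviour as explained in this post, not GSNAP code; the function name `reported_hits` and its arguments are made up for illustration.

```shell
#!/bin/sh
# Toy model of the reporting rule described above; this is NOT GSNAP code.
# reported_hits FOUND MAX QUIET
#   FOUND: number of alignments found for a read
#   MAX:   value given to -n
#   QUIET: 1 if -Q / --quiet-if-excessive is set, 0 otherwise
reported_hits() {
    found=$1; max=$2; quiet=$3
    if [ "$found" -le "$max" ]; then
        echo "$found"          # few enough hits: report all of them
    elif [ "$quiet" -eq 1 ]; then
        echo 0                 # too many hits with -Q: report none
    else
        echo "$max"            # too many hits without -Q: report only MAX
    fi
}

reported_hits 0 1 0   # no hits found
reported_hits 1 1 1   # exactly one hit, so it is unique
reported_hits 5 1 0   # multiple hits, -n 1 only
reported_hits 5 1 1   # multiple hits, -n 1 with -Q
```

The four calls print 0, 1, 1 and 0, which matches the cases discussed: only the combination of "-n 1" and "-Q" turns a multi-hit read into an empty result.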

I hope that makes sense. If you have other questions, you can also join the gsnap-users mailing list at EBI.

Tom

**ppoudel** · 01-11-2013, 02:30 AM

Thanks Parthav and Thomas!!!

I have ~5 GB (each) of paired-end reads from an RNA-seq experiment, and I would like to use GSNAP on an HPC farm. However, the farm allows a maximum of 12 hours per job. I tried 1 node with 16 processes, time=12 hours and memory 64000, but the job crashed. Is there any way I can run this on the HPC farm within 12 hours?

**twu** · 01-11-2013, 03:36 PM

Running GSNAP on a farm

For HPC or Linux farms, it is probably best if you run GSNAP on several nodes for a given input file. You can do this easily with the -q or --part flag, which breaks up the input into parts. For example, if you want to spread the GSNAP computation over 50 nodes, you can run GSNAP like this:

gsnap -q 00/50 <fastq file> > output.00
gsnap -q 01/50 <fastq file> > output.01
gsnap -q 02/50 <fastq file> > output.02
...
gsnap -q 49/50 <fastq file> > output.49

The meaning of "-q 02/50" is that in every set of 50 input reads, GSNAP computes only on the read numbered 02. Since the numbering starts at 00, that is the third read of each set. If you submit each of the above jobs to a different node on your cluster, you should theoretically see a 50x speedup. However, your output will be spread among 50 different files.
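Typing 50 command lines by hand is error-prone, so here is a minimal sketch that prints them in a loop. The function name `print_part_jobs` and the file name `reads.fastq` are hypothetical, and the printed commands mirror the ones above, so any other options you normally pass to gsnap would have to be added.

```shell
#!/bin/sh
# Print one gsnap command per part instead of typing them by hand.
# print_part_jobs NPARTS FASTQ
# Parts are zero-padded to two digits, which is enough for up to 100 parts.
print_part_jobs() {
    nparts=$1; fastq=$2
    i=0
    while [ "$i" -lt "$nparts" ]; do
        part=$(printf '%02d' "$i")
        echo "gsnap -q $part/$nparts $fastq > output.$part"
        i=$((i + 1))
    done
}

# reads.fastq is a placeholder for your own input file.
print_part_jobs 50 reads.fastq
```

Each printed line can then be handed to whatever submit command your cluster uses; that part is site-specific and not shown here.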

Note that I am working on making GSNAP a bit faster by adding an initial alignment step that computes the easy alignments very quickly and falls back on the existing algorithm for harder alignments.

Regards,

Tom

**ParthavJailwala** · 01-12-2013, 02:00 PM

Tom,

Thanks for your input on how to break up the input fastq file into parts for using multiple HPC nodes, using the "-q" flag.
In case of a gsnap mapping run involving paired-end fastq reads, I am wondering how the '-q' works. How would one specify picking the read-pairs from the R1 and R2 files?

Thanks
Parthav

**twu** · 01-16-2013, 03:46 PM

-q flag and paired-end reads

If you have paired-end reads (by providing two files to GSNAP), then the -q flag takes the correct pairs from each of the files. That's the only thing that makes sense.

For example, -q 2/50 takes the read numbered 2 (counting from 0) out of each set of 50, from each of the two files.
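To see why picking the same position from both files keeps mates together, here is a small illustration that mimics the selection with awk. It is not how GSNAP does it internally; it assumes 4 lines per FASTQ record and parts counted from 0, and `select_part` and the two tiny mate files are invented for the example.

```shell
#!/bin/sh
# Illustration only: mimic "take read PART out of every NPARTS" on a FASTQ
# file, assuming 4 lines per record and parts counted from 0.
# select_part PART NPARTS FASTQ
select_part() {
    awk -v part="$1" -v n="$2" '
        { rec = int((NR - 1) / 4) }          # 0-based record index
        rec % n == part + 0 { print }        # keep records in this part
    ' "$3"
}

# Build two tiny hypothetical mate files with 6 read pairs each.
tmpdir=$(mktemp -d)
for i in 1 2 3 4 5 6; do
    printf '@read%s/1\nACGT\n+\nIIII\n' "$i" >> "$tmpdir/R1.fastq"
    printf '@read%s/2\nTGCA\n+\nIIII\n' "$i" >> "$tmpdir/R2.fastq"
done

# Part 1 of 3 picks the same record positions from both files.
select_part 1 3 "$tmpdir/R1.fastq" | grep '^@'
select_part 1 3 "$tmpdir/R2.fastq" | grep '^@'
```

Both calls select records 2 and 5 of their file, so read2/1 is processed together with read2/2 and read5/1 with read5/2. This only holds if the two files list the mates in the same order.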

Tom

**ppoudel** · 01-18-2013, 02:01 AM

Thanks Tom.

**amolkolte** · 01-24-2013, 02:42 AM

I have been running GSNAP as part of the Trinity pipeline.

I have paired-end reads, with each read file about 7 GB in size.
The pipeline executed the following command:

Code:

gsnap -d target.gmap -D . -A sam -N 1 -w 10000 -n 20 -t 45 /home/amol/trimmed_datasets/AP_treated/R33_APT_s_3_1_trimmed.fastq /home/amol/trimmed_datasets/AP_treated/R33_APT_s_3_2_trimmed.fastq

More than 30 hours have passed, and the script is still showing the message "Starting alignment". I can see (using top) that gsnap is using resources, but at the same time its process status is shown as 'sleeping'.

Can anyone please explain?

Thanks

**amolkolte** · 01-29-2013, 03:07 AM

After running for 6 days, gsnap finally generated a 34 GB SAM file.

I also noticed later that you can monitor the size of the output file in your 'gsnap_out' directory to estimate how far the alignment has progressed.

Code:

du -h gsnap_out/gsnap.sam

The file size increases as the alignment proceeds, so you can confirm that the program is still running.
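If you do not want to compare the sizes by eye, a small helper can take two samples and compare them. This is a rough sketch; the function name `is_growing` is made up, and a file that does not grow within the interval does not prove that the program has stalled, because output may be buffered.

```shell
#!/bin/sh
# is_growing FILE SECONDS
# Succeeds (exit status 0) if FILE is larger after SECONDS than before.
is_growing() {
    before=$(wc -c < "$1")     # size in bytes at the first sample
    sleep "$2"
    after=$(wc -c < "$1")      # size in bytes at the second sample
    [ "$after" -gt "$before" ]
}

# Hypothetical usage, with the path from the post above:
# if is_growing gsnap_out/gsnap.sam 60; then echo "still writing"; fi
```

A longer interval makes a false "not growing" result less likely on a slow run.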
