GMAP-GSNAP Run Failing Due to Memory Issues

    Hello All,

    I am writing to find out whether my GMAP installation is correct, because my runs keep failing. I have tested both the 10-30-2017 and 11-15-2017 versions of GMAP and hit the same issue with both. I have run it on a local machine and on an LSF cluster, and in both cases the runs failed due to memory issues. I'll describe only the local machine failures here. I have attached some output files and screenshots; if anyone needs more information for troubleshooting, please let me know.

    LOCAL MACHINE FAILURES (see the attached "System Configuration.txt" for the machine's CPU and memory configuration):

    Command:
    Code:
    nohup /gsap/tools/bin/gmap -d B73v4_genome_masked -D /anno/sanyalab/GMAP/GMAP-DB/ -f gff3_gene -F -t 6 -n 10 -K 50000 --min-identity=0.95 --min-trimmed-coverage=0.90 /anno/foo/PUB_DATASETS/ZEA/Zea_mays_EST.fasta.clean > /anno/foo/GMAP/RESULT/Zea_mays_EST_B73v4.gff3 2>nohup.out &
    Comments: I built the maize B73v4_genome_masked GMAP database with the 10-30-2017 version of GMAP. The evidence set is all ESTs from the genus Zea. I cleaned these ESTs with seqclean before running the GMAP command.

    Results: GMAP writes an incomplete GFF3 file because the job reaches the local memory limit (128 GB) and is killed (see "Capture1.png", a screenshot taken just before the job was killed). The B73 genome is 2.0 GB and the EST evidence set is 916 MB, so the memory consumption is very high given the size of the EST file. The last lines of the attached error file ("nohup.out") show that execution simply stops once the memory maximum is hit. I tried the same command with the 11-15-2017 version and got the same result (I did not keep that result file). Suspecting version-to-version differences, I also rebuilt the GMAP database with the 11-15-2017 version, but the run failed in the same way.
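    One quick check on the EST input is to summarize its record count and length distribution, for example to see whether a few unusually long sequences are present. Below is a minimal sketch, assuming Python 3 is available; `fasta_stats` and `read_lengths` are hypothetical helpers, not part of GMAP, and the script only reads the FASTA file.

```python
"""Summarize a FASTA file: record count, total bases, and length distribution.

Hypothetical helper (not part of GMAP); it only reads the input file.
"""
import statistics
import sys
from typing import Iterator, TextIO


def read_lengths(handle: TextIO) -> Iterator[int]:
    """Yield the sequence length of each FASTA record in handle."""
    length = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif length is not None:
            length += len(line)  # sequence may be wrapped over many lines
    if length is not None:
        yield length  # last record in the file


def fasta_stats(handle: TextIO) -> dict:
    """Return simple length statistics for all records in handle."""
    lengths = list(read_lengths(handle))
    if not lengths:
        return {"records": 0, "total_bp": 0, "min": 0, "median": 0, "max": 0}
    return {
        "records": len(lengths),
        "total_bp": sum(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }


if __name__ == "__main__":
    # Print one summary line per FASTA file given on the command line.
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, fasta_stats(fh))
```

    Saved as a hypothetical fasta_stats.py, running it on both Zea_mays_EST.fasta.clean and Zea_mays_cDNA.fasta would print one summary per file, making it easy to compare the failing EST set with the cDNA set.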

    I then repeated the same command with an older GMAP version, 2014-06-10, and this time it succeeded. The last lines of its error file ("nohup3.out") indicate a successful run, and the result file is 1.7 GB, compared with 69 MB for the incomplete output of the earlier failed run.

    Thinking I might have better luck running cDNAs with the new GMAP version, I ran the following command locally:

    Command:
    Code:
    nohup /gsap/tools/gmap-2017-11-15/bin/gmap -d B73v4_genome_masked -D /anno/sanyalab/GMAP/GMAP-DB/ -f gff3_gene -F -t 14 -n 10 -K 50000 --min-identity=0.95 --min-trimmed-coverage=0.90 /anno/foo/PUB_DATASETS/PASA/ZEA/Zea_mays_cDNA.fasta > /anno/foo/GMAP/RESULT/Zea_mays_cDNA_B73v4_gmap2.gff3 2>nohup7.out &
    Comments: I built the maize B73v4_genome_masked GMAP database with the 11-15-2017 version of GMAP. The evidence set is all cDNAs from the genus Zea. No sequence cleaning was done before running the command.

    Result: The run was successful (attached error file: "nohup7.out").
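    Since the cDNA set runs to completion but the full EST set does not, one generic way to narrow this down (an assumption, not a confirmed fix) is to split the EST FASTA into batches of a fixed number of records and run the same gmap command on each batch separately. If memory still climbs on one particular batch, that batch can be split further. Below is a minimal Python sketch; `split_fasta`, the batch size, and the `.batchNNNN` file naming are all hypothetical.

```python
"""Split a FASTA file into batch files of at most batch_size records each.

Hypothetical helper for running gmap on smaller inputs; names are illustrative.
"""
import sys
from pathlib import Path
from typing import List, Optional, TextIO


def split_fasta(path: str, batch_size: int, out_dir: str) -> List[Path]:
    """Write consecutive records of path into batch files; return their paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(path).name
    paths: List[Path] = []
    dst: Optional[TextIO] = None
    record = -1  # index of the current record; -1 before the first header
    with open(path) as src:
        try:
            for line in src:
                if line.startswith(">"):
                    record += 1
                    if record % batch_size == 0:
                        # Start a new batch file every batch_size records.
                        if dst is not None:
                            dst.close()
                        p = out / f"{stem}.batch{len(paths) + 1:04d}"
                        paths.append(p)
                        dst = p.open("w")
                if dst is not None:  # skip any text before the first header
                    dst.write(line)
        finally:
            if dst is not None:
                dst.close()
    return paths


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: split_fasta.py INPUT.fasta BATCH_SIZE OUT_DIR
    for p in split_fasta(sys.argv[1], int(sys.argv[2]), sys.argv[3]):
        print(p)
```

    Each batch keeps its records contiguous and in the original order, so a batch that still exhausts memory points to a specific range of ESTs. If the per-batch GFF3 outputs are concatenated afterwards, any repeated GFF3 header lines may need to be removed.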

    I know GMAP can handle EST data, but these results confuse me. I am unsure whether the program has a memory management issue or something is broken on my side. I have attached the "config.site" I used. Please advise what I need to do.

    Thank you
    Abhijit