Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • GSNAP Output

    I have been able to run GSNAP on a 8GB, 8 cpu machine using the following command (executed from the GSNAP bin directory):

    Code:
    ./gsnap --format=sam --read-group-id=30D --batch=4 --nthreads=8 --read-group-name=Condition1 --input-buffer-size=5000 --max-mismatches=4 --kmer=15 --dir=/home/myPath/gsnap/gmapdb/mm9 --db=mm9 --novelsplicing=1 -s mm9.splicesites.iit /home/myPath/gsnap/30D_217_12WKS_R1.fastq /home/myPath/gsnap/30D_217_12WKS_R2.fastq
    As it ran, it streamed what appeared to be output of aligned/mapped and annotated reads.

    However, I am unable to find any output file (SAM or otherwise) generated by the process. The process appeared to end without error, reporting the reads/sec performance of GSNAP.

    What do I need to do to either find the output file (already searched for *.sam, and *.* within past day, everywhere), or specify where the output should be written?
    Best Regards,

    Paul Bergmann

  • #2
    Umm, I might be wrong since I have't run your configuration, but shouldn't redirecting the stdout output into a file be all you need? SAM is a line-based ASCII format, probably that's exactly the running output you're seeing. If that is the case just append
    Code:
    > mysamfile.sam
    at the end of the command and you should be good. Btw, having a bioinformatician in the department helps, too

    Comment


    • #3
      Hi,

      I am the developer of GSNAP. The program as you have run it will send the output to stdout (which is the stream of alignments you saw). You should therefore send the output into a file, as suggested above, using the "> outputfile" convention for Unix.

      Alternatively, there is a flag called --split-output that will send output to individual files, according to how they aligned (i.e., concordant unique, halfmapping multiple, and so on).

      Regards,

      Thomas Wu

      Comment


      • #4
        gsnap runn error?

        Can any one solve this error:

        Code:
        gsnap --split-output bcsf -A sam -B 4 -t 8 -k 15 --read-group-id=SN388 --read-group-name=bcsf --input-buffer-size=3000 -D /data/gmap/hg19 -d hg19 -N 1 -s /data/gmap/hg19/hg19.maps/hg19.refGene.splicesitesfile.iit /seq/L005.ft.R1_1.fastq /seq/L005.ft.R2_2.fastq
        Error message:

        Novel splicing (-N) and known splicing (-s) both turned on => assume reads are RNA-Seq
        Allocating memory for compressed genome...done (1,160,885,244 bytes, 1.79 sec)
        Looking for index files in directory /data/gmap/hg19
        Gammaptrs file is hg19.ref12153gammaptrs
        Offsetscomp file is hg19.ref12153offsetscomp
        Positions file is hg19.ref12153positions
        Allocating memory for ref gammaptrs, kmer 15, interval 3...done (67,108,868 bytes, 0.11 sec)
        Allocating memory for ref offsets, kmer 15, interval 3...done (349,277,996 bytes, 0.53 sec)
        Allocating memory for ref positions, kmer 15, interval 3...done (3,815,118,780 bytes, 5.79 sec)
        Reading splicing file /data/gmap/hg19/hg19.maps/hg19.refGene.splicesitesfile.iit locally...found donor and acceptor tags, so treating as splicesites file
        splice distances present...390347 unique splicesites...390347 valid splicesites...splicetrie_obs has 644549 entries...splicetrie_max has 48518627 entries...done
        GMAP modes: pairsearch, terminal, improvement
        Starting alignment
        Illegal instruction

        Comment


        • #5
          Illegal instruction in GSNAP

          The illegal instruction error is occurring because the environment where you built GSNAP is different from the environment where you are running it. When you build GSNAP, the configure command sees if it can compile and run the program with the flag "-mpopcnt", which uses a machine-level instruction to count the number of bits in a word and results in a minor increase in speed. However, if you build on a machine where -mpopcnt works and then you move the program to a machine where the built-in popcount instruction does not exist, the machine will complain about an illegal instruction.

          The solution is to rebuild GSNAP, this time running configure with the flag --disable-popcnt, which will not compile the programs with the "-mpopcnt" flag, even if it works in your build environment.

          If you have any further questions or problems, please feel free to send email directly to me at [email protected].

          Regards,

          Thomas Wu

          Comment


          • #6
            @Twu; I recompiled gmap in the current computer its working fine so far...Thanks!!!

            Comment


            • #7
              Hi Thomas (@twu),

              We have been trying to use GSNAP for alignment, followed by cufflinks for downstream analysis on RNA-seq data. When we look at the SAM file produced by GSNAP (alignment done with splicing on), the 'XS:A: ' tag should have either 'XS:A:+' or 'XS:A:-' to represent strand information for splicing reads; however, we also notice numerous alignment entries to have an 'XS:A:?'. These entries are not recognized by cufflinks as it expects either a '+' or a '-' in the XS:A tag. Any clues on what we should do to make the SAM file compatible with cufflinks? Thanks

              Comment


              • #8
                I think new version might have rectified 'XS:A:?'.
                check here for the updated version

                "Eliminated reporting of XS:A:? in cases of long deletions being considered incorrectly as introns "

                Comment


                • #9
                  @gpcr,
                  Thanks for that pointer....it turns out we had the path to the older version in the .bash_profile.
                  We will run GSNAP using the latest version to see if the XS tag errors are resolved.

                  Thanks

                  Comment


                  • #10
                    Also, I am about to release a new version later today (version 2012-03-20). I believe this new version should completely eliminate all occurrences of "XS:A:?".

                    Comment


                    • #11
                      Originally posted by twu View Post
                      Also, I am about to release a new version later today (version 2012-03-20). I believe this new version should completely eliminate all occurrences of "XS:A:?".
                      Hi Thomas,

                      Thanks for your response. I checked the GSNAP webpage, and yes, there is a new version indeed (2012-03-20). We will use this version and see if the 'XS:A:?' related errors are resolved.

                      Thanks

                      Comment


                      • #12
                        Parse error in SAM file from GSNAP

                        I used the latest version of GSNAP with default flags to produce a SAM file. When trying to convert SAM to BAM for input to cufflinks, I get this error:

                        'Parse error in line 19446415: missing colon in auxiliary data'
                        Aborted

                        Then, when I look at the read at that line number, I get this:
                        >$sed -n 19446415p MC01.sam

                        HTML Code:
                        3PO5HQ1:154:D0FV2ACXX:2:1104:9525:98311	387	15	82664506	1	100M1S	=	82664506	0	CTTAGATCTTATTCATCAGCCTGCTGAACAGTTCCTTTTTCAGAGACATAGATACCATCCAAAAATTTCCTGATATCCTTGTTTTTAACTGTTGTGGCTTT	CCCFFFFFHHHHHJJJJJJJJJJJJJJIJJJIIIJJJJJJIJIIJJJJJJJJJJJJJIJIJJIGIJJJHIJJJJJJEHIHHHHHHFFFFFEEEEEDDDDD3MD:Z:100	NH:i:4	HI:i:3	NM:i:0	SM:i:1	XQ:i:40	X2:i:40
                        What is wrong with this read entry in the SAM file that is throwing the error above?

                        Thanks
                        Last edited by ParthavJailwala; 03-22-2012, 11:38 AM.

                        Comment


                        • #13
                          hi,

                          what does -n and -Q option mean in GSNAP, does -n mean the number of places that a read maps to a genome? I would like to write only the uniquely mapped reads to the output file.

                          Comment


                          • #14
                            Originally posted by ppoudel View Post
                            hi,

                            what does -n and -Q option mean in GSNAP, does -n mean the number of places that a read maps to a genome? I would like to write only the uniquely mapped reads to the output file.
                            Hi ppoudel,

                            As per the GSNAP README, there is a -n flag that the user can specify for the number of hits that GSNAP should show. I did not find the -Q option, but I guess you imply the "--quiet-if-excessive" which is linked to the -n option, as per this paragraph from the README:
                            Hope that helps.

                            Thanks
                            Parthav

                            ------------------------------------------------------------------------------------
                            After the query line, each of the genomic hits is shown, up to the
                            '-n' parameter. If too many hits were found (more than the '-n'
                            parameter), the behavior depends on whether the "--quiet-if-excessive"
                            flag is given to GSNAP. If not, then the first n hits will be printed
                            and the rest will not be printed. If the "--quiet-if-excessive" flag
                            is given to GSNAP, then no hits will be printed if the number exceeds
                            n.

                            Each of the genomic hits contains one or more alignment segments,
                            which is a region of continuous one-to-one correspondence (matches or
                            mismatches) between the query and the genome. Multiple segments occur
                            when the alignment contains an insertion, deletion, or splice. The
                            first segment is marked by a space (" ") at the beginning of the line,
                            while the second and following segments are marked by a comma (",") at
                            the beginning of the line. (In the current implementation of GSNAP
                            that allows only a single indel or splice, the number of segments is
                            at most two.)
                            ---------------------------------------------------------------------

                            Comment


                            • #15
                              Re: GSNAP output

                              Originally posted by ppoudel View Post
                              hi,

                              what does -n and -Q option mean in GSNAP, does -n mean the number of places that a read maps to a genome? I would like to write only the uniquely mapped reads to the output file.
                              Hi ppoudel,

                              As per the GSNAP README, there is a -n flag that the user can specify for the number of hits that GSNAP should show. I did not find the -Q option, but I guess you imply the "--quiet-if-excessive" which is linked to the -n option, as per this paragraph from the README:
                              Hope that helps.

                              Thanks
                              Parthav

                              ------------------------------------------------------------------------------------
                              After the query line, each of the genomic hits is shown, up to the
                              '-n' parameter. If too many hits were found (more than the '-n'
                              parameter), the behavior depends on whether the "--quiet-if-excessive"
                              flag is given to GSNAP. If not, then the first n hits will be printed
                              and the rest will not be printed. If the "--quiet-if-excessive" flag
                              is given to GSNAP, then no hits will be printed if the number exceeds
                              n.

                              Each of the genomic hits contains one or more alignment segments,
                              which is a region of continuous one-to-one correspondence (matches or
                              mismatches) between the query and the genome. Multiple segments occur
                              when the alignment contains an insertion, deletion, or splice. The
                              first segment is marked by a space (" ") at the beginning of the line,
                              while the second and following segments are marked by a comma (",") at
                              the beginning of the line. (In the current implementation of GSNAP
                              that allows only a single indel or splice, the number of segments is
                              at most two.)
                              ---------------------------------------------------------------------

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Current Approaches to Protein Sequencing
                                by seqadmin


                                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                04-04-2024, 04:25 PM
                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 04-11-2024, 12:08 PM
                              0 responses
                              27 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 10:19 PM
                              0 responses
                              31 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 09:21 AM
                              0 responses
                              27 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-04-2024, 09:00 AM
                              0 responses
                              52 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X