
  • Samscope: A new OpenGL based SAM/BAM visualization tool

    There are roughly a zillion ways to visualize SAM/BAM data, from homebrewed Perl scripts to GUI-button-laden programs like IGV, and to this zillion I'm adding yet another: Samscope. Samscope was born out of dissatisfaction with the awkwardness of other visualization software, and the desire for something that can pan and zoom as fast and naturally as we've become accustomed to with things like Google Maps, and that will hopefully help to answer (and inspire) questions from curious biologists who see something unusual and go "what's causing that??". The key features that separate it from other visualization software are:
    • automatic generation of aggregate feature layers (like coverage, polarity (aka strand bias; very helpful for ChIP-Seq!), mutations, indels, minor allele frequency, etc.).
    • visualization of a full distribution of values even while zoomed way out (see the 'c' key).
    • ability to view these layers simultaneously overlaid in different colors or in multiple synchronized windows.
    • include data from separate data files as separate but simultaneously viewable layers (again: different colors or multiple windows).
    • intuitive read inspection, closely integrated with main viewer.
    • paired end classification and visualization.
    • simple consensus calling, handy for mutation search.
    • batch operation for automated image generation.
    • tuning-free operation on any SAM/BAM file with headers (no need for pre-defining chromosome names/sizes/etc., or even a reference FASTA file!).
    • support for annotation formats like GTF/GFF, WIG/BED, etc.
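
To make "aggregate feature layers" a bit more concrete, here is a minimal Python sketch of per-position coverage and polarity. This is not Samscope's actual code (Samscope is C++); the `Read` record, the function name, and the polarity definition (forward minus reverse depth, divided by total depth) are assumptions made for illustration only.

```python
from collections import namedtuple

# Hypothetical minimal read record: 0-based start, aligned length, strand flag.
Read = namedtuple("Read", ["start", "length", "is_reverse"])


def coverage_and_polarity(reads, ref_len):
    """Return (coverage, polarity) lists, one value per reference position."""
    fwd = [0] * ref_len  # forward-strand depth
    rev = [0] * ref_len  # reverse-strand depth
    for r in reads:
        track = rev if r.is_reverse else fwd
        # Clip the read to the reference so out-of-range reads are harmless.
        for pos in range(max(r.start, 0), min(r.start + r.length, ref_len)):
            track[pos] += 1
    coverage = [f + v for f, v in zip(fwd, rev)]
    # Assumed definition: +1 = all forward, -1 = all reverse,
    # 0 = balanced (or no reads at all at that position).
    polarity = [(f - v) / (f + v) if (f + v) else 0.0 for f, v in zip(fwd, rev)]
    return coverage, polarity
```

A strongly positive or negative polarity next to a coverage peak is the kind of strand-bias pattern that is useful to spot in ChIP-Seq data.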


    To be fair, IGV can do a lot of these if you work at it, but I think Samscope makes a lot of these much easier, does some of them much much better, and in general runs much faster.

    I'm sure I've introduced some new awkwardness, but so far it's been quite useful for the project I've used it on, it has been accepted by Bioinformatics, and it has been downloaded a lot, so somebody's using it (or at least trying to)! If anyone here in the seqanswers community has questions or feature requests, please feel free to use this thread.

    There are binary (and source) packages published for Ubuntu and Debian, and it's known to work on CentOS and Fedora (sorry, no .rpms yet). If you're daring, I'll bet it would work in Mac OS X and Cygwin if you can satisfy the library dependencies, but this is untested territory (patches/pull requests to make this easier are of course welcome!).

    Download samscope for free. A lightweight OpenGL based interactive SAM/BAM viewer. Quickly and easily generate aggregate statistics from SAM/BAM files like coverage, polarity, and minor allele frequencies, then scroll and explore freely with a simple mouse based interface.


    If this isn't at all helpful or interesting to you, please forgive this post blatantly hawking my own GPL wares.
    Last edited by Crypticfortune; 04-04-2012, 02:37 AM. Reason: clarity (again)

  • #2
    Quoting the abstract,
    Existing SAM visualization tools like “samtools tview” (Li et al., 2009) are limited to a small region of the genome, and tools like Tablet (Milne et al., 2010) are limited to a relatively small number of reads and may fail outright on large data sets
    Have you got any actual examples of large data sets where Tablet or 'samtools tview' fail outright? You don't actually talk about this in the main text at all.

    Disclaimer: Although I don't work on Tablet directly, I have and do contribute ideas and feedback to the core team, and am an author on the new Tablet paper Milne et al. (2012) in BiB.
    Last edited by maubp; 04-04-2012, 03:21 AM. Reason: Added disclaimer as a potential conflict of interest



    • #3
      Originally posted by maubp
      Quoting the abstract,

      Have you got any actual examples of large data sets where Tablet or 'samtools tview' fail outright? You don't actually talk about this in the main text at all.
      Sorry for being brief on that. The format for Bioinformatics application notes is limited to 2 pages, so there's really no space to go into much detail, but yes, some RNA-Seq data with pockets of extremely high coverage from a project our lab was working on last year was causing Tablet to crash. I'll see if I can find/release at least part of that file, if you're curious.

      Basically though, at least when we were testing, Tablet appeared to attempt to load all reads in the viewing area into memory, resulting in extreme thrashing to disk and becoming unresponsive, sometimes crashing. For viewers that try to provide a stable view of all the reads in a given region, it's a very hard problem to solve in desktop memory, and one that Samscope seeks to sidestep (by focusing on aggregate statistics) rather than solve directly. So that line is really directed at the class of "read drawing visualizations", and is using Tablet as a representative example, to point out that when trying to visualize some 100k reads in one window, the "read by read" approach has limits.
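
As a rough sketch of why aggregate statistics sidestep the memory problem: reads can be streamed once and folded into a fixed number of bins, so memory grows with the number of bins on screen rather than with the number of reads. The function below is an illustrative Python example under that assumption, not Samscope's implementation; the name `binned_coverage` and the (start, end) interval format are made up for the example.

```python
def binned_coverage(read_intervals, ref_len, n_bins):
    """Mean coverage per bin from a stream of 0-based, half-open (start, end)
    intervals. Memory use is O(n_bins), independent of the number of reads."""
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    bin_size = -(-ref_len // n_bins)  # ceiling division
    covered_bases = [0] * n_bins
    for start, end in read_intervals:
        # Clip each interval to the reference.
        pos, end = max(start, 0), min(end, ref_len)
        while pos < end:
            b = pos // bin_size
            # Credit this bin only with the part of the read that falls in it.
            chunk_end = min((b + 1) * bin_size, end)
            covered_bases[b] += chunk_end - pos
            pos = chunk_end
    means = []
    for b, total in enumerate(covered_bases):
        # The last bin may be narrower than bin_size (or empty).
        width = min((b + 1) * bin_size, ref_len) - b * bin_size
        means.append(total / width if width > 0 else 0.0)
    return means
```

Because `read_intervals` can be any iterator, the reads never need to be held in memory at once; only the per-bin totals are kept.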



      • #4
        This looks very interesting. I have a strange setup and it looks like it will take a while to corral and compile the correct libraries. Regardless, I do want to ask how do you deal with the Operating System independence? Is it all done using the GL libraries?



        • #5
          Originally posted by Richard Finney
          I do want to ask how do you deal with the Operating System independence? Is it all done using the GL libraries?
          Yes, using OpenGL libraries is critical to getting fast hardware-accelerated drawing performance. OpenGL provides a standardized API for which lots of different vendors provide shared libraries, so we can write code against the generic OpenGL spec and it's promised to produce the same result regardless of the hardware/operating system running it. This is the basic theme for writing any cross-platform software and isn't really specific to OpenGL (Samscope also uses the standard C++ libraries (aka "the STL"), which follow a similar kind of vendor-independent specification system). The other libraries Samscope uses are also fairly system independent (Boost, GLUT, zlib, and libdevil are all written in a cross-platform fashion and generally strive fairly hard to achieve good performance on all of their target platforms).



          • #6
            Hard to get this working. For building from source, scons is not supported on my servers so had to install that first. Then ran into compile errors. Probably related to OpenGL. So ... I decided to go to a Ubuntu system and just install the binary. But when I run I get a segfault.

            Code:
            >/usr/bin/samscope 001370_S1.coordSorted.bam 
            SAM/BAM Input: 001370_S1.coordSorted.bam
             Filetype 'sambam' specified for 001370_S1.coordSorted.bam
            Search based on 001370_S1.coordSorted.bam
            Searching in .
            0 layer files found matching 001370_S1.coordSorted.bam
            001370_S1.coordSorted.bam bip sources missing. Building.
            Opening 001370_S1.coordSorted.bam
            Total length: 0
            Converting 001370_S1.coordSorted.bam to simple bip files
            Reading alignments...
            Segmentation fault
            The file is there and sizable.

            Code:
            > ls -lh 001370_S1.coordSorted.bam
            -rw-r--r-- 1 westerm 14 12G 2012-04-05 10:47 001370_S1.coordSorted.bam
            So the segfault could be related to the memory model.

            Might try again after a bit.



            • #7
              Originally posted by westerman
              Hard to get this working. For building from source, scons is not supported on my servers so had to install that first. Then ran into compile errors. Probably related to OpenGL.
              Thanks for giving it a try! It's hard to make a build system that makes everyone happy. Autoconf/autotools make files are clumsy and too hard for some users, and for everything else somebody somewhere doesn't have it installed. Personally I think scons makes a lot of sense, is easy to work with, and if you have python you can run it. And yes, for building from source you need boost, OpenGL drivers/headers, and GLUT (which is a tougher requirement than scons >_<).

              Originally posted by westerman
              So ... I decided to go to a Ubuntu system and just install the binary. But when I run I get a segfault.

              Code:
              {snip}
              Opening 001370_S1.coordSorted.bam
              Total length: 0
              Converting 001370_S1.coordSorted.bam to simple bip files
              Reading alignments...
              Segmentation fault
              Well, there's your problem: Samscope seems to think that the total length of your target sequences is 0, which sounds like a malformed/inappropriate header to me. Samscope parses BAM files using bamtools, so if bamtools can read it, you should be ok (and likewise, if samscope can't read it, bamtools should also have problems). A total target length of 0 is a weird edge case I honestly hadn't considered, and that's where the segfault comes from, so I'll fix that to make a more graceful error, but that file apparently has no target sequences? If you're sure that BAM file is correct, can you show me the output of "samtools view -H 001370_S1.coordSorted.bam" or "bamtools header -in 001370_S1.coordSorted.bam" and/or send me a small fragment of the BAM file? (PM is fine).

              Incidentally, running samscope on a normal BAM file will produce a list of target sequences and their lengths, like this:

              Code:
              toy2.bam bip sources missing. Building.
              Opening toy2.bam
              Target 0: name='17_random' len=2911
              Target 1: name='Z_random' len=346234
              Total length: 349145
              Therefore it looks like samscope/bamtools isn't finding any target sequences listed in your BAM's header data. If necessary you can reheader a BAM file with the "samtools reheader" command.
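
For anyone who wants to sanity-check a header by hand, the check amounts to summing the LN values of the @SQ lines and failing cleanly when the sum is zero. Below is a small Python sketch of that idea; it is not Samscope's or bamtools' code, and the function names are hypothetical. It only assumes the standard SAM header layout (tab-separated `TAG:value` fields, with `SN` and `LN` on each `@SQ` line) and works on header text such as the output of "samtools view -H".

```python
def parse_sq_lines(header_text):
    """Return a list of (name, length) pairs from the @SQ lines of a SAM header."""
    targets = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            targets.append((fields["SN"], int(fields["LN"])))
    return targets


def total_target_length(header_text):
    """Sum target lengths, raising a clear error instead of proceeding with 0."""
    total = sum(length for _, length in parse_sq_lines(header_text))
    if total == 0:
        raise ValueError("no target sequences with nonzero length found in header")
    return total
```

With the two targets from the toy2.bam example above (lengths 2911 and 346234), the sum is 349145, matching the "Total length" line that samscope prints.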
              Last edited by Crypticfortune; 04-05-2012, 07:42 AM. Reason: formatting



              • #8
                How about checking for that problem and outputting a message?
                Segfaulting is not good.



                • #9
                  Output of the BAM header. In case it is not obvious, this is a Trinity-generated header.

                  Code:
                  @SQ	SN:001370_S1_comp2_c0_seq1	LN:1369
                  @SQ	SN:001370_S1_comp4_c0_seq1	LN:1066
                  @SQ	SN:001370_S1_comp4_c1_seq1	LN:1059
                  @SQ	SN:001370_S1_comp4_c2_seq1	LN:1089
                  @SQ	SN:001370_S1_comp4_c3_seq1	LN:3371
                  @SQ	SN:001370_S1_comp4_c4_seq1	LN:3394
                  @SQ	SN:001370_S1_comp4_c5_seq1	LN:1082
                  ...
                  @SQ	SN:001370_S1_comp990134_c0_seq1	LN:206
                  @SQ	SN:001370_S1_comp1007009_c0_seq1	LN:221
                  @SQ	SN:001370_S1_comp1004783_c0_seq1	LN:229
                  103411 header lines total.



                  • #10
                    And just to prove that reads do map to the reference contigs, here are the first couple of 'grepped' lines from a samtools view:

                    Code:
                    H-148:116:C0J0EACXX:6:1213:11934:32523	153	001370_S1_comp2_c0_seq1	1	255	101M	*	0	0	TTTTTTTTTTTTTTTTTTTTTTTACAGAAAAAATAATTTTCGACATTTATTGACAGACAGCCATGGGATCTCTTCATATATATTCACTTTACATCATTGGC	BBBDDBDDDDDDDDDD@CFFFFGEHGHHJIIJJJJJJJJJJIGIIIIJIGGHDIJJJJJJJIHIJJJIJJJJIJGJJJJJJJJHHGJJGHHHHFFDFFCCC	XA:i:0	MD:Z:101	NM:i:0
                    H-148:116:C0J0EACXX:6:1311:17645:55954	131	001370_S1_comp2_c0_seq1	2	255	101M	=	117	216	TTTTTTTTTTTTTTTTTTTTATACAGAAAAAAAAATTTTCGACATTTATTGACAGACAGCCATGGGATCTCTTCATATATATTCACTTTACATCATTGGCA	BCCFFFFFHHHHHJJ/6ABD2(,(5(,>CDBB&63?@CD802(&)8B(+999(3392(+8?8@@AD?##################################	XA:i:2	MD:Z:20T11T68	NM:i:2



                    • #11
                      Richard Finney> Indeed, segfaulting is not good. Version 1.6.6.2, which is hitting the servers as I type this, detects this case and outputs a more sensible error message (and a dump of the headers it saw, for debugging purposes like this).

                      westerman> Thanks for debugging with me. Now we're getting into confusing territory. It seemed vaguely possible that bamtools might have problems opening a 12GB file on a 32-bit system, but I just tested a 55GB BAM file with 28k target sequences on a 32-bit system and that worked, so that's not the problem (32-bit systems are ultimately a problem for reference lengths greater than 2GB, but not at a stage as early as parsing the BAM header). I can't reproduce this error with any BAM files I've generated, so my next guess is that there's some incompatibility between the version of bamtools shipped with samscope and your BAM file, but that seems unlikely (I've never encountered a BAM file that bamtools can't parse but samtools can). Does "bamtools header -in 001370_S1.coordSorted.bam" work? You can build bamtools out of the samscope source directory if necessary. Something like
                      Code:
                      apt-get source samscope
                      sudo apt-get build-dep samscope
                      cd samscope*/bamtools
                      ./build.sh
                      cd build
                      make
                      should do it. As an alternative, could you send me a fragment of that failing BAM file so I can reproduce this error? something like
                      Code:
                      samtools view -bo foo.bam 001370_S1.coordSorted.bam 001370_S1_comp2_c0_seq1:1-100
                      Finally, as a debugging sanity check, can you show me the output of "uname -a" and "samscope --version"?

                      Thanks for your feedback and patience!



                      • #12
                        Code:
                        > uname -a
                        Linux rick.genomics.purdue.edu 2.6.32-39-generic #86-Ubuntu SMP Mon Feb 13 21:47:32 UTC 2012 i686 GNU/Linux
                        
                        > samscope --version
                        samscope 1.6.6.1 tarball
                        Code:
                         apt-get source samscope
                        Reading package lists... Done
                        Building dependency tree       
                        Reading state information... Done
                        E: Unable to find a source package for samscope
                        So after some other work I finally decided to get the new version via:

                        Code:
                        apt-get upgrade
                        Which now gives me the following error:

                        Code:
                        > rm 001370_S1.coordSorted.bam.bip.coverage.layer 
                        
                        samscope 001370_S1.coordSorted.bam
                        SAM/BAM Input: 001370_S1.coordSorted.bam
                         Filetype 'sambam' specified for 001370_S1.coordSorted.bam
                        Search based on 001370_S1.coordSorted.bam
                        Searching in .
                        0 layer files found matching 001370_S1.coordSorted.bam
                        001370_S1.coordSorted.bam bip sources missing. Building.
                        Opening 001370_S1.coordSorted.bam
                        Total length: 0
                        Raw header: 
                        error: Cannot generate BIP files for BAM files without target sequences! (What is this BAM file aligned to??)
                        Which is, of course, more informative. However, I cannot see a switch to specify the target sequences.



                        • #13
                          Scons requires Python, which adds to the dependencies. What is wrong with a Makefile?



                          • #14
                            Plain old makefiles are a pain to get set up for all systems. Autoconf helps a lot in this regard and you'll see many packages using autoconf. However the scons group believes that they can do better than autoconf. I did not find any problem in installing scons (aside from having to do it). I can't say if scons is any better than autoconf but it may be the new wave. Python should be present on almost every modern system so I do not see that as being a problem.



                            • #15
                              Originally posted by westerman
                              Plain old makefiles are a pain to get set up for all systems. Autoconf helps a lot in this regard and you'll see many packages using autoconf. However the scons group believes that they can do better than autoconf. I did not find any problem in installing scons (aside from having to do it). I can't say if scons is any better than autoconf but it may be the new wave. Python should be present on almost every modern system so I do not see that as being a problem.
                              Well, I had to install Python on my desktop. In theory it's better to have fewer unnecessary dependencies, so as not to limit your audience. Certainly scons and Python are both unnecessary.
