  • Doing things with Crossbow output?

    I've run Crossbow (it took 3 hours on a 3-node, 10-core Hadoop cluster) and worked through the mouse chromosome 17 example given in the manual successfully to the end. I now have 276,819 SNPs in SOAPsnp output format, plus all the intermediate files.

    I want to view this stuff graphically - zooming in, exploding it, seeing if they cluster in certain areas etc - sort of like what the original paper http://genomebiology.com/2009/10/10/R112 does.

    I think I'm looking for a graphical display tool: something that can take in the reference sequence of mouse chromosome 17 (all 95 million bases of it) and the SNP calls (all 276K of them) that Crossbow produced in SOAPsnp output format, and then let me look at it all visually.

    A tool that could marry the two to produce a new sequence, substituting the SNPs at the relevant points would be nice too.
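A minimal sketch of that kind of substitution, assuming the reference is already in memory as a plain string and that field 4 of each SOAPsnp line (the subject genotype) is a single IUPAC letter. The function names and the two sample lines are invented for illustration; heterozygous codes such as Y are written into the sequence as-is rather than resolved to one base.

```python
def read_soapsnp_calls(lines):
    """Yield (1-based position, subject genotype) from SOAPsnp-format lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        yield int(fields[1]), fields[3]


def apply_snps(reference, snps):
    """Return a copy of `reference` with each SNP genotype substituted in."""
    seq = bytearray(reference, "ascii")  # mutable, one byte per base
    for pos, base in snps:
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} is outside the sequence")
        seq[pos - 1] = ord(base)  # SOAPsnp offsets are 1-based
    return seq.decode("ascii")


# Invented toy data: a 10-base "chromosome" and two truncated SNP lines.
reference = "ACGTACGTAC"
calls = ["chr17\t3\tG\tA\t40", "chr17\t8\tT\tY\t35"]
consensus = apply_snps(reference, read_soapsnp_calls(calls))
print(consensus)
```

A bytearray keeps the substitution cheap even for a 95-million-base chromosome, since each SNP is a single in-place byte write.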

    Is this asking too much? I've seen some fancy graphical tools on the EBI and NCBI websites, so they have to be out there, surely? Any ideas?

  • #2
    You may wish to check out IGV to visualize the BAM file, using the SNP list as a directory/index into it.

    For other viewers, see:
    I gathered up some of the recent free next generation sequence viewers that were capable of viewing BAM files - and put each through the m...

    Pubmed Link Bioinformatics. 2011 Jan 28. [Epub ahead of print] Bambino: a variant detector and alignment viewer for next-generation sequenc...
    http://kevin-gattaca.blogspot.com/



    • #3
      Thanks - I worked through the 10-minute IGV tutorial. There's a LOT to that software, isn't there? Checking out Bambino as well.



      • #4
        Well, I've scanned a whole set of visualization products (IGV, JMapViewer, conversion tools) to see if I can visualize the output of Crossbow out of the box, so to speak. No success yet - the output format of Crossbow seems to be rather specific. Its description is at the bottom of this post. I just can't seem to find a tool that will convert it to a format a visualizer accepts, or a visualizer that will display that format out of the box. Am I missing something here? Just checking before I start writing code to do the conversion to a visualizer-compatible format myself.


        ======================================
        Each individual record is in the SOAPsnp output format. SOAPsnp's format consists of 1 SNP per line with several tab-separated fields per SNP. The fields are:

        1. Chromosome ID
        2. 1-based offset into chromosome
        3. Reference genotype
        4. Subject genotype
        5. Quality score of subject genotype
        6. Best base
        7. Average quality score of best base
        8. Count of uniquely aligned reads corroborating the best base
        9. Count of all aligned reads corroborating the best base
        10. Second best base
        11. Average quality score of second best base
        12. Count of uniquely aligned reads corroborating second best base
        13. Count of all aligned reads corroborating second best base
        14. Overall sequencing depth at the site
        15. Sequencing depth of just the paired alignments at the site
        16. Rank sum test P-value
        17. Average copy number of nearby region
        18. Whether the site is a known SNP from the file specified with -s
        =================================
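A rough sketch of the kind of conversion code this would take, going from the 18 fields listed above to a 5-column BED line. The key names, the function names and the sample record are all made up for illustration, and it assumes the quality score in field 5 is numeric.

```python
# Hypothetical key names, one per SOAPsnp field, in the order listed above.
FIELD_NAMES = [
    "chrom", "pos", "ref_genotype", "subject_genotype", "genotype_quality",
    "best_base", "best_base_quality", "best_unique_reads", "best_all_reads",
    "second_base", "second_base_quality", "second_unique_reads",
    "second_all_reads", "depth", "paired_depth", "rank_sum_p",
    "copy_number", "known_snp",
]


def parse_soapsnp_line(line):
    """Split one SOAPsnp line into a dict keyed by FIELD_NAMES."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(values)}")
    record = dict(zip(FIELD_NAMES, values))
    record["pos"] = int(record["pos"])
    return record


def to_bed_line(record):
    """Format a parsed record as a 5-column BED line."""
    start = record["pos"] - 1  # BED starts are 0-based
    end = record["pos"]        # BED ends are exclusive, so this spans one base
    name = f'{record["ref_genotype"]}>{record["subject_genotype"]}'
    # BED scores are integers in 0..1000, so clamp the genotype quality.
    score = max(0, min(1000, int(float(record["genotype_quality"]))))
    return "\t".join([record["chrom"], str(start), str(end), name, str(score)])


# Invented sample record with 18 tab-separated fields.
example = "chr17\t3000123\tG\tA\t40\tA\t38\t12\t12\tG\t0\t0\t0\t12\t10\t1.00000\t1.0\t0"
bed_line = to_bed_line(parse_soapsnp_line(example))
print(bed_line)
```

The only real trap is the coordinate shift: SOAPsnp offsets are 1-based while BED uses 0-based, half-open intervals.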



        • #5
          Well, I've looked at a lot of genome browsers of varying pedigree. JMapViewer came closest, in that it allowed me to look at the SNP file in SOAPsnp format visually, out of the box. It also wanted an alignment file in SOAP (SOAPaligner) output format, so I had to fake that, and if I had access to the source code...

          No joy anywhere else.

          So I'm resigned to doing the conversions as needed, to VCF, BED, or whichever format the software requires. From that perspective, the richness, pedigree and existing data availability (annotation tracks) of the genome browsers from the really large organizations look very good - I mean the UCSC Genome Browser, the Broad Institute's IGV and the NCBI Map Viewer. That they make the source available is also good.
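For the VCF side, a conversion might look something like the sketch below. It reads only the first five SOAPsnp fields, assumes the subject genotype is a single IUPAC letter, and skips sites that match the reference or carry a code it does not know (such as N). The function name, the SAMPLE column label and the sample lines are placeholders, not anything Crossbow emits.

```python
# IUPAC codes for one or two bases; ambiguity codes covering 3+ bases are skipped.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
}

VCF_HEADER = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE"
)


def soapsnp_to_vcf_line(line):
    """Convert one SOAPsnp line to a VCF data line, or None if not a variant."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, qual = fields[0], int(fields[1]), fields[4]
    ref, genotype = fields[2].upper(), fields[3].upper()
    alleles = IUPAC.get(genotype)
    if alleles is None:
        return None  # unknown or unsupported genotype code
    alts = [base for base in alleles if base != ref]
    if not alts:
        return None  # genotype equals the reference base
    if len(alleles) == 1:
        gt = "1/1"   # homozygous for the alternate base
    elif ref in alleles:
        gt = "0/1"   # heterozygous: reference plus one alternate
    else:
        gt = "1/2"   # heterozygous: two different alternates
    return "\t".join(
        [chrom, str(pos), ".", ref, ",".join(alts), qual, ".", ".", "GT", gt]
    )


# Invented sample lines, truncated to the five fields the sketch reads.
samples = ["chr17\t3\tG\tA\t40", "chr17\t8\tT\tY\t35", "chr17\t9\tA\tA\t50"]
vcf_lines = [v for v in map(soapsnp_to_vcf_line, samples) if v is not None]
print(VCF_HEADER)
print("\n".join(vcf_lines))
```

VCF positions are 1-based like SOAPsnp's, so unlike BED no coordinate shift is needed here.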

          I did have a problem in IGV with the mouse mm9 chr17 SNPs. When "loading from server" the SNP128 annotation track (and also the repeats track), it came back with

          An error occurred while loading igvdata.broadinstitute.org/annotations/mm9/variations/snp128.bed.gz data
          Error parsing header.


          Pity. No such problems with hg19 though - and comparing the file names when it works (hg19) vs. when it doesn't (mm9), the .gz extension is one difference I note.

          The UCSC Genome Browser had no problems displaying their known versions of the SNPs.

          Now, can anybody explain to me where chromosome ideogram band labels/names come from? There's a naming convention in those p13.2, q21.32 labels, right? Google doesn't answer a simple question like that.

          EDIT: looks like the Personal Genome SNP format (PGSNP) for uploading a custom annotation track is now available on the main UCSC Genome Browser, not just the test one, as per an archived message from them in the mailing list.
          Last edited by karve; 03-08-2011, 09:24 AM. Reason: mention PGSNP format
