
  • Doing things with Crossbow output?

    I've used Crossbow (it took 3 hours on a 3-node, 10-core Hadoop cluster) and successfully worked through the mouse chromosome 17 example from the manual to the end. I now have 276,819 SNPs in SOAPsnp output format, plus all the intermediate files.

    I want to view this stuff graphically - zooming in, exploding it, seeing whether the SNPs cluster in certain areas, etc. - sort of like what the original paper (http://genomebiology.com/2009/10/10/R112) does.

    I think I'm looking for a graphical display tool: something that can take the reference sequence of mouse chromosome 17 (all 95 million bases of it) and the SNP calls (all 276K of them) that Crossbow produced in SOAPsnp output format, and then let me look at them visually.

    A tool that could marry the two to produce a new sequence, substituting the SNPs at the relevant points would be nice too.
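
    For illustration, the substitution step could be sketched in Python roughly as follows. This is only a sketch under assumptions, not something taken from Crossbow or SOAPsnp: `apply_snps` is a hypothetical helper name, positions are assumed to be 1-based offsets into the chromosome, and each SNP is assumed to be a single-character genotype code that is written into the sequence as-is.

```python
def apply_snps(reference, snps):
    """Return `reference` with each (position, base) SNP substituted in.

    Positions are 1-based. `apply_snps` is a hypothetical helper, not part
    of Crossbow or SOAPsnp.
    """
    # A bytearray is mutable and uses one byte per base, which matters
    # for a chromosome of roughly 95 million bases.
    seq = bytearray(reference, "ascii")
    for pos, base in snps:
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} is outside the reference")
        if len(base) != 1:
            raise ValueError(f"expected a single-character base, got {base!r}")
        seq[pos - 1] = ord(base)  # 1-based offset -> 0-based index
    return seq.decode("ascii")


# Toy example with made-up data: substitute at positions 2 and 8.
print(apply_snps("ACGTACGT", [(2, "T"), (8, "R")]))
```

    Reading the real reference from a FASTA file and the SNPs from the Crossbow output would feed the same function; the toy strings above are invented for the example.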

    Is this asking too much? I've seen some fancy graphical tools on the EBI and NCBI websites, so they have to be out there, surely? Any ideas?

  • #2
    You may wish to check out IGV to visualize the BAM file, using the SNP list like a directory/index.

    For others, you can see:
    I gathered up some of the recent free next generation sequence viewers that were capable of viewing BAM files - and put each through the m...

    Pubmed Link Bioinformatics. 2011 Jan 28. [Epub ahead of print] Bambino: a variant detector and alignment viewer for next-generation sequenc...
    http://kevin-gattaca.blogspot.com/



    • #3
      Thanks - I ran through the 10-minute IGV tutorial. There's a LOT to that software, isn't there? Checking out Bambino as well.



      • #4
        Well, I've scanned a whole set of visualizer products (IGV, JMapViewer, conversion tools) to see if I can visualize the output of Crossbow out of the box, so to speak. No success yet - it seems the output format of Crossbow is rather specific. A description of it is at the bottom of this post. I just can't seem to find a tool that will convert it to a format a visualizer can use, or a visualizer that will display that format out of the box. Am I missing something here? Just checking before I start writing code to do the conversions to a visualizer-compatible format myself.


        ======================================
        Each individual record is in the SOAPsnp output format. SOAPsnp's format consists of 1 SNP per line with several tab-separated fields per SNP. The fields are:

        1. Chromosome ID
        2. 1-based offset into chromosome
        3. Reference genotype
        4. Subject genotype
        5. Quality score of subject genotype
        6. Best base
        7. Average quality score of best base
        8. Count of uniquely aligned reads corroborating the best base
        9. Count of all aligned reads corroborating the best base
        10. Second best base
        11. Average quality score of second best base
        12. Count of uniquely aligned reads corroborating second best base
        13. Count of all aligned reads corroborating second best base
        14. Overall sequencing depth at the site
        15. Sequencing depth of just the paired alignments at the site
        16. Rank sum test P-value
        17. Average copy number of nearby region
        18. Whether the site is a known SNP from the file specified with -s
        =================================
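
        As a starting point for writing such conversion code, the 18 fields listed above could be read into a named record along these lines. This is a minimal sketch, not an official parser: the field names and the `parse_soapsnp_line` helper are made up for illustration, the sample line is invented, and only the position is converted to an integer because the description does not spell out the types of the other columns.

```python
from collections import namedtuple

# Hypothetical field names, one per column in the description above.
FIELDS = [
    "chrom", "pos", "ref_genotype", "genotype", "genotype_quality",
    "best_base", "best_avg_quality", "best_unique_reads", "best_all_reads",
    "second_base", "second_avg_quality", "second_unique_reads",
    "second_all_reads", "depth", "paired_depth", "rank_sum_p",
    "copy_number", "known_snp",
]
SoapSnpRecord = namedtuple("SoapSnpRecord", FIELDS)


def parse_soapsnp_line(line):
    """Split one tab-separated SNP line into a SoapSnpRecord."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = SoapSnpRecord(*parts)
    # Only the 1-based offset is converted; other columns stay as strings.
    return record._replace(pos=int(record.pos))


# Invented sample line, for illustration only.
sample = "chr17\t3000123\tA\tG\t40\tG\t35\t12\t12\tA\t0\t0\t0\t12\t10\t1.0\t1.0\t0"
record = parse_soapsnp_line(sample)
print(record.chrom, record.pos, record.ref_genotype, record.genotype)
```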



        • #5
          Well, I've looked at a lot of genome browsers of varying pedigree. JMapViewer came closest, in that it allowed me to look at the SNP file in SOAPsnp format visually, out of the box. It also wanted an alignment file in Soap(Aligner) output format, so I had to fake that, and if I had access to the source code...

          No joy anywhere else.

          So I'm resigned to doing the conversions as needed, to VCF, BED, or whichever formats the software requires. From that perspective, the richness, pedigree and existing data availability (annotation tracks) of the genome browsers from the really large organizations look very good - I mean the UCSC Genome Browser, the Broad Institute's IGV and the NCBI Map Viewer. That they make the source available is also good.
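
          A conversion to BED might look roughly like the sketch below. It assumes only the first four tab-separated columns of the SOAPsnp output (chromosome, 1-based offset, reference genotype, subject genotype) and relies on BED using a 0-based start and an exclusive end, so a SNP at 1-based position p becomes the interval [p-1, p). The `soapsnp_to_bed` name and the "ref>genotype" label in the name column are illustrative choices, not a convention required by any browser, and the sample line is invented.

```python
def soapsnp_to_bed(lines):
    """Yield 4-column BED lines (chrom, start, end, name) for SNP lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        chrom, ref, genotype = fields[0], fields[2], fields[3]
        pos = int(fields[1])  # 1-based offset in the SOAPsnp output
        # BED: 0-based start, end is exclusive.
        yield f"{chrom}\t{pos - 1}\t{pos}\t{ref}>{genotype}"


# Invented sample line, for illustration only.
sample = ["chr17\t3000123\tA\tG\t40\tG\t35\t12\t12\tA\t0\t0\t0\t12\t10\t1.0\t1.0\t0\n"]
for bed_line in soapsnp_to_bed(sample):
    print(bed_line)
```

          VCF would need more care (a header, and turning the genotype code into explicit alternate alleles), so it is left out of this sketch.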

          I did have a problem with IGV and the mouse mm9 chr17 SNPs. When "loading from server" the SNP128 (and also the repeats) annotation tracks, it came back with:

          An error occurred while loading igvdata.broadinstitute.org/annotations/mm9/variations/snp128.bed.gz data
          Error parsing header.


          Pity. No such problems with hg19 though - and comparing the file names when it works (hg19) vs. when it doesn't (mm9), the .gz extension is one difference I note.

          The UCSC Genome Browser had no problems displaying their known versions of the SNPs.

          1. Now, can anybody explain to me where the chromosome ideogram band labels/names come from? There's a naming convention behind those p13.2, q21.32 labels, right? Google doesn't answer a simple question like that.

          EDIT: it looks like the Personal Genome SNP format (pgSnp) for uploading a custom annotation track is now available on the main UCSC Genome Browser, not just the test one, as per an archived message from them on the mailing list.
          Last edited by karve; 03-08-2011, 09:24 AM. Reason: mention PGSNP format
