  • Graphing Genome Distribution of Position Data

    Hello everyone,

    I would like to take my aligned small RNA data sets and create a graphical representation in order to see where the most and least expression takes place on a per chromosome or genome wide basis (preferably chromosome).

    I've noticed a lot of papers producing such graphs, but have yet to see one that mentions in any particular detail the program or technique used to create such graphs.

    These are the types of graphs I would like to replicate:


    Ideally I would like to use BED files of my data in order to create the graphs, but an alternative would also be acceptable. The data is coming from Solexa sequencing.

    Thank you in advance.


    -Brandon

  • #2
    I have also faced this problem and never found any software that (I would say) produces publication-quality figures. SeqMonk can produce plots that are useful for visual inspection but I didn't find them to be what I needed for publication.

    In the end I generated a BED file of fixed-width sliding windows for the whole genome and then used BEDTools to count the reads within each window (the reads were in BAM files). I then used R to produce the plots. This approach was piecemeal and far from optimal, but that reflects my own limitations. I'm sure someone good at BioPython, for example, could come up with a nice script.
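The windowing-and-counting approach described above can be sketched in plain Python. This is only a minimal illustration, not the poster's script: the function names are made up, the windows are non-overlapping, and each read is assigned to a window by its 0-based start position (BEDTools counts overlaps instead, so the numbers can differ at window edges).

```python
from collections import Counter


def make_windows(chrom_sizes, width):
    """Yield (chrom, start, end) fixed-width windows in BED coordinates
    (0-based, half-open). The last window of a chromosome may be shorter."""
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, width):
            yield chrom, start, min(start + width, size)


def windows_with_counts(chrom_sizes, read_starts, width):
    """Attach a read count to every window.

    read_starts is an iterable of (chrom, 0-based start) pairs; a read is
    assigned to the single window that contains its start position.
    """
    counts = Counter((chrom, pos // width) for chrom, pos in read_starts)
    return [
        (chrom, start, end, counts[(chrom, start // width)])
        for chrom, start, end in make_windows(chrom_sizes, width)
    ]


def to_bed(rows):
    """Format rows as tab-delimited lines that R can read with read.table()."""
    return "\n".join("\t".join(str(field) for field in row) for row in rows)


# Toy example with a hypothetical 250 bp chromosome and 100 bp windows.
rows = windows_with_counts(
    {"chr1": 250},
    [("chr1", 5), ("chr1", 99), ("chr1", 100), ("chr1", 249)],
    100,
)
print(to_bed(rows))
```

The resulting four-column table (chromosome, start, end, count) is the kind of input that can then be plotted per chromosome in R.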

    Comment


    • #3
      MosaikCoverage, from the Mosaik aligner suite, creates similar types of plots.

      J

      Comment


      • #4
        J,

        Thanks for the recommendation of Mosaik aligner. Looking at the documentation for it, it looks promising.

        This will be a tomorrow project for me. I'll post up any results I'm able to come up with.


        Thanks again,
        Brandon

        Comment


        • #5
          Originally posted by natstreet:
          I have also faced this problem and never found any software that (I would say) produces publication-quality figures. SeqMonk can produce plots that are useful for visual inspection but I didn't find them to be what I needed for publication.
          Not wishing to divert the thread - but could you be a bit more specific about why you don't like the SeqMonk figures for publication? It might be something we can fix :-)

          Comment


          • #6
            How to generate coverage plots 101

            Well I discovered how to generate these graphs. After learning their proper name "coverage plots" I was able to find a wealth of information. Yes, I'm pretty new to bioinformatics. haha

            At any rate, this is how I was able to generate them. You will need Bowtie, SAMtools, and BEDTools (which provides genomeCoverageBed).


            Procedure-

            Aligned my clipped and collapsed reads with Bowtie and generated a SAM output:
            Code:
            bowtie -f -p 4 -k 15 -v 0 -S -t <index> <input.fa> <output.sam>
            Then I used SAMtools to convert from SAM to BAM:
            Code:
            samtools view -S -b -o <output.bam> <input.sam>
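            Then sorted the BAM file (samtools 1.x syntax; the older 0.1.x releases took an output prefix as the second argument instead of -o):
            Code:
            samtools sort -o <output.sorted.bam> <input.bam>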
            Finished up by generating a bedGraph coverage file with genomeCoverageBed:
            Code:
            genomeCoverageBed -bg -ibam <input.sorted.bam>  -g <chromosomesize.file> > <output.bedgraph>
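The chromosome size file passed to -g is a tab-delimited text file with one line per chromosome: the name, then the length in bases. The names have to match the ones in the BAM header. If you don't have such a file, here is a minimal sketch for building one from the reference FASTA used for the Bowtie index. The function names and the file names in the usage comment are hypothetical.

```python
def chrom_sizes_from_fasta(lines):
    """Return [(name, length), ...] in file order from an iterable of FASTA lines.

    The sequence name is taken as the first whitespace-delimited word after '>'.
    """
    sizes = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return list(sizes.items())


def format_genome_file(sizes):
    """Format (name, length) pairs as name<TAB>length lines."""
    return "".join(f"{name}\t{length}\n" for name, length in sizes)


# Usage (hypothetical file names):
#   with open("genome.fa") as fasta, open("chrom.sizes", "w") as out:
#       out.write(format_genome_file(chrom_sizes_from_fasta(fasta)))
```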
            I then loaded the output.bedgraph file into IGV and this was the result:



            Here is a link describing steps other people have taken to generate their coverage plots:
            -coverage plot techniques

            Also, a very informative PDF containing any and all information you could possibly need in order to generate a coverage plot with your data from varying source files: Link

            IGV seems to be the most useful viewer for this type of file that I have used thus far. I also tried Savant, which is supposedly able to display these files. However, it requires converting the BAM file into a BAM coverage file within the viewer, and when I tried, I let the conversion run for about 20 minutes without the progress bar moving at all, so I gave up on it.
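If none of the viewers suits, the bedGraph file can also be summarised into fixed-size bins and plotted with whatever tool you prefer. The following is a rough sketch under a few assumptions: the input has the four standard bedGraph columns (chrom, 0-based start, end, depth), the function name and bin size are illustrative, and regions missing from the file count as zero depth, which matches what the -bg option leaves out.

```python
from collections import defaultdict


def bin_bedgraph(lines, bin_size):
    """Return {chrom: {bin_index: mean depth}} from bedGraph lines.

    Intervals spanning several bins are split at the bin boundaries.
    The total is divided by bin_size, so the last (shorter) bin of a
    chromosome is slightly underestimated.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank lines and optional header lines
        chrom, start, end, value = line.split()[:4]
        pos, end, value = int(start), int(end), float(value)
        while pos < end:
            index = pos // bin_size
            stop = min(end, (index + 1) * bin_size)
            totals[chrom][index] += (stop - pos) * value
            pos = stop
    return {
        chrom: {index: total / bin_size for index, total in sorted(bins.items())}
        for chrom, bins in totals.items()
    }


# Toy example: two intervals on a hypothetical chr1, 100 bp bins.
example = ["chr1\t0\t50\t2\n", "chr1\t90\t110\t10\n"]
print(bin_bedgraph(example, 100))
```

Each bin index multiplied by the bin size gives the x position, and the mean depth gives the y value for a per-chromosome plot.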

            The current version of SeqMonk is also unable to view these files. However, I read in a few posts by the SeqMonk people that their next release is going to be able to display chromosome-wide coverage plots. The beta images I saw looked promising; that was around September 2010, so maybe it will be coming soon?

            I hope this has been helpful to others like myself who have been wondering how to generate these graphs.

            -Brandon
            Last edited by DrD2009; 01-23-2011, 01:37 AM.

            Comment


            • #7
              Originally posted by DrD2009:
              The current version of SeqMonk is also unable to view these files. However, I read in a few posts by the SeqMonk people that their next release is going to be able to display chromosome-wide coverage plots. The beta images I saw looked promising; that was around September 2010, so maybe it will be coming soon?
              The genome wide quantitation plot (which is probably what you want for this) has been available in SeqMonk since September last year, from v0.12.0, so it's possible to make these plots now, however the workflow would be slightly different to how you laid it out above. The reason is that SeqMonk is not a viewer for pre-quantitated data (BED/WIG/BigWIG/BedGraph etc), but instead takes in your raw data (BAM/SAM etc) and does both the quantitation and display in real time.

              In your case the workflow would be:
              1. Align your reads with bowtie to generate a SAM/BAM file
              2. Load the raw data into SeqMonk
              3. Create tiled probes over the whole genome
              4. Quantitate the enrichment of reads in each tile (there are a few options for doing this depending on exactly what you want to see)
              5. Look at and export the genome wide or chromosome wide plots of the quantitated data
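To give a feel for what steps 3 and 4 compute, here is a minimal sketch of one possible quantitation: raw read counts per tile converted to reads per million, optionally on a log2 scale. This is an illustration only and not SeqMonk's implementation; the function name, the pseudocount and the tile keys are assumptions made for the example.

```python
import math


def quantitate_tiles(tile_counts, total_reads, log_transform=True, pseudocount=1.0):
    """Convert raw per-tile read counts to reads per million (RPM).

    tile_counts maps a tile identifier to its raw read count.
    With log_transform=True the value is log2(RPM + pseudocount), so
    empty tiles stay at 0 instead of becoming minus infinity.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    values = {}
    for tile, count in tile_counts.items():
        rpm = count * 1_000_000 / total_reads
        values[tile] = math.log2(rpm + pseudocount) if log_transform else rpm
    return values


# Toy example: two tiles on a hypothetical chr1, one million mapped reads.
counts = {("chr1", 0): 3, ("chr1", 1): 0}
print(quantitate_tiles(counts, 1_000_000))
print(quantitate_tiles(counts, 1_000_000, log_transform=False))
```

Switching between log and linear scales is then just a matter of re-running the quantitation on the same counts.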


              The benefit of working this way would be that if you wanted to change your quantitation (eg exclude duplicates, change between log / linear scales, measure only over genes etc. etc.) then you can easily do this within the same session rather than having to go outside the program to requantitate and reimport BED files. You can also use the quantitations to filter out regions with unusual coverage and report on these.

              In the end this may or may not be an ideal workflow for you, but hopefully it gives you a clearer idea of your options here.

              Comment


              • #8
                Simon,

                Thank you for that rapid reply. After playing with it a little more I was able to generate the coverage plots based on the steps you laid out.

                Can you please explain how you were able to generate this graphic?
                http://seqanswers.com/forums/attachm...5&d=1292664453

                When I export the chromosome view it just shows the chromosomes in solid blue.

                I'll explain in detail what I did once I get that visualization.

                Thank you very much.


                -Brandon

                Comment


                • #9
                  Originally posted by DrD2009:
                  Can you please explain how you were able to generate this graphic?
                  http://seqanswers.com/forums/attachm...5&d=1292664453

                  When I export the chromosome view it just shows the chromosomes in solid blue.
                  Just select the data store you want to see from the data panel (the set of folders on the top left). The quantitation for that store will then show up in the whole genome view.

                  Comment


                  • #10
                    It works perfectly and is much easier to perform in SeqMonk than having to convert files to another format.


                    My lab members were wondering if there is a way to place scales on the X and Y-axis?


                    Thanks,
                    Brandon

                    Comment


                    • #11
                      Originally posted by DrD2009:
                      My lab members were wondering if there is a way to place scales on the X and Y-axis?
                      There's no current option to do that. I think the only way it might be possible to do this would be with some kind of scale bar rather than actual text on the axes. Many organisms have far more chromosomes than Arabidopsis so the amount of space in which to draw particularly the y axis is pretty limited.

                      I'll take a look and see what I can do in the next version.

                      Comment


                      • #12
                        Is there a way for SeqMonk to recognize score values such as those in BED or GFF formats?

                        Comment


                        • #13
                          Originally posted by DrD2009:
                          Is there a way for SeqMonk to recognize score values such as those in BED or GFF formats?
                          GFF or BED files in SeqMonk are used solely as annotation tracks, so there's no option to display or use quantitations which come in from those sorts of formats (although it would be simple enough to add that information to the annotation feature properties so you could at least view it for a single feature if you really wanted to).

                          SeqMonk is not an exactly equivalent program to something like IGV - it's designed to work from raw mapped data and to handle downstream calculations and visualisations internally. You can import regions of interest from outside (eg hits from an external peak caller), but you'd still quantitate these internally in order to display them.
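For anyone who wants to work with the scores outside SeqMonk: in a BED file the score is the fifth tab-delimited column, after chrom, start, end and name. Below is a minimal parsing sketch. The class and function names are illustrative, and the score is read as a float because many tools write non-integer values there.

```python
from typing import NamedTuple, Optional


class BedFeature(NamedTuple):
    chrom: str
    start: int  # 0-based
    end: int    # exclusive
    name: Optional[str]
    score: Optional[float]


def parse_bed_line(line):
    """Parse one BED line; name and score are None when the columns are absent."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    name = fields[3] if len(fields) > 3 else None
    score = None
    if len(fields) > 4 and fields[4] not in ("", "."):
        score = float(fields[4])
    return BedFeature(fields[0], int(fields[1]), int(fields[2]), name, score)


# Toy example with a hypothetical feature.
print(parse_bed_line("chr1\t100\t200\tpeak1\t850\t+\n"))
```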

                          Comment
