Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Plotting per chromosome coverage for BAM files

    Hi,

    I wish to plot coverage profiles for each of the chromosomes from my BAM files which I obtained after aligning exome sequencing data (hg19).

    I had initially thought of using samtools to generate pileup files, and then plot them however, that would take a long time and I guess that would be an inefficient way to go about it.

    Are there any R/other packages that can quickly help me get these plots ?

    Any suggestions would be much appreciated.

  • #2
    Bedtools can do it straight from the bam (genomeCoveragebed). Use the -d option for per-base coverage

    Comment


    • #3
      samtools depth can also generate per-base coverage directly. the result simply consist of chromose coordinates and its depth, it is not too hard to draw a coverage plot with R in my opioin

      Comment


      • #4
        SeqMonk can make these kinds of plots very quickly if you're happy with a more manual solution (see attached).
        Attached Files

        Comment


        • #5
          I'd recommend bedtools genomeCoverage over samtools depth for this purpose. See http://www.biostars.org/p/67579/ for discussion. Samtools skips secondary alignments and aberrant read pairs. Plots can, of course, be generated in R.

          Comment


          • #6
            Simon, could you please explain little further how we can generate such a plot in SeqMonk? I tried to find in documentation but couldnt find anything... Thanks!

            Comment


            • #7
              Originally posted by kadircaner View Post
              Simon, could you please explain little further how we can generate such a plot in SeqMonk? I tried to find in documentation but couldnt find anything... Thanks!
              The quickest way would be to simply load your data and then run Data > Quantitation Pipelines > Wiggle Plot

              In the options you'd set the region to cover to be the whole genome, and if you want real coverage values then turn off the "total count correction" option. Depending on the scale of your data you could try with or without the log transformation.

              Once you've done the quantitation you can select any dataset in the data view (top left) and it will be shown in the genome view, as in the plot I attached before. You can make the colours fixed by pressing the 4th button from the left in the toolbar, and you can save the image by going to File > Export Current View > Genome View.

              Hope this helps

              Comment


              • #8
                Thanks Simon!!!

                Comment


                • #9
                  Originally posted by simonandrews View Post
                  The quickest way would be to simply load your data and then run Data > Quantitation Pipelines > Wiggle Plot

                  In the options you'd set the region to cover to be the whole genome, and if you want real coverage values then turn off the "total count correction" option. Depending on the scale of your data you could try with or without the log transformation.

                  Once you've done the quantitation you can select any dataset in the data view (top left) and it will be shown in the genome view, as in the plot I attached before. You can make the colours fixed by pressing the 4th button from the left in the toolbar, and you can save the image by going to File > Export Current View > Genome View.

                  Hope this helps
                  Hi, Simon. I am new to SeqMonk and now I am using your method to get the genome coverage of my alignment file. The problem is for SeqMonk, I can only download genomes as GRCH37/NCBI34/35/36. However, the reference I used to do the alignment is HG19. Is it possible to load the genomes from my reference HG19 fa file? Or it doesn't matter, I can just choose any genome files, SeqMonk will give me the correct coverage?

                  Comment


                  • #10
                    Originally posted by zhoujiayi View Post
                    Hi, Simon. I am new to SeqMonk and now I am using your method to get the genome coverage of my alignment file. The problem is for SeqMonk, I can only download genomes as GRCH37/NCBI34/35/36. However, the reference I used to do the alignment is HG19. Is it possible to load the genomes from my reference HG19 fa file? Or it doesn't matter, I can just choose any genome files, SeqMonk will give me the correct coverage?
                    The NCBIXX/GRChXX vs hgXX all refer to exactly the same assemblies, it's just a nomenclature difference between different centres. As long as you pick the correct equivalent name they are completely interchangeable. NCBI36 is hg18 and GRCh37 is hg19.

                    Comment


                    • #11
                      Originally posted by medalofhonour View Post
                      Hi,

                      I wish to plot coverage profiles for each of the chromosomes from my BAM files which I obtained after aligning exome sequencing data (hg19).

                      I had initially thought of using samtools to generate pileup files, and then plot them however, that would take a long time and I guess that would be an inefficient way to go about it.

                      Are there any R/other packages that can quickly help me get these plots ?

                      Any suggestions would be much appreciated.
                      Hello- As an alternative solution... This one is command-line based and doesn't require to plot millions of datapoints. First, divide each chromosome in the human (or whatever) genome in windows of n bases, then count reads in each window. The output file is pretty small and easy to plot with R or anything else:

                      Code:
                      ## Get human chromosomes, if you don't have them already
                      mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
                      	"select chrom, size from hg19.chromInfo" > hg19.genome
                      
                      ## Make windows and count reads
                      bedtools makewindows -g hg19.genome -w 100000 \
                          | coverageBed -abam myaln.bam -b - -counts \
                          | sort -k 1,1 -k2,2n -k3,3n > myaln.counts.txt
                      
                      
                      ## Output example:
                      cat myaln.counts.txt
                      chr1    0       100000  17
                      chr1    100000  200000  10
                      chr1    200000  300000  16
                      chr1    300000  400000  0
                      chr1    400000  500000  0
                      chr1    500000  600000  33
                      chr1    600000  700000  32
                      ...
                      Dario

                      Comment


                      • #12
                        When loading BAM into seqmonk I get 401980 warnings like this:

                        Reading position 195567590 was 95619bp beyond the end of chr1 (195471971)


                        What does that mean?


                        I trying to re-build my own genome GRCm38, just to check, it was the one I used for alignment.
                        Last edited by sindrle; 03-05-2014, 08:02 PM.

                        Comment


                        • #13
                          Originally posted by medalofhonour View Post
                          Hi,

                          I wish to plot coverage profiles for each of the chromosomes from my BAM files which I obtained after aligning exome sequencing data (hg19).

                          I had initially thought of using samtools to generate pileup files, and then plot them however, that would take a long time and I guess that would be an inefficient way to go about it.

                          Are there any R/other packages that can quickly help me get these plots ?

                          Any suggestions would be much appreciated.
                          Creating a bam, then sorting, then indexing it, then doing what you want is inefficient. If you have enough memory to store the complete genome (around 3 bytes per reference base), I recommend using pileup.sh (from the BBMap package). You can use bbmap.sh and pipe output to pileup.sh, which will generate a coverage file as the only output, skipping sam/bam files, and requiring no sorting (at the expense of using more memory - around 4 bytes per reference base). You can also use already generated sam files (and bam files if you have samtools installed).

                          For example:
                          pileup.sh in=samfile.sam out=coverage.txt
                          Please run pileup.sh with no arguments to display more options. The default gives the output that people at JGI usually want.

                          Comment


                          • #14
                            Originally posted by sindrle View Post
                            When loading BAM into seqmonk I get 401980 warnings like this:

                            Reading position 195567590 was 95619bp beyond the end of chr1 (195471971)


                            What does that mean?


                            I trying to re-build my own genome GRCm38, just to check, it was the one I used for alignment.
                            Those errors almost certainly mean that you've used a different version of the genome assembly to do your mapping to the one you've used to create your project. You should go back and check. If you have samtools installed on your machine then you can run samtools view -H some.bam to look at the headers, which will include the command used to generate the file, which should in turn contain the name of the assembly.

                            Comment

                            Latest Articles

                            Collapse

                            • seqadmin
                              Essential Discoveries and Tools in Epitranscriptomics
                              by seqadmin




                              The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                              04-22-2024, 07:01 AM
                            • seqadmin
                              Current Approaches to Protein Sequencing
                              by seqadmin


                              Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                              04-04-2024, 04:25 PM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by seqadmin, Yesterday, 08:47 AM
                            0 responses
                            16 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 04-11-2024, 12:08 PM
                            0 responses
                            60 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 04-10-2024, 10:19 PM
                            0 responses
                            60 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 04-10-2024, 09:21 AM
                            0 responses
                            54 views
                            0 likes
                            Last Post seqadmin  
                            Working...
                            X