Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Pileup from BAM or SAM

    Hi,

    Currently I'm trying to solve a simple problem for bioinformatician, but being a experimental biologist it's proving to be very tricky for me.

    At the moment I've obtained mapped reads in SAM format. I want to determine the sequencing coverage in term of number of times sequenced per base. I know I can use pileup function in samtool however it does not give strand specific information.

    Are there any way of counting the bases per strand?

    Or perhaps handling the pileup dataset in Galaxy to obtain strand specific count?

    Many help would be great thank you.

  • #2
    Originally posted by nilmot13 View Post
    Hi,

    Currently I'm trying to solve a simple problem for bioinformatician, but being a experimental biologist it's proving to be very tricky for me.

    At the moment I've obtained mapped reads in SAM format. I want to determine the sequencing coverage in term of number of times sequenced per base. I know I can use pileup function in samtool however it does not give strand specific information.

    Are there any way of counting the bases per strand?

    Or perhaps handling the pileup dataset in Galaxy to obtain strand specific count?

    Many help would be great thank you.
    The better tool for this job is the BEDTools Suite, a collection of programs for manipulating genomic range/alignment data. For your particular problem the tool to use is genomeCoverageBed. You provide as input your sorted BAM file and a genome file. The genome file is simply a text file listing the name of each chromosome and its size in bp (see the documentation). The program has options to calculate either global coverage parameters or per base depth. It also has an option to calculate strand specific coverage.

    Comment


    • #3
      Dear kmcarr,

      That's sound exactly like what I need. Thank you every so much, I'll try it out!!

      Comment


      • #4
        Originally posted by kmcarr View Post
        The better tool for this job is the BEDTools Suite, a collection of programs for manipulating genomic range/alignment data. For your particular problem the tool to use is genomeCoverageBed. You provide as input your sorted BAM file and a genome file. The genome file is simply a text file listing the name of each chromosome and its size in bp (see the documentation). The program has options to calculate either global coverage parameters or per base depth. It also has an option to calculate strand specific coverage.
        Hi kmcarr,

        I've managed to get it to work.

        If you have the time do you mind helping me with a couple of more question with regards to BEDTools?

        I've setup test file to run the genomeCoverageBed script. Is it possible to output the report as a file? At the moment I'm using this command.

        $ genomeCoverageBed -ibam <test file.bam> -bga -strand -g <genome file.genome>

        Many thanks!

        Comment


        • #5
          Other confusions

          I think i've worked it out:

          by doing "command as before" > filename.file

          However, the coverage information is slighly confusing?
          I was expecting count per base with strand specificity, but I think it's giving me coverage percentage with respect to entire genome?

          Comment


          • #6
            When using the -strand option you must follow that with either a '+' or '-' to indicate which strand to report. The output data will be limited to coverage on the strand you indicated so you will need to run the command twice, once for each strand.

            Using the -bga option would not report fractions. It outputs a standard bedGraph format file, reporting intervals of coverage at a particular depth with a new interval started when the depth changes. For example:

            Code:
            The columns are:
            Chrom  startPos endPos depth
            
            Chr1	3658	3684	1
            Chr1	3684	3693	2
            Chr1	3693	3733	3
            Chr1	3733	3759	2
            Chr1	3759	3763	1
            Chr1	3763	3768	3
            Chr1	3768	3773	2
            Chr1	3773	3838	3
            Chr1	3838	3848	1
            Chr1	4008	4055	1
            Chr1	4055	4077	2
            Chr1	4077	4083	3
            Chr1	4083	4119	2
            Chr1	4119	4125	3
            Chr1	4125	4130	4
            Chr1	4130	4139	3
            Chr1	4139	4144	4
            Chr1	4144	4152	5
            Chr1	4152	4154	4
            If you want to report the depth at each and every position of the genome you should use the -d option instead of -bg(a).

            Comment


            • #7
              Thank, I've just setup a test file. It works fine now with the +/- sign after the -strand command.

              Thanks every so much.

              Comment


              • #8
                Hi,

                A very quick question I don't know if you have encountered this before

                during the run of the genomeCoverageBed scipt the output resulted in this error message

                BGZF ERROR: read block failed - could not read data from block

                I have sorted my bam file beforehand and have check the total chromosome length is correct.
                Can anyone suggest potential cause of error?

                Thank you very much

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Current Approaches to Protein Sequencing
                  by seqadmin


                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                  04-04-2024, 04:25 PM
                • seqadmin
                  Strategies for Sequencing Challenging Samples
                  by seqadmin


                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                  03-22-2024, 06:39 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, 04-11-2024, 12:08 PM
                0 responses
                23 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-10-2024, 10:19 PM
                0 responses
                24 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-10-2024, 09:21 AM
                0 responses
                21 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-04-2024, 09:00 AM
                0 responses
                52 views
                0 likes
                Last Post seqadmin  
                Working...
                X