Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bedtools coverage is inaccurate

    I am using bedtools to calculate coverage in an RNA-seq bam file. I get screwy readings when it encounters a gap where there are no reads. Rather than filling it in with zeros, it takes a value that is somewhere near the last read and uses that value to fill across the gap. For example, I just looked in IGV at a suspicious area in a bam file. IGV shows readcounts of 11, 11, 10, 10, 10, 9, 9, 9, 9, 1 and then a bunch of zeros. bedtools, on the other hand, lists the same numbers except that when it comes to the final 1, it says 7, and then the very long list of zeros is instead a very long list of 7s. Can anyone suggest a solution? The code I used is pasted below. (The last little thing where I pipe results to grep is to output only lines with non-zero read counts. Perhaps that's where I'm screwing up, but I don't think so.)

    Thanks.

    Eric



    genomeCoverageBed -d -split -ibam input.bam -g /home/efoss/gene_databases/dot_genome_files/g1k_v37.genome | grep -v -P "\S+\t\S+\t0" > output.txt
    Last edited by efoss; 09-19-2013, 11:48 AM.

  • #2
    Hey Eric,

    My guess would be that if a read is split (such as spanning a splice junction), then that read may count towards the coverage for the entire interval, even the skipped bases. This link seems to talk about a similar issue. Looks like the solution in the post is to split those reads and try coverage again.

    (Edit: although looks like you did use the -split argument so maybe it is a different issue).

    Justin
    Last edited by BAMseek; 09-19-2013, 10:16 PM.

    Comment


    • #3
      Hi!

      Make sure the BAM file is sorted by coordinate and the splitted alignment is defined by N cigar code (true for tophat by default).

      Also you might check some alternative ways to calculate coverage, for example using coverageBed and intersection with annotation file:
      This post is actually inspired by the discussion on the Qualimap google groups . Assume we need to calculate mean coverage inside of the ...

      Comment


      • #4
        Hi kokonech,

        Thanks for the suggestion. I finally figured it out - there was a version of bedtools that had this as a known bug. I updated to the latest version and the problem went away.

        I bookmarked your site. It looks very helpful.

        Best wishes,

        Eric

        Comment


        • #5
          Hi efoss,

          can you tell us which version of bedtools you were using before?

          thanks

          Comment


          • #6
            Hi SylvainL,

            I'm afraid I don't know which version it was. I think it was installed on our system in 2011. The one I'm using now is 2.17.0.

            Eric

            Comment


            • #7
              Hey Guys,

              I am getting this weird error for my hg19.fa genome file.

              this is my command line argument:
              genomeCoverageBed -d -i Sample_23430_B14_T.sorted.dedup.rg.resorted.realigned.bed -g /pbtech_mounts/fdlab_store003/fdlab/genomes/human/hg19/indexes/bwa/genome/hg19.fa 1> Sample_23430_B14_T.sorted.dedup.rg.resorted.realigned.coverage

              and I get this error:
              Less than the req'd two fields were encountered in the genome file (/pbtech_mounts/fdlab_store003/fdlab/genomes/human/hg19/indexes/bwa/genome/hg19.fa) at line 1. Exiting.

              anyone know why, I can't seem to figure it out.

              Thanks!,
              Nino

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Strategies for Sequencing Challenging Samples
                by seqadmin


                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                03-22-2024, 06:39 AM
              • seqadmin
                Techniques and Challenges in Conservation Genomics
                by seqadmin



                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                Avian Conservation
                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                03-08-2024, 10:41 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, Yesterday, 06:37 PM
              0 responses
              8 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, Yesterday, 06:07 PM
              0 responses
              8 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 03-22-2024, 10:03 AM
              0 responses
              49 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 03-21-2024, 07:32 AM
              0 responses
              66 views
              0 likes
              Last Post seqadmin  
              Working...
              X