SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   bedtools coverage is inaccurate (http://seqanswers.com/forums/showthread.php?t=33876)

efoss 09-19-2013 11:25 AM

bedtools coverage is inaccurate
 
I am using bedtools to calculate coverage in an RNA-seq bam file. I get screwy readings when it encounters a gap where there are no reads. Rather than filling it in with zeros, it takes a value that is somewhere near the last read and uses that value to fill across the gap. For example, I just looked in IGV at a suspicious area in a bam file. IGV shows readcounts of 11, 11, 10, 10, 10, 9, 9, 9, 9, 1 and then a bunch of zeros. bedtools, on the other hand, lists the same numbers except that when it comes to the final 1, it says 7, and then the very long list of zeros is instead a very long list of 7s. Can anyone suggest a solution? The code I used is pasted below. (The last little thing where I pipe results to grep is to output only lines with non-zero read counts. Perhaps that's where I'm screwing up, but I don't think so.)

Thanks.

Eric



genomeCoverageBed -d -split -ibam input.bam -g /home/efoss/gene_databases/dot_genome_files/g1k_v37.genome | grep -v -P "\S+\t\S+\t0" > output.txt

BAMseek 09-19-2013 10:13 PM

Hey Eric,

My guess would be that if a read is split (such as spanning a splice junction), then that read may count towards the coverage for the entire interval, even the skipped bases. This link seems to talk about a similar issue. Looks like the solution in the post is to split those reads and try coverage again.

(Edit: although looks like you did use the -split argument so maybe it is a different issue).

Justin

kokonech 09-19-2013 11:30 PM

Hi!

Make sure the BAM file is sorted by coordinate and the splitted alignment is defined by N cigar code (true for tophat by default).

Also you might check some alternative ways to calculate coverage, for example using coverageBed and intersection with annotation file:
http://okko73313.blogspot.de/2013/04...hromosome.html

efoss 09-20-2013 12:44 PM

Hi kokonech,

Thanks for the suggestion. I finally figured it out - there was a version of bedtools that had this as a known bug. I updated to the latest version and the problem went away.

I bookmarked your site. It looks very helpful.

Best wishes,

Eric

SylvainL 09-23-2013 01:02 AM

Hi efoss,

can you tell us which version of bedtools you were using before?

thanks

efoss 09-23-2013 11:03 AM

Hi SylvainL,

I'm afraid I don't know which version it was. I think it was installed on our system in 2011. The one I'm using now is 2.17.0.

Eric

Nino 05-20-2014 09:44 AM

Hey Guys,

I am getting this weird error for my hg19.fa genome file.

this is my command line argument:
genomeCoverageBed -d -i Sample_23430_B14_T.sorted.dedup.rg.resorted.realigned.bed -g /pbtech_mounts/fdlab_store003/fdlab/genomes/human/hg19/indexes/bwa/genome/hg19.fa 1> Sample_23430_B14_T.sorted.dedup.rg.resorted.realigned.coverage

and I get this error:
Less than the req'd two fields were encountered in the genome file (/pbtech_mounts/fdlab_store003/fdlab/genomes/human/hg19/indexes/bwa/genome/hg19.fa) at line 1. Exiting.

anyone know why, I can't seem to figure it out.

Thanks!,
Nino


All times are GMT -8. The time now is 07:52 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.