SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
How to measure percent of reference covered ashutosh Bioinformatics 0 05-21-2013 09:16 AM
Not covered regions between bam and bed. gmarco Bioinformatics 8 05-07-2013 02:07 PM
Exome not fully covered in WGS tahamasoodi Bioinformatics 5 12-07-2012 10:52 AM
Sequencing all for the diagnosis--now it's even covered. Joann Literature Watch 0 03-03-2011 09:03 AM
Percentage of mapped reads ? zack80.liu Bioinformatics 6 03-01-2011 09:08 AM

Reply
 
Thread Tools
Old 06-25-2013, 07:28 AM   #1
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54
Default Percentage of exons covered

Hi all,

I have a bed file with the start and end positions of all the exons. How can we find out, what percentage of these exons are covered in the sequenced sample(bam file)?

I have tried using,

samtools view -b <BAM> | coverageBed -abam stdin -b exons.bed

and a few lines from the output looks like,

chr1 60817388 60817507 0 0 119 0.0000000
chr1 91226064 91226183 0 0 119 0.0000000
chr1 101711776 101711895 0 0 119 0.0000000
chr1 108003221 108003340 0 0 119 0.0000000
....

Also tried,
samtools view -b | coverageBed -abam stdin -b exons.bed -hist

Few lines of output:

chr1 60817388 60817507 32 1 119 0.0084034

chr1 60817388 60817507 33 2 119 0.0168067

chr1 60817388 60817507 34 3 119 0.0252101

Which output should be used to get the percentage of exons covered?

Or Any other method to do so, would be highly appreciated.

Thanks

Last edited by meher; 06-25-2013 at 08:27 AM.
meher is offline   Reply With Quote
Old 06-25-2013, 09:49 AM   #2
fengqi
Member
 
Location: US

Join Date: Aug 2012
Posts: 10
Default

The output one (w/o -hist) only gives you the fraction of exons bases has non-zero coverage by your reads. There is no information of the depth of the coverage.

The output two gives you the fraction covered of each depth of ceverage.

At the end of the output two, you can see a histogram summarizing the coverage among all exons.
fengqi is offline   Reply With Quote
Old 06-25-2013, 09:52 AM   #3
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54
Default

Quote:
Originally Posted by fengqi View Post
The output one (w/o -hist) only gives you the fraction of exons bases has non-zero coverage by your reads. There is no information of the depth of the coverage.

The output two gives you the fraction covered of each depth of ceverage.

At the end of the output two, you can see a histogram summarizing the coverage among all exons.
Can we get a single value like x% of exons are covered from the output?
meher is offline   Reply With Quote
Old 06-25-2013, 09:58 AM   #4
fengqi
Member
 
Location: US

Join Date: Aug 2012
Posts: 10
Default

Quote:
Originally Posted by meher View Post
Can we get a single value like x% of exons are covered from the output?
default output will report 4 columns for each interaval in B (that is your exon info file)

The first column is the nubmer of reads in your bam file that overlappled the exon interval.

You can easily pick up those exons with zero value. Then you will know how many exons covered, then you can get x%.

Hope I understood correctly what you want.
fengqi is offline   Reply With Quote
Old 06-25-2013, 10:01 AM   #5
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54
Default

Quote:
Originally Posted by fengqi View Post
default output will report 4 columns for each interaval in B (that is your exon info file)

The first column is the nubmer of reads in your bam file that overlappled the exon interval.

You can easily pick up those exons with zero value. Then you will know how many exons covered, then you can get x%.

Hope I understood correctly what you want.
Sorry that i couldn't understand, what do you mean by default output? Is it from

samtools view -b <BAM> | coverageBed -abam stdin -b exons.bed

or

samtools view -b | coverageBed -abam stdin -b exons.bed -hist

Either of these outputs have more than 4 columns. So, which one are you referring?
meher is offline   Reply With Quote
Old 06-25-2013, 10:44 AM   #6
fengqi
Member
 
Location: US

Join Date: Aug 2012
Posts: 10
Default

Quote:
Originally Posted by meher View Post
Sorry that i couldn't understand, what do you mean by default output? Is it from

samtools view -b <BAM> | coverageBed -abam stdin -b exons.bed

or

samtools view -b | coverageBed -abam stdin -b exons.bed -hist

Either of these outputs have more than 4 columns. So, which one are you referring?
The first one, w/o -hist
fengqi is offline   Reply With Quote
Old 06-25-2013, 10:48 AM   #7
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54
Default

Quote:
Originally Posted by fengqi View Post
The first one, w/o -hist
Okay. But there are seven columns in the output. First three gives the chromosome number, start and end position. The next two columns are zero's and sixth is the length of the exon and the last is again zero.

So, how can we get the x% out of this?
meher is offline   Reply With Quote
Old 06-25-2013, 02:55 PM   #8
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438
Default

it sounds like you first want to make the coverage of each exon into a binary call (ie either covered or not covered). Then you can just count the number that are covered verses the total.

The last column of the coverageBed output is the percent coverage of the exon feature in that row. Let's say you want to call 95% coverage "covered" and anything below that as "not covered". Also consider that if you're working with a typical alternatively spliced species your list of exons may have redundant entries for exons that are shared between isoforms. So to put it together this might work:

first count all of the unique lines in the exons file:
Code:
sort -k1,1 -k2,2m -k3,3n exons.bed | uniq > exons_unique.bed
wc -l exons_unique.bed
Now you have a total count of exons. Now run the coverageBed with the unique list and then count the number of rows that have a value in the last column greater than your "covered" threshold:

Code:
coverageBed -abam <bam_file> -b exons_unique.bed > exons_coverage.bed
awk '{if($7 > 0.95) print $0}' exons_coverage.bed | wc -l
that should do it. realize, of course, that if you are working with an alternatively spliced transcriptome this result is a little weird because instead of knowing which isoforms the coverage actually belongs to we're assigning to any that the coverage intersects with.
__________________
/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
sdriscoll is offline   Reply With Quote
Old 06-26-2013, 03:31 AM   #9
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54
Default

Quote:
Originally Posted by sdriscoll View Post
it sounds like you first want to make the coverage of each exon into a binary call (ie either covered or not covered). Then you can just count the number that are covered verses the total.

The last column of the coverageBed output is the percent coverage of the exon feature in that row. Let's say you want to call 95% coverage "covered" and anything below that as "not covered". Also consider that if you're working with a typical alternatively spliced species your list of exons may have redundant entries for exons that are shared between isoforms. So to put it together this might work:

first count all of the unique lines in the exons file:
Code:
sort -k1,1 -k2,2m -k3,3n exons.bed | uniq > exons_unique.bed
wc -l exons_unique.bed
Now you have a total count of exons. Now run the coverageBed with the unique list and then count the number of rows that have a value in the last column greater than your "covered" threshold:

Code:
coverageBed -abam <bam_file> -b exons_unique.bed > exons_coverage.bed
awk '{if($7 > 0.95) print $0}' exons_coverage.bed | wc -l
that should do it. realize, of course, that if you are working with an alternatively spliced transcriptome this result is a little weird because instead of knowing which isoforms the coverage actually belongs to we're assigning to any that the coverage intersects with.
Thanks it worked. But could it be possible to find, how much percentage of the exons are covered by atleast 1X/5X/10X reads out of the coverageBed output?
meher is offline   Reply With Quote
Reply

Tags
coverage, exon capture

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 05:39 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO