
12-20-2013, 01:22 PM   #1
Location: Santa Fe, NM

Join Date: Nov 2013
Posts: 20
Getting mapped read count from a BED file - bedtools coverage?

Hi, all. I've been trying to analyze an experiment that I downloaded from GEO, GSE34241, which has four samples assayed with RNA-Seq (AB SOLiD System). Apart from being interested in some of the gene expression in this experiment, I'm using it as a tutorial for dealing with new file formats (a task that never ends).

The authors did not upload any data in the Series Matrix or SOFT files. Instead, they uploaded four BED files. After spending probably way too much time trying to figure out how to extract matches against the TAIR10 genome, I finally downloaded the latest bedtools2 from GitHub, and lo and behold, it has a nice coverage sub-command that works with these files. I've checked the first output number, the number of features in the sample file (-a) that overlap each interval in the genome file (-b), and it pans out for some genes I know. The other three outputs are: the number of bases in the genome-file interval that had non-zero coverage; the length of that interval; and the fraction of bases in that interval that had non-zero coverage.
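To make the four trailing output columns concrete, here is a minimal Python sketch that reads a coverage output file and keeps the first number (the overlap count) per gene. It assumes the output has the nine GFF columns followed by the four coverage columns described above, which is how the bedtools version used here behaves (newer releases, from 2.24 on, swap the roles of -a and -b). It also assumes the GFF contains several feature types (gene, mRNA, exon, ...) and that gene IDs sit in an `ID=` attribute; the function name and the sample rows are made up for illustration.

```python
import io

def gene_counts(handle):
    """Collect the overlap count for each 'gene' feature in bedtools coverage output.

    Assumes 9 GFF columns followed by 4 coverage columns:
    overlap count, bases covered, feature length, fraction covered.
    """
    counts = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 13:
            continue  # skip comments and rows that do not match the assumed layout
        feature_type, attributes = fields[2], fields[8]
        if feature_type != "gene":
            continue  # mRNA/exon/CDS rows would otherwise count the same reads again
        # GFF3 attributes look like "ID=AT1G01010;Note=...;Name=..."
        attrs = dict(kv.split("=", 1) for kv in attributes.split(";") if "=" in kv)
        if "ID" in attrs:
            counts[attrs["ID"]] = int(fields[-4])  # first of the four appended columns
    return counts

# Two invented rows in the assumed layout (not real data from GSE34241).
sample = (
    "Chr1\tTAIR10\tgene\t3631\t5899\t.\t+\t.\tID=AT1G01010;Name=AT1G01010\t12\t600\t2269\t0.2644337\n"
    "Chr1\tTAIR10\tmRNA\t3631\t5899\t.\t+\t.\tID=AT1G01010.1;Parent=AT1G01010\t12\t600\t2269\t0.2644337\n"
)
print(gene_counts(io.StringIO(sample)))
```

Restricting to `gene` rows is a design choice for this sketch: without it, the same reads are reported once per overlapping feature type in the GFF.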

So, I'm tempted to use the first number, # features that overlap, as my read counts for the usual downstream analysis with DESeq2 (normalization, DE analysis). But are there things I should look out for in the bedtools coverage output, for example a large fraction of bases in the genome file that were not covered?
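One rough way to look at that question is to list the genes that have reads but only a small covered fraction, and then inspect them by hand. The sketch below is only an illustration: the function name, the 0.1 threshold, and the example rows are all hypothetical, and a low fraction on its own may simply reflect a few short reads on a long gene rather than a problem.

```python
def flag_sparse_genes(rows, min_fraction=0.1):
    """Return IDs of genes with at least one read but a covered fraction below min_fraction.

    rows: iterable of (gene_id, overlap_count, fraction_covered) tuples.
    min_fraction: arbitrary cutoff chosen for illustration, not a recommended value.
    """
    return [
        gene_id
        for gene_id, count, fraction in rows
        if count > 0 and fraction < min_fraction
    ]

# Invented example rows: (gene ID, overlap count, fraction of bases covered).
rows = [("geneA", 40, 0.85), ("geneB", 3, 0.02), ("geneC", 0, 0.0)]
print(flag_sparse_genes(rows))
```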

The command I used is, for example:

bedtools coverage -a GSM845432_F1DPI_TAIR10.bed -b TAIR10_GFF3_genes.gff > GSM845432_F1DPI_TAIR10.txt

Thanks for any tips! This is a learning exercise; the RNA-Seq data that my lab generated was read-mapped by DNAnexus, and I'm hoping that they know what they're doing. At least it's a standardized workflow.
Sam Hokin
Computational Scientist, Carnegie and NCGR