The authors did not upload any data in the Series Matrix or SOFT files. Instead, they uploaded four BED files. After spending probably way too much time trying to figure out how to extract matches against the TAIR10 genome, I finally downloaded the latest bedtools2 from GitHub, and lo and behold, it has a nice coverage sub-command that works with these files. I've checked the first output number, the number of features in the sample file that overlap the interval in the genome file, and it pans out for some genes I know. The other three outputs are the number of bases in the genome file that had non-zero coverage, the length of the entry in the genome file, and the fraction of bases in the genome file that had non-zero coverage.
So, I'm tempted to use the first number, the number of overlapping features, as my read counts for the usual downstream analysis with DESeq2 (normalization, DE analysis). But are there things I should look out for in the bedtools coverage output, for example if the fraction of bases in the genome file that were not covered is large?
The command I used is, for example:
bedtools coverage -a GSM845432_F1DPI_TAIR10.bed -b TAIR10_GFF3_genes.gff > GSM845432_F1DPI_TAIR10.txt
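To make the plan concrete, here is a minimal sketch of how the per-sample output files could be merged into one raw count table for DESeq2, which expects unnormalized integer counts. It assumes each output row is the nine GFF columns followed by the four coverage columns described above, that the gene ID sits in an `ID=` attribute, and that only rows of type `gene` should be kept. The function names `parse_coverage` and `count_matrix` are made up for illustration. Note that bedtools 2.24.0 and later swapped the roles of `-a` and `-b` for `coverage`, so it is worth checking that the rows really correspond to the GFF entries before trusting this layout.

```python
import csv
import io
import re
import sys


def parse_coverage(handle, feature_type="gene"):
    """Map gene ID -> (overlap count, covered fraction) for one sample.

    Assumed row layout: 9 GFF columns, then the 4 coverage columns.
    """
    result = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank, truncated, or comment lines.
        if len(row) < 13 or row[0].startswith("#"):
            continue
        # Keep one row per gene; mRNA/exon rows would double count reads.
        if row[2] != feature_type:
            continue
        match = re.search(r"ID=([^;]+)", row[8])
        if match is None:
            continue
        n_overlaps = int(row[-4])   # first coverage column
        fraction = float(row[-1])   # last coverage column
        result[match.group(1)] = (n_overlaps, fraction)
    return result


def count_matrix(samples):
    """Build (header, rows) from {sample name: parse_coverage() result}."""
    names = sorted(samples)
    genes = sorted(set().union(*(samples[name] for name in names)))
    header = ["gene_id"] + names
    rows = [
        [gene] + [samples[name].get(gene, (0, 0.0))[0] for name in names]
        for gene in genes
    ]
    return header, rows


if __name__ == "__main__":
    # Usage: python merge_counts.py sample1.txt sample2.txt ... > counts.tsv
    parsed = {}
    for path in sys.argv[1:]:
        with open(path, newline="") as handle:
            parsed[path] = parse_coverage(handle)
    header, rows = count_matrix(parsed)
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
```

The covered fraction is kept alongside each count so that genes with many overlapping features but a small covered fraction can be inspected separately before the table goes into DESeq2.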
Thanks for any tips! This is a learning exercise; the RNA-Seq data that my lab generated was read-mapped by DNAnexus, and I'm hoping that they know what they're doing. At least it's a standardized workflow.