  • Getting mapped read count from a BED file - bedtools coverage?

    Hi, all. I've been trying to analyze an experiment that I downloaded from GEO, GSE34241, which has four samples assayed with RNA-Seq (AB SOLiD System). Apart from being interested in some gene expression in this experiment, I'm using it as a tutorial for dealing with new file formats (which never ends).

    The authors did not upload any data in the Series Matrix or SOFT files. Instead, they uploaded four BED files. After spending probably way too much time trying to figure out how to extract matches against the TAIR10 genome, I finally downloaded the latest bedtools2 from github, and lo and behold it has a nice coverage sub-command that works with these files. I've checked the first output number, # features in sample file that overlap the interval in the genome file, and it pans out for some genes I know. The other three outputs are: # bases in genome file that had non-zero coverage; length of entry in genome file; fraction of bases in genome file that had non-zero coverage.
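For anyone reading the output by script: bedtools coverage appends its four numbers to the end of each original feature line, so they are easiest to pick off from the right. Below is a minimal Python sketch under that assumption. The name `parse_coverage_line` and the example row are hypothetical; the numeric values in the example are invented for illustration, not taken from GSE34241.

```python
from typing import List, NamedTuple


class CoverageRecord(NamedTuple):
    feature_fields: List[str]  # the untouched columns of the original feature line
    n_overlaps: int            # number of features from the other file overlapping this one
    bases_covered: int         # bases in this feature with non-zero coverage
    feature_length: int        # length of this feature
    fraction_covered: float    # bases_covered / feature_length


def parse_coverage_line(line: str) -> CoverageRecord:
    """Split one tab-delimited coverage line; the last four columns are the coverage stats."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected at least one feature column plus four coverage columns")
    count, covered, length, fraction = fields[-4:]
    return CoverageRecord(fields[:-4], int(count), int(covered), int(length), float(fraction))


# Made-up example row: a 9-column GFF3 gene line followed by the four coverage columns.
example = (
    "Chr1\tTAIR10\tgene\t3631\t5899\t.\t+\t.\tID=AT1G01010;Name=AT1G01010"
    "\t12\t850\t2269\t0.3746144"
)
rec = parse_coverage_line(example)
print(rec.n_overlaps, rec.bases_covered, rec.feature_length, rec.fraction_covered)
```

Reading from the right means the same function works whether the feature lines have 9 columns (GFF) or fewer (BED).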

    So, I'm tempted to use the first number, # features that overlap, as my read counts for the usual further analysis with DESeq2 (normalization, DE analysis). But are there things I should look out for in the bedtools coverage output, for example if the fraction of bases in the genome file that were not covered is large?
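A rough sketch of how the first column could be turned into a count table, with two checks that seem worth making. First, a GFF3 gene-model file typically contains mRNA, exon, CDS and UTR rows alongside the gene rows, so the sketch keeps only rows whose type column is `gene`; otherwise the same reads would be counted several times. Second, it lists genes with many overlapping features but a small covered fraction, which may point to reads piling up in one short region. The function names and the thresholds (`min_count=10`, `max_fraction=0.1`) are arbitrary assumptions, not recommendations from bedtools or DESeq2.

```python
import re


def gene_counts(lines, feature_type="gene"):
    """Map gene ID -> (overlap count, fraction covered) for GFF rows of one feature type."""
    counts = {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        # 9 GFF columns + 4 coverage columns; skip comments, short lines, other types.
        if len(f) < 13 or f[2] != feature_type:
            continue
        m = re.search(r"ID=([^;]+)", f[8])
        if m:
            counts[m.group(1)] = (int(f[-4]), float(f[-1]))
    return counts


def count_matrix(samples):
    """samples: dict of sample name -> gene_counts() result.

    Returns (gene_ids, sample_names, rows) with one row of raw integer counts per gene;
    genes missing from a sample get a count of 0.
    """
    names = sorted(samples)
    genes = sorted(set().union(*(samples[n] for n in names)))
    rows = [[samples[n].get(g, (0, 0.0))[0] for n in names] for g in genes]
    return genes, names, rows


def low_coverage(counts, min_count=10, max_fraction=0.1):
    """Genes with many overlaps but little of their length covered (possible pileups)."""
    return sorted(g for g, (n, frac) in counts.items() if n >= min_count and frac <= max_fraction)
```

The rows are left as raw integers on purpose, since DESeq2 expects unnormalized counts and does its own normalization.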

    The command I used is, for example:

    bedtools coverage -a GSM845432_F1DPI_TAIR10.bed -b TAIR10_GFF3_genes.gff > GSM845432_F1DPI_TAIR10.txt
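One thing to check with this command: as far as I know, from bedtools v2.24.0 onward `coverage` reports one line per feature in `-a`, counting overlaps from `-b`, which is the reverse of earlier releases. `bedtools --version` shows which build is installed. The sketch below only builds the argument list either way and optionally runs it; `coverage_argv`, `run_coverage` and the `genes_as_a` flag are hypothetical helper names, not part of bedtools.

```python
import subprocess


def coverage_argv(reads_bed, genes_gff, genes_as_a):
    """Build a bedtools coverage command line.

    genes_as_a=True puts the gene annotation in -a (per-gene output on bedtools >= 2.24.0,
    if the changed semantics described above apply); False reproduces the order in the post.
    """
    a, b = (genes_gff, reads_bed) if genes_as_a else (reads_bed, genes_gff)
    return ["bedtools", "coverage", "-a", a, "-b", b]


def run_coverage(reads_bed, genes_gff, out_path, genes_as_a):
    """Run bedtools coverage and write its stdout to out_path (requires bedtools on PATH)."""
    with open(out_path, "w") as out:
        subprocess.run(coverage_argv(reads_bed, genes_gff, genes_as_a), stdout=out, check=True)


# Same file order as the command in the post:
argv = coverage_argv("GSM845432_F1DPI_TAIR10.bed", "TAIR10_GFF3_genes.gff", genes_as_a=False)
print(" ".join(argv))
```

Whichever order is used, a quick sanity check is that the output has one line per gene-model feature rather than one line per read.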

    Thanks for any tips! This is a learning exercise. The RNA-Seq data that my lab generated was read-mapped by DNAnexus, and I'm hoping that they know what they're doing; at least it's a standardized workflow.
    Sam Hokin
    Computational Scientist, Carnegie and NCGR
