![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Raw read counts for RNAseq | biofreak | Introductions | 13 | 01-16-2013 06:28 AM |
How can one get raw read counts from RPKM values | gen2prot | General | 6 | 06-24-2011 12:08 PM |
Raw read counts for RNAseq | biofreak | RNA Sequencing | 2 | 06-15-2011 06:56 AM |
miRNA read counts regarding DE | vebaev | RNA Sequencing | 16 | 05-19-2011 12:52 PM |
DESeq: Read counts vs. BP counts | burkard | Bioinformatics | 0 | 08-06-2010 12:52 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: North Carolina Join Date: Jan 2010
Posts: 82
|
![]()
Hi everyone,
I'm using Illumina RNAseq to test for differential expression between two conditions. I have successfully mapped reads with tophat/bowtie, and used cufflinks/cuffcompare to test for differential expression. A number of methods (edgeR, DEGseq, DESseq), however, ask for raw read counts rather FPKM. Is there an easy way to go from tophat output (say the bedgraph or .sam) to raw read counts for each transcript (say using a gtf that I would supply)? Thanks! |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: Charlottesville Join Date: Sep 2008
Posts: 119
|
![]()
If I understand your question correctly, BEDTools has a utility called coverageBed that will compute the raw counts of reads in a BAM file among features in a BED or GTF file.
For example: Code:
$ coverageBed -abam alignedReads.bam -b transcripts.gtf > transcript.cov.txt |
![]() |
![]() |
![]() |
#3 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
![]()
Here is the script that I made for this purpose: http://www-huber.embl.de/users/ander...doc/count.html
Simon |
![]() |
![]() |
![]() |
#4 | |
Senior Member
Location: USA Join Date: Jan 2008
Posts: 482
|
![]()
Yeah, I have used bedtools for similar analysis and it is quite intuitive
Quote:
|
|
![]() |
![]() |
![]() |
#5 | |
Member
Location: california Join Date: Jul 2009
Posts: 24
|
![]() Quote:
/usr/lib64/python2.4/distutils/dist.py:236: UserWarning: Unknown distribution option: 'requires' warnings.warn(msg) However, when I ran "python setup.py install --prefix=...", I got the error message byte-compiling $HOME/librarys/lib64/python2.4/site-packages/HTSeq/StepVector.py to StepVector.pyc File $HOME/librarys/lib64/python2.4/site-packages/HTSeq/StepVector.py", line 390 start = index.start if index.start is not None else self.start_index() ^ SyntaxError: invalid syntax The reason seems to that "Conditional expressions" is not supported in Python2.4, how can I change the codes without upgrading to Python 2.5 Thanks Ying Last edited by Auction; 04-09-2010 at 01:46 PM. |
|
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
![]()
When I started developing HTSeq, I had to make a decision on which Python version to require. I refrained from using any Python 2.6 features as many people still work with 2.5, but I didn't expect many 2.4 installations to be still around.
Python 2.5 brought a few nifty feature but I think I didn't use too many of them, and most are only syntactic sugar anyway. I guess that not too many changes would be needed to make it run on 2.4 But wouldn't it be much easier if you convinced your cluster admin to update the system? (Or install it yourself in your home directory: This takes only a couple of minute and Python typically builds from source out of the box.) Python 2.5 was released 4 years ago, and by now, the release version is 2.6.5, and 2.7 is in alpha. You shouldn't need to waste your time to get software to run in such an out-of-date system. |
![]() |
![]() |
![]() |
#7 | |
Member
Location: Ames, IA Join Date: Nov 2008
Posts: 57
|
![]() Quote:
You would be surprised to know that one of the clusters I use needs Python 2.3!! ![]() Siva |
|
![]() |
![]() |
![]() |
#8 | |
Member
Location: california Join Date: Jul 2009
Posts: 24
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#9 | |
Member
Location: california Join Date: Jul 2009
Posts: 24
|
![]() Quote:
![]() |
|
![]() |
![]() |
![]() |
#10 | |
Senior Member
Location: Charlottesville Join Date: Sep 2008
Posts: 119
|
![]() Quote:
You are correct, coverageBed processes every BAM entry in the BAM file provided. The rationale here is that one can use samtools to filter your data before passing to coverageBed. For example, to only count proper pairs, do the following (passing the output of samtools to coverageBed via a pipe, hence the "stdin" for the -abam parameter: Code:
$ samtools -f 0x2 <BAM> | coverageBed -abam stdin -b transcripts.bed > transcripts.proper.cov |
|
![]() |
![]() |
![]() |
#11 | |
Member
Location: california Join Date: Jul 2009
Posts: 24
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#12 |
Member
Location: North Carolina Join Date: Jan 2010
Posts: 82
|
![]()
Hi Simon,
Sorry to trouble you with with what is surely a silly question. I've just installed HT-Seq from source on my Mac. I've just tried to run a couple of basic commands to see if things were working: python -m HTSeq.scripts.qa s_3_sequence.txt but I'm getting an error message: Code:
Traceback (most recent call last): File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/runpy.py", line 85, in run_module loader = get_loader(mod_name) File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pkgutil.py", line 456, in get_loader return find_loader(fullname) File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pkgutil.py", line 466, in find_loader for importer in iter_importers(fullname): File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pkgutil.py", line 422, in iter_importers __import__(pkg) ImportError: No module named HTSeq.scripts Any idea what I've done wrong? |
![]() |
![]() |
![]() |
#13 | |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
![]() Quote:
Once you have unpacked the HTSeq tarball, you should run the commands Code:
python setup.py build python setup.py install If this does not work, use only the first command. It makes a subdirectory 'build', and in there something like 'lib.macos' (or called similarly), and in there, 'HTSeq'. You can just tell Python to search for HTSeq there by writing Code:
export PYTHONPATH=$PYTHONPATH:/<the_path_to_your_directory/build/lib.macos/ Do not run the script in the directory that you have just unpacked, go somewhere else, as Python otherwise tends to get confused. |
|
![]() |
![]() |
![]() |
#14 |
Member
Location: North Carolina Join Date: Jan 2010
Posts: 82
|
![]()
Thanks Simon,
THIngs seem to be working... Will be try DESeq as soon as I get these counts finalized. One question: for the counting script, you ask for a gff, and elsewhere you use gtf. I've noticed in tophat as well that is some cases gffs are requested (or output) and sometimes gtf. Is there are particular aspect of gff that makes it more suitable for this counting algorithm? just curious... |
![]() |
![]() |
![]() |
#15 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
![]()
Hi Chris,
GTF is just a tightening of the GFF specification, i.e., every GTF file is also a GFF file. The tightening is mainly with respect to the "feature type" column: in GTF, it has to be one of "exon", "CDS", "stop_codon", etc., while in GFF, it is not clearly said which words should be used. In the htseq-count script, I give the user the option to tell the script how the features are named, and hence, it works with GFF as well. See here: GFF: http://www.sanger.ac.uk/resources/so.../gff/spec.html GTF: http://mblab.wustl.edu/GTF22.html Simon Last edited by Simon Anders; 04-15-2010 at 10:11 AM. Reason: add links |
![]() |
![]() |
![]() |
#16 | |
Member
Location: Raleigh, NC Join Date: Feb 2010
Posts: 30
|
![]()
Hi Simon,
Can I ask you a question that if I have paired end reads who are overlapping with each other, for a overlapped region, does HTSeq count it as single coverages if reads are from the same fragment (but different ends) or count it as covered twice? Thanks! Quote:
|
|
![]() |
![]() |
![]() |
#17 | |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
![]() Quote:
At the moment, however, HTSeq does not support paired-end reads at all. I'll add this functionality very soon, hopefully next week. Your data would be very useful as test data. Could you maybe send me your SAM file (or parts of it)? By the way, last time I tried, Bowtie was unable to deal with overlapping ends from the same fragment and did not align these at all. Have you used another software, or did they improve that? Cheers Simon |
|
![]() |
![]() |
![]() |
#18 | |
Member
Location: Raleigh, NC Join Date: Feb 2010
Posts: 30
|
![]()
Hi Simon,
Thanks for your reply. I ended up using tophat and it has no problem aligning overlapping reads by specifying a negative -r as suggested by Cole Trapnell. I attached an example SAM file below. It is from a highly expressed gene on bovine chromosome one (ATPO). The Ensembl annotation is as below. I am very eager to see what you get. Thanks a lot. chr1 protein_coding exon 720699 720764 . + . gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "1"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; chr1 protein_coding CDS 720729 720764 . + 0 gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "1"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; protein_id "ENSBTAP00000024326"; chr1 protein_coding start_codon 720729 720731 . + 0 gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "1"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; chr1 protein_coding exon 722101 722151 . + . gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "2"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; chr1 protein_coding CDS 722101 722151 . + 0 gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "2"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; protein_id "ENSBTAP00000024326"; chr1 protein_coding exon 723098 723208 . + . gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "3"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; chr1 protein_coding CDS 723098 723208 . + 0 gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "3"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; protein_id "ENSBTAP00000024326"; chr1 protein_coding exon 725362 725491 . + . gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "4"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; chr1 protein_coding CDS 725362 725491 . + 0 gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "4"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; protein_id "ENSBTAP00000024326"; chr1 protein_coding exon 726174 726286 . + . gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "5"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; chr1 protein_coding CDS 726174 726286 . + 2 gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "5"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; protein_id "ENSBTAP00000024326"; chr1 protein_coding exon 727512 727598 . + . gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "6"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; chr1 protein_coding CDS 727512 727598 . + 0 gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "6"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; protein_id "ENSBTAP00000024326"; chr1 protein_coding exon 727876 728057 . + . gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "7"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; chr1 protein_coding CDS 727876 727986 . + 0 gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "7"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; protein_id "ENSBTAP00000024326"; chr1 protein_coding stop_codon 727987 727989 . + 0 gene_id "ENSBTAG00000018278"; transcript_id "ENSBTAT00000024326"; exon_number "7"; gene_name "ATPO_BOVIN"; transcript_name "ATPO_BOVIN"; Quote:
|
|
![]() |
![]() |
![]() |
#19 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
![]()
Hi wenhuang
I've now added paired-end support for the SAM fomat to HTSeq and htseq-count. It would be great if you could download the new version 0.4.1 of HTSeq, try to do your counting job with it and tell me whether it worked correctly. Cheers Simon |
![]() |
![]() |
![]() |
#20 |
Member
Location: Raleigh, NC Join Date: Feb 2010
Posts: 30
|
![]()
Hi Simon,
Thank you for your work. Unfortunately, although I am able to get the yeast data provided on your website to work with htseq-count, I am not able to get my own data to work. The SAM file I used is two posts ago and I attach the GTF file to this post. The error message I got was as below. Again, thank you for your help. Traceback (most recent call last): File "/usr/local/bin/htseq-count", line 5, in <module> pkg_resources.run_script('HTSeq==0.4.1', 'htseq-count') File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/pkg_resources.py", line 489, in run_script File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/pkg_resources.py", line 1207, in run_script path = self.module_path File "/Library/Python/2.5/site-packages/HTSeq-0.4.1-py2.5-macosx-10.5-i386.egg/EGG-INFO/scripts/htseq-count", line 5, in <module> HTSeq.scripts.count.main() File "/Library/Python/2.5/site-packages/HTSeq-0.4.1-py2.5-macosx-10.5-i386.egg/HTSeq/scripts/count.py", line 156, in main opts.mode, opts.featuretype, opts.idattr, opts.quiet ) File "/Library/Python/2.5/site-packages/HTSeq-0.4.1-py2.5-macosx-10.5-i386.egg/HTSeq/scripts/count.py", line 68, in count_reads_in_features if iv.chrom not in features.step_vectors: AttributeError: 'NoneType' object has no attribute 'chrom' |
![]() |
![]() |
![]() |
Tags |
cufflinks, degseq, deseq |
Thread Tools | |
|
|