![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
DEXSeq Using Counts File From htseq-count | FuzzyCoder | Bioinformatics | 20 | 01-04-2016 12:18 AM |
multiBamCov or htseq-count to count read per feature ? | NicoBxl | Bioinformatics | 1 | 07-03-2012 03:05 AM |
DEXSeq vs htseq-count/DESeq counting model | jdsv | Bioinformatics | 2 | 11-20-2011 08:48 PM |
unable to install 'pasilla' on R | sridhar228 | Bioinformatics | 1 | 10-05-2011 11:30 PM |
Quantification: count reads or count base pairs? | yuelics | Bioinformatics | 0 | 07-27-2011 05:48 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Junior Member
Location: United States Join Date: Jul 2012
Posts: 3
|
![]()
Hi I'm working with the Pasilla and DEXSeq bioconductor packagse and I'm trying to replicate the example data, but the flattened .txt files processed using the provided dexseq_count.py script is not matching the example data provided in the source package. I think the suporting R documentation may contain an error (???) that I address in the text below. Thanks in advance for any help. I'm hoping to use these examples in order to understand how they are working so I can process my own data eventually, but want to make sure they are functioning properly with the provided example data as well.
Here is what I did: - download copies of the source packages for pasilla and DEXSeq (and unpackaged them to folders in my directory) - copied the python scripts to the directory containing example data for pasilla (/inst/extdata) - went to http://www.embl.de/~reyes/Graveley/bam/ - downloaded the bam files for treated1.bam and treated2.bam (because the source packages didn't seem to have these, but the above link was reference in the R documentation for downloading the BAM files). - used the .gff package already processed in the pasilla package - did the following in terminal #This is for an example for processing a single-read type alignment #creates an indexed bam file $ samtools index treated1.bam #converts the bam file to a sam file $ samtools view treated1.bam > treated1.sam #calls the python script dexseq_count to count the reads in the non-overlapping exotic parts. $ python dexseq_count.py Dmel.BDGP5.25.62.DEXSeq.chr.gff treated1.sam treated1fb_TEST.txt #This is an example for processing a paired-end type alignment #creates an indexed bamfile $ samtools index treated2.bam #converts the nam file to a sam file $ samtools view treated2.bam > treated2.sam #sorts the sam file by position (-k option); and I'm assuming the bam in the tutorial is a typo here because in the R documentation the output is a sorted BAM file, but in the next line it is a sorted SAM file. I did try the leaving it has a bam file as well, but as expected received an error "no such file or directory" $ sort -k 1,1 -k2,2n treated2.sam > treated2_sorted.sam #runs the python script dexseq_count on paired-end data for the processed sam file treated_2_sorted in order to count thee reads in each non-overlapping exonic part. python dexseq_count.py -p yes Dmel.BDGP5.25.62.DEXSeq.chr.gff treated2_sorted.sam treated2fb_TEST.txt - I've provided the first 10 lines of both the example data and the processed output. I also have a second question; since these python scripts are part of the HTSeq library why must the paired-end data be sorted by position and not by read name using samtools (since it was already called twice previously) as is conducted for the htseq-count script? Thanks again output of .txt (1st 10 lines) SINGLE READ: treated1fb_pasilla example.txt FBgn0000003:001 0 FBgn0000008:001 0 FBgn0000008:002 1 FBgn0000008:003 3 FBgn0000008:004 2 FBgn0000008:005 8 FBgn0000008:006 0 FBgn0000008:007 17 FBgn0000008:008 4 FBgn0000008:009 35 treated1fb_my processed output.txt (TEST) FBgn0000003:001 0 FBgn0000008:001 0 FBgn0000008:002 0 FBgn0000008:003 0 FBgn0000008:004 1 FBgn0000008:005 4 FBgn0000008:006 1 FBgn0000008:007 18 FBgn0000008:008 4 FBgn0000008:009 16 PAIRED-END: treated2fb_pasilla example.txt FBgn0000003:001 1 FBgn0000008:001 0 FBgn0000008:002 0 FBgn0000008:003 1 FBgn0000008:004 0 FBgn0000008:005 2 FBgn0000008:006 1 FBgn0000008:007 22 FBgn0000008:008 7 FBgn0000008:009 46 treated2fb_my processed output.txt (TEST) FBgn0000003:001 0 FBgn0000008:001 0 FBgn0000008:002 0 FBgn0000008:003 1 FBgn0000008:004 0 FBgn0000008:005 1 FBgn0000008:006 0 FBgn0000008:007 8 FBgn0000008:008 1 FBgn0000008:009 17 |
![]() |
![]() |
![]() |
#2 |
Junior Member
Location: United States Join Date: Jul 2012
Posts: 3
|
![]()
Sorry this posting was already through the Bioconductor mailing lists. In case it helps anyone else here was the response I received (many thanks to Alejandro Reyes):
Dear parrishdb, Thanks for your detailed report! And for pointing out those details. The pipeline you did is correct, at some point I updated the alignment with a most recent version of tophat without updating the count files in the pasilla package, I think the differences come from there, I am updating pasilla now! Regarding the sorting, the paired sam files and python scripts, the reads need to be sorted by read name and pair, e.g: alignment1read1 alignment1read2 alignment2read1 alignment2read2 alignmentnread1 alignmentnread2 Best wishes, Alejandro Reyes |
![]() |
![]() |
![]() |
Thread Tools | |
|
|