DEXSeq-Count & Pasilla

parrishdb

Junior Member

Join Date: Jul 2012

Posts: 3
- Share
- Tweet
#1

DEXSeq-Count & Pasilla

07-11-2012, 01:35 PM

Hi I'm working with the Pasilla and DEXSeq bioconductor packagse and I'm trying to replicate the example data, but the flattened .txt files processed using the provided dexseq_count.py script is not matching the example data provided in the source package. I think the suporting R documentation may contain an error (???) that I address in the text below. Thanks in advance for any help. I'm hoping to use these examples in order to understand how they are working so I can process my own data eventually, but want to make sure they are functioning properly with the provided example data as well.

Here is what I did:

- download copies of the source packages for pasilla and DEXSeq (and unpackaged them to folders in my directory)
- copied the python scripts to the directory containing example data for pasilla (/inst/extdata)
- went to http://www.embl.de/~reyes/Graveley/bam/
- downloaded the bam files for treated1.bam and treated2.bam (because the source packages didn't seem to have these, but the above link was reference in the R documentation for downloading the BAM files).
- used the .gff package already processed in the pasilla package
- did the following in terminal

#This is for an example for processing a single-read type alignment

#creates an indexed bam file

$ samtools index treated1.bam

#converts the bam file to a sam file

$ samtools view treated1.bam > treated1.sam

#calls the python script dexseq_count to count the reads in the non-overlapping exotic parts.

$ python dexseq_count.py Dmel.BDGP5.25.62.DEXSeq.chr.gff treated1.sam treated1fb_TEST.txt

#This is an example for processing a paired-end type alignment

#creates an indexed bamfile

$ samtools index treated2.bam

#converts the nam file to a sam file

$ samtools view treated2.bam > treated2.sam

#sorts the sam file by position (-k option); and I'm assuming the bam in the tutorial is a typo here because in the R documentation the output is a sorted BAM file, but in the next line it is a sorted SAM file. I did try the leaving it has a bam file as well, but as expected received an error "no such file or directory"

$ sort -k 1,1 -k2,2n treated2.sam > treated2_sorted.sam

#runs the python script dexseq_count on paired-end data for the processed sam file treated_2_sorted in order to count thee reads in each non-overlapping exonic part.

python dexseq_count.py -p yes Dmel.BDGP5.25.62.DEXSeq.chr.gff treated2_sorted.sam treated2fb_TEST.txt

- I've provided the first 10 lines of both the example data and the processed output.

I also have a second question; since these python scripts are part of the HTSeq library why must the paired-end data be sorted by position and not by read name using samtools (since it was already called twice previously) as is conducted for the htseq-count script?

Thanks again

output of .txt (1st 10 lines)

SINGLE READ:
treated1fb_pasilla example.txt
FBgn0000003:001 0
FBgn0000008:001 0
FBgn0000008:002 1
FBgn0000008:003 3
FBgn0000008:004 2
FBgn0000008:005 8
FBgn0000008:006 0
FBgn0000008:007 17
FBgn0000008:008 4
FBgn0000008:009 35

treated1fb_my processed output.txt (TEST)
FBgn0000003:001 0
FBgn0000008:001 0
FBgn0000008:002 0
FBgn0000008:003 0
FBgn0000008:004 1
FBgn0000008:005 4
FBgn0000008:006 1
FBgn0000008:007 18
FBgn0000008:008 4
FBgn0000008:009 16

PAIRED-END:
treated2fb_pasilla example.txt
FBgn0000003:001 1
FBgn0000008:001 0
FBgn0000008:002 0
FBgn0000008:003 1
FBgn0000008:004 0
FBgn0000008:005 2
FBgn0000008:006 1
FBgn0000008:007 22
FBgn0000008:008 7
FBgn0000008:009 46

treated2fb_my processed output.txt (TEST)
FBgn0000003:001 0
FBgn0000008:001 0
FBgn0000008:002 0
FBgn0000008:003 1
FBgn0000008:004 0
FBgn0000008:005 1
FBgn0000008:006 0
FBgn0000008:007 8
FBgn0000008:008 1
FBgn0000008:009 17
Tags: None
parrishdb

Junior Member

Join Date: Jul 2012

Posts: 3
- Share
- Tweet
#2

07-13-2012, 04:34 PM

Sorry this posting was already through the Bioconductor mailing lists. In case it helps anyone else here was the response I received (many thanks to Alejandro Reyes):

Dear parrishdb,

Thanks for your detailed report! And for pointing out those details. The pipeline you did is correct, at some point I updated the alignment with a most recent version of tophat without updating the count files in the pasilla package, I think the differences come from there, I am updating pasilla now!

Regarding the sorting, the paired sam files and python scripts, the reads need to be sorted by read name and pair, e.g:

alignment1read1
alignment1read2
alignment2read1
alignment2read2
alignmentnread1
alignmentnread2

Best wishes,
Alejandro Reyes
Comment

Previous template Next

Essential Discoveries and Tools in Epitranscriptomics

by seqadmin

The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
- Channel: Articles
04-22-2024, 07:01 AM
Current Approaches to Protein Sequencing

by seqadmin

Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
- Channel: Articles
04-04-2024, 04:25 PM

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, Yesterday, 11:49 AM	0 responses 15 views 0 likes	Last Post by seqadmin Yesterday, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, 04-24-2024, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin 04-24-2024, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 62 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad

Announcement

DEXSeq-Count & Pasilla

Comment

Latest Articles

ad_right_rmr

News