**areyes** · 10-10-2012, 04:05 AM

Have you checked the size of your first file? Looks like you are replacing your input file with the output file.

Could you please include a reproducible example of your R code, along with the output of sessionInfo()? Also, I would not have high hopes for the results without replicates.

Alejandro Reyes

**gokhulkrishnakilaru** · 10-10-2012, 04:31 AM

> Originally posted by areyes
>
> Have you checked the size of your first file? Looks like you are replacing your input file with the output file.
>
> Could you please include a reproducible example of your R code, along with the output of sessionInfo()? Also, I would not have high hopes for the results without replicates.
>
> Alejandro Reyes

Hi Alejandro,

Yes, I did check the size of the input file. I am changing the extension. The input file has GTF and the output has GFF as its extension.

My R code is as follows:

Code:

library(DEXSeq)
options(digits = 3)
setwd("/test/dexseq/")
rm(list = ls())

# Flattened annotation (GFF) prepared from the Ensembl GTF
annotationfile <- file.path("/test/dexseq/Mus_musculus.GRCm38.68.gff")
annotationfile

# Design table: one sample per condition, no replicates
samples <- data.frame(condition = c("WT", "KO"), replicate = c(1, 1), row.names = c("WildType", "KnockOut"), stringsAsFactors = TRUE, check.names = FALSE)
samples

# Count files, one per sample
fullFilenames <- list.files("/test/dexseq/", full.names = TRUE, pattern = "DEXSEQ.txt")
fullFilenames

# Build the exon count set and inspect it
ecs <- read.HTSeqCounts(countfiles = fullFilenames, design = samples, flattenedfile = annotationfile)
head(counts(ecs))
head(fData(ecs))

All I see is NA, and the size factor estimation also returns NA.

**areyes** · 10-10-2012, 05:45 AM

Oops, my mistake about the GTF extension...

Do your files also contain NAs?

**gokhulkrishnakilaru** · 10-10-2012, 05:52 AM

> Originally posted by areyes
>
> Oops, my mistake about the GTF extension...
>
> Do your files also contain NAs?

No. My files contain either a count or 0. I also see another error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : line 1 did not have 3 elements

I looked at the tail of my counts file, and its last four lines are entries such as _ambiguous, _lowqual, etc.
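
The scan error above complains about a line with an unexpected number of fields. A minimal sketch for checking this in base R, assuming whitespace-separated fields; the helper name `count_fields_per_line` and the example count on the `_ambiguous` line are hypothetical, not taken from the actual file:

```r
# Hypothetical helper: count whitespace-separated fields on each line.
count_fields_per_line <- function(lines) {
  vapply(strsplit(trimws(lines), "[[:space:]]+"), length, integer(1))
}

example_lines <- c(
  "\"ENSMUSG00000000001\"    :001\t1",  # quoted gene ID, exon bin, count
  "_ambiguous\t0"                        # trailing summary line (count is made up)
)

# Lines whose field count differs from the rest are the likely culprits.
n_fields <- count_fields_per_line(example_lines)
print(n_fields)
```

On a real file, the same helper could be applied to `readLines()` output to find lines that differ from the majority.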

I deleted those lines, and now I get a different error:

Error in round(countData) : non-numeric argument to mathematical function

Any pointers on these issues? This is the head of my counts file:

Code:

"ENSMUSG00000000001"    :001	1
"ENSMUSG00000000001"    :002	0
"ENSMUSG00000000001"    :003	0
"ENSMUSG00000000001"    :004	1
"ENSMUSG00000000001"    :005	0
"ENSMUSG00000000001"    :006	0
"ENSMUSG00000000001"    :007	0
"ENSMUSG00000000001"    :008	0

**areyes** · 10-10-2012, 05:58 AM

I see, I think the files you are using as input are causing some problems with the output of our htseq python scripts. I will check what is going on. In the meantime you can reformat your files to look more like this:

Code:

FBgn0000003:001	0
FBgn0000008:001	0
FBgn0000008:002	0
FBgn0000008:003	0
FBgn0000008:004	1
FBgn0000008:005	4
FBgn0000008:006	1
FBgn0000008:007	18
FBgn0000008:008	4
FBgn0000008:009	16

Then it should be fine!
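
One way to do this reformatting in base R is sketched below. It assumes the only problems are the double quotes around the gene ID and the whitespace before the colon, as in the file head shown earlier in the thread; the helper name `clean_count_lines` is hypothetical.

```r
# Hypothetical helper: turn '"GENE"    :001<TAB>5' into 'GENE:001<TAB>5'.
clean_count_lines <- function(lines) {
  lines <- gsub("\"", "", lines, fixed = TRUE)  # drop the double quotes
  lines <- gsub("[[:space:]]+:", ":", lines)     # join gene ID and exon bin
  lines
}

raw <- c(
  "\"ENSMUSG00000000001\"    :001\t1",
  "\"ENSMUSG00000000001\"    :002\t0"
)
cleaned <- clean_count_lines(raw)
cat(cleaned, sep = "\n")

# For a whole file (file names are placeholders):
# writeLines(clean_count_lines(readLines("counts_raw.txt")), "counts_clean.txt")
```

The tab before the count is left untouched, so each output line keeps the two-column layout shown above.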

**areyes** · 10-10-2012, 05:59 AM

By the way, where can I download the annotation files you used?

**gokhulkrishnakilaru** · 10-10-2012, 06:00 AM

> Originally posted by areyes
>
> I see, I think the files you are using as input are causing some problems with the output of our htseq python scripts. I will check what is going on. In the meantime you can reformat your files to look more like this:
>
> Code:
>
> FBgn0000003:001	0
> FBgn0000008:001	0
> FBgn0000008:002	0
> FBgn0000008:003	0
> FBgn0000008:004	1
> FBgn0000008:005	4
> FBgn0000008:006	1
> FBgn0000008:007	18
> FBgn0000008:008	4
> FBgn0000008:009	16
>
> Then it should be fine!

Perfect, that helps. Also, what about those last four lines that start with an underscore and carry a numeric value? Can I delete them?

**gokhulkrishnakilaru** · 10-10-2012, 06:02 AM

> Originally posted by areyes
>
> By the way, where can I download the annotation files you used?

ftp://ftp.ensembl.org/pub/release-68/gtf/mus_musculus

That is where I got the one that worked for me.

You can also go to the Tables section at genome.ucsc.edu, choose mouse and RefSeq Genes, then refFlat or refGene, and set the output format to GTF. If you succeed in preparing the annotation file, please upload it somewhere, or I can invite you to my Dropbox. That way I would have a RefSeq annotation file too.

Thanks for the support, my friend.

**areyes** · 10-10-2012, 06:02 AM

You could, but they are also deleted automatically in the function "read.HTSeqCounts"!
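
For anyone who still prefers to remove those lines by hand, a minimal sketch follows. It assumes the summary lines are exactly those starting with an underscore, as described in the thread; the helper name `drop_summary_lines` and the counts on the summary lines are hypothetical.

```r
# Hypothetical helper: keep only lines that do not start with "_".
drop_summary_lines <- function(lines) {
  lines[!grepl("^_", lines)]
}

lines <- c(
  "FBgn0000008:008\t4",
  "FBgn0000008:009\t16",
  "_ambiguous\t0",   # summary line (count is made up)
  "_lowqual\t0"      # summary line (count is made up)
)
kept <- drop_summary_lines(lines)
cat(kept, sep = "\n")
```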

**gokhulkrishnakilaru** · 10-10-2012, 06:18 AM

> Originally posted by areyes
>
> You could, but they are also deleted automatically in the function "read.HTSeqCounts"!

Hi Alejandro,

I was successful in making the counts file as you suggested. I ran the script. The following are my errors. Any pointers that could be of help?

Code:

> ecs<- estimateSizeFactors(ecs)
> ecs<- estimateDispersions(ecs)
Dispersion estimation. (Progress report: one dot per 100 genes)
Error in FUN(c("ENSMUSG00000000078", "ENSMUSG00000000134", "ENSMUSG00000000182",  : 
  Underdetermined model; cannot estimate dispersions. Maybe replicates have not been properly specified.
In addition: Warning messages:
1: In .local(object, ...) :
  Exons with less than 11 counts will be discarded. For more details read the documentation, parameter minCount
2: In .local(object, ...) :
  Genes with more than 70 testable exons will be kicked out of the analysis. For more details read the documentation, parameter maxExon

I was looking at this link - http://seqanswers.com/forums/archive...p/t-21212.html. Can I delete that line for my case?

**gokhulkrishnakilaru** · 10-17-2012, 07:51 AM

Any thoughts, anybody?

Sorry, mods, for bumping the thread.

This is an urgent task, so I had to.

**areyes** · 10-17-2012, 08:22 AM

Hi gokhulkrishnakilaru,

The error speaks for itself: "Underdetermined model; cannot estimate dispersions. Maybe replicates have not been properly specified." You do not have replicates. Sorry that I cannot help.

Alejandro

**gokhulkrishnakilaru** · 10-17-2012, 08:33 AM

> Originally posted by areyes
>
> Hi gokhulkrishnakilaru,
>
> The error speaks for itself: "Underdetermined model; cannot estimate dispersions. Maybe replicates have not been properly specified." You do not have replicates. Sorry that I cannot help.
>
> Alejandro

Thanks Alejandro,

So DEXSeq cannot work without replicates?

Is that the conclusion?

Is it possible to change how the replicates are declared in this section?

Code:

samples = data.frame(condition = c("WT", "KO"),replicate=c(1,1),row.names=c("WildType", "KnockOut"),stringsAsFactors=TRUE,check.names = FALSE)
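
For comparison, a design table with real replicates might look like the sketch below. The sample names and the helper `has_replicates` are hypothetical, and each row would need its own count file from an independent biological sample. Editing the table alone does not create replicates.

```r
# Hypothetical design: two biological replicates per condition.
samples_with_reps <- data.frame(
  condition = factor(c("WT", "WT", "KO", "KO")),
  replicate = c(1, 2, 1, 2),
  row.names = c("WildType1", "WildType2", "KnockOut1", "KnockOut2")
)

# The design from this thread: one sample per condition.
original <- data.frame(
  condition = factor(c("WT", "KO")),
  replicate = c(1, 1),
  row.names = c("WildType", "KnockOut")
)

# TRUE only if every condition has at least `min_reps` samples.
has_replicates <- function(design, min_reps = 2) {
  all(table(design$condition) >= min_reps)
}

print(has_replicates(samples_with_reps))
print(has_replicates(original))
```

A check like this could be run before dispersion estimation to catch an unreplicated design early.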

**areyes** · 10-18-2012, 12:17 AM

The motivation behind the development of DESeq and DEXSeq is to be able to estimate the biological variability between replicates and take it into account when calling differentially expressed genes or exons. If you don't have replicates, you cannot know whether the changes you observe are due to biological variation or to the differences between your genotypes. In any experiment it is crucial to have replicates; this is the only way to ensure that your differential expression calls are reproducible. For more details, you could check:

http://www.nature.com/nbt/journal/v29/n7/pdf/nbt.1910.pdf

and the discussion of our DEXSeq paper, "Detecting differential usage of exons from RNA-seq data":

http://genome.cshlp.org/content/22/10/2008.full.pdf+html

DEXSEQ Prepare Annotation File and R output