SEQanswers

Old 11-06-2013, 08:42 AM   #1
onyaw
Junior Member
 
Location: US

Join Date: Nov 2013
Posts: 5
DEXSeq file loading flattened

Hi, I'm using DEXSeq for the first time. I successfully created the GFF file and the .counts files with the Python scripts, and the sampleTable from within R, as specified, but I'm getting the following error when creating an ecs.

I do have all the files in the same working directory on my desktop, although the GFF and counts files were created on a different machine and moved over.

sampleTable$countFile does read back the correct number of levels and file names.

Any ideas? Is it not recognizing the GFF?

Thanks

onyaw

> ecs <- read.HTSeqCounts( sampleTable$countFile,sampleTable,"C57BL6J_dexseq.gff" )

Error in read.table(x, header = FALSE, stringsAsFactors = FALSE) :
'file' must be a character string or connection
Old 11-06-2013, 08:47 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

That error is occurring when DEXSeq is trying to read in the counts files. Can you paste in an excerpt of "sampleTable"?
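The message itself comes from base R's read.table(), as the error text shows. A minimal sketch that reproduces it without DEXSeq, assuming a made-up two-line count file written to a temporary path (the file contents and names are invented for illustration):

```r
# Write a tiny, made-up count file to a temporary location (illustration only)
tmp <- tempfile(fileext = ".counts")
writeLines(c("geneA:001\t10", "geneA:002\t3"), tmp)

# A character path is read without complaint
counts <- read.table(tmp, header = FALSE, stringsAsFactors = FALSE)

# The same path wrapped in a factor is neither a character string nor a
# connection, so read.table() refuses it
msg <- tryCatch(
    read.table(factor(tmp), header = FALSE, stringsAsFactors = FALSE),
    error = function(e) conditionMessage(e))
print(msg)
```

If the second call fails while the first succeeds, the file is fine and the problem lies in the type of the object holding its name.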
Old 11-06-2013, 08:52 AM   #3
onyaw
Junior Member
 
Location: US

Join Date: Nov 2013
Posts: 5

Thanks. The table is below. When printing this I did see one small error (an extra comma), which I fixed, but I still get the same issue.

> sampleTable
                             countFile condition libType
B6J_wt_thal_1     B6J_wt_thal_1.counts       B6J      wt
B6J_wt_thal_3     B6J_wt_thal_3.counts       B6J      wt
B6J_wt_thal_4     B6J_wt_thal_4.counts       B6J      wt
B6J_wt_ssctx_1   B6J_wt_ssctx_1.counts       B6J      wt
B6J_wt_ssctx_3   B6J_wt_ssctx_3.counts       B6J      wt
B6J_wt_ssctx_4   B6J_wt_ssctx_4.counts       B6J      wt
FeJ_wt_thal_2     FeJ_wt_thal_2.counts       FeJ      wt
FeJ_wt_thal_3     FeJ_wt_thal_3.counts       FeJ      wt
FeJ_wt_thal_4     FeJ_wt_thal_4.counts       FeJ      wt
FeJ_wt_ssctx_2   FeJ_wt_ssctx_2.counts       FeJ      wt
FeJ_wt_ssctx_3   FeJ_wt_ssctx_3.counts       FeJ      wt
FeJ_wt_ssctx_4   FeJ_wt_ssctx_4.counts       FeJ      wt
B6J_mut_thal_1   B6J_mut_thal_1.counts       B6J     mut
B6J_mut_thal_2   B6J_mut_thal_2.counts       B6J     mut
B6J_mut_thal_3   B6J_mut_thal_3.counts       B6J     mut
B6J_mut_ssctx_1 B6J_mut_ssctx_1.counts       B6J     mut
B6J_mut_ssctx_2 B6J_mut_ssctx_2.counts       B6J     mut
B6J_mut_ssctx_3 B6J_mut_ssctx_3.counts       B6J     mut
FeJ_mut_thal_1   FeJ_mut_thal_1.counts       FeJ     mut
FeJ_mut_thal_2   FeJ_mut_thal_2.counts       FeJ     mut
FeJ_mut_thal_3   FeJ_mut_thal_3.counts       FeJ     mut
FeJ_mut_ssctx_1 FeJ_mut_ssctx_1.counts       FeJ     mut
FeJ_mut_ssctx_2 FeJ_mut_ssctx_2.counts       FeJ     mut
FeJ_mut_ssctx_3 FeJ_mut_ssctx_3.counts       FeJ     mut
Old 11-06-2013, 09:01 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

By chance, does
Code:
typeof(sampleTable$countFile)
return something other than "character"? BTW, you don't have to use "condition" and "libType" as column names. You might find "strain" and "genotype" more meaningful.
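For anyone unsure what to expect from that check, here is a small sketch of how the same strings report their type as a plain character vector and as a factor. The file names are made up for illustration:

```r
# Made-up file names, for illustration only
files <- c("s1.counts", "s2.counts", "s3.counts")
as_factor <- factor(files)

print(typeof(files))            # "character"
print(typeof(as_factor))        # "integer": a factor stores integer codes plus levels
print(class(as_factor))         # "factor"
print(is.character(as_factor))  # FALSE
```

So a result of "integer" from typeof() on a column of file names is a strong hint that the column is a factor.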
Old 11-06-2013, 09:03 AM   #5
onyaw
Junior Member
 
Location: US

Join Date: Nov 2013
Posts: 5

Well, that returns 'integer', not 'character'.

I realized that about the column names (and obviously I have additional conditions), but in trying to get it to work I thought I would be as literal as possible.
Old 11-06-2013, 09:17 AM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

At some point you converted your file names to factors, probably by using cbind(). Something like
Code:
sampleTable$countFile <- levels(sampleTable$countFile)[sampleTable$countFile]
should fix that problem. In the future, don't use cbind() to create the sampleTable, but instead:
Code:
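files <- list.files(pattern="\\.counts$")  # the regex goes in `pattern`, not `path`
sampleTable <- data.frame(countFile=files,
    # list.files() sorts alphabetically, so derive the factors from the names
    strain=factor(sub("_.*", "", files)),
    genotype=factor(sub("^[^_]+_([^_]+)_.*", "\\1", files)),
    stringsAsFactors=FALSE)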
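As a sanity check on the conversion, the levels() indexing trick gives the same result as as.character(), whereas as.integer() would return the internal codes instead of the names. A minimal sketch with made-up file names:

```r
# Made-up file names; note the levels get sorted alphabetically
f <- factor(c("s2.counts", "s1.counts", "s2.counts"))

via_levels  <- levels(f)[f]     # index the levels by the integer codes
via_as_char <- as.character(f)  # the more direct spelling

print(identical(via_levels, via_as_char))  # TRUE
print(as.integer(f))                       # the codes 2 1 2, not the names
```

Either form is fine for recovering the file names; the thing to avoid is anything that hands the integer codes (or the factor itself) to a function expecting strings.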
Old 11-06-2013, 09:26 AM   #7
onyaw
Junior Member
 
Location: US

Join Date: Nov 2013
Posts: 5

Thanks. The table was ultimately constructed using literally the example in the PDF file, with names substituted, although originally I made it on my desktop as a CSV file.

I ran 'levels' as you suggested. It got further, but now I'm getting this:

> ecs <- read.HTSeqCounts( sampleTable$countFile,sampleTable,"C57BL6J_dexseq.gff" )
Error: all(unlist(lapply(design, class)) == "factor") is not TRUE
Old 11-06-2013, 09:38 AM   #8
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

I should have mentioned that originally.

Code:
countFiles <- sampleTable$countFile   # character vector of count file names
design <- sampleTable[,-1]            # every remaining column; all must be factors
ecs <- read.HTSeqCounts(countFiles,design,"C57BL6J_dexseq.gff" )
or something like that will probably work.
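Before calling read.HTSeqCounts(), both halves can be checked by hand. The sketch below uses a made-up four-sample table and mirrors the condition quoted in the error message above; it does not call DEXSeq itself:

```r
# Made-up sample table: character file names, factor design columns
sampleTable <- data.frame(
    countFile = c("a.counts", "b.counts", "c.counts", "d.counts"),
    strain    = factor(c("B6J", "B6J", "FeJ", "FeJ")),
    genotype  = factor(c("wt", "mut", "wt", "mut")),
    stringsAsFactors = FALSE)

countFiles <- sampleTable$countFile
design <- sampleTable[, -1, drop = FALSE]  # drop = FALSE keeps a data.frame

# The file names must be plain strings
files_ok <- is.character(countFiles)
# Same test as in the error message: every design column is a factor
design_ok <- all(unlist(lapply(design, class)) == "factor")

print(c(files_ok = files_ok, design_ok = design_ok))
```

If either value is FALSE, the corresponding column needs converting with as.character() or factor() before moving on.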
Old 11-06-2013, 10:48 AM   #9
onyaw
Junior Member
 
Location: US

Join Date: Nov 2013
Posts: 5

Devon, thanks again. I tried that and got the same error. But maybe it's because I have two conditions now ("condition" and "libType") and the value in our example was "-1". So I changed it to "-2" and it ran without error! So I'll move on to the next steps... wish me smoothness, please!!

By the way, if I have multiple conditions that I want to test separately, do I need to specify the design formula beyond the design specified above? Or am I better off making a separate sample table for each 'experiment', looking at just one condition per sample table at a time?
Old 11-06-2013, 01:33 PM   #10
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

The -1 just removes the first column (the count file names) and -2 would remove the second (mouse strain), which you probably want to keep. So, I'm a bit surprised that didn't then produce an error (though perhaps I'm incorrectly visualizing the dataframe that you're using).

Anyway, I would recommend that you keep the full design when you do the analyses. Mouse strains have enough behavioral and other differences that, if unaccounted for, will end up killing your statistical power (all of the variances will be larger than need be). You could just remove the samples you don't need, but that will also decrease power. So leaving everything in is your best bet.
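To make the column arithmetic concrete, here is a small sketch on a made-up three-row table. The column names mirror the ones in this thread and the values are invented:

```r
# Made-up table with the same column layout as in this thread
tab <- data.frame(
    countFile = c("a.counts", "b.counts", "c.counts"),
    condition = factor(c("B6J", "B6J", "FeJ")),
    libType   = factor(c("wt", "mut", "wt")),
    stringsAsFactors = FALSE)

drop_first  <- names(tab[, -1])  # "condition" "libType"
drop_second <- names(tab[, -2])  # "countFile" "libType"

# With only two columns, dropping one collapses the result to a bare vector
# unless drop = FALSE is given
two_cols <- tab[, c("countFile", "condition")]
collapsed <- class(two_cols[, -1])                # "factor"
kept      <- class(two_cols[, -1, drop = FALSE])  # "data.frame"

print(list(drop_first, drop_second, collapsed, kept))
```

The drop = FALSE detail only matters for tables with a single design column, but it is an easy one to miss.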
Old 11-10-2013, 11:25 PM   #11
thanhhoang
Member
 
Location: Ohio, USA

Join Date: Jul 2013
Posts: 16

Hi Onyaw, Dpryan and everyone!
I have a similar problem when running read.HTSeqCounts. Could you guys please help me with that?
I counted the 6 SAM files from the GSNAP output using dexseq_count.py, following the DEXSeq manual, then I made the sample table. Here is what I did:
> sampleTable <- data.frame(
+     row.names = c( "E1", "E2", "E3", "F1", "F2", "F3" ),
+     countFile = c( "E1.count", "E2.count", "E3.count", "F1.counts", "F2.count", "F3.count" ),
+     condition = c( "E", "E", "E", "F", "F", "F" ))
> sampleTable
   countFile condition
E1  E1.count         E
E2  E2.count         E
E3  E3.count         E
F1 F1.counts         F
F2  F2.count         F
F3  F3.count         F
> ecs <- read.HTSeqCounts(sampleTable$countFile,sampleTable,"protein_coding_flattened.gff")

Error in read.table(x, header = FALSE, stringsAsFactors = FALSE) :
'file' must be a character string or connection

I really appreciate your help.
Thanh
Old 11-11-2013, 03:52 AM   #12
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

I saw your post on biostars first, so I replied there.
Old 11-11-2013, 06:21 AM   #13
thanhhoang
Member
 
Location: Ohio, USA

Join Date: Jul 2013
Posts: 16

Hi dpryan,
Thank you.
I just replied on Biostars. Here is what I just did:
>list.files()
[1] "CITATION" "DESCRIPTION"
[3] "DEXSeq note 11.11.13.odt" "DEXSeq_1.8.0.tar"
[5] "doc" "E1.count"
[7] "E2.count" "E3.count"
[9] "F1.count" "F2.count"
[11] "F3.count" "help"
[13] "html" "INDEX"
[15] "Meta" "NAMESPACE"
[17] "NEWS" "protein_coding_flattened.gff"
[19] "python_scripts" "R"

head -10 E1.count
ENSMUSG00000000001:001 1222
ENSMUSG00000000001:002 75
ENSMUSG00000000001:003 29
ENSMUSG00000000001:004 200
ENSMUSG00000000001:005 61
ENSMUSG00000000001:006 61
ENSMUSG00000000001:007 27
ENSMUSG00000000001:008 36
ENSMUSG00000000001:009 134
ENSMUSG00000000003:001 0

All the files seem fine to me. I don't know what's going on.
Old 11-11-2013, 06:27 AM   #14
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

I mentioned this over on biostars too, but the common cause of this (and the one that affected onyaw) is that the file names aren't actually characters. If you used cbind() at some point to create the sampleTable, then these are actually factors now, which won't work very well. If this is the case, I'll try to get the authors to clarify this in the vignette for the next update. If it affects more than one user in a week then it's probably a common issue.
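Until the vignette is updated, a small pre-flight check can catch this before DEXSeq is involved. The helper below, check_count_files(), is a hypothetical name and not part of DEXSeq; it is only a sketch that tests the type of the file names and whether each file exists, using sample names in a temporary directory:

```r
# Hypothetical helper (not part of DEXSeq): validate count file names
check_count_files <- function(files, dir = ".") {
    if (!is.character(files)) {
        stop("count file names must be character, got ", class(files)[1])
    }
    not_found <- files[!file.exists(file.path(dir, files))]
    if (length(not_found) > 0) {
        stop("count files not found: ", paste(not_found, collapse = ", "))
    }
    invisible(TRUE)
}

# Demo in a temporary directory with two empty sample files
d <- tempfile("counts_")
dir.create(d)
created <- file.create(file.path(d, c("E1.count", "F1.count")))

ok <- check_count_files(c("E1.count", "F1.count"), d)
err_factor  <- tryCatch(check_count_files(factor("E1.count"), d),
                        error = conditionMessage)
err_missing <- tryCatch(check_count_files("F1.counts", d),
                        error = conditionMessage)
print(c(err_factor, err_missing))
```

The second check is there because a name in the table that does not match a file on disk fails at the same read step, just with a different message.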
Old 11-12-2013, 01:05 AM   #15
areyes
Senior Member
 
Location: Heidelberg

Join Date: Aug 2010
Posts: 165

Thanks for pointing this out! It indeed needed to be corrected and clarified in DEXSeq.

I have added a check to the function that verifies that the count files are all characters. I have also changed the vignette to specify an "as.character" for the count files specified in the data.frame, e.g.:

Code:
> ecs <- read.HTSeqCounts(
+ as.character( sampleTable$countFile ),
+ sampleTable,
+ "Dmel_flattenend.gff" )
Old 11-12-2013, 01:27 AM   #16
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Thanks Alejandro, you saved me an email to you! I expect this will save a lot of people some grief!
Old 11-12-2013, 02:21 AM   #17
nbahlis
Member
 
Location: Canada

Join Date: May 2013
Posts: 25
Error with DEXSeq

Hi Alejandro and dpryan,

I did use as.character and I am getting the following error:
Error: all(unlist(lapply(design, class)) == "factor") is not TRUE

Any help or advice is greatly appreciated. Here's what I am doing:

> Table <- data.frame(
+ row.names = c( "P110", "P124", "P149", "P185", "P189", "P192", "P218", "P227", "P235", "P280", "P308", "P351", "P357", "P367", "P377", "P384", "P426", "P543", "P584", "P590", "P594", "P610" ),
+ countFile = c( "P110.counts", "P124.counts", "P149.counts", "P185.counts", "P189.counts", "P192.counts", "P218.counts", "P227.counts","P235.counts", "P280.counts", "P308.counts", "P351.counts", "P357.counts", "P367.counts", "P377.counts", "P384.counts", "P426.counts", "P543.counts", "P584.counts", "P590.counts", "P594.counts", "P610.counts" ),
+ condition = c( "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "post", "post", "post", "post", "post", "post", "post", "post", "post", "post", "post" ),
+ stringsAsFactors=FALSE)
> Table
       countFile condition
P110 P110.counts       pre
P124 P124.counts       pre
P149 P149.counts       pre
P185 P185.counts       pre
P189 P189.counts       pre
P192 P192.counts       pre
P218 P218.counts       pre
P227 P227.counts       pre
P235 P235.counts       pre
P280 P280.counts       pre
P308 P308.counts       pre
P351 P351.counts      post
P357 P357.counts      post
P367 P367.counts      post
P377 P377.counts      post
P384 P384.counts      post
P426 P426.counts      post
P543 P543.counts      post
P584 P584.counts      post
P590 P590.counts      post
P594 P594.counts      post
P610 P610.counts      post
> ecs <- read.HTSeqCounts(
+ as.character( Table$countFile ),
+ Table,
+ "GRCh37_E64_1kg.gff" )
Error: all(unlist(lapply(design, class)) == "factor") is not TRUE
> sapply(Table,class)
  countFile   condition
"character" "character"
>
Old 11-12-2013, 02:27 AM   #18
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Try the following instead:
Code:
Table <- data.frame(
    row.names = c( "P110", "P124", "P149", "P185", "P189", "P192", "P218", "P227", "P235", "P280", "P308", "P351", "P357", "P367", "P377", "P384", "P426", "P543", "P584", "P590", "P594", "P610" ),
    countFile = c( "P110.counts", "P124.counts", "P149.counts", "P185.counts", "P189.counts", "P192.counts", "P218.counts", "P227.counts","P235.counts", "P280.counts", "P308.counts", "P351.counts", "P357.counts", "P367.counts", "P377.counts", "P384.counts", "P426.counts", "P543.counts", "P584.counts", "P590.counts", "P594.counts", "P610.counts" ),
    condition = factor(c( "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "post", "post", "post", "post", "post", "post", "post", "post", "post", "post", "post" )),
stringsAsFactors=FALSE)
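The effect of wrapping condition in factor() is easiest to see by comparing column classes under both settings of stringsAsFactors (TRUE was the default in R versions before 4.0.0). A minimal sketch on a made-up two-sample table, where make_table() is just an illustrative wrapper:

```r
# Illustrative wrapper: same table, built with either setting
make_table <- function(strings_as_factors) {
    data.frame(
        row.names = c("P1", "P2"),
        countFile = c("P1.counts", "P2.counts"),
        condition = factor(c("pre", "post")),
        stringsAsFactors = strings_as_factors)
}

classes_false <- sapply(make_table(FALSE), class)
classes_true  <- sapply(make_table(TRUE), class)

print(classes_false)  # countFile stays character, condition is a factor
print(classes_true)   # both columns are factors
```

With FALSE, the file names stay character and the explicit factor() keeps the design column valid. With TRUE, the file names become factors and need as.character() at the call site.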
Old 11-12-2013, 02:32 AM   #19
nbahlis
Member
 
Location: Canada

Join Date: May 2013
Posts: 25

It worked now! I just didn't have to specify "stringsAsFactors". Without it, it worked fine. Thank you Alejandro and Ryan for your great work.
Old 11-13-2013, 10:06 AM   #20
thanhhoang
Member
 
Location: Ohio, USA

Join Date: Jul 2013
Posts: 16

Hi guys,
I followed your instructions and it's working well now. You saved my day! Thank you very much.
I just have one small issue now. When I ran it on BioLinux with 16 cores, this error showed up:
> library("parallel")
> ecs <- estimateDispersions( ecs, nCores=16)
> ecs <- fitDispersionFunction( ecs )
Error in fitDispersionFunction(ecs) :
no CR dispersion estimations found, please first call estimateDispersions function

I searched for a solution in some forums, and it seems to be due to an old R version.
Could anyone help clarify that?
When I run it on my own 4-core Mac laptop it works fine, but ecs <- testForDEU( ecs, nCores=4) runs really slowly.
Thank you very much.
Thanh
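When comparing the two machines, it may help to record the R version and the number of cores R actually detects on each. The sketch below uses only base R and the parallel package; it does not diagnose the dispersion error itself, and the requested value of 16 is simply the number from the call above:

```r
library(parallel)

requested <- 16L             # the value passed as nCores above
available <- detectCores()   # can return NA on some platforms
if (is.na(available)) available <- 1L

# Never ask for more workers than the machine reports
n_cores <- min(requested, available)

cat("R version:", as.character(getRversion()), "\n")
cat("Using", n_cores, "of", available, "detected cores\n")
```

Posting that output alongside sessionInfo() from both machines would make it easier for others to tell whether the R version is really the difference.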
All times are GMT -8.