SEQanswers

Old 09-09-2014, 01:18 AM   #1
SS00
PhD Student
 
Location: Netherlands

Join Date: Jun 2012
Posts: 32
[bam_parse_region] fail to determine the sequence name

I'm trying to run a differential expression analysis using edgeR and/or DESeq2 on the Ratsch Lab Galaxy server, but I keep getting this error:



[bam_parse_region] fail to determine the sequence name


I mapped the groomed and filtered FASTQ files with TopHat2 against the mm10 reference, then ran the differential expression analysis on the TopHat2 BAM files with the UCSC genes.gtf file.



Does anyone know what the problem could be?
Old 09-09-2014, 01:33 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,476

At what point did it give you that error? As far as I can tell, there's no reason for that samtools function to be called at all when computing differential expression.
Old 09-09-2014, 02:33 AM   #3
SS00
PhD Student
 
Location: Netherlands

Join Date: Jun 2012
Posts: 32

From the edgeR log (and likewise the DESeq2 log), I can see that it goes through three steps:

% 1. Data preparation %

% 2. Read counting %

% 3. Differential testing %

and then it fails at step 3.

The [bam_parse_region] error appears as follows:

[bam_parse_region] fail to determine the sequence name.
Invalid region chrY_random:54420149-54423069
R script execution failed

The log file also ends with the following:

Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
duplicate 'row.names' are not allowed
Calls: rownames<- -> row.names<- -> row.names<-.data.frame
In addition: Warning message:
non-unique values when setting 'row.names': ‘4933409K07Rik’, ‘A430089I19Rik’, ‘AU018829’, ‘AY761185’, ‘BC002163’, ‘BC061212’, ‘Ccl21b’, ‘Ccl21c’, ‘Cngb1’, ‘E330014E10Rik’, ‘Gm10591’, ‘Gm13298’, ‘Gm13304’, ‘Gm13308’, ‘Gm15056’, ‘Gm15085’, ‘Gm15093’, ‘Gm16367’, ‘Gm1993’, ‘Gm3286’, ‘Gm5506’, ‘Gm5512’, ‘Gm5643’, ‘Gm5801’, ‘Gm6040’, ‘Gm6367’, ‘Mir138-2’, ‘Mir1906-1’, ‘Mir1906-2’, ‘Mir684-1’, ‘Obox2’, ‘Ott’, ‘Snord58b’, ‘Sult1c1’, ‘Tagap’, ‘Tff1’, ‘Tsnax’, ‘Ube1y1’, ‘Vmn1r186’, ‘Vmn1r187’, ‘Vmn1r62’, ‘Vmn1r63’
Execution halted

It's driving me crazy.

Thanks for your help!

Old 09-09-2014, 02:35 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,476

Did you write that Galaxy pipeline, or did someone else?
Old 09-09-2014, 02:39 AM   #5
SS00
PhD Student
 
Location: Netherlands

Join Date: Jun 2012
Posts: 32

Someone else. I use it on https://galaxy.cbio.mskcc.org/ under Differential/Quantitative Analysis.

I ran TopHat2 on the same server, downloaded the iGenomes UCSC genes.gtf, and ran DESeq2 and edgeR with the resulting BAM files and the downloaded GTF.
Old 09-09-2014, 02:51 AM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,476

Short answer: Whoever wrote that pipeline didn't know what they were doing.

Longer answer: It looks like you aligned to a genome that lacks chrY_random, whereas the annotation file includes it. That's what leads to the "[bam_parse_region]" error. That said, the error shouldn't come up at all in a correctly written pipeline, so I expect this one is performing the counting incorrectly. That is also what's causing the error at the end, when it tries to create a data frame. Basically, don't use that pipeline.

You have three options for moving forward:

(1) Use a different Galaxy pipeline. You already have the BAM files, so this is probably doable.
(2) Download those BAM files and do things correctly locally (this would require you to know how to analyze your dataset).
(3) Collaborate with a local bioinformatician. This is the best idea, since if you're using Galaxy you're probably new to this.

Galaxy is convenient, but if you're new to this kind of analysis then it's a black box that spits out results that may or may not be correct, and you'd likely have no way of knowing which.
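To make the mismatch concrete, here is a minimal Python sketch of how one might check for it before running any counting step. It assumes the BAM header has been dumped to text (for example with `samtools view -H`) and that the annotation is a standard 9-column GTF with `gene_id` attributes. The function names and the toy data are made up for illustration and are not part of the Galaxy pipeline. It reports contigs that are annotated but absent from the alignment header, and gene IDs annotated on more than one contig, which is one plausible (unconfirmed) source of the duplicate row.names.

```python
import re
from collections import defaultdict


def sam_header_contigs(header_text):
    """Collect sequence names (SN: fields) from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def gtf_gene_contigs(gtf_text):
    """Map each gene_id in a GTF to the set of contigs it is annotated on."""
    gene_contigs = defaultdict(set)
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # skip lines without an attribute column
        match = re.search(r'gene_id "([^"]+)"', cols[8])
        if match:
            gene_contigs[match.group(1)].add(cols[0])
    return gene_contigs


def compare_annotation_to_header(header_text, gtf_text):
    """Return (contigs missing from the header, genes found on several contigs)."""
    bam_contigs = sam_header_contigs(header_text)
    gene_contigs = gtf_gene_contigs(gtf_text)
    # Only contigs that carry at least one gene_id record are considered.
    gtf_contigs = set()
    for contigs in gene_contigs.values():
        gtf_contigs |= contigs
    missing = sorted(gtf_contigs - bam_contigs)
    multi = {g: sorted(c) for g, c in gene_contigs.items() if len(c) > 1}
    return missing, multi


if __name__ == "__main__":
    # Toy data: lengths, coordinates and the "demo" source are invented.
    header = "@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrY\tLN:1000\n"
    gtf = (
        'chrY\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "Ube1y1"; transcript_id "tx1";\n'
        'chrY_random\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "Ube1y1"; transcript_id "tx2";\n'
    )
    missing, multi = compare_annotation_to_header(header, gtf)
    print("Annotated contigs absent from the BAM header:", missing)
    print("Gene IDs annotated on more than one contig:", multi)
```

If the first list is non-empty, any region lookup on those contigs will fail against the BAM file, so the annotation would need to be restricted to contigs present in the alignment (or the reads realigned to a reference that includes them).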
Old 09-09-2014, 02:58 AM   #7
SS00
PhD Student
 
Location: Netherlands

Join Date: Jun 2012
Posts: 32

Thank you! That makes sense.

Yeah, I really have no experience with this kind of analysis, so I agree my best bet is to work with one of our bioinformaticians and get it sorted.