SEQanswers

10-10-2013, 08:30 AM   #1
Tomnl
Junior Member
 
Location: Leicester

Join Date: Jun 2013
Posts: 6
rDiff - error while getting gene expression

Hi everybody,

I was wondering if anybody could help with an error message I am receiving for the differential isoform analysis software rDiff.

http://cbio.mskcc.org/public/raetsch...r/drewe/rdiff/

The error message is:
Getting gene expression for: /NGS/users/Thomas/rDiff//wt_7.bam
error: convert_reads_to_region_indicators: A(I,J): column index out of bounds; value 11054 out of bound 7483
error: called from:
error: /NGS/Software/rDiff-master/src/tools/convert_reads_to_region_indicators.m at line 16, column 19
error: /NGS/Software/rDiff-master/src/get_reads_caller.m at line 71, column 48
error: /NGS/Software/rDiff-master/src/get_read_counts.m at line 32, column 17
error: /NGS/Software/rDiff-master/src/rdiff.m at line 38, column 5
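One way to read this message, assuming (not confirmed from rDiff's source) that convert_reads_to_region_indicators builds one column per base of the annotated gene span: a read or exon that extends past the gene's annotated end yields a column number larger than the matrix width. The hypothetical helper below mimics that 1-based indexing, with made-up coordinates chosen only to reproduce the numbers in the error.

```python
def to_region_column(read_pos: int, gene_start: int, gene_end: int) -> int:
    """Map a 1-based genomic position to a 1-based column in the gene span.

    Hypothetical helper: assumes one column per base from gene_start to
    gene_end inclusive, mirroring Octave's 1-based indexing.
    """
    n_cols = gene_end - gene_start + 1
    col = read_pos - gene_start + 1
    if not 1 <= col <= n_cols:
        raise IndexError(
            f"column index out of bounds; value {col} out of bound {n_cols}"
        )
    return col


# Made-up gene spanning 1000..8482 (7483 bases); a read at 12053 lies past its end.
try:
    to_region_column(12053, 1000, 8482)
except IndexError as err:
    print(err)  # column index out of bounds; value 11054 out of bound 7483
```

If this reading is right, the fix lies in the annotation (gene coordinates that do not cover their exons) rather than in the BAM files.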



Any help would be greatly appreciated.

Additional info:

The command given was as follows:
./rdiff -o output/ -d files/ -a wt_7.bam,wt_8.bam,wt_203.bam -b mut_2.bam,mut_201.bam,mut_204.bam -g genes_mm10.gff3 -m param -L 51 -m 30

The same error occurs with both the param and the nonparam method.

The bam files were generated by TopHat.

The .gff3 file was generated by converting the .gtf file provided by TopHat (http://tophat.cbcb.umd.edu/igenomes.shtml) for Mus musculus (NCBI).
10-10-2013, 03:06 PM   #2
philippd
Junior Member
 
Location: New York

Join Date: Oct 2012
Posts: 4

The command seems right, although the path to the BAM file looks strange. Is the BAM file located at /NGS/users/Thomas/rDiff//wt_7.bam?

Could you also send me the complete output of the rDiff run, as well as the first 1000 lines of your GFF3 file (or the part where you believe the problem is)?
10-11-2013, 12:45 AM   #3
Tomnl
Junior Member
 
Location: Leicester

Join Date: Jun 2013
Posts: 6

Hi philippd,

Thank you for the quick response.

I have attached a text file of the output, the first 1000 lines of the GFF3 file and the first 1000 lines of one of the BAM files. I have also attached the output of the first example (used in the make example command) in case that may be of any use.

I notice that the example BAM files contain no QUAL or SEQ strings. Do the BAM files need to be processed in any specific way?
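To check a BAM file for this, it can first be converted to SAM text (for example with `samtools view`), after which SEQ and QUAL are the 10th and 11th tab-separated fields of each alignment line; the SAM specification uses `*` when either is not stored. The sketch below (seq_qual_present is a hypothetical name) only reports whether they are present; whether rDiff actually needs them is not established in this thread.

```python
from typing import Tuple


def seq_qual_present(sam_line: str) -> Tuple[bool, bool]:
    """Return (SEQ stored, QUAL stored) for one SAM alignment line.

    SEQ and QUAL are mandatory fields 10 and 11; '*' means "not stored".
    Header lines (starting with '@') should be skipped before calling this.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated SAM fields")
    return fields[9] != "*", fields[10] != "*"


# A made-up alignment line with SEQ and QUAL omitted, as in the example BAMs.
line = "read1\t0\tchr1\t100\t50\t5M\t*\t0\t0\t*\t*"
print(seq_qual_present(line))  # (False, False)
```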

The BAM location /NGS/users/Thomas/rDiff//wt_7.bam is correct, except that the // should be a single /.
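The doubled slash most likely comes from joining the -d directory, which was given with a trailing slash, to the file name with one more "/" (an assumption about how rDiff builds the path). On POSIX systems consecutive slashes inside a path are treated as a single one, so the doubled slash by itself should not prevent the file from being read. A small Python illustration:

```python
import posixpath

data_dir = "/NGS/users/Thomas/rDiff/"  # value passed to -d (note trailing slash)
bam = "wt_7.bam"

naive = data_dir + "/" + bam            # plain concatenation doubles the slash
joined = posixpath.join(data_dir, bam)  # join adds "/" only when needed

print(naive)                      # /NGS/users/Thomas/rDiff//wt_7.bam
print(posixpath.normpath(naive))  # /NGS/users/Thomas/rDiff/wt_7.bam
print(joined)                     # /NGS/users/Thomas/rDiff/wt_7.bam
```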

The command I showed previously was shortened. The full command is shown below:
/NGS/Software/rDiff-master/bin/rdiff -o /NGS/users/Thomas/rDiff/output/ -d /NGS/users/Thomas/rDiff/ -a wt_7.bam,wt_8.bam,wt_203.bam -b mut_2.bam,mut_201.bam,mut_204.bam -g /NGS/users/Thomas/Transcripts/genes_mm10.gff3 -m param -L 51 -m 30


I look forward to hearing your reply.

Kind regards

Tom
Attached Files
File Type: gz rDiff_troubleshoot.tar.gz (58.0 KB, 1 views)
10-11-2013, 01:34 PM   #4
philippd
Junior Member
 
Location: New York

Join Date: Oct 2012
Posts: 4

Hi Tom,

I think that your GFF3 file is not formatted correctly. I saw that sometimes the exon and mRNA coordinates lie outside the gene coordinates (e.g. for some genes the exons end after the gene), which should normally not happen.
What you could do is either download a GFF3 file where this is not the case, or replace, for each gene, the start and stop with the smallest and largest exon positions, respectively.
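The second option can be sketched in Python. This is a minimal sketch, not a tested converter: it assumes a simple gene/mRNA/exon hierarchy linked by ID and Parent attributes, with a single Parent per feature, and it matches the literal type names gene, mRNA and exon, which may differ between GFF3 sources. Because mRNA coordinates were also reported to fall outside the gene, it resets mRNA spans too. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_attrs(col9: str) -> Dict[str, str]:
    """Parse GFF3 column 9 (key=value pairs separated by ';')."""
    attrs = {}
    for part in col9.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = value
    return attrs


def fix_gene_spans(lines: List[str]) -> List[str]:
    """Set each gene's and mRNA's start/end to the min/max of its exons."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    parent_of: Dict[str, str] = {}           # mRNA ID -> gene ID
    span: Dict[str, Tuple[int, int]] = {}    # feature ID -> (min start, max end)

    # Pass 1a: remember which gene each mRNA belongs to.
    for f in rows:
        if len(f) == 9 and f[2] == "mRNA":
            attrs = parse_attrs(f[8])
            if "ID" in attrs and "Parent" in attrs:
                parent_of[attrs["ID"]] = attrs["Parent"]

    # Pass 1b: grow the span of each exon's mRNA and of that mRNA's gene.
    for f in rows:
        if len(f) != 9 or f[2] != "exon":
            continue
        mrna = parse_attrs(f[8]).get("Parent")
        if mrna is None:
            continue
        start, end = int(f[3]), int(f[4])
        for feat in (mrna, parent_of.get(mrna)):
            if feat is None:
                continue
            lo, hi = span.get(feat, (start, end))
            span[feat] = (min(lo, start), max(hi, end))

    # Pass 2: rewrite columns 4 and 5 of gene and mRNA lines; keep the rest.
    out = []
    for line, f in zip(lines, rows):
        if len(f) == 9 and f[2] in ("gene", "mRNA"):
            fid = parse_attrs(f[8]).get("ID")
            if fid in span:
                lo, hi = span[fid]
                out.append("\t".join(f[:3] + [str(lo), str(hi)] + f[5:]))
                continue
        out.append(line.rstrip("\n"))
    return out


example = [
    "chr1\tsrc\tgene\t100\t500\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\tsrc\texon\t120\t200\t.\t+\t.\tID=e1;Parent=t1",
    "chr1\tsrc\texon\t400\t650\t.\t+\t.\tID=e2;Parent=t1",
]
for row in fix_gene_spans(example):
    print(row)  # gene and mRNA now span 120..650
```

For a real file, the lines can come from `open(path).readlines()`. Note that this also shrinks a gene whose annotated span is wider than its exons, which is what replacing start/stop with the exon extremes implies.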

Kind regards,
Philipp
10-15-2013, 04:12 AM   #5
Tomnl
Junior Member
 
Location: Leicester

Join Date: Jun 2013
Posts: 6

Hi Philipp

Thanks again for your reply. I have tried a number of different GFF3 files and several GTF2/GFF3 converters (see below), but still no luck.

Would you recommend any specific GFF3/GTF files for the mm10 mouse genome?

I have used the following GFF3 files:

ftp://ftp.ncbi.nlm.nih.gov/genomes/M..._level.gff3.gz

ftp://ftp.ncbi.nlm.nih.gov/genomes/M...ffolds.gff3.gz


I have used the following GTF2/GFF3 converters:

The GFF toolkit from the MSKCC Galaxy web server linked from the rDiff website https://galaxy.cbio.mskcc.org/

The python script which comes with SpliceGrapher-0.2.2 (gtf2gff.py)

The gffread tool which comes with cufflinks

I used the converters with the following GTF files:

NCBI GTF file provided by Cufflinks http://cufflinks.cbcb.umd.edu/igenomes.html
Ensembl genes downloaded from UCSC http://genome.ucsc.edu/cgi-bin/hgTab...mblGenes.fasta

I have attached a file containing the error messages from some of these attempts.

I have tried to avoid editing the GFF file to replace each gene's start and stop with the smallest and largest exon positions, respectively, since needing to do so suggests that the GFF file itself is not correct. If there is no other option, though, that is what I will do.

I should note that when I use the GFF toolkit conversion tool, I always get exon and mRNA coordinates that lie outside the gene coordinates. When I use gffread for the conversion, I get the following rDiff error: "child may be mapped to multiple parents ex: Parent=AT01,AT01-1-Protein."
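GFF3 allows a feature to list several parents as a comma-separated Parent value, and that is what this rDiff message points at. A minimal sketch for locating such features so they can be inspected (multi_parent_features is a hypothetical name; how to resolve the duplicates is not settled in this thread):

```python
from typing import Iterable, Iterator, List, Tuple


def multi_parent_features(
    lines: Iterable[str],
) -> Iterator[Tuple[int, str, List[str]]]:
    """Yield (1-based line number, feature type, parent IDs) for every feature
    whose Parent attribute lists more than one comma-separated ID."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a feature line
        for part in fields[8].split(";"):
            key, _, value = part.strip().partition("=")
            if key == "Parent":
                parents = value.split(",")
                if len(parents) > 1:
                    yield lineno, fields[2], parents


# Made-up records modelled on the IDs in the error message.
example = [
    "##gff-version 3",
    "chr1\tsrc\texon\t1\t10\t.\t+\t.\tID=e1;Parent=AT01,AT01-1-Protein",
    "chr1\tsrc\texon\t20\t30\t.\t+\t.\tID=e2;Parent=AT01",
]
for hit in multi_parent_features(example):
    print(hit)  # (2, 'exon', ['AT01', 'AT01-1-Protein'])
```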

Kind regards

Tom
Attached Files
File Type: txt rDiff_error_logs.txt (7.3 KB, 6 views)
10-22-2013, 01:30 PM   #6
vipints
Junior Member
 
Location: Germany

Join Date: Dec 2009
Posts: 1

Hello Tom,

Can you please post the first 5 lines (uncommented) from the GTF/GFF file?
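For reference, pulling the first few non-comment lines out of a GTF/GFF file, skipping `#` comment and directive lines as well as blank lines, takes only a few lines of Python (first_feature_lines is an illustrative name, not part of any tool mentioned here):

```python
from itertools import islice
from typing import Iterable, List


def first_feature_lines(lines: Iterable[str], n: int = 5) -> List[str]:
    """Return the first n non-blank lines that do not start with '#'."""
    data = (line.rstrip("\n") for line in lines
            if line.strip() and not line.startswith("#"))
    return list(islice(data, n))


example = [
    "##gff-version 3\n",
    "#!genome-build mm10\n",
    "chr1\tsrc\tgene\t1\t9\t.\t+\t.\tID=g1\n",
]
print(first_feature_lines(example))  # ['chr1\tsrc\tgene\t1\t9\t.\t+\t.\tID=g1']
```

An open file handle can be passed in place of the list, since the function only iterates over lines.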

Thanks, Vipin
10-29-2013, 10:56 AM   #7
Tomnl
Junior Member
 
Location: Leicester

Join Date: Jun 2013
Posts: 6

Hi Vipin

Sorry for the delay in replying. I have attached a file of the first few lines of the GTF/GFF files I have used.

I should mention that I have managed to get the program to run successfully using very small test files (~50 KB per BAM file). This was with the Ensembl GTF downloaded from UCSC, converted to GFF with the GFF converter.

When I try larger files, e.g. over 2 GB per BAM file, I get the following error message:
error: memory exhausted or requested size too large for range of Octave's index type -- eval failed
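The second half of that message refers to Octave's index type. Assuming an Octave build with 32-bit indexing (the usual default for releases of that period), a single array cannot hold more than 2^31 - 1 elements, regardless of how much RAM is available. If rDiff builds a dense reads-by-regions matrix (an assumption suggested only by the function name convert_reads_to_region_indicators, not verified), its size grows with the number of reads, so a large BAM file could exceed that limit. A back-of-the-envelope check with hypothetical numbers (check_dense_matrix is an illustrative name):

```python
from typing import Tuple

# Largest element count addressable with a signed 32-bit index
# (assumption: a 32-bit-index Octave build).
OCTAVE_32BIT_MAX_NUMEL = 2**31 - 1


def check_dense_matrix(rows: int, cols: int,
                       bytes_per_elem: int = 8) -> Tuple[bool, int]:
    """Return (fits the 32-bit index range, bytes needed for a dense matrix).

    bytes_per_elem defaults to 8, the size of an Octave double.
    """
    numel = rows * cols
    return numel <= OCTAVE_32BIT_MAX_NUMEL, numel * bytes_per_elem


# Hypothetical: 20 million reads against 200 regions.
print(check_dense_matrix(20_000_000, 200))  # (False, 32000000000)
```

A sparse or logical matrix would need far less memory than this dense estimate, but the element-count limit on indexing still applies to its dimensions.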

Best regards
Attached Files
File Type: txt gtf_gff.txt (5.4 KB, 5 views)

Tags
differential analysis, differential splicing, isoforms, rdiff, rna-seq

All times are GMT -8.

