SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   Cuffmerge error: loading reference annotation failed? (http://seqanswers.com/forums/showthread.php?t=19435)

Kcornelius 04-23-2012 09:45 AM

Cuffmerge error: loading reference annotation failed?
 
Hi,

I am trying to run cuffmerge for quite some time now but so far I always get the same message:

Code:

[Thu Apr 19 18:41:47 2012] Beginning transcriptome assembly merge
-------------------------------------------

[Thu Apr 19 18:41:47 2012] Preparing output location ./merged_asm/
[Thu Apr 19 18:41:55 2012] Converting GTF files to SAM
[18:41:55] Loading reference annotation.
[18:42:01] Loading reference annotation.
[Thu Apr 19 18:42:13 2012] Quantitating transcripts
You are using Cufflinks v1.3.0, which is the most recent release.
Command line:
cufflinks -o ./merged_asm/ -F 0.05 -g Homo_sapiens.GRCh37.62.filter1.sorted.gtf -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 12 ./merged_asm/tmp/mergeSam_file54ILK4
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File ./merged_asm/tmp/mergeSam_file54ILK4 doesn't appear to be a valid BAM file, trying SAM...
[18:42:13] Loading reference annotation.
        [FAILED]
Error: could not execute cufflinks

So, reading in the forum here I applied two strategies to try to solve the error:

I applied the change in the python script for the header_for_chrom_info function as mentioned by damiankao in another threat:
http://seqanswers.com/forums/showthread.php?t=11694

Furthermore, I sorted the reference GTF, so that they have all the same order as the merged SAM file.

I followed so far the tuxedo protocol given by Cole Trapnell et al., my version is:

cufflinks-1.3.0.

I guess that I am still missing something but I wonder if there is a script now that works for reference gtf files ( In my case Homo_sapiens.GRCh37.62.gtf where I changed the Chromosome IDs to chr1, chr 2, ... chrM, chrX, chrY.

Here is the header of the mergeSam file:

Code:

@HD    VN:1.0  SO:coordinate
@SQ    SN:chr1 LN:    249231242
@SQ    SN:chr10        LN:    135516024
@SQ    SN:chr11        LN:    134945793
@SQ    SN:chr12        LN:    133815135
@SQ    SN:chr13        LN:    115099423
@SQ    SN:chr14        LN:    107288019
@SQ    SN:chr15        LN:    102519298
@SQ    SN:chr16        LN:    90237391
@SQ    SN:chr17        LN:    81188573
@SQ    SN:chr18        LN:    78005429
@SQ    SN:chr19        LN:    59111168
@SQ    SN:chr2 LN:    243102304
@SQ    SN:chr20        LN:    62959443
@SQ    SN:chr21        LN:    48111157
@SQ    SN:chr22        LN:    51239737
@SQ    SN:chr3 LN:    197955247
@SQ    SN:chr4 LN:    191013476
@SQ    SN:chr5 LN:    180899431
@SQ    SN:chr6 LN:    171055065
@SQ    SN:chr7 LN:    159026067
@SQ    SN:chr8 LN:    146281416
@SQ    SN:chr9 LN:    141150148
@SQ    SN:chrM LN:    16023
@SQ    SN:chrX LN:    155257848
@SQ    SN:chrY LN:    59002251
@PG    ID:cuffmerge    VN:1.0.0

So, I wonder if it is possible, given a bowtie index with chr1, chr2. etc. entries instead of redoing the bowtie index with the ensemble identifiers. Would be great if someone would know the missing link;)

Marc

Kcornelius 04-23-2012 11:02 AM

no luck with cuffdiff either
 
So, when using cuffdiff and say :

cuffdiff -N -o diff_out -p 1 -L C1,C2 -u Homo_sapiens.GRCh37.62.gtf ~/s_1_sequence_gtf_tophat_accepted_hits.bam ~/s_2_sequence_gtf_tophat_accepted_hits.bam

I got the error:
[11:29:30] Loading reference annotation.
/opt/sge-6_2u4/default/spool/sunnode30/job_scripts/86547: line 12: 16032 Segmentation fault (core dumped) cuffdiff -N -o diff_out -p $NSLOTS -L C1,C2 -u Homo_sapiens.GRCh37.62.filter1.sorted3.gtf ~/s_1_sequence_gtf_tophat_accepted_hits.bam ~/s_2_sequence_gtf_tophat_accepted_hits.bam

For some reasons, the problem always points down to my reference. Funny enough, when I leave out chrX chrY and chrM, it is running beyond the Loading reference step and probably cuffmerge would as well.
Could anyone provide me a reference for human, which worked, given that your bowtie index is:
bowtie-inspect hg19 --names
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr20
chr21
chr22
chrX
chrY
chrM

Marc

Kcornelius 04-23-2012 03:35 PM

ok, the cuttdiff part seems to run now for 2 sets, one having a gtf with only chrX chrY and chrM. The sorting with one gtf file was done according to the header file I got for cuffmerge but is not running? So what could go wrong...

Given the ensembl reference, I first substitute the chromosome IDs so that they correspond to the bowtie-index. Then I sort out each chromosome:

Code:

for i in 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 3 4 5 6 7 8 9 M X Y;
do
grep "^chr$i[[:blank:]]" Homo_sapiens.GRCh37.62.filter1.sorted.gtf > chr${i}_Homo_sapiens.GRCh37.62.filter3.sorted.gtf;
done

and then I combine in the "right " order with cat.

I am not sure yet how this could result into a corrupted gtf file.

sdriscoll 04-23-2012 04:16 PM

Seems like a very strange error. I've never had to resort GTF files in order to get them to work with Cufflinks/merge/diff. I'm surprised the GTF files is giving you this kind of trouble.

I almost always use GTF files from UCSC that I edit in the official gene names to make the DE output more readable. Aside from swapping in gene names I don't change anything. I'm also working with bowtie references I built from FASTA files downloaded from UCSC.

Kcornelius 04-24-2012 03:01 PM

Running now with the newer Homo_sapiens.GRCh37.66.gtf and it is fine so far.
I guess my old reference file was somehow corrupted.


All times are GMT -8. The time now is 08:19 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.