SEQanswers

11-21-2011, 06:41 AM   #1
peromhc
Senior Member
 
Location: Durham, NH

Join Date: Sep 2009
Posts: 108
Tophat -> Cufflinks: BAM header too large

I am getting this error message when running Cufflinks on a BAM file created by Tophat. Versions: Tophat 1.3.3, Cufflinks 1.1.0, Bowtie 0.12.7, and Samtools 0.1.18.

Tophat command:
Code:
/home/matthew/tophat-1.3.3/tophat -p 16 -r 195 -z pbzip2 --mate-std-dev 50 /media/hd2/tuco/bowtie.index/tuco7 \
/media/hd2/tuco/unshuff/sgatrim/nocontam/MDM01_index1_qualshuff.nohomo.1.fq /media/hd2/tuco/unshuff/sgatrim/nocontam/MDM01_index1_qualshuff.nohomo.2.fq
Cufflinks:
Code:
/home/matthew/cufflinks/cufflinks -p16 -u -o /media/hd2/tuco/tophat/406A/cuff \
-b /media/hd2/tuco/bowtie.index/tuco.fa --upper-quartile-norm --max-mle-iterations 20000 \
--num-importance-samples 10000 /media/hd2/tuco/tophat/406A/tophat_out/accepted_hits.bam
Tophat finishes without error, but Cufflinks does not:

You are using Cufflinks v1.1.0, which is the most recent release.
Warning: BAM header too large
File /media/hd2/tuco/tophat/406A/tophat_out/accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM...
[21:18:11] Inspecting reads and determining fragment length distribution.
SAM error on line 25873: CIGAR op has zero length
SAM error on line 26633: CIGAR op has zero length
...
11-21-2011, 06:48 AM   #2
peromhc
Senior Member
 
Location: Durham, NH

Join Date: Sep 2009
Posts: 108

One other thing I forgot to include above: the first few lines of the SAM file.

Code:
@HD	VN:1.0	SO:coordinate
@SQ	SN:10000084.208.674	LN:208
@SQ	SN:1000016.233.27383	LN:233
@SQ	SN:10000164.283.623	LN:283
@SQ	SN:10000188.1527.11468	LN:1527
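Since the header carries one @SQ line per reference sequence, its size grows with both the number of sequences and the length of their names. A minimal sketch for estimating that size follows. The function names are made up for illustration, and it only counts the @HD and @SQ lines shown above, so any other header lines (such as @PG) would add a little more.

```python
# Rough estimate of the plain-text header size produced by a set of
# reference sequences. Function names here are hypothetical helpers.

HD_LINE = "@HD\tVN:1.0\tSO:coordinate\n"


def sq_line_bytes(name, length):
    # One header line has the form: @SQ <tab> SN:<name> <tab> LN:<length> <newline>
    return len("@SQ\tSN:{}\tLN:{}\n".format(name, length).encode("ascii"))


def estimate_header_bytes(contigs, hd_line=HD_LINE):
    # contigs is an iterable of (name, length) pairs
    return len(hd_line.encode("ascii")) + sum(
        sq_line_bytes(name, length) for name, length in contigs
    )


# The four sequences from the header excerpt above
example_contigs = [
    ("10000084.208.674", 208),
    ("1000016.233.27383", 233),
    ("10000164.283.623", 283),
    ("10000188.1527.11468", 1527),
]
example_total = estimate_header_bytes(example_contigs)
```

With names of this length each @SQ line costs a little over 30 bytes, so a 4 MB header would be reached somewhere on the order of 130,000 sequences. Treat that as a ballpark figure only, since it depends entirely on the name lengths.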
01-04-2012, 10:40 AM   #3
jstjohn
Member
 
Location: San Francisco, CA

Join Date: Jun 2010
Posts: 35

Lame... I am having the same issue and it looks like no one has responded to you. Have you figured this out yourself yet? I am wondering, are you also using this on a highly fragmented de-novo assembly with a few hundred thousand contigs/scaffolds? Maybe cufflinks doesn't work when the assembly has a large number of fragments?
01-04-2012, 11:04 AM   #4
jstjohn
Member
 
Location: San Francisco, CA

Join Date: Jun 2010
Posts: 35
Maybe figured this out

Hello,
I noticed that the same general question was posted on Stack Exchange and didn't have an answer there either. To summarize, I modified the max header length variable in hits.cpp (line 731 in v1.3.0) to the following (it was 4 MB):

Code:
static const unsigned MAX_HEADER_LEN = 6 * 1024 * 1024; // 6 MB
After changing that, the program appears to be proceeding normally.
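
For anyone who wants to check whether a BAM file is near the limit before recompiling, here is a rough sketch that reads the header text length straight from the file. It relies on the BAM layout from the SAM/BAM specification (magic bytes "BAM\1" followed by a little-endian 32-bit l_text field) and on BGZF blocks being readable as ordinary gzip. It assumes that l_text is the quantity compared against MAX_HEADER_LEN and that the check trips at or above the limit; that is an assumption, not something confirmed from the Cufflinks source here. The function names are hypothetical.

```python
import gzip
import struct

# The 4 MB value quoted above; raise it to match a recompiled build.
LIMIT_BYTES = 4 * 1024 * 1024


def bam_header_text_length(path):
    """Return l_text, the byte length of the plain-text header in a BAM file."""
    # BGZF is a series of gzip members, so the gzip module can decompress it.
    with gzip.open(path, "rb") as handle:
        magic = handle.read(4)
        if magic != b"BAM\x01":
            raise ValueError("{} does not start with the BAM magic bytes".format(path))
        (l_text,) = struct.unpack("<i", handle.read(4))
    return l_text


def header_exceeds_limit(path, limit=LIMIT_BYTES):
    # Assumption: the parser rejects headers whose l_text is at or above the limit.
    return bam_header_text_length(path) >= limit
```

Calling header_exceeds_limit("accepted_hits.bam") on the TopHat output should then say whether the header is the likely culprit, given the assumptions above.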

To see my full previous post on this go to the stack exchange site:

http://biostar.stackexchange.com/que...ge/15971#15971

good luck!

Last edited by jstjohn; 01-04-2012 at 11:13 AM. Reason: Modified the link to point to my answer on biostar rather than the question.
02-12-2013, 07:12 PM   #5
ataraxia
Junior Member
 
Location: Australia

Join Date: Feb 2013
Posts: 7
Problems with warning "BAM header too large" using Cufflinks2 on Linux server

Hi jstjohn,
I am having a similar problem to yours when trying to run Cufflinks on my TopHat accepted_hits.bam output.


Here is the output of the log file:

Command line:
cufflinks -o /outfile_location -p 16 -g /gtf_file_location -v --no-update-check -u -b /ref_fasta_location --max-bundle-frags 1000000000 /accepted_hits.bam_location
Warning: BAM header too large
File accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM...
[17:10:06] Loading reference annotation.
GFF warning: merging adjacent/overlapping segments of ENSOANT00000031404 on Contig9854 (16061-16163, 16168-16239)
GFF warning: merging adjacent/overlapping segments of ENSOANT00000023588 on Contig9784 (7502-7816, 7821-8557)
GFF warning: merging adjacent/overlapping segments of ENSOANT00000023588 on Contig9784 (8582-8714, 8717-8998)


The genome I am using is very fragmented (i.e. it contains 200,000 contigs on top of the chromosomes) and the BAM header is around 5.5 MB. However, I read in the Cufflinks2 manual that: "The header size limit in Cufflinks' BAM parser used to have a 4 megabyte limit. This has been removed to allow Cufflinks to be used on assemblies with many contigs."

I have looked online for help with this issue, and some people have suggested changing the source code in the hits.cpp file (line 736: static const unsigned MAX_HEADER_LEN = 4 * 1024 * 1024; // 4 MB) for the Windows version, but there does not seem to be any equivalent file in the Linux version.

Any help will be greatly appreciated.
02-18-2013, 01:12 PM   #6
JonB
Member
 
Location: Norway

Join Date: Jan 2010
Posts: 83

Hi,

I have the same problem. I am using Cufflinks v2. Has anyone found a solution to this?
Unfortunately, I don't have the skills to change the Cufflinks source code...

Jon
02-18-2013, 03:16 PM   #7
ataraxia
Junior Member
 
Location: Australia

Join Date: Feb 2013
Posts: 7

Would anyone have a Cufflinks 2 build that they compiled themselves from source (modified to allow larger BAM headers) and would be willing to share? I would need one that runs on Linux x86_64.
03-29-2013, 06:58 AM   #8
qwsqe
Junior Member
 
Location: usa

Join Date: Jun 2010
Posts: 4

The "BAM header too large" problem is caused by the genome file that you used to build the bowtie[12] index. To resolve it, clean up the genome file by removing all scaffold sequences that do not appear in your GTF file.
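
A minimal sketch of that cleanup, assuming the sequence ID is the first whitespace-delimited token on each FASTA title line and that it matches column 1 of the GTF exactly. The function names are hypothetical. Note that the index would need rebuilding and the reads realigning against the filtered genome, and that anything mapping only to the dropped scaffolds is lost.

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names found in column 1 of a GTF file."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def filter_fasta(fasta_lines, keep):
    """Yield only the FASTA records whose ID is in the set 'keep'."""
    keeping = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # ID is the first token of the title; empty titles are dropped
            keeping = bool(fields) and fields[0] in keep
        if keeping:
            yield line
```

Usage would look something like opening the GTF and the genome FASTA, then writing the lines from filter_fasta(fasta_handle, gtf_seqnames(gtf_handle)) to a new file.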
04-14-2013, 05:53 PM   #9
pengchy
Senior Member
 
Location: China

Join Date: Feb 2009
Posts: 116

One alternative is to use a pseudochromosome in place of the fragmented scaffolds when running tophat/cufflinks/cuffdiff.
I don't know whether this is feasible, or whether it would affect the downstream expression calculation and differential expression measurement.

Is it worth a try?
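
To make the idea concrete, here is a rough sketch of one way a pseudochromosome could be built: join the scaffolds with runs of N and record where each one lands, so that annotation coordinates can be shifted to match. The spacer length of 100 and the function names are arbitrary choices for illustration, and whether the downstream expression estimates are affected remains an open question, as noted above.

```python
def build_pseudochromosome(scaffolds, spacer_len=100):
    """Join (name, sequence) pairs into one sequence separated by N spacers.

    Returns the joined sequence and a dict mapping each scaffold name to its
    (start, end) position on the pseudochromosome, 0-based and half-open.
    """
    spacer = "N" * spacer_len
    parts = []
    offsets = {}
    pos = 0
    for index, (name, seq) in enumerate(scaffolds):
        if index > 0:
            parts.append(spacer)  # spacer keeps reads from spanning two scaffolds
            pos += spacer_len
        offsets[name] = (pos, pos + len(seq))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), offsets


def shift_gtf_coords(name, start, end, offsets):
    """Translate 1-based inclusive GTF coordinates onto the pseudochromosome."""
    base = offsets[name][0]
    return start + base, end + base
```

The header would then hold a single @SQ line instead of one per scaffold, at the cost of having to convert the GTF (and any results) between the two coordinate systems.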
08-06-2013, 02:29 PM   #10
danjg
Junior Member
 
Location: Nebraska

Join Date: Jun 2011
Posts: 4

I also had this problem on a shared machine where I couldn't recompile code. My transcriptome was pretty poorly assembled, so filtering out low sequence reads got the header size to 4.1 MB. I then used sed to strip regex-matched text (like Genus_sp) from the FASTA titles in the header, which brought the header size down to 3.9 MB. After I reheadered the accepted_hits.bam file with the truncated titles, Cufflinks ran it just fine...
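
The same renaming could be sketched in Python instead of sed. This assumes the header text has been dumped with something like samtools view -H and will be written back with samtools reheader; the post does not say which commands were actually used. The function name and the Genus_sp_contig names in the usage note are made up for illustration.

```python
import re


def shorten_sq_names(header_text, pattern, replacement=""):
    """Apply a regex substitution to the SN: field of every @SQ header line."""
    regex = re.compile(pattern)
    seen = set()
    out = []
    for line in header_text.splitlines(keepends=True):
        if line.startswith("@SQ"):
            fields = line.rstrip("\n").split("\t")
            for i, field in enumerate(fields):
                if field.startswith("SN:"):
                    new_name = regex.sub(replacement, field[3:])
                    if new_name in seen:
                        # Shortened names must stay unique within the header
                        raise ValueError("duplicate name after renaming: " + new_name)
                    seen.add(new_name)
                    fields[i] = "SN:" + new_name
            line = "\t".join(fields) + "\n"
        out.append(line)
    return "".join(out)
```

For example, shorten_sq_names(header, r"^Genus_sp_") would turn a hypothetical Genus_sp_contig1 into contig1, saving 9 bytes per @SQ line. The reference FASTA passed with -b and any GTF would need the same renaming so the names still match.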
Tags
bowtie, cufflinks, samtools, tophat
