SEQanswers

Old 07-17-2011, 12:04 PM   #1
fabrice
Member
 
Location: paris

Join Date: Oct 2009
Posts: 86
cufflinks, Warning: Skipping large bundle.

Hi All,
When I run cufflinks:

cufflinks-1.0.3.Linux_x86_64/cufflinks -p 4 -I 5000000 -G genome_index/annotation/Homo_sapiens.GRCh37.63/Homo_sapiens.GRCh37.63.gtf --output-dir mapping/7124 mapping/7124/accepted_hits.bam

I always get this warning:

Warning: Skipping large bundle.

What does this mean?

Thank you very much.
Old 08-14-2011, 12:55 PM   #2
Joseph Dougherty
Junior Member
 
Location: United States

Join Date: Aug 2011
Posts: 2

Hi Fabrice,

Did you ever hear anything back on this? I get the same warning. It seems to occur during the steps involved in running multi-read correction, which it does not appear you are doing.

I am running:

cufflinks -I 500000 -p 3 -b /srv/cgs/data/jdougherty/indexes/mm9_mcherry.fa -u -g ../../mm9_flat_dsred.gtf accepted_hits.bam


and in the output I get:


[05:56:12] Inspecting reads and determining fragment length distribution.
> Processed 106047 loci. [*************************] 100%
> Map Properties:
> Total Map Mass: 5751822.83
> Number of Multi-Reads: 640779 (with 3399112 total hits)
> Read Type: 104bp x 104bp
> Fragment Length Distribution: Empirical (learned)
> Estimated Mean: 126.83
> Estimated Std Dev: 21.21
[05:58:36] Assembling transcripts and initializing abundances for multi-read correction.
> Processing Locus chr14:3030401-3030502 [******* ] 28%
chr14:3032091-7220792 Warning: Skipping large bundle.
> Processed 106046 loci. [*************************] 100%
[06:11:45] Loading reference annotation and sequence.
[06:12:12] Learning bias parameters.
> Processed 22579 loci. [*************************] 100%
[06:13:53] Re-estimating abundances with bias and multi-read correction.
> Processed 22579 loci. [*************************] 100%

real 25m33.584s
user 62m53.950s
sys 0m44.890s
Finished: Sat Aug 13 06:21:42 CDT 2011

Any idea of what this is?

Thanks
Joe
Old 08-15-2011, 02:54 AM   #3
fabrice
Member
 
Location: paris

Join Date: Oct 2009
Posts: 86

Joe,

I have not gotten an answer.
Old 08-16-2011, 02:20 PM   #4
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,167

Joe & Fabrice,

Cufflinks groups overlapping reads into what it refers to as 'bundles', the assumption being that each of these bundles represents a gene locus. It then processes each of the bundles separately to assemble a gene model. If the length of genome spanned by all the reads in a bundle is too large (larger than reasonably expected for a gene) cufflinks will not attempt to process that bundle further and will move on. When this happens it produces the warning message you see. No models will be built from this group of aligned reads nor any expression values reported.

The default length which triggers this skipping is 3.5 million base pairs. In Joe's example the bundle which was skipped spanned chr14 from 3032091-7220792 which is ~4.2 million bp. You can increase (or decrease) the maximum bundle length by passing the "--max-bundle-length <int>" parameter to cufflinks. <int> can be any integer >= 1.
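To make the arithmetic above concrete, here is a minimal Python sketch (not part of Cufflinks) that parses a "Skipping large bundle" warning line and compares the span with the 3,500,000 bp default mentioned in this post. The function name and the regular expression are illustrative assumptions, and the span is simply taken as end minus start; how Cufflinks measures bundle length internally is not confirmed here.

```python
import re

# Default threshold as stated in this thread (3.5 million bp).
DEFAULT_MAX_BUNDLE_LENGTH = 3_500_000

# Matches lines such as "chr14:3032091-7220792 Warning: Skipping large bundle."
WARNING_RE = re.compile(r"^\s*(\S+):(\d+)-(\d+)\s+Warning: Skipping large bundle\.")


def skipped_bundle_span(line):
    """Return (chrom, start, end, span_bp) for a warning line, or None otherwise.

    Hypothetical helper: the span is computed as end - start (an assumption).
    """
    m = WARNING_RE.match(line)
    if m is None:
        return None
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    return chrom, start, end, end - start


line = "chr14:3032091-7220792 Warning: Skipping large bundle."
chrom, start, end, span = skipped_bundle_span(line)
print(f"{chrom}: {span:,} bp, exceeds default: {span > DEFAULT_MAX_BUNDLE_LENGTH}")
```

For Joe's locus this gives a span of 4,188,701 bp, which is the ~4.2 million bp quoted above and is larger than the default.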
Old 08-16-2011, 02:39 PM   #5
fabrice
Member
 
Location: paris

Join Date: Oct 2009
Posts: 86

kmcarr,

Thank you for your reply.

If I set the --max-bundle-length value too large, will that cause problems?


> Processing Locus 1:11868-31109 [ ] 0%
> Processing Locus 1:34553-36081
21:38435145-45747259 Warning: Skipping large bundle.

Here it is ~7.3 million bp.

Last edited by fabrice; 08-16-2011 at 03:10 PM.
Old 08-17-2011, 05:03 AM   #6
Joseph Dougherty
Junior Member
 
Location: United States

Join Date: Aug 2011
Posts: 2

Thanks much!
Old 08-17-2011, 05:11 AM   #7
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,167

Quote:
Originally Posted by fabrice View Post
kmcarr,

Thank you for your reply.

If I set the --max-bundle-length value too large, will that cause problems?


> Processing Locus 1:11868-31109 [ ] 0%
> Processing Locus 1:34553-36081
21:38435145-45747259 Warning: Skipping large bundle.

Here it is ~7.3 million bp.
The purpose of the --max-bundle-length parameter is to prevent cufflinks from trying to assemble a gene model from a read group spanning a genomic region which is clearly too large to represent a single gene. An appropriate value for this parameter is very much dependent upon the species you are working in. The default value of 3,500,000bp is (I believe) set to be appropriate for humans or other mammals. You could increase the size of this value but is it likely that a gene in your organism of interest would span 7.3 million bp? I can't answer that; this is where your knowledge of the organism you are studying comes into play.
Old 08-17-2011, 05:19 AM   #8
fabrice
Member
 
Location: paris

Join Date: Oct 2009
Posts: 86

I am working on human samples.

Quote:
Originally Posted by kmcarr View Post
The purpose of the --max-bundle-length parameter is to prevent cufflinks from trying to assemble a gene model from a read group spanning a genomic region which is clearly too large to represent a single gene. An appropriate value for this parameter is very much dependent upon the species you are working in. The default value of 3,500,000bp is (I believe) set to be appropriate for humans or other mammals. You could increase the size of this value but is it likely that a gene in your organism of interest would span 7.3 million bp? I can't answer that; this is where your knowledge of the organism you are studying comes into play.
Old 04-25-2012, 09:14 AM   #9
Kcornelius
Member
 
Location: Los Angeles

Join Date: Apr 2012
Posts: 14

I am working on human samples as well.
A colleague and I got two regions larger than 3.5 million bp:

chr21:38435145-45760353 Warning: Skipping large bundle.

chr6:126102278-130463972 Warning: Skipping large bundle.


So I think this is quite normal for human samples.

Marc
Old 07-31-2012, 01:12 PM   #10
caddymob
Member
 
Location: USA

Join Date: Apr 2009
Posts: 36

I consistently see it skipping these in humans:

21:38435145-45747259 Warning: Skipping large bundle.
6:126102306-130463972 Warning: Skipping large bundle.

The chr21 locus is huge and part of the Down syndrome critical region (DSCR). I run with --max-bundle-length 10000000 to get past this warning. It seems to work, calling FPKMs across the DSCR.
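As a rough check of that setting, the following Python sketch (again not part of Cufflinks) lists which of the two loci above would still exceed a given --max-bundle-length value. The helper names are hypothetical, and both the span (end minus start) and the comparison (span greater than threshold) are assumptions about how the limit is applied.

```python
def locus_span(locus):
    """Span in bp of a 'chrom:start-end' locus string, taken as end - start."""
    coords = locus.rsplit(":", 1)[1]
    start, end = (int(x) for x in coords.split("-"))
    return end - start


def still_skipped(loci, max_bundle_length):
    """Loci whose span exceeds the threshold (assumed rule: span > threshold)."""
    return [locus for locus in loci if locus_span(locus) > max_bundle_length]


# The two loci reported as skipped in this thread.
loci = ["21:38435145-45747259", "6:126102306-130463972"]

print(still_skipped(loci, 3_500_000))   # with the default reported in this thread
print(still_skipped(loci, 10_000_000))  # with the value used in this post
```

Under these assumptions both loci exceed the default (about 7.3 and 4.4 million bp), and neither exceeds 10,000,000.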
All times are GMT -8.