  • cufflinks, Warning: Skipping large bundle.

    Hi All,
    When I run cufflinks with

    cufflinks-1.0.3.Linux_x86_64/cufflinks -p 4 -I 5000000 -G genome_index/annotation/Homo_sapiens.GRCh37.63/Homo_sapiens.GRCh37.63.gtf --output-dir mapping/7124 mapping/7124/accepted_hits.bam

    I always get this warning:

    Warning: Skipping large bundle.

    What does this mean?

    Thank you very much.

  • #2
    Hi Fabrice,

    Did you ever hear anything back on this? I get the same warning. It seems to occur during the steps involved in running multi-read correction, which it does not appear you are doing.

    I am running:

    cufflinks -I 500000 -p 3 -b /srv/cgs/data/jdougherty/indexes/mm9_mcherry.fa -u -g ../../mm9_flat_dsred.gtf accepted_hits.bam


    and in the output I get:


    [05:56:12] Inspecting reads and determining fragment length distribution.
    > Processed 106047 loci. [*************************] 100%
    > Map Properties:
    > Total Map Mass: 5751822.83
    > Number of Multi-Reads: 640779 (with 3399112 total hits)
    > Read Type: 104bp x 104bp
    > Fragment Length Distribution: Empirical (learned)
    > Estimated Mean: 126.83
    > Estimated Std Dev: 21.21
    [05:58:36] Assembling transcripts and initializing abundances for multi-read correction.
    > Processing Locus chr14:3030401-3030502 [******* ] 28%
    chr14:3032091-7220792 Warning: Skipping large bundle.
    > Processed 106046 loci. [*************************] 100%
    [06:11:45] Loading reference annotation and sequence.
    [06:12:12] Learning bias parameters.
    > Processed 22579 loci. [*************************] 100%
    [06:13:53] Re-estimating abundances with bias and multi-read correction.
    > Processed 22579 loci. [*************************] 100%

    real 25m33.584s
    user 62m53.950s
    sys 0m44.890s
    Finished: Sat Aug 13 06:21:42 CDT 2011

    Any idea of what this is?

    Thanks
    Joe



    • #3
      Joe,

      I have not received an answer.



      • #4
        Joe & Fabrice,

        Cufflinks groups overlapping reads into what it refers to as 'bundles', the assumption being that each of these bundles represents a gene locus. It then processes each of the bundles separately to assemble a gene model. If the length of genome spanned by all the reads in a bundle is too large (larger than reasonably expected for a gene) cufflinks will not attempt to process that bundle further and will move on. When this happens it produces the warning message you see. No models will be built from this group of aligned reads nor any expression values reported.

        The default length which triggers this skipping is 3.5 million base pairs. In Joe's example the bundle which was skipped spanned chr14 from 3032091-7220792 which is ~4.2 million bp. You can increase (or decrease) the maximum bundle length by passing the "--max-bundle-length <int>" parameter to cufflinks. <int> can be any integer >= 1.
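The arithmetic above can be checked with a minimal sketch. The helper name `bundle_span` is hypothetical (it is not part of cufflinks), the 3,500,000 bp default is the value quoted in this thread, and the span is taken as a plain end minus start, which may differ by a base or so from how cufflinks itself measures a bundle.

```python
import re

# Default threshold as quoted in this thread (not read from cufflinks itself).
DEFAULT_MAX_BUNDLE_LENGTH = 3_500_000


def bundle_span(locus):
    """Return the length in bp spanned by a 'chrom:start-end' locus string."""
    match = re.fullmatch(r"(\S+):(\d+)-(\d+)", locus.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end locus: {locus!r}")
    start, end = int(match.group(2)), int(match.group(3))
    return end - start


# The bundle skipped in Joe's run.
span = bundle_span("chr14:3032091-7220792")
print(span)                              # 4188701
print(span > DEFAULT_MAX_BUNDLE_LENGTH)  # True
```

A span of 4,188,701 bp is above the quoted default, which is consistent with the bundle being skipped.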



        • #5
          kmcarr,

          Thank you for your reply.

          If I set --max-bundle-length to too large a value, will that cause any problems?


          > Processing Locus 1:11868-31109 [ ] 0%
          > Processing Locus 1:34553-36081
          21:38435145-45747259 Warning: Skipping large bundle.

          Here the skipped bundle is ~7.3 million bp.
          Last edited by fabrice; 08-16-2011, 03:10 PM.



          • #6
            Thanks much!



            • #7
              Originally posted by fabrice
              If I set --max-bundle-length to too large a value, will that cause any problems?

              21:38435145-45747259 Warning: Skipping large bundle.

              Here the skipped bundle is ~7.3 million bp.

              The purpose of the --max-bundle-length parameter is to prevent cufflinks from trying to assemble a gene model from a read group spanning a genomic region which is clearly too large to represent a single gene. An appropriate value for this parameter is very much dependent upon the species you are working in. The default value of 3,500,000 bp is (I believe) set to be appropriate for humans or other mammals. You could increase this value, but is it likely that a gene in your organism of interest would span 7.3 million bp? I can't answer that; this is where your knowledge of the organism you are studying comes into play.



              • #8
                I am working on human samples.


                • #9
                  I am working on human samples as well.
                  A colleague and I each got two regions larger than 3.5 million bp:

                  chr21:38435145-45760353 Warning: Skipping large bundle.

                  chr6:126102278-130463972 Warning: Skipping large bundle.


                  So I think this is quite normal for human samples.

                  Marc



                  • #10
                    I consistently see it skipping these in human samples:

                    21:38435145-45747259 Warning: Skipping large bundle.
                    6:126102306-130463972 Warning: Skipping large bundle.

                    The chr21 locus is huge and part of the Down syndrome critical region (DSCR). I run with --max-bundle-length 10000000 to get past this warning. It seems to work, and FPKMs are reported across the DSCR.
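To decide how large a value to pass, one option is to scan the console output of a run for these warnings and measure each skipped region. The following is only a sketch: `skipped_bundles` is a hypothetical helper, the regular expression assumes the warning format shown in this thread, and the span is a plain end minus start.

```python
import re

# Matches warning lines of the form seen in this thread, e.g.
# "21:38435145-45747259 Warning: Skipping large bundle."
SKIP_PATTERN = re.compile(
    r"(\S+):(\d+)-(\d+)\s+Warning: Skipping large bundle\."
)


def skipped_bundles(log_text):
    """Return (locus, span_bp) for every skipped-bundle warning in log_text."""
    results = []
    for match in SKIP_PATTERN.finditer(log_text):
        chrom = match.group(1)
        start, end = int(match.group(2)), int(match.group(3))
        results.append((f"{chrom}:{start}-{end}", end - start))
    return results


# The two warnings reported above for human samples.
log = """\
21:38435145-45747259 Warning: Skipping large bundle.
6:126102306-130463972 Warning: Skipping large bundle.
"""

bundles = skipped_bundles(log)
for locus, span in bundles:
    print(f"{locus}\t{span} bp")

largest = max(span for _, span in bundles)
print("largest skipped span:", largest)             # 7312114
print("covered by 10000000:", largest <= 10_000_000)  # True
```

Both regions fall under 10,000,000 bp, which matches the report that this setting gets past the warning. Whether a bundle that large represents a meaningful locus is still a biological judgment.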
