  • cuffmerge and the max-bundle-length

    Hi,

    I could finally run cuffmerge, but a colleague and I both noticed some
    annoying skipped regions with human samples:

    chr21:38435145-45760353 Warning: Skipping large bundle.

    chr6:126102278-130463972 Warning: Skipping large bundle.

    Looking at the genome browser, I see a lot of genes in these regions. Do I have to assume now that cuffmerge has produced one single geneID, spanning the whole region, when in fact I have a lot of reads distributed for many smaller gene entries?
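
A quick way to put the two warnings in context is to compute the span of each skipped bundle and compare it with the limit. The sketch below assumes the documented cufflinks default of 3,500,000 bp for --max-bundle-length (worth checking against the manual for your version); the function name `skipped_bundles` is made up for illustration.

```python
import re

# Assumed default for --max-bundle-length, as stated in the Cufflinks manual.
DEFAULT_MAX_BUNDLE_LENGTH = 3_500_000

# Matches lines such as "chr21:38435145-45760353 Warning: Skipping large bundle."
WARNING_RE = re.compile(r"^(\S+):(\d+)-(\d+)\s+Warning: Skipping large bundle")


def skipped_bundles(log_lines):
    """Return (chrom, start, end, span) for every skipped-bundle warning."""
    bundles = []
    for line in log_lines:
        match = WARNING_RE.match(line.strip())
        if match:
            chrom = match.group(1)
            start, end = int(match.group(2)), int(match.group(3))
            bundles.append((chrom, start, end, end - start))
    return bundles


# The two warnings quoted in the post.
log = [
    "chr21:38435145-45760353 Warning: Skipping large bundle.",
    "chr6:126102278-130463972 Warning: Skipping large bundle.",
]
bundles = skipped_bundles(log)
for chrom, start, end, span in bundles:
    over = span > DEFAULT_MAX_BUNDLE_LENGTH
    print(f"{chrom}:{start}-{end} span={span:,} bp exceeds_default={over}")
```

Both regions come out well above the assumed default (roughly 7.3 Mb and 4.4 Mb), which is consistent with them being reported as large bundles.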

    I naively thought I could fix this by increasing --max-bundle-length, adding the option to the cuffmerge Python script:

    Code:
    def cufflinks(out_dir,
                  sam_file,
                  min_isoform_frac,
                  gtf_file=None,
                  extra_opts=["-q", "--overhang-tolerance", "200",
                              "--library-type=transfrags", "-A", "0.0",
                              "--min-frags-per-transfrag", "0", "--no-5-extend",
                              # added: raise the bundle length limit
                              "--max-bundle-length", "9925208"],
                  lsf=False,
                  curr_queue=None):
        ...  # rest of the function as in the cuffmerge script
    But this resulted in the following error at the cuffcompare step:

    Error: duplicate GFF ID 'ENST00000506472' encountered!
    [FAILED]
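
One thing that may be worth ruling out here (an assumption, not a diagnosis of this particular failure) is whether the same transcript_id shows up on more than one chromosome or strand in the GTF that cuffcompare reads. A minimal sketch of such a check follows; the function name and the toy IDs TX1/TX2 are hypothetical.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def transcripts_on_multiple_loci(gtf_lines):
    """Return {transcript_id: {(chrom, strand), ...}} for IDs seen on more than one."""
    seen = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID_RE.search(fields[8])
        if match:
            seen[match.group(1)].add((fields[0], fields[6]))
    return {tid: loci for tid, loci in seen.items() if len(loci) > 1}


def toy_record(chrom, strand, transcript_id):
    """Build a made-up exon record for the demonstration below."""
    attrs = f'gene_id "G"; transcript_id "{transcript_id}";'
    return "\t".join([chrom, "toy", "exon", "100", "200", ".", strand, ".", attrs])


toy_gtf = [
    toy_record("chr1", "+", "TX1"),
    toy_record("chr2", "+", "TX1"),  # same ID on a second chromosome
    toy_record("chr1", "+", "TX2"),
]
suspects = transcripts_on_multiple_loci(toy_gtf)
print(suspects)
```

On a real file, pass an open file handle instead of `toy_gtf`; an empty result would suggest the duplicate comes from somewhere else.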

    So I wonder two things.

    First, is it a good idea to use merged.gtf at all, given that you either skip whole chromosome regions or potentially get huge merged gene entries?

    Second, how can I run the script with the --max-bundle-length option?

    Thanks,
    Marc

  • #2
    I've seen skipping like that with mouse as well. I've seen it in all of their programs: cufflinks, cuffdiff, and cuffmerge. I can't say what it means for the results, though. I haven't looked into it because I don't use these programs for primary analysis.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */

    • #3
      Strange. I think I have no choice but to run cuffdiff with the reference GTF for now if I want to stick with the Tuxedo pipeline. I need to run it again and check whether all the known reference IDs in the skipped region are dropped as well. In that case I would need to run cuffdiff twice, once with the known reference and once with the merged one, which doesn't seem like the most elegant way.

      • #4
        Confirmed:

        merged.gtf did not contain the gene IDs of the skipped region. So it would be nice to get an idea of whether cuffmerge makes sense at all for human assemblies.
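
A check like this can be scripted by collecting the IDs that fall inside a skipped region from each file and comparing the two sets. The sketch below is only one way to do it: `ids_in_region` is a hypothetical helper, the records are toy data, and which attribute to compare is an assumption (cuffmerge assigns its own gene_id values, so an attribute such as gene_name may be the better key if both files carry it).

```python
import re

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def ids_in_region(gtf_lines, chrom, start, end, attribute="gene_id"):
    """Collect values of `attribute` for features overlapping chrom:start-end."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[0] != chrom:
            continue
        feat_start, feat_end = int(fields[3]), int(fields[4])
        if feat_end < start or feat_start > end:
            continue  # no overlap with the region of interest
        attrs = dict(ATTR_RE.findall(fields[8]))
        if attribute in attrs:
            ids.add(attrs[attribute])
    return ids


def toy_record(chrom, start, end, gene_name):
    """Build a made-up exon record for the demonstration below."""
    attrs = f'gene_id "X"; gene_name "{gene_name}";'
    return "\t".join([chrom, "toy", "exon", str(start), str(end), ".", "+", ".", attrs])


# Toy stand-ins for the reference GTF and merged.gtf.
reference = [
    toy_record("chr21", 39000000, 39001000, "G1"),
    toy_record("chr21", 44000000, 44002000, "G2"),
    toy_record("chr21", 10000000, 10001000, "G3"),  # outside the region
]
merged = [toy_record("chr21", 10000000, 10001000, "G3")]

region = ("chr21", 38435145, 45760353)  # first skipped bundle from the post
in_reference = ids_in_region(reference, *region, attribute="gene_name")
in_merged = ids_in_region(merged, *region, attribute="gene_name")
missing = in_reference - in_merged
print(sorted(missing))
```

With real files, replace the toy lists with open file handles for the reference GTF and merged.gtf.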

        • #5
          that's pretty odd. the whole point of cufflinks is to be able to do this kind of analysis. i know it sometimes fails on "overly complex loci" but this is a little crazy. i looked around that region you posted myself and there's plenty of genes in there that have a good bit of space on either side of them - enough to delineate them from other genes. i'd think cufflinks would be able to work in that region.

          maybe this issue will go away, or at least be improved, in their upcoming v1.4 release. i know it's in the works.

          my experience with cufflinks has been like this:

          1. it was released, i tried it, we didn't like the results.
          2. we did more sequencing 4 months later and i tried it again, we didn't like the results
          3. we did more sequencing a year later, i tried it again, we didn't like the results
          4. since then we have done more sequencing every 6 months and i've tried the current versions every time and have been disappointed with the output

          "not liking" the results hasn't just been because it didn't tell us what we wanted but for reasons similar to what you've posted here. it seems to fail in illogical ways, and in ways that make it unusable when you're worried about missing or misleading information. sequencing is expensive, so one really wants to be sure to get the right information out of a run.
          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
          Salk Institute for Biological Studies, La Jolla, CA, USA */

          • #6
            Yes,
            we are disappointed by the results from cuffmerge as well. Why nearly the whole of chr21 gets pasted together is a mystery, as nothing in the code description hints at this behaviour.
            Working with the reference GTF works fine, though, all the way down to cummeRbund, and it's easy to pipe it together.
            For unknown isoforms and genes, however, we need to look for another tool. Which software are you using for this purpose right now?
