  • Cufflinks bam header size too large

    When running cufflinks on my accepted_hits.bam files from tophat, I get this output, truncated as there are many CIGAR errors.

    Code:
    You are using Cufflinks v2.0.2, which is the most recent release.
    Warning: BAM header too large
    File accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM...
    [12:02:25] Inspecting reads and determining fragment length distribution.
    SAM error on line 806: CIGAR op has zero length
    SAM error on line 823: invalid CIGAR operation
    SAM error on line 831: CIGAR op has zero length
    SAM error on line 847: CIGAR op has zero length
    SAM error on line 874: CIGAR op has zero length
    SAM error on line 879: CIGAR op has zero length
    It looks like I'm hitting the same error described in this thread.


    However, I can't get Cufflinks to compile with the source changes: the build fails to detect the dependencies. I'm following the guide at http://cufflinks.cbcb.umd.edu/tutorial.html and have now tried on two computers with no luck. Is there a pre-compiled version with the fix, or some other way around this issue?

    Thanks for any help.
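For anyone wanting to check how large their header actually is before rebuilding anything, here is a minimal sketch in Python. It relies only on the BAM layout (a gzip-compatible stream that starts with the magic bytes `BAM\1` followed by a 4-byte little-endian header text length). The 4 MB figure is the limit mentioned in this thread; whether Cufflinks compares exactly this field against its limit is an assumption, and the function names are made up for illustration.

```python
import gzip
import struct

# Limit discussed in this thread; assumed to apply to the header text length.
FOUR_MB = 4 * 1024 * 1024


def bam_header_text_length(path):
    """Return l_text, the byte length of the SAM header text stored in a BAM file."""
    # BAM files are BGZF-compressed, which the gzip module can read directly.
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file: " + path)
        (l_text,) = struct.unpack("<i", fh.read(4))
    return l_text


def header_exceeds(path, limit=FOUR_MB):
    """True if the header text is at least as large as the given limit."""
    return bam_header_text_length(path) >= limit
```

Calling `bam_header_text_length("accepted_hits.bam")` and comparing the result with `FOUR_MB` should indicate whether the header size is a plausible cause of the warning.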

  • #2
    I got past the previous error with some help compiling Cufflinks with the source change to handle headers larger than 4 MB. Now another error appears when I try to run Cufflinks.

    Code:
    You are using Cufflinks v2.0.2, which is the most recent release.
    Error: sort order of reads in BAMs must be the same
    I've tried sorting the accepted_hits.bam file with samtools, as many threads suggest, but the error persists.
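When chasing a sort-order complaint, it can help to look at what the header itself declares. The sketch below parses SAM header text (for example, the output of `samtools view -H accepted_hits.bam`) and reports the `SO:` value on the `@HD` line and the order of the `@SQ` reference names. It is only an inspection aid under the assumption that the header is relevant to this error; it does not reproduce whatever check Cufflinks performs, and the function names are hypothetical.

```python
def declared_sort_order(header_text):
    """Return the SO: value from the @HD line, or None if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def reference_names(header_text):
    """Return the @SQ reference names in the order they appear in the header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


if __name__ == "__main__":
    # Tiny made-up header, for illustration only.
    example = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:100\n@SQ\tSN:chr2\tLN:200\n"
    print(declared_sort_order(example))
    print(reference_names(example))
```

Comparing these two values across the BAM files involved is one way to spot a file whose declared sort order or reference order differs from the others.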

    • #3
      ramma, I'm having a similar issue using Cufflinks on a large transcriptome sequence with lots of predicted transcripts. How did you get the program to compile? When I change the hits.cpp file, it won't compile, but if I leave the limit at 4 MB it still works. Any ideas on how to make this work?

      Thanks.

      • #4
        Originally posted by JueFish
        ramma, I'm having a similar issue using Cufflinks on a large transcriptome sequence with lots of predicted transcripts. How did you get the program to compile? When I change the hits.cpp file, it won't compile, but if I leave the limit at 4 MB it still works. Any ideas on how to make this work?

        Thanks.
        Hi JueFish, I have a standalone version of Cufflinks which I believe accepts headers up to 8 MB. I had help compiling it, so this isn't my area of expertise; every time I tried compiling it myself with the header size change, I ran into issues as well. If 8 MB is enough for your needs, I'd be happy to send you the standalone version.

        • #5
          ramma, 8 MB is actually still smaller than what I need, but it's better than what I have. I can probably trim my reference down to get within that range. I'll send you a private message on Seqanswers and we can figure out how I might get a copy of the binary from you. Thanks.
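To judge how far a reference would need to be trimmed, a rough estimate of the header text size can be computed from the contig names and lengths alone. The sketch below assumes the header is dominated by `@SQ` lines of the form `@SQ<TAB>SN:name<TAB>LN:length`; real headers also carry `@HD` and `@PG` lines and possibly extra `@SQ` tags, so treat the result as a lower bound. The 8 MB constant is the figure quoted in this thread, and the function name is hypothetical.

```python
# Limit quoted in this thread for the rebuilt binary.
EIGHT_MB = 8 * 1024 * 1024


def estimate_sq_header_bytes(refs):
    """Estimate the bytes taken by minimal @SQ lines for (name, length) pairs."""
    total = 0
    for name, length in refs:
        line = "@SQ\tSN:{}\tLN:{}\n".format(name, length)
        total += len(line.encode("ascii"))
    return total


if __name__ == "__main__":
    # Made-up contigs, for illustration only.
    contigs = [("scaffold_{}".format(i), 1500) for i in range(200000)]
    size = estimate_sq_header_bytes(contigs)
    print(size, "bytes;", "fits" if size < EIGHT_MB else "too large")
```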

          • #6
            Dear ramma,
            I am having a similar problem to JueFish when trying to run Cufflinks on my TopHat accepted_hits.bam output, as I am using a very fragmented genome (my BAM header size is around 5.5 MB). I have tried to compile the source code after changing the header size parameter, without any success. Would it be possible for you to send me your standalone version? Which system are you using (Mac, PC, Linux)?

            • #7
              Sure, ataraxia, I'll PM you a link to the download. It's compiled for a Linux system.

              • #8
                ramma, could you supply me with your 8 MB compiled version?

                • #9
                  Originally posted by ramma
                  I got past the previous error with some help compiling Cufflinks with the source change to handle headers larger than 4 MB. Now another error appears when I try to run Cufflinks.

                  Code:
                  You are using Cufflinks v2.0.2, which is the most recent release.
                  Error: sort order of reads in BAMs must be the same
                  I've tried sorting the accepted_hits.bam file with samtools, as many threads suggest, but the error persists.
                  Did you ever find a way to fix this second error? I am experiencing it as well.

                  • #10
                    Originally posted by RNAddict
                    Did you ever find a way to fix this second error? I am experiencing it as well.
                    Hi RNAddict,

                    here is the solution
                    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc
