  • BAM header too large using cuffdiff

    Hi All,

    I tried to do expression analysis on an Illumina paired end transcriptome run.

    I prepared my reads with tophat:

    tophat -r 150 -o tophat-1 Index reads1-1.fastQ reads1-2.fastq
    tophat -r 150 -o tophat-2 Index reads2-1.fastQ reads2-2.fastq


    Then I sorted the BAM files:

    samtools sort tophat-1.bam tophat-sorted-1
    samtools sort tophat-2.bam tophat-sorted-2


    Then I ran cuffdiff and got a warning:

    cuffdiff transcripts.gff tophat-sorted-1.bam tophat-sorted-2.bam
    You are using Cufflinks v1.0.3, which is the most recent release.
    File tophat-sorted-1.bam doesn't appear to be a valid BAM file, trying SAM...
    Warning: BAM header too large


    All the result files from cuffdiff are empty.
    Can anybody help me with that?
    Best,
    mlox

  • #2
    Hello --

    When using cufflinks I receive the same error. I aligned my reads using Bowtie, converted the SAM output to BAM, and then sorted the BAM file using SAMtools.

    I would appreciate it if someone could help. Thanks.

    cufflinks: /usr/lib64/libz.so.1: no version information available (required by cufflinks)
    You are using Cufflinks v1.0.3, which is the most recent release.
    Warning: BAM header too large
    File trinity_n_trial_inflorescence.sorted.bam doesn't appear to be a valid BAM file, trying SAM...
    jdjax
    Ph.d. Student
    Åarhus University



    • #3
      mlox,

      The general workflow is tophat, cufflinks, cuffcompare, and cuffdiff. If you follow the tophat and cufflinks manuals, you should get decent results.

      Can you tell me how you came up with your current workflow?



      • #4
        We currently do not have TopHat on our server, and no one in our research group has any experience with TopHat.

        I do not have a reference genome, so after the assembly was done, I used Bowtie to align the reads from my various tissue samples to the Bowtie index I made from the assembly. Bowtie's output is an unsorted SAM file, so using SAMtools I first convert the file to BAM and then sort it. When I use the sorted BAM file as input for cufflinks, I get this error.

        I have tried using SAMtools reheader, and that did not work either. Any other suggestions would be helpful.
        jdjax
        Ph.d. Student
        Åarhus University



        • #5
          Hi Jdjax,

          In my experience, it is generally less challenging to use the workflow recommended by the author(s). (I am aware that cufflinks supports BAM files generated by programs other than tophat, but in your case it complains that your BAM file is not valid.)

          In your case, the nice part about tophat is twofold: 1) you can download the binary to your home directory and use it directly; 2) tophat uses bowtie to align, so it can reuse your index files. You may pursue fixing the header complaint or try tophat, whichever achieves your objectives.



          • #6
            Thanks for your input DZhang. Do you know of any other dependencies besides Bowtie that are required for TopHat?
            jdjax
            Ph.d. Student
            Åarhus University



            • #7
              Not that I am aware of. I believe you will get the results faster if you go with tophat.



              • #8
                Hi DZhang,
                Since I don't have a reference genome, I used a transcriptome assembly for mapping, so I thought there was no need for cufflinks and cuffmerge. I generated a GTF file on my own, as all my reads came from spliced exons.
                I guess the error message results from the large number of transcripts I mapped to. I also tried bwa for mapping and got a similar error.



                • #9
                  Hi mlox,

                  In your case, I strongly recommend using a count-based method. (If possible, I would also recommend mapping the reads to a genome, not a transcriptome.) My pick is to use HTSeq to obtain the read counts and DESeq to identify differentially expressed genes.
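
As a rough illustration of what "count-based" means when reads are mapped to a transcriptome, the sketch below tallies mapped reads per reference name from SAM text. It is not HTSeq's algorithm (HTSeq assigns reads to annotated features); the function name and the demo lines are hypothetical, and secondary or supplementary alignments are simply skipped.

```python
from collections import Counter


def count_reads_per_reference(sam_lines):
    """Count primary mapped alignments per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        # Header lines start with '@'; blank lines carry no alignment.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname = fields[2]
        # 0x4 = read unmapped; 0x100 = secondary; 0x800 = supplementary.
        if flag & 0x4 or flag & 0x900 or rname == "*":
            continue
        counts[rname] += 1
    return counts


# Hypothetical demo records, not taken from the thread.
demo_sam = [
    "@SQ\tSN:contig1\tLN:1000",
    "r1\t0\tcontig1\t10\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tcontig1\t50\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tcontig2\t5\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
demo_counts = count_reads_per_reference(demo_sam)
print(dict(demo_counts))
```

A per-contig table like this, one column per sample, is the kind of input a count-based differential expression tool expects.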



                  • #10
                    DZhang,

                    I installed TopHat and tried using the accepted_hits.bam output from TopHat in cufflinks, but I received the same error: BAM header too large.

                    Do you have any other suggestions on what I can do?

                    Thanks.
                    jdjax
                    Ph.d. Student
                    Åarhus University



                    • #11
                      jdjax,

                      Did you sort the SAM? The BAM file produced by TopHat should be used as is. Please also post your cufflinks command.



                      • #12
                        DZhang,

                        I did not sort the SAM. I am just testing these programs out, so I did not use any options for tophat or cufflinks. TopHat made a file called accepted_hits.bam, and I used that file as input for cufflinks.

                        My cufflinks command was just: cufflinks accepted_hits.bam

                        I also want to be more descriptive about the errors I am receiving, in the hope of figuring out this problem. This is what the error stated:

                        cufflinks: /usr/lib64/libz.so.1 : no version information available
                        Warning: BAM header too large
                        File accepted_hits does not appear to be a valid BAM file, trying SAM
                        Inspecting reads and determining fragment length distribution.
                        SAM error on line 2880: CIGAR op has zero length
                        SAM error on line 3240: CIGAR op has zero length
                        SAM error on line 3464: CIGAR op has zero length
                        SAM error on line 5063: CIGAR op has zero length
                        SAM error on line 30750: CIGAR op has zero length
                        SAM error on line 51722: CIGAR op has zero length

                        This continues with increasing line numbers until it reaches the end of the file.
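
Because these CIGAR errors appear only after cufflinks falls back from BAM to SAM parsing, they may be a side effect of reading compressed data as text rather than real defects in the alignments. One way to check is to convert the BAM to SAM text (for example with samtools view) and scan it for zero-length CIGAR operations. The sketch below does that scan; the function names and demo records are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operator from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def zero_length_cigar_ops(cigar):
    """Return the operators in a CIGAR string whose length is zero."""
    if cigar == "*":  # '*' means the CIGAR is unavailable
        return []
    return [op for length, op in CIGAR_OP.findall(cigar) if int(length) == 0]


def find_bad_cigar_lines(sam_lines):
    """Return 1-based line numbers whose CIGAR (column 6) has a zero-length op."""
    bad = []
    for lineno, line in enumerate(sam_lines, start=1):
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 5 and zero_length_cigar_ops(fields[5]):
            bad.append(lineno)
    return bad


# Hypothetical demo records, not taken from the thread.
demo_sam = [
    "@HD\tVN:1.0",
    "r1\t0\tcontig1\t10\t255\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tcontig1\t20\t255\t0M50M\t*\t0\t0\t*\t*",
]
bad_lines = find_bad_cigar_lines(demo_sam)
print(bad_lines)
```

If the scan of the converted SAM text finds nothing, the errors more likely come from the failed BAM read than from the aligner.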
                        I have also checked /usr/lib64/libz.so.1, and it is present in /usr/lib64 on the server as this symlink:

                        libz.so.1 -> libz.so.1.2.3

                        Again thanks for your input. I appreciate any help. =)
                        jdjax
                        Ph.d. Student
                        Åarhus University



                        • #13
                          Hi jdjax,

                          1) Can you provide some background about your project? Type of reads, type of reference sequence, etc.
                          2) Tophat requires one mandatory parameter besides the read file(s). From the manual:
                             -r/--mate-inner-dist <int>: This is the expected (mean) inner distance between mate pairs. For example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter is required for paired end runs.

                          How did you set -r ?
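
The arithmetic in the manual's example can be written out as a small helper: the inner distance is the selected fragment length minus the two read lengths. The function name is hypothetical, and the sketch assumes both mates have the same length.

```python
def mate_inner_distance(fragment_length, read_length):
    """Expected inner distance between mates for a paired-end library.

    Assumes both mates have the same length. A negative result would
    mean the two mates overlap.
    """
    return fragment_length - 2 * read_length


# The manual's example: 300 bp fragments, 50 bp reads on each end.
inner = mate_inner_distance(300, 50)
print(inner)  # 300 - 2 * 50 = 200
```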



                          • #14
                            Hi jdjax,

                            You should check the header information of your bam file. One way to do it is to convert bam to sam using samtools, then check the top portion of the sam files. (e.g., using 'more your.sam'). Let us know what you see in the header.
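
If samtools is not at hand, the header can also be read directly, since a BAM file is gzip-compatible (BGZF) data that starts with the magic bytes "BAM\1", a 32-bit header length, the header text, and a 32-bit reference count. The sketch below is a minimal reader under that layout; the function names are hypothetical, and the demo uses a tiny in-memory stand-in rather than a real BAM file.

```python
import gzip
import io
import struct


def parse_bam_header(stream):
    """Read the header text and reference count from a decompressed BAM stream."""
    magic = stream.read(4)
    if magic != b"BAM\x01":
        raise ValueError("not a BAM file: bad magic bytes")
    (l_text,) = struct.unpack("<i", stream.read(4))  # header text length
    text = stream.read(l_text).decode("ascii", errors="replace")
    (n_ref,) = struct.unpack("<i", stream.read(4))   # number of references
    return text.rstrip("\x00"), n_ref


def read_bam_header(path):
    """Open a BAM file with the gzip module and parse its header."""
    with gzip.open(path, "rb") as handle:
        return parse_bam_header(handle)


# Demo on a tiny in-memory stand-in (header only, no alignments).
demo_text = b"@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:contig1\tLN:1000\n"
demo_raw = (
    b"BAM\x01"
    + struct.pack("<i", len(demo_text))
    + demo_text
    + struct.pack("<i", 1)
)
demo_stream = gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(demo_raw)))
header_text, n_ref = parse_bam_header(demo_stream)
print(len(header_text), "header characters,", n_ref, "reference sequence(s)")
```

For a de novo assembly with many contigs, the header length and reference count give a quick sense of how large the header is, which may be relevant to the "header too large" warning.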



                            • #15
                              DZhang,

                              These are 50 to 200 bp single-end reads, and the reference sequence I am using is the FASTA file of contigs I got from the Trinity assembly. This is a de novo project; I do not have a full reference genome. That is why I wanted to use just Bowtie: I did not think TopHat was necessary without a full genome.

                              The option -r is only required for paired end runs.
                              jdjax
                              Ph.d. Student
                              Åarhus University
