  • Cufflinks 0.9.2 BAM bug?

    I'm trying to use the new Cufflinks 0.9.2 with a BAM file sorted by samtools. However, I get an error message complaining about the sort order, even though the order looks correct to me.

    This is the command and the output:

    bash$ ./cufflinks -G ../../Data/Homo_sapiens.GRCh37.59.gtf -o cuff-out-tophat-7a ../../incoming/tophat.7a.sorted.bam
    [23:17:47] Inspecting reads and determining fragment length distribution.
    > Processing Locus 9:141150044-141150148 [******* ] 29%
    Error: this SAM file doesn't appear to be correctly sorted!
    current hit is at 10:93917, last one was at 9:141152127
    Cufflinks requires that if your file has SQ records in
    the SAM header that they appear in the same order as the chromosomes names
    in the alignments.
    If there are no SQ records in the header, or if the header is missing,
    the alignments must be sorted lexicographically by chromsome
    name and by position.

    But my alignments *do* come in the same order as in the header:

    bash$ samtools view -H tophat.7a.sorted.bam
    @SQ SN:1 LN:249250621
    @SQ SN:2 LN:243199373
    @SQ SN:3 LN:198022430
    @SQ SN:4 LN:191154276
    @SQ SN:5 LN:180915260
    @SQ SN:6 LN:171115067
    @SQ SN:7 LN:159138663
    @SQ SN:8 LN:146364022
    @SQ SN:9 LN:141213431
    @SQ SN:10 LN:135534747
    @SQ SN:11 LN:135006516
    @SQ SN:12 LN:133851895
    @SQ SN:13 LN:115169878
    @SQ SN:14 LN:107349540
    @SQ SN:15 LN:102531392
    @SQ SN:16 LN:90354753
    @SQ SN:17 LN:81195210
    @SQ SN:18 LN:78077248
    @SQ SN:19 LN:59128983
    @SQ SN:20 LN:63025520
    @SQ SN:21 LN:48129895
    @SQ SN:22 LN:51304566
    @SQ SN:X LN:155270560
    @SQ SN:Y LN:59373566
    @SQ SN:MT LN:16569

    bash$ samtools view tophat.7a.sorted.bam |cut -f3 |uniq
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    X
    Y
    MT

    My guess is that Cufflinks still expects the old lexicographic order (1, 10, 11, ..., 2, ...) that you would get by sorting a SAM file with sort -k3,3 -k4,4n. But converting the BAM back to SAM and then sorting the big SAM file is a hassle, so it would be nice if the BAM file worked with Cufflinks as advertised.
    Last edited by kopi-o; 11-02-2010, 03:08 PM. Reason: typo
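
    To make the two candidate orderings concrete, here is a minimal Python sketch. It assumes the rule quoted in the error message (follow the @SQ order when the header has one, otherwise compare chromosome name and position lexicographically). The helper names are hypothetical, and this is not Cufflinks' own code. The last two example hits are the ones from the error message; the first two are made up.

```python
"""Compare @SQ (header) order with lexicographic order for alignment hits.

Hypothetical helpers: a simplified model of the rule quoted in the
Cufflinks error message, not Cufflinks' actual implementation.
"""


def sq_rank(header_lines):
    """Map each @SQ reference name (SN tag) to its position in the header."""
    rank = {}
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                rank[field[3:]] = len(rank)
    return rank


def first_out_of_order(hits, key):
    """Return (previous, current) for the first hit that sorts before its
    predecessor under `key`, or None if all hits are in order."""
    prev_key, prev_hit = None, None
    for hit in hits:
        k = key(hit)
        if prev_key is not None and k < prev_key:
            return prev_hit, hit
        prev_key, prev_hit = k, hit
    return None


header = ["@SQ\tSN:1\tLN:249250621", "@SQ\tSN:2\tLN:243199373",
          "@SQ\tSN:9\tLN:141213431", "@SQ\tSN:10\tLN:135534747"]
# (chromosome, position) pairs; the last two come from the error message.
hits = [("1", 500), ("2", 100), ("9", 141152127), ("10", 93917)]
rank = sq_rank(header)


def header_key(hit):
    """Sort key following the @SQ order of the header."""
    return (rank[hit[0]], hit[1])


def lexicographic_key(hit):
    """Sort key comparing chromosome names as plain strings."""
    return (hit[0], hit[1])


print(first_out_of_order(hits, header_key))         # None
print(first_out_of_order(hits, lexicographic_key))  # (('9', 141152127), ('10', 93917))
```

    Under header order these hits are fine, while under lexicographic order the 9-to-10 transition is exactly the pair the error reports, which is consistent with (though not proof of) the guess above.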

  • #2
    I'll look into this issue very soon. Thanks for bringing it to our attention.

    • #3
      Some additional information:

      This occurred for the Mac OS X executable. I tried the Linux executable on another file which was sorted in the same way, and it did not give the above error.

      Edit: It also occurred with the Linux executable on another data set.
      Last edited by kopi-o; 11-03-2010, 12:54 AM.

      • #4
        Can you make the BAM file available to me?

        [email protected]

        • #5
          TopHat SOLiD output to Cufflinks...

          Hi,

          I am trying to use the new Cufflinks to get differential expression from TopHat output of SOLiD data, but I get this error:

          Error: sort order of reads in BAMs must be the same

          However, the sort order seems to be correct.
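
          One thing that may be worth checking (an assumption, not a confirmed diagnosis of this error) is whether every BAM passed to Cufflinks lists its references in the same @SQ order. Here is a minimal Python sketch, assuming the headers have been dumped to text with samtools view -H. The function names are hypothetical.

```python
"""Check whether several SAM headers list their references in the same order."""


def sq_names(header_text):
    """Return reference names (SN tags) in the order their @SQ lines appear."""
    names = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "@SQ":
            names.extend(f[3:] for f in fields[1:] if f.startswith("SN:"))
    return names


def same_reference_order(*header_texts):
    """True if every header has exactly the same @SQ name order as the first."""
    orders = [sq_names(text) for text in header_texts]
    return all(order == orders[0] for order in orders[1:])


# Made-up lengths; only the order of the SN tags matters here.
numeric = "@SQ\tSN:1\tLN:100\n@SQ\tSN:2\tLN:100\n@SQ\tSN:10\tLN:100\n"
lexicographic = "@SQ\tSN:1\tLN:100\n@SQ\tSN:10\tLN:100\n@SQ\tSN:2\tLN:100\n"
print(same_reference_order(numeric, numeric))        # True
print(same_reference_order(numeric, lexicographic))  # False
```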

          • #6
            @urchgene,

            This may be related to the other problem. Let me work on kopi-o's dataset and then I will get back to you. You may want to email me after a couple of days in case this report gets lost in the shuffle.

            • #7
              I have the same kind of problem. It seems TopHat sorts lexicographically, and Trapnell tells me this is the convention. My header from a TopHat BAM looks like this:

              Code:
              @SQ     SN:1    LN:267910886
              @SQ     SN:10   LN:110718848
              @SQ     SN:11   LN:87759784
              @SQ     SN:12   LN:46782294
              @SQ     SN:13   LN:111154910
              @SQ     SN:14   LN:112194335
              @SQ     SN:15   LN:109758846
              @SQ     SN:16   LN:90238779
              @SQ     SN:17   LN:97296363
              @SQ     SN:18   LN:87265094
              @SQ     SN:19   LN:59218465
              @SQ     SN:2    LN:258207540
              @SQ     SN:20   LN:55268282
              @SQ     SN:21   LN:160699376
              @SQ     SN:3    LN:171063335
              @SQ     SN:4    LN:187126005
              @SQ     SN:5    LN:173096209
              @SQ     SN:6    LN:147636619
              @SQ     SN:7    LN:143002779
              @SQ     SN:8    LN:129041809
              @SQ     SN:9    LN:113440463
              I wanted to run Picard metrics on my alignments, and since my FASTA reference file is sorted numerically (1, 2, 3, ...), I was getting failures. I simply rebuilt my reference with the chromosomes sorted lexicographically, and now it works in Picard. Perhaps try that?
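
              Rebuilding a reference in lexicographic order mostly comes down to reordering the FASTA records by name. Here is a minimal Python sketch, assuming a small uncompressed FASTA held in memory and taking the first word of each '>' line as the name. The function name is hypothetical. Any index or sequence dictionary built from the old file would need to be regenerated afterwards, since record offsets and order change.

```python
"""Reorder FASTA records by sequence name (lexicographic by default)."""


def reorder_fasta(text, key=str):
    """Return `text` with its FASTA records sorted by name.

    The name is the first whitespace-delimited word after '>'. `key` is
    applied to each name; the default `str` gives plain lexicographic order.
    """
    records = []
    for chunk in text.split(">")[1:]:
        header, _, seq = chunk.partition("\n")
        if seq and not seq.endswith("\n"):
            seq += "\n"  # keep records separated if the file lacks a final newline
        records.append((header.split()[0], ">" + header + "\n" + seq))
    records.sort(key=lambda record: key(record[0]))
    return "".join(body for _, body in records)


fasta = ">2\nACGT\n>10\nGG\n>1\nTT\n"
print(reorder_fasta(fasta), end="")  # prints the records in the order 1, 10, 2
```

              Passing key=int instead would give numeric order, but only when every name is purely numeric (names such as X or MT would raise an error).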

              • #8
                @urchgene,

                Which aligner did you use?
                Can you post the headers for your alignment files?

                • #9
                  I am having a similar problem with Cufflinks 0.9.1. I tried running Cufflinks on a SAM file output by TopHat and got the following error:

                  [17:21:28] Inspecting reads and determining fragment length distribution.
                  > Processing Locus CH477697.1:309379-310913 [******* ] 30%
                  Error: this SAM file doesn't appear to be correctly sorted!
                  current hit is at CH477620.1:768, last one was at CH477619.1:982530
                  Cufflinks requires that if your file has SQ records in
                  the SAM header that they appear in the same order as the chromosomes names
                  in the alignments.
                  If there are no SQ records in the header, or if the header is missing,
                  the alignments must be sorted lexicographically by chromsome
                  name and by position.

                  My SAM file has no SQ records in the header, but the hits named in the error message DO seem to be in lexicographic order, i.e. CH477619.1 comes before CH477620.1.

                  I then converted the SAM file into a sorted BAM file using samtools and got the same error (though the specific hits reported as out of order were different).
                  Lindy McBride - Rockefeller University
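
                  For a SAM file without @SQ lines, the rule in the error message can be checked directly. Here is a minimal Python sketch, assuming plain string comparison of RNAME (which may differ in detail from Cufflinks' own comparison) and skipping unmapped records. The function name is hypothetical. The two records reuse the hits from the error message, with placeholder values in the other columns.

```python
"""Check that headerless SAM records are sorted by (RNAME, POS)."""


def first_unsorted(sam_lines):
    """Return ((rname, pos), (rname, pos)) for the first record that sorts
    before the previous one, or None if the records are in order."""
    previous = None
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or fields[2] == "*":
            continue  # malformed or unmapped record
        current = (fields[2], int(fields[3]))
        if previous is not None and current < previous:
            return previous, current
        previous = current
    return None


records = [
    "r1\t0\tCH477619.1\t982530\t255\t36M\t*\t0\t0\t*\t*",
    "r2\t0\tCH477620.1\t768\t255\t36M\t*\t0\t0\t*\t*",
]
print(first_unsorted(records))  # None: CH477619.1 sorts before CH477620.1
```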

                  • #10
                    @kopi-o,

                    I have found a solution to your problem. For some reason, the samtools code that Cufflinks uses was unable to read your header. Using the samtools "reheader" command, I replaced the header of the BAM file with the header that you posted (the one output by samtools view -H). After doing this, I was able to get everything to run properly.
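
                    The post does not say what exactly was wrong with the original header, so the following is only a generic sanity check, not a reconstruction of the actual problem. It is a minimal Python sketch that flags @SQ lines lacking TAB separators (SAM header fields are TAB-delimited), an SN tag, or a numeric LN tag. The function name is hypothetical. A header that passes could then be applied with samtools reheader as described above.

```python
"""Flag obviously malformed @SQ lines in SAM header text."""


def header_problems(header_text):
    """Return (line_number, description) pairs for suspicious @SQ lines."""
    problems = []
    for number, line in enumerate(header_text.splitlines(), start=1):
        if not line.startswith("@SQ"):
            continue
        fields = line.split("\t")
        if len(fields) == 1:
            problems.append((number, "no TAB separators"))
            continue
        # Collect TAG:VALUE fields, e.g. "SN:1" becomes {"SN": "1"}.
        tags = {f[:2]: f[3:] for f in fields[1:] if len(f) > 3 and f[2] == ":"}
        if "SN" not in tags:
            problems.append((number, "missing SN"))
        if not tags.get("LN", "").isdigit():
            problems.append((number, "missing or non-numeric LN"))
    return problems


header = "@SQ\tSN:1\tLN:249250621\n@SQ SN:2 LN:243199373\n@SQ\tSN:3\n"
print(header_problems(header))
# [(2, 'no TAB separators'), (3, 'missing or non-numeric LN')]
```

                    Note that the header as shown earlier in this thread uses spaces, which is most likely just how the forum rendered it, so the check needs to be run on the real header text.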

                    • #11
                      @lindymcb,

                      Are you using a GTF annotation? If so, it needs to be in the same order as your SAM file OR you must use a SAM header.
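
                      Here is a minimal Python sketch of that consistency check. It assumes it is enough to compare the order in which reference names first appear in each file (GTF column 1, SAM column 3) and ignores references that occur in only one of them. The helper names are hypothetical, and this is a simplified reading of the advice above, not Cufflinks' own check.

```python
"""Check that a GTF and a SAM file mention shared references in the same order."""


def first_seen(names):
    """Return names in order of first appearance, without duplicates."""
    order = {}
    for name in names:
        order.setdefault(name, len(order))
    return list(order)


def gtf_order_matches_sam(gtf_lines, sam_lines):
    """True if references present in both files appear in the same order."""
    gtf = first_seen(line.split("\t")[0] for line in gtf_lines
                     if line.strip() and not line.startswith("#"))
    sam = first_seen(line.split("\t")[2] for line in sam_lines
                     if line.strip() and not line.startswith("@"))
    rank = {name: i for i, name in enumerate(gtf)}
    # GTF ranks of references found in both files, listed in SAM order
    # (unmapped '*' records never match a GTF name, so they drop out).
    shared = [rank[name] for name in sam if name in rank]
    return shared == sorted(shared)


gtf = ['CH477620.1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g2";',
       'CH477619.1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1";']
sam = ["r1\t0\tCH477619.1\t50\t255\t36M\t*\t0\t0\t*\t*",
       "r2\t0\tCH477620.1\t50\t255\t36M\t*\t0\t0\t*\t*"]
print(gtf_order_matches_sam(gtf, sam))        # False
print(gtf_order_matches_sam(gtf[::-1], sam))  # True
```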

                      • #12
                        @adarob

                        Thanks for your quick response! I did find a difference in the ordering of reference contigs between my SAM file and my GTF file. However, after reordering the GTF file to match the SAM file, I still get the same error.

                        Now I have added the sam header and it is working. I didn't want to do that initially because I'm working with >2000 reference contigs!
                        Lindy McBride - Rockefeller University
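
                        With thousands of contigs, the @SQ lines are easier to generate than to write by hand. Here is a minimal Python sketch, assuming a FASTA index produced by samtools faidx (a .fai file whose first two TAB-separated columns are the sequence name and length). The function name and the contigs in the example are hypothetical. The @SQ order follows the FASTA order, so per the Cufflinks message the alignments need to be sorted in that same order. When converting SAM to BAM, samtools view can also take the .fai file through its -t option, which adds the reference information for you.

```python
"""Build SAM @SQ header lines from a samtools faidx index (.fai)."""


def header_from_fai(fai_text):
    """Return one @SQ line per .fai row, in the same order as the index."""
    lines = []
    for row in fai_text.splitlines():
        if not row.strip():
            continue
        name, length = row.split("\t")[:2]  # remaining columns are offsets/line sizes
        lines.append(f"@SQ\tSN:{name}\tLN:{length}")
    return "".join(line + "\n" for line in lines)


# Made-up two-contig index: name, length, offset, bases/line, bytes/line.
fai = "ctgA\t1500\t6\t60\t61\nctgB\t820\t1537\t60\t61\n"
print(header_from_fai(fai), end="")
```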

                        • #13
                          @adarob, I used Bowtie for the alignment, and my headers are the same as those posted by caddymob.

                          @adarob, I would also like to get your email address if possible. Please email me at Yahoo with the same username you see here.

                          • #14
                            adarob, thanks for your quick assistance. Cufflinks is a very nice piece of software.

                            • #15
                              Converting from SAM to BAM

                              Someone please help, I am really stuck here.

                              I am using this command to convert the output of the newest version of TopHat from SAM to BAM:

                              samtools view -bS tophat_out.sam > tophat_out.bam

                              and I get this error:

                              @SQ SN:AM168630 LN:423
                              @SQ SN:AM168692 LN:492
                              @SQ SN:AM168702 LN:370
                              @SQ SN:AM16!' is recognized as '*'.
                              [main_samview] truncated file.

                              What is wrong?
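
                              The "truncated file" message suggests, though it does not prove, that the SAM file itself is incomplete or damaged (for example, if it was not completely written). Here is a minimal Python sketch of a quick structural check, assuming the file fits in memory. It reports a missing final newline, header lines that appear after the first alignment, and alignment lines with fewer than the 11 mandatory SAM fields. The function name is hypothetical.

```python
"""Quick structural checks on SAM text that may be truncated or damaged."""


def sam_structure_report(text):
    """Return a list of human-readable problems found in SAM `text`."""
    problems = []
    if text and not text.endswith("\n"):
        problems.append("last line has no newline (file may be truncated)")
    seen_alignment = False
    for number, line in enumerate(text.split("\n"), start=1):
        if not line:
            continue
        if line.startswith("@"):
            if seen_alignment:
                problems.append(f"line {number}: header line after alignments")
            continue
        seen_alignment = True
        n_fields = len(line.split("\t"))
        if n_fields < 11:
            problems.append(f"line {number}: {n_fields} fields, SAM needs at least 11")
    return problems


sample = "@SQ\tSN:AM168630\tLN:423\nr1\t0\tAM168630\t5\n@SQ\tSN:AM16"
for problem in sam_structure_report(sample):
    print(problem)
```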
