  • cufflinks: input alignment from hisat2

    Hi all,

    I read in the Cufflinks manual that the input file must be sorted this way:

    sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted

    However, this takes ages and eventually crashes the server we run our calculations on. Is there any way to get the sort order Cufflinks wants without this sort step? I tried samtools sort, but apparently that is not what Cufflinks needs.

    Just in case, I also have alignments from STAR and MapSplice; both are apparently also "too big" to be handled by sort the way Cufflinks wants. If you are wondering why I am not using TopHat for alignment: you probably can't imagine how slow it is. :P

    If you have a valid alternative to Cufflinks, I am also open to new software.

    Thanks in advance!

  • #2
    Hi dovah,

    Cufflinks requires the input alignments to be sorted by chromosomal position and that is what the sort command you posted is doing.
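To make the ordering concrete, here is a small sketch on four made-up records (the read names, chromosomes, and positions are invented for illustration). It uses the same sort keys as the manual's command, plus `LC_ALL=C` and the GNU sort options `-S` (buffer size) and `-T` (temporary directory), which are my additions and may help keep a large sort from exhausting memory.

```shell
# Tiny made-up alignment records (tab-separated; only SAM columns 1-4 matter here).
tmpdir=$(mktemp -d)
printf 'r1\t0\tchr2R\t500\nr2\t0\tchr2L\t1200\nr3\t0\tchr2L\t300\nr4\t0\tchrX\t10\n' \
  > "$tmpdir/hits.sam"

# Same keys as the command from the Cufflinks manual:
#   -k 3,3   column 3 (reference name) compared as text
#   -k 4,4n  column 4 (leftmost position) compared numerically
# LC_ALL=C makes the text comparison byte-wise.
# -S caps the in-memory buffer and -T sets where spill files go (GNU sort).
LC_ALL=C sort -S 64M -T "$tmpdir" -k 3,3 -k 4,4n "$tmpdir/hits.sam" \
  > "$tmpdir/hits.sam.sorted"

# Show read name, reference, and position in sorted order.
cut -f 1,3,4 "$tmpdir/hits.sam.sorted"
```

The records come out grouped by reference name, and within chr2L position 300 precedes 1200 because the second key is numeric.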

    You can use samtools sort with the default parameters (not -n, which sorts by read name); this gives the same order and should work with cufflinks. Why do you say this doesn't work?

    Let me ask a few more questions:
    * Which organism are you working with?
    * How big are the sam files you are working with?
    * How much RAM does the server you are using have?

    Also keep in mind that HISAT requires the option "--dta-cufflinks" so that it reports the output SAM with the attributes needed by cufflinks.
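A rough sketch of what such a HISAT2 call might look like. The index prefix, read file names, output name, and thread count are placeholders, not values from this thread. The snippet only prints the command so it can be reviewed before running.

```shell
# Placeholder names (assumptions, not from this thread).
INDEX=dmel_index          # HISAT2 index prefix
R1=reads_1.fastq.gz       # paired-end mate 1
R2=reads_2.fastq.gz       # paired-end mate 2
SAM=hits.sam              # output alignments
THREADS=4

# --dta-cufflinks asks HISAT2 to report alignments tailored for Cufflinks,
# including the XS strand attribute on spliced alignments.
HISAT2_CMD="hisat2 --dta-cufflinks -p $THREADS -x $INDEX -1 $R1 -2 $R2 -S $SAM"

# Print the command for review; run it with: eval "$HISAT2_CMD"
echo "$HISAT2_CMD"
```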

    Cheers,

    Asier


    • #3
      Hi asier,

      Many thanks for your answer. I will try the default samtools sort again and let you know.

      To answer your questions:
      * The organism is D. melanogaster
      * My bam files are about 33 GB each. Yes, I've been told that's pretty big, but keep in mind I had 251929648 reads (x2 because of paired-end) that survived after trimming.
      * The server I am using has about 60 GB of RAM in total. I expect that to be enough to handle a sort job.

      And thanks for reminding me of the magical Hisat2 option, maybe this is exactly what I am missing. So you are suggesting that if I add this option when Hisat2 generates the mapping file, I will still need to sort the BAM, but it will have the attributes Cufflinks needs?

      Keeping you updated.


      • #4
        Hi Dovah,

        If you have binary alignment files (BAM), you definitely need to use samtools, as the Linux sort command only works on text files (in which case you would need a SAM file). If the BAM file is 33 GB, the SAM will be much bigger and the sort may fail with 60 GB of RAM. Otherwise, these files are big but nothing ridiculous, so samtools should work.
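As a sketch of how the memory use of samtools sort could be budgeted on a shared server: the file names, thread count, and per-thread limit below are placeholders to adjust. The command is printed for review rather than executed.

```shell
# Placeholder file names and resource numbers (assumptions); adjust to the server.
IN=hits.bam
OUT=hits.sorted.bam
THREADS=4
MEM_PER_THREAD_GB=2   # samtools sort -m is a per-thread limit

# Rough upper bound on the sort buffers: threads x per-thread memory.
TOTAL_GB=$((THREADS * MEM_PER_THREAD_GB))
echo "approximate sort memory: ${TOTAL_GB}G"

# The default order is by coordinate; do not add -n (that sorts by read name).
# -T sets the prefix for temporary chunk files, -o names the output BAM.
SORT_CMD="samtools sort -@ $THREADS -m ${MEM_PER_THREAD_GB}G -T ${OUT%.bam}.tmp -o $OUT $IN"

# Print the command for review; run it with: eval "$SORT_CMD"
echo "$SORT_CMD"
```

Chunks that do not fit in the buffers are written to temporary files under the `-T` prefix and merged at the end, so free disk space matters as much as RAM here.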

        Hopefully, though, the problem is just the attributes missing from the alignment file because of the missing Hisat2 parameter, so rerunning the alignment and sorting again should fix it.


        • #5
          Hi again.

          Just for the sake of completeness, and sharing:
          * I generated the SAM file (almost 200 GB) with Hisat2 using the option you suggested (--dta-cufflinks), to prepare XS flags for Cufflinks.
          * I converted it to BAM with samtools view -Sb
          * I sorted the output BAM file (33 GB) using Picard Tools SortSam (option SORT_ORDER=coordinate), and Cufflinks seems to appreciate it. This sorting tool is way faster than samtools (lexicographical) sort. With Picard I could properly sort the 33 GB BAM file in about 3 h.
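For anyone following along, the conversion, sorting, and Cufflinks steps above might look roughly like this. File names, the path to the Picard jar, and the Cufflinks output directory are placeholders, and the Picard call assumes the classic `I=`/`O=` argument style. The commands are only printed so they can be checked first.

```shell
# Placeholder names (assumptions, not the exact files from this thread).
SAM=hits.sam
BAM=hits.bam
SORTED=hits.sorted.bam
PICARD_JAR=picard.jar

# Step 1: convert the HISAT2 SAM output to BAM.
VIEW_CMD="samtools view -Sb $SAM > $BAM"

# Step 2: coordinate-sort the BAM with Picard SortSam.
PICARD_CMD="java -jar $PICARD_JAR SortSam I=$BAM O=$SORTED SORT_ORDER=coordinate"

# Step 3: run Cufflinks on the sorted BAM (-o sets the output directory).
CUFF_CMD="cufflinks -o cufflinks_out $SORTED"

# Print all three for review; run each with: eval "$VIEW_CMD", and so on.
printf '%s\n' "$VIEW_CMD" "$PICARD_CMD" "$CUFF_CMD"
```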

          Voilà, now let's hope Cufflinks won't crash.


          • #6
            cufflinks error

            hello asier_gonzalez

            I am also struggling with the same problem.
            I used HISAT2 for alignment, followed by samtools view -Sb.
            * I used Picard Tools SortSam (option SORT_ORDER=coordinate)
            While running cufflinks I got this error: "SAM error on line 45697305: invalid CIGAR operation"

            Please help me!
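One possible first step in diagnosing an error like this is to print the reported line and look at its CIGAR field (column 6). The sketch below runs on a small made-up SAM file with one deliberately broken record. It assumes the line number in the message counts lines of the SAM text including the header, and it checks against the operation letters defined in the SAM specification, which may not be exactly the set Cufflinks accepts.

```shell
tmpdir=$(mktemp -d)

# Made-up SAM: one header line, then three records; r2 has an invalid CIGAR on purpose.
{
  printf '@HD\tVN:1.0\tSO:coordinate\n'
  printf 'r1\t0\tchr2L\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n'
  printf 'r2\t0\tchr2L\t200\t60\t4M2Z4M\t*\t0\t0\tACGTACGTAC\t*\n'
  printf 'r3\t0\tchr2L\t300\t60\t3M100N7M\t*\t0\t0\tACGTACGTAC\t*\n'
} > "$tmpdir/hits.sam"

# Print read name and CIGAR of one specific line by number.
# For a BAM file the text could be produced with: samtools view -h hits.bam | sed -n '45697305p'
LINE=3
sed -n "${LINE}p" "$tmpdir/hits.sam" | cut -f 1,6

# Report every record whose CIGAR is neither "*" nor a run of <count><op> pairs,
# where op is one of the SAM specification letters M I D N S H P = X.
bad=$(awk -F '\t' '!/^@/ && $6 != "*" && $6 !~ /^([0-9]+[MIDNSHP=X])+$/ {print NR, $1, $6}' \
  "$tmpdir/hits.sam")
echo "$bad"
```

If the reported line looks fine by this check, the cause may lie elsewhere, for example in a truncated file or in a CIGAR operation that is valid SAM but unsupported by the Cufflinks version in use.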
