SEQanswers > RNA Sequencing

dovah 07-30-2016 03:48 AM

cufflinks: input alignment from hisat2
Hi all,

I read on the Cufflinks manual page that the input alignment file must be sorted this way:

sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted

However, it is taking ages and eventually crashes the server we run our calculations on. Is there a way to get the sort order Cufflinks wants without this sort step? I tried samtools sort, but apparently it is not what Cufflinks needs.

Just in case: I also have alignments from STAR and MapSplice, and both are apparently also "too big" to be sorted the way Cufflinks wants. If you are wondering why I am not using TopHat for the alignment, well, you probably can't imagine how slow it is. :P

If you have a valid alternative to Cufflinks, I am also open to new software.

Thanks in advance!

asier_gonzalez 08-02-2016 03:13 AM

Hi dovah,

Cufflinks requires the input alignments to be sorted by chromosomal position and that is what the sort command you posted is doing.

You can use samtools sort with the default parameters (not -n, which sorts by read name) and you will get the same result, which should work with Cufflinks. Why do you say this doesn't work?
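
The ordering requirement described above can be checked directly on SAM text. A minimal sketch, assuming a POSIX shell and awk; `is_coordinate_sorted` is a hypothetical helper name, not part of samtools or Cufflinks. It accepts any order of reference names, as long as each reference forms one contiguous block with non-decreasing positions, so it should pass both the `sort -k 3,3 -k 4,4n` order and a default samtools sort:

```shell
#!/bin/sh
# Hypothetical helper: reads SAM text on stdin, exits 0 if the records
# are grouped by reference (column 3) and ordered by position (column 4).
is_coordinate_sorted() {
    awk -F '\t' '
        /^@/ { next }                          # skip header lines
        $3 != ref {
            if ($3 in seen) {                  # reference split into two blocks
                print "line " NR ": reference " $3 " appears in more than one block"
                bad = 1
                exit
            }
            seen[$3] = 1
            ref = $3
            pos = 0
        }
        ($4 + 0) < pos {                       # position went backwards
            print "line " NR ": position " $4 " comes after " pos " on " $3
            bad = 1
            exit
        }
        { pos = $4 + 0 }
        END { exit (bad ? 1 : 0) }
    '
}
```

For a BAM file, the records could be piped in with something like `samtools view -h sample.sorted.bam | is_coordinate_sorted` (the file name is a placeholder).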

Let me ask a few more questions:
* Which organism are you working with?
* How big are the sam files you are working with?
* How much RAM does the server you are using have?

Also keep in mind that HISAT requires the option "--dta-cufflinks" so that it reports the output SAM with the attributes needed by cufflinks.
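
Whether an existing SAM file already carries the strand attribute can be checked before re-running the aligner. A rough sketch, assuming the attribute in question is the XS:A tag on spliced alignments (those with an N in the CIGAR string); `count_spliced_without_xs` is a hypothetical helper, not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper: reads SAM text on stdin and reports how many
# spliced alignments (CIGAR contains N) lack an XS:A:+ or XS:A:- tag.
count_spliced_without_xs() {
    awk -F '\t' '
        /^@/ { next }                          # skip header lines
        $6 ~ /N/ {
            spliced++
            has_xs = 0
            for (i = 12; i <= NF; i++)         # optional tags start at column 12
                if ($i ~ /^XS:A:[+-]$/) has_xs = 1
            if (!has_xs) missing++
        }
        END { printf "%d spliced alignments, %d without XS:A\n", spliced, missing }
    '
}
```

A large count in the second number would suggest the alignment was produced without the option mentioned above.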



dovah 08-02-2016 07:13 AM

Hi asier,

Many thanks for your answer. I will try the default samtools sort again and let you know.

To answer your questions:
* The organism is D. melanogaster
* My bam files are about 33 GB each. Yes, I've been told that's pretty big, but keep in mind I had 251929648 reads (x2 because of paired-end) that survived after trimming.
* The overall RAM of the server I am using is about 60GB. I expect it to be okay to handle a sort job :)

And thanks for reminding me of the magical Hisat2 option, maybe this is exactly what I am missing. So you are suggesting that if I add this option when Hisat2 generates the mapping file, I will still need to sort the input BAM, but it will then have the attributes Cufflinks needs to process it?

Keeping you updated.

asier_gonzalez 08-02-2016 07:47 AM

Hi Dovah,

If you have binary alignment files (BAM), you definitely need to use samtools, as the Linux sort command only works on text files (for that you would need a SAM file). If the BAM file is 33 GB, the SAM will be much bigger and the sort may fail with 60 GB of RAM. Otherwise, these files are big but nothing ridiculous, so samtools should work.
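
If the alignments do exist as SAM text, the sort from the first post can be made a little safer. A sketch on a hypothetical toy file, assuming GNU sort: the header lines (starting with @) are set aside so they are not mixed into the records, `-S` caps the in-memory buffer, and `-T` points the temporary spill files at a directory with enough free space. Whether this is enough for a very large SAM file on a given server is untested here.

```shell
#!/bin/sh
# Toy input (hypothetical): three header lines and three unsorted reads.
workdir=$(mktemp -d)
TAB=$(printf '\t')
SORT_MEM=${SORT_MEM:-64M}      # assumed buffer cap; raise it on a real server
{
    printf '@HD\tVN:1.0\tSO:unsorted\n'
    printf '@SQ\tSN:chr2L\tLN:1000\n'
    printf '@SQ\tSN:chr2R\tLN:1000\n'
    printf 'r1\t0\tchr2R\t50\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
    printf 'r2\t0\tchr2L\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
    printf 'r3\t0\tchr2L\t20\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
} > "$workdir/hits.sam"

# Copy the header first, untouched.
grep '^@' "$workdir/hits.sam" > "$workdir/hits.sorted.sam"

# Sort the records by reference name (column 3), then numerically by
# position (column 4); temporary files go to $workdir instead of /tmp.
grep -v '^@' "$workdir/hits.sam" |
    LC_ALL=C sort -t "$TAB" -S "$SORT_MEM" -T "$workdir" -k 3,3 -k 4,4n \
    >> "$workdir/hits.sorted.sam"

# Read names in their sorted order, for a quick look.
sorted_names=$(grep -v '^@' "$workdir/hits.sorted.sam" | cut -f 1 | paste -s -d ' ' -)
echo "$sorted_names"
```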

However, hopefully the problem will be all about some missing attributes in the alignment file due to the missing Hisat2 parameter, so rerunning and sorting it again should fix it.

dovah 08-03-2016 08:31 AM

Hi again.

Just for the sake of completeness, and sharing:
* I generated a SAM file (almost 200 GB) with Hisat2 using the option you suggested (--dta-cufflinks), to prepare the XS flags for Cufflinks.
* I converted it to BAM with samtools view -Sb.
* I sorted the output BAM file (33 GB) using Picard Tools SortSam (option SORT_ORDER=coordinate), and Cufflinks seems to appreciate it. This sorting tool is way faster than samtools (lexicographical) sort. With Picard I could properly sort the 33 GB BAM file in about 3 h.
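
The three steps above, plus the Cufflinks run, could be scripted roughly as follows. This is a sketch, not the exact commands that were run: the index name, read files, output names, thread count, and Picard jar path are all hypothetical placeholders, the Picard call uses the older `I=`/`O=` argument style, and `samtools view -b -o` stands in for the `-Sb` form mentioned above. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set DRY_RUN=0 to actually run the tools (they must be installed).
DRY_RUN=${DRY_RUN:-1}

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Hypothetical inputs; replace with real paths.
INDEX=${INDEX:-dmel_index}
READS_1=${READS_1:-sample_R1.fastq.gz}
READS_2=${READS_2:-sample_R2.fastq.gz}
PICARD_JAR=${PICARD_JAR:-picard.jar}
THREADS=${THREADS:-8}

align_sort_assemble() {
    # 1. Align with the Cufflinks-specific option so spliced reads get XS tags.
    run hisat2 -p "$THREADS" --dta-cufflinks -x "$INDEX" \
        -1 "$READS_1" -2 "$READS_2" -S sample.sam &&
    # 2. Convert the SAM text to BAM.
    run samtools view -b -o sample.bam sample.sam &&
    # 3. Coordinate-sort with Picard SortSam.
    run java -jar "$PICARD_JAR" SortSam I=sample.bam O=sample.sorted.bam \
        SORT_ORDER=coordinate &&
    # 4. Assemble transcripts from the sorted BAM.
    run cufflinks -p "$THREADS" -o cufflinks_out sample.sorted.bam
}

align_sort_assemble
```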

Voilà, now let's hope Cufflinks won't crash.

sahusarika 05-09-2018 02:09 AM

cufflinks error
Hello asier_gonzalez,

I am also struggling with the same problem.
I used HISAT2 for the alignment, then converted the output with samtools view -Sb.
I sorted it with Picard Tools SortSam (option SORT_ORDER=coordinate).
While running Cufflinks I got this error: "SAM error on line 45697305: invalid CIGAR operation"

Please help me!
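
One way to narrow this kind of error down is to list the records whose CIGAR string falls outside the operations defined in the SAM specification (M, I, D, N, S, H, P, =, X). A sketch, assuming awk; `find_bad_cigars` is a hypothetical helper, and it says nothing about which of those operations Cufflinks itself accepts:

```shell
#!/bin/sh
# Hypothetical helper: reads SAM text on stdin and prints the line number,
# read name, and CIGAR of every record whose CIGAR is neither "*" nor a
# sequence of <length><operation> pairs from the SAM specification.
find_bad_cigars() {
    awk -F '\t' '
        /^@/ { next }                          # skip header lines
        $6 != "*" && $6 !~ /^([0-9]+[MIDNSHP=X])+$/ {
            print NR ": " $1 "\t" $6
        }
    '
}
```

For a BAM file this could be run as `samtools view -h sample.sorted.bam | find_bad_cigars | head` (file name hypothetical); line numbers count header lines too.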

All times are GMT -8.