SEQanswers

07-30-2016, 02:48 AM   #1
dovah
Member
Location: Russia
Join Date: Jul 2014
Posts: 18
cufflinks: input alignment from hisat2

Hi all,

I read on the Cufflinks manual page that the input alignment file must be sorted this way:

sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted

However, it is taking ages and eventually crashes the server we run our calculations on. Is there any way to get the sort order Cufflinks wants without this sorting step? I tried samtools sort, but apparently it does not produce what Cufflinks needs.

Just in case it matters, I also have alignments from STAR and MapSplice; both are apparently also "too big" to be handled by sort the way Cufflinks wants. If you are wondering why I am not using TopHat for alignment, well, you probably can't even imagine how slow it is. :P

If you have a valid alternative to Cufflinks, I am also open to new software.

Thanks in advance!
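
One possible way to keep the plain sort route from exhausting the machine is to cap its memory buffer and point its temporary files at a disk with enough room. The sketch below assumes GNU sort (for -S and -T); the toy file, the buffer size and the temp directory are placeholders, not values from this thread. It also sets header lines (those starting with @) aside first, since sort would otherwise shuffle them in among the alignments.

```shell
# Toy stand-in for hits.sam: one header line plus three alignments
# (only the first four SAM columns are shown; real records have 11+).
workdir=$(mktemp -d)
printf '@SQ\tSN:2L\tLN:23513712\n'  > "$workdir/hits.sam"
printf 'r1\t0\t3R\t500\n'          >> "$workdir/hits.sam"
printf 'r2\t0\t2L\t1200\n'         >> "$workdir/hits.sam"
printf 'r3\t0\t2L\t90\n'           >> "$workdir/hits.sam"

# Keep the header on top, then sort the records by reference name
# (column 3) and numerically by position (column 4), as in the manual.
# -S caps the in-memory buffer and -T chooses where spill files go
# (GNU sort options; the values here are arbitrary placeholders).
# LC_ALL=C makes the comparison bytewise.
{
    grep '^@' "$workdir/hits.sam"
    grep -v '^@' "$workdir/hits.sam" |
        LC_ALL=C sort -k 3,3 -k 4,4n -S 512M -T "$workdir"
} > "$workdir/hits.sam.sorted"

cat "$workdir/hits.sam.sorted"
```

On the toy file the header line stays first and the records follow in the order 2L:90, 2L:1200, 3R:500.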
08-02-2016, 02:13 AM   #2
asier_gonzalez
Junior Member
Location: Harpenden
Join Date: Dec 2015
Posts: 7

Hi dovah,

Cufflinks requires the input alignments to be sorted by chromosomal position and that is what the sort command you posted is doing.

You can use samtools sort with the default parameters (not -n, which sorts by read name) and you will get the same result, which should work with Cufflinks. Why do you say this doesn't work?

Let me ask a few more questions:
* Which organism are you working with?
* How big are the sam files you are working with?
* How much RAM does the server you are using have?

Also keep in mind that HISAT requires the option "--dta-cufflinks" so that it reports the output SAM with the attributes needed by cufflinks.

Cheers,

Asier
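
As a rough sketch of the two suggestions above (a default coordinate sort with samtools, and --dta-cufflinks for HISAT2): the index name, FASTQ names, thread counts and memory per thread below are made-up placeholders, and the -o form of samtools sort assumes samtools 1.3 or newer. The run helper is a hypothetical dry-run wrapper that only prints each command unless RUN=1 is set.

```shell
# Hypothetical helper: print a command, execute it only when RUN=1.
run() {
    printf '%s\n' "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Placeholder names; substitute your own index and read files.
index=dm6_index
sample=sample1

# Align with HISAT2, asking for Cufflinks-friendly output.
run hisat2 --dta-cufflinks -p 8 -x "$index" \
    -1 "${sample}_R1.fastq.gz" -2 "${sample}_R2.fastq.gz" \
    -S "${sample}.sam"

# Coordinate sort is the samtools default (no -n).
# -@ adds threads, -m is memory per thread, -T is the temp-file prefix.
run samtools sort -@ 4 -m 2G -T "${sample}.tmp" \
    -o "${sample}.sorted.bam" "${sample}.sam"
```

Run as-is, this only prints the two commands; setting RUN=1 would execute them, provided hisat2 and samtools are installed.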
08-02-2016, 06:13 AM   #3
dovah

Hi asier,

Many thanks for your answer. I will try the default samtools sort again and let you know.

To answer your questions:
* The organism is D. melanogaster
* My bam files are about 33 GB each. Yes, I've been told that's pretty big, but keep in mind I had 251929648 reads (x2 because of paired-end) that survived after trimming.
* The server I am using has about 60 GB of RAM in total. I expect that to be enough for a sort job.

And thanks for reminding me of the magical Hisat2 option; maybe this is exactly what I am missing. So you are suggesting that if I add this option when Hisat2 generates the mapping file, I will still need to sort the input BAM, but it will have the attributes Cufflinks needs to process it?

Keeping you updated.
08-02-2016, 06:47 AM   #4
asier_gonzalez

Hi Dovah,

If you have binary alignment files (BAM), you definitely need to use samtools, as the Linux sort command only works on text files (for that you would need a SAM file). If the BAM file is 33 GB, the SAM will be much bigger and sort may fail with 60 GB of RAM. Otherwise, these files are big but nothing ridiculous, so samtools should work.

However, hopefully the problem is just some attributes missing from the alignment file because of the missing Hisat2 parameter, so rerunning the alignment and sorting again should fix it.
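
Since plain sort only makes sense on text, a quick check of which kind of file you have can save a failed run. A minimal sketch, assuming only that BAM is gzip-compatible (BGZF) and therefore starts with the gzip magic bytes 1f 8b; the function name looks_compressed is made up here, and the check cannot tell BAM apart from any other gzip file.

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic
# bytes 1f 8b, as BGZF-compressed BAM files do. Plain-text SAM will not.
looks_compressed() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Toy demonstration with a text file and a gzip-compressed copy.
demo=$(mktemp -d)
printf 'r1\t0\t2L\t90\n' > "$demo/toy.sam"
gzip -c "$demo/toy.sam" > "$demo/toy.sam.gz"

for f in "$demo/toy.sam" "$demo/toy.sam.gz"; do
    if looks_compressed "$f"; then
        echo "$(basename "$f"): compressed, use samtools or Picard to sort"
    else
        echo "$(basename "$f"): plain text, line-based tools can read it"
    fi
done
```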
08-03-2016, 07:31 AM   #5
dovah

Hi again.

Just for the sake of completeness, and sharing:
* I generated a SAM file (almost 200 GB) with Hisat2 using the option you suggested (--dta-cufflinks), which prepares the XS attributes for Cufflinks.
* I converted it to BAM with samtools view -Sb.
* I sorted the output BAM file (33 GB) using Picard Tools SortSam (option SORT_ORDER=coordinate), and Cufflinks seems to appreciate it. This sorting tool is way faster than samtools (lexicographical) sort. With Picard I could properly sort the 33 GB BAM file in more or less 3 h.

Voilà, now let's hope Cufflinks won't crash.
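
For reference, the three steps above could be written roughly as follows. This is a sketch, not the exact commands used in this post: the file names and the path to picard.jar are placeholders, the I=/O= form is Picard's older argument style, and run is a hypothetical dry-run wrapper that only prints commands unless RUN=1 is set. The sort_order function (also made up here) reads a SAM header on stdin and reports the SO: value that the sorting tool recorded.

```shell
# Hypothetical helper: print a command, execute it only when RUN=1.
run() {
    printf '%s\n' "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Placeholder file names throughout.
run samtools view -Sb -o hits.bam hits.sam
run java -jar picard.jar SortSam I=hits.bam O=hits.sorted.bam SORT_ORDER=coordinate
run cufflinks -p 8 -o cufflinks_out hits.sorted.bam

# Hypothetical helper: extract the SO: value from the @HD header line.
sort_order() {
    awk -F '\t' '/^@HD/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SO:/) { print substr($i, 4); exit }
    }'
}

# With real data: samtools view -H hits.sorted.bam | sort_order
printf '@HD\tVN:1.5\tSO:coordinate\n@SQ\tSN:2L\tLN:23513712\n' | sort_order
```

The last line feeds a toy header through sort_order, which lets the check be tried without any BAM file at hand.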
05-09-2018, 01:09 AM   #6
sahusarika
Junior Member
Location: delhi
Join Date: Apr 2015
Posts: 7
cufflinks error

Hello asier_gonzalez,

I am also struggling with the same problem.
I used HISAT2 for alignment, then ran samtools view -Sb.
* I used Picard Tools SortSam (option SORT_ORDER=coordinate)
While running Cufflinks I got this error: "SAM error on line 45697305: invalid CIGAR operation".

pls help me!
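
One way to start narrowing down an error like this is to look at the reported line and at any CIGAR strings that do not match the pattern given in the SAM specification. A minimal sketch on a toy file; the file names are placeholders and the helper bad_cigars is made up here. It only flags malformed strings, so it may not catch whatever Cufflinks itself objects to.

```shell
# Hypothetical helper: print line number and CIGAR (column 6) for records
# whose CIGAR is neither "*" nor a run of <length><op> pairs with
# op in MIDNSHP=X, the form described in the SAM specification.
bad_cigars() {
    awk -F '\t' '!/^@/ && $6 !~ /^(\*|([0-9]+[MIDNSHPX=])+)$/ {
        print NR ": " $6
    }' "$1"
}

# Toy SAM-like file: line 2 has a truncated CIGAR string.
toy=$(mktemp)
printf 'r1\t0\t2L\t90\t60\t50M\t*\t0\t0\tACGT\tIIII\n'    > "$toy"
printf 'r2\t0\t2L\t120\t60\t20M5\t*\t0\t0\tACGT\tIIII\n' >> "$toy"
printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'        >> "$toy"

bad_cigars "$toy"

# To inspect one specific line of a real SAM file (number from the error):
#   sed -n '45697305p' hits.sam | cut -f 6
```

Whether the line number in the error message counts header lines is not something this sketch can tell, so it may be worth looking at a few neighbouring lines as well.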

Tags
alignment, cufflinks, hisat2, transcriptome assembly
