  • drsnafu1831
    Junior Member
    • Dec 2016
    • 1

    TopHat (Tuxedo Suite) - AWS vs Local

    Hello all,
    Sorry, this one is a bit long.
    I have come here after some testing.

    I am trying to run the Tuxedo suite for RNA-Seq analysis on AWS.

    After alignment with TopHat, I see a significant difference in the size of "accepted_hits.bam" between runs on AWS (Amazon Web Services) and on a local system.

    Details
    Paired-end FASTQ files
    Size after adapter removal and quality thresholding: 7 GB (2 x 3.5 GB)
    Human sample

    AWS instance
    15 cores of an m4.16xlarge (64 cores / 256 GB RAM)
    Time: ~45 min
    Output: accepted_hits.bam, ~100 MB

    Local server
    15 cores of 36 (36 cores / 256 GB RAM)
    Time: ~52 min
    Output: accepted_hits.bam, ~1.2 GB

    Questions
    Why is there such a significant difference? What could the reasons be?

    One thread I found, which was somewhat related


    As suggested in the above thread, I ran TopHat with the same parameters and input files:
    on 1, 5, 10, 15, 20, 30 and 64 cores on the local server
    on 30 and 64 cores on AWS

    Interestingly, the size of "accepted_hits.bam" stayed the same (~1.2 GB) up to 30 cores on the local server (specs above) and dropped to ~120 MB on 64 cores.
    On AWS, as noted above, both 30 and (later) 64 cores produce an accepted_hits.bam of ~100 MB.

    Any input, suggestions, and comments are welcome.
    Thank you for your time!
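
For anyone who wants to reproduce the thread-count sweep described above, here is a minimal sketch that only builds the TopHat command lines, one per thread count, each writing to its own output directory. The index name, FASTQ file names, and the `tophat_p<N>` directory pattern are hypothetical placeholders, not the values used in the original runs; `-p` (threads) and `-o` (output directory) are the only TopHat options assumed here.

```python
def tophat_sweep_commands(index, reads_1, reads_2, thread_counts):
    """Return one TopHat argument list per thread count."""
    commands = []
    for n in thread_counts:
        commands.append([
            "tophat",
            "-p", str(n),           # number of threads for this run
            "-o", f"tophat_p{n}",   # hypothetical per-run output directory
            index, reads_1, reads_2,
        ])
    return commands


# Placeholder inputs; substitute the real index and trimmed FASTQ files.
for cmd in tophat_sweep_commands(
    "hg_index", "sample_R1.fastq", "sample_R2.fastq", [1, 5, 10, 15, 20, 30, 64]
):
    print(" ".join(cmd))
```

Keeping a separate output directory per run means every accepted_hits.bam survives and the runs can be compared side by side afterwards.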
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    Sounds like a bug to me... maybe you should try a different version of TopHat. The number of cores should not affect the size of the output BAM by more than a tiny amount.

    Alternatively, you could try BBMap; it's faster than TopHat and produces correct output for any number of cores.
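
File size alone can be misleading, so one way to check whether the smaller BAM files are actually missing alignments is to count the records in each one. The following is a rough sketch using only the Python standard library; it assumes the files are standard BGZF-compressed BAM, and the paths in the usage note are hypothetical. It is a diagnostic aid, not part of TopHat or BBMap.

```python
import gzip
import struct


def count_bam_records(path):
    """Count alignment records in a BAM file without external tools."""
    # BGZF is a series of gzip members, which the gzip module reads transparently.
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file: " + path)
        # Skip the plain-text SAM header.
        (l_text,) = struct.unpack("<i", fh.read(4))
        fh.read(l_text)
        # Skip the reference sequence dictionary.
        (n_ref,) = struct.unpack("<i", fh.read(4))
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", fh.read(4))
            fh.read(l_name + 4)  # name bytes plus the int32 reference length
        # Each alignment is an int32 block_size followed by block_size bytes.
        count = 0
        while True:
            size_bytes = fh.read(4)
            if len(size_bytes) < 4:
                break  # end of file
            (block_size,) = struct.unpack("<i", size_bytes)
            if len(fh.read(block_size)) < block_size:
                break  # truncated final record; do not count it
            count += 1
        return count
```

Calling `count_bam_records("tophat_p15/accepted_hits.bam")` and `count_bam_records("tophat_p64/accepted_hits.bam")` (hypothetical paths) and comparing the two numbers would show whether the 64-thread run really wrote far fewer alignments.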
