
jgoecks 11-29-2010 01:06 PM

Free & Open Environment for RNA-seq analysis: Galaxy
The Galaxy Team is excited to announce that the first free public resource for RNA-seq analysis is now available through the Galaxy public server at

Galaxy now supports both Tophat and Cufflinks and also provides useful utilities for manipulating and visualizing GTF files, which are common outputs for a Tophat-Cufflinks pipeline.
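GTF is a tab-separated format with nine columns, the last of which holds `key "value";` attribute pairs. As a rough illustration of the kind of GTF manipulation mentioned here, the sketch below is a minimal hypothetical parser (not Galaxy's own GTF utilities) that reads Cufflinks-style lines and totals exon length per transcript_id. It assumes attribute values never contain semicolons.

```python
from collections import defaultdict


def parse_gtf_line(line):
    """Split one GTF line into its 9 fields; return None for comments/blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields
    attrs = {}
    # Attributes look like: gene_id "G1"; transcript_id "T1";
    # (assumption: no ';' inside quoted values)
    for item in attr_text.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return {
        "seqname": seqname, "source": source, "feature": feature,
        "start": int(start), "end": int(end), "score": score,
        "strand": strand, "frame": frame, "attributes": attrs,
    }


def exon_lengths_by_transcript(lines):
    """Sum exon lengths per transcript_id (GTF coordinates are 1-based, inclusive)."""
    totals = defaultdict(int)
    for line in lines:
        rec = parse_gtf_line(line)
        if rec is None or rec["feature"] != "exon":
            continue
        totals[rec["attributes"]["transcript_id"]] += rec["end"] - rec["start"] + 1
    return dict(totals)
```

Lines whose feature is not "exon" (e.g. the transcript line Cufflinks also writes) are skipped, so each transcript's total reflects only its exons.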

Here is an exercise for learning about how to use Galaxy for RNA-seq analysis.

This addition brings Galaxy's current NGS offerings to:

1. NGS QC and manipulation - contains a variety of tools for dealing with all flavors of fastq datasets as well as outputs of SOLiD and 454 instruments.
2. NGS Mapping - currently includes bowtie (Illumina & SOLiD), BWA (Illumina), and lastz (454) mappers. PerM (SOLiD) is on the way and more will be added in the coming months. Transcriptome tools (e.g., Tophat) are also in the final stages of development.
3. NGS SAMTools - includes a variety of utilities for SAM/BAM manipulation. Some are based on the samtools library, some are written by the Galaxy team.
4. NGS RNA-seq tools - includes Tophat, Cufflinks, and useful utilities for manipulating and viewing GTF files.

Galaxy is an open and free web-based platform for performing accessible, reproducible, and transparent NGS analyses. Users can start using Galaxy by going to ; alternatively, Galaxy can be downloaded and run on any *NIX machine: or run on cloud computing resources such as Amazon:

Here is the previous SEQanswers announcement about Galaxy's initial NGS offerings.

Enjoy and please send us feedback!

The Galaxy Team

honey 12-15-2010 11:13 PM

I have a problem running RNA-seq on Galaxy: I cannot save BAM files (they save as the BAM index by default) or SAM files. Secondly, do you have any plans to integrate DESeq into Galaxy, or is it not necessary?

jgoecks 12-16-2010 05:21 AM


(1) Clicking on the save icon (the disk) rather than the arrow will download the BAM file rather than the index. (This is a recent UI bug, and we've fixed it in our codebase; you'll see this fix when we update our main server.)

(2) I'm not sure why you wouldn't be able to save SAM files -- perhaps the size is very large and your browser times out or you're not waiting long enough for the file to download? Can you provide more details about the problem that you're having?

(3) DESeq could certainly be integrated into Galaxy, but we--the Galaxy team--are not currently working on it. Galaxy has many R-based tools already available and we both welcome and try to support submissions from the community for new tool wrappers.

Finally, Galaxy usage issues/questions are best sent to either or These lists go to the entire Galaxy team and, in the case of galaxy-user, to the user community, and you should be able to get help more easily/quickly when you post there.

Galaxy Team

neoanderson 04-30-2011 09:49 PM

I'm not sure if this is the best place to post this... but here goes.
We have recently obtained an RNA-seq dataset from which we want to get differential expression lists.
Being new to this, I evaluated the Galaxy platform and found it very useful and interesting. The QC and mapping programs in Galaxy have been used to obtain BAM/SAM mapped files. I recently stumbled across the rQuant package for Galaxy but am unable to install it. I have also downloaded the BAM files from the Galaxy server. I am trying hard to understand how to proceed from these BAM files to actually obtaining lists of up- or down-regulated genes for the condition tested. Thanks in anticipation.

honey 05-01-2011 04:20 AM

Thanks for sharing your experience with Galaxy; you may also want to send this message to the Galaxy users list. You need to follow the RNA-seq workflow and run Cufflinks/Cuffdiff. The problem is that I am not sure you can really get to a point in Galaxy where you obtain differential expression lists of transcripts, genes, isoforms, or splice junctions. However, you can certainly take these BAM/SAM files and run further analysis outside Galaxy. There is also a nice tutorial on working with RNA-seq data (search the Galaxy users list). Jeremy Goecks may add more information about differential expression if I am missing something.

jgoecks 05-01-2011 08:39 AM


rQuant was developed by Gunnar Ratsch's Lab and is available via their public Galaxy instance at Questions should be directed towards them (Help menu --> Email questions) rather than the galaxy-user mailing list.

The galaxy-user mailing list (see my previous post) is the best place to ask questions about using Tophat/Cufflinks/compare/diff in Galaxy.


Many users have gotten a Tophat/Cufflinks/compare/diff pipeline working in Galaxy and have produced Cuffdiff quantitation and differential expression datasets. I think the Galaxy team has managed to address most of the big issues with this pipeline, but we're happy to help solve any particular problems that you may be having.


neoanderson 05-01-2011 02:17 PM

@honey, @jgoecks
Thank you for the quick reply. I really appreciate it.
I will write to the rQuant developers about the issues. It's just that the login details for Galaxy don't let us log in to , and it would also be much easier if the SAM/BAM files generated through read mapping with Bowtie in Galaxy were available in
Thank you both once again.

jgoecks 05-02-2011 05:30 AM


Galaxy makes it relatively easy to move files from one instance to another:

(1) for the dataset that you want to move, right-click on the save (disk) icon next to the dataset and copy its URL;
(2) for the instance that you want to copy the dataset to, paste the URL into the upload form.

Galaxy will then copy the dataset from one instance to another without you having to download it to your local computer.

Complete histories can be imported and exported as well, but this functionality is still in beta.


Sakti 05-31-2011 10:59 AM

Hi Jeremy,

This is an awesome tool. I'm new to RNA-seq and was getting dizzy reading tons of reports that use different programs. I'm glad Galaxy simplified a fairly complicated analysis pipeline into such a simple one.

My only request: could you post answers to the questions you pose in the tutorial? That way I could check whether my results agree with yours and feel more secure comparing my reasoning against the results one should get.

Thank you so much, this was very interesting.

fangquan 08-10-2011 12:38 PM

Hi friend,
I have two questions.

(1) Is there a way to see the command lines run behind Galaxy's web interface?
(2) A few jobs are still waiting to run. If I shut down my PC, will they still run?


jgoecks 08-10-2011 01:07 PM


The answers to your questions depend on whether you're using our public server or running Galaxy locally/on the cloud.

If you're using our public server:

(1) you cannot see the command lines run by Galaxy;
(2) waiting jobs will be run even if you turn off your computer.

If you're running locally/on the cloud:

(1) you can see the command lines by viewing Galaxy's logs;
(2) waiting jobs will not be run unless your Galaxy is running.

Questions like this are best directed to one of our mailing lists:


GenoMax 08-11-2011 04:11 AM


Originally Posted by jgoecks (Post 30371)
The Galaxy Team is excited to announce that the first free public resource for RNA-seq analysis is now available through the Galaxy public server at

The Galaxy Team


I am not sure if it was advertised before, but Galaxy now has a disk quota for user files on the public instance (I understand the reason for the decision).

I learned this from a Galaxy mailing list answer yesterday. I feel that this should be pointed out as a footnote to this post and mentioned on Galaxy's main page.

Thanks for the great work you all do!

jgoecks 08-16-2011 04:46 AM


Quotas on our public instance are a new feature (within the last couple of weeks) and are being phased in slowly. Moreover, we're still in the process of determining what the quotas should be; currently they are:

(a) 50 GB per dataset;
(b) 200 GB per history;
(c) 4 concurrent NGS jobs.

Once we've determined what these will be, I'll update my initial post and we'll ensure that this information is prominently featured on the public site.
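The three limits listed above amount to three simple checks. The sketch below is purely illustrative: the helper name `quota_problems` and the 1 GB = 2^30 bytes convention are assumptions, and it does not reflect how Galaxy actually enforces quotas.

```python
GB = 1024 ** 3  # assumption: 1 GB taken as 2**30 bytes

# Limits as quoted in the post (current at the time of posting)
MAX_DATASET_BYTES = 50 * GB
MAX_HISTORY_BYTES = 200 * GB
MAX_CONCURRENT_NGS_JOBS = 4


def quota_problems(new_dataset_bytes, history_bytes, running_ngs_jobs):
    """Hypothetical helper: list which quotas a new dataset/job would run into."""
    problems = []
    if new_dataset_bytes > MAX_DATASET_BYTES:
        problems.append("dataset exceeds 50 GB")
    if history_bytes + new_dataset_bytes > MAX_HISTORY_BYTES:
        problems.append("history would exceed 200 GB")
    if running_ngs_jobs >= MAX_CONCURRENT_NGS_JOBS:
        problems.append("already running 4 NGS jobs; new job must wait")
    return problems
```

For example, a 10 GB dataset added to a 100 GB history with 2 NGS jobs running passes all three checks and returns an empty list.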


SilviaBCE 07-16-2012 08:19 AM

Nucleotide bias in a specific position of the reads - Galaxy analysis
Hi, I'm analyzing my small RNA-seq data (Illumina 1.9 quality scores) and I'm using Galaxy for the preliminary QC tasks. I find it a great and easy tool! I'd like to ask how to interpret a graph: the nucleotide distribution chart after sample grooming and 3' adapter trimming. I've attached it so anybody can see it. So far I've loaded two samples into Galaxy, and both show this kind of bias at the 3rd nucleotide of the reads. What does it mean? Would you suggest eliminating all reads that contain an "N" in the 3rd position?
Any suggestion would be appreciated! Thanks a lot.
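A per-position nucleotide distribution chart tabulates, for each read position, the fraction of reads carrying A, C, G, T or N there. Here is a minimal Python sketch of that computation, plus an optional filter that drops reads with N at one (1-based) position; the function names are hypothetical and this is not the Galaxy tool's code. Reads of different lengths (common after adapter trimming) contribute only to the positions they reach.

```python
from collections import Counter


def base_composition(seqs):
    """Fraction of A/C/G/T/N at each read position (list index = 0-based position)."""
    counts = []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())  # first read long enough to reach position i
            counts[i][base] += 1
    # Normalise by the number of reads that actually reach each position
    result = []
    for c in counts:
        total = sum(c.values())
        result.append({b: c[b] / total for b in "ACGTN"})
    return result


def drop_reads_with_n_at(seqs, position):
    """Keep reads that do not have an N at the given 1-based position."""
    i = position - 1
    return [s for s in seqs if not (len(s) > i and s[i].upper() == "N")]
```

Computing the composition before and after `drop_reads_with_n_at(reads, 3)` shows how much of the spike at position 3 is due to N calls versus a real base bias.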

weijenc 08-13-2012 05:02 AM

Trimming Paired-End Data

So if I use a quality value < 20 to trim my Illumina dataset, which contains paired-end 100 bp reads, would both reads of a pair be removed if one of them has a base quality < 20? My worry is that when I use the trimmed dataset for de novo assembly, a program might say that the dataset is not paired-end if both reads are not removed at the same time.



jgoecks 08-13-2012 01:59 PM

Trimming PE reads in Galaxy

My suggestion for trimming paired-end reads in Galaxy is:

(1) Join them using the FASTQ joiner;
(2) Filter them using the Filter FASTQ tool;
(3) Split them using the FASTQ splitter.
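The join, filter, split approach keeps mates in sync because the filter sees each pair as a single record: if either mate fails, both are dropped. The Python sketch below mimics that idea on in-memory records. The (id, sequence, quality) tuples, the Phred+33 encoding, and the "every base >= 20" criterion are assumptions chosen for illustration, not the exact behaviour of Galaxy's FASTQ joiner or Filter FASTQ tool.

```python
def phred33(qual):
    """Decode a Sanger / Illumina 1.8+ quality string (ASCII offset 33) to Phred scores."""
    return [ord(ch) - 33 for ch in qual]


def join_pair(r1, r2):
    """Join two mates (id, seq, qual) into one record, remembering where to split."""
    (id1, s1, q1), (_, s2, q2) = r1, r2
    return (id1, s1 + s2, q1 + q2, len(s1))


def passes_filter(joined, min_q=20):
    """Assumed criterion: keep the joined record only if every base has quality >= min_q."""
    return all(q >= min_q for q in phred33(joined[2]))


def split_joined(joined, id2):
    """Split a joined record back into its two mates."""
    rid, seq, qual, cut = joined
    return (rid, seq[:cut], qual[:cut]), (id2, seq[cut:], qual[cut:])


def filter_pairs(mates1, mates2, min_q=20):
    """Join -> filter -> split, so each pair is kept or dropped as a unit."""
    kept1, kept2 = [], []
    for r1, r2 in zip(mates1, mates2):
        joined = join_pair(r1, r2)
        if passes_filter(joined, min_q):
            a, b = split_joined(joined, r2[0])
            kept1.append(a)
            kept2.append(b)
    return kept1, kept2
```

This filters whole pairs rather than trimming bases, so both output files keep the same number of reads in the same order, which is what paired-end assemblers expect.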


weijenc 08-16-2012 07:35 PM

Problem with grooming
Thanks for the reply to my previous post.

I have been trying to work with the paired-end dataset (SRR131208, two files). After grooming (solexa to fastq sanger), however, quality values are between 5 and 0. Did I do something wrong?



jgoecks 08-17-2012 08:41 AM

Your data is almost certainly not solexa format; most newer Illumina data is already fastqsanger, in which case the groomer is not needed.

See the Wikipedia entry for FASTQ for more details:

You should be able to look at the first few reads of your datasets to determine the FASTQ format.
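Looking at the quality characters of the first few reads is usually enough to tell the encodings apart, because Sanger (fastqsanger) uses ASCII offset 33 while Solexa and Illumina 1.3+ use offset 64. The heuristic below is a rough sketch with hypothetical function names. It also shows, using the standard Solexa-to-Phred mapping, why grooming Sanger data as if it were Solexa collapses qualities to very low values; whether Galaxy's groomer uses exactly this mapping is an assumption here.

```python
import math


def guess_fastq_encoding(quality_lines):
    """Heuristic guess at the quality encoding from a few reads' quality strings."""
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    lo, hi = min(codes), max(codes)
    if lo < 59:
        # ASCII below ';' never occurs with offset 64, so this must be offset 33
        return "fastqsanger"
    if lo < 64:
        # ';' to '?' are negative Solexa scores (-5 to -1)
        return "fastqsolexa"
    if hi > 74:
        # characters above 'J' are unusual for offset-33 Illumina data
        return "offset-64 (fastqsolexa or fastqillumina)"
    return "ambiguous"


def solexa_to_phred(q_solexa):
    """Standard Solexa-to-Phred score conversion."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def misread_as_solexa(sanger_char):
    """Phred score obtained if a Sanger (offset 33) character is decoded as Solexa (offset 64)."""
    return round(solexa_to_phred(ord(sanger_char) - 64))
```

With this mapping a Sanger Q20 base ('5') comes out as 0 and a Sanger Q40 base ('I') as 10, so groomed qualities all land near the bottom of the scale. Because high-quality Sanger reads (for example, all 'I') can fall entirely inside the offset-64 range, the heuristic returns "ambiguous" rather than guessing.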


Peppe 11-18-2012 10:22 AM

Hi all,
I am new to the forum and also to the RNA-seq analysis field.
I just got the results of my RNA sequencing and I am trying to map my reads using Tophat on Galaxy. I am working with Windows 7. After the FastQC analysis, I converted my reads with FASTQ Groomer and then ran Tophat. It has been 2 days and the job hasn't started yet.

Does Tophat (in Galaxy) run on Windows 7?
How long does a mapping analysis usually take (the reference genome is about 20 Mb)?


jgoecks 11-19-2012 09:40 AM


I assume you're using the public Galaxy server at , yes?

If so:

(a) Galaxy will work fine on Windows, though you'll want to use Firefox or Chrome as your Web browser going forward so that you can use all of Galaxy's functionality.

(b) The server is very busy right now, so it may take a couple days for your job to start. Do not restart the job or it will go to the end of the wait list. Once your job starts, it should go quickly (4-8 hours is a good estimate) because your genome is small.


All times are GMT -8.