**lindseykelly** · 07-12-2012, 05:59 AM

This was the response from the Galaxy team, in case someone else has this question:

Yes, you have this correct. The general path would be to:

- join forward and reverse data per run
- run FASTQ Groomer & FastQC
(note: if your data is already in Sanger FASTQ format with Phred+33 scaled quality
values, the datatype '.fastqsanger' can be assigned directly and the FASTQ Groomer
step skipped. This is likely true if your data is from the latest CASAVA pipeline, but
please double-check.)
- discard data as needed based on quality
- split forward and reverse data that passes QC
- concatenate all forward reads from a sample into one FASTQ file
- concatenate all reverse reads from a sample into one FASTQ file
- for each sample, run TopHat using the two concatenated FASTQ files
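
For the "please double-check" part of the note above, here is a rough sketch of how the quality encoding could be checked outside Galaxy. It is only a heuristic, not what FASTQ Groomer does internally: the function names are made up for illustration, it assumes plain 4-line FASTQ records, and it relies on the usual ASCII ranges for Illumina data (Phred+33 up to 'J', Phred+64 and Solexa+64 starting at ';' or above).

```python
def quality_lines(path):
    """Yield the quality string of each record, assuming 4-line FASTQ records."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 3:
                yield line.rstrip("\n")


def guess_phred_offset(qualities):
    """Return 33, 64, or None (ambiguous) for an iterable of quality strings."""
    lowest, highest = 255, 0
    for line in qualities:
        for ch in line.strip():
            code = ord(ch)
            lowest = min(lowest, code)
            highest = max(highest, code)
    if lowest < 59:
        # Characters below ';' cannot occur in +64 encoded data.
        return 33
    if highest > 74:
        # Characters above 'J' are not expected in Phred+33 Illumina output.
        return 64
    # Only characters shared by both encodings were seen.
    return None
```

Calling `guess_phred_offset(quality_lines("reads.fastq"))` on a few thousand reads is usually enough; a result of `None` means the sample did not contain any character that distinguishes the two encodings.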

To manipulate paired end data, please see the tools -> NGS: QC and manipulation: FASTQ splitter & FASTQ joiner.

To combine data files head-to-tail from multiple runs into a single FASTQ file, please see the tool -> Text Manipulation: Concatenate datasets.
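
Outside Galaxy, head-to-tail concatenation is just a byte-for-byte append of one file after another. A minimal sketch, assuming uncompressed FASTQ files (the function name and file names are only examples, not part of any Galaxy tool):

```python
def concatenate_fastq(input_paths, output_path):
    """Append each input file to output_path in the order given."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                last = b""
                # Copy in 1 MiB chunks so large files are never held in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
                # Guard against a missing final newline, which would otherwise
                # glue the last record of one file to the first of the next.
                if last and last != b"\n":
                    out.write(b"\n")
```

The forward and reverse files must be concatenated in the same run order, for example `concatenate_fastq(["run1_R1.fastq", "run2_R1.fastq"], "sample_R1.fastq")` and then the matching `_R2` files in the same order, so that mates stay at the same position in both outputs.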

I am not sure of the actual volume of data, but if the files start to get large or TopHat fails with a memory error, a local or cluster Galaxy instance would be the recommendation: http://getgalaxy.org

For reference:

http://tophat.cbcb.umd.edu/manual.html

http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html

Hopefully this helps. Others are welcome to post comments/suggestions.

Jen
Galaxy team

**[email protected]** · 09-12-2012, 09:03 PM

This may be helpful to you:

http://www.illumina.com/documents/products/datasheets/datasheet_rnaseq_analysis.pdf

**mhkiani** · 10-29-2013, 12:44 PM

Broken paired reads

I have some paired-end 100 bp RNA-seq data, and when I ran the RNA-seq analysis in CLC, more than 50% of the reads came out as broken pairs. I'm not sure why.

**sugo** · 01-16-2014, 10:59 AM

What is the purpose of joining the forward and reverse reads prior to QC? Couldn't the QC be run on the separate reads?

**Mike2188** · 07-30-2014, 01:09 PM

If you process each file individually, you run into errors during alignment. For instance, if I had 100,000 paired-end reads in two files, forward.fq and reverse.fq, and I performed some trimming and quality filtering on each individually, I might end up with one file with 90,000 reads and one with 89,000. Now when I go to do alignments, the program will assume the first read in forward.fq corresponds to the first read in reverse.fq, but the files no longer line up. The alignments won't work correctly because of this.
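
To make the problem concrete, here is a small sketch of how two independently filtered files could be put back in sync by keeping only reads whose mate survived. This is an illustration, not how the Galaxy joiner/splitter tools work internally; the function names are made up, and it assumes 4-line FASTQ records with read names that are identical apart from a trailing /1 or /2 or a description after the first space.

```python
import itertools


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from 4-line records."""
    iterator = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in itertools.islice(iterator, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def pair_key(header):
    """Reduce a header to the part shared by both mates of a pair."""
    key = header.split()[0]
    if key.endswith(("/1", "/2")):
        key = key[:-2]
    return key


def resync(forward_records, reverse_records):
    """Yield (forward, reverse) pairs, in forward order, where both mates exist."""
    # The reverse reads are held in memory, which is fine for a sketch but
    # may be too much for a full HiSeq lane.
    reverse_by_key = {pair_key(rec[0]): rec for rec in reverse_records}
    for fwd in forward_records:
        mate = reverse_by_key.get(pair_key(fwd[0]))
        if mate is not None:
            yield fwd, mate
```

Writing the first element of each pair to one file and the second to another gives two files of equal length in matching order, which is what the aligner expects. Joining the mates before filtering avoids the problem in the first place, because each pair is then kept or discarded as a unit.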


Initial QC and grooming for Illumina HiSeq2000 paired end RNAseq on Galaxy
