SEQanswers > Applications Forums > RNA Sequencing

Old 07-02-2012, 07:07 AM   #1
Junior Member
Location: pittsburgh

Join Date: Apr 2012
Posts: 5
Initial QC and grooming for Illumina HiSeq 2000 paired-end RNAseq on Galaxy

I am trying to do RNAseq analysis on paired-end data from the HiSeq 2000. I have about 50 files for each sample (25 forward and 25 reverse, although each sample has a different number of files).

I think that I need to:
- convert them into FASTQ Sanger format using the FASTQ Groomer tool
- check the quality using the FastQC tool

I don't know how to handle this many files. Do I have to groom and run the QC for each file? Should I join the paired files and run both tools on each pair, or should I combine all of the data for each sample (which I don't know how to do) and then groom and run the QC for all of the reads for the sample?

Thanks in advance for advice
lindseykelly
Old 07-12-2012, 06:59 AM   #2
Junior Member
Location: pittsburgh

Join Date: Apr 2012
Posts: 5

This was the response from the Galaxy team, in case someone else has this question:

Yes, you have this correct. The general path would be to:

- join forward and reverse data per run
- run FASTQ Groomer & FastQC
(note: if your data is already in Sanger FASTQ format with Phred+33 scaled quality values, the datatype '.fastqsanger' can be assigned directly and the FASTQ Groomer step skipped. This is likely true if your data is from the latest CASAVA pipeline, but please double check.)
- discard data as needed based on quality
- split forward and reverse data that passes QC
- concatenate all forward reads from a sample into one FASTQ file
- concatenate all reverse reads from a sample into one FASTQ file
- for each sample, run TopHat using the two concatenated FASTQ files
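
For the "please double check" part of the note above, here is a minimal sketch of one common heuristic for guessing the quality offset by looking at the lowest quality character in a sample of reads. It is only an illustration, not something from the Galaxy tools: the function name `guess_phred_offset` and the `max_records` parameter are made up for this example, and it assumes plain 4-line FASTQ records (no wrapped sequence lines). Data made up entirely of high scores can still fool it, so treat a result of 64 as a hint rather than proof.

```python
import io


def guess_phred_offset(fastq_lines, max_records=10000):
    """Guess the quality offset (33 or 64) from an iterable of FASTQ lines.

    Returns 33, 64, or None when the sampled characters are ambiguous.
    Assumes every record is exactly four lines long.
    """
    lowest = None
    for i, line in enumerate(fastq_lines):
        if i // 4 >= max_records:
            break
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        qual = line.rstrip("\r\n")
        if not qual:
            continue
        line_min = min(ord(c) for c in qual)
        lowest = line_min if lowest is None else min(lowest, line_min)

    if lowest is None:
        return None
    if lowest < 59:   # characters below ';' cannot occur in the +64 encodings
        return 33
    if lowest >= 64:  # nothing below '@' was seen: consistent with Phred+64
        return 64
    return None       # 59..63: could be Solexa+64 or Sanger, so do not guess
```

Usage would look something like `guess_phred_offset(open("reads.fastq"))`, where `reads.fastq` is a placeholder name. If it returns 33, assigning the Sanger datatype directly is probably safe; otherwise run the FASTQ Groomer.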

To manipulate paired end data, please see the tools -> NGS: QC and manipulation: FASTQ splitter & FASTQ joiner.

To combine data files head-to-tail from multiple runs into a single FASTQ file, please see the tool -> Text Manipulation: Concatenate datasets.
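
Outside Galaxy, the same head-to-tail concatenation can be sketched in a few lines of Python. This is only an illustration under some assumptions: the helper names `concatenate_fastq` and `paired_parts` are made up, and the `_R1_` / `_R2_` file-name tags and the `.fastq` extension are guesses about how the files are named, so adjust them to your data. The important point is that forward and reverse parts are concatenated in the same order, so that read N in one output file is still the mate of read N in the other.

```python
import shutil
from pathlib import Path


def paired_parts(directory, forward_tag="_R1_", reverse_tag="_R2_"):
    """Return (forward_files, reverse_files) in matching order.

    The tag values are assumptions about the naming scheme, not a standard.
    """
    forward = sorted(Path(directory).glob(f"*{forward_tag}*.fastq"))
    reverse = [f.with_name(f.name.replace(forward_tag, reverse_tag)) for f in forward]
    missing = [str(r) for r in reverse if not r.exists()]
    if missing:
        raise FileNotFoundError(f"no reverse file for: {missing}")
    return forward, reverse


def concatenate_fastq(parts, destination):
    """Append each file in `parts`, in the given order, to `destination`.

    Each part is expected to end with a newline, as FASTQ files normally do.
    """
    with open(destination, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

For one sample this would be called as `fwd, rev = paired_parts("sample_dir")` followed by `concatenate_fastq(fwd, "sample_forward.fq")` and `concatenate_fastq(rev, "sample_reverse.fq")`, where the directory and output names are placeholders.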

I am not sure of the actual volume of data, but if the files start to get large or TopHat fails with a memory error, a local or cluster instance would be the recommendation:

For reference:

Hopefully this helps. Others are welcome to post comments/suggestions.

Galaxy team
lindseykelly
Old 09-12-2012, 10:03 PM   #3
Junior Member
Location: California

Join Date: Sep 2012
Posts: 2

This may be helpful to you:
Old 10-29-2013, 01:44 PM   #4
Location: Texas

Join Date: Oct 2013
Posts: 12
Broken paired reads

I got some 100 bp paired-end RNA-seq data, and when I did the RNA-seq analysis with CLC, more than 50% of the reads came out as broken pairs. I'm not sure why.
mhkiani
Old 01-16-2014, 10:59 AM   #5
Junior Member
Location: Canada

Join Date: Nov 2013
Posts: 8

What is the purpose of joining the forward and reverse reads prior to QC? Couldn't the QC be run on the separate reads?
sugo
Old 07-30-2014, 02:09 PM   #6
Location: Winnipeg

Join Date: Oct 2013
Posts: 26

If you process each file individually, you run into errors during alignment. For instance, if I had 100,000 paired-end reads in two files, forward.fq and reverse.fq, and I performed some trimming and quality filtering on each file individually, I might end up with one file with 90,000 reads and one with 89,000. Now when I go to do alignments, the program will assume the first read in forward.fq corresponds to the first read in reverse.fq, the second to the second, and so on. But the files no longer line up, so reads get paired with the wrong mates and the alignments won't work correctly.
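
To make that concrete, here is a minimal sketch of filtering the two files together so they stay in sync: a pair is written only when both mates pass, so both outputs always have the same number of reads in the same order. The function names, the mean-quality criterion and the threshold of 20 are all just illustrative choices, not what any particular tool does, and the code assumes 4-line FASTQ records, Phred+33 qualities and input files that are already in matching order.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ handle."""
    while True:
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def mean_quality(quality, offset=33):
    """Mean Phred score of a quality string; empty strings score 0."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_pairs(forward, reverse, out_forward, out_reverse, min_mean_q=20, offset=33):
    """Write a pair only when BOTH mates pass; return the number of pairs kept."""
    kept = 0
    for r1, r2 in zip(read_fastq(forward), read_fastq(reverse)):
        passes = (mean_quality(r1[3], offset) >= min_mean_q
                  and mean_quality(r2[3], offset) >= min_mean_q)
        if passes:
            out_forward.write("\n".join(r1) + "\n")
            out_reverse.write("\n".join(r2) + "\n")
            kept += 1
    return kept
```

All four arguments are open file handles, e.g. forward.fq and reverse.fq opened for reading and two new files opened for writing. Joining the mates into one record before QC, as suggested earlier in the thread, achieves the same thing inside Galaxy.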
Mike2188
