SEQanswers

Old 07-02-2012, 06:07 AM   #1
lindseykelly
Junior Member
 
Location: pittsburgh

Join Date: Apr 2012
Posts: 5
Initial QC and grooming for Illumina HiSeq2000 paired end RNAseq on Galaxy

I am trying to do RNA-seq analysis on paired-end data from the HiSeq 2000. I have about 50 files for each sample (roughly 25 forward and 25 reverse, although each sample has a different number of files).

I think that I need to:
- convert them into FASTQ Sanger format using the FASTQ Groomer tool
- check the quality using the FastQC tool

I don't know how to handle this many files. Do I have to groom and run the QC for each file? Should I join the paired files and run both tools on each pair, or should I combine all of the data for each sample (which I don't know how to do) and then groom and run the QC on all of the reads for the sample?

Thanks in advance for advice
Lindsey
Old 07-12-2012, 05:59 AM   #2
lindseykelly
Junior Member
 
Location: pittsburgh

Join Date: Apr 2012
Posts: 5

This was the response from the Galaxy team, in case someone else has this question:

Yes, you have this correct. The general path would be to:

- join forward and reverse data per run
- run FASTQ Groomer & FastQC
(note: if your data is already in Sanger FASTQ format with Phred+33 scaled quality
values, the datatype '.fastqsanger' can be assigned directly and the FASTQ Groomer
step skipped. This is likely true if your data is from the latest CASAVA pipeline, but
please double check.)
- discard data as needed based on quality
- split forward and reverse data that passes QC
- concatenate all forward reads from a sample into one FASTQ file
- concatenate all reverse reads from a sample into one FASTQ file.
- for each sample, run TopHat using the two concatenated FASTQ files
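
One rough way to do the "double check" on quality encoding mentioned in the note above is to look at the lowest quality character in a file. The following is only a minimal Python sketch, not part of Galaxy or of the original reply. The function name `guess_phred_offset` and the `max_records` parameter are made up for illustration, and it assumes plain four-line FASTQ records with no line wrapping. It is a heuristic: characters below ASCII 59 cannot occur in +64 encodings, but a file of uniformly high-quality Phred+33 reads could still be misread as +64.

```python
def guess_phred_offset(handle, max_records=10000):
    """Guess the quality offset (33 or 64) from the lowest quality character.

    Returns None if no quality lines were found.
    """
    lowest = None
    for i, line in enumerate(handle):
        if i // 4 >= max_records:
            break
        if i % 4 == 3:  # fourth line of each record holds the quality string
            qual = line.rstrip("\n")
            if qual:
                low = min(ord(c) for c in qual)
                lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        return None
    # ASCII codes below 59 (';') are not used by Solexa+64 or Phred+64,
    # so seeing one indicates Phred+33. Otherwise +64 is only *probable*.
    return 33 if lowest < 59 else 64
```

It could be called as `guess_phred_offset(open("reads_1.fastq"))`, where the file name is hypothetical. A result of 33 suggests the FASTQ Groomer step can be skipped; anything else is a reason to groom.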

To manipulate paired end data, please see the tools -> NGS: QC and manipulation: FASTQ splitter & FASTQ joiner.

To combine data files head-to-tail from multiple runs into a single FASTQ file, please see the tool -> Text Manipulation: Concatenate datasets.
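
Outside Galaxy, the same head-to-tail concatenation can be sketched in a few lines of Python. This is an illustration only, not what the Concatenate datasets tool does internally. The function name `concatenate_fastq` is hypothetical, and it assumes uncompressed FASTQ files. The important point is that the forward and reverse lists must be given in the same run order so that the two outputs stay paired.

```python
def concatenate_fastq(inputs, output):
    """Append each input file to `output` in the order given."""
    with open(output, "wb") as out:
        for path in inputs:
            last = b"\n"
            with open(path, "rb") as src:
                # Copy in 1 MiB chunks so large files are never held in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            # Guard against a missing final newline, which would otherwise
            # fuse the last record of one file with the first of the next.
            if last != b"\n":
                out.write(b"\n")
```

For one sample this would be called twice, once with the list of forward files and once with the matching list of reverse files.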

I am not sure of the actual volume of data, but if the files start to get large or TopHat fails with a memory error, a local or cluster instance would be the recommendation: http://getgalaxy.org

For reference:
http://tophat.cbcb.umd.edu/manual.html
http://www.nature.com/nprot/journal/....2012.016.html

Hopefully this helps. Others are welcome to post comments/suggestions.

Jen
Galaxy team
Old 09-12-2012, 09:03 PM   #3
Huan@illumina.com
Junior Member
 
Location: California

Join Date: Sep 2012
Posts: 2

This may be helpful to you:

http://www.illumina.com/documents/pr...q_analysis.pdf
Old 10-29-2013, 12:44 PM   #4
mhkiani
Member
 
Location: Texas

Join Date: Oct 2013
Posts: 12
Broken paired reads

I have some paired-end 100 bp RNA-seq data, and when I ran the RNA-seq analysis in CLC, more than 50% of the reads came out as broken pairs. I'm not sure why.
Old 01-16-2014, 09:59 AM   #5
sugo
Junior Member
 
Location: Canada

Join Date: Nov 2013
Posts: 8

What is the purpose of joining the forward and reverse reads prior to QC? Couldn't the QC be run on the separate reads?
Old 07-30-2014, 01:09 PM   #6
Mike2188
Member
 
Location: Winnipeg

Join Date: Oct 2013
Posts: 24

If you process each file individually, you run into errors during alignment. For instance, if I had 100,000 paired-end reads in two files, forward.fastq and reverse.fastq, and I performed some trimming and quality filtering on each file individually, I might end up with one file with 90,000 reads and one with 89,000. Now when I go to do alignments, the program will assume the first read in forward.fastq corresponds to the first read in reverse.fastq, but the files are no longer in sync. The alignments won't work correctly because of this.
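
To make the idea concrete, here is a minimal Python sketch of pair-aware filtering: both mates are read together, and a pair is written only if both reads pass, so the two output files always have the same number of reads in the same order. This is an illustration, not what any particular Galaxy tool does. The names `read_fastq`, `mean_quality`, `filter_pairs` and the `min_mean_q` threshold are hypothetical, and it assumes four-line FASTQ records with Phred+33 qualities.

```python
from itertools import zip_longest


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (0.0 for an empty string)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def filter_pairs(fwd_in, rev_in, fwd_out, rev_out, min_mean_q=20):
    """Write only pairs where BOTH mates pass; return the number kept."""
    kept = 0
    for r1, r2 in zip_longest(read_fastq(fwd_in), read_fastq(rev_in)):
        if r1 is None or r2 is None:
            raise ValueError("input files contain different numbers of reads")
        if mean_quality(r1[2]) >= min_mean_q and mean_quality(r2[2]) >= min_mean_q:
            for (h, s, q), out in ((r1, fwd_out), (r2, rev_out)):
                out.write(f"{h}\n{s}\n+\n{q}\n")
            kept += 1
    return kept
```

The handles can be ordinary files opened in text mode. Because a pair is dropped as a unit, the aligner's assumption that read N in one file matches read N in the other still holds after filtering.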