  • lindseykelly
    Junior Member
    • Apr 2012
    • 5

    Initial QC and grooming for Illumina HiSeq2000 paired end RNAseq on Galaxy

    I am trying to do RNAseq analysis on paired-end data from the HiSeq2000. I have about 50 files for each sample (25 forward and 25 reverse, although each sample has a different number of files).

    I think that I need to:
    -convert them into FASTQ Sanger format using the FASTQ Groomer tool
    -check the quality using the FastQC tool

    I don't know how to handle this many files. Do I have to groom and run the QC for each file? Should I join the paired files and run both tools on each pair, or should I combine all of the data for each sample (which I don't know how to do) and then groom and run the QC on all of the reads for the sample?

    Thanks in advance for advice
    Lindsey
  • lindseykelly
    Junior Member
    • Apr 2012
    • 5

    #2
    This was the response from the Galaxy team, in case someone else has this question:

    Yes, you have this correct. The general path would be to:

    - join forward and reverse data per run
    - run FASTQ Groomer & FastQC
    (note: if your data is already in Sanger FASTQ format with Phred+33 quality scaled
    values, the datatype '.fastqsanger' can be directly assigned and the FASTQ Groomer
    step skipped. This is likely true if your data is from the latest CASAVA pipeline, but
    please double check.)
    - discard data as needed based on quality
    - split forward and reverse data that passes QC
    - concatenate all forward reads from a sample into one FASTQ file
    - concatenate all reverse reads from a sample into one FASTQ file.
    - for each sample, run TopHat using the two concatenated FASTQ files
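
The "please double check" in the Phred+33 note above can be done with a quick look at the quality characters. The following is only a rough sketch of a common heuristic, not what the FASTQ Groomer does internally; the function names are made up for illustration. It assumes plain four-line FASTQ records, and the thresholds rest on the assumption that Phred+33 data from this kind of run tops out around 'J' (ASCII 74).

```python
from itertools import islice


def quality_lines(handle):
    """Yield the quality string (4th line) of each 4-line FASTQ record."""
    for line in islice(iter(handle), 3, None, 4):
        yield line.rstrip("\n")


def guess_phred_offset(quality_strings):
    """Return 33, 64, or None when the encoding cannot be told apart.

    Heuristic (an assumption, not a guarantee):
      - any character below ASCII 59 can only occur in Phred+33 data
      - any character above ASCII 74 suggests Phred+64 data
    """
    lowest, highest = None, None
    for qual in quality_strings:
        for ch in qual:
            code = ord(ch)
            if lowest is None or code < lowest:
                lowest = code
            if highest is None or code > highest:
                highest = code
    if lowest is None:
        return None          # no quality characters seen
    if lowest < 59:
        return 33            # Sanger / Illumina 1.8+ style
    if highest > 74:
        return 64            # older Illumina style
    return None              # ambiguous: only mid-range characters


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as fh:
        # Looking at the first 10,000 reads is usually enough for a guess.
        print(guess_phred_offset(islice(quality_lines(fh), 10000)))
```

If this reports 33, assigning '.fastqsanger' directly is probably safe; a result of 64 or None suggests running the FASTQ Groomer rather than skipping it.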

    To manipulate paired end data, please see the tools -> NGS: QC and manipulation: FASTQ splitter & FASTQ joiner.

    To combine data files head-to-tail from multiple runs into a single FASTQ file please see the tool -> Text Manipulation: Concatenate datasets.
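
Outside of Galaxy, the same head-to-tail concatenation can be sketched in a few lines of Python. This is an illustration only, not the Galaxy tool's implementation; `concatenate_files` and the file names in the usage note are hypothetical. It assumes uncompressed FASTQ files.

```python
def concatenate_files(input_paths, output_path, chunk_size=1 << 20):
    """Append each input file to output_path, in the order given.

    A newline is added after any file that does not end with one, so the
    last record of one file cannot run into the first record of the next.
    """
    with open(output_path, "wb") as out:
        for path in input_paths:
            last_byte = b"\n"  # an empty file needs no extra newline
            with open(path, "rb") as src:
                while chunk := src.read(chunk_size):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            if last_byte != b"\n":
                out.write(b"\n")
```

For example, `concatenate_files(["run1_R1.fastq", "run2_R1.fastq"], "sample_R1.fastq")` followed by the same call on the matching R2 files. The forward and reverse lists have to be given in the same order, otherwise the mates in the two concatenated files will no longer line up.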

    I am not sure of the actual volume of data, but if the files start to get large or TopHat fails with a memory error, a local or cluster Galaxy instance would be the recommendation: http://getgalaxy.org

    For reference:



    Hopefully this helps. Others are welcome to post comments/suggestions.

    Jen
    Galaxy team

    • mhkiani
      Member
      • Oct 2013
      • 12

      #4
      Broken paired reads

      I got some paired-end 100bp RNA-seq data, and when I did the RNA-seq analysis with CLC, more than 50% of the reads came out as broken pairs. I'm not sure why.

      • sugo
        Junior Member
        • Nov 2013
        • 8

        #5
        What is the purpose of joining the forward and reverse reads prior to QC? Couldn't the QC be run on the separate reads?

        • Mike2188
          Member
          • Oct 2013
          • 27

          #6
          If you process each file individually, you run into errors during alignment. For instance, if I had 100,000 paired-end reads in two files, forward.fq and reverse.fq, and I performed some trimming and quality filtering on each individually, I might end up with one file with 90,000 reads and one with 89,000. Now when I go to do alignments, the program will assume the first read in forward.fq corresponds to the first read in reverse.fq, but the files are no longer in step. The alignments won't work correctly because of this.
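
To make the point concrete, here is a minimal sketch of filtering the two files together so they stay the same length. It is not what any particular Galaxy tool does; `read_fastq`, `mean_quality` and `filter_pairs` are hypothetical helpers, and the sketch assumes four-line FASTQ records, Phred+33 qualities, and input files that are still in matching order.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ handle."""
    lines = iter(handle)
    while True:
        record = [line.rstrip("\n") for line in islice(lines, 4)]
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        yield tuple(record)


def mean_quality(record, offset=33):
    """Mean Phred score of one record, assuming the given ASCII offset."""
    qual = record[3]
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_pairs(forward, reverse, keep):
    """Yield (fwd, rev) only when BOTH mates pass keep().

    A pair is dropped as a unit, so the two outputs always contain the
    same number of reads in the same order.
    """
    for fwd, rev in zip(forward, reverse):
        if keep(fwd) and keep(rev):
            yield fwd, rev


def write_pairs(pairs, fwd_out, rev_out):
    """Write surviving pairs to two open output handles."""
    for fwd, rev in pairs:
        fwd_out.write("\n".join(fwd) + "\n")
        rev_out.write("\n".join(rev) + "\n")
```

With `keep=lambda rec: mean_quality(rec) >= 20` (the cutoff of 20 is arbitrary), a pair is discarded if either mate fails, so read N in the forward output is still the mate of read N in the reverse output.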
