
  • Initial QC and grooming for Illumina HiSeq2000 paired end RNAseq on Galaxy

    I am trying to do RNAseq analysis on paired-end data from the HiSeq2000. I have about 50 files for each sample (25 forward and 25 reverse, although each sample has a different number of files).

    I think that I need to:
    -convert them into FASTQ Sanger format using the FASTQ Groomer tool
    -check the quality using the FastQC tool

    I don't know how to handle this many files. Do I have to groom and run the QC for each file? Should I join the paired files and run both tools on each pair, or should I combine all of the data for each sample (which I don't know how to do) and then groom and run the QC for all of the reads for the sample?

    Thanks in advance for advice
    Lindsey

  • #2
    This was the response from the Galaxy team, in case someone else has this question:

    Yes, you have this correct. The general path would be to:

    - join forward and reverse data per run
    - run FASTQ Groomer & FastQC
    (note: if your data is already in Sanger FASTQ format with Phred+33 scaled quality
    values, the datatype '.fastqsanger' can be directly assigned and the FASTQ Groomer
    step skipped. This is likely true if your data is from the latest CASAVA pipeline, but
    please double check.)
    - discard data as needed based on quality
    - split forward and reverse data that passes QC
    - concatenate all forward reads from a sample into one FASTQ file
    - concatenate all reverse reads from a sample into one FASTQ file.
    - for each sample, run TopHat using the two concatenated FASTQ files
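
One way to "double check" the quality encoding mentioned in the note above is to look at the range of characters on the quality lines. The following is only a rough sketch, not part of the Galaxy tools: the function name `guess_phred_offset` is made up for illustration, and the thresholds assume the usual character ranges (Phred+33 data from recent Illumina pipelines rarely goes above 'J', while the older +64 encodings never go below ';').

```python
def guess_phred_offset(quality_lines):
    """Guess the quality offset from an iterable of FASTQ quality strings.

    Returns 33, 64, or None when the observed range is ambiguous.
    This is a heuristic: it only inspects the lowest and highest characters.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        # Characters below ';' (ASCII 59) do not occur in the +64 encodings.
        return 33
    if max(codes) > 74:
        # Characters above 'J' (ASCII 74) are not expected in Phred+33 Illumina data.
        return 64
    return None  # Not enough evidence either way; inspect more reads.


if __name__ == "__main__":
    print(guess_phred_offset(["IIIIHHGG#"]))  # contains '#', so Phred+33
```

If the result is 33, assigning '.fastqsanger' directly is probably safe; if it is 64 or None, running the FASTQ Groomer is the cautious choice.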

    To manipulate paired end data, please see the tools -> NGS: QC and manipulation: FASTQ splitter & FASTQ joiner.

    To combine data files head-to-tail from multiple runs into a single FASTQ file, please see the tool -> Text Manipulation: Concatenate datasets.
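
Outside Galaxy (for example on a local instance), the same head-to-tail concatenation can be sketched in a few lines of Python. This is an illustration only: `concatenate_files` and the file names in the usage comment are hypothetical, and the sketch assumes every input is an uncompressed FASTQ file that ends with a newline.

```python
import shutil


def concatenate_files(input_paths, output_path):
    """Append each input file to output_path, in the order given.

    Assumes uncompressed text files that each end with a newline;
    otherwise the last record of one file would run into the next.
    """
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Hypothetical usage: list the forward and reverse files in the SAME run order,
# so that read N in the forward output still pairs with read N in the reverse output.
# concatenate_files(["run1_R1.fastq", "run2_R1.fastq"], "sample_R1.fastq")
# concatenate_files(["run1_R2.fastq", "run2_R2.fastq"], "sample_R2.fastq")
```

The important design point is the ordering: the forward and reverse files have to be concatenated in matching order, or the pairing is lost.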

    I am not sure of the actual volume of data, but if the files start to get large or TopHat fails with a memory error, a local or cluster instance would be the recommendation: http://getgalaxy.org

    For reference:



    Hopefully this helps. Others are welcome to post comments/suggestions.

    Jen
    Galaxy team



    • #4
      Broken paired reads

      I have some paired-end 100 bp RNA-seq data, and when I ran the RNA-seq analysis in CLC, more than 50% of the reads came out as broken pairs. I'm not sure why.



      • #5
        What is the purpose of joining the forward and reverse reads prior to QC? Couldn't the QC be run on the separate reads?



        • #6
          If you process each file individually, you run into errors during alignment. For instance, if I had 100,000 paired-end reads in two files, forward.fq and reverse.fq, and I performed some trimming and quality filtering on each individually, I might end up with one file with 90,000 reads and one with 89,000. Now when I go to do alignments, the program will assume the first read in forward.fq corresponds to the first read in reverse.fq, but the files no longer line up read for read. The alignments won't work correctly because of this.
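
To make the point concrete, here is a minimal sketch of filtering the two files together so they stay in step. It is not what the Galaxy tools do internally; the function names, the mean-quality rule, and the threshold of 20 are assumptions chosen for illustration, and the quality strings are assumed to be Phred+33.

```python
def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def mean_quality(quality, offset=33):
    """Mean Phred score of one quality string (Phred+33 assumed by default)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_pairs(forward_records, reverse_records, min_mean_q=20):
    """Keep a pair only if BOTH mates pass, so the outputs stay the same length."""
    for fwd, rev in zip(forward_records, reverse_records):
        if mean_quality(fwd[2]) >= min_mean_q and mean_quality(rev[2]) >= min_mean_q:
            yield fwd, rev


# Hypothetical usage with the file names from the post above:
# with open("forward.fq") as f, open("reverse.fq") as r:
#     kept = list(filter_pairs(read_fastq(f), read_fastq(r)))
```

Because a pair is dropped as a unit, read N in the filtered forward output always corresponds to read N in the filtered reverse output, which is what the aligner expects.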
