
  • Initial QC and grooming for Illumina HiSeq2000 paired end RNAseq on Galaxy

    I am trying to do RNAseq analysis on paired-end data from the HiSeq2000. I have about 50 files for each sample (25 forward and 25 reverse, although each sample has a different number of files).

    I think that I need to:
    -convert them into FASTQ Sanger format using the FASTQ Groomer tool
    -check the quality using the FastQC tool

    I don't know how to handle this many files. Do I have to groom and run the QC for each file? Should I join the paired files and run both tools on each pair, or should I combine all of the data for each sample (which I don't know how to do) and then groom and run the QC on all of the reads for the sample?

    Thanks in advance for any advice,
    Lindsey

  • #2
    This was the response from the Galaxy team, in case someone else has this question:

    Yes, you have this correct. The general path would be to:

    - join forward and reverse data per run
    - run FASTQ Groomer & FastQC
    (note: if your data is already in Sanger FASTQ format with Phred+33 scaled quality
    values, the datatype '.fastqsanger' can be directly assigned and the FASTQ Groomer
    step skipped. This is likely true if your data is from the latest CASAVA pipeline, but
    please double check.)
    - discard data as needed based on quality
    - split forward and reverse data that passes QC
    - concatenate all forward reads from a sample into one FASTQ file
    - concatenate all reverse reads from a sample into one FASTQ file.
    - for each sample, run TopHat using the two concatenated FASTQ files
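
For the "please double check" step in the note above, here is a minimal Python sketch of one way to guess the quality encoding from the quality strings themselves. It is only a heuristic, not what FASTQ Groomer does internally: the function names and the cut-off values (characters below ASCII 59 implying Phred+33, characters above ASCII 74 implying a +64 encoding) are assumptions for illustration, and a file with only mid-range characters stays undetermined.

```python
def quality_lines(fastq_lines):
    """Yield the quality string (every 4th line) from 4-line FASTQ records."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:
            yield line.rstrip("\n")


def guess_phred_offset(qualities):
    """Return 33, 64, or None (undetermined) for an iterable of quality strings.

    Heuristic (assumption): characters below ASCII 59 cannot occur in the
    +64 encodings, and characters above ASCII 74 are unusual for Phred+33.
    """
    lowest, highest = 255, 0
    for quality in qualities:
        for ch in quality.strip():
            code = ord(ch)
            lowest = min(lowest, code)
            highest = max(highest, code)
    if highest == 0:
        return None  # no quality characters were seen
    if lowest < 59:
        return 33
    if highest > 74:
        return 64
    return None  # ambiguous range; inspect more reads


# Hypothetical two-record example in Sanger (Phred+33) encoding.
example = [
    "@read1", "ACGT", "+", "II#5",
    "@read2", "TTGA", "+", "IIII",
]
offset = guess_phred_offset(quality_lines(example))
```

If this returns 33 for a reasonable sample of reads, assigning '.fastqsanger' directly is probably safe; a result of 64 or None suggests running FASTQ Groomer anyway.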

    To manipulate paired end data, please see the tools -> NGS: QC and manipulation: FASTQ splitter & FASTQ joiner.

    To combine data files head-to-tail from multiple runs into a single FASTQ file, please see the tool -> Text Manipulation: Concatenate datasets.
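
Outside Galaxy (for example on a local instance), the same head-to-tail concatenation can be sketched in a few lines of Python. This is only an illustration, not the Galaxy tool itself: `concatenate_files` and the file names are hypothetical. The important point is that the forward and reverse files are listed in the same run order, so that read N in the combined forward file still matches read N in the combined reverse file.

```python
import shutil


def concatenate_files(input_paths, output_path):
    """Append each input file to output_path, in the order given."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def concatenate_sample(forward_paths, reverse_paths, forward_out, reverse_out):
    """Build one forward and one reverse file for a sample.

    Assumes forward_paths[i] and reverse_paths[i] come from the same run.
    """
    if len(forward_paths) != len(reverse_paths):
        raise ValueError("forward and reverse file lists differ in length")
    concatenate_files(forward_paths, forward_out)
    concatenate_files(reverse_paths, reverse_out)
```

A call might look like `concatenate_sample(["run1_R1.fastq", "run2_R1.fastq"], ["run1_R2.fastq", "run2_R2.fastq"], "sample_R1.fastq", "sample_R2.fastq")`, with hypothetical file names.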

    I am not sure of the actual volume of data, but if the files start to get large or TopHat fails with a memory error, a local or cluster instance would be the recommendation: http://getgalaxy.org

    Hopefully this helps. Others are welcome to post comments/suggestions.

    Jen
    Galaxy team


    • #4
      Broken paired reads

      I have some paired-end 100 bp RNA-seq data, and when I ran the RNA-seq analysis with CLC, more than 50% of the reads were reported as broken pairs. I'm not sure why.


      • #5
        What is the purpose of joining the forward and reverse reads prior to QC? Couldn't the QC be run on the separate reads?


        • #6
          If you process each file individually, you run into errors during alignment. For instance, if I had 100,000 paired-end reads in two files, forward.fq and reverse.fq, and I performed some trimming and quality filtering on each individually, I might end up with one file with 90,000 reads and one with 89,000. Now when I go to do alignments, the program will assume the first read in forward.fq corresponds to the first read in reverse.fq, but the files no longer line up. The alignments won't work correctly because of this.
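
To illustrate the point, here is a minimal Python sketch of pair-aware filtering: both files are read in lockstep and a pair is kept only if both mates pass, so the two outputs always have the same number of reads in the same order. The function names, the mean-quality criterion, and the threshold of 20 are assumptions for the example, not what any particular Galaxy tool does, and it assumes Phred+33 qualities and 4-line FASTQ records.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from 4-line records."""
    iterator = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(iterator, 4))
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        yield record


def mean_quality(quality, offset=33):
    """Mean Phred score of a quality string; 0.0 for an empty string."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_pairs(forward_lines, reverse_lines, min_mean_quality=20):
    """Keep a pair only when both mates reach the mean-quality threshold."""
    kept_forward, kept_reverse = [], []
    pairs = zip(read_fastq(forward_lines), read_fastq(reverse_lines))
    for forward, reverse in pairs:
        if (mean_quality(forward[3]) >= min_mean_quality
                and mean_quality(reverse[3]) >= min_mean_quality):
            kept_forward.append(forward)
            kept_reverse.append(reverse)
    return kept_forward, kept_reverse


# Hypothetical data: the reverse mate of the second pair is low quality,
# so the whole second pair is dropped from both outputs.
forward = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "ACGT", "+", "IIII"]
reverse = ["@r1/2", "TGCA", "+", "IIII", "@r2/2", "TGCA", "+", "####"]
kept_forward, kept_reverse = filter_pairs(forward, reverse)
```

Joining the reads before QC achieves the same thing inside Galaxy: each joined record carries both mates, so discarding a record removes the pair from both files at once.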
