  • RNAseq for 4 different Pools

    Hello. Here is the problem set up.
    I have received fastq.gz files that were run in "four different pools". We have 10 samples per pool, and each sample was sequenced paired-end, so we have two reads per sample. Each sample was also run on two lanes, so in actuality, for each sample there are 4 reads.

    I am interested in creating the bam files for each pool, and just would like to make sure I understand what is going on.

    So first I am concatenating the same read from each lane into a single file, i.e. SameID_Lane1_R1.fastq.gz is concatenated (cat) with SameID_Lane2_R1.fastq.gz, etc.
    After concatenating these files with cat in Linux, is there an error check I should perform to make sure it was done correctly?


    And secondly, how do I align the four different pools? Should I align each pool independently of the others? Say, for pool 1, do I just run TopHat and process all the bam files for these ten sample IDs? Or should I align all 4 pools together?

    Thank you.

    To my understanding, a "pool" is just a batched run of samples that have been processed.

  • #2
    From what you write, I assume that you have 4 "pooled samples", from 10 biological samples each, sequenced on 2 lanes per "pooled sample". It's a bit ambiguous from what you wrote whether that's correct (e.g., there would never just be 2 reads for a sample, I assume you mean the number of files, etc.).

    1. Yes, just cat the R1 files together and also the R2 files together. No, there is no error checking needed after this.
    2. Just run tophat2/STAR/whatever 4 times, once for each of the pooled samples.
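
    For anyone who wants a sanity check anyway, here is a minimal sketch of the same merge done in Python, with a read count before and after. It assumes gzipped FASTQ with 4 lines per record; the function names are made up for illustration and are not part of any tool mentioned in this thread. Byte-wise concatenation of .gz files gives a valid multi-member gzip file, which is why plain cat works.

```python
import gzip
import shutil
from pathlib import Path


def count_fastq_records(path):
    """Count reads in a gzipped FASTQ file (assumes 4 lines per record)."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def concat_lanes(lane_files, merged_path):
    """Byte-wise concatenation, equivalent to `cat a.gz b.gz > merged.gz`."""
    with open(merged_path, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(merged_path)


def merge_and_check(lane_files, merged_path):
    """Merge lane files and confirm the merged read count is the sum of the inputs."""
    expected = sum(count_fastq_records(f) for f in lane_files)
    merged = concat_lanes(lane_files, merged_path)
    observed = count_fastq_records(merged)
    if observed != expected:
        raise RuntimeError(f"{merged}: expected {expected} reads, found {observed}")
    return observed
```

    The lane files must be passed in the same order for R1 and for R2, so that read pairs stay in sync between the two merged files.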

    Given your second question, it's unclear whether you actually have pooled samples or not. You would have a pooled sample if you took 10 different biological samples and dumped them into the same tube, then used that for sequencing. This averages out any outliers a bit and cuts down on costs (I've seen companies charging more per sample prep than for the actual sequencing), but is generally a terrible way to do things. I wonder if you actually just multiplexed the samples, which would be the better way to go about things. If you have 16 fastq files, then you have pooled samples (4 samples * 2 lanes * 2 files/sample/lane). If, instead, you have 40 fastq files per pool (10 samples * 2 lanes * 2 files/sample/lane), then the samples were multiplexed. The difference is rather important.
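
    The file-count check can be spelled out as simple arithmetic. This is only a sketch, assuming 2 lanes and paired-end sequencing (one R1 and one R2 file per library per lane); the function name is hypothetical.

```python
def expected_fastq_files(n_libraries, lanes=2, files_per_lane=2):
    """Number of FASTQ files for n_libraries separately sequenced libraries.

    Assumes every library is run on every lane and that paired-end data
    gives one R1 and one R2 file per library per lane.
    """
    return n_libraries * lanes * files_per_lane


# True pooling: each of the 4 pools is a single library (one tube).
pooled_total = expected_fastq_files(4)

# Multiplexing: each of the 10 samples in a pool is its own library.
multiplexed_per_pool = expected_fastq_files(10)
multiplexed_total = expected_fastq_files(4 * 10)

print(pooled_total, multiplexed_per_pool, multiplexed_total)  # 16 40 160
```

    So 16 files in total points to pooled samples, while 40 files per pool (160 across all four pools) points to multiplexing.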

    • #3
      MultiPlex

      Hello.

      Thank you for your response.

      I have four pooled data "sets". In each pooled set, I have 10 samples, with each sample having 4 fastq files. Thus each "pooled set" has 40 fastq files.

      By your response, this implies that I have multiplexed data.

      So if it is multiplexed data, I can align each "pooled set" separately using TopHat.

      Is this correct?

      Thank you!!

      • #4
        Ah, then there was no pooling; you just have individual samples (this is a good thing). Run tophat once per biological sample (so, 40 times with 4 files per run). If you have the resources, run STAR instead; it'll save you a lot of time.
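
        One way to organize the 40 runs is to group the files by sample ID and build one tophat2 command per sample. The sketch below assumes bare file names of the form SampleID_Lane1_R1.fastq.gz (as in the first post); the index name `genome_index`, the output directory naming and the function names are placeholders. It relies on tophat2 accepting comma-separated lists of read files, so the per-lane files can be passed directly. The commands are only constructed here, not executed.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <SampleID>_Lane<N>_R<1|2>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_Lane(?P<lane>\d+)_R(?P<read>[12])\.fastq\.gz$"
)


def group_by_sample(filenames):
    """Map each sample ID to its R1 and R2 files, in matching lane order."""
    groups = defaultdict(lambda: {"1": [], "2": []})
    for name in sorted(filenames):
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        groups[match["sample"]][match["read"]].append(name)
    return groups


def tophat_commands(filenames, index="genome_index"):
    """Build one tophat2 command (as an argument list) per biological sample."""
    commands = []
    for sample, reads in sorted(group_by_sample(filenames).items()):
        if len(reads["1"]) != len(reads["2"]):
            raise ValueError(f"{sample}: unequal number of R1 and R2 files")
        commands.append([
            "tophat2",
            "-o", f"{sample}_tophat",   # hypothetical output directory name
            index,                       # hypothetical Bowtie2 index prefix
            ",".join(reads["1"]),        # all R1 files for this sample
            ",".join(reads["2"]),        # all R2 files, same lane order
        ])
    return commands
```

        Each returned list could then be passed to `subprocess.run(cmd, check=True)`, or printed and submitted to a cluster queue.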
        Last edited by dpryan; 11-23-2013, 01:25 AM.
