  • Basic questions about gene expression comparison

    I am new to NGS.
    I am doing RNA-Seq. My aim is to compare gene expression between samples.
    Question 1: I did single-end 50 bp sequencing for a bacterial strain. Each sample got ~10,000,000 reads. For most samples, ~10% of the reads are rRNA, which I think is OK. But for several samples, ~66% of the reads are rRNA. That means at most ~3,400,000 reads are useful. Do you think ~3,400,000 reads is acceptable, or is the number too low?
    Question 2: Since different samples have different rRNA proportions, I cannot directly compare gene expression using the remaining reads, right? How should I do the normalization?
    Question 3: I have tried my best to use the same amount of cells, RNA, and cDNA during sample preparation, and I got approximately the same number of reads for each sample. But it is still possible to get bias between samples. Should I normalize against a housekeeping gene, assuming the housekeeping gene has the same expression in all samples?

  • #2
    I only work occasionally with prokaryotes, but here is my attempt to answer your questions.
    Don't have absolute trust in what I post.
    I do make mistakes.

    Question 1.

    Seems fine. 20,000,000 reads is sufficient for a human RNA-Seq experiment, so one would think over 3,000,000 reads would be sufficient for a bacterium. I do have a recent E. coli RNA-Seq experiment somewhere, but I'm too lazy to go look up the sequencing depth we used.

    There is a way to verify if your coverage is sufficient. It's imperfect and requires a bit of work though. You could randomly select a lower number of reads, and verify if the correlation between the replicates decreases as you decrease the number of reads.

    I would think you actually have too many reads, and could sequence at a lower sequencing depth. If the correlation between the replicates does not decrease as you decrease the number of reads, this would confirm that you could sequence at an even lower depth, and cut costs.
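The subsampling check described above can be sketched in Python. This is only a rough illustration, assuming you already have per-gene read counts for two replicates. The counts below are made-up toy numbers, the function names are hypothetical, and subsampling is simulated by keeping each read with a fixed probability rather than by resampling the FASTQ files.

```python
import random

def downsample_counts(counts, fraction, rng):
    """Keep each read independently with probability `fraction` (binomial thinning)."""
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one replicate has no variation
    return cov / (var_x * var_y) ** 0.5

def correlation_by_depth(rep1, rep2, fractions, seed=0):
    """Correlation between two replicates after thinning both to each fraction."""
    rng = random.Random(seed)
    results = {}
    for fraction in fractions:
        a = downsample_counts(rep1, fraction, rng)
        b = downsample_counts(rep2, fraction, rng)
        results[fraction] = pearson(a, b)
    return results

# Toy per-gene counts for two replicates (invented for illustration only).
rep1 = [1200, 300, 45, 8, 670, 15, 2300, 90]
rep2 = [1100, 340, 50, 5, 700, 12, 2150, 110]

for fraction, r in correlation_by_depth(rep1, rep2, [1.0, 0.5, 0.1, 0.01]).items():
    print(f"fraction={fraction:<5} pearson r={r:.4f}")
```

If the correlation stays roughly flat as the fraction drops, that would suggest the depth is more than sufficient. With real data you would probably want many more genes and several random seeds before drawing a conclusion.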

    I'm too lazy to search for articles, which I'm sure exist, discussing the optimal sequencing depth for bacteria.

    It also depends on the level of expression of the genes you are interested in.

    Question 2.

    Just remove all the reads that map to the rRNA first.
    After removing these reads, yes, you can compare the reads from the transcriptome directly, after performing the usual normalization steps relative to the library size, and perhaps the gene length.
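To make the normalization step concrete, here is a minimal sketch, assuming per-gene read counts are already available. The gene names, counts, and lengths are invented, and the function names are hypothetical. It uses counts per million (CPM) for the library-size step and transcripts per million (TPM) for the gene-length step. These are two common choices, not necessarily what the poster had in mind.

```python
def drop_rrna(counts, rrna_genes):
    """Remove rRNA genes before computing the library size."""
    return {gene: c for gene, c in counts.items() if gene not in rrna_genes}

def cpm(counts):
    """Counts per million: scale each gene by the (rRNA-free) library size."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def tpm(counts, lengths_bp):
    """Transcripts per million: correct for gene length, then rescale to one million."""
    rate = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    scale = sum(rate.values())
    return {gene: r * 1e6 / scale for gene, r in rate.items()}

rrna_genes = {"rrn"}  # hypothetical label for all rRNA features

# Two toy samples with the same mRNA composition but different rRNA fractions.
sample_a = {"rrn": 1_000_000, "geneX": 6_000_000, "geneY": 3_000_000}  # 10% rRNA
sample_b = {"rrn": 6_000_000, "geneX": 2_000_000, "geneY": 1_000_000}  # ~67% rRNA
lengths_bp = {"geneX": 1500, "geneY": 900}  # invented gene lengths

cpm_a = cpm(drop_rrna(sample_a, rrna_genes))
cpm_b = cpm(drop_rrna(sample_b, rrna_genes))
tpm_a = tpm(drop_rrna(sample_a, rrna_genes), lengths_bp)

print("CPM geneX, sample A:", round(cpm_a["geneX"], 1))
print("CPM geneX, sample B:", round(cpm_b["geneX"], 1))
print("TPM sample A:", {g: round(v, 1) for g, v in tpm_a.items()})
```

In this toy case the two samples end up with the same CPM values once the rRNA reads are excluded, even though their rRNA fractions are very different.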

    Sequencing rRNA is a waste of money, though.
    You should remove them before sequencing, if possible.

    Question 3.

    No. It is not necessary to normalize to a housekeeping gene in RNA-Seq, as opposed to qPCR. You are measuring the level of expression of one gene relative to all the other genes. You are measuring a proportion. No further normalization is required, whether relative to the total number of cells or to a housekeeping gene.

    It's also always interesting to see in RNA-Seq experiments just how much variation there is between housekeeping genes. It's a wonder anyone can use them to do normalization.

    There is one important caveat.
    No normalization is required only if the total amount of RNA produced per cell in each condition is the same.
    If this is not the case, you do need to normalize relative to the total amount of RNA produced per cell.
    This is an exceptional case, but it does exist.
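The caveat can be illustrated with a toy calculation. This is a sketch with invented numbers and hypothetical function names. It assumes the total amount of RNA per cell is known from some independent measurement, which RNA-Seq alone does not provide.

```python
def proportions(molecules_per_cell):
    """What RNA-Seq effectively measures: each gene's share of the total."""
    total = sum(molecules_per_cell.values())
    return {gene: n / total for gene, n in molecules_per_cell.items()}

def rescale_by_total(props, total_rna_per_cell):
    """Convert proportions back to per-cell amounts using an external total."""
    return {gene: p * total_rna_per_cell for gene, p in props.items()}

# Invented example: every gene doubles its output per cell in condition 2.
condition_1 = {"geneA": 100, "geneB": 100, "geneC": 800}
condition_2 = {"geneA": 200, "geneB": 200, "geneC": 1600}

props_1 = proportions(condition_1)
props_2 = proportions(condition_2)

# The proportions are identical, so the doubling is invisible without the totals.
print("proportion geneA:", props_1["geneA"], "vs", props_2["geneA"])

# With the per-cell totals (assumed measured separately), the change reappears.
abs_1 = rescale_by_total(props_1, total_rna_per_cell=1000)
abs_2 = rescale_by_total(props_2, total_rna_per_cell=2000)
print("fold change geneA:", abs_2["geneA"] / abs_1["geneA"])
```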
    Last edited by blancha; 11-15-2015, 09:45 AM.

    • #3
      Thank you very much!
      Which tool should I use to remove rRNA?

      • #4
        Just use the same aligner you were going to use to align the reads on the genome.
        Pick your favorite: BWA, Bowtie2, ...
        I use Bowtie2.

        An alternative strategy is to align on a reference genome that includes the ribosomal RNA, and then exclude the reads aligning on rRNA from the differential expression analysis.
        I prefer to remove the rRNA reads completely beforehand, by first aligning all the reads on the rRNA sequences.

        Just download a FASTA file containing the rRNA sequences, index the FASTA file, and align all the reads. Keep only the reads that do not align on rRNA for the rest of the analysis.

        To build the index: bowtie2-build
        To align: bowtie2 (with the option --un for single-end reads; --un-conc is the paired-end equivalent)
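The two steps can be sketched as a small Python wrapper around the Bowtie2 command line. This is only an illustration: the file names are hypothetical, and it assumes single-end reads as in the original question, so it uses Bowtie2's --un option, which writes the unpaired reads that fail to align (--un-conc is the counterpart for paired-end reads). The alignments themselves are discarded by sending the SAM output to /dev/null, which assumes a Unix-like system.

```python
import shlex
import subprocess

def build_index_cmd(rrna_fasta, index_prefix):
    """Command to index the rRNA FASTA file with bowtie2-build."""
    return ["bowtie2-build", rrna_fasta, index_prefix]

def filter_rrna_cmd(index_prefix, reads_fastq, kept_fastq, threads=1):
    """Command to align single-end reads to rRNA and keep only the unaligned ones."""
    return [
        "bowtie2",
        "-x", index_prefix,   # index built from the rRNA sequences
        "-U", reads_fastq,    # single-end input reads
        "--un", kept_fastq,   # reads that did NOT align to rRNA are written here
        "-p", str(threads),   # number of alignment threads
        "-S", "/dev/null",    # the rRNA alignments themselves are not needed
    ]

def run_pipeline(rrna_fasta, index_prefix, reads_fastq, kept_fastq, threads=1):
    """Run both steps; raises CalledProcessError if either command fails."""
    subprocess.run(build_index_cmd(rrna_fasta, index_prefix), check=True)
    subprocess.run(
        filter_rrna_cmd(index_prefix, reads_fastq, kept_fastq, threads), check=True
    )

# Hypothetical file names; print the commands rather than running them here.
print(shlex.join(build_index_cmd("rRNA.fasta", "rRNA_index")))
print(shlex.join(filter_rrna_cmd("rRNA_index", "sample1.fastq", "sample1.no_rRNA.fastq", threads=4)))
```

Calling run_pipeline with real paths would execute both commands, provided Bowtie2 is installed and on the PATH. The resulting sample1.no_rRNA.fastq would then be the input for the alignment on the genome.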

        • #5
          Thank you.
          There are many ribosomal copies (8x5S, 7x16S and 7x23S) in the reference genome. So, I combined the 22 rRNA sequences into one FASTA file and used that file as the reference for mapping my reads with bowtie. My purpose is to save the unaligned reads into separate files and then use those files for mapping on the genome. However, the step of mapping the reads to the FASTA file containing the 22 rRNA sequences is very slow. Maybe it did not run at all. Do you think combining the 22 rRNA sequences into one file can cause the problem?

          • #6
            I do exactly the same thing.

            The best way to speed up the alignment is to use multiple cores.
            There is a roughly linear relationship between the number of cores used and the speed of alignment.
            So doubling the number of cores will just about halve the run time.

            Of course, you need to have the cores available on the computer on which you are running the analysis.
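As a small sketch of that reasoning, the snippet below picks a thread count that does not exceed the cores actually available and estimates the run time under the idealized assumption of perfectly linear scaling. The function names and the 60-minute figure are invented for illustration. In Bowtie2 the chosen value would be passed with the -p option.

```python
import os

def pick_threads(requested, available=None):
    """Clamp the requested thread count to the cores available on this machine."""
    if available is None:
        available = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(requested, available))

def estimated_runtime(single_core_minutes, threads):
    """Idealized estimate: run time shrinks in proportion to the thread count."""
    return single_core_minutes / threads

threads = pick_threads(requested=8)
print("threads to pass with -p:", threads)
print("estimated minutes:", estimated_runtime(60, threads))
```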
