10-24-2013, 12:39 AM   #1
Location: haifa israel

Join Date: Jun 2011
Posts: 62
Biostatistics help in planning experiment with downstream bioinfo in mind!

I have a complicated experimental system that looks at changes in a region of the brain during learning. Past NGS has shown that, perhaps not surprisingly for brain tissue, there is a lot of animal-to-animal variability. Also, we cannot completely cleanly cut out the cells of interest, so the tissue is contaminated with other layers of the brain, other glial cells, etc., which may or may not show the same gene expression effects. Other than doing LCM/single-cell studies, I am looking for recommendations on the best way to plan the following experiment:

We have 6 time points. We plan to use 12 experimental animals and 6-12 controls (will 6 be enough, or do I need as many controls as experimental animals?)

Question 1: Is it better to individually barcode 12 experimental animals and run them on 2 lanes (to get ~30M reads per animal), or can I (perhaps to remove animal-to-animal variability, which I am less interested in) pool animals and do triplicate samples with 4 animals pooled in each? That way I would do 3 pooled experimental and 3 pooled control samples. Would the triplicates give me enough statistical power?

Question 2: Can I cross-use controls? So if I have the experimental animals undergoing learning for 24, 48, 72 hours, etc., but the control animals are essentially identical animals kept in a cage without undergoing learning paradigms, can I make just 3 control animals for each time point and maybe pool them together as a general control group, or are batch effects expected?

Question 3: Can I "virtually" pool? So for example, barcode each animal separately, but perhaps run them so that each animal gets 10M reads, and then pool an identical number of reads per animal into 3 pools, each of four animals, with 30M reads per pool? What are the pros/cons/caveats of doing this?
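For reference, the read budgets in Questions 1 and 3 can be checked with a few lines of arithmetic. This is only a sketch: the lane yield of ~180M reads is not stated in the post, it is inferred from "12 animals on 2 lanes at ~30M reads each", and the function name is made up for illustration.

```python
def reads_per_sample(reads_per_lane, n_lanes, n_samples):
    """Average reads per barcoded sample when samples share the lanes equally."""
    return reads_per_lane * n_lanes / n_samples

# Assumed lane yield, inferred from the post: 12 animals * 30M reads / 2 lanes.
LANE_YIELD = 180e6

# Question 1: 12 individually barcoded animals spread over 2 lanes.
individual = reads_per_sample(LANE_YIELD, n_lanes=2, n_samples=12)

# Question 3: four animals at 10M reads each, summed in silico.
virtual_pool = 4 * 10e6

# Reads to draw from each animal if a pool of four should total 30M.
per_animal_for_30m = 30e6 / 4

print(f"{individual / 1e6:.0f}M reads per animal (Question 1)")
print(f"{virtual_pool / 1e6:.0f}M reads per virtual pool of four at 10M each")
print(f"{per_animal_for_30m / 1e6:.1f}M reads per animal for a 30M pool")
```

Note that four animals at 10M reads each sum to 40M, not 30M, so a 30M virtual pool would mean drawing 7.5M reads from each animal.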

Noa is offline   Reply With Quote
10-24-2013, 01:39 AM   #2
Devon Ryan
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

I'd have some concerns about doing LCM with neurons anyway, though I guess I've never tried. Are the 12 experimental animals per timepoint or total? If those are total, then you'll definitely want at least 12 control animals (really, I would want to use much more than this in both groups, depending on what the underlying experiment is).

1. Regardless, barcode all of the animals so you can properly measure biological variability. In that same line of reasoning, every mouse/rat/whatever should be from a different litter (unless the experimental/control animals originate from the same cage, which would be a nice design).

2. The control animals should really undergo as similar an experience (sans learning paradigm) to the experimental animals as possible. I assume this behavioral paradigm involves cage movements to some sort of novel environment. In that case, the control animals should receive similar handling and then sit in a cage for a matched amount of time. This is really the same way you would design a learning test.

3. I have no idea what the point of that would be. If you have raw reads from different animals, just use that to gauge variability.
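To illustrate point 3, here is a minimal sketch of what per-animal data give you that pools do not. The counts are invented for a single hypothetical gene, and the coefficient of variation is just one simple way to summarize variability; it is not what a dedicated RNA-seq package would compute.

```python
from statistics import mean, stdev

# Hypothetical normalized counts for one gene in 12 individually barcoded animals.
counts = [100, 140, 90, 120, 80, 160, 110, 95, 130, 105, 150, 85]

def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Individually barcoded: 12 observations of animal-to-animal variability.
cv_animals = cv(counts)

# Pooling four animals per library averages their signal, leaving 3 observations.
pools = [mean(counts[i:i + 4]) for i in range(0, len(counts), 4)]
cv_pools = cv(pools)

print(f"CV across 12 animals: {cv_animals:.2f}")
print(f"CV across 3 pools:    {cv_pools:.2f}")
```

The pooled values look much tighter, but the animal-to-animal spread has only been averaged away, and the variance is now estimated from three values instead of twelve. With separate barcodes you can always sum reads later; you cannot un-pool a library.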

I can tell you from experience with combining NGS with animal behaviour that you might want to do a quick pilot experiment before doing the full thing. It's hard to gauge a priori what sort of effect sizes you're going to get (we've learned that the hard/expensive way).
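A rough sketch of why the effect size matters so much for planning: the textbook normal approximation for a two-group comparison gives the number of animals per group as a function of the standardized effect size. This is an assumption-laden simplification (one gene, normally distributed values, no multiple-testing correction, no count-based model), so treat the numbers as orders of magnitude only.

```python
from math import ceil
from statistics import NormalDist

def animals_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison.

    effect_size is the standardized difference (Cohen's d). Uses the normal
    approximation n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

for d in (0.5, 1.0, 2.0):
    print(f"d = {d}: about {animals_per_group(d)} animals per group")
```

Halving the effect size roughly quadruples the animals needed, which is why a pilot that pins down the effect size and the variability is worth its cost.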
dpryan is offline   Reply With Quote
10-25-2013, 04:50 AM   #3
Location: Virginia

Join Date: Mar 2011
Posts: 72

I agree with Devon on questions 1 and 2. For 3: If I understand what you are asking, no. I think you are asking if you can put all your samples together to get 10M reads per lane and then run 3 lanes to get a total of 30M reads. This does not work, as the dynamic range queried will be 3x less than if you pool fewer samples to get more reads per sample in a single lane. I have seen several sequencing centers do this, particularly when a run gets low read counts, i.e. they get 120M reads but promised 150M, so they give you an additional fractional lane of data to get 30M more reads. For RNA-seq, this does not work (and I would argue the same for DNA-seq). What you should really do is look at your experimental design with replicates etc. and randomize the run list across the lanes (and flow cells) to control for run batch effects.
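One way to randomize the run list is sketched below: shuffle the samples within each group, then deal them out across lanes so that every lane carries the same mix of experimental and control libraries. The sample names, the 12 + 12 group sizes, and the 4 lanes are assumptions for illustration; time point and flow cell could be balanced the same way by including them in the group label.

```python
import random
from collections import defaultdict

def assign_lanes(samples, n_lanes, seed=0):
    """Randomly assign samples to lanes while balancing groups across lanes.

    samples: list of (sample_id, group) tuples.
    Returns a dict mapping lane number (1-based) to a list of sample ids.
    """
    rng = random.Random(seed)  # fixed seed so the run list is reproducible
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    lanes = defaultdict(list)
    slot = 0
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # random order within the group
        for sample_id in ids:
            lanes[slot % n_lanes + 1].append(sample_id)  # deal round-robin
            slot += 1
    return dict(lanes)

# Hypothetical design: 12 experimental and 12 control animals on 4 lanes.
samples = [(f"exp{i}", "experimental") for i in range(1, 13)]
samples += [(f"ctl{i}", "control") for i in range(1, 13)]

layout = assign_lanes(samples, n_lanes=4)
for lane, ids in sorted(layout.items()):
    print(lane, ids)
```

With this layout no lane holds only one group, so a bad lane or flow cell cannot masquerade as a treatment effect.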
bioBob is offline   Reply With Quote

biostatistics, illumina, statistical design
