SEQanswers

Old 05-06-2014, 08:24 PM   #1
lbragg
Member
 
Location: Brisbane

Join Date: Sep 2009
Posts: 14
Accounting for species abundance changes in metatranscriptomic data

In metatranscriptomic data, transcript abundance in a sample is influenced by species abundance as well as expression changes.

For anyone working with this data, how do you deal with the fact that species abundance fluctuation can lead to spurious DE calls? I am using DESeq2, and was wondering whether abundance of a 'housekeeping' gene can be (or has been) used to adjust for species abundance differences between samples.

Cheers,

Lauren
Old 05-11-2014, 05:12 PM   #2
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333

You could consider performing analysis at the different scales you are interested in.

Let's say we have a reference sample which has counts of 8 for all rows. (In DESeq, such a reference sample is constructed by taking the row-wise geometric mean across all samples.)

Then we have sample X which has counts:

species A gene 1: 8
species A gene 2: 8
...
species A gene 100: 8
species B gene 1: 4
species B gene 2: 4
...
species B gene 19: 4
species B gene 20: 2

If you use the median ratio method to estimate size factors over all 120 rows, the size factor for sample X would be 1. Genes 1-19 of species B will get a log2 fold change of -1, and gene 20 of species B will get a log2 fold change of -2.

However, if you subset to the species B rows, sample X would get a size factor of 0.5 from the median ratio method. Then all genes except gene 20 will get a log2 fold change of 0, while gene 20 will have a log2 fold change of -1.
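
The arithmetic in this example can be checked with a minimal sketch in plain Python. This is not DESeq2 code: the helper names `size_factor` and `log2_fold_changes` are made up for illustration, and the median is taken directly on the ratios, which is a simplification of what a log-scale implementation would do.

```python
import math
from statistics import median

def size_factor(sample, reference):
    """Median-of-ratios size factor for one sample (hypothetical helper).

    Rows with a zero in either the sample or the reference are skipped,
    which is an assumption of this sketch.
    """
    ratios = [s / r for s, r in zip(sample, reference) if s > 0 and r > 0]
    return median(ratios)

def log2_fold_changes(sample, reference, sf):
    """log2 of (size-factor-normalised count / reference count) per row."""
    return [math.log2((s / sf) / r) for s, r in zip(sample, reference)]

# Toy data from the post: 100 species A rows, then 20 species B rows.
reference = [8] * 120
sample_x = [8] * 100 + [4] * 19 + [2]

# Size factor over all 120 rows versus over the species B rows only.
sf_all = size_factor(sample_x, reference)
sf_b = size_factor(sample_x[100:], reference[100:])

# Fold changes for the species B rows under each normalisation.
lfc_all = log2_fold_changes(sample_x[100:], reference[100:], sf_all)
lfc_b = log2_fold_changes(sample_x[100:], reference[100:], sf_b)

print(sf_all, lfc_all[0], lfc_all[-1])
print(sf_b, lfc_b[0], lfc_b[-1])
```

The two print lines show the size factor followed by the fold changes for species B gene 1 and gene 20, first for the all-rows normalisation and then for the species B subset.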

If you want to create a matrix of size factors for every row and every sample (called normalization factors in DESeq2), you could use the estimateSizeFactorsForMatrix function and apply it to different sets of rows.
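
As a rough illustration of that idea, the sketch below builds a rows-by-samples matrix in which every row carries the size factors of its own species. It is plain Python rather than DESeq2; the function names, the `species` label list, and the fixed reference of 8 are assumptions made for the toy example.

```python
from statistics import median

def size_factors(counts, reference):
    """One median-of-ratios size factor per sample (column) of `counts`."""
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        # Skip zero entries; an assumption of this sketch.
        ratios = [row[j] / ref for row, ref in zip(counts, reference)
                  if ref > 0 and row[j] > 0]
        factors.append(median(ratios))
    return factors

def normalization_factors(counts, reference, species):
    """Rows x samples matrix; each row gets its own species' size factors."""
    per_species = {}
    for sp in set(species):
        idx = [i for i, s in enumerate(species) if s == sp]
        per_species[sp] = size_factors([counts[i] for i in idx],
                                       [reference[i] for i in idx])
    return [list(per_species[sp]) for sp in species]

# Two samples: one identical to the reference, and sample X from the example.
counts = ([[8, 8] for _ in range(100)]
          + [[8, 4] for _ in range(19)]
          + [[8, 2]])
reference = [8] * 120
species = ["A"] * 100 + ["B"] * 20

nf = normalization_factors(counts, reference, species)
print(nf[0], nf[100], nf[119])
```

Each species is normalised on its own rows only, so a drop in the overall abundance of species B is absorbed by its size factor instead of showing up as a fold change in every species B gene.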

Last edited by Michael Love; 05-12-2014 at 02:53 AM.
Old 05-11-2014, 08:34 PM   #3
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

It's not clear to me why you would want to account for species abundance changes. What's important is the amount of RNA in the sample dedicated to some purpose. Does it matter whether the level of some transcript is higher because species with that gene reproduced more, versus species remaining unchanged but generating more of that transcript per cell? Either way, it becomes more expressed in the community.
Old 05-11-2014, 08:36 PM   #4
lbragg
Member
 
Location: Brisbane

Join Date: Sep 2009
Posts: 14

So is this the same as analysing each organism individually?

The only case I can imagine this missing is when an organism ramps up expression for all genes (or almost all genes); then the above approach may not detect the upregulation. I don't know how likely this scenario is though.
I guess this is where the species/sample-specific normalisation factors would come in. I'll have to think on that one!
Old 05-11-2014, 08:50 PM   #5
lbragg
Member
 
Location: Brisbane

Join Date: Sep 2009
Posts: 14

Quote:
Originally Posted by Brian Bushnell
It's not clear to me why you would want to account for species abundance changes. What's important is the amount of RNA in the sample dedicated to some purpose. Does it matter whether the level of some transcript is higher because species with that gene reproduced more, versus species remaining unchanged but generating more of that transcript per cell? Either way, it becomes more expressed in the community.
I can understand where you are coming from. However, I'll just stress that our study is species-centric (it's a simple model community of phylogenetically distinct species): we are interested in how species interact and compete with each other in the community for limited resources. Thus, the origin of the transcripts is important! If a species decreases substantially in abundance but upregulates an alternative metabolic pathway at the same time, we definitely would like to detect that upregulation (rather than just observing that all the transcripts from that species have decreased in abundance).
Old 05-11-2014, 09:25 PM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Sounds difficult. Have you considered simultaneously sequencing DNA and RNA? That would probably be the most accurate way to normalize for population changes.
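
One possible reading of this suggestion, sketched in plain Python: use each species' share of the DNA reads in a sample as a proxy for its abundance, and divide that species' RNA counts by the change in share relative to a baseline sample. Nothing here comes from the thread beyond the general idea; the function names, the choice of the first sample as baseline, and the toy numbers are all assumptions, and RNA library-size differences are ignored.

```python
def dna_scaling_factors(dna_reads):
    """Per-species, per-sample abundance factors from DNA read counts.

    dna_reads maps species -> list of DNA read counts, one per sample.
    Each factor is the species' fraction of the sample's DNA reads divided
    by its fraction in the first sample (hypothetical baseline choice).
    Assumes every species has a non-zero DNA count in the baseline sample.
    """
    n_samples = len(next(iter(dna_reads.values())))
    totals = [sum(reads[j] for reads in dna_reads.values())
              for j in range(n_samples)]
    factors = {}
    for sp, reads in dna_reads.items():
        fractions = [reads[j] / totals[j] for j in range(n_samples)]
        factors[sp] = [f / fractions[0] for f in fractions]
    return factors

def adjust_rna(rna_counts, factors):
    """Divide one gene's RNA counts by its species' abundance factors."""
    return [c / f for c, f in zip(rna_counts, factors)]

# Toy numbers: species B halves its share of the DNA reads in sample 2.
dna_reads = {"A": [500, 750], "B": [500, 250]}
factors = dna_scaling_factors(dna_reads)

# A species B gene whose RNA count halves along with the population.
adjusted = adjust_rna([8, 4], factors["B"])
print(factors["B"], adjusted)
```

In this toy case the halved RNA count is fully explained by the halved population, so the adjusted counts come out equal across the two samples.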
Old 05-11-2014, 09:33 PM   #7
lbragg
Member
 
Location: Brisbane

Join Date: Sep 2009
Posts: 14

That definitely would be a good idea -- but unlikely to happen for this study (as is often the case!).
Old 06-25-2015, 04:06 PM   #8
lbragg
Member
 
Location: Brisbane

Join Date: Sep 2009
Posts: 14

Quote:
Originally Posted by Michael Love
Sorry to drag this question back up, but it relates to an issue I had in the other thread: I have a sample with no observations of any gene from a given species A, so the calculated normalisation factor is NA for these genes. Got any suggestions for what I should do? Thanks!
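
A small sketch of why the factor is undefined in that situation, in plain Python rather than DESeq2: if zero counts are skipped when forming the ratios (an assumption of this sketch), a sample with no reads for the species leaves nothing to take a median over. The helper name and the toy counts are made up, and the sketch only makes the missing value explicit; it does not decide how such a sample should be handled.

```python
from statistics import median

def species_size_factor(sample, reference):
    """Median-of-ratios size factor, or None when no row is usable."""
    ratios = [s / r for s, r in zip(sample, reference) if s > 0 and r > 0]
    return median(ratios) if ratios else None

reference_a = [8, 8, 8]   # toy reference counts for three species A genes
observed = [8, 4, 8]      # a sample in which species A was detected
absent = [0, 0, 0]        # a sample with no reads from species A

sf_observed = species_size_factor(observed, reference_a)
sf_absent = species_size_factor(absent, reference_a)
print(sf_observed, sf_absent)
```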
Old 08-12-2016, 01:17 AM   #9
chc*
Junior Member
 
Location: Devon, UK

Join Date: Jan 2016
Posts: 6
Normalising metatranscriptomic data with metagenomic data

Quote:
Originally Posted by Brian Bushnell
Sounds difficult. Have you considered simultaneously sequencing DNA and RNA? That would probably be the most accurate way to normalize for population changes.
Hi Brian, I am currently setting up the kind of experiment you suggest above, where I want to know "who is doing what" in a community under two different conditions. Your suggestion is exactly what I want to achieve, but I am unsure exactly how to do it. Do you have any ideas on this, please?
Old 08-12-2016, 07:39 AM   #10
harlequin
Junior Member
 
Location: Europe

Join Date: Oct 2012
Posts: 7

Quote:
Originally Posted by lbragg
In metatranscriptomic data, transcript abundance in a sample is influenced by species abundance as well as expression changes.

For anyone working with this data, how do you deal with the fact that species abundance fluctuation can lead to spurious DE calls? I am using DESeq2, and was wondering whether abundance of a 'housekeeping' gene can be (or has been) used to adjust for species abundance differences between samples.

Cheers,

Lauren
It's not possible to account for such changes using RNA-Seq data only. You always need another dimension. Check out the following two:

http://www.nature.com/ismej/journal/...ej201640a.html
http://mbio.asm.org/content/5/2/e01012-14.abstract