How many functionally assigned reads are enough?


We are analyzing 454 sequencing results annotated by MG-RAST v2. We have 6 samples in each category, e.g. A1, A2, ..., A6 for category A and B1, B2, ..., B6 for category B (A and B are different treatments). Each sample has between 15,000 and 40,000 reads with an assigned annotation.

Now, let's assume that A1 has 10 reads assigned to "carbohydrate processing". Is that enough to do a comparison with sample B6, which has 1,000 reads assigned to "carbohydrate processing"? Is 100 even enough to count as non-spurious? We assigned reads using an e-value threshold of 10^-5 and a length threshold of 100.

We would like to gauge (1) the minimum number of reads per sample, per functional category, that can be considered non-spurious, and (2) how large a difference in the number (or percentage) of reads assigned to a given function must be before we can say that two samples differ significantly and have different functional profiles.
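To make (2) concrete, here is a minimal sketch of one possible way to frame a pairwise comparison: a pooled two-proportion z-test on the fraction of annotated reads falling in one category. This is only an illustration, not something MG-RAST provides. The function name is hypothetical, and the per-sample totals (20,000 and 30,000) are made-up values inside our 15,000-40,000 range. It also ignores the six replicates per treatment and multiple testing across categories, which a real analysis would probably need to handle.

```python
import math


def two_proportion_z_test(count_a, total_a, count_b, total_b):
    """Pooled two-proportion z-test (hypothetical helper, for illustration).

    count_* : reads assigned to the functional category in each sample
    total_* : all annotated reads in each sample
    Returns (p_a, p_b, z, two_sided_p_value).
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    # Pooled proportion under the null hypothesis that both samples
    # share the same underlying fraction for this category.
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("standard error is zero; counts are all 0 or all reads")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value


# Counts from the question; the totals are assumed values, not real data.
p_a, p_b, z, p_value = two_proportion_z_test(10, 20_000, 1_000, 30_000)
print(f"A1: {p_a:.4%}  B6: {p_b:.4%}  z = {z:.2f}  p = {p_value:.3g}")
```

The normal approximation becomes unreliable when a count is very small, so for categories with only a handful of reads an exact test (for example Fisher's exact test) may be a safer choice.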

