SEQanswers

SEQanswers > Applications Forums > Metagenomics
Old 12-10-2016, 01:05 PM   #1
bloosnail
Member
 
Location: Pittsburgh

Join Date: Jul 2015
Posts: 11
Minimum amount of data needed for reliable results?

We are trying to analyze whole-genome metagenomic data taken from the surface of the eye. Each sample has millions of reads, but at most only 1-2% of those reads are bacterial. We are wondering if there is information/resources relating the amount of data available to the reliability of the results, e.g. resolving taxonomy to species level for bacteria at greater than 1% relative abundance. Currently we are aligning the data to whole-genome bacterial sequences, but there are many multi-mapping reads, many of which may be false positives. We have also tried MetaPhlAn2, which aligns against a custom catalog of unique markers for different clades, but usually only several hundred reads map back -- many of the samples report very low or no species present. Specifically, we are looking for methods to analyze whole-genome metagenomic data where the amount of (bacterial) data is very low. Any help is greatly appreciated.

Daniel
Old 12-10-2016, 01:50 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,313

You might try removing human sequence, then assembling the rest and BLASTing the contigs against nt/nr/RefSeq microbial. Assuming the contigs are longer than read length, they will give you more reliable hits. What kind of depth do you have for the bacteria? You can find that out with a kmer-frequency histogram, after human reads are removed.
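To make the k-mer histogram idea concrete, here is a minimal toy sketch in plain Python (illustrative only -- on real data you would use a dedicated tool such as one from the BBTools suite rather than this; the function name and toy read are made up for the example). On real bacterial reads, the histogram's main peak past the error tail at count 1 approximates the average depth:

```python
from collections import Counter

def kmer_histogram(reads, k=31):
    """Tally how many distinct k-mers occur once, twice, etc.
    On real data, the main peak of this histogram (ignoring the
    sequencing-error tail at count 1) estimates average depth."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Histogram of k-mer multiplicities, sorted by count
    return dict(sorted(Counter(counts.values()).items()))

# Tiny demonstration with k=4 (real analyses typically use k=31)
print(kmer_histogram(["ACGTACGTAA"], k=4))  # -> {1: 3, 2: 2}
```

Here ACGT and CGTA each occur twice and three other 4-mers occur once, so the histogram reads "3 k-mers seen once, 2 seen twice".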
Old 12-10-2016, 10:59 PM   #3
bloosnail
Member
 
Location: Pittsburgh

Join Date: Jul 2015
Posts: 11

Thank you for the quick response. The idea of assembling the reads into contigs before alignment makes sense; I will let my supervisor know. Do you know of good software for this? I have tried Velvet in the past but did not use it extensively.

I forgot to mention that we have removed human sequences, although the revised reference genome that you created seems like it would be especially useful for us.

Could you give more information on how to estimate the depth of the bacteria? There are generally fewer than 100,000 bacterial reads per sample out of 20-30 million initial reads (before any trimming/contaminant removal).
Old 12-11-2016, 09:42 AM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,313

I suggest SPAdes or MEGAHIT for metagenome assembly. 100k is not many reads; you might not have sufficient depth for assembly. In that case, you may get a better assembly by combining the bacterial reads from all samples and assembling them together. You can then quantify each sample by mapping its reads to the combined assembly.

For human removal, the raw human genome is fine in your case (bacteria). The masked version is mainly to allow decontamination of eukaryotes, which have shared sequence with human; bacteria basically don't.
Old 12-11-2016, 10:27 AM   #5
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 709

You can create rarefaction curves to see if what you have is likely sufficient to describe the metagenomic profile.

The basic process is to subsample reads and see whether your estimate of the species diversity stays similar. A low-complexity sample will plateau at low coverage, while the diversity estimate for a high-complexity sample will keep increasing substantially as more reads are added.
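The subsample-and-recount idea can be sketched in a few lines of Python (a toy illustration under the assumption that each read already has a species assignment; the helper name and toy community are made up, and real workflows would use an established ecology/metagenomics toolkit):

```python
import random

def rarefaction_curve(species_labels, depths, trials=20, seed=1):
    """Mean number of distinct species observed in random
    subsamples of increasing size (one point per depth)."""
    rng = random.Random(seed)
    curve = []
    for d in depths:
        d = min(d, len(species_labels))
        richness = [len(set(rng.sample(species_labels, d)))
                    for _ in range(trials)]
        curve.append(sum(richness) / trials)
    return curve

# Toy low-complexity community: 4 species, so the curve
# should plateau at 4 well before the full read count.
rng = random.Random(0)
labels = rng.choices("ABCD", k=5000)
print(rarefaction_curve(labels, [10, 100, 1000]))
```

If the curve is still climbing at your full read count, the sample's diversity is probably under-sampled; if it flattens early, the profile is likely captured.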
Old 12-12-2016, 04:30 PM   #6
dhtaft
Junior Member
 
Location: Davis, CA

Join Date: Dec 2016
Posts: 3

I had some luck using IMSA in a similar situation to the one you describe, but only after human read removal.