SEQanswers

Old 06-13-2016, 11:37 PM   #1
bloosnail
Member
 
Location: Pittsburgh

Join Date: Jul 2015
Posts: 17
Question about whole genome metagenomics for the microbiome?

Hello,

Thank you for taking the time to read my post. I have whole-genome metagenomic data taken from the eye; the reads are a mix of human DNA and DNA from a number of bacterial species. Since the reads are paired-end, I am wondering whether the two reads of a pair can come from the genomes of two different organisms, which would make the paired-end alignment fail even though each read could still be aligned as single-ended. I tried aligning with Bowtie2, which looks for single-ended alignments after paired-end alignment fails. When I looked at the unique read IDs that were mapped, there was a fairly large increase (~30% or so), which is a good deal larger than I expected based on past sequencing projects I've worked with.

When trimming the reads for quality, I am trying to decide on parameters for handling pairs where one read has many bases removed, e.g. whether to keep the surviving read as single-ended. Any help is greatly appreciated.
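For illustration only, here is a minimal Python sketch of one way the "keep the survivor as single-ended" decision could be made. It is not taken from any particular trimming tool: the `Read` type, the function names, and the thresholds `min_q=20` and `min_len=50` are all hypothetical choices, and qualities are assumed to be Phred+33 encoded.

```python
from typing import NamedTuple, Tuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: str  # Phred+33 encoded quality string (assumption)


def trim_3prime(read: Read, min_q: int = 20) -> Read:
    """Remove bases from the 3' end until one with quality >= min_q is reached."""
    end = len(read.seq)
    while end > 0 and ord(read.qual[end - 1]) - 33 < min_q:
        end -= 1
    return Read(read.name, read.seq[:end], read.qual[:end])


def sort_pair(r1: Read, r2: Read, min_q: int = 20, min_len: int = 50) -> Tuple[str, tuple]:
    """Classify a trimmed pair as 'paired', 'single', or 'dropped'.

    min_q and min_len are illustrative thresholds, not recommendations.
    """
    t1, t2 = trim_3prime(r1, min_q), trim_3prime(r2, min_q)
    ok1, ok2 = len(t1.seq) >= min_len, len(t2.seq) >= min_len
    if ok1 and ok2:
        return "paired", (t1, t2)
    if ok1:
        return "single", (t1,)
    if ok2:
        return "single", (t2,)
    return "dropped", ()
```

With thresholds like these, a pair whose second read is trimmed down to a few bases ends up in the "single" bin rather than being discarded entirely.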
Old 06-14-2016, 07:47 AM   #2
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 196
Default

That is a fairly substantial portion of single read alignments. What does your FastQC output look like? Generally read 1 has slightly higher quality than read 2, so I wonder if most of your "additional" mapping comes from one read versus the other.
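One way to check which mate contributes the extra mappings is to tally the SAM FLAG bits per mate. The sketch below assumes standard SAM output from a paired-end aligner run; the function name `mate_breakdown` is hypothetical, and only the FLAG bits defined in the SAM specification are used.

```python
from collections import Counter
from typing import Iterable


def mate_breakdown(sam_lines: Iterable[str]) -> Counter:
    """Count mapped primary alignments by mate (R1/R2) and pairing status."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x900:                  # skip secondary (0x100) and supplementary (0x800)
            continue
        if flag & 0x4:                    # read itself is unmapped
            continue
        mate = "R1" if flag & 0x40 else "R2"
        kind = "proper_pair" if flag & 0x2 else "not_proper_pair"
        counts[(mate, kind)] += 1
    return counts
```

If most of the `not_proper_pair` counts fall under `R1`, that would point towards lower-quality second reads failing to align, rather than mates coming from different organisms.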

Just my 2 cents, but for removing human DNA I take the conservative approach of removing all read pairs in which either of the reads maps to human in any fashion. Note that bowtie2's --al-conc/--un-conc options don't support this directly, since they split pairs only by whether they aligned concordantly.
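As a rough sketch of that conservative filter, the Python below collects the name of every read with any alignment to the human reference and then drops whole pairs by name. It assumes SAM output in which both mates share the same QNAME; the function names and the `(name, read1, read2)` tuple layout are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def human_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Return the names of reads with at least one alignment to human."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):        # skip header lines
            continue
        fields = line.split("\t")
        if not int(fields[1]) & 0x4:    # FLAG bit 0x4 unset means the read is mapped
            names.add(fields[0])
    return names


def keep_nonhuman_pairs(
    pairs: Iterable[Tuple[str, str, str]], human_names: Set[str]
) -> List[Tuple[str, str, str]]:
    """Keep only pairs in which neither mate mapped to human."""
    return [p for p in pairs if p[0] not in human_names]
```

Because the lookup is by read name, a pair is discarded even if only one mate aligned and the other was reported as unmapped.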
Old 06-14-2016, 09:49 AM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550
Default

@bloosnail: I am going to suggest that you try bbsplit.sh from BBMap to separate the human reads from the others before doing any additional analysis.

It is unlikely that the two ends of a fragment come from two different organisms, since both reads of a pair are sequenced from the same DNA fragment.
Old 06-14-2016, 11:42 AM   #4
bloosnail
Member
 
Location: Pittsburgh

Join Date: Jul 2015
Posts: 17
Default

@fanli, I was unaware that the first read in a pair generally has better quality; thank you for pointing that out. I have run FastQC, and in general the first reads have much better quality than the second, although neither passes the per base sequence quality test, possibly because of the longish read length of 125 bp. Below are a couple of the graphics (sorry, I tried to upload the images directly and it didn't seem to work):

First reads
https://gyazo.com/5a7316f1aa3af931794f1feeea74539b

Second reads
https://gyazo.com/d0db3ac387f7777dbad0ed8bb74056b4

@GenoMax, thank you for your suggestion. I forgot to mention that I have already run a tool called BMTagger (https://www.westgrid.ca/support/software/bmtagger) to remove human reads. I think you are right, though: it seems unlikely that reads from a pair would map to different organisms, or that this would have a major effect on overall mapping.
Old 06-14-2016, 11:53 AM   #5
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 196
Default

Yikes, those quality scores aren't great. My guess is that a lot of the single-end alignments you're getting are because you have a fairly high error rate. This might be problematic if you are going to do de novo assembly of your metagenomes.

This is just my opinion, but I would recommend BWA or Bowtie mapping in place of BMTagger.
Old 06-14-2016, 12:06 PM   #6
bloosnail
Member
 
Location: Pittsburgh

Join Date: Jul 2015
Posts: 17
Default

I agree that the single-ended alignments may be due to the high error rate. For now, instead of throwing out the whole pair, I will keep a read as a single read even if its mate does not pass quality control, since much of the error seems to come from the second reads. We do not plan on doing de novo assembly, just seeing which species the reads align to out of all known microbiome reference genomes.

Oh, I meant that BMTagger was used just to remove human reads first. Once the human reads are removed, the remaining reads are mapped to many bacterial genomes using Bowtie2.
Old 06-15-2016, 07:06 AM   #7
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 196
Default

Out of curiosity, will you let me know if you "see" a lot of Retroviridae when you do the metagenomics? I had this issue, and it turned out they were actually endogenous retroviruses (ERVs) in the human genome that don't get filtered out cleanly.

Also, you might be interested in trying an alignment-free method for taxonomic classification (e.g. Kraken, CLARK, etc.).
Old 06-15-2016, 10:47 AM   #8
bloosnail
Member
 
Location: Pittsburgh

Join Date: Jul 2015
Posts: 17
Default

We have only looked at bacterial DNA so far. If you have reference genomes for the viruses in question, I can try aligning to them and let you know. Thank you for the suggestion about alignment-free methods; I had not heard of these before and they seem to save a lot of time, so I will look into them.

Also, I am curious: would you know of a method to estimate the actual count of microbes in the sample? I have been looking at software like HUMAnN (https://huttenhower.sph.harvard.edu/humann) to look at the gene pathways present, and I have a script that calculates the relative abundances of the aligned bacteria based on read counts and genome length, but I am unsure of a way to get the actual number of microbes.
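For readers wondering what such a script might look like, here is a minimal sketch of one common normalization: divide each species' read count by its genome length, then rescale so the values sum to 1. This is an assumption about the general approach, not the actual script mentioned above; the function name and the dictionary inputs are hypothetical.

```python
from typing import Dict


def relative_abundance(
    read_counts: Dict[str, int], genome_lengths: Dict[str, int]
) -> Dict[str, float]:
    """Length-normalized relative abundance per species (values sum to 1)."""
    # Reads per base: corrects for larger genomes attracting more reads.
    density = {sp: read_counts[sp] / genome_lengths[sp] for sp in read_counts}
    total = sum(density.values())
    if total == 0:
        return {sp: 0.0 for sp in density}
    return {sp: d / total for sp, d in density.items()}
```

For example, a species with 200 reads on a 2 Mb genome and another with 100 reads on a 1 Mb genome come out equally abundant. Note that the result is still a proportion, so it says nothing about absolute cell numbers.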
Old 06-15-2016, 11:25 AM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550
Default

I guess you are referring to the diversity of microbes and not the actual number (since there would be no way to estimate that from the sequencing reads alone).
Old 06-15-2016, 11:49 AM   #10
bloosnail
Member
 
Location: Pittsburgh

Join Date: Jul 2015
Posts: 17
Default

Oh, yes I was referring to the actual number. So getting the number must be done with the real bacteria before sequencing?
Old 06-15-2016, 10:27 PM   #11
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 196
Default

You'd need to do qPCR to quantify the actual number of microbes, and even that is difficult unless you have a really good set of standards.

For viral genomes, you can try the viral RefSeq genomes. See this useful blog post for instructions on how to build the Kraken database:
http://www.opiniomics.org/building-a...no-gi-numbers/
Old 07-20-2016, 08:36 AM   #12
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98
Default

Not sure if I get the question... Do you have reference genomes for all the bacteria in your sample, and only one read of each pair maps to these genomes? That's how it sounds to me.

Normally you won't have the exact references; you'll have a fragmented assembly from the reads, and in that case the two reads of a pair can obviously map to different fragments.