  • Question about whole genome metagenomics for the microbiome?

    Hello,

    Thank you for taking the time to read my post. I currently have whole genome metagenomic data taken from the eye; the reads are a mix of human DNA and DNA from a number of different bacterial species. Since the reads are paired-end, I am wondering whether the two reads of a pair can come from the genomes of two different organisms, so that the paired-end alignment fails even though each read can still be aligned as single-ended. I tried doing alignment with Bowtie2, which has an option to look for single-ended alignments after paired-end alignments fail. When I looked at the unique read IDs that were mapped, there was a fairly large increase (~30% or so), which is a good amount larger than I expected based on past sequencing projects I've worked with.

    When trimming the reads for quality, I am trying to decide on the parameters for pairs where one read has many bases removed: whether to remove the whole pair, keep the surviving read as single-ended, etc. Any help is greatly appreciated.

  • #2
    That is a fairly substantial portion of single read alignments. What does your FastQC output look like? Generally read 1 has slightly higher quality than read 2, so I wonder if most of your "additional" mapping comes from one read versus the other.
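One way to check whether the extra single-end hits come mostly from read 1 or read 2 is to tally them from the SAM flags. This is only a rough sketch: it assumes the aligner output is plain-text SAM with standard flag bits, and the function name and example records are made up for illustration.

```python
from collections import Counter

def count_unpaired_by_mate(sam_lines):
    """Count mapped, non-properly-paired primary records per mate."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # this read is unmapped
            continue
        if flag & 0x100 or flag & 0x800:  # secondary / supplementary
            continue
        if flag & 0x2:                    # aligned as a proper pair
            continue
        if flag & 0x40:                   # first read in the pair
            counts["read1"] += 1
        elif flag & 0x80:                 # second read in the pair
            counts["read2"] += 1
    return counts

# Hypothetical records: only QNAME and FLAG matter for this tally.
example = [
    "@HD\tVN:1.6",
    "p1\t99\tref\t100\t42\t50M\t=\t300\t250\t*\t*",   # proper pair, read 1
    "p1\t147\tref\t300\t42\t50M\t=\t100\t-250\t*\t*", # proper pair, read 2
    "p2\t73\tref\t500\t42\t50M\t=\t500\t0\t*\t*",     # read 1 mapped, mate unmapped
    "p2\t133\tref\t500\t0\t*\t=\t500\t0\t*\t*",       # read 2 unmapped
    "p3\t137\tref\t900\t42\t50M\t=\t900\t0\t*\t*",    # read 2 mapped, mate unmapped
    "p4\t65\tref\t1200\t42\t50M\tref2\t10\t0\t*\t*",  # read 1 mapped, not a proper pair
]
counts = count_unpaired_by_mate(example)
print(counts)
```

If the "read1" count is much larger than the "read2" count on real data, that would be consistent with read 2 quality being the main reason pairs fail to align together.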

    Just my 2 cents, but for removing human DNA I take the conservative approach of removing all read pairs in which either of the reads maps to human in any fashion. Note that bowtie2's --al-conc arguments don't support this.
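The conservative filter described here (drop the whole pair if either read hits human) could be sketched roughly as follows. It assumes you already have a SAM file from mapping against a human reference and that both mates share the same read name; the function names and toy data are hypothetical.

```python
def human_mapped_names(sam_lines):
    """Collect read names with at least one mapped record (either mate)."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):        # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        name, flag = fields[0], int(fields[1])
        if not flag & 0x4:              # 0x4 set means this read is unmapped
            names.add(name)
    return names

def keep_nonhuman_pairs(pairs, human_names):
    """pairs: iterable of (name, read1_seq, read2_seq) tuples."""
    return [p for p in pairs if p[0] not in human_names]

# Toy example: pairA has one mate mapped to human, pairB has none.
sam = [
    "pairA\t73\tchr1\t100\t42\t10M\t=\t100\t0\t*\t*",
    "pairA\t133\tchr1\t100\t0\t*\t=\t100\t0\t*\t*",
    "pairB\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "pairB\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
pairs = [("pairA", "ACGT", "TTGA"), ("pairB", "GGCC", "AATT")]
human = human_mapped_names(sam)
kept = keep_nonhuman_pairs(pairs, human)
print(kept)
```

Because the name set is built from any mapped record, a pair is removed even when only one mate aligns, which is the "in any fashion" behaviour described above.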


    • #3
      @bloosnail: I am going to suggest that you try bbsplit.sh from BBMap to separate the human reads from the others before doing any additional analysis.

      Chances of two ends of a fragment coming from two different organisms are not great/logical.


      • #4
        @fanli, I was unaware that the first read in a pair generally has better quality, thank you for pointing that out. I have run FastQC and it appears that in general the first reads have much better quality than the second, although neither passes the per base sequence quality test -- possibly because of the longish read lengths of 125 bp. Below are a couple of the graphics (sorry I tried to upload the images directly and it didn't seem to work):

        First reads


        Second reads


        @GenoMax, thank you for your suggestion. I forgot to mention that I have run a tool called BMTagger (https://www.westgrid.ca/support/software/bmtagger) to remove human reads. I think you are right though, it does seem unlikely that reads from a pair would map to different organisms, or that this would have a major effect on overall mapping.


        • #5
          Yikes, those quality scores aren't great. My guess is that a lot of the single-end alignments you're getting are because you have a fairly high error rate. This might be problematic if you are going to do de novo assembly of your metagenomes.

          This is just my opinion, but I would recommend BWA or Bowtie mapping in place of BMTagger.


          • #6
            I agree with you that the single-endedness may be because of the high error rates. For now I suppose I will keep single reads even if their pair does not pass quality control since it seems much of the error comes from the second reads, instead of throwing out the whole pair. We do not plan on doing de novo assembly, just seeing which species the reads align back to out of all known microbiome reference genomes.
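The keep-the-survivor policy described above could look something like this after trimming. It is a sketch only: the minimum length of 50 bp is a made-up threshold, not a recommendation, and the function name is hypothetical.

```python
def route_pair(len1, len2, min_len=50):
    """Decide what to do with a pair given its post-trimming read lengths.

    min_len is a hypothetical cutoff; pick one that suits your data.
    """
    ok1 = len1 >= min_len
    ok2 = len2 >= min_len
    if ok1 and ok2:
        return "paired"       # both mates survive, keep as a pair
    if ok1:
        return "single_r1"    # only read 1 survives, keep it as single-end
    if ok2:
        return "single_r2"    # only read 2 survives, keep it as single-end
    return "discard"          # neither mate is long enough

# Example: 125 bp reads where read 2 lost most of its bases to trimming.
decisions = [route_pair(125, 120), route_pair(118, 30), route_pair(20, 35)]
print(decisions)
```

Trimming tools that write separate paired and unpaired output files apply the same idea; the sketch just makes the decision rule explicit.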

            Oh, I meant that BMTagger was used just to remove human reads first. Once the human reads are filtered out, the remaining reads are mapped back to many bacterial genomes using Bowtie2.


            • #7
              Out of curiosity, will you let me know if you "see" a lot of Retroviridae when you do the metagenomics? I had this issue and it turns out they are actually ERVs in the human genome that don't get nicely filtered out.

              Also, you might be interested in trying an alignment-free method for taxonomic classification (e.g. kraken, CLARK, etc.)


              • #8
                We have only looked at bacterial DNA so far. If you have reference genomes for the viruses in question, I can try aligning back to them and let you know. Thank you for your suggestion about the alignment-free methods. I had not heard of these before and they seem to save a lot of time, so I will look into it.

                Also I am curious, would you know of a method to estimate the actual count of microbes in the sample? I have been looking at software like HUMAnN (https://huttenhower.sph.harvard.edu/humann) to look at the gene pathways present and have a script to calculate the relative abundances of the aligned bacteria based on reads and genome length, but am unsure of a way to get the actual number of microbes.
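For reference, a relative abundance calculation "based on reads and genome length" might look roughly like the sketch below. The actual script is not shown in this thread, so this is an assumption about the approach: reads per base of genome, normalized to sum to 1. The species names and numbers are invented.

```python
def relative_abundance(read_counts, genome_lengths):
    """Normalize mapped read counts by genome length, then scale to sum to 1."""
    density = {
        species: read_counts[species] / genome_lengths[species]
        for species in read_counts
    }
    total = sum(density.values())
    if total == 0:
        return {species: 0.0 for species in density}
    return {species: d / total for species, d in density.items()}

# Invented example: species_b has more reads but a genome three times larger.
reads = {"species_a": 2000, "species_b": 3000}
lengths = {"species_a": 2_000_000, "species_b": 6_000_000}
abund = relative_abundance(reads, lengths)
print(abund)
```

Note that this only gives proportions within the sample. It says nothing about how many cells were actually present, which is the part the question is asking about.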


                • #9
                  I guess you are referring to diversity of microbes and not the actual number (since there would be no way to estimate that).


                  • #10
                    Oh, yes I was referring to the actual number. So getting the number must be done with the real bacteria before sequencing?


                    • #11
                      You'd need to do qPCR to quantify the actual number of microbes, and even that is difficult unless you have a really good set of standards.

                      For viral genomes, you can try the Viral Refseq genomes. See this useful blog post for instructions on how to build the kraken database:


                      • #12
                        Not sure if I get the question... You have reference genomes for all the bacteria in your sample, and only one read of each pair maps to these genomes? That's how it sounds to me.

                        Normally you won't have the exact references. You'll have a fragmented assembly built from the reads, and in that case the two reads of a pair can obviously map to different fragments.
