SEQanswers


Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
FPKM/TPM to raw read counts??????? | jpg2006 | Bioinformatics | 1 | 06-28-2017 11:01 AM
mapping bacteria metatranscriptome to genomes | Francy87 | RNA Sequencing | 0 | 10-28-2014 01:04 PM
metatranscriptome mapping | Francy87 | Bioinformatics | 2 | 09-24-2013 08:08 PM
Low Cluster PF % and highly skewed Base % | sequencer13 | Illumina/Solexa | 17 | 06-23-2013 03:28 AM

01-23-2020, 08:19 AM   #1
brandonkieft
Junior Member
 
Location: Vancouver, Canada

Join Date: Jan 2020
Posts: 1
Metatranscriptome read mapping yielding a highly left skewed ORF TPM distribution

I'm going to jump right into the problem and put details below for clarification:

What could cause the following? We map high-quality paired-end metatranscriptome reads back to high-quality assembled contigs and calculate TPM for the ORFs predicted on those contigs, and just the top few ORFs are assigned 50-60% of total TPM and recruit 30-40% of the reads. Sometimes these "highly abundant" ORFs are real genes and sometimes pseudogenes. When we remove them from the mapping reference and repeat the procedure, a new set of ORFs gets 50-60% of TPM and recruits many reads. Any ideas?
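For context on why TPM share can exceed read share: TPM divides each ORF's read count by its length before scaling the per-sample total to 1,000,000, so a short ORF that recruits many reads takes an even larger fraction of TPM. The sketch below is a minimal illustration with made-up counts and lengths; `tpm` is a hypothetical helper, and it uses raw ORF lengths, whereas Salmon uses effective lengths, so real values will differ somewhat.

```python
def tpm(counts, lengths):
    """Compute TPM from per-ORF read counts and lengths (bp).

    TPM_i = (count_i / length_i) / sum_j(count_j / length_j) * 1e6
    Simplification: raw lengths, not Salmon's effective lengths.
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same length")
    rates = [c / l for c, l in zip(counts, lengths)]  # reads per bp
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [r / total * 1e6 for r in rates]


# Made-up sample: one short ORF recruits 35% of 100,000 reads.
counts = [35_000, 5_000, 5_000, 5_000] + [500] * 100
lengths = [300, 1500, 1500, 1500] + [1200] * 100
values = tpm(counts, lengths)
share = values[0] / sum(values)
print(f"top ORF: {counts[0] / sum(counts):.0%} of reads, {share:.0%} of TPM")
# prints "top ORF: 35% of reads, 69% of TPM"
```

In this toy case a 300 bp ORF holding about a third of the reads ends up with over two thirds of the TPM, which is the same direction of inflation as the 30-40% of reads versus 50-60% of TPM reported above.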

We conducted mRNA sequencing of a complex microbial community (metatranscriptomics) using Illumina HiSeq 150 bp reads on a few dozen time-series samples. The sequencing runs seemed to work very well, yielding ~10 million quality-filtered, merged FASTQ reads per sample. We assembled with MEGAHIT and got a reasonable N50 (1,500) and number of contigs (150k). We predicted open reading frames (ORFs) on the contigs using Prodigal, annotated taxonomy with an in-house program, and annotated function (Clusters of Orthologous Groups; COGs) with rpsBLAST to the NR protein database. Based on the number of ORFs annotated to each COG (i.e., a sample's "functional distribution"), we were very happy with the results: the functional profile was consistent across samples and made sense biologically.

When we used BWA MEM to map reads back to the assemblies and Salmon to calculate TPM from the BAM/GFF files, the distribution of ORF TPM was extremely biased. We had 150-200k predicted ORFs per sample (after keeping only ORFs > 60 amino acids), but in many samples a single ORF or a few ORFs were getting 400-700k TPM, roughly 40-70% of the 1,000,000 total TPM in the sample. I assume TPM should be much more evenly distributed among the ORFs. When we combined ORF TPM with the functional annotation and plotted function over time, the results were nonsensical. The ORFs that recruited tons of reads and got high TPM were sometimes bona fide genes with functions and high homology to database genes, and sometimes had no hits and looked like pseudogenes.

As a test, we removed all ORFs with >10% of total TPM from the mapping reference and reran BWA, to see whether this was a real biological signal (if the reads really came from these ORFs, fewer reads should map and TPM should become more even). Instead, new ORFs "took the place" of the high-TPM ones from the original analysis, and we got the same skewed number of mapped reads and the same skewed TPM distribution.
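The ">10% of total TPM" test described above can be scripted directly from Salmon output. This is a minimal sketch that assumes Salmon's standard tab-separated `quant.sf` layout (columns `Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`); `high_tpm_orfs` is a hypothetical helper and the example rows are made up. The returned names could then be used to drop those ORFs from the reference before re-mapping.

```python
import csv
import io


def high_tpm_orfs(quant_sf_text, max_fraction=0.10):
    """Return (name, fraction_of_total_TPM) for ORFs above max_fraction.

    Assumes the tab-separated layout of Salmon's quant.sf:
    Name, Length, EffectiveLength, TPM, NumReads.
    """
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    rows = [(r["Name"], float(r["TPM"])) for r in reader]
    total = sum(value for _, value in rows)
    if total == 0:
        return []
    flagged = [(name, value / total) for name, value in rows
               if value / total > max_fraction]
    # Largest offenders first
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up quant.sf content for three ORFs.
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "orf_1\t300\t151.2\t620000\t35000\n"
    "orf_2\t1500\t1350.8\t30000\t5000\n"
    "orf_3\t900\t750.1\t350000\t12000\n"
)
for name, frac in high_tpm_orfs(example):
    print(f"{name}\t{frac:.1%}")
```

Reading the file as text and using `csv.DictReader` keeps the sketch dependency-free; the same logic ports directly to a pandas `read_csv(..., sep="\t")` if that is already in the workflow.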

To convince ourselves this was not purely methodological, we did concurrent metagenome sequencing, assembly, read mapping, and TPM calculation using exactly the same procedures, and got sensible results with the expected TPM distribution for both individual ORFs and aggregated functions (i.e., the top ORFs recruit ~0.5% of total reads and TPM, and TPM aggregated at the functional-category level gives "correct" results).
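One way to make the metagenome versus metatranscriptome comparison explicit across all samples is a single concentration metric computed identically on both data types, such as the fraction of total reads (or TPM) held by the top k ORFs. A minimal sketch; `top_k_share` is a hypothetical helper, and the per-ORF counts below are invented to mimic the ~0.5% versus ~60% contrast described above.

```python
def top_k_share(values, k=1):
    """Fraction of the total held by the k largest values (reads or TPM)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    total = sum(values)
    if total == 0:
        return 0.0
    return sum(sorted(values, reverse=True)[:k]) / total


# Hypothetical per-ORF read counts for one sample of each data type.
metagenome = [50] * 199 + [60]
metatranscriptome = [40] * 199 + [12_000]
print(f"metagenome top-1 share:        {top_k_share(metagenome):.2%}")
print(f"metatranscriptome top-1 share: {top_k_share(metatranscriptome):.2%}")
```

Tracking this value (for k = 1, 5, 10) per sample over the time series shows at a glance whether the skew is sample-specific or systematic to the metatranscriptome pipeline.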
01-25-2020, 09:52 AM   #2
Cresil
Junior Member
 
Location: Netherlands

Join Date: Jan 2020
Posts: 1

Did you filter out rRNA reads, and did you trim adapters from the reads? Adapter sequence in particular can cause problems in the assembly.
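A quick way to check the adapter question is to count how many reads contain the start of the Illumina TruSeq adapter (`AGATCGGAAGAGC`, the 13 bp prefix shared by the Read 1 and Read 2 adapters). The sketch below is a rough diagnostic only: `read_fastq`, `trim_adapter`, and `adapter_report` are hypothetical helpers, matching is exact with no mismatches, and it ignores paired-end logic. Dedicated trimmers such as cutadapt, Trim Galore, or fastp tolerate errors and should be used for the actual trimming. rRNA filtering is not covered here.

```python
import io

# First 13 bp shared by the Illumina TruSeq Read 1 / Read 2 adapters.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def read_fastq(handle):
    """Yield (header, seq, qual) from a plain 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header, seq, qual


def trim_adapter(seq, qual, adapter=ADAPTER_PREFIX, min_overlap=5):
    """Cut a read at an exact adapter match, or at a partial adapter
    (>= min_overlap bp) hanging off its 3' end. No mismatches allowed."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be >= 1")
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    # Longest partial adapter prefix at the 3' end wins.
    for n in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:n]):
            return seq[:-n], qual[:-n]
    return seq, qual


def adapter_report(handle):
    """Return (number of reads, number of reads that would be trimmed)."""
    total = trimmed = 0
    for _, seq, qual in read_fastq(handle):
        total += 1
        new_seq, _ = trim_adapter(seq, qual)
        if len(new_seq) < len(seq):
            trimmed += 1
    return total, trimmed


# Made-up reads: r1 has a full adapter, r2 a 6 bp partial one, r3 none.
r1 = "ACGTACGT" + ADAPTER_PREFIX + "ACAC"
r2 = "ACGTACGTACGT" + ADAPTER_PREFIX[:6]
r3 = "ACGTACGTACGTACGTACGT"
fastq = "".join(f"@r{i}\n{s}\n+\n{'I' * len(s)}\n"
                for i, s in enumerate([r1, r2, r3], 1))
print(adapter_report(io.StringIO(fastq)))  # (3, 2)
```

If a large fraction of reads carries adapter sequence, chimeric or adapter-containing contigs could act as "sinks" that attract reads during mapping, which is one mechanism worth ruling out.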

Tags
assembly, illumina, metatranscriptomics, read mapping, tpm

All times are GMT -8.