SEQanswers

01-18-2018, 01:19 PM   #1
illuminaGA
Member
Location: Atlanta
Join Date: Dec 2012
Posts: 70
How should I count the percentage of messenger RNA and ribosomal RNA in the samples

Dear All

We have some E. coli total RNA-seq data, and my PI would like to know what percentage of the reads comes from messenger RNA and what percentage from ribosomal RNA. My idea is to map the reads to the transcriptome and to the rRNA sequences separately, and then count the reads that map to each.

Do you think this is a workable plan? I have no idea where to find the transcriptome and rRNA references for E. coli.

Is there an alternative way to do this analysis? Any suggestions will be appreciated.

Thanks.

01-18-2018, 01:49 PM   #2
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,800

You can find the rRNA sequences for E. coli here. You can align the rest of the data to the genome and count the reads using a GTF file. I don't recall whether E. coli has overlapping reading frames, but otherwise it should be straightforward to do the counts.
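The GTF-based counting step mentioned above can be sketched in plain Python. This is only an illustration of what such a counter does, not a replacement for a dedicated tool such as featureCounts or htseq-count. The gene IDs, coordinates, and alignment intervals are invented toy data, and the function names are hypothetical. Reads that overlap more than one gene are set aside as ambiguous, which is where overlapping annotations would matter.

```python
from collections import Counter, namedtuple

# 1-based, inclusive coordinates, as used in GTF files.
Gene = namedtuple("Gene", "seqname start end gene_id")


def parse_gtf_genes(lines, feature="gene"):
    """Collect records of one feature type from tab-separated GTF lines."""
    genes = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        # The attribute column looks like: gene_id "geneA"; gene_name "x";
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip('"')
        genes.append(Gene(fields[0], int(fields[3]), int(fields[4]),
                          attrs.get("gene_id", "unknown")))
    return genes


def count_reads(genes, alignments):
    """Assign each alignment (seqname, start, end) to the gene it overlaps.

    Strand is ignored here, i.e. the library is assumed to be unstranded.
    """
    counts = Counter()
    for seqname, start, end in alignments:
        hits = [g.gene_id for g in genes
                if g.seqname == seqname and start <= g.end and end >= g.start]
        if not hits:
            counts["__no_feature"] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1  # overlaps two or more genes
        else:
            counts[hits[0]] += 1
    return counts


# Toy annotation: geneA and geneB overlap between positions 250 and 300.
toy_gtf = [
    'chr\tdemo\tgene\t100\t300\t.\t+\t.\tgene_id "geneA";',
    'chr\tdemo\tgene\t250\t500\t.\t+\t.\tgene_id "geneB";',
    'chr\tdemo\tgene\t800\t900\t.\t-\t.\tgene_id "geneC";',
]
toy_alignments = [("chr", 110, 160), ("chr", 260, 290),
                  ("chr", 850, 899), ("chr", 600, 650)]

genes = parse_gtf_genes(toy_gtf)
counts = count_reads(genes, toy_alignments)
print(dict(counts))
```

The linear scan over all genes per read is fine for a toy example; a real counter would index the annotation by position.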

01-18-2018, 07:56 PM   #3
illuminaGA
Member
Location: Atlanta
Join Date: Dec 2012
Posts: 70

Quote:
Originally Posted by GenoMax
You can find the rRNA sequences for E. coli here. You can align the rest of the data to the genome and count the reads using a GTF file. I don't recall whether E. coli has overlapping reading frames, but otherwise it should be straightforward to do the counts.
Thank you so much.

By the way, how and where should I download the rRNA sequences? I explored the database for hours but still cannot figure out where to download them.

Thanks again.

Al

01-19-2018, 04:32 AM   #4
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,800

Use the links I included above. Then click on "nucleotide sequence" in the operations panel to the right.

01-23-2018, 11:14 AM   #5
illuminaGA
Member
Location: Atlanta
Join Date: Dec 2012
Posts: 70

Quote:
Originally Posted by GenoMax
Use the links I included above. Then click on "nucleotide sequence" in the operations panel to the right.
Great, thanks.

So what I should do is copy those sequences into a text file, build a reference from it, and map the sample sequencing data to this reference, right?

One more question: can I just use the number of mapped reads as the number of rRNA reads?

Thanks a lot again.

AL
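On whether the number of mapped reads can stand in for the number of rRNA reads, one thing to watch is that each read is counted only once. E. coli carries several near-identical rRNA operons, so an aligner may report the same read at more than one locus. A minimal sketch, assuming the alignments are available as SAM text (for example from `samtools view -h`), that counts only primary records using the standard SAM flag bits. The function name and the toy records are made up for illustration.

```python
def summarize_sam(lines):
    """Count mapped and unmapped reads in SAM text, one count per read.

    Secondary (0x100) and supplementary (0x800) records are skipped so a
    read reported at several loci is not counted more than once.
    """
    mapped = unmapped = 0
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # header line or blank line
        flag = int(line.split("\t")[1])
        if flag & 0x4:
            unmapped += 1  # 0x4 is the bit that marks a read as unmapped
        elif flag & 0x100 or flag & 0x800:
            continue  # extra record for a read that is counted elsewhere
        else:
            mapped += 1
    total = mapped + unmapped
    percent_mapped = 100.0 * mapped / total if total else 0.0
    return mapped, unmapped, percent_mapped


# Toy records: r1 has a primary and a secondary hit, r2 maps to the
# reverse strand, r3 and r4 are unmapped.
toy_sam = [
    "@SQ\tSN:rrsA\tLN:1542",
    "r1\t0\trrsA\t10\t1\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r1\t256\trrsB\t10\t1\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t16\trrsA\t40\t1\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCA\tIIIII",
]

mapped, unmapped, percent_mapped = summarize_sam(toy_sam)
print(f"{mapped} mapped, {unmapped} unmapped, {percent_mapped:.1f}% mapped")
```

With paired-end data each mate is a separate record, so the counts here are per read rather than per fragment.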

02-05-2018, 07:19 AM   #6
illuminaGA
Member
Location: Atlanta
Join Date: Dec 2012
Posts: 70

Quote:
Originally Posted by GenoMax
Use the links I included above. Then click on "nucleotide sequence" in the operations panel to the right.

Dear GenoMax

I just finished mapping to the rRNA reference and got a 70% mapping rate. Is this a normal range for mRNA-seq? Can I say that 70% of the reads are rRNA? How should I interpret the result? Could you please give me some pointers? Thank you so much.

AL

02-05-2018, 07:34 AM   #7
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,800

You say mRNA-seq, but did you do any ribosomal RNA depletion (e.g. https://www.illumina.com/products/by...-bacteria.html) on your samples? If not, it is not surprising that a large fraction of your reads is rRNA. Unless you are studying rRNA, that part of the sequence data is wasted, which is the reason to do ribo-depletion.

02-05-2018, 07:49 PM   #8
illuminaGA
Member
Location: Atlanta
Join Date: Dec 2012
Posts: 70

Quote:
Originally Posted by GenoMax
You say mRNA-seq, but did you do any ribosomal RNA depletion (e.g. https://www.illumina.com/products/by...-bacteria.html) on your samples? If not, it is not surprising that a large fraction of your reads is rRNA. Unless you are studying rRNA, that part of the sequence data is wasted, which is the reason to do ribo-depletion.
Oh no. I will check with the lab to see which kit they used. How should I remove those reads from the raw data? I saw someone say that it's not necessary to do that.

I appreciate your help again.

AL

02-06-2018, 05:43 AM   #9
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,800

You could extract the unmapped reads from the alignment you did (if you included them in your alignment file), or redo the alignment and collect the unmapped reads in a separate file.

You could also simply ignore these reads when you do the read counts. You would want to compare samples and make sure rRNA contamination levels are more or less the same across your pool of samples. You don't want one sample to have 70% rRNA and another 5% (if the total numbers of reads are more or less similar).
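The cross-sample check suggested above comes down to a little arithmetic. A rough sketch, assuming you already have an rRNA read count and a total read count per sample. The sample names and numbers are invented, the function name is hypothetical, and the 10-percentage-point threshold is an arbitrary choice for illustration, not a recommended cutoff.

```python
def rrna_report(samples, max_spread=10.0):
    """Summarize rRNA content per sample.

    samples maps a sample name to (rRNA reads, total reads). Returns the
    rRNA percentage per sample, the number of non-rRNA reads left per
    sample, and whether the percentages differ by more than max_spread
    percentage points between the lowest and highest sample.
    """
    pct = {}
    usable = {}
    for name, (rrna_reads, total_reads) in samples.items():
        pct[name] = 100.0 * rrna_reads / total_reads
        usable[name] = total_reads - rrna_reads
    spread = max(pct.values()) - min(pct.values())
    return pct, usable, spread > max_spread


# Toy numbers echoing the 70% versus 5% situation at equal sequencing depth.
toy_samples = {
    "sampleA": (7_000_000, 10_000_000),
    "sampleB": (500_000, 10_000_000),
    "sampleC": (6_800_000, 10_000_000),
}

pct, usable, uneven = rrna_report(toy_samples)
for name in toy_samples:
    print(f"{name}: {pct[name]:.1f}% rRNA, {usable[name]} non-rRNA reads")
print("rRNA levels uneven across samples:", uneven)
```

At the same depth, the 70% sample keeps 3 million non-rRNA reads while the 5% sample keeps 9.5 million, which is why uneven rRNA levels matter when samples are compared.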

02-07-2018, 07:48 AM   #10
illuminaGA
Member
Location: Atlanta
Join Date: Dec 2012
Posts: 70

Quote:
Originally Posted by GenoMax
You could extract the unmapped reads from the alignment you did (if you included them in your alignment file), or redo the alignment and collect the unmapped reads in a separate file.

You could also simply ignore these reads when you do the read counts. You would want to compare samples and make sure rRNA contamination levels are more or less the same across your pool of samples. You don't want one sample to have 70% rRNA and another 5% (if the total numbers of reads are more or less similar).
Thanks a lot. I mapped the reads with TopHat, and one of the output files is unmapped.bam. Can I just convert this file to a FASTQ file and map again?

The mapping for another sample is still running; I will compare the % of rRNA once it finishes.

Thanks again for your help.

AL
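Converting unmapped records back to FASTQ is mostly a matter of copying the name, sequence, and quality fields. In practice a tool such as `samtools fastq` is the usual route; the sketch below only shows the idea on SAM text. It assumes single-end reads and that every record carries a quality string. The function name and the toy records are hypothetical.

```python
# Translation table used to reverse-complement a sequence.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_unmapped_to_fastq(lines):
    """Return FASTQ text for every unmapped record (flag bit 0x4) in SAM text."""
    records = []
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # header line or blank line
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if not flag & 0x4:
            continue  # mapped read, not wanted here
        if flag & 0x10:
            # The record is stored reverse-complemented; restore the
            # orientation in which the read was sequenced.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        records.append(f"@{name}\n{seq}\n+\n{qual}\n")
    return "".join(records)


# Toy records: r1 is mapped, r3 is unmapped, r4 is unmapped and stored
# reverse-complemented (flag 20 = 0x4 + 0x10).
toy_sam = [
    "@SQ\tSN:rrsA\tLN:1542",
    "r1\t0\trrsA\t10\t1\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r4\t20\t*\t0\t0\t*\t*\t0\t0\tAACCG\tABCDE",
]

fastq_text = sam_unmapped_to_fastq(toy_sam)
print(fastq_text, end="")
```

For paired-end data the mates would also need to be kept together and written to separate files, which this sketch does not attempt.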

02-07-2018, 12:15 PM   #11
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,800

You should stop using TopHat for new projects. Use BBMap, STAR, HISAT2 etc.

02-09-2018, 05:54 AM   #12
illuminaGA
Member
Location: Atlanta
Join Date: Dec 2012
Posts: 70

Quote:
Originally Posted by GenoMax
You should stop using TopHat for new projects. Use BBMap, STAR, HISAT2 etc.
OK, thanks a lot.

Is there alternative software to Cufflinks? It seems very slow.