SEQanswers

07-12-2017, 06:11 AM   #1
DCseq
Junior Member
 
Location: Germany

Join Date: Jul 2017
Posts: 4
Volcano plot with R

Hello everyone!

We are cooperating with an institute that performed Illumina sequencing (HiSeq 3000) on our RNA samples. They normalized and annotated the data using CLC Genomics Workbench 9. In the end, we received an Excel table containing, for each gene, the name, expression count, p value, FDR, Bonferroni-corrected p value, fold change and RPKM value.
I wrote an R script to make a volcano plot (log2 fold change on the x-axis, -log10(p) on the y-axis).


The issues:
(1) It turns out that roughly 66% of our genes have a p value of exactly 1. I excluded these genes because they all fall on the x-axis (-log10(1) = 0). Is it okay to pre-filter data for a volcano plot, or do people usually plot the whole data set?
(2) Roughly another 180 genes have a p value of exactly 0. Since the logarithm of 0 is undefined, I first wanted to replace the zeros with the smallest non-zero p value in my dataset. However, with so many genes at p = 0, it is hard to assign them small p values (even randomly) without creating a suspicious pattern of dots in my plot. How do people plot genes with p = 0?
(3) We suspect that the p = 0 and p = 1 values may be rounded values that appear when the software is asked to create an Excel file. Could that be possible?
Our collaborator claims that none of the values are rounded. Yet when they have the same software (CLC Genomics Workbench) create a volcano plot automatically, it looks normal, without any horizontal lines.
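
One possible way to handle the p = 0 genes from issue (2), as a minimal sketch only: cap them at the smallest non-zero p value in the table and draw them with a different symbol so the capping stays visible. The toy data frame and its column names (`gene`, `fold_change`, `p`) are made up for illustration, and the sketch assumes the fold change is an unsigned ratio greater than 0, which may not match the actual CLC export.

```r
# Toy stand-in for the exported table; values and column names are hypothetical.
de <- data.frame(
  gene        = paste0("gene", 1:6),
  fold_change = c(8, 0.25, 1.1, 4, 0.5, 1),
  p           = c(0, 1e-12, 0.3, 0, 0.04, 1)
)

# x-axis: log2 fold change (assumes fold_change is a ratio > 0).
de$log2fc <- log2(de$fold_change)

# -log10(0) is Inf, so cap p == 0 at the smallest non-zero p value
# and remember which genes were capped.
de$capped <- de$p == 0
p_floor <- min(de$p[de$p > 0])
de$neg_log10_p <- -log10(pmax(de$p, p_floor))

# Genes with p == 1 are kept; they simply sit at y = 0.
plot(
  de$log2fc, de$neg_log10_p,
  pch  = ifelse(de$capped, 17, 16),  # triangles mark capped p values
  xlab = "log2 fold change",
  ylab = "-log10(p)"
)
```

With this approach the capped genes still form a horizontal line at the top of the plot, but the distinct symbol makes clear that it is a display limit and not a measured value.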

Any input is greatly appreciated!!

Best wishes
DCseq
07-12-2017, 01:13 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,380

Why not ask them to export the normalized values from CLC (or, better still, the raw counts)? You could then do your own analysis with that data, e.g. with DESeq2 (it sounds like you are comfortable with R).
07-12-2017, 11:44 PM   #3
DCseq

I asked them for DESeq2 files, but they replied that they cannot give me such an output, quoting the following:
"
************************************
Export of tables

Tables can be exported in four different formats; CSV, tab-separated, Excel, or html. When exporting a table in CSV, tab-separated, or Excel format, numbers with many decimals are printed in the exported file with 10 decimals, or in 1.123E-5 format when the number is close to zero.

When exporting a table in html format, data are exported with the number of decimals that have been defined in the workbench preference settings. When tables are exported in html format from the server or using command line tools, the default number of exported decimals is 3.
************************************
"

Nonetheless, they said they could give me BAM files. I have not worked with BAM files before. Would they be helpful in my case?

Many thanks
07-13-2017, 02:34 AM   #4
wdecoster
Member
 
Location: Antwerp, Belgium

Join Date: Oct 2015
Posts: 93

Everything would be fixed if you can get the BAM files; then you can do your own analysis. Getting counts (in R) is easy, and doing differential expression analysis isn't too hard.
07-13-2017, 07:47 AM   #5
GenoMax

If you get the BAM files, you can use featureCounts (via the R package Rsubread) followed by DESeq2. You should ask them for the exact genome build used (or, better still, ask them to provide the corresponding GTF files), since you will need these to count reads from the BAM files with featureCounts.
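
The featureCounts-then-DESeq2 workflow described above could look roughly like the following sketch. It is not a complete analysis. The function name `count_and_test` and its arguments are hypothetical, the feature type `exon` and attribute `gene_id` are assumptions that depend on the GTF file supplied, and the single-factor design `~ condition` is only the simplest possible case. It needs the Bioconductor packages Rsubread and DESeq2 to be installed.

```r
# Sketch only: requires the Bioconductor packages Rsubread and DESeq2.
# count_and_test() is a hypothetical helper, not part of either package.
count_and_test <- function(bam_files, gtf_file, condition, paired_end = FALSE) {
  # Count reads per gene from the BAM files using the GTF annotation.
  fc <- Rsubread::featureCounts(
    files               = bam_files,
    annot.ext           = gtf_file,
    isGTFAnnotationFile = TRUE,
    GTF.featureType     = "exon",     # assumption: count reads over exons
    GTF.attrType        = "gene_id",  # assumption: summarize per gene_id
    isPairedEnd         = paired_end
  )

  # One row per sample, in the same order as bam_files.
  col_data <- data.frame(condition = factor(condition))
  rownames(col_data) <- colnames(fc$counts)

  # Differential expression on the raw counts.
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = fc$counts,
    colData   = col_data,
    design    = ~ condition
  )
  dds <- DESeq2::DESeq(dds)

  # Per-gene results as a plain data frame.
  as.data.frame(DESeq2::results(dds))
}
```

The function would be called with the BAM paths, the GTF path, and one condition label per BAM file. The returned data frame includes `log2FoldChange`, `pvalue` and `padj` columns, which can feed directly into a volcano plot.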
07-14-2017, 12:44 AM   #6
DCseq

Great, many thanks for your responses. I requested the GTF files from our collaborator and will let you know how everything goes!

Tags
clc genomics, rstudio, volcano plot

All times are GMT -8.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO