
  • Volcano plot with R

    Hello everyone!

    We are cooperating with an institute that performed Illumina sequencing (HiSeq3000) on our RNA samples. They normalized and annotated the data using CLC Genomics Workbench 9. In the end, we received an Excel table containing the gene name, expression count, p-value, FDR, Bonferroni-corrected p-value, fold change and RPKM value.
    I wrote an R script to make a volcano plot (log2FC on the x-axis, -log10(p) on the y-axis).


    The issues:
    (1) It turns out that roughly 66% of our genes have a p-value of 1. I excluded these genes because they are all plotted on the x-axis (-log10(1) = 0). Is it okay to pre-filter data for a volcano plot, or do people usually plot the whole data set?
    (2) Another roughly 180 genes have a p-value of exactly 0. As I cannot calculate the logarithm of 0, I first wanted to replace the zeros with the second-smallest p-value available in my dataset. However, as there are so many genes with p=0, it is hard to randomly assign small p-values without creating a suspicious pattern of dots in my plot. How do people plot genes with p=0?
    (3) We figured that maybe the p=0 and p=1 values are rounded values that appear when they ask the software to create an Excel file. Could that be possible?
    Our collaborator claims that none of the values are rounded. Yet, when they ask their automated software (CLC Genomics Workbench) to create a volcano plot, it looks normal, without any horizontal lines.

    Any input is greatly appreciated!!

    Best wishes
    DCseq
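
For issue (2), one option (a minimal sketch, not something proposed in the thread) is to cap the zeros at a floor value instead of assigning random small p-values. All p=0 genes then sit on one line at the top of the plot and can be drawn with a different marker. The sketch is in Python for illustration; the function name `neg_log10_p` and the example values are made up, and the default floor (the smallest nonzero p-value in the data) is an assumption you may want to change.

```python
import math

def neg_log10_p(pvalues, floor=None):
    """Return (-log10(p) values, capped flags) for a list of p-values.

    p-values equal to 0 are replaced by `floor` before taking the log.
    If `floor` is None, the smallest nonzero p-value in the input is used.
    """
    if floor is None:
        nonzero = [p for p in pvalues if p > 0]
        if not nonzero:
            raise ValueError("all p-values are zero; pass an explicit floor")
        floor = min(nonzero)
    # Remember which points were capped so they can be marked in the plot.
    capped = [p == 0 for p in pvalues]
    scores = [-math.log10(max(p, floor)) for p in pvalues]
    return scores, capped

# Made-up p-values for illustration only.
pvals = [1.0, 0.05, 1e-8, 0.0, 0.0]
scores, capped = neg_log10_p(pvals)
```

Here the two zero p-values receive the same score as the smallest nonzero one, and `capped` flags them so the plot can show that their true height is unknown.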

  • #2
    Why not ask them to export the normalized values from CLC (or, better still, the raw counts)? You can then do your own analysis with that data (e.g. with DESeq2), since it sounds like you are comfortable with R.



    • #3
      I asked them for DESeq2 files, but they replied that they cannot give me such an output, quoting the following:
      "
      ************************************
      Export of tables

      Tables can be exported in four different formats; CSV, tab-separated, Excel, or html. When exporting a table in CSV, tab-separated, or Excel format, numbers with many decimals are printed in the exported file with 10 decimals, or in 1.123E-5 format when the number is close to zero.

      When exporting a table in html format, data are exported with the number of decimals that have been defined in the workbench preference settings. When tables are exported in html format from the server or using command line tools, the default number of exported decimals is 3.
      ************************************
      "

      Nonetheless, they said they could give me BAM files. I have not worked with BAM files before. Would they be helpful in my case?

      Many thanks
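
One way to probe the rounding hypothesis against the quoted export rule is to look at the smallest nonzero p-value in the table. If any nonzero value is smaller than 1e-10, the file cannot have been rounded to 10 fixed decimals, so the exact zeros must come from somewhere else. This is only a sketch in Python: the function name `rounding_check` and the example values are made up, and it does not tell you how CLC actually writes its p-values.

```python
def rounding_check(pvalues, decimals=10):
    """Summarise whether exact zeros could come from fixed-decimal rounding."""
    unit = 10 ** (-decimals)  # smallest nonzero value that survives rounding
    nonzero = [p for p in pvalues if p > 0]
    smallest = min(nonzero) if nonzero else None
    # A small tolerance guards against floating-point noise around `unit`.
    plausible = smallest is None or smallest >= unit * (1 - 1e-9)
    return {
        "n_zero": sum(1 for p in pvalues if p == 0),
        "smallest_nonzero": smallest,
        "fixed_rounding_plausible": plausible,
    }

# Made-up examples: the first table contains a value far below 1e-10.
report_a = rounding_check([0.0, 3.2e-15, 0.04, 1.0])
report_b = rounding_check([0.0, 1e-10, 0.04, 1.0])
```

In the first example the value 3.2e-15 survived the export, so fixed 10-decimal rounding is ruled out. In the second it stays a possibility, but the check cannot confirm it.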



      • #4
        Everything would be fixed if you can get the BAM files, because then you can do your own analysis. Getting counts (in R) is easy, and doing differential expression analysis isn't too hard.



        • #5
          If you get the BAM files, you can use featureCounts (via the Rsubread R package) followed by DESeq2. Ask them for the exact genome build they used (or, better still, for the corresponding GTF files), since you need that annotation to count reads from BAM files with featureCounts.



          • #6
            Great, many thanks for your responses. I requested the GTF files from our collaborator and will let you know how everything goes!
