  • Volcano plot with R

    Hello everyone!

    We are cooperating with an Institute that performed Illumina sequencing (HiSeq3000) for our RNA samples. They normalized and annotated the data using CLC Genomics Workbench 9. In the end, we received an Excel table containing the name of the gene, expression count, p, FDR, Bon, fold change and RPKM value.
    I wrote an R script to make a volcano plot (log2 fold change on the x-axis, -log10(p) on the y-axis).


    The issues:
    (1) It turns out that roughly 66% of our genes have a p-value of 1. I excluded these genes because they are plotted directly on the x-axis (-log10(1) = 0). Is it okay to pre-filter data for a volcano plot, or do people usually plot the whole data set?
    (2) Another roughly 180 genes have a p-value of exactly 0. As I cannot calculate the logarithm of value 0, I first wanted to replace the zeros with the second smallest p value available in my dataset. However, as there are so many genes with p=0, it is hard to randomly assign a small p value without creating a suspicious pattern of dots in my plot. How do people plot genes with p=0?
    (3) We figured that maybe the p=0 and p=1 values are rounded values that appear when they ask the software to create an Excel file. Could that be possible?
    Our collaborator claims that none of the values are rounded. Yet, when they ask their automated software (CLC Genomics Workbench) to create a volcano plot, it looks normal, without any horizontal lines.

    Any input is greatly appreciated!!

    Best wishes
    DCseq
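For issue (2), a minimal sketch of one common workaround: replace p = 0 with a floor value and flag those genes so they can be drawn with a different symbol. It is written in plain Python to stay self-contained, and the same logic carries over to an R script. The function name `volcano_coordinates`, the `floor` parameter, and the example rows are hypothetical, and the fold change is assumed to be a positive linear ratio (not a signed value and not already log2-transformed).

```python
import math
import sys


def volcano_coordinates(rows, floor=None):
    """Return (gene, log2_fc, neg_log10_p, capped) tuples for a volcano plot.

    rows  : iterable of (gene, fold_change, p_value) tuples; fold_change is
            ASSUMED to be a positive linear ratio.
    floor : value substituted for p == 0. Defaults to the smallest non-zero
            p-value in the data (an arbitrary but common choice).
    """
    rows = list(rows)
    if floor is None:
        nonzero = [p for _, _, p in rows if p > 0]
        # Fall back to the smallest normal double if every p-value is zero.
        floor = min(nonzero) if nonzero else sys.float_info.min

    coords = []
    for gene, fold_change, p in rows:
        capped = p == 0  # remember which points were moved to the floor
        p_plot = floor if capped else p
        coords.append((gene, math.log2(fold_change), -math.log10(p_plot), capped))
    return coords


# Hypothetical rows, not taken from the real data set.
example = [("geneA", 4.0, 0.001), ("geneB", 0.5, 0.0), ("geneC", 1.0, 1.0)]
coords = volcano_coordinates(example)
for gene, x, y, capped in coords:
    print(gene, x, y, "capped" if capped else "")
```

Capped genes all share the same y value, so they will still form a horizontal line. Plotting them with a distinct marker makes it clear that the line means "p below what the table reports" and is not a real pattern.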

  • #2
    Why not ask them to export the normalized values from CLC (or, better still, the raw counts)? You can then do your own analysis with that data (e.g. with DESeq2), since it sounds like you are comfortable with R.


    • #3
      I asked them for DESeq2 files but they replied they cannot give me such an output quoting the following:
      "
      ************************************
      Export of tables

      Tables can be exported in four different formats; CSV, tab-separated, Excel, or html. When exporting a table in CSV, tab-separated, or Excel format, numbers with many decimals are printed in the exported file with 10 decimals, or in 1.123E-5 format when the number is close to zero.

      When exporting a table in html format, data are exported with the number of decimals that have been defined in the workbench preference settings. When tables are exported in html format from the server or using command line tools, the default number of exported decimals is 3.
      ************************************
      "

      Nonetheless, they said they could give me BAM files. I have not worked with BAM files before. Would they be helpful in my case?

      Many thanks
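To make the rounding question from issue (3) concrete, here is a small sketch of what fixed-decimal formatting does to extreme p-values. It illustrates number formatting in general and is not a reproduction of what CLC Genomics Workbench does: `export_fixed` is a hypothetical stand-in for an exporter. According to the documentation quoted above, CSV, tab-separated and Excel exports switch to E notation near zero, so exact zeros from rounding would be more plausible with the 3-decimal html export.

```python
def export_fixed(value, decimals=10):
    """Format a number with a fixed number of decimals (hypothetical exporter)."""
    return f"{value:.{decimals}f}"


tiny = export_fixed(1e-12)              # far below 10-decimal resolution
near_one = export_fixed(0.99999999999)  # eleven nines after the decimal point
html_like = export_fixed(0.0004, decimals=3)

# Reading the exported text back gives exactly 0 or exactly 1.
print(tiny, float(tiny) == 0.0)
print(near_one, float(near_one) == 1.0)
print(html_like, float(html_like) == 0.0)
```

If the table was produced this way, very small p-values would read back as exactly 0 even though the software holds non-zero values internally. That would be consistent with a normal-looking built-in plot, but it is only one possible explanation.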


      • #4
        Everything would be fixed if you can get the BAM files, because then you can do your own analysis. Getting counts in R is easy, and differential expression analysis isn't too hard.


        • #5
          If you get the BAM files, you can use featureCounts (via the Rsubread R package) followed by DESeq2. Ask them for the exact genome build they used (or, better still, for the corresponding GTF file), since you will need it to count reads from the BAM files with featureCounts.


          • #6
            Great, many thanks for your responses. I requested the GTF files from our collaborator and will let you know how everything goes!
