  • Volcano plot with R

    Hello everyone!

    We are collaborating with an institute that performed Illumina sequencing (HiSeq 3000) of our RNA samples. They normalized and annotated the data using CLC Genomics Workbench 9. In the end, we received an Excel table listing, for each gene, its name, expression count, p-value, FDR, Bonferroni-corrected p-value, fold change and RPKM value.
    I wrote an R script to make a volcano plot (log2 fold change on the x-axis, -log10(p) on the y-axis).
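    For reference, the two volcano-plot axes can be computed as below. This is a language-neutral sketch written in Python, not the R script itself. It assumes the exported fold change is signed (for example, -2 for a two-fold decrease), which should be checked against the actual export. If the column is a plain ratio, use math.log2(fc) directly.

```python
import math


def log2_fold_change(fc):
    """Convert a signed fold change to the log2 scale.

    Assumption: down-regulation is encoded as a negative value, so -2 means
    a two-fold decrease. Values between -1 and 1 are not expected.
    """
    if fc == 0:
        raise ValueError("a signed fold change cannot be 0")
    return math.log2(fc) if fc > 0 else -math.log2(-fc)


def volcano_coordinates(rows):
    """Map (gene, fold_change, p_value) rows to (gene, x, y) points.

    x is the log2 fold change and y is -log10(p). Rows with p == 0 are
    skipped because -log10(0) is infinite and cannot be plotted.
    """
    points = []
    for gene, fc, p in rows:
        if p <= 0:
            continue  # undefined on the log scale; handle these genes separately
        points.append((gene, log2_fold_change(fc), -math.log10(p)))
    return points


# Made-up example rows: (gene, signed fold change, p-value)
rows = [("geneA", 4.0, 1e-6), ("geneB", -2.0, 0.01), ("geneC", 3.0, 0.0)]
for gene, x, y in volcano_coordinates(rows):
    print(gene, round(x, 2), round(y, 2))
```

    Genes with p = 1 end up at y = 0, which is why they line up along the x-axis.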


    The issues:
    (1) It turns out that roughly 66% of our genes have a p-value of 1. I excluded these genes because they all fall on the x-axis (-log10(1) = 0). Is it acceptable to pre-filter data for a volcano plot, or do people usually plot the whole data set?
    (2) Another roughly 180 genes have a p-value of exactly 0. Since the logarithm of 0 is undefined, I first wanted to replace the zeros with the smallest non-zero p-value in my dataset. However, because so many genes have p = 0, I cannot see how to assign them small p-values at random without creating a suspicious pattern of dots in the plot. How do people plot genes with p = 0?
    (3) We suspect that the p = 0 and p = 1 values might be rounded values that appear when the software exports the Excel file. Could that be the case?
    Our collaborator says that none of the values are rounded. Yet when they have their software (CLC Genomics Workbench) create a volcano plot, it looks normal, without any horizontal lines.
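    One common way to keep p = 0 genes without random jitter is to cap them at a chosen floor and mark them explicitly, for example with a different symbol at the top of the plot. The Python sketch below is an illustration, not how CLC draws its plot. It uses the smallest non-zero p-value as the default floor, as proposed in (2), and returns a flag for each capped gene so the plotting step can style those genes separately.

```python
import math


def cap_zero_pvalues(pvalues, floor=None):
    """Replace p == 0 with a floor so that -log10(p) stays finite.

    By default the floor is the smallest non-zero p-value in the data
    (an assumption; any fixed cap works). Returns the -log10 values and a
    list of booleans marking which genes were capped.
    """
    if floor is None:
        nonzero = [p for p in pvalues if p > 0]
        if not nonzero:
            raise ValueError("no non-zero p-values to derive a floor from")
        floor = min(nonzero)
    y, capped = [], []
    for p in pvalues:
        is_capped = p <= 0
        y.append(-math.log10(floor if is_capped else p))
        capped.append(is_capped)
    return y, capped


# Made-up p-values; capped genes share the floor's height and are flagged
y, capped = cap_zero_pvalues([0.0, 1e-50, 0.03, 0.0])
print([round(v, 2) for v in y], capped)
```

    Drawing the capped genes with their own marker makes the horizontal line an honest "at least this significant" band rather than a pattern that looks like an artifact.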

    Any input is greatly appreciated!!

    Best wishes
    DCseq

  • #2
    Why not ask them to export the normalized values from CLC (or, better still, the raw counts)? You could then do your own analysis with that data (e.g. with DESeq2, which expects raw rather than normalized counts); it sounds like you are comfortable with R.
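    Raw counts are preferable because DESeq2 estimates its own per-sample normalization. The idea behind its median-of-ratios size factors can be sketched in a few lines of Python. This is a simplified illustration with made-up counts, not DESeq2 itself. It leaves out dispersion estimation and the statistical test.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero in any sample are skipped, since their geometric
    mean would be 0.
    """
    n_samples = len(counts[0])
    usable, log_geo_means = [], []
    for row in counts:
        if all(c > 0 for c in row):
            usable.append(row)
            log_geo_means.append(sum(math.log(c) for c in row) / n_samples)
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    factors = []
    for j in range(n_samples):
        # log of (count / geometric mean) for every usable gene in sample j
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


# Made-up counts: sample 2 was sequenced twice as deeply as sample 1
toy = [[10, 20], [100, 200], [5, 10], [0, 7]]
print([round(s, 3) for s in size_factors(toy)])  # → [0.707, 1.414]
```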



    • #3
      I asked them for files I could use with DESeq2, but they replied that they cannot provide that kind of output, quoting the following:
      "
      ************************************
      Export of tables

      Tables can be exported in four different formats; CSV, tab-separated, Excel, or html. When exporting a table in CSV, tab-separated, or Excel format, numbers with many decimals are printed in the exported file with 10 decimals, or in 1.123E-5 format when the number is close to zero.

      When exporting a table in html format, data are exported with the number of decimals that have been defined in the workbench preference settings. When tables are exported in html format from the server or using command line tools, the default number of exported decimals is 3.
      ************************************
      "
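      The quoted rule can be illustrated with Python's own number formatting, used here only as a stand-in for CLC's exporter (the quote does not say when a number counts as "close to zero"). Fixed 10-decimal output turns 1e-12 into 0.0000000000 and anything within 5e-11 of 1 into 1.0000000000, whereas scientific notation keeps small values. Independently of the export, a double-precision number cannot store positive values below about 4.9e-324, so a p-value that small is already exactly 0 in any software that computes in double precision.

```python
import sys


def fixed_10(x):
    """Format with exactly 10 decimals, like the fixed-decimal export rule."""
    return f"{x:.10f}"


def scientific(x):
    """Format in 1.123E-5 style (three decimals in the mantissa)."""
    return f"{x:.3E}"


# Made-up p-values, from ordinary to extremely small or very close to 1
for p in [0.04, 3.2e-7, 1e-12, 0.99999999999]:
    print(p, fixed_10(p), scientific(p))

# Double precision has a floor: halving the smallest positive subnormal
# double underflows to zero.
print(5e-324 / 2)            # 0.0
print(sys.float_info.min)    # smallest normal double, about 2.2e-308
```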

      Nonetheless, they said they could give me BAM files. I have not worked with BAM files before. Would they be helpful in my case?

      Many thanks



      • #4
        Everything would be solved if you can get the BAM files, because then you can do your own analysis. Getting counts (in R) is easy, and doing the differential expression analysis isn't too hard.
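        To give an idea of what read counting does, here is a toy Python sketch. It illustrates the idea and is not featureCounts itself. It assumes the reads have already been reduced to unspliced (chrom, start, end) alignments with 1-based inclusive coordinates. A real tool reads the BAM directly and also handles spliced reads, strandedness, paired-end fragments and multi-mapping reads. Each read is assigned to the single gene whose exons it overlaps. Reads that hit no gene or several genes are left unassigned, a simplified version of featureCounts' default behaviour, and the status labels borrow the names from its summary output.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based, inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def count_reads(reads, exons):
    """Count reads per gene from alignment intervals.

    reads: list of (chrom, start, end) for aligned reads.
    exons: dict mapping gene_id -> list of (chrom, start, end) exons.
    """
    counts = Counter({gene: 0 for gene in exons})
    status = Counter()
    for chrom, start, end in reads:
        hits = {
            gene
            for gene, intervals in exons.items()
            for c, s, e in intervals
            if c == chrom and overlaps(start, end, s, e)
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
            status["Assigned"] += 1
        elif hits:
            status["Unassigned_Ambiguity"] += 1  # overlaps more than one gene
        else:
            status["Unassigned_NoFeatures"] += 1  # overlaps no annotated exon
    return counts, status


# Made-up annotation and reads on one chromosome
exons = {"geneA": [("chr1", 100, 200), ("chr1", 300, 400)],
         "geneB": [("chr1", 380, 500)]}
reads = [("chr1", 150, 199), ("chr1", 390, 439), ("chr1", 600, 649), ("chr1", 320, 369)]
counts, status = count_reads(reads, exons)
print(dict(counts), dict(status))
```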



        • #5
          If you get the BAM files, you can use featureCounts (via the Rsubread R package) followed by DESeq2. Ask them which exact genome build they used (or, better still, ask them to provide the corresponding GTF file), since you need that annotation to count reads from the BAM files with featureCounts.
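          A GTF file is what tells featureCounts which genomic intervals belong to which gene. Its chromosome names and coordinates must match the genome build the reads were aligned to. The simplified Python parser below assumes a standard 9-column GTF. It groups exon lines by gene_id, matching the defaults of Rsubread's featureCounts ("exon" features, "gene_id" attribute). It also flags annotation chromosomes missing from the BAM, a common symptom of a build or naming mismatch ("1" vs "chr1"). The BAM chromosome list is typed in by hand here; reading it from a real BAM header would need a tool such as samtools or pysam.

```python
from collections import defaultdict


def parse_gtf_exons(lines):
    """Group exon intervals by gene_id from GTF text lines.

    GTF columns (tab-separated, 1-based inclusive coordinates): seqname,
    source, feature, start, end, score, strand, frame, attributes.
    """
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip('"')
        gene_id = attrs.get("gene_id")
        if gene_id is not None:
            exons[gene_id].append((fields[0], int(fields[3]), int(fields[4])))
    return dict(exons)


def missing_chromosomes(exons, bam_chromosomes):
    """Chromosome names used in the annotation but absent from the BAM."""
    gtf_chroms = {c for intervals in exons.values() for c, _, _ in intervals}
    return sorted(gtf_chroms - set(bam_chromosomes))


# Made-up GTF lines using Ensembl-style names ("1"), checked against
# hypothetical UCSC-style BAM names ("chr1")
gtf = [
    '1\thavana\tgene\t100\t500\t.\t+\t.\tgene_id "G1";',
    '1\thavana\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\thavana\texon\t300\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
exons = parse_gtf_exons(gtf)
print(exons)
print(missing_chromosomes(exons, ["chr1", "chr2"]))  # ['1']
```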



          • #6
            Great, many thanks for your responses. I requested the GTF files from our collaborator and will let you know how everything goes!
