
  • Example RNA-seq datasets with low and high false-positive rates

    Hello,

    I am trying to obtain two example RNA-seq datasets: one with a verified low false-positive rate, and the other with a verified high false-positive rate.

    Specifically, I am hoping to obtain 3 things for each dataset:

    1) The processed count table (filtered, normalized, and whatever else) that was directly fed into the model that created the DEG list.

    2) The DEG list (simply which rows of the count table were designated DEGs)

    3) An estimated false-positive rate (or similar metric) showing how reliable the DEG list is, perhaps from some gold-standard type of procedure. For one dataset, this rate is high. For the other dataset, this rate is low.

    If I need to generate the processed count table and DEG list myself, that is of course fine too. I am just hoping for clear and reproducible documentation.

    I would be very grateful to hear from anyone who has knowledge even of just one of these datasets too. Thank you for any input!
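
For item 3, a minimal sketch of how such a metric could be computed once a gold-standard set of true DEGs is available. The function name `deg_error_rates` and the gene IDs are hypothetical, for illustration only. Note that "false positive rate" is often used loosely: the sketch reports both the false discovery rate (fraction of called DEGs that are wrong) and the false positive rate in the strict sense (fraction of true non-DEGs that were called).

```python
def deg_error_rates(called, truth, all_genes):
    """Compare a called DEG list against a gold-standard truth set.

    called, truth, all_genes: iterables of gene IDs (hypothetical inputs).
    Returns a dict with confusion counts, FDR and FPR.
    """
    called, truth, all_genes = set(called), set(truth), set(all_genes)
    tp = len(called & truth)              # called and truly DE
    fp = len(called - truth)              # called but not truly DE
    fn = len(truth - called)              # truly DE but missed
    tn = len(all_genes - called - truth)  # correctly left uncalled
    fdr = fp / (tp + fp) if (tp + fp) else 0.0  # false discovery rate
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # false positive rate
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "fdr": fdr, "fpr": fpr}


# Toy example with made-up gene IDs
all_genes = [f"g{i}" for i in range(1, 11)]
truth = ["g1", "g2", "g3"]
called = ["g1", "g2", "g4", "g5"]
rates = deg_error_rates(called, truth, all_genes)
print(rates)
```

The same DEG list can therefore look acceptable by one metric and poor by the other, so it helps to state which one a dataset's "verified" rate refers to.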

  • #2
    You could simulate them yourself to have precise control over the "truth".
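
A rough sketch of what such a simulation could look like, using only the Python standard library. Everything here is an assumption made for illustration and not something stated in the thread: counts are drawn from a gamma-Poisson (negative binomial) model, and the function names, fold change, dispersion, and baseline means are arbitrary choices.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's method.

    Large rates are split into chunks (a sum of independent Poissons is
    Poisson) so that exp(-chunk) never underflows.
    """
    total = 0
    while lam > 0:
        chunk = min(lam, 500.0)
        threshold = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
        lam -= chunk
    return total


def simulate_counts(n_genes=200, n_de=20, n_reps=3,
                    fold_change=3.0, dispersion=0.1, seed=0):
    """Simulate a two-group count table with a known set of true DEGs.

    Returns (counts, de_genes): counts[g] holds n_reps control values
    followed by n_reps treatment values; de_genes is the set of row
    indices that were given a real fold change.
    """
    rng = random.Random(seed)
    de_genes = set(rng.sample(range(n_genes), n_de))
    shape = 1.0 / dispersion  # gamma shape; variance = mean + dispersion * mean^2
    counts = []
    for g in range(n_genes):
        base_mean = rng.uniform(20, 100)
        row = []
        for group in (0, 1):
            is_shifted = group == 1 and g in de_genes
            mean = base_mean * fold_change if is_shifted else base_mean
            for _ in range(n_reps):
                lam = rng.gammavariate(shape, mean / shape)  # gamma with this mean
                row.append(sample_poisson(rng, lam))
        counts.append(row)
    return counts, de_genes


counts, de_genes = simulate_counts()
print(len(counts), "genes,", len(de_genes), "true DEGs")
```

Because `de_genes` is known exactly, any DEG list produced from `counts` can be scored against it directly.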

    • #3
      Thank you, I am trying to use real (not simulated) RNA-seq data.

      • #4
        You will find plenty of real datasets that have (or claim to have) low false-positive rates, since everyone wants to achieve that, but it may be hard to find a real dataset with a high false-positive rate, since no reviewer would accept that.

        • #5
          Thanks GenoMax.

          1) I agree it might be hard to find a high false-positive rate example on its own. However, if that is the case, I am hoping to find an easily reproducible example of a dataset that, say, has a high false-positive rate when analyzed one way, but a low false-positive rate when analyzed another way. This might be available in studies promoting a certain methodology. I am very interested in seeing what DEGs look like (by counts) when they come from an analysis with an established high false-positive rate.

          2) I do have one dataset that returns a suspiciously large number of DEGs (through edgeR, DESeq, and limma-voom). However, when I look at the DEGs (view their counts), I do not see the much larger variation between treatment groups than between replicates that I would expect. This makes me *suspect* many of these DEGs are false-positive calls. However, I am looking for a dataset that has been compared to some *standard* showing it indeed has a high false-positive rate, and unfortunately, I do not know of a way to do that with my data. Hence, I am trying to find a public dataset.
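
The visual check described in 2) can be made a little more systematic. The sketch below is a crude per-gene diagnostic and not what edgeR, DESeq, or limma compute: it takes log2 counts and divides the between-group mean square by the within-group mean square (a two-group F-like ratio). The function name, the pseudocount, and the example counts are all hypothetical.

```python
import math
from statistics import mean


def between_within_ratio(group_a, group_b, pseudocount=1.0):
    """Ratio of between-group to within-group variation on log2 counts.

    group_a, group_b: replicate counts for one gene in each condition.
    Values well above 1 mean the groups differ more than replicates do.
    """
    a = [math.log2(x + pseudocount) for x in group_a]
    b = [math.log2(x + pseudocount) for x in group_b]
    mean_a, mean_b = mean(a), mean(b)
    grand = mean(a + b)
    # Between-group mean square (1 degree of freedom for two groups)
    between = len(a) * (mean_a - grand) ** 2 + len(b) * (mean_b - grand) ** 2
    # Pooled within-group mean square
    within_ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    within = within_ss / (len(a) + len(b) - 2)
    if within == 0:
        return math.inf if between > 0 else 0.0
    return between / within


# Made-up counts: one gene with a clear shift, one dominated by replicate noise
clear_gene = between_within_ratio([10, 12, 11], [100, 110, 95])
noisy_gene = between_within_ratio([10, 50, 30], [20, 60, 25])
print(clear_gene > noisy_gene)
```

Genes called as DEGs that nevertheless score near or below 1 on a ratio like this are the ones worth inspecting by eye, though a low ratio alone does not prove a call is a false positive.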

          • #6
            RNA-seq differential expression methods are known to be affected by outliers. You have used edgeR to analyse the dataset: which dispersion estimation method did you use? If you have patient replicates, you should use the robust variant of dispersion estimation. The default method is only really useful if you are analysing replicates of cell lines (e.g. 3 replicates of PrEC and 3 replicates of LNCaP), which are not representative of biological tissue and its heterogeneity. There is also a robust style of limma analysis you could use.
