SEQanswers

10-19-2017, 03:09 PM   #1
SuzuBell
Member
 
Location: U.S.A

Join Date: Nov 2013
Posts: 15
Example RNA-seq datasets with low and high false-positive rates

Hello,

I am trying to obtain two example RNA-seq datasets. One should have a verified low false positive rate, and the other a verified high false positive rate.

Specifically, I am hoping to obtain 3 things for each dataset:

1) The processed count table (filtered, normalized, and whatever else) that was directly fed into the model that created the DEG list.

2) The DEG list (simply which rows of the count table were designated DEGs)

3) An estimated false positive rate (or similar metric) showing how reliable the DEG list is, perhaps from some gold-standard type of procedure. For one dataset, this rate should be high; for the other, it should be low.
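
To make item 3 concrete, here is a minimal sketch of how such a rate could be computed, assuming a trusted reference set of truly differential genes were available (for example from independent validation). The function name, the gene IDs, and the reference set are all hypothetical; nothing here comes from a real dataset.

```python
def deg_error_rates(called, truth, all_genes):
    """Compare a DEG list against a reference 'truth' set.

    called, truth, all_genes: collections of gene IDs (hypothetical here).
    Returns the false discovery proportion (false calls / all calls) and
    the false positive rate (false calls / truly non-differential genes).
    """
    called, truth, all_genes = set(called), set(truth), set(all_genes)
    false_calls = called - truth
    non_differential = all_genes - truth
    fdp = len(false_calls) / len(called) if called else 0.0
    fpr = len(false_calls) / len(non_differential) if non_differential else 0.0
    return {"false_discovery_proportion": fdp, "false_positive_rate": fpr}


# Toy example with made-up gene IDs
all_genes = ["g%d" % i for i in range(1, 11)]  # 10 rows in the count table
truth = ["g1", "g2", "g3"]                      # reference DEGs (assumed known)
called = ["g1", "g2", "g4", "g5"]               # DEG list from some pipeline
rates = deg_error_rates(called, truth, all_genes)
print(rates)
```

Note that the two numbers answer different questions: the first is the share of the DEG list that is wrong, the second is the share of unchanged genes that were wrongly called.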

If I need to generate the processed count table and DEG list myself, that is of course fine too. I am just hoping for clear and reproducible documentation.

I would be very grateful to hear from anyone who knows of even just one such dataset. Thank you for any input!
10-20-2017, 03:02 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550

You could simulate them yourself to have precise control over the "truth".
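
A minimal sketch of what such a simulation could look like, using only the Python standard library. Counts are drawn from a negative binomial (as a gamma-Poisson mixture), and a known subset of genes gets a fold change. All parameter values, gene names, and function names are assumptions chosen for illustration, not a recommended design.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the modest means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def nb_count(mean, dispersion, rng):
    """Negative binomial draw via a gamma-Poisson mixture.

    The variance is mean + dispersion * mean**2.
    """
    shape = 1.0 / dispersion
    lam = rng.gammavariate(shape, mean / shape)
    return poisson(lam, rng)


def simulate_counts(n_genes=200, n_reps=3, n_de=20, fold=4.0,
                    base_mean=50.0, dispersion=0.1, seed=1):
    """Return (counts, truth); counts[gene] = (control reps, treated reps)."""
    rng = random.Random(seed)
    genes = ["gene%d" % i for i in range(n_genes)]
    truth = set(rng.sample(genes, n_de))  # genes that really change
    counts = {}
    for g in genes:
        treated_mean = base_mean * fold if g in truth else base_mean
        control = [nb_count(base_mean, dispersion, rng) for _ in range(n_reps)]
        treated = [nb_count(treated_mean, dispersion, rng) for _ in range(n_reps)]
        counts[g] = (control, treated)
    return counts, truth


counts, truth = simulate_counts()
print(len(counts), "genes simulated,", len(truth), "truly differential")
```

Because `truth` is known exactly, any DEG list produced from `counts` can be scored against it.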
10-20-2017, 06:10 AM   #3
SuzuBell

Thank you, but I am trying to use real (not simulated) RNA-seq data.
10-20-2017, 07:53 AM   #4
GenoMax

You will find plenty of real datasets that have (or claim to have) low false positive rates, since everyone wants to achieve that. It may be hard to find a real dataset with a high false positive rate, since no reviewer would accept that.
10-21-2017, 05:39 AM   #5
SuzuBell

Thanks GenoMax.

1) I agree it might be hard to find a high false-positive rate example on its own. If that is the case, I am hoping to find an easily reproducible example of a dataset that, say, has a high false-positive rate when analyzed one way but a low false-positive rate when analyzed another way. This might be available in studies promoting a certain methodology. I am very interested in seeing what DEGs look like (by counts) when they come from an analysis with an established high false positive rate.

2) I do have one dataset that returns a suspiciously large number of DEGs (through edgeR, DESeq, and limma-voom). However, when I look at the DEGs (view their counts), I do not see the much larger variation between treatment groups than between replicates that I would expect. This makes me *suspect* that many of these DEGs are false positive calls. However, I am looking for a dataset that has been compared to some *standard* showing it indeed has a high false positive rate, and unfortunately I do not know of a way to do that with my data. Hence, I am trying to find a public dataset.
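
The eyeball check described in point 2 could be roughly quantified as below: for each called DEG, compare the gap between group means with the spread among replicates, both on a log2 scale. This is only a descriptive screen written for illustration, not a test used by edgeR, DESeq, or limma, and the counts and gene names are made up.

```python
import math
import statistics


def between_within_ratio(control, treated, pseudocount=1.0):
    """Absolute difference of group means divided by the pooled replicate SD,
    both computed on log2(count + pseudocount). Needs >= 2 replicates per group."""
    a = [math.log2(x + pseudocount) for x in control]
    b = [math.log2(x + pseudocount) for x in treated]
    diff = abs(statistics.mean(a) - statistics.mean(b))
    # Pooled within-group variance, weighted by degrees of freedom
    df = (len(a) - 1) + (len(b) - 1)
    pooled_var = ((len(a) - 1) * statistics.variance(a)
                  + (len(b) - 1) * statistics.variance(b)) / df
    sd = math.sqrt(pooled_var)
    if sd == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / sd


# Hypothetical counts: (control replicates, treated replicates)
degs = {
    "geneA": ([100, 110, 95], [400, 420, 390]),  # groups clearly separated
    "geneB": ([100, 400, 20], [150, 30, 500]),   # replicates vary as much as groups
}
ratios = {g: between_within_ratio(c, t) for g, (c, t) in degs.items()}
for gene, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(gene, round(ratio, 2))
```

Called DEGs with a ratio near or below 1 are the ones where replicate-to-replicate variation is as large as the treatment difference, which is the pattern that raises suspicion.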
11-12-2017, 10:00 PM   #6
Dario1984
Senior Member
 
Location: Sydney, Australia

Join Date: Jun 2011
Posts: 160

RNA-seq differential expression methods are known to be affected by outliers. You have used edgeR to analyse the dataset. Which variety of dispersion estimation did you use? If you have patient replicates, you should use the robust variety of dispersion estimation. The default method is only useful if you are analysing replicates of cell lines (e.g. 3 replicates of PrEC and 3 replicates of LNCaP), which aren't representative of biological tissue and its heterogeneity. There's also a robust style of limma analysis you could be using.
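
To illustrate the outlier point, here is a small leave-one-out check in plain Python. It is not edgeR's or limma's robust procedure, just a sketch (with made-up counts and hypothetical function names) showing how a single extreme replicate can produce a large fold change that vanishes once that sample is dropped.

```python
import math
import statistics


def log2_fc(control, treated, pseudocount=1.0):
    """log2 ratio of group means, with a pseudocount to avoid division by zero."""
    return math.log2((statistics.mean(treated) + pseudocount)
                     / (statistics.mean(control) + pseudocount))


def leave_one_out_fc(control, treated):
    """Smallest-magnitude log2 fold change seen when any one sample is dropped."""
    fcs = []
    for i in range(len(control)):
        fcs.append(log2_fc(control[:i] + control[i + 1:], treated))
    for i in range(len(treated)):
        fcs.append(log2_fc(control, treated[:i] + treated[i + 1:]))
    return min(fcs, key=abs)


control = [50, 55, 48, 52]
treated = [51, 49, 900, 53]  # one extreme replicate
full_fc = log2_fc(control, treated)
loo_fc = leave_one_out_fc(control, treated)
print(round(full_fc, 2), round(loo_fc, 2))
```

A gene whose fold change collapses toward zero under leave-one-out is being driven by one sample, which is the situation robust estimation is meant to guard against.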

Tags
false positive, false positives, rnaseq
