SEQanswers

01-04-2019, 05:02 AM   #1
K1K
Junior Member
 
Location: Saxony

Join Date: Jan 2019
Posts: 5
False negative results in RNA sequencing data sets

Dear experts,

I have just started working in the field of RNA sequencing and would be very grateful for your advice.

I started with a set of two patients and two wild types, hoping to identify a gene that is not expressed in the patients. I used the pipeline and recipe from the GenePattern workspace (http://recipes.genomespace.org/view/6) and finally got a list with the FPKMs of the four samples. Two genes were reported as virtually not expressed in both patients (FPKM = 0), while both wild types showed high expression (gene 1: FPKM > 8000; gene 2: FPKM > 3000). The status was marked as "High Data", and no errors occurred during the execution. The problem is that these two genes are actually expressed in the patients (verified by qRT-PCR, mass spec and western blotting). Do you have any idea how this false result could arise?

Thanks a lot!
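
For readers unfamiliar with the unit, the textbook definition of FPKM (fragments per kilobase of transcript per million mapped fragments) can be sketched as below. This is only the basic formula, not the statistical estimate that Cufflinks/Cuffdiff actually computes, and the function name and example numbers are made up for illustration.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments.

    Illustrative only; Cufflinks estimates abundances with a statistical model
    rather than by plugging raw counts into this formula.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped fragments)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical numbers: 4000 fragments on a 2 kb transcript, 20 million mapped fragments
example = fpkm(4000, 2000, 20_000_000)
```

Under this definition, a gene only has FPKM = 0 if no fragments are assigned to it, so a zero next to a strongly expressed control is worth checking against the alignments.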

01-07-2019, 03:28 AM   #2
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 378

There are far too many possible reasons to speculate on, from running the analysis incorrectly, to low read coverage in your patient samples, to QC differences between the samples, to differences in sample collection.

But you have hit the very clear problem of not having anywhere near enough samples to make an informed decision about this question.

01-07-2019, 04:20 AM   #3
K1K
Junior Member
 
Location: Saxony

Join Date: Jan 2019
Posts: 5

Thanks a lot for your reply!
I prepared all samples together, following the same protocol. I've also run the analysis multiple times with the same result. The raw reads from the patients looked good, so read coverage should not be a problem.
You are absolutely right that two samples are not enough, but the two patients are siblings from highly consanguineous parents, and we are looking for a complete loss of a protein, not just a difference in expression level.

01-07-2019, 04:29 AM   #4
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 378

So have you visualised the read alignments in the region in the patients?

01-07-2019, 11:29 PM   #5
K1K
Junior Member
 
Location: Saxony

Join Date: Jan 2019
Posts: 5

Do you mean a visualization with IGV, for example? Or do you prefer another tool?

01-07-2019, 11:35 PM   #6
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 378

IGV would be fine - I'm just interested in whether there are any reads mapping to that region
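
The same question (are any reads mapping to the region?) can also be answered without a viewer. The sketch below counts mapped alignments overlapping a region in SAM-formatted text using only the standard SAM fields (FLAG, RNAME, POS, CIGAR). The function names are made up for illustration, and the region coordinates are whatever you pass in; in practice `samtools view -c` on an indexed BAM with a region argument does the same job far faster.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment, introns (N) included."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def count_overlapping(sam_lines, chrom: str, start: int, end: int) -> int:
    """Count mapped alignments overlapping chrom:start-end (1-based, inclusive).

    Spliced reads count if their full span (including skipped introns)
    overlaps the region, which is also how region queries usually behave.
    """
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or rname != chrom or cigar == "*":
            continue  # unmapped, or mapped to a different reference sequence
        aln_end = pos + reference_span(cigar) - 1
        if pos <= end and aln_end >= start:
            count += 1
    return count
```

A count of zero in the patients would point to an alignment or sample problem; a clearly non-zero count would point to the quantification step instead.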

01-21-2019, 01:11 AM   #7
K1K
Junior Member
 
Location: Saxony

Join Date: Jan 2019
Posts: 5

Hi,

I'm so sorry for the late reply, but I decided to start the analysis again and then have a look in IGV, as you recommended.

I have now checked both genes in IGV, and there are actually reads for every exon. But I don't understand why I get FPKM = 0 if there are so many reads, and I don't know what to change in future analyses.

Best,

K1K
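
One thing that may be worth checking, given the "High Data" status mentioned in the first post: the Cufflinks documentation lists the quantification status values OK, LOWDATA, HIDATA and FAIL, and describes HIDATA as "too many fragments in locus". If the reported status corresponds to HIDATA (an assumption, since the exact output is not shown in this thread), the FPKM of 0 may reflect a locus that was not quantified rather than a gene without reads. The Cufflinks tools expose a `--max-bundle-frags` option for the fragment limit per locus. The sketch below lists every gene/sample pair whose status is not OK in a tab-separated FPKM tracking table; it assumes per-sample columns ending in `_status` and a `tracking_id` column, and the function name is hypothetical.

```python
import csv
import io


def non_ok_loci(tracking_text: str):
    """Return (tracking_id, sample, status) for every sample status other than OK.

    Assumes a tab-separated table with a header row, a 'tracking_id' column,
    and one '<sample>_status' column per sample.
    """
    reader = csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    flagged = []
    for row in reader:
        for column, value in row.items():
            if column and column.endswith("_status") and value != "OK":
                sample = column[: -len("_status")]
                flagged.append((row.get("tracking_id", ""), sample, value))
    return flagged
```

Typical use would be `non_ok_loci(open("genes.fpkm_tracking").read())`, where the file name is the one Cuffdiff normally writes for gene-level FPKMs. Any gene flagged here should not be read as "not expressed" on the basis of its FPKM alone.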

01-21-2019, 01:30 AM   #8
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 378

I just looked at the GenePattern workflow. I can still think of a number of reasons why this might be happening, but this workflow is EXTREMELY dated. It uses TopHat, which was superseded by TopHat2, which was superseded by HISAT, which was superseded by HISAT2. There's also insufficient detail on how it manages the cuffdiff workflow.

I'd seriously consider another route for this, perhaps looking for a documented HISAT2 > cuffdiff analysis workflow on Galaxy if you can't run the analysis via the command line.
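
For anyone who can use the command line, a HISAT2 > cuffdiff run might look roughly like the commands assembled below. This is a sketch under assumptions, not a validated workflow: it assumes paired-end reads, a prebuilt HISAT2 index and a GTF annotation, and every file name and the function name are hypothetical. The flags used (`hisat2 --dta-cufflinks -x -1 -2 -S`, `samtools sort -o`, `cuffdiff -o -L`) are taken from those tools' documented options. The function only builds the argument lists so they can be inspected before anything is run.

```python
def build_pipeline(index_prefix, annotation_gtf, out_dir, conditions):
    """Build argv lists for a HISAT2 > samtools sort > cuffdiff run.

    conditions maps a condition label to a list of
    (library_name, reads_1_fastq, reads_2_fastq) tuples (paired-end assumed).
    """
    commands = []
    bam_groups = []
    for label, libraries in conditions.items():
        bams = []
        for name, reads_1, reads_2 in libraries:
            sam = f"{name}.sam"
            bam = f"{name}.sorted.bam"
            # --dta-cufflinks tailors the alignment output for Cufflinks-style tools
            commands.append(["hisat2", "--dta-cufflinks", "-x", index_prefix,
                             "-1", reads_1, "-2", reads_2, "-S", sam])
            # cuffdiff expects coordinate-sorted alignments
            commands.append(["samtools", "sort", "-o", bam, sam])
            bams.append(bam)
        # replicates of one condition are passed as a comma-separated list
        bam_groups.append(",".join(bams))
    commands.append(["cuffdiff", "-o", out_dir, "-L", ",".join(conditions),
                     annotation_gtf, *bam_groups])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the paths have been checked.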

01-21-2019, 04:11 AM   #9
K1K
Junior Member
 
Location: Saxony

Join Date: Jan 2019
Posts: 5

Thanks a lot for your help. In the meantime, I have also read that TopHat is quite dated.

I'll try to run the analysis with the HISAT2 > cuffdiff pipeline and hope to get more reliable results.
All times are GMT -8.

