SEQanswers

08-06-2014, 06:58 PM   #1
chimb
Junior Member
Location: Richmond, VA
Join Date: Aug 2014
Posts: 2

Tuxedo Pipeline Issue - Multiple Gene Hits per Transcript, High FPKM

Hello all!

** I already posted this in the Bioinformatics forum -- I'm not sure which forum it belongs in, my apologies. Admins, feel free to delete/merge my post as necessary **

- I've been analyzing some RNA-seq data using the Tuxedo pipeline and have been getting some peculiar results, which are especially noticeable in the tables of significant genes (and their differential expression data) I've attached.

- Some biological background: the experiment is looking at bacteria-bacteria interaction effects between Streptococcus sanguinis (Ss) and Porphyromonas gingivalis (Pg). There are numerous conditions and comparisons that were made using Cuffdiff, but the data I've attached is based on the comparison between the conditions:

Wild-type Ss (SK36) grown in isolation (sample_1)
--vs--
Wild-type Ss cultured with wild-type Pg (sample_2)

In this case, the cuffdiff run utilizes the Ss read alignments and uses the merged transcriptome of Ss across both conditions.

- For Sk--Sk_Pg_sig_genes.txt, I ran the data through the whole Tuxedo pipeline following Trapnell et al.'s protocol from Nature: Tophat, Cufflinks, Cuffmerge, Cuffdiff, cummeRbund, all with default options. In cummeRbund, I used the getSig(), getGenes(), diffData() and featureNames() functions to assemble a table of the significantly differentially expressed genes (alpha=0.05), their differential expression data and their short names. Two peculiar things:

- Some transcripts report hits to multiple genes each (many gene_short_name values per transcript)

- FPKM values (value_1 and value_2) are extremely high for some transcripts, ~3089410 for one of them, which doesn't seem possible.
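
For reference, here is a minimal sketch of the textbook FPKM definition (fragments per kilobase of transcript per million mapped fragments). It is not Cufflinks' exact calculation, which uses effective lengths and its own normalization, and the function name `fpkm` and all input numbers are hypothetical, chosen only to show how the value scales with transcript length and fragment count.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments.

    This is the plain definition, NOT Cufflinks' internal estimate.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical inputs (not taken from the attached tables):
moderate = fpkm(fragments=1_000, transcript_length_bp=2_000, total_mapped_fragments=10_000_000)
extreme = fpkm(fragments=300_000, transcript_length_bp=100, total_mapped_fragments=1_000_000)

print(moderate)
print(extreme)
```

Under this definition, a very short feature that attracts a large share of the mapped fragments yields a very large value, so checking the length and raw fragment count behind each high-FPKM entry may help narrow down the cause.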


- My PI and I suspected that tophat may be finding splice junctions that do not exist (I did not include "--no-novel-juncs" in my initial tophat runs). I imagine this would link together disparate stretches of DNA as a single transcript and garner multiple gene hits. Alternatively, perhaps many genes overlap across the same stretches of DNA in different reading frames (though I'd imagine cufflinks would account for that?).

- I tried running the whole pipeline again, but skipped the tophat step (which includes read fragmentation and splice junction discovery). I ran bowtie2 alone for the bare-read alignments, converted the output SAM to BAM, sorted it and fed it through cufflinks and the rest of the pipeline as normal. The result is (using the same extraction methods in cummeRbund): sig_genes_Sk-Sk_Pg_bt2.txt
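
The bowtie2-only steps above can be sketched roughly as follows. This is a dry run that only builds and prints the command lines; it does not execute anything. It assumes single-end reads and samtools 1.x syntax (older releases used a different `sort` invocation), and the helper name `build_commands` plus the index, read, and output names are placeholders, not the actual files from this analysis.

```python
import shlex


def build_commands(index_prefix: str, reads_fastq: str, out_prefix: str) -> list[list[str]]:
    """Build (but do not run) the align -> BAM -> sort -> cufflinks commands.

    Assumes single-end reads and samtools 1.x; all file names are placeholders.
    """
    sam = f"{out_prefix}.sam"
    bam = f"{out_prefix}.bam"
    sorted_bam = f"{out_prefix}.sorted.bam"
    return [
        # Unspliced alignment with bowtie2 (no junction discovery).
        ["bowtie2", "-x", index_prefix, "-U", reads_fastq, "-S", sam],
        # Convert SAM to BAM.
        ["samtools", "view", "-b", "-o", bam, sam],
        # Coordinate-sort the BAM, as cufflinks expects sorted input.
        ["samtools", "sort", "-o", sorted_bam, bam],
        # Assemble transcripts from the sorted alignments.
        ["cufflinks", "-o", f"{out_prefix}_cufflinks", sorted_bam],
    ]


for cmd in build_commands("Ss_SK36_index", "sample_1.fastq", "sample_1"):
    print(shlex.join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.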

~ I'm still getting multiple gene hits per transcript, and still getting extremely high FPKM values.
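
One way to list the affected entries in either table is sketched below. It assumes the exported table is tab-delimited with `gene_id`, `gene_short_name`, `value_1` and `value_2` columns, and that multiple short names are comma-separated within one field; the function name `flag_rows`, the threshold, and the demo rows are all made up for illustration and are not values from the attached files.

```python
import csv
import io


def flag_rows(table_text: str, fpkm_threshold: float = 1e6) -> list[tuple[str, int, float]]:
    """Return (gene_id, number of short names, max FPKM) for suspicious rows.

    A row is flagged if it carries more than one gene_short_name or if either
    FPKM column exceeds fpkm_threshold (an arbitrary, hypothetical cutoff).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    flagged = []
    for row in reader:
        # Split the comma-separated names, ignoring empty or "-" placeholders.
        names = [n for n in row["gene_short_name"].split(",") if n and n != "-"]
        max_fpkm = max(float(row["value_1"]), float(row["value_2"]))
        if len(names) > 1 or max_fpkm > fpkm_threshold:
            flagged.append((row["gene_id"], len(names), max_fpkm))
    return flagged


# Made-up demo rows, only to exercise the function.
demo = (
    "gene_id\tgene_short_name\tvalue_1\tvalue_2\n"
    "XLOC_000001\tgeneA,geneB,geneC\t2000000\t1500000\n"
    "XLOC_000002\tgeneD\t120.5\t80.0\n"
    "XLOC_000003\tgeneE,geneF\t45.0\t60.0\n"
)

for gene_id, n_names, max_fpkm in flag_rows(demo):
    print(gene_id, n_names, max_fpkm)
```

Comparing the flagged IDs between the tophat-based and bowtie2-based tables would show whether the same loci are affected in both runs.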

**************************************************

- Have any of you experienced the same sort of problems? What might be causing this? Any suggestions for alternate methods for alignment, transcript construction or visualization? ... I realize the Tuxedo pipeline was designed with eukaryotic systems in mind so I'm not sure if it is, in whole or in part, unsuitable for prokaryotes.

Any input would be greatly appreciated!

Thanks!

Tags
cufflinks, cummerbund 2, rna-seq advice, tophat
