SEQanswers





04-26-2017, 03:19 AM   #1
pbarros
Junior Member
 
Location: Portugal

Join Date: Jul 2012
Posts: 7
StringTie - isoforms not overlapping

Hi,
I am using StringTie for transcriptome reconstruction and identification of new isoforms.
While exploring the "[file]-transcripts.gtf" output file from StringTie, I found something that intrigued me. In the example below, three isoforms are assigned to the same gene ("STRG.14686"), and the last one matches a transcript in the reference annotation. However, their coordinates do not overlap: the first two isoforms end at 394180 bp and 389422 bp, respectively, while the third starts at 396257 bp.


Quote:
scaffold_96 StringTie transcript 383404 394180 1000 + . gene_id "STRG.14686"; transcript_id "STRG.14686.1"; cov "72.829010"; FPKM "6.506798"; TPM "8.058224";
scaffold_96 StringTie transcript 383404 389422 1000 + . gene_id "STRG.14686"; transcript_id "STRG.14686.2"; cov "61.675678"; FPKM "5.510321"; TPM "6.824155";
scaffold_96 StringTie transcript 396257 398001 1000 + . gene_id "STRG.14686"; transcript_id "STRG.14686.3"; reference_id "scaffold_96.g39603.t1"; ref_gene_id "scaffold_96.g39603"; cov "2963.938721"; FPKM "264.808624"; TPM "327.947357";
Why is StringTie "clustering" these isoforms in the same gene?
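To check a whole output file for this situation rather than spotting it by eye, here is a minimal Python sketch. It assumes a standard tab-separated GTF (nine columns, 1-based inclusive coordinates, attributes written as `key "value";` pairs) and only looks at `transcript` rows. The functions `parse_transcripts` and `non_overlapping_pairs` are hypothetical helpers, not part of StringTie. The example data are the three records quoted above, with tabs restored and attributes trimmed to `gene_id` and `transcript_id`.

```python
import re
from collections import defaultdict
from itertools import combinations

# Matches GTF attribute pairs such as: gene_id "STRG.14686";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_transcripts(lines):
    """Yield (gene_id, transcript_id, seqname, strand, start, end) for each
    'transcript' row of a tab-separated GTF (1-based, inclusive coordinates)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        yield (attrs["gene_id"], attrs["transcript_id"], fields[0], fields[6],
               int(fields[3]), int(fields[4]))


def non_overlapping_pairs(lines):
    """Return (gene_id, tx_a, tx_b) for every pair of transcripts that share a
    gene_id but do not overlap on the same sequence and strand."""
    by_gene = defaultdict(list)
    for gene_id, tx_id, seq, strand, start, end in parse_transcripts(lines):
        by_gene[gene_id].append((tx_id, seq, strand, start, end))

    pairs = []
    for gene_id, txs in by_gene.items():
        for a, b in combinations(txs, 2):
            a_id, a_seq, a_strand, a_start, a_end = a
            b_id, b_seq, b_strand, b_start, b_end = b
            overlaps = (a_seq == b_seq and a_strand == b_strand
                        and a_start <= b_end and b_start <= a_end)
            if not overlaps:
                pairs.append((gene_id, a_id, b_id))
    return pairs


# The three records from the post, with tabs restored and attributes trimmed.
EXAMPLE_GTF = [
    'scaffold_96\tStringTie\ttranscript\t383404\t394180\t1000\t+\t.\t'
    'gene_id "STRG.14686"; transcript_id "STRG.14686.1";',
    'scaffold_96\tStringTie\ttranscript\t383404\t389422\t1000\t+\t.\t'
    'gene_id "STRG.14686"; transcript_id "STRG.14686.2";',
    'scaffold_96\tStringTie\ttranscript\t396257\t398001\t1000\t+\t.\t'
    'gene_id "STRG.14686"; transcript_id "STRG.14686.3";',
]

if __name__ == "__main__":
    for gene_id, a, b in non_overlapping_pairs(EXAMPLE_GTF):
        print(f"{gene_id}: {a} and {b} do not overlap")
```

On the example data this reports two pairs, STRG.14686.1/STRG.14686.3 and STRG.14686.2/STRG.14686.3, while .1 and .2 overlap each other. For a real run, an open file handle on the transcripts GTF can be passed in place of `EXAMPLE_GTF`, since the parser only iterates over lines.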

Last edited by pbarros; 04-26-2017 at 03:22 AM.
04-26-2017, 11:18 PM   #2
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

First of all, I agree with your interpretation: the third transcript does not overlap the first two, yet StringTie has given all three the same 'gene_id' value.
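If loci were defined purely by coordinate overlap, these three transcripts would fall into two groups. The following is a minimal sketch of that grouping, assuming all transcripts lie on the same sequence and strand. The helper `overlap_clusters` is hypothetical and is not how StringTie assigns gene IDs; it only shows where an overlap-based split would fall.

```python
def overlap_clusters(transcripts):
    """Group (transcript_id, start, end) tuples from one sequence and strand
    into clusters of transitively overlapping intervals (1-based, inclusive)."""
    clusters = []
    cluster_end = None
    # sorted() is stable, so transcripts with equal starts keep input order.
    for tx_id, start, end in sorted(transcripts, key=lambda t: t[1]):
        if cluster_end is not None and start <= cluster_end:
            # Starts before the current cluster ends: same cluster.
            clusters[-1].append(tx_id)
            cluster_end = max(cluster_end, end)
        else:
            # Gap after the current cluster: open a new one.
            clusters.append([tx_id])
            cluster_end = end
    return clusters


if __name__ == "__main__":
    stringtie_gene = [
        ("STRG.14686.1", 383404, 394180),
        ("STRG.14686.2", 383404, 389422),
        ("STRG.14686.3", 396257, 398001),
    ]
    print(overlap_clusters(stringtie_gene))
```

With the coordinates from the post this prints `[['STRG.14686.1', 'STRG.14686.2'], ['STRG.14686.3']]`. A single sweep over start-sorted intervals is enough because any transcript starting after the running cluster end cannot overlap anything already in that cluster.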

The only thing that comes to mind is that the assembly from StringTie (or Cufflinks, for that matter) is an attempted explanation, and often a simplification, of the alignment data. You may learn more by looking at the alignments in a genome browser such as the UCSC Genome Browser or IGV.

While I doubt it, StringTie could be very, very smart and have assembled the third isoform from reads that multimap between it and the first two transcripts. That would imply all three belong to the same gene, one that is repeated at more than one position.
__________________
/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
04-27-2017, 06:06 AM   #3
pbarros
Junior Member
 
Location: Portugal

Join Date: Jul 2012
Posts: 7

Thank you for the input, sdriscoll... maybe I was overthinking this.

cheers,
pedro

Tags
isoform prediction, isoforms, stringtie

All times are GMT -8.