SEQanswers

02-15-2016, 03:05 AM   #1
sbcn
Member
Location: spain
Join Date: Oct 2012
Posts: 16

stringtie parameters

Hi,

I have been trying to use StringTie for transcriptome re-assembly, based on a reference GTF file.
Here is how I ran it:

# for each of the bam files from my project (aligned with tophat2):
stringtie file.bam -G reference.gtf -o file_stringtie.gtf -p 4 -v -C file_coverage.txt -A file_gene_abundance.out

# then merging all gtf files together:
stringtie --merge -G reference.gtf -p 4 -o all_merged.gtf gtf_list.txt

It is very straightforward. It is also incredibly fast as compared to the cufflinks + cuffmerge pipeline.

But when I compare the number of transcripts found in the reference GTF file and in the output of Stringtie, it is dramatically different:
awk '$3=="transcript"' reference.gtf | wc -l
# 23963
awk '$3=="transcript"' all_merged.gtf | wc -l
# 57830

I expected and hoped for new transcripts, but this seems like too large a difference (am I wrong?).

How can I make the pipeline more stringent?

Would you advise increasing the minimum input transcript coverage in the merging step, for example?
Also, looking at cuffmerge's parameters, the minimum isoform fraction is set to 0.05, while in stringtie it is 0.01 by default: would raising it be the way to go?

I have tried these parameters:

stringtie --merge -c 2.5 -G reference.gtf -p 4 -o all_merged_bis.gtf gtf_list.txt
awk '$3=="transcript"' all_merged_bis.gtf | wc -l
# 57476

stringtie --merge -f 0.05 -G reference.gtf -p 4 -o all_merged_ter.gtf gtf_list.txt
awk '$3=="transcript"' all_merged_ter.gtf | wc -l
# 36164
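
To put these counts side by side, here is a small sketch that expresses each merged count as a multiple of the reference count. The numbers are the ones reported in this post; the function name `fold_table` is just an illustrative helper.

```shell
# Transcript counts copied from the runs above; 23963 is the reference GTF.
fold_table() {
  printf '%s\n' \
    "default|57830" \
    "-c 2.5|57476" \
    "-f 0.05|36164" |
  awk -F'|' -v ref=23963 '{ printf "%-8s %6d  %.2fx reference\n", $1, $2, $2 / ref }'
}
fold_table
```

Seen this way, -c 2.5 leaves the merged set at roughly 2.4 times the reference, while -f 0.05 brings it down to about 1.5 times.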

I am merging results from about 60 bam files, so I guess the right approach may differ from the one used for smaller projects.

Thank you for any help and advice!

Best,
02-16-2016, 08:44 AM   #2
SES
Senior Member
Location: Vancouver, BC
Join Date: Mar 2010
Posts: 275

I would try gffcompare (by the same author) instead of "stringtie --merge" because it seems to be more stringent. I have also experienced the same issue that you report, but it is worse for a large genome. In my case, "stringtie --merge" generated 3X more transcripts than the reference, while gffcompare only generated about 2X more. You can also discard novel loci with gffcompare if you want to only consider the reference set.

Alternatively, you can increase the thresholds for stringtie to merge transcripts.
02-17-2016, 01:48 AM   #3
sbcn
Member
Location: spain
Join Date: Oct 2012
Posts: 16

Thanks a lot for your input.

I have now tried gffcompare, but it is actually a lot worse in my case:

gffcompare -r reference.gtf -s reference.fa -C -D -i gtf_list.txt

awk '$3=="transcript"' gffcmp.combined.gtf | wc -l
# 185653

As I understand it, gffcompare creates the union of all the gtf files given as an input, and as I am merging about 60 files, I get a huge final number of transcripts.

I think stringtie --merge is more appropriate in my case as it rather constructs a kind of consensus, so I will try and work on optimizing the parameters, although I would like to make sure not to be too stringent on some of them, and too flexible on others.
02-18-2016, 09:22 AM   #4
mpertea
Junior Member
Location: Maryland
Join Date: Mar 2012
Posts: 1

It is very likely that most of the transcripts that make up the difference are intronic or intergenic single exon transcripts. Especially with such a large number of samples, there are many small fragments expressed all over the place. We are more aggressive in filtering these out in StringTie version 1.2.2 (just released today), so please give it a try.
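
One way to check how many transcripts in a merged GTF are single-exon is to count exon lines per transcript_id. The sketch below is only an illustration: the toy file `toy_merged.gtf` and the function name `count_exon_classes` are made up for the example, and it assumes every exon line carries a transcript_id attribute. To try it on real output, pass all_merged.gtf instead of the toy file.

```shell
# Toy GTF standing in for all_merged.gtf (hypothetical content; written with
# spaces for brevity, real GTF is tab-separated and awk handles both).
cat > toy_merged.gtf <<'EOF'
chr1 StringTie transcript 100 900 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.1";
chr1 StringTie exon 100 300 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; exon_number "1";
chr1 StringTie exon 600 900 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; exon_number "2";
chr1 StringTie transcript 2000 2400 1000 . . gene_id "MSTRG.2"; transcript_id "MSTRG.2.1";
chr1 StringTie exon 2000 2400 1000 . . gene_id "MSTRG.2"; transcript_id "MSTRG.2.1"; exon_number "1";
EOF

# Count exon records per transcript_id, then tally transcripts with exactly
# one exon versus more than one.
count_exon_classes() {
  awk '$3 == "exon" {
         if (match($0, /transcript_id "[^"]+"/)) {
           # skip the 15 characters of: transcript_id "  and drop the closing quote
           id = substr($0, RSTART + 15, RLENGTH - 16)
           n[id]++
         }
       }
       END {
         for (id in n) { if (n[id] == 1) single++; else multi++ }
         printf "single_exon=%d multi_exon=%d\n", single, multi
       }' "$1"
}
count_exon_classes toy_merged.gtf
```

If most of the excess over the reference falls in the single_exon class, that would be consistent with the explanation above.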

The other ways to filter out more transcripts are the -f parameter, as mentioned before, or the -F and -T parameters, which filter out transcripts of very low abundance in the samples. We prefer filtering with -F and -T over -f, because -f removes transcripts whose abundance is low relative to the most abundant transcript in the bundle, even when the removed transcripts are themselves highly expressed.
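
As a rough sketch of what a merge with -F and -T could look like, reusing the file names from the first post: the cutoff values and the output name all_merged_FT.gtf are illustrative assumptions, not values recommended in this thread. The command is printed first and only executed if stringtie and the input files are present.

```shell
# Illustrative cutoffs only (assumptions, not recommended values).
MIN_FPKM=1.0
MIN_TPM=1.0

# Same merge call as in the first post, with -F (minimum FPKM) and
# -T (minimum TPM) added.
MERGE_CMD="stringtie --merge -G reference.gtf -F $MIN_FPKM -T $MIN_TPM -p 4 -o all_merged_FT.gtf gtf_list.txt"
echo "$MERGE_CMD"

# Run only when the tool and both inputs are available.
if command -v stringtie >/dev/null 2>&1 && [ -f gtf_list.txt ] && [ -f reference.gtf ]; then
  $MERGE_CMD
  awk '$3=="transcript"' all_merged_FT.gtf | wc -l
fi
```

Trying a few cutoff values and comparing the resulting transcript counts would show how sensitive the merged set is to each option.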
02-18-2016, 02:34 PM   #5
SES
Senior Member
Location: Vancouver, BC
Join Date: Mar 2010
Posts: 275

This is very helpful, thanks. One question I have is about the merging that gffcompare does vs. the "stringtie --merge" method. It seems like "stringtie --merge" is the more appropriate method for joining libraries from different tissues, followed by an assessment with gffcompare. Is this correct? The docs say that gffcompare also does merging, but it is not clear how this relates to what "stringtie --merge" is doing.
06-20-2016, 10:23 AM   #6
mcsimenc
Junior Member
Location: California USA
Join Date: May 2013
Posts: 5

Can anyone suggest an interpretation of the following results from stringtie --merge?

Three stringtie assemblies with 29747, 30865, and 29863 transcripts are merged using stringtie --merge and the resulting gtf has only 25130 transcripts.

Am I losing information? I do not know the internal workings of stringtie --merge but I intuitively expect to have no fewer transcripts than the input assembly with the fewest transcripts.
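
For this kind of comparison, a small sketch like the following lists the transcript count of every input assembly named in a list file, so the numbers can be set against the merged GTF. The toy files and the helper names `count_transcripts` and `per_file_counts` are hypothetical; with real data, pass the same list file that was given to stringtie --merge.

```shell
# Toy inputs (hypothetical; space-separated for brevity, real GTF is
# tab-separated and the awk test below works with either).
printf '%s\n' 'chr1 StringTie transcript 1 100 . + . transcript_id "s1.1";' \
              'chr1 StringTie transcript 200 300 . + . transcript_id "s1.2";' > sample1.gtf
printf '%s\n' 'chr1 StringTie transcript 1 100 . + . transcript_id "s2.1";' > sample2.gtf
printf '%s\n' sample1.gtf sample2.gtf > toy_gtf_list.txt

# Number of transcript records in one GTF file.
count_transcripts() {
  awk '$3 == "transcript" { c++ } END { print c + 0 }' "$1"
}

# One line per input assembly: file name, a tab, then its transcript count.
per_file_counts() {
  while IFS= read -r gtf; do
    printf '%s\t%s\n' "$gtf" "$(count_transcripts "$gtf")"
  done < "$1"
}
per_file_counts toy_gtf_list.txt
```

Running count_transcripts on the merged GTF as well gives the figure to compare against.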

Thanks!!
Matt
02-07-2017, 07:46 PM   #7
rajeev.vikram
Junior Member
Location: Taipei
Join Date: Aug 2015
Posts: 6

Hello Matt,

According to my understanding, the number of merged transcripts depends on the relative expression of the transcripts in the input files. As the literature states, the merge mode is meant to "generate a non-redundant set of transcripts observed in all the RNA-Seq samples assembled previously to generate a global, unified set of transcripts (isoforms) across multiple RNA-Seq samples." This means the merge option will only produce transcripts with robust expression (or above whatever expression cutoff one selects). Are you using a reference transcriptome file in the assembly? You can also use gffcompare to check the accuracy of your files.

Cheers
Tags
rna sequencing, stringtie, transcriptome assembly
