02-15-2016, 03:05 AM   #1
Location: Spain

Join Date: Oct 2012
Posts: 16
stringtie parameters


I have been trying to use StringTie for transcriptome re-assembly, guided by a reference GTF file.
Here is how I ran it:

# for each of the bam files from my project (aligned with tophat2):
stringtie file.bam -G reference.gtf -o file_stringtie.gtf -p 4 -v -C file_coverage.txt -A file_gene_abundance.out

# then merging all gtf files together:
stringtie --merge -G reference.gtf -p 4 -o all_merged.gtf gtf_list.txt

It is very straightforward, and incredibly fast compared to the cufflinks + cuffmerge pipeline.

But when I compare the number of transcripts in the reference GTF file with the number in the merged StringTie output, they are dramatically different:
awk '$3=="transcript"' reference.gtf | wc -l
# 23963
awk '$3=="transcript"' all_merged.gtf | wc -l
# 57830

I expect and hope for new transcripts, but more than twice as many transcripts as in the reference seems like too large a difference (am I wrong?).
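To see how much of that increase is really new, the merged transcripts could be split into known and novel ones. This is only a rough sketch: it assumes that a merged transcript matching a reference transcript keeps the reference transcript_id (I have not verified this for every StringTie version), and the function name count_known_novel is just something made up for illustration.

```shell
# Count merged transcripts whose transcript_id also occurs in the reference
# ("known") versus those whose transcript_id does not ("novel").
# ASSUMPTION: matched transcripts keep the reference transcript_id.
# Usage: count_known_novel reference.gtf all_merged.gtf
count_known_novel() {
    awk -F'\t' '
        # only look at "transcript" feature lines (skips comments and exons)
        $3 != "transcript" { next }
        # skip lines without a transcript_id attribute in column 9
        !match($9, /transcript_id "[^"]+"/) { next }
        # drop the leading transcript_id " (15 chars) and the closing quote
        { id = substr($9, RSTART + 15, RLENGTH - 16) }
        # first file: remember every reference transcript ID
        NR == FNR { ref[id] = 1; next }
        # second file: classify each merged transcript
        (id in ref) { known++; next }
        { novel++ }
        END { printf "known=%d novel=%d\n", known, novel }
    ' "$1" "$2"
}
```

Called as count_known_novel reference.gtf all_merged.gtf, it prints a single line of the form known=N novel=M, which should make it easier to judge whether the extra transcripts are mostly novel isoforms.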

How can I make the pipeline more stringent?

Would you advise increasing the minimum input transcript coverage (-c) in the merging step, for example?
Also, looking at cuffmerge's parameters, its minimum isoform fraction defaults to 0.05, whereas stringtie --merge uses 0.01 by default (-f): is raising it the way to go?

I have tried these parameters:

stringtie --merge -c 2.5 -G reference.gtf -p 4 -o all_merged_bis.gtf gtf_list.txt
awk '$3=="transcript"' all_merged_bis.gtf | wc -l
# 57476

stringtie --merge -f 0.05 -G reference.gtf -p 4 -o all_merged_ter.gtf gtf_list.txt
awk '$3=="transcript"' all_merged_ter.gtf | wc -l
# 36164
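Rather than testing one value at a time, the merge could be repeated over a range of -f values to see how the transcript count responds. A minimal sketch, assuming stringtie is on the PATH; the function name sweep_min_fraction and the all_merged_f<value>.gtf output naming are hypothetical choices of mine, not anything StringTie provides.

```shell
# Re-run the merge for several -f (minimum isoform fraction) values and
# print one "fraction<TAB>transcript count" line per run.
# Usage: sweep_min_fraction reference.gtf gtf_list.txt 0.01 0.05 0.1
sweep_min_fraction() {
    ref="$1"
    list="$2"
    shift 2
    for f in "$@"; do
        # hypothetical naming scheme for the per-run output file
        out="all_merged_f${f}.gtf"
        stringtie --merge -f "$f" -G "$ref" -p 4 -o "$out" "$list" || return 1
        # count "transcript" feature lines, as in the awk checks above
        n=$(awk -F'\t' '$3 == "transcript" { c++ } END { print c + 0 }' "$out")
        printf '%s\t%d\n' "$f" "$n"
    done
}
```

The same loop could be adapted to -c by swapping the flag, which would show directly which of the two parameters has the larger effect on the merged set.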

I am merging results from about 60 BAM files, so I guess the best approach may differ from what works for smaller projects.

Thank you for any help and advice!

sbcn