Seqanswers Leaderboard Ad

**SES** · 02-16-2016, 09:44 AM

I would try gffcompare (by the same author) instead of "stringtie --merge" because it seems to be more stringent. I have also experienced the same issue that you report, but it is worse for a large genome. In my case, "stringtie --merge" generated 3X more transcripts than the reference, while gffcompare only generated about 2X more. You can also discard novel loci with gffcompare if you want to only consider the reference set.

Alternatively, you can increase the thresholds for stringtie to merge transcripts.

**sbcn** · 02-17-2016, 02:48 AM

Thanks a lot for your input.

I have now tried gffcompare, but it is actually a lot worse in my case:

gffcompare -r reference.gtf -s reference.fa -C -D -i gtf_list.txt

awk '$3=="transcript"' gffcmp.combined.gtf | wc -l
# 185653

As I understand it, gffcompare creates the union of all the gtf files given as an input, and as I am merging about 60 files, I get a huge final number of transcripts.

I think stringtie --merge is more appropriate in my case as it rather constructs a kind of consensus, so I will try and work on optimizing the parameters, although I would like to make sure not to be too stringent on some of them, and too flexible on others.

**mpertea** · 02-18-2016, 10:22 AM

It is very likely that most of the transcripts that make up the difference are intronic or intergenic single exon transcripts. Especially with such a large number of samples, there are many small fragments expressed all over the place. We are more aggressive in filtering these out in StringTie version 1.2.2 (just released today), so please give it a try.

The other ways to filter more of the transcripts are with the -f parameter just as mentioned before, or with the -F or -T parameters that filter out transcripts of very low abundance in the samples. We like filtering with -F and -T more than with the -f option, because -f filters transcripts that have a relative low abundance compared to the most abundant transcript in the bundle, even if sometimes the transcripts that are filtered out are highly expressed.

**SES** · 02-18-2016, 03:34 PM

Originally posted by mpertea View Post

It is very likely that most of the transcripts that make up the difference are intronic or intergenic single exon transcripts. Especially with such a large number of samples, there are many small fragments expressed all over the place. We are more aggressive in filtering these out in StringTie version 1.2.2 (just released today), so please give it a try.

The other ways to filter more of the transcripts are with the -f parameter just as mentioned before, or with the -F or -T parameters that filter out transcripts of very low abundance in the samples. We like filtering with -F and -T more than with the -f option, because -f filters transcripts that have a relative low abundance compared to the most abundant transcript in the bundle, even if sometimes the transcripts that are filtered out are highly expressed.

This is very helpful, thanks. One question I have would be about the merging that gffcompare does vs. the "stringtie --merge" method. It seems like "stringtie --merge" is the more appropriate method for joining libraries from different tissues, followed by an assessment with gffcompare. Is this correct? The docs say that gffcompare also does merging but it is not clear to how this relates to what "stringtie --merge" is doing.

**mcsimenc** · 06-20-2016, 10:23 AM

Can anyone suggest an interpretation of the following results using stringtie --merge: ?

Three stringtie assemblies with 29747, 30865, and 29863 transcripts are merged using stringtie --merge and the resulting gtf has only 25130 transcripts.

Am I losing information? I do not know the internal workings of stringtie --merge but I intuitively expect to have no fewer transcripts than the input assembly with the fewest transcripts.

Thanks!!
Matt

**rajeev.vikram** · 02-07-2017, 08:46 PM

Originally posted by mcsimenc View Post

Can anyone suggest an interpretation of the following results using stringtie --merge: ?

Three stringtie assemblies with 29747, 30865, and 29863 transcripts are merged using stringtie --merge and the resulting gtf has only 25130 transcripts.

Am I losing information? I do not know the internal workings of stringtie --merge but I intuitively expect to have no fewer transcripts than the input assembly with the fewest transcripts.

Thanks!!
Matt

Hello Matt,

According to my understanding, the number of merged transcripts presented depends on the relative expression of the input transcript files. As the literature stares, " generate a non-redundant set of transcripts observed in all the RNA-Seq samples assembled previously to generate a a global, unified set of transcripts (isoforms) across multiple RNA-Seq samples." which means, the merge option will only produce transcripts with robust expression (or whatever expression cutoff one selects). Are you using a reference transcriptome file in assembly? you can also use gff compare to check the accuracy of your files.

Cheers

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 18 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 22 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 17 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 49 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

stringtie parameters

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News