SEQanswers > Applications Forums > RNA Sequencing



08-14-2014, 04:51 PM   #1
daikon
Junior Member
 
Location: japan

Join Date: Feb 2011
Posts: 2
Using gffread to output sequences: gene_id not included in the FASTA header

Hi everyone,

Many apologies if this is a duplicate. I have searched the forums and Google but can't find the specific answer.

So, I've performed my mRNA-seq experiment using the workflow:

cufflinks->cuffmerge->cuffquant->cuffdiff

then used cummeRbund to look at the results.

From cummeRbund I've generated a list of differentially expressed genes.

What I'd like to do now is look at the sequences of these genes to see what kinds of genes are differentially expressed (I have done a brief GO analysis, and would like to search the sequences against HMM profiles for protein motifs).

I tried to output the sequences from my merged.gtf file (generated by cuffmerge) using gffread. I can get the sequences out, but I would really, REALLY like the gene_id "XLOC_*****" number to be in the FASTA header. Whatever I do, I can't get it in there. I can get almost every other piece of information from the GTF file into the header using one or another of the gffread options, but not this.

Clearly it wouldn't be that hard to write my own script to do this, but I'm under time pressure, and I've learnt the hard way that duplicating other people's efficient tools is foolhardy.

So, am I missing the crucial option here? Or do folks do this (outputting the sequences of differentially expressed genes from mRNA-seq experiments) in a different way?

I do have the gene IDs of the annotated genes in the FASTA header, but there are some novel/intergenic/anomalous genes that are only really identifiable by "XLOC****".
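For reference, a minimal Python sketch of the do-it-yourself route, in case no gffread option does it. It reads the transcript_id and gene_id pairs from the attribute column (column 9) of merged.gtf and appends gene_id=XLOC_... to each header of the gffread -w output. It assumes the first whitespace-separated token of each FASTA header is the transcript_id (the TCONS_... ID in a cuffmerge GTF); the script name add_xloc.py and the function names are hypothetical.

```python
import re
import sys
from typing import Dict, Iterable, Iterator

# Matches key "value" pairs in GTF column 9, e.g. gene_id "XLOC_000001";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def transcript_to_gene(gtf_lines: Iterable[str]) -> Dict[str, str]:
    """Map transcript_id -> gene_id from the attribute column of a GTF."""
    mapping: Dict[str, str] = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid, gid = attrs.get("transcript_id"), attrs.get("gene_id")
        if tid and gid:
            # Every exon line repeats the same pair; keep the first one seen.
            mapping.setdefault(tid, gid)
    return mapping


def add_gene_ids(fasta_lines: Iterable[str], tx2gene: Dict[str, str]) -> Iterator[str]:
    """Append gene_id=... to headers whose first token is a known transcript_id."""
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            tokens = header.split()
            gid = tx2gene.get(tokens[0]) if tokens else None
            if gid is not None:
                line = f">{header} gene_id={gid}\n"
        yield line  # sequence lines and unmatched headers pass through unchanged


def main(gtf_path: str, fasta_path: str) -> None:
    with open(gtf_path) as gtf:
        tx2gene = transcript_to_gene(gtf)
    with open(fasta_path) as fa:
        sys.stdout.writelines(add_gene_ids(fa, tx2gene))


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python add_xloc.py merged.gtf test.fa > test_xloc.fa
    main(sys.argv[1], sys.argv[2])
```

Headers whose first token is not found in the GTF are left as they are, so the script can be run on any gffread -w output without losing records.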

Many thanks for your help

Matt
09-25-2014, 07:58 PM   #2
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72

Hi, I'm running into a similar problem.

There is a workaround here that may be relevant:

http://transdecoder.sourceforge.net/...e_eg_cufflinks

I can't get my merged.gtf file (output of cuffmerge) to output anything using gffread. I just get empty files.

Quote:
I tried to output the sequences from my merged.gtf file (generated by cuffmerge) using gffread. I can get them to output
Can you detail the code you used to do this?
09-25-2014, 08:27 PM   #3
daikon
Junior Member
 
Location: japan

Join Date: Feb 2011
Posts: 2

gffread merge_matt_tissue_MSU/merged.gtf -g Oryza_sativa.IRGSP-1.0.21.dna_sm.genome.fa -w test.fa

for example.

The FASTA genome file is in the same directory; if it isn't, you have to supply the full path.

Depending on which options you use, gffread seems a little sensitive to the features in your GTF file. For example, using -x gave me nothing here, because merged.gtf doesn't have any features labelled as CDS, only exons.
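Since -x only has something to write when the GTF contains CDS features, counting the feature types in column 3 of merged.gtf before running gffread can show whether -x or -w is the right choice. A minimal Python sketch; the function names are hypothetical, and it matches "CDS" case-insensitively because the exact capitalisation used in a given GTF is an assumption here.

```python
import sys
from collections import Counter
from typing import Iterable


def feature_counts(gtf_lines: Iterable[str]) -> Counter:
    """Count the feature types in column 3 of a GTF (exon, CDS, ...)."""
    counts: Counter = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def cds_count(counts: Counter) -> int:
    """Number of CDS features, matching the feature name case-insensitively."""
    return sum(n for feature, n in counts.items() if feature.lower() == "cds")


if __name__ == "__main__" and len(sys.argv) == 2:
    # e.g. python count_features.py merged.gtf
    with open(sys.argv[1]) as gtf:
        counts = feature_counts(gtf)
    print(dict(counts))
    if cds_count(counts) == 0:
        print("No CDS features found: -x will likely give empty output; try -w.")
```

For a cuffmerge merged.gtf, one would expect only exon to be reported, which matches the behaviour described above.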

Tags
cufflinks, cuffmerge, fasta, gffread, xloc

All times are GMT -8.

