SEQanswers

Old 06-23-2010, 08:06 AM   #1
rcorbett
Member
 
Location: canada

Join Date: Sep 2009
Posts: 29
Question cufflinks funny output, scripture comparison

Hi all,

I have 50bp paired illumina reads which I have aligned with tophat (default parameters).
The alignments look reasonable in IGV, or UCSC browser.

I have run scripture on the tophat output, and I get a list of isoforms that look reasonable, if somewhat verbose.
However, when I run cufflinks I get very spotty connectivity.

I am trying to attach a screenshot that shows, from top to bottom: the split alignments from tophat, the transcripts predicted by scripture, the cufflinks output, and finally the reference gene annotations.

Has anyone else seen cufflinks output that is disconnected like this? Any ideas on how to improve the results?
I ran scripture and cufflinks on the same file.

(the screenshot attempt didn't work out)
The image I tried to attach has been posted here instead:
http://www.bcgsc.ca/downloads/rnaSeq...f3f_22ffe0.gif

Last edited by rcorbett; 06-23-2010 at 08:12 AM. Reason: screenshot too small to see
Old 06-24-2010, 03:42 PM   #2
thinkRNA
Member
 
Location: Carlsbad,CA

Join Date: Jan 2010
Posts: 94
Default

What parameters did you use for running cufflinks/cuffcompare? Could it be that you are filtering out a number of reads based on some parameter? For example, did you provide a -G file? If so, your GTF file could be missing exon junctions.

Also, I noticed that a lot of reads are landing in intronic regions. Is that to be expected?
Finally, can you please tell me which file you used to get the cuff.23.1 track on UCSC? I would like to see whether I get similar disconnectivity in my data.
Old 06-25-2010, 07:04 AM   #3
rcorbett
Member
 
Location: canada

Join Date: Sep 2009
Posts: 29
Default

Hi thinkRNA,

To run cufflinks, I used entirely default parameters. I used the pre-compiled 0.8.2 beta version for 64bit linux. I didn't provide a gtf of reference exons because I wanted to test the "de-novo" transcript assembly. I think that cufflinks should do this well according to the paper....

"High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks."

The intronic reads are more or less to be expected. The number of intronic reads varies with the genes, libraries, and disease type that we study. The disconnectivity of the identified transcripts is prevalent throughout my data set, in genes with high and low intronic levels.

To load the cufflinks output into UCSC you can just take your transcripts.gtf output file and load it directly as a custom track.
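For anyone following along, here is a minimal sketch of that custom-track step. It prepends a UCSC `track` line to the GTF text so the track gets a readable name in the browser. The function names, track name, description, and file paths are all hypothetical, and the track line is optional, since the bare transcripts.gtf loads as described above.

```python
def with_track_line(gtf_text, name="cufflinks", description="Cufflinks transcripts"):
    """Return GTF text with a UCSC custom-track header line prepended."""
    # visibility=pack asks the browser to draw each transcript on its own row
    header = f'track name="{name}" description="{description}" visibility=pack'
    return header + "\n" + gtf_text


def write_ucsc_gtf(src_path, dst_path, **track_options):
    """Copy a GTF file to dst_path with a track line on top (paths are examples)."""
    with open(src_path) as src:
        body = src.read()
    with open(dst_path, "w") as dst:
        dst.write(with_track_line(body, **track_options))
```

A call such as `write_ucsc_gtf("transcripts.gtf", "transcripts.ucsc.gtf", name="cuff_default")` would then produce a file to upload on the custom tracks page.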

I would be interested to hear how your data performs with this software.

thanks!
Old 06-25-2010, 12:33 PM   #4
thinkRNA
Member
 
Location: Carlsbad,CA

Join Date: Jan 2010
Posts: 94
Default

So, I looked at this same gene on UCSC along with the junctions from tophat, and fortunately I get the entire transcript connected.

I have 75bp reads sequenced to a depth of ~30 million.

I ran tophat with these parameters:
tophat -a 10 --coverage-search -p 4 -g 10 -G refFlat_RefSeq.gff -o s2_tophat mm9 ../s2.fastq

I then ran cufflinks without the -G option.

Here is the UCSC image
http://picasaweb.google.com/priyamsi...08504614142194

I have not explored many other genes systematically, but the five or so I have looked at so far are connected well.
I don't understand why there are two CUFF ids (CUFF.204951 and CUFF.204952) with such different FPKM and coverage values! The only difference between the two seems to be that one is 3 bases longer.

HTML Code:
~/tophat/S2$ grep "Insr" transcripts.tmap 
Insr    ENSMUST00000091291      p       CUFF.203963     CUFF.203963.1   100     1.082117        0.000000        2.612462        1.685393        89      CUFF.203963.1
Insr    ENSMUST00000091291      p       CUFF.203965     CUFF.203965.1   100     0.687917        0.000000        1.482256        1.071429        210     CUFF.203965.1
Insr    ENSMUST00000139504      c       CUFF.203967     CUFF.203967.1   100     2.363397        0.692223        4.034571        3.680982        163     CUFF.203967.1
Insr    ENSMUST00000139504      j       CUFF.204951     CUFF.204951.1   44      5.694980        1.526815        9.863144        8.869908        9073    CUFF.204952.2
Insr    ENSMUST00000139504      j       CUFF.204952     CUFF.204952.2   100     13.057124       8.871707        17.242540       20.336418       9076    CUFF.204952.2
How did you get your BAM file to view on UCSC? Did you just upload your BAM file to an https server? I don't have access to a server, so I doubt I can upload it.

Maybe you should check whether tophat is picking up those junctions for this gene. From your image, though, I can already see that a lot of your reads cross junctions. You should also look systematically to see how many genes exhibit this behavior; you may just be unlucky with this one.
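One way to look systematically is sketched below: it reads cuffcompare .tmap rows and counts how many distinct Cufflinks loci were assigned to each reference gene. The column order is an assumption taken from the rows pasted above and the cuffcompare manual, the "-" placeholder for unmatched transfrags is likewise assumed, and the function names are made up for illustration. Several loci for one gene only hints at fragmentation; it does not prove it.

```python
from collections import defaultdict

# Column order assumed from the rows in this thread and the cuffcompare manual.
TMAP_COLUMNS = [
    "ref_gene_id", "ref_id", "class_code", "cuff_gene_id", "cuff_id",
    "FMI", "FPKM", "FPKM_conf_lo", "FPKM_conf_hi", "cov", "len",
    "major_iso_id",
]

# The Insr rows quoted in the post above.
SAMPLE_TMAP = """\
Insr ENSMUST00000091291 p CUFF.203963 CUFF.203963.1 100 1.082117 0.000000 2.612462 1.685393 89 CUFF.203963.1
Insr ENSMUST00000091291 p CUFF.203965 CUFF.203965.1 100 0.687917 0.000000 1.482256 1.071429 210 CUFF.203965.1
Insr ENSMUST00000139504 c CUFF.203967 CUFF.203967.1 100 2.363397 0.692223 4.034571 3.680982 163 CUFF.203967.1
Insr ENSMUST00000139504 j CUFF.204951 CUFF.204951.1 44 5.694980 1.526815 9.863144 8.869908 9073 CUFF.204952.2
Insr ENSMUST00000139504 j CUFF.204952 CUFF.204952.2 100 13.057124 8.871707 17.242540 20.336418 9076 CUFF.204952.2
"""


def parse_tmap(lines):
    """Yield one dict per data row of a cuffcompare .tmap file."""
    for line in lines:
        fields = line.split()
        if len(fields) != len(TMAP_COLUMNS) or fields[0] == "ref_gene_id":
            continue  # skip the header row and malformed lines
        yield dict(zip(TMAP_COLUMNS, fields))


def loci_per_gene(rows):
    """Map each reference gene to the set of Cufflinks loci assigned to it."""
    loci = defaultdict(set)
    for row in rows:
        if row["ref_gene_id"] != "-":  # assumed placeholder for "no reference match"
            loci[row["ref_gene_id"]].add(row["cuff_gene_id"])
    return loci


def fragmented_genes(rows, min_loci=2):
    """Return {gene: locus_count} for genes split across min_loci or more loci."""
    return {g: len(s) for g, s in loci_per_gene(rows).items() if len(s) >= min_loci}
```

Running `fragmented_genes(parse_tmap(open("transcripts.tmap")))` over a whole file would list every gene that was split across several loci, which gives a rough sense of how widespread the problem is.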
Old 06-25-2010, 12:52 PM   #5
rcorbett
Member
 
Location: canada

Join Date: Sep 2009
Posts: 29
Default

I'm pretty jealous of your nice results! I have played with cufflinks quite a bit and haven't seen a decent transcript such as that in all of my data.

Is it possible I am not seeing such good results because I am using 50bp reads? I just don't know at this point. The tophat results certainly show a consistent enough level of junction reads that I would expect cufflinks to put the transcript together correctly (after all, scripture does a fine job).

To show the BAM on UCSC you need to index it with samtools and then, as you suggest, upload it to a publicly viewable site. Then you just point the UCSC browser at your BAM and it works! If you are using Picasa, you can probably (though I'm not sure) host your BAM file on Google somewhere and point UCSC to that.
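As a rough sketch of the "point UCSC at your BAM" step: after running `samtools index` on the file and uploading both the BAM and its .bai index to the same web directory, a one-line custom track tells the browser where to find the data. The helper names and any URL passed in are placeholders, not a real server.

```python
def bam_track_line(bam_url, name="tophat alignments"):
    """Build the one-line UCSC custom track that points at a remote BAM file."""
    if not bam_url.endswith(".bam"):
        raise ValueError("expected a URL ending in .bam")
    # bigDataUrl tells the browser to fetch reads from the remote file on demand
    return f'track type=bam name="{name}" bigDataUrl={bam_url}'


def expected_index_url(bam_url):
    """URL where the browser will look for the samtools index of the BAM."""
    return bam_url + ".bai"
```

Pasting the string returned by `bam_track_line(...)` into the custom tracks text box should be enough, provided the URL from `expected_index_url(...)` is also reachable.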

Unfortunately I have been looking at many genes and they all show exactly the same behaviour.

Can you tell me exactly what version of cufflinks you are using, and on what OS? For extra points I could share a small part of my sam file with you and would love to see if you get the same results on my data.
Old 06-25-2010, 01:42 PM   #6
thinkRNA
Member
 
Location: Carlsbad,CA

Join Date: Jan 2010
Posts: 94
Default

Trust me, I have had my share of bad luck with these programs. I am now stuck making sense of the output and the tens of files they spit out. I used the 64-bit Linux version and, of course, the latest version of every program, given that this forum is filled with bugs reported in older versions. It is bizarre that tophat is reporting those junctions but cufflinks is not connecting them. Email Cole Trapnell and just hope that he will reply.
Old 06-30-2010, 10:47 AM   #7
rcorbett
Member
 
Location: canada

Join Date: Sep 2009
Posts: 29
Default

If anyone is interested, Cole is working on a new version (0.8.3), that will improve these results.
Old 06-30-2010, 05:32 PM   #8
thinkRNA
Member
 
Location: Carlsbad,CA

Join Date: Jan 2010
Posts: 94
Default

Do you know when it will be out? Is it possible for him to release temporary fixes for the critical known bugs that have been reported?
Old 10-08-2010, 12:26 AM   #9
cur
Junior Member
 
Location: aberdeen

Join Date: Dec 2009
Posts: 5
Default how do you make refFlat_RefSeq.gff for mm9

Can somebody tell me where to get the refFlat_RefSeq.gff for mm9? I have found GFF3 files for each chromosome (reference assembly, MGSCv37, of mouse build 37.1, in GFF3 format). Do these correspond to mm9? If so, do you have to combine the per-chromosome GFF3 files into one file, adding a chromosome column (chr1, chr2, etc.) to each before merging?
Thanks
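If the per-chromosome files do turn out to match the assembly in use, the merging step could look roughly like the sketch below. It assumes each input file holds features for exactly one chromosome, and it overwrites the first column with a UCSC-style name such as chr1 rather than adding a new column. The function name and file names are hypothetical, and whether the result is accepted by tophat -G has not been checked here.

```python
def merge_gff3(parts, out_handle):
    """Concatenate per-chromosome GFF3 files, rewriting the sequence-name column.

    parts: iterable of (chrom_name, path) pairs, e.g. ("chr1", "chr1.gff3").
    Returns the number of feature lines written.
    """
    written = 0
    out_handle.write("##gff-version 3\n")  # a single header for the merged file
    for chrom, path in parts:
        with open(path) as handle:
            for line in handle:
                if line.startswith("##FASTA"):
                    break  # an embedded sequence section follows; no more features
                if line.startswith("#") or not line.strip():
                    continue  # drop per-file headers, comments, and blank lines
                fields = line.rstrip("\n").split("\t")
                fields[0] = chrom  # column 1 is the sequence name
                out_handle.write("\t".join(fields) + "\n")
                written += 1
    return written
```

For example, `merge_gff3([("chr1", "chr1.gff3"), ("chr2", "chr2.gff3")], open("merged.gff3", "w"))` would write both chromosomes into one file with consistent names.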
Old 11-15-2010, 06:47 AM   #10
rgejman
Junior Member
 
Location: New York

Join Date: Nov 2010
Posts: 4
Default

Has anyone resolved this issue? I am still seeing disconnected transcripts in many genes using tophat v1.1.2 and cufflinks 0.9.2. Tophat is run without the -G option, so de novo transcripts are found, and I DO see reads connecting junctions that are then not reflected in the cufflinks transcripts.gtf output file.
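To put a number on "junctions that are not reflected in transcripts.gtf", a sketch along these lines may help. It derives the intron implied by each row of tophat's junctions.bed (two BED blocks per junction) and the introns implied by consecutive exons of each transcript in the GTF, then lists junctions with no matching intron. Strand is ignored, coordinates are converted to 0-based half-open, and all function names are invented for this example.

```python
import re
from collections import defaultdict


def junction_introns(bed_lines):
    """Introns implied by junctions.bed rows, as (chrom, start, end) tuples."""
    introns = set()
    for line in bed_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # track header or a non-BED12 line
        start = int(f[1])
        sizes = [int(x) for x in f[10].rstrip(",").split(",")]
        offsets = [int(x) for x in f[11].rstrip(",").split(",")]
        if len(sizes) < 2 or len(offsets) < 2:
            continue
        # the intron lies between the end of block 1 and the start of block 2
        introns.add((f[0], start + sizes[0], start + offsets[1]))
    return introns


def transcript_introns(gtf_lines):
    """Introns implied by consecutive exons of each transcript in a GTF."""
    exons = defaultdict(list)
    for line in gtf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', f[8])
        if match:
            exons[(f[0], match.group(1))].append((int(f[3]), int(f[4])))
    introns = set()
    for (chrom, _), spans in exons.items():
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            # GTF is 1-based inclusive; convert the gap to 0-based half-open
            introns.add((chrom, prev_end, next_start - 1))
    return introns


def unused_junctions(bed_lines, gtf_lines):
    """Junctions reported by the aligner that no assembled transcript uses."""
    return sorted(junction_introns(bed_lines) - transcript_introns(gtf_lines))
```

Calling `unused_junctions(open("junctions.bed"), open("transcripts.gtf"))` returns the junctions that were aligned but never assembled, which can then be inspected in the browser.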
Old 12-22-2010, 12:44 PM   #11
honey
Senior Member
 
Location: Pittsburgh

Join Date: Feb 2010
Posts: 151
Default

Does anyone know how I can plot RNA-seq differential expression results from Tophat, Cufflinks, and Cuffdiff for visualization?
Old 12-25-2010, 09:55 AM   #12
adarob
Member
 
Location: Berkeley, CA

Join Date: Jul 2010
Posts: 71
Default

@rgejman

This issue has been resolved in v0.9.3.

Tags
cufflinks, scripture, tophat
