SEQanswers

08-19-2010, 03:48 AM   #1
internet_nobody
Junior Member
 
Location: UK

Join Date: Aug 2010
Posts: 3
Tophat/Cufflinks newbie - question about transcript assembly

Hi everyone,
This is my first time trying to analyse RNAseq data. I've got the results from a paired-end read experiment, where fragments were selected at 240bp for 75bp reads, so there's probably some overlap and I'm not sure the wiggle track can be trusted.

I've attached a pic of one of my odd results. I'm not sure whether it is down to me selecting parameters badly, or whether there are issues when there are small spaces between transcribed genes. The EMBL track is automated annotations, and I didn't use them to guide tophat as we "know" a lot of them are wrong, but 01890 and 01880 are definitely separate genes, based on experimental results (I'm also new to gbrowse, and haven't got it configured correctly, so it seems to think all the exons are separate genes... but that doesn't matter right now!). CUFF.3178 including both the well transcribed region and the rather flat bit next to it also doesn't seem intuitive.

Any tips?
Thanks.
Attached Images
File Type: png Picture 4.png (16.4 KB, 46 views)
08-19-2010, 06:52 AM   #2
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191

Quote:
Originally Posted by internet_nobody View Post
Hi everyone,
The EMBL track is automated annotations, and I didn't use them to guide tophat as we "know" a lot of them are wrong, but 01890 and 01880 are definitely separate genes, based on experimental results
Thanks.
Starting with the most basic questions:

1) Are you looking at the same genome assembly that the reads were aligned to?

2) What parameters were used to generate these results?

3) Is it possible that this is an operon, where run-on transcription produces a single transcript that encodes multiple proteins?
08-19-2010, 08:34 AM   #3
internet_nobody
Junior Member
 
Location: UK

Join Date: Aug 2010
Posts: 3

Thanks for replying.

1) Yes it's the same assembly.

2) I now realise these weren't the best. I interpreted "ends" as "adapters" for the -r parameter, but from reading other topics I see that it meant "read length". This is what I ran:
tophat -r 140 --mate-std-dev 50 -i 50 s_3_1_sequences.txt s_3_2_sequences.txt
Any parameters that Cufflinks shares I set to the same values; the rest I left at their defaults.

3) That's something I hadn't thought of, but it doesn't fit with microarray results showing different expression profiles for the mRNAs.
08-19-2010, 08:43 AM   #4
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191

How did you come up with -r 140?
I would have come up with 240 (size selected) - 150 (2*75bp reads) - ~100 (primer length) = -10
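
The arithmetic above can be sketched in a few lines of Python. The function name `mate_inner_distance` is made up for illustration and is not part of TopHat; the ~100bp adapter/primer total is simply the figure used in this post.

```python
def mate_inner_distance(selected_size, read_length, adapter_total):
    """Expected gap between the two mates in bp (negative means the mates overlap)."""
    # Fragment that actually gets sequenced, once adapters/primers are subtracted
    insert_size = selected_size - adapter_total
    # What is left between the mates after both reads are accounted for
    return insert_size - 2 * read_length


# 240bp size selection, 75bp reads, ~100bp of adapter/primer sequence
print(mate_inner_distance(240, 75, 100))  # -10
```

A negative result means the two reads of a pair are expected to overlap by that many bases.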
08-19-2010, 09:03 AM   #5
internet_nobody
Junior Member
 
Location: UK

Join Date: Aug 2010
Posts: 3

Yes, I know that now. When I asked someone else how they interpreted it, they calculated 240 - 100 (2 x 50bp adapters) = 140. I had assumed 0, as I couldn't find anything about using a negative number, but their argument that it couldn't be 0 beat my conviction that it should be. It was only when I read the forum that I realised the read length should be included. I'm waiting on a re-run using -30: I looked at a few of the paired ends by eye and they had ~30bp overlap, so perhaps the person who prepared the library cut a higher weight band than expected. The run takes around 12 hours, so I'll know soon enough. I wasn't sure whether that mistake would have been big enough to have that much of an effect on the results.
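
As a rough sketch of how an overlap seen by eye relates to the -r value, assuming both mates have the same length and the overlap is measured in bases shared by the two reads (the function names here are hypothetical helpers, not anything provided by tophat):

```python
def inner_distance_from_overlap(overlap_bp):
    """Mates that overlap by N bases have an inner distance of -N."""
    return -overlap_bp


def implied_fragment_length(read_length, overlap_bp):
    """Length of the sequenced fragment (adapters excluded) implied by the overlap."""
    return 2 * read_length - overlap_bp


print(inner_distance_from_overlap(30))   # -30
print(implied_fragment_length(75, 30))   # 120
```

With 75bp reads and ~30bp overlap, the sequenced fragments would be about 120bp before adapters are counted.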

I'm hoping that also explains regions where many reads have aligned, but tophat/cufflinks don't pick anything up there...

All times are GMT -8.