  • Tophat/Cufflinks newbie - question about transcript assembly

    Hi everyone,
    This is my first time trying to analyse RNA-seq data. I've got the results from a paired-end experiment in which fragments were size-selected at 240bp for 75bp reads, so there's probably some overlap between mates and I'm not sure the wiggle track can be trusted.

    I've attached a pic of one of my odd results. I'm not sure whether it is down to me choosing bad parameters, or whether there are issues when there are small gaps between transcribed genes. The EMBL track is automated annotations, and I didn't use them to guide tophat as we "know" a lot of them are wrong, but 01890 and 01880 are definitely separate genes, based on experimental results (I'm also new to gbrowse and haven't got it configured correctly, so it seems to think all the exons are separate genes... but that doesn't matter right now!). It also doesn't seem intuitive that CUFF.3178 includes both the well-transcribed region and the rather flat bit next to it.

    Any tips?
    Thanks.

  • #2
    Originally posted by internet_nobody
    Hi everyone,
    The EMBL track is automated annotations, and I didn't use them to guide tophat as we "know" a lot of them are wrong, but 01890 and 01880 are definitely separate genes, based on experimental results
    Thanks.
    Starting with the most basic questions, 1) are you looking at the same genome assembly to which it was aligned?

    2) What parameters were used to generate these results?

    3) Is it possible that this is an operon, where run-on transcription produces a single transcript that encodes multiple proteins?



    • #3
      Thanks for replying.

      1) Yes it's the same assembly.

      2) I now realise these weren't the best. I interpreted "ends" as "adapters" for the -r parameter, and after reading other topics I see that it meant "read length": tophat -r 140 --mate-std-dev 50 -i 50 s_3_1_sequences.txt s_3_2_sequences.txt
      Any of these parameters that Cufflinks also takes I set to the same values; the rest I left at their defaults.

      3) That's something I hadn't thought of, but it doesn't fit with microarray results showing different expression profiles for the mRNAs.



      • #4
        Originally posted by internet_nobody
        Thanks for replying.

        1) Yes it's the same assembly.

        2) I now realise these weren't the best. I interpreted "ends" as "adapters" for the -r parameter, and after reading other topics I see that it meant "read length": tophat -r 140 --mate-std-dev 50 -i 50 s_3_1_sequences.txt s_3_2_sequences.txt
        Any of these parameters that Cufflinks also takes I set to the same values; the rest I left at their defaults.

        3) That's something I hadn't thought of, but it doesn't fit with microarray results showing different expression profiles for the mRNAs.
        How did you come up with -r 140?
        I would have come up with 240 (size selected) - 150 (2*75bp reads) - ~100 (primer length) = -10
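
The arithmetic above can be written out as a small helper. This is only a sketch: the function name `mate_inner_distance` and its parameters are hypothetical, and it assumes, as the calculation above does, that the size-selected band includes the adapters/primers. A negative result means the two mates are expected to overlap.

```python
def mate_inner_distance(fragment_size, read_length, adapter_total=0):
    """Expected gap between the inner ends of the two mates.

    fragment_size: size-selected band, in bp (assumed to include adapters)
    read_length:   length of each read, in bp
    adapter_total: combined adapter/primer length, in bp
    """
    # Remove the adapters to get the sequenced insert, then remove both reads.
    insert_size = fragment_size - adapter_total
    return insert_size - 2 * read_length


# Figures quoted in this thread: 240bp band, 75bp reads, ~100bp of primer.
print(mate_inner_distance(240, 75, adapter_total=100))  # -10
```

Leaving out the `2 * read_length` term is what gives 240 - 100 = 140.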



        • #5
          Yes, I know that now. When I asked someone else how they interpreted it, they calculated 240 - 100 (2 x 50bp adapters) = 140. I had assumed 0, as I couldn't find anything about using a negative number, but their argument that it couldn't be 0 beat my conviction that it should be. It was only when I read the forum that I realised the read length should be included. I'm waiting on a re-run using -30: I looked at a few of the read pairs by eye and they had ~30bp overlap, so perhaps the person who prepared the library cut a higher-weight band than expected. The run takes around 12 hours, so I'll know soon enough. I wasn't sure whether that would have been a big enough mistake to have had that much of an effect on the results.

          I'm hoping that also explains regions where many reads have aligned, but tophat/cufflinks don't pick anything up there...
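
Checking the overlap between mates "by eye" can also be done with a few lines of code. The following is a minimal sketch, not a replacement for a proper read merger: the function names are hypothetical, it assumes read 2 comes from the opposite strand to read 1 (the usual paired-end layout), and it only looks for the longest suffix of read 1 that matches a prefix of the reverse-complemented read 2 within a small mismatch tolerance. The demo uses a simulated 120bp fragment rather than real data.

```python
import random

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mate_overlap(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Length of the longest suffix of read1 that matches a prefix of
    reverse_complement(read2), or 0 if none reaches min_overlap."""
    rc2 = reverse_complement(read2)
    longest = min(len(read1), len(rc2))
    # Try the longest candidate overlap first and shrink until one fits.
    for length in range(longest, min_overlap - 1, -1):
        tail = read1[-length:]
        head = rc2[:length]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * length:
            return length
    return 0


# Simulated example: a 120bp fragment sequenced with 2 x 75bp reads.
rng = random.Random(0)
fragment = "".join(rng.choice("ACGT") for _ in range(120))
read1 = fragment[:75]
read2 = reverse_complement(fragment[-75:])

overlap = mate_overlap(read1, read2)
print("overlap:", overlap, "-> inner distance:", -overlap)
```

Running this over a sample of pairs and taking the typical overlap gives a value to negate for -r, and the spread of the values gives a rough idea for --mate-std-dev.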
