  • Tophat/Cufflinks newbie - question about transcript assembly

    Hi everyone,
    This is my first time trying to analyse RNAseq data. I've got the results from a paired-end read experiment, where fragments were size-selected at 240bp for 75bp reads, so there's probably some overlap between mates, and I'm not sure the wiggle track can be trusted.

    I've attached a pic of one of my odd results. I'm not sure whether it is down to me choosing bad parameters, or whether there are issues when there are only small gaps between transcribed genes. The EMBL track is automated annotations, and I didn't use them to guide tophat as we "know" a lot of them are wrong, but 01890 and 01880 are definitely separate genes, based on experimental results (I'm also new to gbrowse, and haven't got it configured correctly, so it seems to think all the exons are separate genes... but that doesn't matter right now!). CUFF.3178 spanning both the well-transcribed region and the rather flat bit next to it also doesn't seem intuitive.

    Any tips?
    Thanks.

  • #2
    Originally posted by internet_nobody
    Hi everyone,
    The EMBL track is automated annotations, and I didn't use them to guide tophat as we "know" a lot of them are wrong, but 01890 and 01880 are definitely separate genes, based on experimental results
    Thanks.
    Starting with the most basic questions, 1) are you looking at the same genome assembly to which it was aligned?

    2) What parameters were used to generate these results?

    3) Is it possible that this is an operon, where run-on transcription occurs but synthesizes multiple proteins?



    • #3
      Thanks for replying.

      1) Yes it's the same assembly.

      2) I now realise these weren't the best. I interpreted "ends" as "adapters" for the -r parameter, and from reading other topics I see that the read length has to be taken into account: tophat -r 140 --mate-std-dev 50 -i 50 s_3_1_sequences.txt s_3_2_sequences.txt
      Any parameters that Cufflinks shares I set to the same values; the rest I left as default.

      3) That's something I hadn't thought of, but it doesn't fit with microarray results showing different expression profiles for the mRNAs.



      • #4
        Originally posted by internet_nobody
        Thanks for replying.

        1) Yes it's the same assembly.

        2) I now realise these weren't the best. I interpreted "ends" as "adapters" for the -r parameter, and from reading other topics I see that the read length has to be taken into account: tophat -r 140 --mate-std-dev 50 -i 50 s_3_1_sequences.txt s_3_2_sequences.txt
        Any parameters that Cufflinks shares I set to the same values; the rest I left as default.

        3) That's something I hadn't thought of, but it doesn't fit with microarray results showing different expression profiles for the mRNAs.
        How did you come up with -r 140?
        I would have come up with 240 (size selected) - 150 (2*75bp reads) - ~100 (primer length) = -10
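
The arithmetic in post #4 can be written out as a small helper. This is only a sketch of that calculation, not anything TopHat provides: the function name and the `adapter_total` argument are made up for illustration, and it assumes the 240bp size selection includes roughly 100bp of adapter/primer sequence, as the post suggests.

```python
def mate_inner_distance(selected_size, read_length, adapter_total=0):
    """Expected distance between the inner ends of the two mates, in bp.

    selected_size: size-selected fragment length (gel band), in bp
    read_length:   length of each read, in bp
    adapter_total: adapter/primer sequence included in selected_size, in bp
    A negative result means the two mates overlap.
    """
    insert_size = selected_size - adapter_total  # cDNA between the adapters
    return insert_size - 2 * read_length


# Post #4: 240 (size selected) - 150 (2 x 75bp reads) - ~100 (primers)
print(mate_inner_distance(240, 75, adapter_total=100))  # -10

# If the 240bp figure already excluded adapters, the answer would differ
print(mate_inner_distance(240, 75))  # 90
```

Neither case gives the 140 used in the original run, which subtracted the adapters but not the reads.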



        • #5
          Yes, I know that now. When I asked someone else how they interpreted it, they calculated 240 - 100 (2 x 50bp adapters) = 140. (I had assumed 0, as I couldn't find anything about using a negative number, but their argument that it couldn't be 0 beat my conviction that it should be.) It was only when I read the forum that I realised the read length should be included. I'm waiting on a re-run using -30: I looked at a few of the read pairs by eye and they had ~30bp overlap, so perhaps the person who prepared the library cut a higher weight band than expected. The run takes around 12 hours, so I'll know soon enough. I wasn't sure whether that would have been a big enough mistake to have had that much of an effect on the results.

          I'm hoping that also explains regions where many reads have aligned, but tophat/cufflinks don't pick anything up there...
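
The by-eye estimate in post #5 amounts to taking the inner distance as the negative of the observed overlap. A rough sketch of that, assuming you have a handful of per-pair overlap measurements; the values below are hypothetical (the post only says "~30bp"), and the function name is invented for illustration:

```python
from statistics import mean, pstdev


def inner_distance_from_overlaps(overlaps):
    """Summarise per-pair overlaps (bp) as a mean inner distance and spread.

    An overlap of N bp corresponds to an inner distance of -N;
    a gap between mates would be entered as a negative overlap.
    """
    distances = [-o for o in overlaps]
    return round(mean(distances)), round(pstdev(distances))


# Hypothetical overlaps measured from a few read pairs
overlaps = [28, 30, 32, 30]
mean_dist, std_dev = inner_distance_from_overlaps(overlaps)
print(mean_dist, std_dev)
```

The two numbers correspond to the roles of -r and --mate-std-dev in the command from post #3. A few pairs picked by eye are a small sample, so the spread in particular should be treated as a rough guess.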
