  • Tophat/Cufflinks newbie - question about transcript assembly

    Hi everyone,
    This is my first time trying to analyse RNA-seq data. I've got the results from a paired-end experiment where fragments were size-selected at 240bp for 75bp reads, so there's probably some overlap between mates and I'm not sure the wiggle track can be trusted.

    I've attached a pic of one of my odd results. I'm not sure whether it is down to me choosing bad parameters, or whether there are issues when there are small gaps between transcribed genes. The EMBL track is automated annotations, and I didn't use them to guide tophat as we "know" a lot of them are wrong, but 01890 and 01880 are definitely separate genes, based on experimental results (I'm also new to gbrowse, and haven't got it configured correctly, so it seems to think all the exons are separate genes... but that doesn't matter right now!). CUFF.3178 including both the well-transcribed region and the rather flat bit next to it also doesn't seem intuitive.

    Any tips?
    Thanks.

  • #2
    Originally posted by internet_nobody
    Hi everyone,
    The EMBL track is automated annotations, and I didn't use them to guide tophat as we "know" a lot of them are wrong, but 01890 and 01880 are definitely separate genes, based on experimental results
    Thanks.
    Starting with the most basic questions, 1) are you looking at the same genome assembly to which it was aligned?

    2) What parameters were used to generate these results?

    3) Is it possible that this is an operon, where run-on transcription occurs but synthesizes multiple proteins?



    • #3
      Thanks for replying.

      1) Yes it's the same assembly.

      2) I now realise these weren't the best. I interpreted "ends" as "adapters" for the -r parameter, but from reading other topics I see that it refers to the reads, so the read length has to be taken into account: tophat -r 140 --mate-std-dev 50 -i 50 s_3_1_sequences.txt s_3_2_sequences.txt
      Any parameters that Cufflinks shares with tophat I set to the same values; the rest I left as default.

      3) That's something I hadn't thought of, but it doesn't fit with microarray results showing different expression profiles for the mRNAs.



      • #4
        Originally posted by internet_nobody
        Thanks for replying.

        1) Yes it's the same assembly.

        2) I now realise these weren't the best. I interpreted "ends" as "adapters" for the -r parameter, but from reading other topics I see that it refers to the reads, so the read length has to be taken into account: tophat -r 140 --mate-std-dev 50 -i 50 s_3_1_sequences.txt s_3_2_sequences.txt
        Any parameters that Cufflinks shares with tophat I set to the same values; the rest I left as default.

        3) That's something I hadn't thought of, but it doesn't fit with microarray results showing different expression profiles for the mRNAs.
        How did you come up with -r 140?
        I would have come up with 240 (size selected) - 150 (2*75bp reads) - ~100 (primer length) = -10
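The arithmetic above can be written out as a small Python sketch. The function name and the `adapter_total` argument are made up for illustration; the only thing taken from the thread is the rule "size-selected fragment, minus adapter/primer sequence, minus both reads". A negative result means the two mates overlap.

```python
def mate_inner_distance(fragment_len, read_len, adapter_total=0):
    """Expected gap between the two mates of a pair, in bp.

    fragment_len  : size-selected fragment length (may include adapters)
    read_len      : length of each read (both mates assumed equal)
    adapter_total : total adapter/primer sequence included in fragment_len
    A negative value means the mates are expected to overlap.
    """
    return fragment_len - adapter_total - 2 * read_len


# Numbers from the post above: 240bp selection, 2 x 75bp reads, ~100bp of primer
print(mate_inner_distance(240, 75, adapter_total=100))  # -10

# The same library if the 240bp did not include any adapter sequence
print(mate_inner_distance(240, 75))  # 90
```

Whether the size-selected 240bp includes the adapters depends on how the library was prepared, so it is worth checking that with whoever cut the band before settling on a value for -r.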



        • #5
          Yes, I know that now, but when I asked someone else how they interpreted it, they did 240 - 100 (2 x 50bp adapters) = 140. (I had assumed 0, as I couldn't find anything about using a negative number, and their argument that it couldn't be 0 beat my conviction that it should be 0.) It was only when I read the forum that I realised the read length should be included. I'm waiting on a re-run using -30 (I looked at a few of the read pairs by eye and they had ~30bp overlap, so perhaps the person who prepared the library cut a different band than expected), which takes around 12 hours, so I'll know soon enough. I wasn't sure whether that would have been a big enough mistake to have had that much of an effect on the results.

          I'm hoping that also explains regions where many reads have aligned, but tophat/cufflinks don't pick anything up there...
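Checking the overlap "by eye" can also be done in code. The following is only a rough sketch, not anything tophat does internally: it assumes the usual paired-end orientation (mate 2 is read from the opposite strand), reverse-complements mate 2, and looks for the longest suffix of mate 1 that matches a prefix of it. The function names, the minimum overlap and the mismatch tolerance are all arbitrary choices for illustration, and the example pair is simulated rather than taken from the real data.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def mate_overlap(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Length of the longest suffix of read1 matching a prefix of
    revcomp(read2), allowing a fraction of mismatches; 0 if none found."""
    read1 = read1.upper()
    rc2 = reverse_complement(read2)
    for k in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(read1[-k:], rc2[:k]))
        if mismatches <= max_mismatch_frac * k:
            return k
    return 0


# Simulated pair: a 120bp fragment sequenced with 2 x 75bp reads
rng = random.Random(0)
fragment = "".join(rng.choice("ACGT") for _ in range(120))
read1 = fragment[:75]
read2 = reverse_complement(fragment[-75:])

overlap = mate_overlap(read1, read2)
print("overlap:", overlap, "suggested -r:", -overlap)
```

Running this over a few thousand real pairs and taking the median overlap would give a less anecdotal estimate than looking at a handful of reads. Note that it only makes sense when the fragment is at least as long as one read; shorter fragments read through into adapter and need trimming first.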
