  • Software for identification of Trinity/Cufflinks transcripts

    Are there any good available programs/scripts for analyzing assembled transcripts?

    I imagine something like a script that runs blastn, blastx and tblastx on each transcript and reports the best hit. Something like that wouldn't be too hard to write, but I don't want to reinvent the wheel. I'm also concerned that the highest-scoring hit reported by BLAST sometimes contains many gaps and is not the right result, while a shorter, lower-scoring hit is more likely to be correct. The only way I can think of to determine this accurately is by manual inspection.
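    The best-hit step described above could look roughly like the following minimal Python sketch. It assumes BLAST+ was run with the default tabular output (`-outfmt 6`, 12 columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). To address the gapped-alignment concern, it prefers hits whose gap openings per aligned column stay under a threshold. The `max_gap_rate` value of 0.01 and the function names are hypothetical choices for illustration, not an established method, and borderline cases still deserve a manual look.

```python
# Minimal sketch: choose one "best" BLAST hit per transcript from BLAST+
# tabular output (-outfmt 6), down-weighting heavily gapped alignments.
# Assumes the default 12 columns:
#   qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
# The gap threshold is an illustrative (hypothetical) value, not a recommendation.
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    length: int
    gapopen: int
    evalue: float
    bitscore: float

    @property
    def gap_rate(self) -> float:
        # Gap openings per aligned column: a crude proxy for a "gappy" alignment.
        return self.gapopen / self.length if self.length else 1.0


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated -outfmt 6 lines, skipping blank and comment lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(query=f[0], subject=f[1], pident=float(f[2]),
                        length=int(f[3]), gapopen=int(f[5]),
                        evalue=float(f[10]), bitscore=float(f[11])))
    return hits


def best_hits(hits: Iterable[Hit], max_gap_rate: float = 0.01) -> Dict[str, Hit]:
    """Highest-bitscore hit per query among hits at or below max_gap_rate.

    If every hit for a query is too gappy, fall back to the highest-bitscore
    hit overall so that no transcript is silently dropped.
    """
    by_query: Dict[str, List[Hit]] = {}
    for h in hits:
        by_query.setdefault(h.query, []).append(h)
    chosen: Dict[str, Hit] = {}
    for query, qhits in by_query.items():
        clean = [h for h in qhits if h.gap_rate <= max_gap_rate]
        chosen[query] = max(clean or qhits, key=lambda h: h.bitscore)
    return chosen


if __name__ == "__main__":
    example = [
        "tr1\tgeneA\t85.0\t900\t60\t25\t1\t880\t1\t900\t1e-150\t800",
        "tr1\tgeneB\t98.0\t300\t6\t0\t10\t309\t5\t304\t1e-140\t540",
        "tr2\tgeneC\t90.0\t500\t50\t8\t1\t500\t1\t500\t1e-100\t400",
    ]
    for query, hit in sorted(best_hits(parse_outfmt6(example)).items()):
        print(query, hit.subject, hit.bitscore)
```

    On this toy input, tr1's 800-bit hit to geneA is passed over because it has 25 gap openings across 900 aligned columns, so the gap-free 540-bit hit to geneB is reported instead (`tr1 geneB 540.0`); tr2 has only a gappy hit and falls back to it (`tr2 geneC 400.0`). A more conservative variant would report both candidates side by side and leave the final call to manual review.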

    The genome I am interested in is poorly annotated, particularly in my region of interest, so simply supplying a reference GTF along with my Cufflinks transcripts would not be very helpful.

  • #2
    Your subject title suggests you're working on Trinity transcripts, so why not try the workflows suggested on the Trinity website?



    In particular, it sounds like you might be interested in the Read Alignment / Abundance Estimation workflow:



    On the other hand, if you're working from cufflinks transcripts then you should probably use cufflinks for the initial analysis:

    Last edited by gringer; 02-29-2012, 12:08 AM. Reason: added cufflinks cuff link



    • #3
      I have already done all of that. I am asking about analysis downstream of cuffdiff for Cufflinks/TopHat, and of RSEM for Trinity.
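      As a hedged sketch of the first downstream step, pulling the differentially expressed genes out of cuffdiff output so they can be passed to BLAST, the following assumes the tab-separated `gene_exp.diff` layout described in the Cufflinks manual (columns including `test_id`, `status`, `q_value` and `significant`). The function name and the `max_q` parameter, which lets you apply a stricter cut-off than the one used to set the `significant` flag, are illustrative assumptions; check the header of your own file before relying on it.

```python
# Minimal sketch: list genes flagged as differentially expressed in a cuffdiff
# gene_exp.diff file, e.g. to choose which transcripts to BLAST.
# Column names follow the gene_exp.diff layout as described in the Cufflinks
# manual; verify them against the header of your own file.
import csv
import io
from typing import List, TextIO


def significant_genes(handle: TextIO, max_q: float = 0.05) -> List[str]:
    """Return test_ids marked significant, with status OK and q_value <= max_q."""
    ids = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["status"] != "OK":
            continue  # skip NOTEST, LOWDATA, HIDATA and FAIL rows
        if row["significant"] == "yes" and float(row["q_value"]) <= max_q:
            ids.append(row["test_id"])
    return ids


if __name__ == "__main__":
    header = ("test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\t"
              "value_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n")
    rows = (
        "XLOC_000001\tXLOC_000001\tgeneA\tchr1:100-2000\tq1\tq2\tOK\t10.0\t40.0\t2.0\t-3.1\t0.0001\t0.001\tyes\n"
        "XLOC_000002\tXLOC_000002\tgeneB\tchr1:3000-4000\tq1\tq2\tOK\t5.0\t5.5\t0.14\t-0.2\t0.5\t0.6\tno\n"
        "XLOC_000003\tXLOC_000003\t-\tchr2:10-500\tq1\tq2\tNOTEST\t0\t0\t0\t0\t1\t1\tno\n"
    )
    print(significant_genes(io.StringIO(header + rows)))
```

      With a real file you would pass `open("gene_exp.diff")` instead of the in-memory `io.StringIO` used here for the toy rows.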

      As I mentioned, both my region of interest and my model organism as a whole have poor annotation and assembly, so many of the genes I find to be differentially expressed are either unannotated or annotated only as hypothetical or xenoref genes. With a few hundred to a few thousand differentially expressed genes, it is not really feasible to examine each one manually. I have seen reports from other groups that provide an Excel sheet listing every Trinity transcript with its FPKM, log fold change and (importantly) its blastn results, plus blastx/tblastx results for any gene that does not map or has no annotation.
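      A report like that could be assembled with a short script along these lines. This is a minimal sketch under stated assumptions: the expression rows use hypothetical column names (`transcript_id`, `fpkm`, `log2_fc`), `blastn_hits` maps each transcript to the description of its chosen blastn hit, and the keyword list used to decide that an annotation is uninformative (and therefore worth a blastx/tblastx follow-up) is an illustrative guess. The output is tab-separated text, which Excel can open directly.

```python
# Minimal sketch: merge expression values with blastn best hits into a
# tab-separated report and flag transcripts worth a blastx/tblastx follow-up.
# Column names and the keyword list are hypothetical; adapt them to your data.
import csv
import io
from typing import Dict, List, Optional

# Illustrative keywords suggesting an uninformative annotation.
UNINFORMATIVE = ("hypothetical", "uncharacterized", "predicted")


def needs_blastx(hit_desc: Optional[str]) -> bool:
    """True if there is no blastn hit or the hit's description is uninformative."""
    if not hit_desc:
        return True
    desc = hit_desc.lower()
    return any(word in desc for word in UNINFORMATIVE)


def build_report(expression: List[Dict[str, str]], blastn_hits: Dict[str, str]) -> str:
    """Return a TSV string with one row per transcript."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["transcript_id", "fpkm", "log2_fc", "blastn_hit", "needs_blastx"])
    for row in expression:
        tid = row["transcript_id"]
        hit = blastn_hits.get(tid)
        writer.writerow([tid, row["fpkm"], row["log2_fc"], hit or "",
                         "yes" if needs_blastx(hit) else "no"])
    return out.getvalue()


if __name__ == "__main__":
    expression = [
        {"transcript_id": "tr1", "fpkm": "12.5", "log2_fc": "2.1"},
        {"transcript_id": "tr2", "fpkm": "3.0", "log2_fc": "-1.4"},
        {"transcript_id": "tr3", "fpkm": "40.2", "log2_fc": "3.3"},
    ]
    hits = {"tr1": "myosin heavy chain", "tr2": "hypothetical protein LOC12345"}
    print(build_report(expression, hits), end="")
```

      Transcripts flagged `yes` here (tr2 with a hypothetical-protein hit, tr3 with no hit) would be the ones to send on to blastx/tblastx, which keeps the slower protein-level searches limited to the genes that need them.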

      Writing a script to do this seems simple, apart from the gapped-alignment problem I mentioned, but rather than reinvent the wheel I was wondering whether an existing script or program is available.



      • #4
        If I have understood you correctly, you have a file of transcript sequences and you want to annotate them. For that I would suggest Blast2GO, which is a good platform (http://www.blast2go.com/b2ghome).



        • #5
          This looks promising, I will try using it, thanks.

          Is there anything similar that can be installed locally and run over a command line instead of a GUI?



          • #6
            Blast2GO has a pipeline option that you can run locally (http://www.blast2go.com/b2glaunch/resources).

            Another interesting package is SeqGene, which performs many common tasks in next-generation sequencing analysis. It has a single-script, complete analysis pipeline that works well with a little configuration.
