  • Cufflinks merging more than one transcript on bacterial genomes

    I am running TopHat and Cufflinks on a bacterial genome using Galaxy.

    As TopHat parameters, I set the minimal distance between introns to 15 bp and the maximum intron size to 1500 bp. Visual inspection of the results looks decent: not many splice junctions are identified (I do not expect many introns in my genome), although there are a few false ones that seem to connect two different genes. This is one thing I would like help with: is it worth simply setting the maximum intron size to zero? What is the accepted consensus when using TopHat on bacterial genomes?

    When I look at the second TopHat output file, the accepted hits, all hits align nicely with known genes. However, when I run Cufflinks I run into two issues. First, when I use a reference genome, I get, in addition to the known transcripts, a number of very long transcripts spanning very large genomic regions. Second, two genes that lie very close to each other but run in opposite directions (which you can see clearly in the TopHat accepted-hits alignments, with a different color for each strand) are merged into a single CUFF identifier. Is there any way I can address this? Am I missing some parameters that need to change because I am working on a bacterial genome?

    Many thanks

    Noa

  • #2
    I've never tried these tools on bacterial RNA-seq, but it's my understanding that they were designed with eukaryotes in mind. TopHat, for example, aligns reads across splice junctions, which are presumably absent in prokaryotes. Cufflinks assembles multiple splice isoforms, which won't be present in a prokaryote.

    Without RNA-seq, finding new genes in a bacterial genome can be accomplished using the RAST or IMG/ER annotation services. Perhaps you can convert this annotation to a GTF, align your reads with bowtie, then use Cufflinks (or even a short script) to generate RPKM for each gene?
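    The "short script" route could look roughly like the minimal Python sketch below. It assumes gene coordinates have already been exported from the annotation (RAST, IMG/ER or otherwise) as 1-based inclusive intervals, and that alignments are available as simple (seqname, start, end, strand) tuples, for example parsed from Bowtie output. The `Gene` class, the function names and the `antisense` flag are hypothetical, not part of any tool's API. The sketch writes one GTF exon line per gene with the gene_id and transcript_id attributes Cufflinks reads, counts reads strand-specifically so that two neighbouring genes on opposite strands are never pooled, and computes RPKM as reads x 10^9 / (gene length in bp x total mapped reads).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# A read alignment as (seqname, start, end, strand); 1-based, inclusive coordinates.
Read = Tuple[str, int, int, str]


@dataclass
class Gene:
    """Hypothetical gene record taken from an annotation (1-based, inclusive)."""
    gene_id: str
    seqname: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def to_gtf_line(gene: Gene, source: str = "annotation") -> str:
    """Build one tab-separated GTF exon line; each gene is its own transcript."""
    attrs = f'gene_id "{gene.gene_id}"; transcript_id "{gene.gene_id}";'
    fields = [gene.seqname, source, "exon", str(gene.start), str(gene.end),
              ".", gene.strand, ".", attrs]
    return "\t".join(fields)


def count_reads(genes: List[Gene], reads: Iterable[Read],
                antisense: bool = False) -> Tuple[Dict[str, int], int]:
    """Strand-aware read counts per gene, plus the total number of reads seen.

    If antisense is True, each read's strand is flipped before matching
    (an assumption for protocols whose reads come from the opposite strand).
    """
    counts = {g.gene_id: 0 for g in genes}
    total = 0
    for seqname, start, end, strand in reads:
        total += 1
        if antisense:
            strand = "-" if strand == "+" else "+"
        for g in genes:
            # Same contig, same strand, and overlapping intervals.
            if (g.seqname == seqname and g.strand == strand
                    and start <= g.end and end >= g.start):
                counts[g.gene_id] += 1
    return counts, total


def rpkm(counts: Dict[str, int], genes: List[Gene],
         total_mapped: int) -> Dict[str, float]:
    """RPKM = reads * 1e9 / (gene length in bp * total mapped reads)."""
    lengths = {g.gene_id: g.length for g in genes}
    return {gid: c * 1e9 / (lengths[gid] * total_mapped)
            for gid, c in counts.items()}


if __name__ == "__main__":
    # Two adjacent genes on opposite strands, plus a few toy alignments.
    genes = [Gene("geneA", "chr", 1, 1000, "+"),
             Gene("geneB", "chr", 1101, 1600, "-")]
    reads = [("chr", 10, 60, "+"), ("chr", 500, 550, "+"),
             ("chr", 1200, 1250, "-"), ("chr", 1050, 1100, "+")]
    for g in genes:
        print(to_gtf_line(g))
    counts, total = count_reads(genes, reads)
    print(counts, rpkm(counts, genes, total))
```

    Two simplifications are worth knowing about: a read overlapping two same-strand genes is counted for both, and the RPKM denominator is simply the number of alignments passed in. Whether reads land on the sense or antisense strand depends on the library protocol, which is why the strand handling is a flag rather than fixed.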

    It sounds like you have a stranded bacterial RNA-seq dataset. Out of curiosity, could you elaborate a bit on how the RNA and library were prepared?

    • #3
      Hi, thanks for the reply.
      Maybe I should have given more of an introduction: I am trying to develop a Galaxy pipeline that lets non-bioinformaticians do very basic analyses on RNA-seq data (FPKM comparisons between experimental conditions, etc.). I am a wet-lab biologist myself, so I am trying to stay away from Linux. I know the Tuxedo suite is eukaryote-oriented, but I was hoping to use it since Galaxy is so user-friendly. If I understand correctly, TopHat aligns the reads with Bowtie first in any case, and Cufflinks will give me the FPKMs.

      The dataset I am using is actually just a test set from a friend. In any case, it is in fact stranded, and was created by polyA tailing of the bacterial RNA, followed by fragmentation, treatment with Antarctic phosphatase and PNK, adapter ligation, and then RT and PCR. In my hands it works beautifully on eukaryotes as well.
