  • Clustering/mapping genes across de novo assemblies

    Hello,

    I have two strains of the same bacterium, strain A and strain B. I assembled both de novo with Trinity, and I also have a reference genome for each strain, with reads aligned using bowtie2. I want to build a mapping/clustering between strain A's de novo assembly and strain B's de novo assembly. I also want the same kind of mapping between strain A's de novo assembly and strain A's reference genome.

    What I have:
    • Strain A: Trinity de novo assembly (fasta)
    • Strain B: Trinity de novo assembly (fasta)
    • Strain A: reads mapped to de novo assembly (bam)
    • Strain B: reads mapped to de novo assembly (bam)
    • Strain A: reads mapped to reference genome via bowtie2 (bam)
    • Strain B: reads mapped to reference genome via bowtie2 (bam)


    What I want:
    • Compare gene expression levels across assembly methods (de novo vs. reference).
    • Compare gene expression levels for the same assembly method across strains.


    What I need:
    • A method to identify which genes are which between these data!



    So how can I get that mapping? I realize it won't be a 1-to-1 mapping, but with sequences this closely related I should at least be able to identify a majority of the genes. If this isn't something commonly done, can someone at least point me in a promising direction? Thanks very much for any suggestions you can offer.
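One common way to get this kind of gene-to-gene mapping, not mentioned in the thread itself, is reciprocal best hits (RBH): search the sequences of assembly A against assembly B and B against A, then keep only the pairs that are each other's top hit. The sketch below is a minimal illustration, assuming both searches were already run with a BLAST-like tool and saved in 12-column tabular format (query ID in column 1, subject ID in column 2, bitscore in column 12). The function names and the sequence IDs in the example are made up for illustration.

```python
def best_hits(tab_lines):
    """Map each query ID to its highest-bitscore subject ID.

    tab_lines: iterable of tab-separated lines with the query ID in
    column 1, the subject ID in column 2 and the bitscore in column 12.
    """
    best = {}
    for line in tab_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a_id, b_id) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical search results (IDs and scores are invented).
a_vs_b = [
    "A_gene1\tB_gene7\t98.5\t900\t13\t0\t1\t900\t1\t900\t0.0\t1600",
    "A_gene1\tB_gene9\t80.0\t400\t80\t0\t1\t400\t1\t400\t1e-90\t300",
    "A_gene2\tB_gene9\t85.0\t500\t75\t0\t1\t500\t1\t500\t1e-120\t500",
]
b_vs_a = [
    "B_gene7\tA_gene1\t98.5\t900\t13\t0\t1\t900\t1\t900\t0.0\t1600",
    "B_gene9\tA_gene1\t80.0\t400\t80\t0\t1\t400\t1\t400\t1e-90\t300",
    "B_gene9\tA_gene2\t85.0\t500\t75\t0\t1\t500\t1\t500\t1e-60\t250",
]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Sequences with no reciprocal partner (paralogs, strain-specific genes, fragmented contigs) are simply left out, which fits the expectation that the mapping will not be 1-to-1 for everything.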

  • #2
    Have you looked at Mauve for doing the genome-level comparisons? If you have annotation available for your reference and your strains are similar to the reference, that can make the comparisons easy. Are your assemblies in a single contig, i.e. finished?


    • #3
      Mauve

      Thanks for your reply.

      I've been looking at Mauve at your suggestion, and it looks like it could be very useful if it does what I think it does. As for annotations, all of the organisms I am using have published reference genomes available. As for my assemblies, I'm not exactly sure. I assembled a huge number of reads using Trinity. I read that Trinity is supposed to perform scaffolding, but I honestly hadn't been paying much attention to that. This is my first time doing any assembly work, so I'm going back now to check on those details. What I had been doing was creating a bowtie2 index from the Trinity.fasta file, then mapping the reads back with bowtie2, like this:

      bowtie2-build Trinity.fasta Trinity

      bowtie2 --all -x Trinity -1 forward_reads.fastq -2 reverse_reads.fastq -S aligned_reads.sam


      I had assumed, incorrectly I expect, that I could do the same thing using the actual reference genome instead of the assembled data, and end up with two SAM files that I could compare. Am I missing something huge here?
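One thing to watch when comparing two SAM files produced this way: `--all` asks bowtie2 to report every valid alignment for a read, so counting SAM records per reference sequence can count the same read several times. A minimal sketch of a more conservative tally follows, assuming the extra alignments carry the secondary flag (0x100) defined in the SAM specification. `count_primary_alignments` is a hypothetical helper, not part of any tool mentioned here, and the records in the example are invented.

```python
from collections import Counter

# FLAG bits from the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_primary_alignments(sam_lines):
    """Count primary, mapped alignment records per reference sequence."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep only one (primary) record per mapped read
        counts[fields[2]] += 1  # column 3 is the reference name
    return counts


# Invented records: r1 has a primary and a secondary alignment, r3 is unmapped.
sam = [
    "@SQ\tSN:contig1\tLN:1000",
    "r1\t0\tcontig1\t1\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tcontig2\t5\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tcontig1\t9\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tcontig2\t3\t42\t4M\t*\t0\t0\tACGT\tIIII",
]
counts = count_primary_alignments(sam)
```

Each mate of a pair is its own SAM record, so these are alignment counts rather than fragment counts, and raw counts still need normalizing for sequence length and sequencing depth before expression levels are compared across assemblies or strains.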



      Anyhow, here is some information from one of my assemblies in case it helps. I'm going to continue looking into Mauve, and I would welcome any suggestions or advice on a good way to proceed and any pitfalls to avoid. Thanks very much!


      ################################
      ## Counts of transcripts, etc.
      ################################
      Total trinity 'genes': 2254
      Total trinity transcripts: 2501
      Percent GC: 46.47

      ########################################
      Stats based on ALL transcript contigs:
      ########################################

      Contig N10: 20428
      Contig N20: 16367
      Contig N30: 14168
      Contig N40: 10881
      Contig N50: 8121

      Median contig length: 688
      Average contig: 2780.90
      Total assembled bases: 6955043


      #####################################################
      ## Stats based on ONLY LONGEST ISOFORM per 'GENE':
      #####################################################

      Contig N10: 20428
      Contig N20: 16483
      Contig N30: 14250
      Contig N40: 11095
      Contig N50: 8404

      Median contig length: 790
      Average contig: 2950.57
      Total assembled bases: 6650579
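For readers unfamiliar with the N-statistics in this report: Contig N50 is the length L such that contigs of length L or longer contain at least half of all assembled bases, and N10 through N40 use 10% through 40% instead. A minimal sketch of that calculation follows, using a made-up list of contig lengths. The `nx` helper is illustrative only and is not part of Trinity.

```python
def nx(lengths, x=50):
    """Return the Nx statistic for a list of contig lengths.

    Contigs are sorted from longest to shortest and summed until the
    running total reaches x percent of all bases; the length of the
    contig that crosses that threshold is returned.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding issues.
        if running * 100 >= total * x:
            return length


# Made-up contig lengths, 32 bases in total.
contig_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
n50 = nx(contig_lengths, 50)
n90 = nx(contig_lengths, 90)
```

Because a few long contigs can hold half of the bases, N50 can sit far above the median contig length, as it does in the report above.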


      • #4
        Trinity is not the right assembler for bacterial genomes. I suggest that you try SPAdes or Velvet. If you have a huge amount of coverage, you will be better off subsampling the data before you assemble. You should be able to generate a single contig (or a reasonably small number of contigs) easily.
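The subsampling step could be sketched as follows. This is a minimal illustration, assuming plain 4-line FASTQ records with no blank lines and mates listed in the same order in the forward and reverse files. The function names, the read names, and the 25% fraction are all made up for the example; the right fraction depends on the actual coverage.

```python
import io
import random


def read_fastq(handle):
    """Yield FASTQ records as 4-tuples of lines (header, sequence, '+', qualities)."""
    while True:
        record = tuple(handle.readline() for _ in range(4))
        if not record[3]:  # end of file or incomplete trailing record
            return
        yield record


def subsample_pairs(forward, reverse, fraction, seed=0):
    """Yield (mate1, mate2) pairs, keeping each pair with probability `fraction`.

    One random draw decides the fate of both mates, so the two output
    files stay in sync. A fixed seed makes the selection reproducible.
    """
    rng = random.Random(seed)
    for mate1, mate2 in zip(read_fastq(forward), read_fastq(reverse)):
        if rng.random() < fraction:
            yield mate1, mate2


def make_fastq(suffix, n=200):
    """Build a toy in-memory FASTQ file with n identical reads."""
    return "".join(f"@read{i}/{suffix}\nACGT\n+\nIIII\n" for i in range(n))


kept = list(subsample_pairs(io.StringIO(make_fastq(1)),
                            io.StringIO(make_fastq(2)),
                            fraction=0.25, seed=42))
```

With real data, the two handles would be the opened forward and reverse read files, and each kept mate would be written to its own output file.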
