SEQanswers

Old 07-06-2015, 01:09 PM   #1
aprice67
Member
 
Location: New York

Join Date: Nov 2012
Posts: 49
Question Clustering/mapping genes across de novo assemblies

Hello,

I have two different strains of the same bacterium, strain A and strain B. I assembled both of them de novo with Trinity, and I also have a reference genome for each strain, with reads aligned to it using bowtie2. I want to build a mapping/clustering between strain A's de novo assembly and strain B's de novo assembly. I also want the same kind of mapping between strain A's de novo assembly and strain A's reference genome.

What I have:
  • Strain A: Trinity de novo assembly (fasta)
  • Strain B: Trinity de novo assembly (fasta)
  • Strain A: reads mapped to de novo assembly (bam)
  • Strain B: reads mapped to de novo assembly (bam)
  • Strain A: reads mapped to reference genome via bowtie2 (bam)
  • Strain B: reads mapped to reference genome via bowtie2 (bam)

What I want:
  • Compare gene expression levels across assembly methods.
  • Compare gene expression levels for the same assembly method, across different strains.

What I need:
  • A method to identify which genes are which between these data!


So how can I get that mapping? I realize it won't be a 1-to-1 mapping, but with closely related sequences like these I should at least be able to identify a majority of the genes. If this is not something commonly done, can someone at least point me in a promising direction? Thanks very much for any suggestions you can offer.
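
One direction sometimes used for this kind of cross-assembly mapping is reciprocal best hits: search the contigs of A against B and B against A, and keep only the pairs that are each other's top hit. The sketch below is a minimal illustration, assuming both searches were already run with a BLAST-style tool writing the standard 12-column tabular format. The contig names and the `toy_row` helper are made up for illustration and do not come from this thread.

```python
import csv


def best_hits(blast_tab):
    """Return {query: subject}, keeping the highest-bitscore hit per query.

    blast_tab is an iterable of tab-separated lines with the usual 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore
    """
    best = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def toy_row(query, subject, bitscore):
    """Hypothetical helper: build one 12-column line with dummy middle fields."""
    return "\t".join([query, subject] + ["0"] * 9 + [str(bitscore)])


# Toy hits, invented for illustration only.
a_vs_b = [toy_row("A1", "B1", 1600), toy_row("A1", "B2", 300),
          toy_row("A2", "B2", 850), toy_row("A3", "B1", 400)]
b_vs_a = [toy_row("B1", "A1", 1600), toy_row("B1", "A3", 400),
          toy_row("B2", "A2", 850)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Contigs without a reciprocal partner (A3 in the toy data) are simply left unpaired, which is in line with the mapping not being 1-to-1.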
Old 07-06-2015, 03:00 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,053

Have you looked at Mauve for doing the genome-level comparisons? If you have annotation available for your reference and your strains are similar to the reference then that can make the comparisons easy. Are your assemblies in a single contig i.e. finished?
Old 07-07-2015, 10:45 AM   #3
aprice67
Member
 
Location: New York

Join Date: Nov 2012
Posts: 49
Default Mauve

Thanks for your reply.

I've been looking at Mauve at your suggestion, and it looks like it could be very useful if it does what I think it does. As for annotations, all of the organisms I am using have published reference genomes available. As for my assemblies, I'm not exactly sure. I assembled a huge number of reads using Trinity. I read that Trinity is supposed to perform scaffolding, but I honestly hadn't been paying much attention to that. This is my first time doing any assembly work, so I'm going back now to check on those details. What I had been doing was creating a bowtie2 index from the Trinity.fasta file, then mapping the reads back with bowtie2, like this:

bowtie2-build Trinity.fasta Trinity

bowtie2 --all -x Trinity -1 forward_reads.fastq -2 reverse_reads.fastq -S aligned_reads.sam


I had assumed, incorrectly I expect, that I could do the same thing using the actual reference genome instead of the assembled data, and end up with two SAM files that I could compare. Am I missing something huge here?
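
To make the comparison between SAM files concrete, here is a minimal sketch of counting reads per target sequence straight from SAM text. It assumes uncompressed SAM as written by the commands above. Because `--all` asks bowtie2 to report every alignment it finds, one read can occupy several lines, so the sketch counts only primary alignments using the standard SAM flag bits. The read and contig names in the toy input are invented.

```python
from collections import Counter

# Standard SAM flag bits.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_primary_alignments(sam_lines):
    """Count primary alignments per reference/contig name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # skip unmapped reads and non-primary alignments
        counts[fields[2]] += 1
    return counts


# Toy SAM records, invented for illustration only.
toy_sam = [
    "@SQ\tSN:TRINITY_DN1\tLN:1000",
    "r1\t99\tTRINITY_DN1\t10\t42\t4M\t=\t60\t54\tACGT\tIIII",
    "r1\t147\tTRINITY_DN1\t60\t42\t4M\t=\t10\t-54\tACGT\tIIII",
    "r2\t99\tTRINITY_DN2\t5\t1\t4M\t=\t40\t39\tACGT\tIIII",
    "r2\t355\tTRINITY_DN1\t300\t1\t4M\t=\t340\t44\tACGT\tIIII",  # secondary
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",  # unmapped
]

counts = count_primary_alignments(toy_sam)
print(dict(counts))
```

These are raw per-sequence read counts, not expression levels. The two SAM files also use different sequence names (Trinity contigs versus reference sequences), so the counts only become comparable once contigs have been matched to genes.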



Anyhow, here is some information from one of my assemblies in case it helps. I'm going to continue looking into Mauve, I would welcome any suggestions or advice on what might be a good way to proceed and any pitfalls I might want to avoid. Thanks very much!


################################
## Counts of transcripts, etc.
################################
Total trinity 'genes': 2254
Total trinity transcripts: 2501
Percent GC: 46.47

########################################
Stats based on ALL transcript contigs:
########################################

Contig N10: 20428
Contig N20: 16367
Contig N30: 14168
Contig N40: 10881
Contig N50: 8121

Median contig length: 688
Average contig: 2780.90
Total assembled bases: 6955043


#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################

Contig N10: 20428
Contig N20: 16483
Contig N30: 14250
Contig N40: 11095
Contig N50: 8404

Median contig length: 790
Average contig: 2950.57
Total assembled bases: 6650579
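
For readers unfamiliar with the N-statistics in the report above: Contig N50 is the length L such that contigs of length L or longer contain at least half of all assembled bases, and N10 through N40 use 10% through 40% instead. A minimal sketch of that calculation, using invented contig lengths rather than the assembly from this post:

```python
def n_stat(lengths, fraction=0.5):
    """Return the Nxx contig length, e.g. fraction=0.5 gives N50.

    Contigs are taken from longest to shortest until their cumulative
    length reaches `fraction` of the total; the length of the contig
    that crosses the threshold is returned.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("no contig lengths given")


# Toy contig lengths, invented for illustration only.
toy_lengths = [100, 80, 60, 40, 20]
n50 = n_stat(toy_lengths, 0.5)
n10 = n_stat(toy_lengths, 0.1)
print(n50, n10)
```

In the toy data the total is 300 bases, so the 150-base threshold is crossed by the second-longest contig.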
Old 07-07-2015, 12:03 PM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,053

Trinity is a transcriptome (RNA-seq) assembler, so it is not the right assembler for bacterial genomes. I suggest that you try SPAdes or Velvet. If you have a huge amount of coverage, you will be better off subsampling the data before you assemble. You should be able to generate a single contig (or a reasonably small number of contigs) easily.
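
As an illustration of the subsampling step, here is a minimal sketch that keeps a random fraction of read pairs from two paired FASTQ files while keeping mates together. It assumes plain 4-line FASTQ records with both files in the same read order. The function names, the fraction, and the seed are illustrative choices, not settings recommended in this thread.

```python
import io
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-tuples of lines (header, seq, plus, qual)."""
    handle = iter(handle)
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(forward, reverse, fraction, seed=0):
    """Yield (forward_record, reverse_record) for a random subset of pairs.

    One random draw decides the fate of a whole pair, so mates never get
    separated. A fixed seed makes the subset reproducible.
    """
    rng = random.Random(seed)
    for fwd, rev in zip(read_fastq(forward), read_fastq(reverse)):
        if rng.random() < fraction:
            yield fwd, rev


# Toy reads, invented for illustration only.
fwd_text = "".join(f"@r{i}/1\nACGT\n+\nIIII\n" for i in range(10))
rev_text = "".join(f"@r{i}/2\nTGCA\n+\nIIII\n" for i in range(10))

kept = list(subsample_pairs(io.StringIO(fwd_text), io.StringIO(rev_text), 0.5, seed=1))
for fwd, rev in kept:
    print(fwd[0].strip(), rev[0].strip())
```

With real data the two `io.StringIO` objects would be replaced by open file handles, and the kept records written out to two new FASTQ files for the assembler.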

Tags
cluster identification, de novo assembly, gene identification, orthomcl
