#1 |
Member
Location: New York Join Date: Nov 2012
Posts: 49
|
Hello,
I have two different strains of the same bacterium, strain A and strain B. I assembled both de novo using Trinity, and I also have a reference genome for each strain, with reads aligned to it using bowtie. I want to build a mapping/clustering between strain A's de novo assembly and strain B's de novo assembly. I also want to build the same kind of mapping between strain A's de novo assembly and strain A's reference genome. What I have:
What I want:
What I need:
So how can I get that mapping? I realize it won't be a 1-to-1 mapping, but with closely related sequences like these I should at least be able to identify a majority of the genes. If this is not something commonly done, can someone at least point me in a promising direction? Thanks very much for any suggestions you can offer. |
#2 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
|
Have you looked at Mauve for doing the genome-level comparisons? If you have annotation available for your reference and your strains are similar to the reference then that can make the comparisons easy. Are your assemblies in a single contig i.e. finished?
|
#3 |
Member
Location: New York Join Date: Nov 2012
Posts: 49
|
Thanks for your reply.
I've been looking at Mauve at your suggestion, and it looks like it could be very useful if it does the kind of things I think it says it does.
As for my annotations, all of the organisms I am using have published reference genomes available.
As for my assemblies, I'm not exactly sure. I assembled a huge number of reads using Trinity. I read that Trinity is supposed to perform scaffolding, but I honestly hadn't been paying much attention to that. This is my first time doing any assembly work, so I'm going back now to check on those details.
What I had been doing was creating a bowtie2 index from the Trinity.fasta file, then mapping the reads back with bowtie2, like this:
bowtie2-build Trinity.fasta Trinity
bowtie2 --all -x Trinity -1 forward_reads.fastq -2 reverse_reads.fastq -S aligned_reads.sam
I had assumed, incorrectly I expect, that I could do the same thing using the actual reference genome instead of the assembled data, and end up with two SAM files that I could compare. Am I missing something huge here?
Anyhow, here is some information from one of my assemblies in case it helps. I'm going to continue looking into Mauve. I would welcome any suggestions or advice on a good way to proceed and any pitfalls I might want to avoid. Thanks very much!
################################
## Counts of transcripts, etc.
################################
Total trinity 'genes': 2254
Total trinity transcripts: 2501
Percent GC: 46.47
########################################
Stats based on ALL transcript contigs:
########################################
Contig N10: 20428
Contig N20: 16367
Contig N30: 14168
Contig N40: 10881
Contig N50: 8121
Median contig length: 688
Average contig: 2780.90
Total assembled bases: 6955043
#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################
Contig N10: 20428
Contig N20: 16483
Contig N30: 14250
Contig N40: 11095
Contig N50: 8404
Median contig length: 790
Average contig: 2950.57
Total assembled bases: 6650579
|
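For anyone reading the Contig N10 to N50 lines in the stats above: an Nx value is the contig length at which the longest contigs, added up from largest to smallest, first cover x percent of the total assembled bases. Here is a minimal Python sketch of that calculation, assuming the usual definition. The function names `nx` and `fasta_lengths` are made up for illustration and are not part of Trinity.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx contig length (N50 when fraction is 0.5).

    Contigs are added from longest to shortest until their combined
    length reaches `fraction` of the total assembled bases; the length
    of the contig that crosses that threshold is returned.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


def fasta_lengths(lines):
    """Yield one sequence length per FASTA record from an iterable of lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


# Toy example: three contigs of 10, 4 and 4 bases (18 bases in total).
toy_fasta = [">a", "ACGTACGTAC", ">b", "ACGT", ">c", "AC", "GT"]
toy_lengths = list(fasta_lengths(toy_fasta))
toy_n50 = nx(toy_lengths, 0.5)
```

To run it on a real assembly, pass an open file handle, e.g. `with open("Trinity.fasta") as fh: print(nx(list(fasta_lengths(fh))))`. A large gap between N50 and the median contig length, as in the stats above, means the assembly is a few long contigs plus many short ones.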
#4 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
|
Trinity is an RNA-seq (transcriptome) assembler, so it is not the right assembler for bacterial genomes. I suggest that you try SPAdes or Velvet. If you have a huge amount of coverage, you will be better off subsampling the data before you assemble. You should be able to generate a single contig (or a reasonably small number of contigs) easily.
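The reply does not say how to subsample, so here is one possible sketch in Python, not a recommendation of a specific tool. It uses reservoir sampling to pick a fixed number of read pairs while keeping forward and reverse mates together. The function names (`fastq_records`, `subsample_pairs`) and the `n_pairs` and `seed` parameters are hypothetical, and it assumes plain 4-line FASTQ records with the two files in the same order.

```python
import random
from itertools import islice


def fastq_records(handle):
    """Yield 4-line FASTQ records as tuples of lines."""
    lines = iter(handle)
    while True:
        record = tuple(islice(lines, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(forward, reverse, n_pairs, seed=0):
    """Reservoir-sample up to n_pairs read pairs, keeping mates together.

    Assumption: both inputs list the reads in the same order. zip() stops
    at the shorter input, so mismatched files are not detected here.
    """
    rng = random.Random(seed)
    reservoir = []
    pairs = zip(fastq_records(forward), fastq_records(reverse))
    for i, pair in enumerate(pairs):
        if i < n_pairs:
            reservoir.append(pair)
        else:
            # Keep the new pair with probability n_pairs / (i + 1).
            j = rng.randint(0, i)
            if j < n_pairs:
                reservoir[j] = pair
    return reservoir


def write_pairs(pairs, forward_out, reverse_out):
    """Write the sampled pairs to two open, writable file handles."""
    for fwd, rev in pairs:
        forward_out.writelines(fwd)
        reverse_out.writelines(rev)
```

Typical use would be to open `forward_reads.fastq` and `reverse_reads.fastq`, call `subsample_pairs` with the number of pairs that gives the coverage you want, and write the result with `write_pairs`. Only the sampled pairs are held in memory, so it works on large read sets. A fixed `seed` makes the sample reproducible.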
|
Tags |
cluster identification, de novo assembly, gene identification, orthomcl |