
**maubp** · 04-28-2011, 04:44 AM

I'd try splitting the reads by GC, and assembling the high GC and low GC pools separately.
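A minimal Python sketch of that split, assuming reads are already available as (read_id, sequence) pairs. The 0.5 cut-off and the function names are hypothetical placeholders; in practice the threshold would be picked from the valley in a histogram of per-read GC.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (N and other codes are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def split_by_gc(reads, threshold=0.5):
    """Partition (read_id, sequence) pairs into a low-GC and a high-GC pool.

    threshold is a hypothetical cut-off; reads at or above it go to the high pool.
    """
    low, high = [], []
    for read_id, seq in reads:
        (high if gc_fraction(seq) >= threshold else low).append((read_id, seq))
    return low, high


if __name__ == "__main__":
    reads = [("r1", "ATATATATGC"), ("r2", "GCGCGCGCAT"), ("r3", "ACGTACGTNN")]
    low, high = split_by_gc(reads, threshold=0.5)
    print([r for r, _ in low], [r for r, _ in high])
```

Each pool would then be assembled separately.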

**PHSchi** · 04-28-2011, 04:49 AM

Hi!

That was my first thought as well, but then I lose all the high-GC reads from the target genome, i.e. they will end up in the other pool. GC content follows a distribution, so the target genome will also have regions with ~50% GC, whereas the possible endosymbiont will have some lower-GC reads as well. I could take all the reads above a certain GC level and hope to assemble a bacterium out of those, then remove all the reads that went into the bacterial genome from the complete read set and hope to end up with a more or less pure target genome. But is this sensible and feasible?
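The subtraction step in that plan could look roughly like this in Python. It assumes you already have a set of read IDs that went into the bacterial assembly (for example, the names of reads that map back to the bacterial contigs); how that set is produced is outside this sketch, and the helper names are hypothetical. Mates are removed together so the remaining set stays properly paired.

```python
def base_read_id(header: str) -> str:
    """Strip a leading '@', any description after whitespace, and a /1 or /2 mate suffix."""
    name = header[1:] if header.startswith("@") else header
    name = name.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def subtract_reads(all_reads, bacterial_ids):
    """Keep only (header, sequence) reads whose base ID is NOT in bacterial_ids.

    bacterial_ids is assumed to hold base IDs (no /1 or /2), so if a pair was
    assigned to the bacterium, both of its mates are dropped.
    """
    return [(h, s) for h, s in all_reads if base_read_id(h) not in bacterial_ids]


if __name__ == "__main__":
    reads = [("@r1/1", "ACGT"), ("@r1/2", "TTGG"), ("@r2/1", "GGCC"), ("@r2/2", "AATT")]
    print(subtract_reads(reads, {"r2"}))
```

Whatever survives the subtraction is the candidate target-genome read set for the second assembly.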

**Rockx** · 12-06-2011, 03:38 PM

I am also having the same problem.

What programs should I be using to separate reads based on GC content? A search of these forums only revealed replies such as "there are many programs that do this" but with no examples.

Any help would be appreciated, cheers!

**maubp** · 12-12-2011, 11:35 AM

What is your favourite scripting language? BioPerl, Biopython, etc. would all make it easy to write a quick script to filter FASTQ on GC content. Also, do you have paired-end data? If so, presumably you would want to filter at the pair level, which makes things a little more complicated...
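As an illustration of pair-level filtering, here is a dependency-free Python sketch (Biopython's `SeqIO` could replace the hand-written parser). It assumes plain 4-line FASTQ records with no wrapped sequence lines, and that the R1 and R2 files list mates in the same order. Judging GC on both mates combined, then keeping or dropping the pair together, is one possible design choice, not the only one.

```python
import io
from typing import Iterator, TextIO, Tuple

# (header, sequence, quality); the '+' separator line is not kept.
Record = Tuple[str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def write_fastq(handle: TextIO, record: Record) -> None:
    header, seq, qual = record
    handle.write(f"{header}\n{seq}\n+\n{qual}\n")


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def filter_pairs_by_gc(in1, in2, out1, out2, min_gc=0.0, max_gc=1.0) -> int:
    """Write a pair to out1/out2 only if the combined GC of both mates is in range.

    Returns the number of pairs kept.
    """
    kept = 0
    for rec1, rec2 in zip(read_fastq(in1), read_fastq(in2)):
        pair_gc = gc_fraction(rec1[1] + rec2[1])
        if min_gc <= pair_gc <= max_gc:
            write_fastq(out1, rec1)
            write_fastq(out2, rec2)
            kept += 1
    return kept


if __name__ == "__main__":
    r1 = io.StringIO("@p1/1\nGGGG\n+\nIIII\n@p2/1\nAAAA\n+\nIIII\n")
    r2 = io.StringIO("@p1/2\nGGCC\n+\nIIII\n@p2/2\nATAT\n+\nIIII\n")
    o1, o2 = io.StringIO(), io.StringIO()
    print(filter_pairs_by_gc(r1, r2, o1, o2, min_gc=0.5))
```

For real data, pass handles from `open(...)` for the two input files and two output files instead of the in-memory `StringIO` objects.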

**swbarnes2** · 12-12-2011, 11:38 AM

The other way to split is to align all the reads to your target organism stringently, and then run Velvet on only the unmapped reads. Then either figure out what your mystery contaminant is from the Velvet assembly, or include the Velvet contigs in your reference alongside your desired organism, so that the contaminant reads will align to those instead of being forced somewhere in your target organism's genome.
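Pulling the unmapped reads out of the alignment is a matter of checking the SAM FLAG field (bit 0x4 means "segment unmapped"); `samtools view -f 4` does the same filtering on the command line. The Python sketch below works on SAM text lines and skips secondary/supplementary records. The `both_mates` option and the function names are illustrative assumptions, not part of any tool mentioned above.

```python
# SAM FLAG bits (from the SAM format specification)
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def unmapped_reads(sam_lines, both_mates=False):
    """Yield (name, sequence, quality) for unmapped primary records in SAM text.

    If both_mates is True, a paired read is kept only when its mate is unmapped too.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if not flag & UNMAPPED:
            continue
        if both_mates and flag & PAIRED and not flag & MATE_UNMAPPED:
            continue
        if flag & REVERSE:
            # SAM stores reverse-strand SEQ/QUAL reversed; restore the read's orientation
            seq, qual = reverse_complement(seq), qual[::-1]
        yield name, seq, qual
```

The yielded records can be written out as FASTQ and handed to the assembler.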

**Rockx** · 12-12-2011, 02:55 PM

Thanks maubp and swbarnes2. Indeed, I ended up editing the DynamicTrim Perl script to include an option for GC trimming; it handles paired-end data fine.

swbarnes2, thanks for this tip. However, I'm unable to do this because I am assembling de novo, which makes things a bit tougher.

**koadman** · 12-12-2011, 03:11 PM

You could try running an assembly pipeline designed explicitly to deal with mixes of organisms. metAMOS seems to be one such option:

GitHub - marbl/metAMOS: A metagenomic and isolate assembly and analysis pipeline built with AMOS

https://github.com/treangen/metAMOS/wiki

It uses metagenome taxonomy analysis to figure out which organism each scaffold group comes from and creates a separate assembly fasta file for each organism. Looks like it's under very active development at the moment.

**polyatail** · 12-12-2011, 07:10 PM

There are a number of tools out there that attempt to cluster or classify reads or contigs by sequence-intrinsic properties (e.g. k-mers, protein domains). Check out TETRA, WebCarma, TACOA or PhyloPythia.
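To give a feel for what a k-mer composition signal is, here is a simplified Python sketch: it builds a tetranucleotide (4-mer) frequency profile for a sequence, counted on both strands so read orientation does not matter, and compares two profiles with a Pearson correlation. This is an illustration of the general idea only; TETRA and the other tools named above use their own, more refined statistics, and the function names here are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 DNA 4-mers in a fixed order, so profiles are comparable vectors.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
_INDEX = {k: i for i, k in enumerate(TETRAMERS)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tetra_profile(seq: str) -> list:
    """Relative 4-mer frequencies over both strands (4-mers containing N etc. are skipped)."""
    seq = seq.upper()
    counts = [0] * len(TETRAMERS)
    for strand in (seq, seq.translate(_COMPLEMENT)[::-1]):
        for i in range(len(strand) - 3):
            idx = _INDEX.get(strand[i:i + 4])
            if idx is not None:
                counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def pearson(x, y) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0
```

Short reads give noisy 4-mer profiles, which is one reason these methods are usually applied to contigs rather than individual reads.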

**koadman** · 12-12-2011, 08:54 PM

The authors of PhyloPythia have an interesting comparison of the nucleotide composition-based methods to a sequence identity/homology-based method (MEGAN) in the 50 pages of supplemental material for this 1.5 page paper:

http://www.nature.com/nmeth/journal/v8/n3/full/nmeth0311-191.html

I didn't notice whether they ran a nucleotide or amino acid BLAST search for MEGAN, but either way, it seems that using homology information gives pretty darn good results compared with the composition-based methods (among which PhyloPythiaS seems to be superior).

disentangling target genome and endosymbiont at read level