SEQanswers

02-16-2012, 11:12 AM   #1
dutilh
Location: Netherlands
Join Date: Jan 2012
crAss for comparative metagenomics using cross-assembly

Cross-assembly of reads from different metagenomes lets us assess how similar the sampled communities are. crAss is a tool that analyses the cross-assembly files: it computes a distance between each pair of metagenomes, using one of several possible distance formulas. Please share your experiences with us!
The tool is available online at http://edwards.sdsu.edu/crass/ and for download at https://sourceforge.net/projects/crass/
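As a rough illustration of how a cross-assembly can be turned into a distance, here is a minimal Python sketch. It assumes each metagenome is summarized as a vector of read counts over the same cross-contigs, converts the counts to proportions, and applies the Wootters statistical distance, arccos(sum of sqrt(p_i * q_i)). crAss lists a "wootters" option, but this is not crAss's code and it leaves out any of crAss's size corrections; the function names and counts are hypothetical.

```python
import math


def proportions(counts):
    """Convert read counts per cross-contig into proportions that sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("metagenome has no reads in any contig")
    return [c / total for c in counts]


def wootters_distance(counts_a, counts_b):
    """Wootters distance: arccos of sum(sqrt(p_i * q_i)) over all contigs."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both metagenomes must be counted over the same contigs")
    p = proportions(counts_a)
    q = proportions(counts_b)
    overlap = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp so floating-point rounding cannot push the argument above 1.
    return math.acos(min(1.0, overlap))


# Hypothetical read counts of three metagenomes over the same four cross-contigs.
sample_a = [10, 0, 5, 5]
sample_b = [10, 0, 5, 5]
sample_c = [0, 20, 0, 0]
print(wootters_distance(sample_a, sample_b))  # identical profiles: 0.0
print(wootters_distance(sample_a, sample_c))  # no shared contigs: pi/2
```

The distance runs from 0 (identical read distributions over the contigs) to pi/2 (no contig shared between the two metagenomes).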

06-27-2012, 10:40 PM   #2
MikeT
Location: Italy
Join Date: Jul 2010

Greetings,
I'm very interested in this tool. I've tried it, but when I load my datasets the program tells me that it doesn't recognize any reads from my individual, unassembled metagenomes.
I cross-assembled my samples by concatenating them and running Cap3 on the result. Each read in my samples has a unique identifier. Do I need to change the identifiers so that the program can tell which read comes from which metagenome?
Thanks in advance.

Michael Tangherlini

07-03-2012, 01:01 AM   #3
dutilh

Hi Michael, my guess is that Cap3 does not list the read IDs in the ACE file the same way other assemblers do. Could you email me your FASTA files and the resulting ACE file? A small example assembly of just a couple of reads would probably be better than the whole thing at once. Then I'll take a look! Best, Bas

07-03-2012, 01:49 AM   #4
MikeT

Hello Bas,
I managed to solve the issue simply by adding the sample name as a prefix to each read ID. However, I ran into problems with the website: after I upload my sequences and start the run, the page never updates, and the wheel on the left keeps spinning forever.
The program does work great locally, though: I installed it, and it analyzes my data in just a few seconds.
I have a question about the plot. For 3 samples, crAss automatically generates a three-dimensional box, but I can't work out what the dots and the points stand for. I can see that the dots on each face represent the contigs, plotted along each axis according to the number of reads used to assemble them, but what do the points represent?
Best regards

Michael Tangherlini
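The fix described in this post, tagging every read with its sample before cross-assembly, might look like the minimal Python sketch below. It assumes plain FASTA input and joins sample name and read ID with an underscore; the exact ID format crAss expects is not stated in this thread, so the separator and the function names are hypothetical.

```python
def prefix_fasta_ids(lines, sample, separator="_"):
    """Yield FASTA lines with each read ID prefixed by the sample name.

    '>r42 some description' becomes '>siteA_r42 some description'.
    """
    for line in lines:
        if line.startswith(">"):
            yield ">" + sample + separator + line[1:]
        else:
            yield line  # sequence lines pass through unchanged


def concatenate_samples(samples, separator="_"):
    """Merge several samples ({sample name: FASTA lines}) into one tagged read set."""
    merged = []
    for sample, lines in samples.items():
        merged.extend(prefix_fasta_ids(lines, sample, separator))
    return merged


# Hypothetical input: both samples contain a read called 'r1'.
samples = {
    "siteA": [">r1", "ACGTACGT", ">r2", "GGGTTTAA"],
    "siteB": [">r1", "ACGTTTTT"],
}
for line in concatenate_samples(samples):
    print(line)
```

For real files, pass each file's lines with the trailing newlines stripped and write the merged list out as the input for the cross-assembly.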

07-03-2012, 02:02 AM   #5
dutilh

Great that you managed to solve the issue! I will ask Rob Schmieder, who built the website, to take a look when he's back at his desk in August. Could it be that your files are very big and simply take a long time to upload? You could try making them smaller with the commands shown on the help page.

The dots in the 3D plot are the projections of the points onto the three planes (superimposed on the points for visibility).
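A small Python sketch of the relation between points and dots, assuming each contig is plotted as a point whose three coordinates are its values for the three samples, and that the planes are the coordinate planes through the origin (the actual axis scaling in crAss may differ): each dot is simply the point with one coordinate set to zero.

```python
def project_onto_planes(point):
    """Orthogonally project a 3D point onto the xy, xz and yz coordinate planes."""
    x, y, z = point
    return {
        "xy": (x, y, 0),  # drop the sample-3 coordinate
        "xz": (x, 0, z),  # drop the sample-2 coordinate
        "yz": (0, y, z),  # drop the sample-1 coordinate
    }


# Hypothetical contig with values 12, 3 and 5 for samples 1, 2 and 3.
for plane, dot in project_onto_planes((12, 3, 5)).items():
    print(plane, dot)
```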

07-03-2012, 02:12 AM   #6
MikeT

Hi Bas,
Thanks for the reply! No, the files have already been reduced considerably in size, as described on the website. On that PC I was using Firefox on CentOS; now I'm trying again with Chrome on Windows 7 and get the same result: the wheel keeps spinning.
Is there a way to switch to a triangle plot like the one shown on the website? I find the 3D visualization hard to read, at least for my samples.
Kind regards

Michael Tangherlini

07-03-2012, 02:44 AM   #7
dutilh

The stand-alone script should also output both plots; we will fix that! Meanwhile, you can take the data from the "output.contigs2reads.txt" file, which lists the number of reads from each dataset assembled into each contig, and make your own plot. You may also want to filter out the unassembled reads (listed at the end of the file), because they add little information to the plot: they all overlap at the same three points, (0,0,1), (0,1,0) and (1,0,0).
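One way to build the triangle plot yourself is sketched below in Python. The three corner points mentioned above suggest that each contig's coordinates are its read proportions per dataset, which sum to 1; all such points lie on the plane x + y + z = 1, and that plane can be drawn flat as an equilateral triangle (a ternary plot). The tab-separated layout assumed here (contig ID followed by three read counts) and the rule that a row with a single read is an unassembled read are assumptions; check them against your own output.contigs2reads.txt.

```python
import math


def parse_contigs2reads(lines):
    """Parse 'contig_id<TAB>count_1<TAB>count_2<TAB>count_3' rows (assumed layout).

    Blank lines, '#' comments and rows with non-numeric counts (e.g. a header) are skipped.
    """
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        try:
            counts = [int(f) for f in fields[1:4]]
        except ValueError:
            continue  # header row or unexpected content
        rows.append((fields[0], counts))
    return rows


def ternary_xy(counts):
    """Map three read counts onto a triangle with corners
    dataset 1 = (0, 0), dataset 2 = (1, 0), dataset 3 = (0.5, sqrt(3)/2)."""
    total = sum(counts)
    b = counts[1] / total
    c = counts[2] / total
    return (b + c / 2, c * math.sqrt(3) / 2)


def triangle_points(lines):
    """Return (contig, x, y) for every row except single unassembled reads."""
    points = []
    for contig, counts in parse_contigs2reads(lines):
        if sum(counts) <= 1:
            continue  # a lone read sits exactly on a corner and adds no information
        x, y = ternary_xy(counts)
        points.append((contig, x, y))
    return points


# Hypothetical file contents: a header, two contigs and one unassembled read.
example = [
    "contig\tdataset1\tdataset2\tdataset3",
    "contig_1\t2\t2\t0",
    "contig_2\t1\t1\t2",
    "read_99\t0\t0\t1",
]
for contig, x, y in triangle_points(example):
    print(f"{contig}\t{x:.3f}\t{y:.3f}")
```

The printed x and y columns can then be plotted with any 2D plotting tool, for example GNUplot, together with the three edges of the triangle.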

07-04-2012, 06:47 AM   #8
MikeT

Hello Bas,
I've used the output.contigs2reads.txt file produced by the program. I loaded it into a spreadsheet and assigned each contig (after removing the single, unassembled reads) a specific color. I built the color code by assigning one RGB channel to each of my three samples (one red, one green and one blue), scaling each sample's contribution to the 0-255 range and deriving the corresponding hex value from the mix of the three channels.
I then exported the spreadsheet and used the hex value as a fourth column in GNUplot to create a 3D plot like the one crAss makes, except that it shows only the contigs, as points colored according to how much each metagenome contributed to the assembly. Pretty nifty.

Michael Tangherlini
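The coloring scheme described in this post could be reproduced with a short Python function like the one below, a minimal sketch that assumes "scaling each contribution to the 0-255 range" means scaling each sample's share of the contig's reads. gnuplot can take point colors from a data column with `lc rgb variable`, which expects a 24-bit integer, so the sketch returns that form alongside the hex string. The function name is hypothetical.

```python
def contig_color(counts):
    """Mix red, green and blue in proportion to the reads each of three samples contributed.

    Each channel is that sample's share of the contig's reads, scaled to 0-255.
    Returns the '#RRGGBB' string and the equivalent 24-bit integer.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("contig has no reads")
    r, g, b = (round(255 * n / total) for n in counts)
    return f"#{r:02X}{g:02X}{b:02X}", (r << 16) | (g << 8) | b


# Hypothetical contig: 6 reads from sample 1 (red), 2 from sample 2 (green), none from sample 3 (blue).
print(contig_color([6, 2, 0]))  # ('#BF4000', 12533760)
```

With this scaling, a contig built equally from all three samples comes out mid-gray (#555555), while contigs from a single sample get a pure red, green or blue.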

07-04-2012, 06:52 AM   #9
dutilh

Wow, that sounds great! Make sure you share it when it's published.
I'll see if we can include it in our plots too...

Last edited by dutilh; 07-04-2012 at 06:53 AM. Reason: update

07-05-2012, 11:12 PM   #10
MikeT

The final plot looks quite cluttered, though, and is a bit hard to interpret. So I tried something different: I plotted the cross-assembled contigs using four colors. Colors 1, 2 and 3 stand for the three metagenomes and mark the largest contributor to each contig; color 4 marks contigs where the largest contribution is tied.
I can show the plot, since it's just an attempt with very preliminary data.

-MikeT
Attached image: testcom.png
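The four-color scheme described in this post can be expressed as a small classification rule. A minimal Python sketch, assuming the counts are read numbers from samples 1, 2 and 3 and that only a tie for the top count puts a contig in class 4 (the function name is hypothetical):

```python
def dominant_class(counts):
    """Return 1, 2 or 3 for the sample contributing the most reads, or 4 for a tie at the top."""
    top = max(counts)
    leaders = [i for i, n in enumerate(counts, start=1) if n == top]
    return leaders[0] if len(leaders) == 1 else 4


# Hypothetical contigs (reads from samples 1, 2 and 3).
for counts in ([5, 1, 0], [0, 2, 7], [3, 3, 1]):
    print(counts, dominant_class(counts))
```

Ties among the smaller contributors do not matter here; only a shared maximum sends a contig to class 4.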

07-08-2012, 07:02 AM   #11
dutilh

The first paper that cites crAss is out: "Taxonomic and Functional Microbial Signatures of the Endemic Marine Sponge Arenosclera brasiliensis" by Trindade-Silva et al.:
http://www.plosone.org/article/info%...l.pone.0039905
It cites the crAss website, because our paper describing crAss is still under review.

09-03-2012, 07:22 AM   #12
dutilh

And another paper that cites crAss is out: "Going viral: next-generation sequencing applied to phage populations in the human gut" by Reyes et al.:
http://www.nature.com/nrmicro/journa...micro2853.html
Still citing the website ...

Last edited by dutilh; 09-03-2012 at 07:29 AM.

10-17-2012, 08:32 AM   #13
dutilh

crAss has now been published in Bioinformatics:
http://bioinformatics.oxfordjournals...vm&keytype=ref

11-03-2012, 05:43 AM   #14
ecogenetics
Location: USA
Join Date: Nov 2012
crAss Plots

Hello Bas:
Congrats on the publication! It lends credibility to all of our efforts to use crAss. I've run two experiments; both went well with one exception, and I hope you can provide some advice. The second run has 3 sets of reads contributing to the cross-assembly. The distance matrices were as expected, but no plot was produced at the end. The toggle between the 3D and Triangle options is there and enabled, but no plot shows. The output file (output.contigs2reads.txt) is there and populated with appropriate data. For reference, a similar analysis of 2 other data sets produced a 2D plot, so it doesn't appear to be a browser issue.
Thank you in advance for providing advice,
Bonnie Brown

11-08-2012, 10:15 AM   #15
dutilh

Hi Bonnie,
First, we have just installed a bigger server to run crAss, which might have caused some of the problems with the online version.
Second, the new version of DrawTree (3.69) fails on some cladograms because of zero-length branches, for example those in the toy example that we provide on the SourceForge site. In that case no output image is created (this should not be a problem for real datasets).
Finally, I've now uploaded a new version (v1.2) that makes it easier to change the executable commands for BioNJ, GNUplot, GhostScript and DrawTree.
I hope everything is working for you now; let me know if you still have trouble! Best, Bas
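To check whether a particular cladogram is likely to hit the DrawTree problem mentioned in this post, you could count its zero-length branches. A rough Python sketch, assuming a plain Newick string with numeric branch lengths and no quoted labels containing colons (the function name is hypothetical):

```python
import re

# A Newick branch length: ':' followed by a number such as 0.10, 0 or 1e-5.
BRANCH_LENGTH = re.compile(r":\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")


def zero_length_branches(newick):
    """Count branches whose length is exactly zero in a Newick (Phylip bracket notation) string."""
    return sum(1 for m in BRANCH_LENGTH.finditer(newick) if float(m.group(1)) == 0.0)


tree = "((A:0.10,B:0.0):0.05,C:0.2);"  # hypothetical cladogram with one zero-length branch
print(zero_length_branches(tree))  # 1
```

For a real run, pass it the text of one of the output.*.cladogram.ph files.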

05-31-2013, 07:41 PM   #16
martintay
Location: Singapore
Join Date: May 2013
Input file "intree"

Hi Bas,

I just installed the stand-alone version on my Mac and have been trying to run it.
It seems that drawtree is looking for the input tree file "intree", which was never created.
Is there some configuration setting I'm missing that would let drawtree accept the cladogram.ph files?

Many thanks,
Martin

06-01-2013, 02:46 AM   #17
dutilh

Hi Martin,
The drawtree program is used to visualize the cladogram: it takes a tree in Phylip bracket notation as input and outputs a PNG image. It is called from within crAss (v1.2) with the following system call (with branch lengths):

echo -e "output.$size_correction.cladogram.ph\n$crAss_dir/fontfile\nV\nN\nL\nA\nD\nM\n0\n0\\nF\nHelvetica\nC\n0.5\nY\n" | $drawtree_exe

Something is probably going wrong with this system call. It contains three variables, marked with a dollar sign (basic if you know some Perl...):
- $size_correction is either "minimum", "reads", "shot", or "wootters"
- $crAss_dir is the directory containing the crAss.pl script, so $crAss_dir/fontfile is the location of the file "fontfile" that drawtree uses
- $drawtree_exe is the path to the drawtree executable on your system

By default, drawtree looks for a Phylip tree file called "intree", but as you can see above, the input file should be "output.$size_correction.cladogram.ph". So the name of the input file was apparently not passed correctly through the system call, possibly because "echo" behaves differently on a Mac.
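Since the problem seems to lie in how "echo" builds drawtree's input, one workaround is to send the answers to drawtree's prompts directly on standard input, without a shell. The Python sketch below copies the answers from the crAss call above; it assumes, as that call does, that drawtree first prompts for the tree file and the font file. The function names and example paths are hypothetical.

```python
import subprocess
from pathlib import Path


def drawtree_answers(size_correction, crass_dir):
    """Build the lines crAss v1.2 pipes into drawtree via 'echo -e'.

    Line 1 is the tree file, line 2 the font file; the rest are the menu
    answers copied from the crAss system call.
    """
    answers = [
        f"output.{size_correction}.cladogram.ph",
        str(Path(crass_dir) / "fontfile"),
        "V", "N", "L", "A", "D", "M", "0", "0", "F", "Helvetica", "C", "0.5", "Y",
    ]
    return "\n".join(answers) + "\n"


def run_drawtree(drawtree_exe, size_correction, crass_dir, workdir="."):
    """Run drawtree in workdir, feeding the answers on stdin (no shell, no echo)."""
    return subprocess.run(
        [drawtree_exe],
        input=drawtree_answers(size_correction, crass_dir),
        text=True,
        cwd=workdir,
        check=True,  # raise CalledProcessError if drawtree exits with an error
    )


# Example (hypothetical paths): run_drawtree("/usr/local/bin/drawtree", "wootters", "/opt/crAss", "crass_output")
print(drawtree_answers("wootters", "/opt/crAss"))
```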

As a solution, you could try running the drawtree command yourself on the command line. Alternatively, and this may be the easiest solution, you can skip drawtree and use another program to visualize the cladogram files (the four "output.$size_correction.cladogram.ph" files in the output directory). Any tree plotting program will do; FigTree, NJPlot and the website http://itol.embl.de/ are just a few examples. These programs take the bracket notation in the *.ph files as input and show you a cladogram (tree) as output, which is all that drawtree does as well.

Good luck, let me know if/where you get stuck!
Bas
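If no tree viewer is at hand, even a plain-text rendering can show the topology of a *.ph file. Below is a minimal Python sketch: a small recursive-descent Newick parser (no quoted labels or comments, and all whitespace is stripped, so labels must not contain spaces) plus an indented printout. It is only a quick preview, not a replacement for the programs mentioned above; the function names are hypothetical.

```python
def parse_newick(text):
    """Parse Newick into nested (name, length, children) tuples.

    Minimal parser: handles nesting, labels and ':length', but not quoted
    labels or [comments]. All whitespace is removed first.
    """
    s = "".join(text.split())
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # consume '('
            children.append(node())
            while s[pos] == ",":
                pos += 1  # consume ','
                children.append(node())
            pos += 1  # consume ')'
        name = read_until(",():;")
        length = None
        if pos < len(s) and s[pos] == ":":
            pos += 1  # consume ':'
            length = float(read_until(",();"))
        return (name, length, children)

    return node()


def render(tree, depth=0):
    """Return indented text lines, one per node; unnamed internal nodes are shown as '+'."""
    name, length, children = tree
    label = (name or "+") + (f" ({length:g})" if length is not None else "")
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical cladogram; for a real one, read an output.*.cladogram.ph file instead.
print("\n".join(render(parse_newick("((A:0.1,B:0.2):0.05,C:0.3);"))))
```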

06-04-2013, 06:22 PM   #18
martintay

Hi Bas,

Thanks for the advice; it was indeed the echo command that caused the problem.
I continued manually and got what I needed.

Cheers,
Martin

10-02-2015, 03:24 PM   #19
brettin
Location: Chicago IL, US
Join Date: Aug 2015

Hi,

I would like to try crAss.pl out on a metagenome study.

Two treatments
Three time points
Three technical replicates per time point

I have a cross-assembly of all the samples, but no ACE file: the assembler I used produces only a FASTA file of assembled contigs.

I'm wondering how to use crAss.pl properly for this study. Any guidance would be greatly appreciated. Thanks.

12-08-2015, 08:07 AM   #20
dutilh

Hi Brettin, the new version of crAss can also take a SAM file as input. This lets you first build the cross-contigs and then map the reads back to those contigs, for example with Bowtie2, which produces a SAM file that you can use as input. You can find the latest crAss version on SourceForge (https://sourceforge.net/projects/crass/). Best! Bas
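How crAss itself assigns mapped reads to samples is not described in this thread. As a rough illustration of the tally a SAM file makes possible, the Python sketch below counts mapped reads per contig and per sample. It assumes each read name starts with its sample name followed by an underscore (the prefixing trick used earlier in this thread) and skips unmapped, secondary and supplementary alignments by their SAM flags; the function name and records are hypothetical.

```python
from collections import defaultdict


def count_reads_per_contig(sam_lines, separator="_"):
    """Tally mapped reads per contig (RNAME) and per sample (read-name prefix).

    Skips header lines, unmapped reads (flag 0x4) and secondary (0x100) or
    supplementary (0x800) alignments, so each read contributes at most one
    alignment (paired mates are still counted separately).
    """
    table = defaultdict(lambda: defaultdict(int))
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & (0x4 | 0x100 | 0x800) or rname == "*":
            continue
        sample = qname.split(separator, 1)[0]
        table[rname][sample] += 1
    return table


# Hypothetical SAM records: two mapped reads from different samples, one unmapped read.
sam = [
    "@HD\tVN:1.6",
    "@SQ\tSN:contig_1\tLN:500",
    "siteA_r1\t0\tcontig_1\t10\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "siteB_r1\t16\tcontig_1\t50\t42\t8M\t*\t0\t0\tACGTTTTT\tIIIIIIII",
    "siteA_r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGTTTAA\tIIIIIIII",
]
for contig, per_sample in count_reads_per_contig(sam).items():
    print(contig, dict(per_sample))
```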



Tags
assembly, crass, cross-assembly, metagenomics

All times are GMT -8.