The development of pyCRAC was driven by the need for a (1) flexible, (2) user-friendly and (3) coherent set of tools tailored specifically to CRAC/CLIP data. The pyCRAC package integrates the vast majority of the basic CRAC/CLIP data processing tools we use routinely into a single, easily transferable software package. To make the tools easily adaptable, pyCRAC was written entirely in Python. As such, many of the pyCRAC tools can be used as a foundation for the rapid development of new scripts. Finally, to make the CLIP/CRAC techniques more generally accessible, we have tried to make pyCRAC as intuitive and user-friendly as possible, so that researchers with little experience in bioinformatics will be able to perform basic analyses on their CLIP/CRAC datasets. The pyCRAC tools were designed to run as command line utilities on Unix and Linux based operating systems. Each tool comes with a detailed help menu explaining how the various options work. We have also written detailed pyCRAC documentation that includes numerous command line examples and illustrations describing the functionality of the tools. To provide the functionality of pyCRAC through a more user-friendly web interface, we have also made the tools compatible with Galaxy.
PyCRAC can handle multiplexed raw Solexa data and process Novoalign and SAM/BAM single-end and paired-end CLIP/CRAC data.
CRAC/CLIP cDNA library preparation protocols generate directional cDNA libraries. PyCRAC can also be used to analyse RNA-seq datasets, but only if these contain strand information.
A major advantage of pyCRAC is that all programs share the same read filtering options, including removal of repetitive reads and of reads with multiple alignment locations, and flattening of the data by hit cluster analysis. Using GTF annotation files, pyCRAC programs can count overlap between reads or read clusters and the coding or genomic sequences of genes and alternative transcripts; count overlap with introns, exons and UTRs; generate genome browser compatible output files; generate pileups and multiple sequence alignments; and, finally, extract RNA binding motifs.
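To illustrate what strand-aware overlap counting against a GTF annotation involves, here is a minimal sketch in plain Python. It is not pyCRAC code and does not use the pyCRAC API; the function names (`parse_gtf_line`, `count_overlaps`), the read tuple layout and the example records are all hypothetical, and the brute-force loop is only suitable for small inputs.

```python
def parse_gtf_line(line):
    """Split one GTF line into (chrom, feature, start, end, strand, gene_id)."""
    fields = line.rstrip("\n").split("\t")
    chrom, _source, feature, start, end, _score, strand, _frame, attributes = fields
    gene_id = None
    for attr in attributes.split(";"):
        attr = attr.strip()
        if attr.startswith("gene_id "):
            # Attribute looks like: gene_id "GENE_A"
            gene_id = attr.split(" ", 1)[1].strip('"')
    return chrom, feature, int(start), int(end), strand, gene_id


def count_overlaps(reads, gtf_lines, feature_type="exon"):
    """Count reads per gene that overlap a feature on the same strand.

    reads: iterable of (chrom, start, end, strand) tuples using 1-based,
    inclusive coordinates (an assumption made for this sketch).
    """
    features = [
        parse_gtf_line(line)
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    ]
    counts = {}
    for r_chrom, r_start, r_end, r_strand in reads:
        hit_genes = set()
        for chrom, feature, start, end, strand, gene_id in features:
            if feature != feature_type:
                continue
            same_place = chrom == r_chrom and strand == r_strand
            if same_place and r_start <= end and r_end >= start:
                hit_genes.add(gene_id)
        # Each read contributes at most one count per gene.
        for gene_id in hit_genes:
            counts[gene_id] = counts.get(gene_id, 0) + 1
    return counts


# Hypothetical annotation: two overlapping exons on opposite strands.
example_gtf = [
    'chrI\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "GENE_A.1";',
    'chrI\tdemo\texon\t150\t300\t.\t-\t.\tgene_id "GENE_B"; transcript_id "GENE_B.1";',
]
# Hypothetical mapped reads.
example_reads = [
    ("chrI", 120, 160, "+"),
    ("chrI", 180, 220, "-"),
    ("chrI", 190, 210, "+"),
    ("chrI", 400, 450, "+"),
]
example_counts = count_overlaps(example_reads, example_gtf)
print(example_counts)
```

In this example the two plus-strand reads that fall within positions 100 to 200 are assigned to GENE_A and the minus-strand read to GENE_B, even though the two exons overlap in coordinates. This is why the strand information is needed for the analysis.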
I am in the process of writing a manuscript on pyCRAC and I am looking for people who are willing to test the package. Minimum requirements to run pyCRAC:
Python 2.6
Unix/Linux/OSX system
at least 4 GB RAM (microorganisms); at least 8 GB for human/mouse genomes
a GTF annotation file and matching genomic sequence for your organism of interest (from UCSC or ENSEMBL).
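One practical way to check that a GTF file and a genomic sequence file "match" is to compare their sequence names, since UCSC and ENSEMBL use different chromosome naming (for example "chrI" versus "I"). The sketch below is a hypothetical helper, not part of pyCRAC; the function names and example records are invented for illustration.

```python
def fasta_names(fasta_lines):
    """Collect sequence names from FASTA header lines (text up to the first space)."""
    return set(
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    )


def gtf_names(gtf_lines):
    """Collect sequence names from the first column of a GTF file."""
    return set(
        line.split("\t")[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    )


def missing_sequences(gtf_lines, fasta_lines):
    """Return GTF sequence names that have no FASTA record with the same name."""
    return sorted(gtf_names(gtf_lines) - fasta_names(fasta_lines))


# Hypothetical inputs: the second GTF record uses a name without the "chr" prefix.
example_fasta = [">chrI test sequence", "ACGTACGT", ">chrII", "TTGGCCAA"]
example_annotation = [
    "# example annotation",
    'chrI\tdemo\texon\t1\t8\t.\t+\t.\tgene_id "GENE_A";',
    'II\tdemo\texon\t1\t8\t.\t-\t.\tgene_id "GENE_B";',
]
unmatched = missing_sequences(example_annotation, example_fasta)
print(unmatched)
```

An empty result means every sequence name in the annotation has a counterpart in the sequence file. With real files, the lists above would be replaced by open file handles.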
If you are interested in testing pyCRAC, please drop me an e-mail:
[email protected]