PacBioEDA: Exploratory Data Analysis tools for PacBio RS

SillyPoint

Member

Join Date: May 2008

Posts: 39
#1

01-10-2012, 09:43 AM

I've just posted PacBioEDA to GitHub: a package of Python scripts for subread-level examination of the output of the PacBio instrument.

From the blurb (on pacbiodevnet, registration required):

Do you want to have a closer look at the data your PacBio RS
instrument is producing? Do you want to see a lower level of detail
than that contained in the reports produced by SMRTanalysis? The
PacBioEDA package lets you do exploratory data analysis at the
subread/region level.

For example: a given read was split into 5 subreads. Each subread
aligned nicely -- but all of them to the reverse strand, rather than
in the +-+-+ sequence you'd expect. Looking further, we see that the
aligned portion of each subread included only the first half of
it. Conjecture: we're miscalling the adapter on one end of the
SMRTbell. Graph the alignment scores for the read against the adapter
sequence. Conjecture confirmed.

(The above problem went away with the 1.2.3 release, which does
a better job of recognising adapters.)
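
The two checks in that example can be sketched roughly as follows. This is not code from PacBioEDA: the function names, the toy adapter sequence and the scoring parameters are all made up for illustration, and the scoring is a plain Smith-Waterman recurrence rather than whatever the instrument software uses.

```python
def strands_alternate(strands):
    """True if consecutive subread strands flip, as in '+-+-+'.

    `strands` is a hypothetical list of '+'/'-' calls, one per subread,
    in the order the subreads occur within the read.
    """
    return all(a != b for a, b in zip(strands, strands[1:]))


def adapter_score_profile(read, adapter, match=2, mismatch=-3, gap=-4):
    """Best local-alignment score of `adapter` ending at each read base.

    Plain Smith-Waterman with a linear gap penalty. Rows are read bases,
    columns are adapter bases; only the previous row is kept in memory.
    The scoring values are illustrative assumptions.
    """
    prev = [0] * (len(adapter) + 1)
    profile = []
    for base in read:
        curr = [0] * (len(adapter) + 1)
        for j, adapter_base in enumerate(adapter, start=1):
            diag = prev[j - 1] + (match if base == adapter_base else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
        profile.append(max(curr))
        prev = curr
    return profile


# Toy data: NOT the real SMRTbell adapter, just a placeholder sequence.
toy_adapter = "TTGACCGTAAGC"
toy_read = "C" * 10 + toy_adapter + "C" * 10

profile = adapter_score_profile(toy_read, toy_adapter)
peak = profile.index(max(profile))  # read position where the adapter hit ends
```

Plotting `profile` against read position gives the kind of graph described above: a peak wherever the adapter is present, and a low or missing peak where an adapter should sit between two subreads suggests it was miscalled.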

PacBioEDA consists of a set of Python scripts that accept as input a
bas.h5 file and, in some cases, the associated cmp.h5 alignments file. The
scripts are run from the command line, and produce either a text file
or a .png plot as output. This is a no-frills package intended for
people who are willing to get dirty with their data.

I've included lots of commentary in the scripts themselves (unlike
most Open Source bioinformatics offerings, where the only comment is
the copyright notice), in the hope that this will help you understand
what the scripts are telling you about your data.

Tom Skelly ([email protected])
SillyPoint

#2

03-14-2012, 12:21 PM

Bug fix for PacBioEDA

I've just uploaded (link in OP) a fix to PacBioEDA for a bug that caused PacBio_Bas.py to print reference chromosome/contig numbers that did not reflect the order of the chromosomes in the reference file. It was printing an internal representation that is pretty useless for analysis purposes.

There have also been a few improvements since I originally uploaded the package; see the CHANGELOG file.

--TS