bfast jobs for analyzing AB's SOLiD data

genome_anawk1

Junior Member

Join Date: May 2011

Posts: 7
#1

05-20-2011, 11:00 AM

Hello bfast experts,

I have split the output of AB SOLiD reads into separate "reads.j.fastq" files for speedy parallel processing. Each fastq file is ~100 MB.

I would really like your help to resolve an ambiguity in the run time of the independent bfast jobs. This analysis refers to PART-B of my previous post.

Some jobs have converged and produced their final outputs (*.sam files) in under 5 hours (one of them in as little as 1.5 hours).

Other jobs seem to be progressing much more slowly: the walltime is nearing 24 hours and they are stuck in the "bfast postprocess" step. The "bfast match" and "bfast localalign" steps have completed, and the output *.sam file is indeed growing slowly. I am concerned about the 5- to 20-fold variation in the time it takes for results to converge. The jobs are all running on single cores (I have no choice there; it is a matter of principle), housed at a central facility hosting hundreds of uniform cores. So the hardware is uniform across the compute nodes.

Is this variation in computation time a cause for concern, indicating poor read library preparation, or is it the norm? Sometimes results converge after many more iterations than they would otherwise! It could be stochastic. Is there a flag in bfast postprocess that can speed up the computation and still use the color space information? I prefer not to compromise on the accuracy of aligning the reads.

Hope you can please help,
Thanks very much,
a bfast analyzer.

Last edited by genome_anawk1; 05-20-2011, 11:03 AM.
nilshomer

Nils Homer

Join Date: Nov 2008

Posts: 1285
#2

05-20-2011, 12:45 PM

It may be that the pairing rescue is taking a long time. Try disabling that feature with the "-U" flag. It most likely will not affect the results too much.
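
For anyone driving the per-chunk jobs from a script, here is a minimal sketch of how the postprocess step might be re-run with "-U" added. It only builds the command line and does not execute it. The file names and the helper `postprocess_command` are hypothetical, and the other flags (`-f` for the reference, `-i` for the localalign output, `-A 1` for color space) are my understanding of typical bfast usage rather than something stated in this thread, so please check them against `bfast postprocess -h` for your version.

```python
import shlex


def postprocess_command(reference, aligned_baf, disable_pairing=True):
    """Build an argv list for `bfast postprocess` (hypothetical helper).

    reference and aligned_baf are placeholder file names; the flags other
    than -U are assumptions to verify with `bfast postprocess -h`.
    """
    cmd = [
        "bfast", "postprocess",
        "-f", reference,    # reference FASTA (assumed flag)
        "-i", aligned_baf,  # output of `bfast localalign` (assumed flag)
        "-A", "1",          # color space for SOLiD reads (assumed flag)
    ]
    if disable_pairing:
        cmd.append("-U")    # the flag suggested in the reply above
    return cmd


# Hypothetical file names for one chunk of reads.
cmd = postprocess_command("ref.fa", "reads.1.aligned.baf")
print(shlex.join(cmd))
```

The printed command can be pasted into a job script with its output redirected to the chunk's .sam file. Keeping the flag behind a `disable_pairing` switch makes it easy to run one slow chunk both ways and compare the resulting alignments before applying "-U" to all jobs.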