**dpryan** · 02-27-2014, 04:25 AM

@blam: A not so small correction. The only solution that is actually correct was to concatenate files together, but then only keep the resulting multi-fasta file (i.e. "cat *.fa > genome.fa" and then either move genome.fa to its own directory or delete the other .fa files). The other solution is not guaranteed to always work correctly. I'm finishing a new release that will fix this. In the new version, bison_index will in fact accept a directory of fasta files, rather than needing one to specify files individually (in fact, I will explicitly remove the ability for it to handle that since it can have unintended consequences).

The previous implementation could work incorrectly in cases where the input file list didn't match the order in which the files appeared in the directory entry, which can actually change over time. What that would mean is that the files could have been indexed in one order (e.g., chr1, then chr2, then chr3, ...) but then later read into memory in a different order (e.g., chr3, then chr1, then chr2, ...), which could cause all sorts of problems. This could only occur if you passed bison_index a list of files, rather than a single multi-fasta file. While I don't expect people to get bitten by this bug, it's very much possible and I consider it a major issue. I'm testing a fix and will upload a new version within the next couple hours.
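To illustrate why a changed read order matters, here is a toy Python sketch. It assumes a simplified model in which each chromosome is addressed by its start offset in one concatenated in-memory genome; the function names and the tiny sequences are made up for the example, and this is not bison's actual data layout.

```python
# Toy model only: hypothetical names and data, not bison's real structures.
def build_offsets(names, seqs):
    """Concatenate sequences in the given order; return (genome, {name: start})."""
    offsets, parts, pos = {}, [], 0
    for name in names:
        offsets[name] = pos
        parts.append(seqs[name])
        pos += len(seqs[name])
    return "".join(parts), offsets


def base_at(genome, offsets, chrom, pos):
    """Look up one base using per-chromosome start offsets."""
    return genome[offsets[chrom] + pos]


seqs = {"chr1": "AAAA", "chr2": "CCCC", "chr3": "GGGG"}

# Order used when the index was built.
_, index_offsets = build_offsets(["chr1", "chr2", "chr3"], seqs)
# Order in which the files happen to be read back later.
genome_later, _ = build_offsets(["chr3", "chr1", "chr2"], seqs)

expected = seqs["chr2"][0]
observed = base_at(genome_later, index_offsets, "chr2", 0)
print(expected, observed)  # the two bases differ
```

With the orders mismatched, the lookup for the first base of chr2 lands inside chr1, which is the kind of silent error that would corrupt anything computed from the reference sequence.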

For anyone who stores the genome in a single file, this won't be an issue for you. If, however, you store chromosomes/contigs in individual files, then I recommend deleting the current indices (just "rm -rf bisulfite_genome" in the directory with the fasta files) and reindexing. The version I'm testing will always process files in the same order, regardless of their order in the dirent structure on disk, so this problem will be resolved.
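The general idea of processing files in a fixed order can be sketched in Python as follows. This is only an illustration of sorting a directory listing by name, not bison's code; the function name and the set of accepted extensions are assumptions.

```python
from pathlib import Path

# Assumed extensions, chosen for illustration only.
FASTA_SUFFIXES = {".fa", ".fasta"}


def list_fasta_sorted(genome_dir):
    """Return the FASTA files in genome_dir in a stable, name-sorted order."""
    files = [
        p for p in Path(genome_dir).iterdir()
        if p.is_file() and p.suffix in FASTA_SUFFIXES
    ]
    # Directory iteration order is arbitrary, so sort explicitly by file name.
    return sorted(files, key=lambda p: p.name)
```

Note that this is a plain string sort, so chr10.fa comes before chr2.fa. That is fine here: the order only has to be the same every time, not "natural".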

**dpryan** · 02-27-2014, 05:27 AM

v0.3.0

I've just released version 0.3.0, which should address the problem I mentioned in my last post as well as a few other small bugs. I should note that you can now track the development version(s) of bison on github. I have a few branches (some not yet on github) that implement discordant/mixed alignments and use the development version of samtools/htslib.

Note: The indices produced by previous versions are not guaranteed to be compatible unless you used a multi-fasta file. There was a serious implementation problem with how bison_index worked when given multiple files as input and how multiple files were read into memory in previous versions. If you used a multi-fasta file, then everything will continue to work correctly. However, if you used multiple fasta files in a list then I strongly encourage you to delete the previous indices (just remove the bisulfite_genome directory) and reindex. The technical reasons for this issue are that when the bison tools previously read multiple fasta files into memory, they would do so in whatever order they appeared in the directory structure, which can change over time and isn't guaranteed to match the order of files someone specified during indexing. While the alignments wouldn't be affected by this, the methylation calls could have been seriously compromised. In this version, bison_index will only accept a directory, not a list of files, and it will always alphasort() the list of files in that directory prior to processing. This should eliminate this problem. My apologies to anyone affected by this.
Added a --genome-size option to a number of the tools. Many of the bison programs need to read the genome into memory. By default, 3 gigabases worth of memory is allocated for that and the size is increased as needed. For smaller genomes, this wastes space. For larger genomes, the constant reallocation of space can seriously slow things down. Consequently, this option was added to any tool that reads the genome into memory. It's convenient to overestimate this slightly, so if your genome is 3.8 gigabases, then just use 4000000000 as the genome size.
bison_merge_CpGs can now take multiple input files at once.
A number of small bug fixes, such as when "genome_dir" doesn't end in a /.
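For picking a value to give --genome-size, a rough Python sketch like the one below can count the bases in a set of FASTA files and round the total up. The function names and the 500 megabase rounding step are arbitrary choices for this example and are not part of bison.

```python
def count_bases(fasta_paths):
    """Count sequence characters across FASTA files, skipping header lines."""
    total = 0
    for path in fasta_paths:
        with open(path) as handle:
            for line in handle:
                if not line.startswith(">"):
                    total += len(line.strip())
    return total


def round_up(n_bases, step=500_000_000):
    """Round up to the next multiple of step (the step size is arbitrary)."""
    return max(step, -(-n_bases // step) * step)


# A 3.8 gigabase genome rounds up to 4 gigabases.
print(round_up(3_800_000_000))
```

The rounded number is a slight overestimate, which matches the advice above about not specifying the exact size.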

**dpryan** · 08-25-2014, 05:46 AM

It seems that I missed posting when I released version 0.3.1. Anyway, I've just released version 0.3.2. Changes of note are below, though the biggest one is support for HTSlib. I should note that I've also created a tutorial with compilation instructions and a couple example datasets available here.

Added bedGraph2MOABS to convert bedGraph files for use by MOABS.
Added support for HTSlib.
Fixed a small bug wherein --reorder wasn't being invoked when multiple output BAM files were to be used.
Fixed a small bug that only manifested in DEBUG mode.
There is now a tutorial.
The default minimum MAPQ and Phred scores used by bison_mbias have been updated to match bison_methylation_extractor.

**dpryan** · 08-26-2014, 06:53 AM

I've just posted version 0.3.2b, which fixes the Makefile so that bison will use the static htslib file. Otherwise, users would need to keep htslib around (convenient for me, but probably not for you).

**dpryan** · 10-27-2014, 12:56 AM

I've just posted version 0.3.3, which supports discordant and singleton alignments. The tutorial has also been updated to demonstrate how to suppress such alignments, if desired.

Bison has now been published. If you use it in your research, please cite the paper here.

I'll note that the next version will add support for CRAM files.
