SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Bedtools : BGZF ERROR sgaa201 Introductions 1 02-19-2016 08:47 AM
Issue with BGZF and Samtools granzanimo Bioinformatics 9 04-22-2014 12:48 AM
To Build a New Index or Not To Build? arcolombo698 RNA Sequencing 0 11-30-2013 01:04 PM
BGZF ERROR: BEDtools on SPARC with Solaris 10 MILLHILL Bioinformatics 0 03-08-2011 07:05 AM
BAM, Zlib and BGZF library xinwu Bioinformatics 4 09-10-2010 12:06 AM

Reply
 
Thread Tools
Old 04-25-2014, 06:55 AM   #1
granzanimo
Junior Member
 
Location: Paris

Join Date: Apr 2014
Posts: 6
Default How to build a BGZF

Hi,

I would know how to build correctly a BGZF header using bgzf.c or bam.c functions.
I want to generate a BAM file using BGZF but I always get errors.

I'm trying this for 1 week now, but no matter how I try to build the header, I always get this error.

Code:
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
granzanimo is offline   Reply With Quote
Old 04-25-2014, 08:25 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541
Default

Are you familiar with hexdump? It is very useful for actually looking at the raw file and (for example) checking the final 28 bytes of the BAM/BGZF file to see if there is an EOF marker.
maubp is offline   Reply With Quote
Old 04-25-2014, 11:42 AM   #3
granzanimo
Junior Member
 
Location: Paris

Join Date: Apr 2014
Posts: 6
Default

No i'm not, I will give a try. But when i checked with gedit, I didn't saw this marker. I only saw the beginning marker (BAM\001)
granzanimo is offline   Reply With Quote
Old 04-25-2014, 02:25 PM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Have you looked at the bam_header_write function? Remembering that bam_write() is just bgzf_write(), if you want to avoid the bam_* functions for some reason. Have a look at bgzf_close() for how the EOF is added (it's not explicitly mentioned, but it's the empty block that's added there).
dpryan is offline   Reply With Quote
Old 04-26-2014, 01:28 AM   #5
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541
Default

Quote:
Originally Posted by granzanimo View Post
No i'm not, I will give a try.
I find using hexdump -C most helpful, often piped into head or tail.
Quote:
Originally Posted by granzanimo View Post
But when i checked with gedit, I didn't saw this marker. I only saw the beginning marker (BAM\001)
That is a problem - the 'naked' BAM files starts like that once you have removed the BGZF (GZIP) compression. You should see a (special) GZIP header at the start of the file.
maubp is offline   Reply With Quote
Old 04-26-2014, 04:08 PM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

@granzanimo: In case you're still not getting things to work, this would be the absolute minimum BAM file:

Code:
#include "bam.h"
int main(int argc, char *argv[]) {
    BGZF *fp = bgzf_open("foo.bam", "w");
    bam_header_t *header = bam_header_init();
    bam_header_write(fp,header);
    bgzf_close(fp);
    bam_header_destroy(header);
    return(0);
}
You could instead have a bit of a header with something like:
Code:
#include "bam.h"
int main(int argc, char *argv[]) {
    BGZF *fp = bgzf_open("foo.bam", "w");
    bam_header_t *header = bam_header_init();
    char *text = malloc(sizeof(char) * 128);

    //Add a chromosome to the header
    header->n_targets=1;
    header->target_name = malloc(sizeof(char *));
    header->target_name[0] = malloc(5*sizeof(char));
    header->target_name[0] = strcpy(header->target_name[0], "chr1");
    header->target_len = malloc(sizeof(uint32_t));
    header->target_len[0] = 1000;
    //The @HD line
    text = strcpy(text, "@HD\tVN:1.0\tSO:unsorted\n");
    header->text = text;
    header->l_text = strlen(text);

    bam_header_write(fp,header);
    bgzf_close(fp);
    bam_header_destroy(header);
    return(0);
}
You can also add the @SQ lines to header->text, though it's not strictly required.
dpryan is offline   Reply With Quote
Old 09-15-2015, 01:00 PM   #7
darked89
Member
 
Location: Barcelona, Spain

Join Date: Jun 2009
Posts: 36
Default

Quote:
Originally Posted by dpryan View Post
@granzanimo: In case you're still not getting things to work, this would be the absolute minimum BAM file:

Code:
#include "bam.h"
int main(int argc, char *argv[]) {
    BGZF *fp = bgzf_open("foo.bam", "w");
    bam_header_t *header = bam_header_init();
    bam_header_write(fp,header);
    bgzf_close(fp);
    bam_header_destroy(header);
    return(0);
}
You could instead have a bit of a header with something like:
Code:
#include "bam.h"
int main(int argc, char *argv[]) {
    BGZF *fp = bgzf_open("foo.bam", "w");
    bam_header_t *header = bam_header_init();
    char *text = malloc(sizeof(char) * 128);

    //Add a chromosome to the header
    header->n_targets=1;
    header->target_name = malloc(sizeof(char *));
    header->target_name[0] = malloc(5*sizeof(char));
    header->target_name[0] = strcpy(header->target_name[0], "chr1");
    header->target_len = malloc(sizeof(uint32_t));
    header->target_len[0] = 1000;
    //The @HD line
    text = strcpy(text, "@HD\tVN:1.0\tSO:unsorted\n");
    header->text = text;
    header->l_text = strlen(text);

    bam_header_write(fp,header);
    bgzf_close(fp);
    bam_header_destroy(header);
    return(0);
}
You can also add the @SQ lines to header->text, though it's not strictly required.

Just a short tip taken from dpryan earlier post, but somehow not popping up in google results on top how to compile the above code snippets:
Code:
gcc -Wall -o  mysnippet  mysnippet.c -lbam -lz -lpthread -I/path/2/samtools_src 
-L//path/2/samtools_src
Tested with samtools_0.1.19 and gcc 4.9.2

Hope it helps.

Darek
darked89 is offline   Reply With Quote
Old 09-15-2015, 01:10 PM   #8
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

It's good that you mentioned that! I have the unfortunate habit of assuming that everyone else knows how to compile C code manually...which is understandably not the case.

Want to convert that code to work with htslib too (it's probably just modifying the included headers)?
dpryan is offline   Reply With Quote
Old 09-15-2015, 02:24 PM   #9
darked89
Member
 
Location: Barcelona, Spain

Join Date: Jun 2009
Posts: 36
Default

Quote:
Originally Posted by dpryan View Post
It's good that you mentioned that! I have the unfortunate habit of assuming that everyone else knows how to compile C code manually...which is understandably not the case.

Want to convert that code to work with htslib too (it's probably just modifying the included headers)?
Well, catching up with htslib is one thing, but going from test BAM with
Code:
@HD	VN:1.0	SO:unsorted
@SQ	SN:chr1	LN:1000
towards one where one can:
* create BAM with real world header taken from conventionally created BAM file (samtools view -H real.bam > real.header.txt; read the real.header.txt and create a test1.bam)
* add even a tiny chunk with 10 unpaired but mapped reads
* if possible, add another minimal chunk and get say test3.bam
* create bam with a real world header plus two already sorted SAM files taken from 2 different chromosomes from real.bam

If for time being these steps are doable only using old branch of samtools, so be it. As long as the resulting BAMs are readable by the htslib, it is not a software archaeology

Anyway, many thanks for posting the code. I think it may be worthy to start putting it in say github repo bam_hacks_for_nonC_ppl

Darek
darked89 is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 10:20 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO