![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Fastest way to extract differing positions from each alignment in a BAM file | CHRYSES | Bioinformatics | 5 | 12-14-2011 12:28 PM |
Sizing Solution for GS FLX+/XL+ upgrade | flashnglow | Sample Prep / Library Generation | 1 | 07-21-2011 06:42 AM |
sizing solution? | litali | 454 Pyrosequencing | 0 | 10-25-2010 05:11 AM |
Target Enrichmnet In-Solution | mestro2 | Sample Prep / Library Generation | 10 | 07-28-2010 09:27 AM |
YACs as a solution to repeat regions | qnc | Sample Prep / Library Generation | 3 | 07-01-2010 04:56 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: UK Join Date: Nov 2010
Posts: 49
|
![]()
Hi,
I am currently using Hydra package solution but quite quickly this gets clogged as all the RAM is used (4gb). Is there any other solution that has smaller memory footprint (picard?)? What would be the optimal memory size (considering that any other tool would not offer better memory handling)? Thanks! |
![]() |
![]() |
![]() |
#2 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
|
![]()
When you say bam2fastq are you trying to recover the original read sequences (possibly paired) as FASTQ files, in their original orientation (reverse complement undone if mapped to the reverse strand)?
|
![]() |
![]() |
![]() |
#3 |
Member
Location: UK Join Date: Nov 2010
Posts: 49
|
![]()
that is correct maubp.
|
![]() |
![]() |
![]() |
#4 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
|
![]()
I think EMBOSS seqret can do BAM to FASTQ, but I'm not sure what it does with pairs or reads mapped to the reverse strand.
I did this once myself using a custom Python script. |
![]() |
![]() |
![]() |
#5 |
Member
Location: Cambridge, United Kingdom Join Date: Jun 2011
Posts: 58
|
![]()
This software worked OK on my 4GB workstation:
http://www.hudsonalpha.org/gsl/software/bam2fastq.php Chris |
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
|
![]()
For the peoples ...
I've run into similar problems: bam to fastq implementations take up too much memory and take all day to run. I whipped together my own solution. This thing is very fast and uses very little memory (it outputs a pair when the second pair is input). It will take a little knowledge of how to point to libraries and headers to get working on your system. It uses AVL trees. It only works on paired end data. You must install samtools, tree library and possibly zlib. The tree library is at http://piumarta.com/software/tree/ CODE HAS BEEN REDACTED FOR MORE ROBUST VERSION, SEE LATER POSTS (edit 6/21/2013) Last edited by Richard Finney; 06-21-2013 at 11:46 AM. |
![]() |
![]() |
![]() |
#7 |
Member
Location: UK Join Date: Nov 2010
Posts: 49
|
![]()
@Richard
wow, this really zips thru the data! is there counterpart solution for non-paired data? |
![]() |
![]() |
![]() |
#8 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
|
![]()
There's actually faster ways to do it, a hash table implementation I did was 5x faster, but had a fixed amount of memory. I just wanted something simple and guaranteed to work on about any system (including the 4GB nodes on biowulf at NIH) and would run fast enough that I didn't lose focus. The implementation above uses heap for unmatched pairs and so uses very little memory on a proper ("strict"?) paired end bam. Soon as the pair is matched, the node is deleted and the AVL tree is rebalanced. The pieces are there for a single end implementation, just need to cut some code out. I'll see if can do it; check back tomorrow. Everybody's welcome to do whatever they want with the code.
|
![]() |
![]() |
![]() |
#9 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
|
![]()
Here it is. This is the single ended version. Nothing special but does (re-)reverse compliment the sequence and (re-)reverses the quality for opposite strand hits. Please check your results as this is brand new code.
/* bam to fastq SE (single end) version [ use bampe2fq for paried end data] . how to compile: edit line below to point to samtools libraries gcc -Wall -O3 -I/h1/finneyr/samtools-0.1.18/ bamse2fqs.c -o bamse2fqs -lm /h1/finneyr/samtools-0.1.18/libbam.a /h1/finneyr//zlib-1.2.5/libz.a (AVL tree not used in this version). example ./bamse2fqs /tcga_next_gen/NG_buckets/bucket11/bam/TCGA-AA-3858-01A-01W-0900-09_IlluminaGA-DNASeq_exome.bam outputs a fastq file called 1se.fq */ #include <stdio.h> #include <stdlib.h> off_t ftello(FILE *stream); // sam.h complains unless this prototype is there for -Wall on older systems #include "sam.h" #include <sys/types.h> #include <unistd.h> static char seq[500]; static char qual[500]; static char compli(char ch) { if (ch == 'A') return 'T'; else if (ch == 'T') return 'A'; else if (ch == 'C') return 'G'; else if (ch == 'G') return 'C'; else if (ch == 'N') return 'N'; else if (ch == 'a') return 't'; else if (ch == 't') return 'a'; else if (ch == 'c') return 'g'; else if (ch == 'g') return 'c'; return '.'; } static void reverse_and_compliment(char s[], char outs[], int len) { register int i,j; j = 0; i = len - 1; while (i >= 0) outs[j++] = compli(s[i--]); return; } static void reverse(char s[], char outs[]) { register int i,j; i = strlen(s); i--; j = 0; while (i >= 0) { outs[j] = s[i]; j++; i--; } outs[j] = (char)0; strcpy(s,outs); return; } static FILE *fp1; // fastq output file long int in_cnt = 0L; long int flip_cnt = 0L; static int fetch_func(const bam1_t *b, void *data) { uint32_t *cigar = bam1_cigar(b); const bam1_core_t *c = &b->core; int i, l; // NO!!!!!!!!! if (b->core.tid < 0) return 0; for (i = l = 0; i < c->n_cigar; ++i) { int op = cigar[i]&0xf; if (op == BAM_CMATCH || op == BAM_CDEL || op == BAM_CREF_SKIP) l += cigar[i]>>4; } uint8_t *s = bam1_seq(b); uint8_t *t = bam1_qual(b); qual[0] = (char)0; seq[0] = (char)0; for (i = 0; i < c->l_qseq; i++) seq[i] = bam_nt16_rev_table[bam1_seqi(s, i)]; seq[i] = (char)0; i = 0; for (i = 0; i < c->l_qseq; ++i) qual[i] = t[i] + 33; qual[i] = (char)0; if (c->flag&BAM_FREVERSE) { char tmps[512]; reverse_and_compliment(seq, tmps, strlen(seq)); strcpy(seq,tmps); reverse(qual, tmps); strcpy(qual,tmps); flip_cnt++; } in_cnt++; fprintf(fp1,"@%s\n%s\n+\n%s\n",bam1_qname(b), seq,qual); // single end, just output it #if 0 // for paired end - see bampe2fq.c if (c->flag&BAM_FREAD1) add(bam1_qname(b),0,seq,qual); else add(bam1_qname(b),1,seq,qual); #endif return 0; } int main(int argc, char *argv[]) { samfile_t *fp; if (argc == 1) { fprintf(stderr, "Usage: bamse2fq <in.bam> \n"); return 1; } if ((fp = samopen(argv[1], "rb", 0)) == 0) { fprintf(stderr, "bamse2fq: Fail to open BAM file %s\n", argv[1]); return 1; } if (argc == 2) { /* if a region is not specified */ fp1 = fopen("1se.fq","w"); bam1_t *b = bam_init1(); while (samread(fp, b) >= 0) fetch_func(b, fp); bam_destroy1(b); fclose(fp1); printf("DONE in_cnt = %ld , reversed=%ld \n",in_cnt,flip_cnt); } else { fprintf(stderr,"ERROR: "); fprintf(stderr, "Usage: bamse2fq <in.bam> \n"); } samclose(fp); return 0; } Last edited by Richard Finney; 12-22-2011 at 06:35 PM. |
![]() |
![]() |
![]() |
#10 |
Senior Member
Location: Boston Join Date: Feb 2008
Posts: 693
|
![]()
The latest samtools actually has a hidden command: bam2fq, which converts a BAM to *single-end* FASTQ.
@Richard: samtools comes with a single-header hash table library khash.h. BTW, I could not open your link. I would be interested in an AVL tree implementation in C. I know a couple single-header libraries for B-tree and red-black trees. |
![]() |
![]() |
![]() |
#11 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
|
![]()
Heng (lh3),
Yes, the link is down. It was up the other day. The server at http://piumarta.com must be taking a holiday. It will probably be back. Regardless I'll email you the AVL "library" at your "me" email address (let me know if there's a better one). Tree-1.0 is really just include files, not an actual linkable library. It is MIT license. Re-implementing the data structures and operations of an AVL package would be pretty easy if it does not suit your needs. AVL seems most elegant way and will avoid the pathological worst case key clashes of a hash solution; but hash will in most cases be faster with the cost of larger memory usage. |
![]() |
![]() |
![]() |
#12 |
Member
Location: Atlanta, US Join Date: Jan 2010
Posts: 59
|
![]()
Hi guys,
I've run into a more complex problem regarding bam to fastq conversion. Somehow my bam files have some unpaired reads mixed with the paired-end reads. I was running Picard but that seems to terminate when it finds unpaired read. Any suggestions on how to proceed now? Thanks in advance. |
![]() |
![]() |
![]() |
#13 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
|
![]()
[ EDIT 6/21/2013 ... see later posts for full, improved more robust version ]
If you want the "orphan" reads [ i.e. read not properly paired], modify bampe2fqs.c as instructed below ... This will dump "orphan" reads to the file "orphans.fq" in addtion to the pairs in 1.fq and 2.fq 1) Uncomment dump_orphans() call. 2) Add this code to bampe2fqs.c after line with "static Tree tree;" Remember AVL tree "library" is at http://piumarta.com/software/tree/ as include files and point to your libz and libbam libraries when compiling. =============== insert after Tree declaration ============== int orphcnt = 0; // count of orphans reads void orphan_node_printer(Node *self, void *stream) { fprintf(stream,"@%s\n%s\n+\n%s\n",self->name, self->seq,self->qual); orphcnt++; } void dump_orphans() { FILE *fpo; // orphan fastqs fpo = fopen("orphan.fq","w"); TREE_FORWARD_APPLY(&tree, _Node, linkage, orphan_node_printer, fpo); fclose(fpo); fprintf(stderr,"DONE, %d orphans flushed\n",orphcnt); } Last edited by Richard Finney; 06-21-2013 at 11:47 AM. |
![]() |
![]() |
![]() |
#14 |
Senior Member
Location: Boston Join Date: Feb 2008
Posts: 693
|
![]()
@Richard: khash takes smaller memory than most search trees on small key-value pairs. The only implementations that beat khash on memory are stx-tree and my own kbtree.h, both of which are based on B-tree/B+-tree. Nonetheless, if you store large objects (I usually store pointers instead), tree structures can be more memory efficient.
|
![]() |
![]() |
![]() |
#15 |
Member
Location: Atlanta, US Join Date: Jan 2010
Posts: 59
|
![]()
Hi Richard,
Thanks a lot for the info. Works perfectly. Does this program make any assumptions about the input bam file (sorting on bam file). Does it also work if I have multiple alignments for reads (non-primary alignments)? |
![]() |
![]() |
![]() |
#16 |
Member
Location: Atlanta, US Join Date: Jan 2010
Posts: 59
|
![]()
Hi Richard,
Please have a look at my previous question. I was wondering what happens to those reads whose second is pair is unmapped and also missing from the bam file? Thanks. |
![]() |
![]() |
![]() |
#17 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
|
![]()
Sorry for a late response, vinay.
Those reads with no pair in the bam are "orphan reads". See the 05-10-2012 post for outputting orphan reads. "Good" bams should have all pairs whether mapped or not, though sometimes programs drop unmapped reads to save space. |
![]() |
![]() |
![]() |
#18 |
Member
Location: North Sea Join Date: Apr 2008
Posts: 34
|
![]()
Hi,
I just downloaded the bam2fastq file but I cannot extract it. Is there any problem with the file? Thanks -- tar -xzf bam2fastq-1.1.0.tgz gzip: stdin: unexpected end of file tar: Child returned status 1 tar: Error exit delayed from previous errors |
![]() |
![]() |
![]() |
#19 |
Member
Location: Cambridge, United Kingdom Join Date: Jun 2011
Posts: 58
|
![]()
Hi,
It worked for me. % tar -xvzf bam2fastq-1.1.0.tgz bam2fastq-1.1.0/ bam2fastq-1.1.0/bam2fastq.cpp bam2fastq-1.1.0/HISTORY.txt bam2fastq-1.1.0/LICENSE bam2fastq-1.1.0/Makefile bam2fastq-1.1.0/README.txt bam2fastq-1.1.0/samtools/ bam2fastq-1.1.0/samtools/._AUTHORS bam2fastq-1.1.0/samtools/AUTHORS bam2fastq-1.1.0/samtools/._bam.c bam2fastq-1.1.0/samtools/bam.c bam2fastq-1.1.0/samtools/._bam.h bam2fastq-1.1.0/samtools/bam.h bam2fastq-1.1.0/samtools/._bam_aux.c bam2fastq-1.1.0/samtools/bam_aux.c bam2fastq-1.1.0/samtools/._bam_color.c bam2fastq-1.1.0/samtools/bam_color.c bam2fastq-1.1.0/samtools/._bam_endian.h bam2fastq-1.1.0/samtools/bam_endian.h bam2fastq-1.1.0/samtools/._bam_import.c bam2fastq-1.1.0/samtools/bam_import.c bam2fastq-1.1.0/samtools/._bam_index.c bam2fastq-1.1.0/samtools/bam_index.c bam2fastq-1.1.0/samtools/._bam_lpileup.c bam2fastq-1.1.0/samtools/bam_lpileup.c bam2fastq-1.1.0/samtools/._bam_maqcns.c bam2fastq-1.1.0/samtools/bam_maqcns.c bam2fastq-1.1.0/samtools/._bam_maqcns.h bam2fastq-1.1.0/samtools/bam_maqcns.h bam2fastq-1.1.0/samtools/._bam_mate.c bam2fastq-1.1.0/samtools/bam_mate.c bam2fastq-1.1.0/samtools/._bam_md.c bam2fastq-1.1.0/samtools/bam_md.c bam2fastq-1.1.0/samtools/._bam_pileup.c bam2fastq-1.1.0/samtools/bam_pileup.c bam2fastq-1.1.0/samtools/._bam_plcmd.c bam2fastq-1.1.0/samtools/bam_plcmd.c bam2fastq-1.1.0/samtools/._bam_reheader.c bam2fastq-1.1.0/samtools/bam_reheader.c bam2fastq-1.1.0/samtools/._bam_rmdup.c bam2fastq-1.1.0/samtools/bam_rmdup.c bam2fastq-1.1.0/samtools/._bam_rmdupse.c bam2fastq-1.1.0/samtools/bam_rmdupse.c bam2fastq-1.1.0/samtools/._bam_sort.c bam2fastq-1.1.0/samtools/bam_sort.c bam2fastq-1.1.0/samtools/._bam_stat.c bam2fastq-1.1.0/samtools/bam_stat.c bam2fastq-1.1.0/samtools/._bam_tview.c bam2fastq-1.1.0/samtools/bam_tview.c bam2fastq-1.1.0/samtools/._bamtk.c bam2fastq-1.1.0/samtools/bamtk.c bam2fastq-1.1.0/samtools/._bgzf.c bam2fastq-1.1.0/samtools/bgzf.c bam2fastq-1.1.0/samtools/._bgzf.h bam2fastq-1.1.0/samtools/bgzf.h bam2fastq-1.1.0/samtools/._bgzip.c bam2fastq-1.1.0/samtools/bgzip.c bam2fastq-1.1.0/samtools/._ChangeLog bam2fastq-1.1.0/samtools/ChangeLog bam2fastq-1.1.0/samtools/COPYING bam2fastq-1.1.0/samtools/._examples bam2fastq-1.1.0/samtools/examples/ bam2fastq-1.1.0/samtools/._faidx.c bam2fastq-1.1.0/samtools/faidx.c bam2fastq-1.1.0/samtools/._faidx.h bam2fastq-1.1.0/samtools/faidx.h bam2fastq-1.1.0/samtools/._glf.c bam2fastq-1.1.0/samtools/glf.c bam2fastq-1.1.0/samtools/._glf.h bam2fastq-1.1.0/samtools/glf.h bam2fastq-1.1.0/samtools/._INSTALL bam2fastq-1.1.0/samtools/INSTALL bam2fastq-1.1.0/samtools/._kaln.c bam2fastq-1.1.0/samtools/kaln.c bam2fastq-1.1.0/samtools/._kaln.h bam2fastq-1.1.0/samtools/kaln.h bam2fastq-1.1.0/samtools/._khash.h bam2fastq-1.1.0/samtools/khash.h bam2fastq-1.1.0/samtools/._klist.h bam2fastq-1.1.0/samtools/klist.h bam2fastq-1.1.0/samtools/._knetfile.c bam2fastq-1.1.0/samtools/knetfile.c bam2fastq-1.1.0/samtools/._knetfile.h bam2fastq-1.1.0/samtools/knetfile.h bam2fastq-1.1.0/samtools/._kseq.h bam2fastq-1.1.0/samtools/kseq.h bam2fastq-1.1.0/samtools/._ksort.h bam2fastq-1.1.0/samtools/ksort.h bam2fastq-1.1.0/samtools/._kstring.c bam2fastq-1.1.0/samtools/kstring.c bam2fastq-1.1.0/samtools/._kstring.h bam2fastq-1.1.0/samtools/kstring.h bam2fastq-1.1.0/samtools/._Makefile bam2fastq-1.1.0/samtools/Makefile bam2fastq-1.1.0/samtools/._Makefile.mingw bam2fastq-1.1.0/samtools/Makefile.mingw bam2fastq-1.1.0/samtools/._misc bam2fastq-1.1.0/samtools/misc/ bam2fastq-1.1.0/samtools/NEWS bam2fastq-1.1.0/samtools/._razf.c bam2fastq-1.1.0/samtools/razf.c bam2fastq-1.1.0/samtools/._razf.h bam2fastq-1.1.0/samtools/razf.h bam2fastq-1.1.0/samtools/._razip.c bam2fastq-1.1.0/samtools/razip.c bam2fastq-1.1.0/samtools/._sam.c bam2fastq-1.1.0/samtools/sam.c bam2fastq-1.1.0/samtools/._sam.h bam2fastq-1.1.0/samtools/sam.h bam2fastq-1.1.0/samtools/._sam_header.c bam2fastq-1.1.0/samtools/sam_header.c bam2fastq-1.1.0/samtools/._sam_header.h bam2fastq-1.1.0/samtools/sam_header.h bam2fastq-1.1.0/samtools/._sam_view.c bam2fastq-1.1.0/samtools/sam_view.c bam2fastq-1.1.0/samtools/._samtools.1 bam2fastq-1.1.0/samtools/samtools.1 bam2fastq-1.1.0/samtools/samtools.txt bam2fastq-1.1.0/samtools/._win32 bam2fastq-1.1.0/samtools/win32/ bam2fastq-1.1.0/samtools/win32/._xcurses.h bam2fastq-1.1.0/samtools/win32/xcurses.h bam2fastq-1.1.0/samtools/win32/._zconf.h bam2fastq-1.1.0/samtools/win32/zconf.h bam2fastq-1.1.0/samtools/win32/._zlib.h bam2fastq-1.1.0/samtools/win32/zlib.h bam2fastq-1.1.0/samtools/misc/._blast2sam.pl bam2fastq-1.1.0/samtools/misc/blast2sam.pl bam2fastq-1.1.0/samtools/misc/._bowtie2sam.pl bam2fastq-1.1.0/samtools/misc/bowtie2sam.pl bam2fastq-1.1.0/samtools/misc/._export2sam.pl bam2fastq-1.1.0/samtools/misc/export2sam.pl bam2fastq-1.1.0/samtools/misc/._interpolate_sam.pl bam2fastq-1.1.0/samtools/misc/interpolate_sam.pl bam2fastq-1.1.0/samtools/misc/._Makefile bam2fastq-1.1.0/samtools/misc/Makefile bam2fastq-1.1.0/samtools/misc/._maq2sam.c bam2fastq-1.1.0/samtools/misc/maq2sam.c bam2fastq-1.1.0/samtools/misc/._md5.c bam2fastq-1.1.0/samtools/misc/md5.c bam2fastq-1.1.0/samtools/misc/._md5.h bam2fastq-1.1.0/samtools/misc/md5.h bam2fastq-1.1.0/samtools/misc/._md5fa.c bam2fastq-1.1.0/samtools/misc/md5fa.c bam2fastq-1.1.0/samtools/misc/._novo2sam.pl bam2fastq-1.1.0/samtools/misc/novo2sam.pl bam2fastq-1.1.0/samtools/misc/._psl2sam.pl bam2fastq-1.1.0/samtools/misc/psl2sam.pl bam2fastq-1.1.0/samtools/misc/._sam2vcf.pl bam2fastq-1.1.0/samtools/misc/sam2vcf.pl bam2fastq-1.1.0/samtools/misc/._samtools.pl bam2fastq-1.1.0/samtools/misc/samtools.pl bam2fastq-1.1.0/samtools/misc/._soap2sam.pl bam2fastq-1.1.0/samtools/misc/soap2sam.pl bam2fastq-1.1.0/samtools/misc/._varfilter.py bam2fastq-1.1.0/samtools/misc/varfilter.py bam2fastq-1.1.0/samtools/misc/._wgsim.c bam2fastq-1.1.0/samtools/misc/wgsim.c bam2fastq-1.1.0/samtools/misc/._wgsim_eval.pl bam2fastq-1.1.0/samtools/misc/wgsim_eval.pl bam2fastq-1.1.0/samtools/misc/._zoom2sam.pl bam2fastq-1.1.0/samtools/misc/zoom2sam.pl bam2fastq-1.1.0/samtools/examples/._00README.txt bam2fastq-1.1.0/samtools/examples/00README.txt bam2fastq-1.1.0/samtools/examples/._bam2bed.c bam2fastq-1.1.0/samtools/examples/bam2bed.c bam2fastq-1.1.0/samtools/examples/._calDepth.c bam2fastq-1.1.0/samtools/examples/calDepth.c bam2fastq-1.1.0/samtools/examples/._ex1.fa bam2fastq-1.1.0/samtools/examples/ex1.fa bam2fastq-1.1.0/samtools/examples/._ex1.sam.gz bam2fastq-1.1.0/samtools/examples/ex1.sam.gz bam2fastq-1.1.0/samtools/examples/._Makefile bam2fastq-1.1.0/samtools/examples/Makefile bam2fastq-1.1.0/samtools/examples/._toy.fa bam2fastq-1.1.0/samtools/examples/toy.fa bam2fastq-1.1.0/samtools/examples/._toy.sam bam2fastq-1.1.0/samtools/examples/toy.sam |
![]() |
![]() |
![]() |
#20 |
Member
Location: North Sea Join Date: Apr 2008
Posts: 34
|
![]()
Problem solved!
The download file at the webpage isn't working. I downloaded again from the "Version History" 1.1.0, is working. Maybe the admin didn't realize it. |
![]() |
![]() |
![]() |
Thread Tools | |
|
|