SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   How to quick index the sam record according to the read name? (http://seqanswers.com/forums/showthread.php?t=5868)

genelab 07-08-2010 05:06 AM

How can I quickly index SAM records by read name?

Assume I have a read named "afaNma_1", and this read has a record in a SAM-format file.

I want to index the file and quickly retrieve the SAM record for read "afaNma_1". Can anyone tell me how I should do this?


Thanks

Bio.X2Y 07-08-2010 11:28 AM

I'm not aware of anything convenient for doing this, but someone else might be able to shed light.

If you are comfortable with programming, I'd sort the file by read name, leaving the header records on top (samtools probably has something for this). Then write a program to pluck out your target record using a binary search algorithm (http://en.wikipedia.org/wiki/Binary_search_algorithm). Java has a RandomAccessFile class for quickly accessing arbitrary file bytes, though I'm sure other languages have their equivalents. The tricky part will be finding the start of the record that contains an arbitrary byte: you will have to work backwards until you hit a newline or the start of the file.

I know this isn't creating an index, but it should be lightning fast for practical purposes.
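
The approach above could be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: the function names (find_body_start, line_start, find_records) are made up for illustration, the file is uncompressed SAM text, and the alignment records are sorted by read name in plain byte order (for example with LC_ALL=C sort on the non-header lines). Note that "samtools sort -n" may order names containing numbers differently from a plain byte comparison, in which case the comparison in the loop would have to be changed to match.

```python
import os


def find_body_start(f):
    """Return the byte offset of the first non-header line ('@' lines are headers)."""
    f.seek(0)
    pos = 0
    while True:
        line = f.readline()
        if not line or not line.startswith(b"@"):
            return pos
        pos = f.tell()


def line_start(f, pos, floor):
    """Walk backwards from pos to the start of the line containing byte pos."""
    while pos > floor:
        f.seek(pos - 1)
        if f.read(1) == b"\n":
            break
        pos -= 1
    return pos


def find_records(path, name):
    """Binary-search a name-sorted SAM file; return every record with this read name."""
    target = name.encode()
    with open(path, "rb") as f:
        lo = find_body_start(f)
        f.seek(0, os.SEEK_END)
        hi = f.tell()
        # Invariant: lo and hi are line starts (or end of file); every line
        # before lo sorts below target, every line from hi on sorts >= target.
        while lo < hi:
            mid = (lo + hi) // 2
            start = line_start(f, mid, lo)
            f.seek(start)
            qname = f.readline().split(b"\t", 1)[0]
            if qname < target:
                lo = f.tell()  # continue after this line
            else:
                hi = start
        # lo now points at the first line whose name is >= target.
        f.seek(lo)
        records = []
        for line in iter(f.readline, b""):
            if line.split(b"\t", 1)[0] != target:
                break
            records.append(line.decode().rstrip("\r\n"))
        return records
```

Calling find_records("sorted.sam", "afaNma_1") (with "sorted.sam" standing in for your own file) would return a list of matching lines, or an empty list if the read is absent. Paired reads that share a name sit on adjacent lines after sorting, which is why the sketch collects all consecutive matches rather than stopping at the first one.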

brentp 07-09-2010 08:17 AM

You can use this if you have Python and tokyo-cabinet:
http://github.com/brentp/bio-playgro...ter/fileindex/
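
For readers without that setup, the general idea of a name-to-offset index can be sketched with the Python standard library alone. To be clear, this is not the code from the linked repository: it is an illustrative sketch that keeps the index in an in-memory dict instead of tokyo-cabinet, and the function names (build_name_index, fetch_records) are made up. It assumes an uncompressed SAM file; the file does not need to be sorted.

```python
def build_name_index(path):
    """Map each read name to the byte offsets of its records in the SAM file."""
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            if line.startswith(b"@"):
                continue  # skip header lines
            qname = line.split(b"\t", 1)[0].decode()
            index.setdefault(qname, []).append(offset)
    return index


def fetch_records(path, index, name):
    """Seek straight to each stored offset and read one record per offset."""
    records = []
    with open(path, "rb") as f:
        for offset in index.get(name, []):
            f.seek(offset)
            records.append(f.readline().decode().rstrip("\r\n"))
    return records
```

Building the index takes one pass over the file; after that, each lookup is a dict access plus one seek per record. For files too large to index in memory, the same name-to-offset mapping could be written to an on-disk key-value store instead.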

maubp 07-13-2010 04:46 AM

I have some still-experimental code for Biopython that indexes SAM/BAM files by read name here: http://github.com/peterjc/biopython/...-sam-bam-index


All times are GMT -8.