SEQanswers

Old 05-09-2018, 03:04 AM   #1
Joseph White
Junior Member
 
Location: USA

Join Date: Jun 2016
Posts: 5
best way to index tab-delimited text file

What is the best way to index a tab-delimited text file containing chromosome, position, and variant data? My files are too large to hold in memory, so indexing seems to be the only viable option.

jwhite
Old 05-09-2018, 03:18 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,745

tabix from Samtools.
Old 05-09-2018, 05:50 AM   #3
Joseph White
Junior Member
 
Location: USA

Join Date: Jun 2016
Posts: 5

Quote:
Originally Posted by GenoMax
tabix from Samtools.
The file is not in BED, GFF, or SAM format.
Old 05-09-2018, 06:00 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,745

Tabix can accept these formats: -p gff|bed|sam|vcf
Old 05-09-2018, 06:11 AM   #5
Joseph White
Junior Member
 
Location: USA

Join Date: Jun 2016
Posts: 5

Quote:
Originally Posted by GenoMax
Tabix can accept these formats: -p gff|bed|sam|vcf
Oh, that's right. All I have to do is add a third column of '.' to make it VCF. Thanks.
Old 05-12-2018, 09:57 PM   #6
finswimmer
Member
 
Location: Europe

Join Date: Oct 2016
Posts: 47

Hello,

Quote:
Originally Posted by GenoMax
Tabix can accept these formats: -p gff|bed|sam|vcf
These are just presets. You can also specify which columns hold the chromosome (sequence name), the start position, and the end position.

So if the chromosome name is in the first column and the position (begin == end) is in the second column, you can index the file like this:

Code:
tabix -s1 -b2 -e2 my_file.gz
This way, tabix can index any tab-delimited file that contains sorted positional data, provided the file is compressed with bgzip. You can also state whether the positions are 0-based or 1-based: 1-based is the default, so pass "-0" if your positions are 0-based.
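The full workflow around that command can be sketched roughly as below. This is only an illustration under some assumptions: the file names and the three sample rows are made up, the positions are taken to be 1-based, and bgzip and tabix (both shipped with HTSlib) are assumed to be on the PATH. If they are missing, the script only performs the sort.

```shell
#!/bin/sh
set -eu

# Hypothetical input: chromosome in column 1, 1-based position in column 2,
# variant data in the remaining column (file name and rows are invented).
printf 'chr2\t500\tA>G\nchr1\t2000\tC>T\nchr1\t100\tG>A\n' > variants.tsv

# tabix expects rows grouped by chromosome and ordered by position.
TAB="$(printf '\t')"
LC_ALL=C sort -t "$TAB" -k1,1 -k2,2n variants.tsv > variants.sorted.tsv

if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
    # Compress with bgzip; -c writes to stdout so the sorted file is kept.
    bgzip -c variants.sorted.tsv > variants.sorted.tsv.gz

    # Index: sequence name in column 1, begin and end both in column 2.
    # -f overwrites an existing index (variants.sorted.tsv.gz.tbi).
    tabix -f -s1 -b2 -e2 variants.sorted.tsv.gz

    # Retrieve only the rows on chr1 between positions 1 and 1000.
    tabix variants.sorted.tsv.gz chr1:1-1000
else
    echo "bgzip and tabix (HTSlib) are needed for the indexing step" >&2
fi
```

Once the index exists, region queries like the last command read only the relevant compressed blocks, so the whole file never has to be loaded into memory.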

fin swimmer
Old 05-13-2018, 02:24 AM   #7
sam657
Junior Member
 
Location: Asia

Join Date: May 2018
Posts: 1

Are these methods still working for you?
Old 05-13-2018, 07:59 PM   #8
finswimmer
Member
 
Location: Europe

Join Date: Oct 2016
Posts: 47

Quote:
Originally Posted by sam657
Are these methods still working for you?
Yes, why shouldn't they? It's a documented feature. The -p parameter is just shorthand for these options. So in the case above, "-p vcf" would also work, since the chromosome name is in column 1 and the position is in column 2, just as in a VCF. Tabix doesn't care about the other columns, and it doesn't check whether the file is a valid VCF.

fin swimmer

Tags
index, tab-delimited
