SEQanswers

10-23-2010, 08:16 PM   #1
cliff
Member
 
Location: USA

Join Date: Oct 2009
Posts: 41
Question about BED format - chromStart and End

I am a bit confused about the chromStart and chromEnd positions in the BED format.

According to UCSC:
chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.

Suppose I download a BED file for a gene from UCSC with the following coordinates:

chromStart: 300
chromEnd: 500

Now, I get a set of SNPs by mapping reads to hg18 and calling SNPs with some SNP caller. I want to know how many SNPs were called within the gene above. Should I compare each SNP position with the gene range as

300<=SNP_POSITION<=500

or

301<=SNP_POSITION<=500

or

301<=SNP_POSITION<=499

?

Does anyone know which is correct?

Thanks
10-24-2010, 09:31 AM   #2
ffinkernagel
Senior Member
 
Location: Marburg, Germany

Join Date: Oct 2009
Posts: 110

300 <= x < 500

So the first base is no. 300, the last base is no. 499, and the range covers 200 bases, just as the doc says. You can think of the coordinates as marks between the bases.
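
As a rough illustration of this check, here is a minimal Python sketch. It assumes the SNP positions are already 0-based, like the BED coordinates; the function name and the sample positions are made up for the example.

```python
def in_bed_interval(pos0, chrom_start, chrom_end):
    """Return True if a 0-based position lies in the half-open
    BED interval [chrom_start, chrom_end)."""
    return chrom_start <= pos0 < chrom_end


chrom_start, chrom_end = 300, 500

# Hypothetical 0-based SNP positions around the interval boundaries.
snps0 = [299, 300, 499, 500]

hits = [p for p in snps0 if in_bed_interval(p, chrom_start, chrom_end)]
print(hits)                     # [300, 499]

# With a half-open interval, the length is simply end minus start.
print(chrom_end - chrom_start)  # 200
```

Position 500 is excluded because chromEnd is not part of the feature.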
10-24-2010, 11:53 AM   #3
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

301st<=snp_position<=500th

EDIT:

BED is always 0-based. The first base in a sequence has coordinate 0 and therefore coordinate 300 denotes the 301st base. A more obvious example is

0 1

which denotes the first base.
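
To make the ordinal reading concrete, here is a small Python sketch. It assumes the SNP caller reports 1-based positions (VCF, for instance, uses 1-based POS values); the function name and sample positions are only illustrative.

```python
def snp_in_bed_interval(pos1, chrom_start, chrom_end):
    """Return True if a 1-based SNP position falls inside the 0-based,
    half-open BED interval [chrom_start, chrom_end).

    A 1-based position pos1 corresponds to the 0-based position pos1 - 1,
    so chrom_start <= pos1 - 1 < chrom_end is the same as
    chrom_start < pos1 <= chrom_end.
    """
    return chrom_start < pos1 <= chrom_end


# Hypothetical 1-based SNP positions around the interval boundaries.
snps1 = [300, 301, 500, 501]

count = sum(snp_in_bed_interval(p, 300, 500) for p in snps1)
print(count)  # 2 (the 301st and 500th bases)
```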

Last edited by lh3; 10-25-2010 at 05:14 AM.
10-25-2010, 04:38 AM   #4
Hena
Member
 
Location: Finland

Join Date: Nov 2009
Posts: 19

It depends on how the positions are defined: 0-based or 1-based? If 1-based, then 301 <= SNP_POSITION <= 500 is the range you had in the BED file.

Last edited by Hena; 10-25-2010 at 04:43 AM.
10-25-2010, 04:40 AM   #5
ndaniel
Member
 
Location: Helsinki

Join Date: Feb 2009
Posts: 33

300 <= x < 500
10-25-2010, 06:50 AM   #6
ndaniel
Member
 
Location: Helsinki

Join Date: Feb 2009
Posts: 33

Quote:
Originally Posted by lh3
301st<=snp_position<=500th

EDIT:

BED is always 0-based. The first base in a sequence has coordinate 0 and therefore coordinate 300 denotes the 301st base. A more obvious example is

0 1

which denotes the first base.
If everything is zero-based, then taking the range 3 (instead of 300) to 5 (instead of 500) as an example:


0123456789 <-- positions
---gg----- <-- gene in range [3,5)


Therefore, with zero-based positions, the correct answer is 300 <= x < 500!
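
The same picture can be reproduced in a few lines of Python, whose range() and slicing happen to use the same half-open convention. This is only a sketch of the toy example; the variable names are arbitrary.

```python
positions = "0123456789"
start, end = 3, 5  # toy interval [3, 5), standing in for [300, 500)

# Zero-based positions covered by the half-open interval.
covered = list(range(start, end))
print(covered)  # [3, 4]

# Draw the gene track: 'g' for covered positions, '-' elsewhere.
track = "".join("g" if start <= i < end else "-" for i in range(len(positions)))
print(positions)
print(track)    # ---gg-----
```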
All times are GMT -8.