If you have a set of SNPs and a list of intervals on chromosomes, what is the best tool for determining how many SNPs are in each interval? So far I have been writing my own code, which has been very inefficient: it takes days to run. Is there a program that will do this?
-
If you have the chromosome co-ordinates of your SNPs, you could use Bedtools.
Save your list of SNPs in BED format (tab-delimited) as SNP.bed:
chr1 100 101 rs1
chr1 105 106 rs2
chr1 110 111 rs3
chr1 5000 5001 rs_not_in_interval
chr2 100 101 rs4
chr2 105 106 rs5
chr2 110 111 rs6
chr2 120 121 rs7
chr2 400 401 rs_not_in_interval
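If your SNPs are currently a list of single positions rather than BED, something like the following could do the conversion. This is only a sketch: the input file name `snp_positions.tsv` and its three-column layout (chromosome, 1-based position, SNP ID) are assumptions, not something from the original question. BED uses 0-based starts and exclusive ends, so a 1-based position p becomes start p-1 and end p.

```shell
# Hypothetical input: chromosome, 1-based position, SNP ID (tab-delimited).
printf '%s\t%s\t%s\n' \
  chr1 101 rs1 \
  chr1 106 rs2 \
  chr2 101 rs4 > snp_positions.tsv

# BED start = position - 1 (0-based), BED end = position (exclusive).
awk -F'\t' -v OFS='\t' '{ print $1, $2 - 1, $2, $3 }' \
  snp_positions.tsv > SNP.from_positions.bed

cat SNP.from_positions.bed
```

Check which coordinate system your SNP source uses before converting, since an off-by-one here changes which SNPs land on interval boundaries.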
Save your list of intervals in BED format as Intervals.bed:
chr1 99 120
chr2 11 130
Then use bedtools intersectBed:
Code:
intersectBed -a SNP.bed -b Intervals.bed -wb > SNPs.in.intervals.bed
chr1 100 101 rs1 chr1 99 120
chr1 105 106 rs2 chr1 99 120
chr1 110 111 rs3 chr1 99 120
chr2 100 101 rs4 chr2 11 130
chr2 105 106 rs5 chr2 11 130
chr2 110 111 rs6 chr2 11 130
chr2 120 121 rs7 chr2 11 130
To go one further and count how many SNPs are in each interval:
Code:
intersectBed -a SNP.bed -b Intervals.bed -wb | awk -F"\t" '{print $5" "$6" "$7}' | uniq -c
3 chr1 99 120
4 chr2 11 130
First column gives counts of SNPs in each interval.
Last edited by rbagnall; 02-05-2014, 06:49 PM.
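For a small data set, the counts can also be cross-checked without bedtools. The sketch below recreates the example files from this post and counts overlaps in plain awk; the output file name `counts.tsv` is my own choice. It compares every SNP against every interval, so it is only meant as a sanity check, not as a replacement for bedtools on real data. The commented line shows the `-c` option of `bedtools intersect`, which, if your installed version provides it, should report one count per interval directly.

```shell
# Recreate the example files from the post as tab-delimited BED.
printf '%s\t%s\t%s\t%s\n' \
  chr1 100 101 rs1 \
  chr1 105 106 rs2 \
  chr1 110 111 rs3 \
  chr1 5000 5001 rs_not_in_interval \
  chr2 100 101 rs4 \
  chr2 105 106 rs5 \
  chr2 110 111 rs6 \
  chr2 120 121 rs7 \
  chr2 400 401 rs_not_in_interval > SNP.bed
printf '%s\t%s\t%s\n' chr1 99 120 chr2 11 130 > Intervals.bed

# With bedtools installed, -c appends the number of overlapping -b records
# to each -a interval:
#   bedtools intersect -a Intervals.bed -b SNP.bed -c

# Plain-awk cross-check: read the intervals first, then test each SNP against
# every interval on the same chromosome (half-open BED coordinates).
awk -F'\t' -v OFS='\t' '
  NR == FNR { chrom[FNR] = $1; lo[FNR] = $2 + 0; hi[FNR] = $3 + 0
              count[FNR] = 0; n = FNR; next }
  { for (i = 1; i <= n; i++)
      if ($1 == chrom[i] && ($2 + 0) < hi[i] && ($3 + 0) > lo[i]) count[i]++ }
  END { for (i = 1; i <= n; i++) print chrom[i], lo[i], hi[i], count[i] }
' Intervals.bed SNP.bed > counts.tsv

cat counts.tsv
```

On the example data this gives 3 for chr1 99 120 and 4 for chr2 11 130. Unlike the `uniq -c` pipeline, intervals with no SNPs are still listed, with a count of 0.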
-
That's what I usually do.