What is the best tool for counting SNPs per interval? I have a set of SNPs and a list of intervals on chromosomes, and I want to determine how many SNPs fall in each interval. So far I have been using my own code, which is very inefficient and takes days to run. Is there a program that will do this?
-
If you have the chromosome co-ordinates of your SNPs, you could use Bedtools.
Save your list of SNPs in bed format as SNP.bed
chr1 100 101 rs1
chr1 105 106 rs2
chr1 110 111 rs3
chr1 5000 5001 rs_not_in_interval
chr2 100 101 rs4
chr2 105 106 rs5
chr2 110 111 rs6
chr2 120 121 rs7
chr2 400 401 rs_not_in_interval
Save your list of intervals in bed format as Intervals.bed
chr1 99 120
chr2 11 130
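One thing that is easy to get wrong at this step is the delimiter: bedtools expects BED columns to be tab-separated, and the awk -F"\t" step further down relies on that too. Below is a minimal sketch of writing the two example files with real tabs and sanity-checking them, assuming a POSIX shell with printf and awk. The check_bed helper is a made-up name for this illustration and is not part of bedtools.

```shell
# Write the example SNPs as tab-separated BED (chrom, start, end, name).
# printf reuses the format string for every group of four arguments.
printf '%s\t%s\t%s\t%s\n' \
  chr1 100 101 rs1 \
  chr1 105 106 rs2 \
  chr1 110 111 rs3 \
  chr1 5000 5001 rs_not_in_interval \
  chr2 100 101 rs4 \
  chr2 105 106 rs5 \
  chr2 110 111 rs6 \
  chr2 120 121 rs7 \
  chr2 400 401 rs_not_in_interval \
  > SNP.bed

# Write the example intervals (chrom, start, end).
printf '%s\t%s\t%s\n' \
  chr1 99 120 \
  chr2 11 130 \
  > Intervals.bed

# Hypothetical helper: fail if any line has fewer than 3 tab-separated
# fields, non-integer coordinates, or start >= end.
check_bed() {
  awk -F'\t' '
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
      printf "%s: bad line %d\n", FILENAME, NR
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

check_bed SNP.bed && check_bed Intervals.bed && echo "BED files look OK"
```

A file pasted from a web page often ends up space-separated, in which case check_bed reports every line as bad.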
Then use bedtools intersectBed:
Code:
intersectBed -a SNP.bed -b Intervals.bed -wb >SNPs.in.intervals.bed
chr1 100 101 rs1 chr1 99 120
chr1 105 106 rs2 chr1 99 120
chr1 110 111 rs3 chr1 99 120
chr2 100 101 rs4 chr2 11 130
chr2 105 106 rs5 chr2 11 130
chr2 110 111 rs6 chr2 11 130
chr2 120 121 rs7 chr2 11 130
To go one further and count how many SNPs are in each interval:
Code:
intersectBed -a SNP.bed -b Intervals.bed -wb | awk -F"\t" '{print$5" "$6" "$7}' | uniq -c
3 chr1 99 120
4 chr2 11 130
First column gives counts of SNPs in each interval.
Last edited by rbagnall; 02-05-2014, 06:49 PM.
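Note that the uniq -c pipeline only lists intervals that contain at least one SNP, and it counts correctly only when lines for the same interval are adjacent. As a cross-check that does not need bedtools, the counting rule can be sketched in plain awk, assuming BED's 0-based, half-open coordinates: a SNP overlaps an interval when its start is below the interval end and its end is above the interval start. This is a stand-in for illustration, not how bedtools works internally, and the nested loop is only sensible for small inputs. The function name count_snps is made up for this example.

```shell
# count_snps SNPS.bed INTERVALS.bed
# Prints count, chrom, start, end (tab-separated) for every interval,
# including intervals with zero SNPs.
count_snps() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR {                 # first file on the command line: intervals
      n++
      chrom[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; cnt[n] = 0
      next
    }
    {                           # second file: SNPs, tested against each interval
      for (i = 1; i <= n; i++)
        if ($1 == chrom[i] && $2 + 0 < e[i] && $3 + 0 > s[i])
          cnt[i]++
    }
    END {
      for (i = 1; i <= n; i++)
        print cnt[i], chrom[i], s[i], e[i]
    }
  ' "$2" "$1"
}

# Demo on the example data from this thread, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\t%s\t%s\t%s\n' \
  chr1 100 101 rs1  chr1 105 106 rs2  chr1 110 111 rs3 \
  chr1 5000 5001 rs_not_in_interval \
  chr2 100 101 rs4  chr2 105 106 rs5  chr2 110 111 rs6  chr2 120 121 rs7 \
  chr2 400 401 rs_not_in_interval > "$tmp/SNP.bed"
printf '%s\t%s\t%s\n' chr1 99 120  chr2 11 130 > "$tmp/Intervals.bed"

count_snps "$tmp/SNP.bed" "$tmp/Intervals.bed"
# 3	chr1	99	120
# 4	chr2	11	130

rm -rf "$tmp"
```

If bedtools is installed, bedtools intersect -a Intervals.bed -b SNP.bed -c gives a per-interval count directly, with the intervals as the -a file, and it also reports intervals with a count of 0.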
-
That's what I usually do.