![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Beginner question about Picard AddOrReplaceReadGroups | yeppomonkey | Bioinformatics | 1 | 09-03-2012 11:59 PM |
Bowtie beginner question... | milesgr | General | 6 | 03-14-2012 07:21 AM |
samtools sort bam file error: generate non-existent file | mediator | Bioinformatics | 0 | 03-05-2012 09:42 PM |
Beginner Sequencing Analysis question | mgibson | General | 0 | 06-17-2011 10:30 AM |
1000 genomes - a beginner question | EHC | Bioinformatics | 3 | 05-26-2011 01:51 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Junior Member
Location: Germany Join Date: Sep 2012
Posts: 5
|
![]()
I found a really interesting published ChIP-Seq file I'd like to load into UCSCs genome browser. The file has the following content
(just showing the first few lines) chr11 41481370 41481395 U0 0 + chr15 93881170 93881195 U0 0 - chr17 70890479 70890504 U0 0 + chr2 170407262 170407287 U0 0 - chr17 33602056 33602081 U0 0 + chr11 92554663 92554688 U0 0 - chr14 22768281 22768306 U0 0 + chr14 28429583 28429608 U0 0 - chr14 98610846 98610871 U0 0 + chr7 100735751 100735776 U0 0 + chr9 46391869 46391894 U0 0 + So apparently a list of single 25 base reads. All my other files I loaded into UCSC had the following structure: track type=wiggle_0 name="DHS WT naive" color=0,0,0 chr1 4486200 4486399 0.3 chr1 4486400 4486599 0.8 chr1 4486600 4486799 0.8 chr1 4486800 4486999 0.4 chr1 4758400 4758599 1.2 chr1 4758600 4758799 1.2 chr1 4758800 4758999 0.2 chr1 4759000 4759199 0.2 chr1 4759600 4759799 0.2 Apparently, some sort of sum was created from the single reads. What tools do I need to create the former file into the latter? Windows-based would be nice, but could also be on Linux. My aplogies if this was already mentioned, but I didn't find anything during my search for the problem. |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: Vancouver, BC Join Date: Mar 2010
Posts: 275
|
![]()
You will need to figure out which field in the first file is the value you want in your wiggle file (I just assumed it was the 5th field below). If you have a file called "Chip-Seq.txt" you could do something like:
Code:
grep "chr11" Chip-Seq.txt | cut -f1-3,5 > Chip-Seq_chr11.wig |
![]() |
![]() |
![]() |
#3 |
Junior Member
Location: Germany Join Date: Sep 2012
Posts: 5
|
![]()
Thanks for your answer, SES.
The thing is, it's not just a matter of re-formatting. When I sort the file of interest in terms of chromosome and read-position, it looks like this (Sorry for not mentioning this before): chr1 3001356 3001381 U0 0 - chr1 3001356 3001381 U0 0 - chr1 3001356 3001381 U0 0 - chr1 3001356 3001381 U0 0 - chr1 3007329 3007354 U0 0 + chr1 3013242 3013267 U0 0 - chr1 3016493 3016518 U0 0 - chr1 3016493 3016518 U0 0 - chr1 3016493 3016518 U0 0 - chr1 3053811 3053836 U0 0 + chr1 3053930 3053955 U0 0 - chr1 3053930 3053955 U0 0 - chr1 3053930 3053955 U0 0 - So the task is: Realize, that chr1 3001356 to 3001381 is measured 4 times and give an according score etc. Any advice how to continue? PS: I got the original file from http://www.ncbi.nlm.nih.gov/geo/quer...i?acc=GSE27158 |
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: Vancouver, BC Join Date: Mar 2010
Posts: 275
|
![]()
Thanks for the information. What are you wanting to do with the 4 scores for each position? From this snippet it looks like the scores are the same and you could just remove the duplicate entries in the original file. Is this what you want? If so, you could just add one command to the original (assuming you are on a *nix machine).
Original file: Code:
$ cat Chip-Seq.txt chr1 3001356 3001381 U0 0 - chr1 3001356 3001381 U0 0 - chr1 3001356 3001381 U0 0 - chr1 3001356 3001381 U0 0 - chr1 3007329 3007354 U0 0 + chr1 3013242 3013267 U0 0 - chr1 3016493 3016518 U0 0 - chr1 3016493 3016518 U0 0 - chr1 3016493 3016518 U0 0 - chr1 3053811 3053836 U0 0 + chr1 3053930 3053955 U0 0 - chr1 3053930 3053955 U0 0 - chr1 3053930 3053955 U0 0 - Code:
$ uniq Chip-Seq.txt | grep "chr1" | cut -f1-3,5 > Chip-Seq_chr1_uniq.wig chr1 3001356 3001381 0 chr1 3007329 3007354 0 chr1 3013242 3013267 0 chr1 3016493 3016518 0 chr1 3053811 3053836 0 chr1 3053930 3053955 0 |
![]() |
![]() |
![]() |
#5 |
Senior Member
Location: Sweden Join Date: Mar 2008
Posts: 324
|
![]()
The data is in BED format with one line per alignment. You can upload it to UCSC as a BED file but to get it in wiggle format you need to use some type of peak finder. There are online tools that can use the SRA URL, I know cistrome.org accepts BED file and can produce wiggle files using the MACS peak finder.
|
![]() |
![]() |
![]() |
#6 |
Junior Member
Location: Germany Join Date: Sep 2012
Posts: 5
|
![]()
For me as a layperson it seems that the number of reads does correspond to the signal intensity. Is that correct?
The file comes from a paper where they performed chromatin-immunoprecipitation for STAT5, followed by sequencing. Is it a valid approach to count the number of individual reads? Code:
chr1 3001356 3001381 U0 0 - chr1 3001356 3001381 U0 0 - chr1 3001356 3001381 U0 0 - chr1 3001356 3001381 U0 0 - chr1 3007329 3007354 U0 0 + chr1 3013242 3013267 U0 0 - chr1 3016493 3016518 U0 0 - chr1 3016493 3016518 U0 0 - chr1 3016493 3016518 U0 0 - Code:
chr1 3001356 3001381 4 chr1 3007329 3007354 1 chr1 3013242 3013267 1 chr1 3016493 3016518 3 Thanks for your support so far, SES! |
![]() |
![]() |
![]() |
#7 | |
Senior Member
Location: Vancouver, BC Join Date: Mar 2010
Posts: 275
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#8 |
Senior Member
Location: Sweden Join Date: Mar 2008
Posts: 324
|
![]()
You want to get the count of all reads surrounding a bindingsite, in your example you will just count all reads with the same alignment position. Email the author and ask him to send the processed data, if it is not available as supplementary to the article.
|
![]() |
![]() |
![]() |
#9 | |
Senior Member
Location: Vancouver, BC Join Date: Mar 2010
Posts: 275
|
![]() Quote:
Call this script "bed2wig.pl" (or whatever you want): Code:
#!/usr/bin/env perl use strict; use warnings; use Getopt::Long; my $usage = "$0 -i BEDfile -o wiggle_file\n"; my $infile; my $outfile; GetOptions( 'i|infile=s' => \$infile, 'o|outfile=s' => \$outfile, ); die $usage if !$infile or !$outfile; open(my $in, '<', $infile) or die "\nERROR: Could not open file: $infile\n"; open(my $out, '>', $outfile) or die "\nERROR: Could not open file: $outfile\n"; my %mapped; print $out "track type=wiggle_0 name=\"DHS WT naive\" color=0,0,0\n"; # change this line to whatever you need while (my $line = <$in>) { chomp $line; my @line = split('\s+', $line); my $read = join("|",@line); $mapped{$read}++; } for my $key (sort keys %mapped) { my @pos = split(/\|/, $key); print $out join("\t",($pos[0],$pos[1],$pos[2],$mapped{$key})),"\n"; } close($in); close($out); Code:
perl bed2wig.pl -i Chip-Seq.txt -o Chip-Seq.wig Code:
$ cat Chip-Seq.txt chr1 3001356 3001381 U0 0 - chr1 3001356 3001381 U0 0 - chr1 3001356 3001381 U0 0 - chr1 3001356 3001381 U0 0 - chr1 3007329 3007354 U0 0 + chr1 3013242 3013267 U0 0 - chr1 3016493 3016518 U0 0 - chr1 3016493 3016518 U0 0 - chr1 3016493 3016518 U0 0 - Code:
$ cat Chip-Seq.wig track type=wiggle_0 name="DHS WT naive" color=0,0,0 chr1 3001356 3001381 4 chr1 3007329 3007354 1 chr1 3013242 3013267 1 chr1 3016493 3016518 3 |
|
![]() |
![]() |
![]() |
#10 |
Junior Member
Location: Germany Join Date: Sep 2012
Posts: 5
|
![]()
Thanks for your ideas.
I'll start with cistrome.org and see how far I get! |
![]() |
![]() |
![]() |
#11 |
Member
Location: Boston, MA Join Date: May 2009
Posts: 75
|
![]()
You can use igvtools to compute coverage from a bed file. I think bedtools can also do this.
|
![]() |
![]() |
![]() |
#12 |
Junior Member
Location: Germany Join Date: Sep 2012
Posts: 5
|
![]()
My problem is solved, many thanks for your advice.
www.cistrome.org offers a GUI and is just the right thing, if one just wants to convert a couple of BED files. |
![]() |
![]() |
![]() |
Thread Tools | |
|
|