**dpryan** · 12-06-2011, 02:58 PM

You could largely convert a BED format file to ELAND format. BED format files don't usually contain anything about mismatches to the reference sequence, so you'd have to fudge that. Also, you'd have to look up the sequence for each read, though that's trivial. Frankly, those are the biggest differences in the formats and I doubt that any of the peak finders actually care about those fields. So, in short, yeah, you could probably convert the file type enough to work with a one line command using awk.
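As a rough illustration of the conversion described above, here is a minimal Python sketch (Python rather than awk, purely for readability). The function name `bed_to_eland_like`, the `.fa` suffix, the placeholder `N` sequence, and the fudged `U0 1 0 0` match fields are all assumptions. So is the column order, which follows the commonly described ELAND result layout (read name, sequence, match code, three match counts, genome file, position, direction, N handling). Check it against what your peak finder actually expects.

```python
def bed_to_eland_like(bed_line, genome_suffix=".fa"):
    """Convert one BED line (3+ columns) to an ELAND-like result line.

    Assumed output layout (not verified against any specific pipeline version):
    >name, sequence, match code, exact/1-mismatch/2-mismatch counts,
    genome file, position, direction (F/R), N-handling field.
    """
    fields = bed_line.split()
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else "read"
    strand = fields[5] if len(fields) > 5 else "+"

    # Fudged: BED carries no read sequence, so use Ns of the right length.
    seq = "N" * (end - start)
    direction = "F" if strand == "+" else "R"

    return "\t".join([
        ">" + name,
        seq,
        "U0", "1", "0", "0",      # fudged: pretend a unique exact match
        chrom + genome_suffix,    # assumption: genome file named like chr4.fa
        str(start + 1),           # BED starts are 0-based; 1-based assumed here
        direction,
        "..",                     # placeholder for the N-handling field
    ])


if __name__ == "__main__":
    print(bed_to_eland_like("chr4 130135336 130135360 U0 0 -"))
```

If the downstream tool needs the real read sequences, the `N` placeholder would have to be replaced by a lookup against the reference genome.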

**sp_wade** · 12-06-2011, 05:16 PM

Hi dpryan,
That really does make sense. I tried fudging those fields and found that it has no effect on the called peaks.
Thanks very much.

**Pravara_@bioinformatics** · 12-12-2011, 03:54 AM

Dear sir,

I am working with ChIP-seq data. I have tried SISSRS, QuEST, MACS, and SICER.

My problem is that I am not able to recognize the files. I have several file formats, all of them ChIP-seq data, but I don't know whether all of these files can be used with all of the software mentioned above. Please let me know what kind of data each of these is.

I know ChIP-seq data is always present in the following format:

chr4 130135336 130135360 U0 0 -
chr1 110547319 110547343 U0 0 -
chr10 63922216 63922240 U0 0 -
chr2 71081880 71081904 U0 0 +

I used SISSRS for such files (bed files).

Now there are also other formats, such as:

1.E2H2.aligned.txt

chr13 81419432 81419468 + 205E9.6.559265 2
chr11 44462781 44462817 + 205E9.6.559267 0
chr1 89426606 89426642 - 205E9.6.559270 3
chr12 103518323 103518359 - 205E9.6.559271 0
chrX 128953935 128953971 - 205E9.6.559272 2
chr19 4888146 4888182 - 205E9.6.559274 5
chr4 137770387 137770423 + 205E9.6.559275 1

2.densities.txt

chr1 25 -1
chr1 50 -1
chr1 75 -1
chr1 100 -1
chr1 125 -1
chr1 150 -1
chr1 175 -1
chr1 200 -1

3.chip3034_multi_hg18.txt

AGAGTGTTTCAAACCTGCTCCATGAA 13000 13
AGACGAAGTCTCACTCTGTCACCCAG 13000 164
ATTCCATTCCACTCTGTTCCATTCCA 11953 24
AGTAACCCTTATTCTACTTAATAATG 13000 2
ATGGTAGTTCACACCTATAATCCCCG 11953 11
ATTGGCCAGATGCAGAGGCTCACACC 11953 9
ATAGCACAAAGGCAATAACACTTAAT 10906 3

I used this file format for QuEST.

4.bed file

chr1 454 489 CCTAACCCTAACCCTCGCGGTACCCTCAGCCGGCC 0 + - - 0,0,255
chr1 512 547 TTTCGGTGGTACTCTGAAGGCGGAGCACAGTTCTC 0 - - - 255,0,0
chr1 512 547 TTTCGGTGGTACTCTGAAGGCGGAGCACAGCTCTC 0 - - - 255,0,0
chr1 512 547 TTTCGGTGGTACTCTGAAGGCGGAGCACAGTTCTC 0 - - - 255,0,0

5.bam files (these files do not open on my system)

6.bed files

6 38662156 38662189 +
8 102050882 102050916 +
16 16805607 16805640 -
10 18950674 18950708 -
4 52586623 52586657 -
8 126508725 126508748 -
5 83713731 83713758 +
1 217224630 217224664 -
2 234129500 234129531 -
5 116295091 116295124 -
17 36024302 36024336 -

7.bed files

chr1 564621 564687 . 0 . 5.575970 3.58854 -1
chr1 569893 569962 . 0 . 7.441230 6.19321 -1
chr1 712868 713455 . 0 . 11.857200 11.4429 -1
chr1 713653 713670 . 0 . 7.278470 4.21542 -1
chr1 713880 714756 . 0 . 87.115402 246.909 -1
chr1 715081 715443 . 0 . 18.861601 21.5467 -1
chr1 761030 763152 . 0 . 99.675797 201.571 -1

8.peaks.txt

chr1 6216808 6219103 985 186 5.29979577395856 799 1.34744732317805e-129
chr6 158010381 158011325 686 65 10.5893955160332 621 1.43057401891788e-129
chr5 33110401 33111074 644 51 12.7903624851984 593 1.50406065933793e-129
chr3 197589215 197590103 652 54 12.2534188623185 598 3.17417576226315e-129
chr3 150539977 150541729 852 129 6.62571157437829 723 3.84605198529492e-129

9.bed file

chr1 5319 6069
chr1 15612 16329
chr1 81077 82406
chr1 227508 228733
chr1 456299 456770
chr1 477582 478232
chr1 501635 501985
chr1 584463 586213

10.bed file

chr14 68535052 68535087 Neg2 1 - 68535052 68535087 153,255,153
chr10 72774109 72774144 Neg3 1 - 72774109 72774144 153,255,153
chr6 163049829 163049864 Pos4 14 + 163049829 163049864 0,0,102
chr7 144599649 144599684 Neg5 1 - 144599649 144599684 153,255,153
chr9 106823345 106823380 Pos6 1 + 106823345 106823380 153,153,255

**dpryan** · 12-12-2011, 05:14 AM

Originally posted by Pravara_@bioinformatics

I know ChIP-seq data is always present in the following format:

chr4 130135336 130135360 U0 0 -
chr1 110547319 110547343 U0 0 -
chr10 63922216 63922240 U0 0 -
chr2 71081880 71081904 U0 0 +

I used SISSRS for such files (bed files)

As you're finding out, there are a LOT of different file formats. Most of these are interchangeable. BED format can have anywhere between 3 and 12 columns. You tend to find data with the first 6 columns, but if you find pre-aligned paired-end sequences, they may have only the first 3 (required) columns. Also, this is all pre-aligned data as raw data will tend to be in fastq format.
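To make the column rule concrete, here is a minimal Python sketch of a sanity check for a single line. The function name `looks_like_bed` is hypothetical, and the check is deliberately loose: it only tests the column count (3 to 12), that start and end are non-negative integers with start not past end, and that a sixth column, if present, is a strand symbol. A line that passes is only syntactically BED-like; this says nothing about what the data means.

```python
def looks_like_bed(line):
    """Loose syntactic check: could this line be a BED record?"""
    fields = line.split()

    # BED has 3 required columns and up to 12 in total.
    if not 3 <= len(fields) <= 12:
        return False

    # Columns 2 and 3 must be non-negative integer coordinates.
    if not (fields[1].isdigit() and fields[2].isdigit()):
        return False
    if int(fields[1]) > int(fields[2]):
        return False

    # Column 6, when present, is the strand.
    if len(fields) >= 6 and fields[5] not in {"+", "-", "."}:
        return False

    return True


if __name__ == "__main__":
    samples = [
        "chr4 130135336 130135360 U0 0 -",             # 6-column BED
        "chr1 5319 6069",                              # 3-column BED
        "chr1 25 -1",                                  # densities.txt line
        "chr13 81419432 81419468 + 205E9.6.559265 2",  # aligned.txt line
    ]
    for s in samples:
        print(looks_like_bed(s), s)
```

Running a check like this over the first few lines of each file is a quick way to separate files that can go straight into a BED-based tool from ones that need reshaping first.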

I'm assuming you're getting these datasets from GEO. If so, the formats of the files are normally described there. Otherwise, I'm not familiar with #1-3. #4 is a BED format file; you could use it in SISSRS as above. #5 is a BAM format file, which can be used directly in things like MACS and can also be converted to BED using bamtools if whatever program you prefer can't read BAM. #6 looks like a modified BED format; it's actually close to the format I usually keep things in. I imagine you can put a "chr" in front of the number in the first column and add two columns of periods between columns 3 and 4 to make it a usable BED file. #7 and #8 look like the output of a peak finder. #9 is probably also the output of a peak finder, since the regions are quite broad and there's no strand information. #10 is another BED file. Presumably it was intended for visualization in the genome browser, since someone bothered to fill in the itemRgb field.
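The fix suggested for #6 can be sketched in a few lines of Python. This is only an illustration under the assumption that every line has exactly four whitespace-separated columns (chromosome number, start, end, strand), as in the sample shown. The function name `to_bed6` is made up for the example.

```python
def to_bed6(line):
    """Turn a '6 38662156 38662189 +' style line into a 6-column BED line."""
    chrom, start, end, strand = line.split()

    # Prefix the chromosome with "chr" unless it already has it.
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom

    # Insert two placeholder columns (name and score) before the strand.
    return "\t".join([chrom, start, end, ".", ".", strand])


if __name__ == "__main__":
    for raw in ["6 38662156 38662189 +", "16 16805607 16805640 -"]:
        print(to_bed6(raw))
```

Some tools may insist on a numeric score column; in that case writing `0` instead of the second period is a reasonable substitute.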

BTW, it's probably best to only compare results within a single peak caller. Otherwise, differences in peaks you see between datasets may be due solely to the different algorithms behind the peak callers. Also, it can sometimes be easier to just realign things yourself and thereby produce a BED or BAM format file, since that's pretty quick.

**sikidiri** · 03-08-2012, 06:43 AM

Technical or biological difference between the two datasets?

Hello,

I have two ChIP-seq samples for the same protein, one in embryonic stem (ES) cells and one in retinoic acid induced cells. I obtained around 800 peaks in ES cells and around 7500 peaks in induced cells. The protocol, antibody, peak calling parameters (MACS), and the person who did the experiments are all the same. The number of reads obtained in the two samples is similar, with a similar level of background. When I look at peaks in my new dataset, they show better enrichment than the old one at the same regions (~50% higher enrichment). I want to know whether this is a real biological difference, or whether deep sequencing is why the new dataset shows good tag enrichment that is not seen in the old one. How can I rule out technical problems, if there are any? Any suggestions are most welcome. Thanks.


problem on file format in ChIP-Seq data analysis
