  • View and manipulate bigWig files

    Dear experts,

    I have downloaded the FAIRE-seq 'Signal' files, which are in bigWig format, from the UCSC Open Chromatin downloads, and I would like to work with them further: specifically, I want to extract the density values from these bigWig files at specific genomic positions. Do you know how I can view these files, or convert them to a format that I can view and manipulate in, for example, R (CRAN)?

    Thank you.

  • #2
    See the 'Extracting Data from the bigWig Format' section of this page:

    http://genome.ucsc.edu/goldenPath/help/bigWig.html



    • #3
      Hi there, you can use bx-python to read BigFile (i.e. bigWig and bigBed). If foo.bigwig is your file, you can do:

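      Code:
      import bx.bbi.bigwig_file
      chrom = "chr1"      # chromosome name as it appears in the bigWig
      csize = 249250621   # chromosome length in bp (e.g. chr1 in hg19)
      with open("foo.bigwig", "rb") as f:
          bwh = bx.bbi.bigwig_file.BigWigFile(f)
          data = bwh.get_as_array(chrom, 0, csize)  # one value per base, NaN where there is no data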
      where chrom is a string naming your chromosome and csize is an integer giving its size. Of course, you can fetch a smaller interval each time. If you need to know the chromosome sizes from the bigWig itself, you can read them from the file header (without bx-python):

      Code:
      import os
      import struct

      def getChromosomeSizesFromBigWig(bwname):
        """Return {chromosome name: size} read from the bigWig header (Python 3)."""
        csize = {}
        with open(os.path.expanduser(bwname), "rb") as fh:
          # read magic number to guess endianness
          magic = fh.read(4)
          if magic == b'&\xfc\x8f\x88':
            endianness = '<'
          elif magic == b'\x88\x8f\xfc&':
            endianness = '>'
          else:
            raise IOError("The file is not in bigwig format")
          # read the rest of the 64-byte header
          (version, zoomLevels, chromosomeTreeOffset,
           fullDataOffset, fullIndexOffset, fieldCount, definedFieldCount,
           autoSqlOffset, totalSummaryOffset, uncompressBufSize, reserved) = struct.unpack(endianness + 'HHQQQHHQQIQ', fh.read(60))
          if version < 3:
            raise IOError("Bigwig files version <3 are not supported")
          # go to the chromosome B+ tree
          fh.seek(chromosomeTreeOffset)
          # read magic again
          magic = fh.read(4)
          if magic == b'\x91\x8c\xcax':
            endianness = '<'
          elif magic == b'x\xca\x8c\x91':
            endianness = '>'
          else:
            raise ValueError("Wrong magic for this bigwig data file")
          (blockSize, keySize, valSize, itemCount, reserved) = struct.unpack(endianness + 'IIIQQ', fh.read(28))

          def readNode(offset):
            # node header: isLeaf, reserved, number of items in this node
            fh.seek(offset)
            (isLeaf, reserved, count) = struct.unpack(endianness + 'BBH', fh.read(4))
            if isLeaf:
              for n in range(count):
                (key, chromId, chromSize) = struct.unpack(endianness + str(keySize) + 'sII', fh.read(keySize + 2 * 4))
                # we have chrom and size; names are padded with NUL bytes
                csize[key.rstrip(b'\x00').decode()] = chromSize
            else:
              # internal node (many chromosomes): items are (key, childOffset)
              children = [struct.unpack(endianness + str(keySize) + 'sQ', fh.read(keySize + 8))[1]
                          for n in range(count)]
              for child in children:
                readNode(child)

          # the root node starts right after the 32-byte tree header
          readNode(fh.tell())
        return csize
      This is based on the specification released along with the bigWig paper.

      HTH
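A minimal sketch of how the "smaller interval each time" idea could be used to pull values at individual SNP positions, with one fetch per window instead of one per SNP or per whole chromosome. It assumes 1-based SNP positions and a fetch function that returns one value per base for a 0-based, half-open interval, which is how get_as_array is called above; values_at_positions and the window size are hypothetical, not part of bx-python.

```python
from typing import Callable, Dict, Iterable, List, Sequence

# fetch(chrom, start, end) is assumed to return one value per base for the
# 0-based, half-open interval [start, end), e.g. bwh.get_as_array from bx-python.
Fetch = Callable[[str, int, int], Sequence[float]]

def values_at_positions(fetch: Fetch, chrom: str, positions: Iterable[int],
                        window: int = 1_000_000) -> Dict[int, float]:
    """Hypothetical helper: look up values at 1-based positions, one fetch per window."""
    by_window: Dict[int, List[int]] = {}
    for pos in positions:
        zero_based = pos - 1                      # convert the 1-based SNP coordinate
        by_window.setdefault(zero_based // window, []).append(pos)
    result: Dict[int, float] = {}
    for w, window_positions in sorted(by_window.items()):
        start = w * window
        end = max(window_positions)               # exclusive 0-based end of the last SNP
        values = fetch(chrom, start, end)         # only as far as this window needs
        for pos in window_positions:
            result[pos] = values[pos - 1 - start]
    return result
```

With bx-python this would be called as values_at_positions(bwh.get_as_array, "chr1", snp_positions) while the file is still open; positions without data may come back as NaN.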



      • #4
        Originally posted by gringer
        See the 'Extracting Data from the bigWig Format' section of this page:

        http://genome.ucsc.edu/goldenPath/help/bigWig.html
        I have tried this, but when I run the 'bigWigToBedGraph' utility from the UCSC downloads as described (bigWigToBedGraph wgEncodeOpenChromFaireGm12878Sig.bigWig out.bedGraph), it runs for a very long time and then reports that it crashed. I have also tried restricting the output by chromosome or position, but it still freezes. Could this be because the bigWig file is too large? Do you know if there is a way to extract the density only for a list of SNPs?

        Thank you.
        Last edited by francy; 01-24-2012, 02:09 AM.
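If converting a region to bedGraph does succeed, the density at each SNP can then be looked up in the resulting intervals. A minimal sketch, assuming the four-column bedGraph layout (chromosome, 0-based start, exclusive end, value), intervals sorted by start and non-overlapping within each chromosome, and 1-based SNP positions; load_bedgraph and density_at are hypothetical names.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Index = Dict[str, Tuple[List[int], List[int], List[float]]]

def load_bedgraph(lines: Iterable[str]) -> Index:
    """Parse bedGraph lines (chrom, start, end, value; 0-based, half-open)."""
    starts, ends, vals = defaultdict(list), defaultdict(list), defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank, header and comment lines
        chrom, start, end, value = line.split()[:4]
        starts[chrom].append(int(start))
        ends[chrom].append(int(end))
        vals[chrom].append(float(value))
    return {c: (starts[c], ends[c], vals[c]) for c in starts}

def density_at(index: Index, chrom: str, pos1: int) -> Optional[float]:
    """Value of the interval covering 1-based position pos1, or None if uncovered."""
    if chrom not in index:
        return None
    starts, ends, vals = index[chrom]
    x = pos1 - 1                                  # 0-based coordinate of the SNP
    i = bisect.bisect_right(starts, x) - 1        # last interval starting at or before x
    if i >= 0 and x < ends[i]:
        return vals[i]
    return None
```

Binary search keeps each lookup logarithmic in the number of intervals, which matters when a chromosome produces millions of bedGraph lines.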



        • #5
          Originally posted by dawe
          Hi there, you can use bx-python to read BigFile (i.e. bigWig and bigBed). If foo.bigwig is your file, you can do:
          Thank you for this tip. I am trying bx-python now. In particular, I am trying to find out whether bx-python can extract the density at individual SNP positions. If you have any idea, could you please let me know?
          Thank you.

          Originally posted by dawe
          This is based on the specification released along with the bigWig paper.
          I am not very experienced with programming yet and am having trouble understanding this script. Could you please point me to the bigWig paper you are referring to, so I can read more about it?
          Thank you again.
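To help follow the script, a small sketch of the fixed-size header it reads first. The script treats the common header as a 4-byte magic number followed by 60 bytes unpacked with 'HHQQQHHQQIQ' (64 bytes in total), then seeks to the chromosome B+ tree, whose own header is a 4-byte magic plus 28 bytes ('IIIQQ'). The code below packs a synthetic little-endian header and reads it back the same way; the field values are made up purely for illustration, and parse_header is a hypothetical name.

```python
import struct

BIGWIG_MAGIC = 0x888FFC26   # written little- or big-endian depending on the machine
HEADER_FIELDS = ("version", "zoomLevels", "chromosomeTreeOffset", "fullDataOffset",
                 "fullIndexOffset", "fieldCount", "definedFieldCount",
                 "autoSqlOffset", "totalSummaryOffset", "uncompressBufSize", "reserved")

def parse_header(buf: bytes) -> dict:
    """Pick the byte order from the magic number, then unpack the next 60 bytes."""
    if buf[:4] == struct.pack("<I", BIGWIG_MAGIC):
        endianness = "<"
    elif buf[:4] == struct.pack(">I", BIGWIG_MAGIC):
        endianness = ">"
    else:
        raise IOError("The file is not in bigwig format")
    values = struct.unpack(endianness + "HHQQQHHQQIQ", buf[4:64])
    return dict(zip(HEADER_FIELDS, values))

# Synthetic header with made-up values, packed little-endian.
fake = struct.pack("<I", BIGWIG_MAGIC) + struct.pack(
    "<HHQQQHHQQIQ", 4, 10, 464, 10000, 20000, 0, 0, 0, 300, 32768, 0)
print(len(fake), parse_header(fake)["chromosomeTreeOffset"])  # 64 464
```

Packing the magic number little-endian gives the bytes b'&\xfc\x8f\x88', which is exactly the first value the script compares against.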



          • #6
            If the output file size is a problem, then you can pipe to standard out by telling the program that '/dev/fd/1' is the output file:

            Code:
            $ bigWigToBedGraph -chrom=v31.005068 -start=11200 -end=20000 Irr_day3_B.bw /dev/fd/1 | head
            v31.005068	11253	11286	1
            v31.005068	11300	11328	1
            v31.005068	11328	11348	2
            v31.005068	11348	11376	1
            v31.005068	11533	11566	1
            v31.005068	11757	11805	2
            v31.005068	11817	11833	1
            v31.005068	11833	11839	3
            v31.005068	11839	11846	4
            v31.005068	11846	11872	6
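The same /dev/fd/1 trick can be driven from Python so that only the lines actually needed are read, much like piping into head. A minimal sketch, assuming bigWigToBedGraph is on the PATH, accepts -chrom, -start and -end as used in this thread, and runs on a system that provides /dev/fd (Linux or macOS); stream_bedgraph and parse_bedgraph_line are hypothetical names.

```python
import subprocess
from typing import Iterator, Optional, Tuple

Record = Tuple[str, int, int, float]

def parse_bedgraph_line(line: str) -> Record:
    """Split one bedGraph line: chrom, 0-based start, exclusive end, value."""
    chrom, start, end, value = line.split()[:4]
    return chrom, int(start), int(end), float(value)

def stream_bedgraph(bigwig: str, chrom: str, start: Optional[int] = None,
                    end: Optional[int] = None) -> Iterator[Record]:
    """Run bigWigToBedGraph with /dev/fd/1 as output and yield records as they arrive."""
    cmd = ["bigWigToBedGraph", f"-chrom={chrom}"]
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd += [bigwig, "/dev/fd/1"]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        try:
            for line in proc.stdout:
                yield parse_bedgraph_line(line)
        finally:
            proc.terminate()   # stop the tool if the caller stops reading early
```

For example, itertools.islice(stream_bedgraph("wgEncodeOpenChromFaireGm12878Sig.bigWig", "chr1"), 10) should give the same first ten records as a `| head` pipe; once the generator is closed, the finally block terminates the external process.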



            • #7
              Originally posted by gringer
              If the output file size is a problem, then you can pipe to standard out by telling the program that '/dev/fd/1' is the output file:
              Dear Gringer, thank you very much for your help.
              The script still hangs when I try this as you suggested:

              Code:
               bigWigToBedGraph -chrom=chr1 wgEncodeOpenChromFaireGm12878Sig.bigWig /dev/fd/1 | head
              The bigWig file that I have downloaded is 2.7 GB...



              • #8
                Originally posted by francy
                The script still hangs when I try this as you suggested:

                Code:
                 bigWigToBedGraph -chrom=chr1 wgEncodeOpenChromFaireGm12878Sig.bigWig /dev/fd/1 | head
                The bigWig file that I have downloaded is 2.7 GB...
                It may be faster if you specify both -start and -end. Assuming you don't have chromosomes longer than 1 Gb (1,000,000,000 bp), this should work:

                Code:
                $ time bigWigToBedGraph -chrom=chr1 -start=1 -end=1000000000 wgEncodeOpenChromFaireGm12878Sig.bigWig /dev/fd/1 | head
                chr1	9999	10000	0.0028
                chr1	10000	10005	0.0029
                chr1	10005	10009	0.003
                chr1	10009	10013	0.0031
                chr1	10013	10018	0.0032
                chr1	10018	10022	0.0033
                chr1	10022	10027	0.0034
                chr1	10027	10031	0.0035
                chr1	10031	10036	0.0036
                chr1	10036	10041	0.0037
                
                real	0m7.691s
                user	0m5.708s
                sys	0m1.776s
                But then I re-ran this with no start/end points, and it took a similar length of time:
                Code:
                $ time bigWigToBedGraph -chrom=chr1 wgEncodeOpenChromFaireGm12878Sig.bigWig /dev/fd/1 | head
                chr1	9999	10000	0.0028
                chr1	10000	10005	0.0029
                chr1	10005	10009	0.003
                chr1	10009	10013	0.0031
                chr1	10013	10018	0.0032
                chr1	10018	10022	0.0033
                chr1	10022	10027	0.0034
                chr1	10027	10031	0.0035
                chr1	10031	10036	0.0036
                chr1	10036	10041	0.0037
                
                real	0m7.671s
                user	0m5.860s
                sys	0m1.596s
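To avoid converting whole chromosomes when only a list of SNPs is needed, one option is to merge nearby SNPs into regions and run one bounded query per region. A minimal sketch, assuming 1-based SNP positions and that -start/-end take a 0-based start and an exclusive end like the bedGraph coordinates in the output; snp_regions, region_commands and the max_gap default are hypothetical, and the run time of many small queries on a 2.7 GB file has not been checked.

```python
from typing import Iterable, List, Tuple

def snp_regions(positions: Iterable[int], max_gap: int = 10_000) -> List[Tuple[int, int]]:
    """Merge 1-based SNP positions into 0-based, half-open query regions.

    SNPs closer than max_gap bp share a region, so one query covers them all.
    """
    regions: List[Tuple[int, int]] = []
    for pos in sorted(set(positions)):
        start, end = pos - 1, pos                 # 1-bp window for this SNP
        if regions and start - regions[-1][1] < max_gap:
            regions[-1] = (regions[-1][0], end)   # extend the previous region
        else:
            regions.append((start, end))
    return regions

def region_commands(bigwig: str, chrom: str,
                    regions: List[Tuple[int, int]]) -> List[List[str]]:
    """One bigWigToBedGraph argument list per region, writing to standard output."""
    return [["bigWigToBedGraph", f"-chrom={chrom}", f"-start={s}", f"-end={e}",
             bigwig, "/dev/fd/1"] for s, e in regions]
```

Each argument list could be passed to subprocess.run(cmd, capture_output=True, text=True), and the returned bedGraph lines then searched for the interval that covers each SNP.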
