
  • How to split a BED file by chromosome

    Does anyone know a program that can split a BED file by chromosome? I have generated a BED file that contains the data for all chromosomes, but it is not sorted. When I sorted it with BedSort, the output was not in numeric order: chr10 always comes first, followed by chr11 and so on up to chr19. It seems I have to sort each chromosome separately, so I wonder whether there is a program that can split a BED file by chromosome. Thanks

  • #2
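    To sort by chromosome in natural numeric order (chr1, chr2, and so on up to chr10 and beyond), you could try GNU sort's version-sort key (V) on your BED file: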

    Code:
    sort -k 1V,1 -k 2n,2 file.bed -o file.sorted.bed
    If you want to split your BED file, you could do it with bash:

    Code:
    mkdir -p split_results
    for chr in $(cut -f 1 file.bed | sort -u); do
        # compare column 1 exactly; grep -w could also match the name in other columns
        awk -v chr="$chr" '$1 == chr' file.bed > "split_results/$chr.output.bed"
    done
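As a quick check of the version-sort suggestion above, here is a minimal sketch on a tiny made-up BED file. It assumes GNU coreutils `sort` (the `V` key modifier is a GNU extension); the file names and coordinates are hypothetical and only serve the illustration.

```shell
# Build a tiny unsorted BED file (hypothetical data, tab-separated).
workdir=$(mktemp -d)
printf 'chr10\t500\t600\nchr2\t300\t400\nchr1\t900\t950\nchr1\t100\t200\n' > "$workdir/demo.bed"

# Column 1 in version order (chr2 before chr10), then column 2 numerically.
sort -k1,1V -k2,2n "$workdir/demo.bed" > "$workdir/demo.sorted.bed"

# Collect the chromosome order and the first start coordinate for inspection.
sorted_chroms=$(cut -f1 "$workdir/demo.sorted.bed" | uniq | tr '\n' ' ')
first_start=$(head -n 1 "$workdir/demo.sorted.bed" | cut -f2)
echo "$sorted_chroms"
echo "$first_start"
```

If the chromosomes come out as chr1, chr2, chr10, the version key is doing its job; a plain lexical sort would place chr10 before chr2.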



    • #3
      An alternative:
      Code:
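      # close() needs the same file name that was opened; ">>" keeps earlier lines if a chromosome reappears in unsorted input
      awk '$1 != chrom {close(f); chrom = $1; f = chrom ".bed"} {print >> f}' file.bed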



      • #4
        Similar to adamdeluca's suggestion, here is another simple awk solution. Note that ">>" creates the output files if needed and appends to them; they are named CHROM.bed, where CHROM is column 1 of the input BED file (in this case, example.bed). Because it appends, delete any old CHROM.bed files before rerunning.

        So, in plain English, the awk command prints each entire line ($0) from example.bed to distinct files that are each named by the chrom field ($1).

        This strategy is useful in many other cases where you want to do a context-based "grep", and route the results to distinct files.

        Code:
        $ awk '{print $0 >> ($1 ".bed")}' example.bed
        
        $ ls -1 *.bed
        chr1.bed
        chr2.bed
        ... (snip)
        chrY.bed
        example.bed
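The same routing idea carries over to other columns. As a hedged sketch, assuming a 6-column BED file with the strand in column 6, the lines can be routed by strand instead of by chromosome. The input data, the `example6.bed` name, and the `plus`/`minus`/`unstranded` output names are all made up for this illustration.

```shell
# Hypothetical 6-column BED input in a scratch directory.
workdir=$(mktemp -d)
printf 'chr1\t100\t200\tfeatA\t0\t+\nchr2\t300\t400\tfeatB\t0\t-\nchr1\t500\t600\tfeatC\t0\t+\n' > "$workdir/example6.bed"

# Route each line to a file named after its strand (column 6).
awk -F'\t' -v dir="$workdir" '{
    name = ($6 == "+") ? "plus" : ($6 == "-") ? "minus" : "unstranded"
    print > (dir "/" name ".strand.bed")
}' "$workdir/example6.bed"

# Count the lines that landed in each output file.
plus_count=$(wc -l < "$workdir/plus.strand.bed" | tr -d ' ')
minus_count=$(wc -l < "$workdir/minus.strand.bed" | tr -d ' ')
echo "$plus_count $minus_count"
```

The output file name is parenthesized after `>` because unparenthesized concatenation in that position is not handled the same way by every awk implementation.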



        • #5
          Thank you !

          Many thanks to you guys! I have worked it out.
