**vivek_** · 01-14-2013, 02:16 PM

For unique bases:

Code:

cat <File.bed> | awk '{for(i=$1;i<$2;i++) print i}' | sort | uniq | wc -l

For all bases:

Code:

cat <File.bed> | awk '{for(i=$1;i<$2;i++) print i}' | wc -l

**rama** · 01-14-2013, 02:48 PM

Hi Vivek,

Thanks for your reply.

I am not familiar with awk. I tried your commands, but both gave 0 as the result.
Here are the first few lines of my BED file:

chr1 43814962 43815050 . MPL
chr1 115256504 115256554 . NRAS
chr1 115258717 115258765 . NRAS
chr2 29432657 29432711 . ALK
chr2 29443688 29443741 . ALK
chr2 209113105 209113153 . IDH1

I don't know if I should change any of the arguments in the command.

Thanks again for your help.

**vivek_** · 01-14-2013, 02:55 PM

Hi,

Try these:

For unique bases:

Code:

cat yourFile.bed | awk '{for(i=$2;i<$3;i++) print $1"\t"i}' | sort | uniq | wc -l

For all bases:

Code:

cat yourFile.bed | awk '{for(i=$2;i<$3;i++) print $1"\t"i}' | wc -l
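
For the "all bases" count, the per-base loop is not strictly needed: BED intervals are 0-based and half-open, so summing end minus start over the lines gives the same number. A minimal sketch, assuming a whitespace-separated BED file with chromosome, start, and end in the first three columns; the function name `total_bases` is just a placeholder for illustration:

```shell
# Sum interval lengths (end - start) instead of printing one line per base.
# Assumes columns: chrom, start (0-based), end (exclusive).
# Reads the file(s) given as arguments, or standard input if none are given.
total_bases() {
    awk '{ sum += $3 - $2 } END { print sum + 0 }' "$@"
}

# Example: the six lines posted above, piped in on standard input.
printf '%s\n' \
    'chr1 43814962 43815050 . MPL' \
    'chr1 115256504 115256554 . NRAS' \
    'chr1 115258717 115258765 . NRAS' \
    'chr2 29432657 29432711 . ALK' \
    'chr2 29443688 29443741 . ALK' \
    'chr2 209113105 209113153 . IDH1' | total_bases
# prints 341
```

This only replaces the "all bases" command. Overlapping regions are counted twice here, so it is not a substitute for the unique count.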

**rama** · 01-14-2013, 03:04 PM

Thank you Vivek, it worked like a charm.

Just one more request: how do I group by chromosome to see the number of bases per chromosome covered by the target regions?

**vivek_** · 01-14-2013, 03:18 PM

This should work:

Code:

cat yourFile.bed | awk '{for(i=$2;i<$3;i++) print $1"\t"i}' | sort | uniq | awk '{print $1}' | sort | uniq -c

**rama** · 01-15-2013, 08:21 AM

Thank you vivek.

**tedtoal** · 01-12-2017, 04:58 PM

vivek_'s solution works, but it is very slow for large BED files. Here's a short Perl program to do it, along with bedtools:

Code:

    RGN_SIZE='
        $chr = "";
        $sum = 0;
        $sumAll = 0;
        $start = 0;
        while ($ln = <STDIN>){
            chomp $ln;
            @A = split("\t", $ln);
            if ($A[0] ne $chr){
                if ($chr ne ""){
                    print("$chr\t$sum\n");
                    $sumAll += $sum;
                }
                $chr = $A[0];
                $sum = 0;
                $start = 0; # 0-based, next position after last one counted.
            }
            if ($A[1] > $start){ $start = $A[1]; }
            # Count only the part of the interval not already covered by earlier ones.
            if ($A[2] > $start){ $sum += $A[2] - $start; $start = $A[2]; }
        }
        print("$chr\t$sum\n");
        $sumAll += $sum;
        print("Total: $sumAll\n");
        '

    bedtools sort -i <bedfile_name> | perl -e "$RGN_SIZE"
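
The same sorted sweep can also be sketched with plain `sort` and awk, for anyone without Perl or bedtools at hand. This is an illustration under assumptions, not a replacement for the program above: it assumes whitespace-separated columns (chromosome, 0-based start, exclusive end), and the function name `unique_bases_per_chrom` and the variable `covered` are made up for the example.

```shell
# Count unique (non-overlapping) bases per chromosome in a single pass.
# The awk sweep needs input sorted by chromosome, then numerically by start,
# so the function sorts first. Reads files given as arguments, or stdin.
unique_bases_per_chrom() {
    sort -k1,1 -k2,2n "$@" |
    awk '
        ($1 "") != (chr "") {             # new chromosome: flush the previous one
            if (chr != "") { print chr "\t" sum; total += sum }
            chr = $1; sum = 0; covered = 0
        }
        {
            # covered = end of the last counted base; skip anything before it
            start = ($2 > covered) ? $2 : covered
            if ($3 > start) { sum += $3 - start; covered = $3 }
        }
        END {
            if (chr != "") { print chr "\t" sum; total += sum }
            print "Total: " (total + 0)
        }'
}

# Example with overlapping and duplicated regions:
printf '%s\n' \
    'chr1 15 25' \
    'chr1 10 20' \
    'chr1 30 40' \
    'chr2 5 10' \
    'chr2 5 10' | unique_bases_per_chrom
# prints:
# chr1	25
# chr2	5
# Total: 30
```

In the example, chr1 regions 10-20 and 15-25 overlap and collapse to 15 unique bases, plus 10 from 30-40; the duplicated chr2 region is counted once.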

calculate total number of unique positions in target regions