  • Unique bases from .gff file

    Hey all,

    I want to find the number of unique bases covered by the exons in a GFF file. My problem is that the exons may overlap. I tried one algorithm, but it failed.

    For example -
    Start Stop
    8 11
    9 14
    10 18
    15 20

    1. I took each interval (Set A) and compared it with all the others.
    2. If Set A overlapped with Set B, I took the minimum start point and the maximum end point and marked Set B as 'Done' (i.e., do not consider it again).
    3. After comparing Set A to all the coordinates, I counted the bases in Set A (End - Start, zero-based).

    I repeated the process for all the coordinates.

    However, this fails.

    Can anyone please suggest another way to do this?

    Thanks in advance,
    K
    Last edited by Khanjan; 11-30-2010, 07:49 AM.

  • #2
    Originally posted by Khanjan
    Hey all,

    I want to find the number of unique bases covered by the exons in a GFF file. My problem is that the exons may overlap. I tried one algorithm, but it failed.

    For example -
    Start Stop
    8 11
    9 14
    10 18
    15 20
    The memory-intensive but straightforward way is to mark every covered base in one large array. E.g., something like this (Perl-ish code):

    my @array;
    foreach my $line (@lines) {    # @lines holds the "start stop" rows
        # split ' ' also ignores leading whitespace on the row
        my ($begin, $end) = split ' ', $line;
        # mark every base of this inclusive interval as covered
        $array[$_] = 1 foreach ($begin .. $end);
    }

    my $count = 0;
    for my $pos (0 .. $#array) {
        $count++ if $array[$pos];    # positions never marked stay undefined
    }

    print "count is $count\n";

    ----
    I am sure there is better code (and I didn't test the above), and your language will vary, but the idea is dead simple.
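
If the array gets too large for whole chromosomes, a common alternative (not part of the reply above) is to sort the intervals by start and merge the overlapping ones, adding up the length of each merged run. Below is a minimal Python sketch of that idea. It assumes inclusive coordinates, as in GFF, and that all intervals lie on the same sequence; the function name `count_unique_bases` is made up for illustration.

```python
def count_unique_bases(intervals):
    """Count positions covered by at least one inclusive (start, end) interval."""
    total = 0
    cur_start = cur_end = None  # the merged run currently being extended
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # No overlap with the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current run if this interval reaches further.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1  # close the final run
    return total


# The example coordinates from the original question.
exons = [(8, 11), (9, 14), (10, 18), (15, 20)]
print(count_unique_bases(exons))  # the four exons merge into 8..20, i.e. 13 bases
```

For a real GFF file the intervals would first need to be grouped by sequence ID (column 1), with the function run once per group. Memory use then scales with the number of features, not with the length of the sequence.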

    • #3
      Thanks!

      Khanjan
