  • Extract several DNA ranges from reference sequences (need help)

    Hello everybody

    I am trying to find a ready-made Perl script, or any equivalent solution, to help me with a data analysis task.

    I need code that takes a CSV file containing DNA ranges, extracts those ranges from two reference sequences, joins the two extracted pieces, and writes the results to a FASTA file.

    For example:

    DNA ref. Sequence 1:
    AAAAAGGGGG

    DNA ref. Sequence 2:
    CCCCCTTTTTT

    The CSV file contains five columns: the first is the name of the range, the next two give the start and end positions of the part to extract from the first sequence, and the last two give the start and end positions of the part to extract from the second sequence. For example:

    seq1 1 5 6 10
    seq2 6 10 1 5
    seq3 1 6 4 11

    The output of the Perl script would be:

    >seq1
    AAAAATTTTT

    >seq2
    GGGGGCCCCC

    >seq3
    AAAAAGCCTTTTTT

    The only similar tool I found is the DNA range extractor, part of the Sequence Manipulation Suite. However, it can extract only one range at a time from a single sequence, which makes it impractical for extracting hundreds of ranges.

    Many thanks
    Fadi
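A minimal Perl sketch of the task described in the post above might look like the following. It assumes 1-based, inclusive coordinates, accepts lines separated by commas or whitespace (the example above is space-separated even though the post mentions a CSV file), and hard-codes the two example reference sequences in a hash. The script name `extract_ranges.pl` and the helpers `extract_range` and `line_to_fasta` are hypothetical. Under those assumptions, the line `seq1 1 5 6 10` produces the record `>seq1` / `AAAAATTTTT` with the example sequences.

```perl
#!/usr/bin/perl
# extract_ranges.pl (hypothetical name)
# For each line "name start1 end1 start2 end2" (comma- or whitespace-separated),
# join seq1[start1..end1] and seq2[start2..end2] and print a FASTA record.
use strict;
use warnings;

# Example reference sequences from the post; replace with your own.
my %ref = (
    seq1 => 'AAAAAGGGGG',
    seq2 => 'CCCCCTTTTTT',
);

# Return the 1-based, inclusive range $start..$end of $seq, or undef if the
# range falls outside the sequence.
sub extract_range {
    my ($seq, $start, $end) = @_;
    return undef if $start < 1 || $end < $start || $end > length($seq);
    return substr($seq, $start - 1, $end - $start + 1);
}

# Turn one range line into a FASTA record, or undef if the line is invalid.
sub line_to_fasta {
    my ($line) = @_;
    my @f = split /[,\s]+/, $line;
    # Need a name plus four whole-number coordinates (this also skips a header row).
    return undef if @f != 5 || grep { $_ !~ /^\d+$/ } @f[1 .. 4];
    my ($name, $s1, $e1, $s2, $e2) = @f;
    my $part1 = extract_range($ref{seq1}, $s1, $e1);
    my $part2 = extract_range($ref{seq2}, $s2, $e2);
    return undef unless defined $part1 && defined $part2;
    return ">$name\n$part1$part2\n";
}

# Process the range file(s) named on the command line.
if (@ARGV) {
    while (my $line = <>) {
        $line =~ s/^\s+|\s+$//g;    # trim whitespace, including the newline
        next if $line eq '';        # skip blank lines
        my $record = line_to_fasta($line);
        if (defined $record) {
            print $record;
        } else {
            warn "Skipping line $. ($line)\n";
        }
    }
}
```

It would be run as `perl extract_ranges.pl ranges.csv > ranges.fasta` (file names are placeholders). Lines that are malformed, or whose coordinates fall outside a sequence, are skipped with a warning instead of silently producing a truncated sequence.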

  • #2
    This seems like exactly the kind of beginner Perl script that would be worth your time to figure out...



    • #3
      Thank you, fanli, I appreciate your suggestion.

      I am in the final stage of my research and wanted to check whether any ready-made code already exists, so I could save some time for other tasks...

      As for the code's complexity, I am not sure we can assume this is a beginner's task, especially for a biologist with no bioinformatics background.

      In any case, here is what I wrote to do the job. It extracts two 10-nt fragments, one from each reference sequence (positioned by the coordinates in the input file), and joins them into a single sequence.

      Code:
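      #!/usr/bin/perl
      # subtraction.pl
      use strict; use warnings;
      
      # Reference sequences (hard-coded for this example)
      my $seq1="AAAAATTTTTAAAAAAATTTATATAGGAGAGAGAGAGACCCAAAAATATAA";
      my $seq2="aaaaatttttaaaaaaatttatataggagagagagagacccaaaaatataa";
      
      # Each input line: seqName <TAB> start <TAB> end (1-based positions)
      while (<>) {
      	chomp;
      	next unless /\S/;	# skip blank lines
      	my @locations = split(/\t/, $_);
      	my $seqName = $locations[0];
      	# seq1 fragment: the 10 nt ending at 'start' (0-based substr offset)
      	my $seq1Start = $locations[1]-10;
      	# seq2 fragment: up to 10 nt beginning at 'end' (0-based substr offset)
      	my $seq2Start = $locations[2]-1;
      	# A negative offset would make substr count back from the end of the string
      	if ($seq1Start < 0 || $seq2Start < 0
      		|| $seq1Start >= length($seq1) || $seq2Start >= length($seq2)) {
      		warn "Skipping $seqName: position out of range\n";
      		next;
      	}
      	my $seq1_part = substr $seq1, $seq1Start, 10;
      	my $seq2_part = substr $seq2, $seq2Start, 10;
      	
      	# FASTA record: header line, then the joined sequence (no leading space)
      	print ">$seqName\n$seq1_part$seq2_part\n";
      }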
      It reads the locations from a tab-separated values file with three columns (seqName, start, end), for example:
      Code:
      1	10	20
      2	20	40
      3	30	50
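The script above hard-codes both reference sequences as strings. If they are stored in a FASTA file instead, a small reader such as the hypothetical `read_fasta` below could load them into a hash keyed by sequence ID (taken here as the first word after `>`), joining sequence lines that are wrapped across several lines. This is a sketch that assumes a plain, uncompressed FASTA file.

```perl
use strict;
use warnings;

# Hypothetical helper: read all records from an open FASTA filehandle and
# return a hash reference of ID => sequence. The ID is the first word after '>'.
sub read_fasta {
    my ($fh) = @_;
    my (%seqs, $id);
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;           # strip the newline and trailing spaces
        next if $line eq '';         # ignore blank lines
        if ($line =~ /^>(\S+)/) {
            $id = $1;                # start a new record
            $seqs{$id} = '';
        } elsif (defined $id) {
            $seqs{$id} .= $line;     # append wrapped sequence lines
        }
    }
    return \%seqs;
}
```

For example, `open my $fh, '<', 'refs.fa' or die "Cannot open refs.fa: $!";` followed by `my $refs = read_fasta($fh);` would make the sequences available as `$refs->{ref1}` and `$refs->{ref2}` (the file name and IDs are placeholders), which could then replace the hard-coded `$seq1` and `$seq2`.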
