  • problem with adding numerical sequence at the end of line

    Hi,

    Does anyone have any idea how to get this:

    >no_name
    TATGCATCGATGCACATATGCTAGTGCGCTAGTGTCGAGGCTAGCTACG
    >no_name
    GACGTACGTAGCATGCATGCATGCGTAGCTGTAGCTAGC
    >no_name
    GCTAGCTAGGTAGGTCATGTAGTAGGTGCACTGAGCTAGCTAGCTAGCTAGCAGC
    >no_name
    GCTAGCATGCTAGCTAGCTAGCACTAGCTAGCTAGCTAGCTAATGCATCATC
    >no_name
    GCTACGTAGCATGCTAGCGGATCATGCATGCATGCTAGCATCGATGCTAGCATGCAT

    become this:

    >no_name_1
    TATGCATCGATGCACATATGCTAGTGCGCTAGTGTCGAGGCTAGCTACG
    >no_name_2
    GACGTACGTAGCATGCATGCATGCGTAGCTGTAGCTAGC
    >no_name_3
    GCTAGCTAGGTAGGTCATGTAGTAGGTGCACTGAGCTAGCTAGCTAGCTAGCAGC
    >no_name_4
    GCTAGCATGCTAGCTAGCTAGCACTAGCTAGCTAGCTAGCTAATGCATCATC
    >no_name_5
    GCTACGTAGCATGCTAGCGGATCATGCATGCATGCTAGCATCGATGCTAGCATGCAT

  • #2
    Here's one way using Perl. Save the text in a file named numbers.pl (or whatever). Usage would be:

    perl numbers.pl --in file_to_change.fasta --out revised_file.fasta


    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use Getopt::Long;
    
    my $inFile;
    my $outFile;
    
    GetOptions  ("in=s"      => \$inFile,
                 "out=s"      => \$outFile);
    
    if (!$inFile or !$outFile) {
        die "Must supply both infile and outfile as command line arguments.\n";
    }
    
    open(my $inFH, "<", $inFile) or die "couldn't open $inFile for reading: $!\n";
    if (-e $outFile) {
        die "Output file $outFile already exists--aborting so you don't overwrite.\n";
    }
    open(my $outFH, ">", $outFile) or die "couldn't open $outFile for writing: $!\n";

    # Append a running number to every header line; copy all other lines unchanged.
    my $counter = 1;
    while (my $line = <$inFH>) {
        chomp $line;
        if ($line =~ /^(>.*)/) {
            print $outFH $1 . "_$counter\n";
            $counter++;
        } else {
            print $outFH "$line\n";
        }
    }
    close($inFH);
    close($outFH) or die "couldn't finish writing $outFile: $!\n";
    Last edited by atcghelix; 09-26-2013, 09:57 PM. Reason: Edited to move $counter++ so that you didn't just get odd-numbered sequences



    • #3
      Here's another way, using R:

      Code:
      library(seqinr)
      read.fasta("fastafile.fa")->fa
      write.fasta(fa,names=paste(getName(fa),1:5,sep="_"),file.out="fa_new_name.fa")
      Here, replace '1:5' with '1:n', where n is the number of sequences you have (writing 1:length(fa) avoids hard-coding it).



      • #4
        Anyone know how to use AWK to do this task?



        • #5
          Thanks. I am pretty weak in Perl. Do you have any idea how to do this using AWK?



          • #6
            What version of Awk are you running/what operating system?



            • #7
              I'm running UNIX.



              • #8
                Does this work? (It assumes every sequence is on a single line, so that headers fall on odd-numbered lines.)

                Code:
                awk '{if($0 ~ /^>/){print $0"_"(NR+1)/2}else{print $0}}' input.fasta > changed.fasta
                Last edited by atcghelix; 09-26-2013, 11:33 PM. Reason: Less confusing regex
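The awk one-liner above derives the suffix from the line number, so it depends on each sequence occupying exactly one line. A minimal Python sketch of the alternative, counting header lines directly as the Perl script does, is shown here. The function name `number_headers` and the sample data are hypothetical, chosen only for illustration.

```python
import io


def number_headers(lines, start=1):
    """Yield FASTA lines with "_<n>" appended to each header line.

    The counter advances only on lines starting with ">", so sequences
    wrapped over several lines do not disturb the numbering.
    """
    count = start
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield f"{line}_{count}"
            count += 1
        else:
            yield line


# Hypothetical input: the first record is wrapped across two lines.
example = io.StringIO(">no_name\nTATG\nCATC\n>no_name\nGACG\n")
renamed = list(number_headers(example))
print("\n".join(renamed))
```

For a real file, the same generator could be fed an open file handle in place of the `io.StringIO` object.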



                • #9
                  Try this:

                  Code:
                  paste - - < input.fa | awk ' { print $1"_"NR"\n"$2 } ' > output.fa
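                  Make sure to have a space between the two hyphens for 'paste'. Like the previous answer, this assumes each sequence is on a single line; it also assumes the header lines contain no whitespace, since awk splits fields on spaces and tabs.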
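Both shell one-liners rely on the file being strictly two lines per record, and the paste/awk version also needs whitespace-free headers. A rough Python sketch for checking those assumptions before running either command might look like this. The helper names `is_two_line_fasta` and `headers_have_whitespace` are hypothetical, not part of any library.

```python
def is_two_line_fasta(lines):
    """Return True if lines strictly alternate: header, sequence, header, ..."""
    rows = [ln.rstrip("\n") for ln in lines]
    if len(rows) % 2 != 0:
        return False
    for i, row in enumerate(rows):
        if i % 2 == 0:
            # Even positions (0, 2, ...) must be header lines.
            if not row.startswith(">"):
                return False
        elif not row or row.startswith(">"):
            # Odd positions must be non-empty sequence lines.
            return False
    return True


def headers_have_whitespace(lines):
    """Return True if any header line would split into more than one awk field."""
    return any(ln.startswith(">") and len(ln.split()) > 1 for ln in lines)


# Hypothetical sample records for illustration.
two_line = [">no_name\n", "TATGCATC\n", ">no_name\n", "GACGTACG\n"]
wrapped = [">no_name\n", "TATG\n", "CATC\n", ">no_name\n"]
print(is_two_line_fasta(two_line), is_two_line_fasta(wrapped))
```

If either check fails, a header-counting approach such as the Perl script earlier in the thread is the safer choice.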



                  • #10
                    Thank you everybody. I have done my task. =)
