Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • problem with adding numerical sequence at the end of line

    Hi,

    Anyone has any idea how to get this:

    >no_name
    TATGCATCGATGCACATATGCTAGTGCGCTAGTGTCGAGGCTAGCTACG
    >no_name
    GACGTACGTAGCATGCATGCATGCGTAGCTGTAGCTAGC
    >no_name
    GCTAGCTAGGTAGGTCATGTAGTAGGTGCACTGAGCTAGCTAGCTAGCTAGCAGC
    >no_name
    GCTAGCATGCTAGCTAGCTAGCACTAGCTAGCTAGCTAGCTAATGCATCATC
    >no_name
    GCTACGTAGCATGCTAGCGGATCATGCATGCATGCTAGCATCGATGCTAGCATGCAT

    become this:

    >no_name_1
    TATGCATCGATGCACATATGCTAGTGCGCTAGTGTCGAGGCTAGCTACG
    >no_name_2
    GACGTACGTAGCATGCATGCATGCGTAGCTGTAGCTAGC
    >no_name_3
    GCTAGCTAGGTAGGTCATGTAGTAGGTGCACTGAGCTAGCTAGCTAGCTAGCAGC
    >no_name_4
    GCTAGCATGCTAGCTAGCTAGCACTAGCTAGCTAGCTAGCTAATGCATCATC
    >no_name_5
    GCTACGTAGCATGCTAGCGGATCATGCATGCATGCTAGCATCGATGCTAGCATGCAT

  • #2
    Here's one way using Perl. Save the text in a file named numbers.pl (or whatever). Usage would be:

    perl numbers.pl --in file_to_change.fasta --out revised_file.fasta


    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use Getopt::Long;
    
    my $inFile;
    my $outFile;
    
    GetOptions  ("in=s"      => \$inFile,
                 "out=s"      => \$outFile);
    
    if (!$inFile or !$outFile) {
        die "Must supply both infile and outfile as command line arguments.\n";
    }
    
    open(my $inFH, "<", $inFile) or die "couldn't open infile for reading.\n";
    if (-e $outFile) {
        die "Output file $outFile already exists--aborting so you don't overwrite.\n";
    }
    open(my $outFH, ">", $outFile) or die "couldn't open outfile for writing.\n";
        
    my $counter = 1;
    while (my $line = <$inFH>) {
        chomp $line;
        if ($line =~ /^(>.*)/) {
            print $outFH $1 . "_$counter\n";
            $counter++;
        } else {
            print $outFH "$line\n";
        }
    }
    Last edited by atcghelix; 09-26-2013, 09:57 PM. Reason: Edited to move $counter++ so that you didn't just get odd-numbered sequences

    Comment


    • #3
      Heres another way: R

      Code:
      library(seqinr)
      read.fasta("fastafile.fa")->fa
      write.fasta(fa,names=paste(getName(fa),1:5,sep="_"),file.out="fa_new_name.fa")
      where you swap '1:5' with '1:n', n being the number of sequences you have.

      Comment


      • #4
        Anyone know how to use AWK to do this task?

        Comment


        • #5
          Thanks. I am pretty weak in Perl. Do you have any idea using AWK to do this?

          Comment


          • #6
            What version of Awk are you running/what operating system?

            Comment


            • #7
              Running is UNIX

              Comment


              • #8
                This work? (It assumes all sequence strings are on a single line)

                Code:
                awk '{if($0 ~ /^>/){print $0"_"(NR+1)/2}else{print $0}}' input.fasta > changed.fasta
                Last edited by atcghelix; 09-26-2013, 11:33 PM. Reason: Less confusing regex

                Comment


                • #9
                  try this

                  Code:
                  paste - - < input.fa | awk ' { print $1"_"NR"\n"$2 } ' > output.fa
                  make sure to have spaces between the hyphens for 'paste'

                  Comment


                  • #10
                    Thank you everybody. I have done my task. =)

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Strategies for Sequencing Challenging Samples
                      by seqadmin


                      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                      03-22-2024, 06:39 AM
                    • seqadmin
                      Techniques and Challenges in Conservation Genomics
                      by seqadmin



                      The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                      Avian Conservation
                      Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                      03-08-2024, 10:41 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, Today, 06:37 PM
                    0 responses
                    7 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, Today, 06:07 PM
                    0 responses
                    7 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 03-22-2024, 10:03 AM
                    0 responses
                    49 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 03-21-2024, 07:32 AM
                    0 responses
                    66 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X