
  • problem with adding numerical sequence at the end of line

    Hi,

    Does anyone have any idea how to make this:

    >no_name
    TATGCATCGATGCACATATGCTAGTGCGCTAGTGTCGAGGCTAGCTACG
    >no_name
    GACGTACGTAGCATGCATGCATGCGTAGCTGTAGCTAGC
    >no_name
    GCTAGCTAGGTAGGTCATGTAGTAGGTGCACTGAGCTAGCTAGCTAGCTAGCAGC
    >no_name
    GCTAGCATGCTAGCTAGCTAGCACTAGCTAGCTAGCTAGCTAATGCATCATC
    >no_name
    GCTACGTAGCATGCTAGCGGATCATGCATGCATGCTAGCATCGATGCTAGCATGCAT

    become this:

    >no_name_1
    TATGCATCGATGCACATATGCTAGTGCGCTAGTGTCGAGGCTAGCTACG
    >no_name_2
    GACGTACGTAGCATGCATGCATGCGTAGCTGTAGCTAGC
    >no_name_3
    GCTAGCTAGGTAGGTCATGTAGTAGGTGCACTGAGCTAGCTAGCTAGCTAGCAGC
    >no_name_4
    GCTAGCATGCTAGCTAGCTAGCACTAGCTAGCTAGCTAGCTAATGCATCATC
    >no_name_5
    GCTACGTAGCATGCTAGCGGATCATGCATGCATGCTAGCATCGATGCTAGCATGCAT

  • #2
    Here's one way using Perl. Save the text in a file named numbers.pl (or whatever). Usage would be:

    perl numbers.pl --in file_to_change.fasta --out revised_file.fasta


    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use Getopt::Long;
    
    my $inFile;
    my $outFile;
    
    GetOptions  ("in=s"      => \$inFile,
                 "out=s"      => \$outFile);
    
    if (!$inFile or !$outFile) {
        die "Must supply both infile and outfile as command line arguments.\n";
    }
    
    # Open the input first so a bad path fails before anything is written.
    open(my $inFH, "<", $inFile) or die "couldn't open $inFile for reading: $!\n";
    # Refuse to clobber an existing output file.
    if (-e $outFile) {
        die "Output file $outFile already exists--aborting so you don't overwrite.\n";
    }
    open(my $outFH, ">", $outFile) or die "couldn't open $outFile for writing: $!\n";
        
    my $counter = 1;
    while (my $line = <$inFH>) {
        chomp $line;
        if ($line =~ /^(>.*)/) {
            print $outFH $1 . "_$counter\n";
            $counter++;
        } else {
            print $outFH "$line\n";
        }
    }
    close($inFH);
    close($outFH) or die "couldn't finish writing $outFile: $!\n";
    Last edited by atcghelix; 09-26-2013, 09:57 PM. Reason: Edited to move $counter++ so that you didn't just get odd-numbered sequences



    • #3
      Here's another way, using R:

      Code:
      library(seqinr)
      fa <- read.fasta("fastafile.fa")
      write.fasta(fa, names=paste(getName(fa), seq_along(fa), sep="_"), file.out="fa_new_name.fa")
      seq_along(fa) numbers the sequences 1 to n, so you don't have to hard-code how many sequences you have.



      • #4
        Does anyone know how to do this with AWK?



        • #5
          Thanks. I am pretty weak in Perl. Do you have any idea how to do this with AWK?



          • #6
            What version of Awk are you running, and on what operating system?



            • #7
              I'm running UNIX.



              • #8
                Does this work? (It assumes every sequence is on a single line and that the file has no blank lines, since the number comes from the line count NR.)

                Code:
                awk '{if($0 ~ /^>/){print $0"_"(NR+1)/2}else{print $0}}' input.fasta > changed.fasta
                Last edited by atcghelix; 09-26-2013, 11:33 PM. Reason: Less confusing regex
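
If your sequences wrap over several lines, the (NR+1)/2 arithmetic above will miscount, because it relies on every record being exactly two lines. A rough sketch of a variant that counts header lines instead is below. The function name number_headers and the sample sequences are made up for illustration and are not from the thread; it should work with any POSIX awk, but that is an assumption.

```shell
# number_headers is a hypothetical helper name, not from the thread.
# It appends _1, _2, ... to every line that starts with ">".
number_headers() {
    # n counts header lines only, so wrapped sequences do not shift the numbering.
    awk '/^>/ { print $0 "_" (++n); next } { print }' "$@"
}

# Small demo: the first sequence is wrapped over two lines.
printf '>no_name\nTATG\nCATC\n>no_name\nGACG\n' | number_headers
```

For a real file you would call it as number_headers input.fasta > changed.fasta, same as the one-liner above.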



                • #9
                  try this

                  Code:
                  paste - - < input.fa | awk ' { print $1"_"NR"\n"$2 } ' > output.fa
                  make sure to have spaces between the hyphens for 'paste'
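
In case it helps to see why the paste approach works, here is a small step-by-step sketch with made-up sample sequences. Each "-" tells paste to read one line from standard input, so "paste - -" joins every pair of lines with a tab. Like the one-liner, this assumes single-line sequences and headers without spaces (awk would otherwise split the header into several fields).

```shell
# Two single-line records as sample input (made-up sequences).
sample=$(printf '>no_name\nTATG\n>no_name\nGACG\n')

# Step 1: paste - - joins each header/sequence pair into one tab-separated line.
joined=$(printf '%s\n' "$sample" | paste - -)
printf '%s\n' "$joined"

# Step 2: awk sees the header as $1 and the sequence as $2; NR is the record number.
renamed=$(printf '%s\n' "$joined" | awk '{ print $1 "_" NR "\n" $2 }')
printf '%s\n' "$renamed"
```

Because each record is a single line after the paste step, NR can be used directly as the sequence number.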



                  • #10
                    Thank you everybody. I have done my task. =)
