  • acyrocks
    Junior Member
    • Sep 2012
    • 3

    compare two Fasta files

    Hi,
    I am new to NGS and have very little knowledge of the tools used for analyzing NGS-generated sequences. I have a problem that I think somebody with good Perl scripting skills may be able to help me with:
    I have two FASTA files like the ones below:

    File1:
    >a
    GVKKDVKCTTTGGG
    .
    .
    >f
    AAATTTGGGCCCEEE
    >g
    SSSGGGYYYTTTGTFR
    .
    .
    >x
    DDDGGGYYYTTTGTFR
    .
    .
    .

    File2:
    >1
    GVKKDVKCTTTGGG
    .
    .
    >41
    FFFGGGYYYTTTGTFR
    .
    .
    >200
    AAATTTGGGCCCEEE
    .
    .
    >1000
    SSSGGGYYYTTTGTFR
    .
    .
    .

    Many, but not all, sequences are identical between these two files. I would like to compare each sequence in the first file against the second file and produce the following table:

    a GVKKDVKCTTTGGG 1
    f AAATTTGGGCCCEEE 200
    g SSSGGGYYYTTTGTFR 1000
    x DDDGGGYYYTTTGTFR 0
    .
    .
    .

    In the table, the first column is the header of each sequence in the first file, the second column is the sequence itself, and the third column is the header of the sequence in the second file that is identical to it. If the second file contains no identical sequence, the third column should be zero instead.
    I would appreciate it if someone could help me out.
    Thanks a lot.
    Acyrocks
  • Torst
    Senior Member
    • Apr 2008
    • 275

    #2
    Is this a homework question?


    • krobison
      Senior Member
      • Nov 2007
      • 734

      #3
      It sounds like one, but in any case it is pretty simple to hash this out in not much code :-)
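The hashing idea hinted at here can be sketched as follows. This is a minimal illustration in Python rather than Perl, and the names `parse_fasta` and `seq_to_header` are made up for this example. It assumes every header line starts with `>`, that "identical" means an exact, case-sensitive string match, and that the first header wins if a sequence occurs more than once in the second file.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Sample records taken from File2 in the question.
file2_text = """>1
GVKKDVKCTTTGGG
>41
FFFGGGYYYTTTGTFR
>200
AAATTTGGGCCCEEE
>1000
SSSGGGYYYTTTGTFR
"""

# The hash: sequence is the key, the File2 header is the value.
# setdefault keeps the first header seen for a repeated sequence.
seq_to_header = {}
for header, seq in parse_fasta(file2_text):
    seq_to_header.setdefault(seq, header)

# Looking up a File1 sequence is then a single dictionary access,
# with "0" as the fallback when no identical sequence exists.
print(seq_to_header.get("AAATTTGGGCCCEEE", "0"))
print(seq_to_header.get("DDDGGGYYYTTTGTFR", "0"))
```

Keying the hash on the sequence means each sequence from the first file costs one lookup, instead of a scan through the whole second file.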


      • Torst
        Senior Member
        • Apr 2008
        • 275

        #4
        Agreed, I think you've given him the key idea.


        • acyrocks
          Junior Member
          • Sep 2012
          • 3

          #5
          Thanks for replying. No, it's not homework. I don't have much Perl knowledge beyond having read a couple of chapters and knowing how to run a script. I may spend more time learning it in the future if I find the time. If you guys can help me out, I would appreciate it.


          • Torst
            Senior Member
            • Apr 2008
            • 275

            #6
            Do you know how to program in any particular language? Perhaps we could help you implement it in a language you have experience with. If not, someone will probably give you a solution to try.


            • rahularjun86
              Member
              • Jan 2011
              • 58

              #7
              Dear acyrocks,
              Please run the attached code using the following command (Unix):
              perl for_seqanswer_fasta_formatter.pl file1.fasta file2.fasta out.fasta
              It is working fine here.
              Best wishes,
              Rahul
              Attached Files
              Rahul Sharma,
              Ph.D
              Frankfurt am Main, Germany


              • acyrocks
                Junior Member
                • Sep 2012
                • 3

                #8
                Thanks so much, Rahul.
                I tried your code and it works for the most part. However, it skips the sequences in file1 that are not identical to any sequence in file2. I would like these sequences to still be included in the output file, with zero printed in the third column. I think this could be an easy fix for you. Thanks again.
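The behaviour requested here, where every file1 record appears in the output and unmatched ones get a zero, could look roughly like the sketch below. It is written in Python and is not the attached Perl script, which is not shown in the thread. The function names `read_fasta` and `compare_fasta` are hypothetical, the output is assumed to be tab-separated, and matching is assumed to be exact and case-sensitive.

```python
import sys


def read_fasta(path):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:].strip(), []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def compare_fasta(file1_path, file2_path, out_path):
    """Write one row per file1 record: header, sequence, file2 header or 0."""
    # Index file2 by sequence; the first header wins for repeated sequences.
    seq_to_header = {}
    for header, seq in read_fasta(file2_path):
        seq_to_header.setdefault(seq, header)

    with open(out_path, "w") as out:
        for header, seq in read_fasta(file1_path):
            # Every file1 record is written, so unmatched sequences are
            # kept and receive "0" in the third column.
            match = seq_to_header.get(seq, "0")
            out.write(f"{header}\t{seq}\t{match}\n")


# Runs only when called as a script with exactly three arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    compare_fasta(sys.argv[1], sys.argv[2], sys.argv[3])
```

Saved under a name of your choosing, for example `compare_fasta.py`, it would be run as `python compare_fasta.py file1.fasta file2.fasta out.txt`. The loop writes a row for each file1 record regardless of whether a match is found, which is what keeps the unmatched sequences in the table.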
