
  • #16
    Originally posted by mghita View Post
    I have added the program to my path and set the permissions correctly, but now I have another issue:
    "You need the Rosetta software to run faSomeRecords. The Rosetta installer is in Optional Installs on your Mac OS X installation disc."

    I don't have Rosetta installed or the installation disc, so I don't know how to handle this problem. Any suggestions?


    Thanks,
    Madalina
    Originally posted by GenoMax View Post
    Madalina,

    If you are connected to the internet, you should automatically be offered the option to download and install Rosetta.

    Do you have a PowerPC- or an Intel-based Mac? What OS version are you running?
    Originally posted by mghita View Post
    I have Mac OS X 10.6.8 on a 3.06 GHz machine. I just get that message in bash; I am not offered any install option. I tried to download Rosetta, but that doesn't work.
    Madalina,

    Your Mac has an Intel CPU (Mac OS X 10.6 runs only on Intel Macs), but the version of faSomeRecords you are trying to run is compiled for PowerPC-based Macs. You could try to install Rosetta (a compatibility layer that allows PPC code to run on Intel Macs), but the easier course of action is to install the version of the binary built for your computer.

    If you go back to the download site (http://hgdownload.cse.ucsc.edu/admin/exe/) you will see that there are two directories for macOSX software, one for PowerPC (macOSX.ppc) and one for Intel (macOSX.i386). Make sure to download and install the program from the macOSX.i386 directory.
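To tell which of those two directories applies, the CPU type reported by `uname -m` is usually enough: Intel Macs report i386 or x86_64, PowerPC Macs report "Power Macintosh". Below is a minimal sketch under that assumption; it only knows about the two directories named in this post, and the function name `ucsc_mac_dir` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: suggest the UCSC Mac download directory for this CPU.
# Only the two directories named in the thread are covered.
# An optional argument overrides the architecture detected by `uname -m`.
ucsc_mac_dir() {
  cpu=${1:-$(uname -m)}
  case $cpu in
    i386|x86_64)            echo "macOSX.i386" ;;   # Intel Macs
    "Power Macintosh"|ppc*) echo "macOSX.ppc" ;;    # PowerPC Macs
    *) echo "no matching directory for: $cpu" >&2; return 1 ;;
  esac
}
```

Running `ucsc_mac_dir` with no argument checks the current machine. As a second check, `file faSomeRecords` on the downloaded binary should mention ppc for a PowerPC build.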



    • #17
      Hi,

      Yes, that seems to work, but the command itself doesn't. The reads in my FASTA file (file.fas) are named @Frag_1, @Frag_2, ..., @Frag_20000. I want to extract some of them; I have their names in a text file (diff.txt), saved like this:

      @Frag_93
      @Frag_530
      @Frag_2183
      @Frag_3988
      @Frag_7733

      I used:

      faSomeRecords file.fas diff.txt output.fas

      and output.fas is empty. Any idea why this happens?


      Thanks
      Madalina



      • #18
        NOTE: Use new names for the output files on the command lines, as shown below, so that your original files are preserved.

        Madalina,

        The program expects the FASTA identifiers to start with ">" rather than "@". You can do the replacement with a program called "sed", which should be available on Mac OS X (I do not have a Mac handy to check).

        Do this on the command line (note the single quotes):

        sed 's/^@/>/' original_fasta_file > new_file.fas

        In "new_file.fas", every "@" at the start of a line should now be replaced by ">".

        Remember that the list file you supply for extraction must contain the FASTA IDs without the ">". You can use the same "sed" program to strip the "@" signs from your list of identifiers, like this:

        sed 's/^@//' diff.txt > new_diff.txt

        Now you can use the two new files you created to get the output:

        faSomeRecords new_file.fas new_diff.txt output.fas
        Last edited by GenoMax; 08-09-2011, 04:36 AM. Reason: adding_info_to_clarify
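The two sed steps can be wrapped as small shell functions. This is a sketch, not part of the original advice; the names `fix_headers` and `bare_names` are hypothetical, and stripping Windows carriage returns from the list is an extra assumption (a common cause of names silently not matching).

```shell
#!/bin/sh
# Sketch with hypothetical helper names: prepare faSomeRecords inputs
# from a file whose header lines start with "@".

# Turn "@name" header lines into ">name". The "^" anchor means only a
# leading "@" changes; an "@" elsewhere on a line is kept.
fix_headers() {
  sed 's/^@/>/' "$1"
}

# Build a list of bare names: delete carriage returns, then drop a leading
# "@" or ">", since faSomeRecords expects names without the marker.
bare_names() {
  tr -d '\r' < "$1" | sed 's/^[@>]//'
}
```

With the file names from this thread: `fix_headers file.fas > new_file.fas`, then `bare_names diff.txt > new_diff.txt`, then `faSomeRecords new_file.fas new_diff.txt output.fas`.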



        • #19
          I have given up. I replaced the @ with > and it still didn't work. I combined a little awk and R instead, and that does the job just fine. Thanks a lot for the effort!

          Madalina
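The thread never pins down why the ">" version still returned nothing. One way to narrow this down (a diagnostic sketch, not something used in the thread; `missing_names` is a hypothetical name) is to list which requested names have no exactly matching header. Also worth ruling out: "@" headers are the FASTQ convention, and if file.fas is really FASTQ (four lines per read: "@" header, sequence, "+" line, quality string), changing the marker alone does not turn it into FASTA.

```shell
#!/bin/sh
# Sketch: print names from a list file ($2) that match no header in a
# reads file ($1). A header name is the first word after ">" or "@";
# list entries may carry either marker. Carriage returns are ignored.
missing_names() {
  awk '
    { sub(/\r$/, "") }                    # ignore Windows line endings
    FILENAME == ARGV[1] {                 # first file: the reads
      if ($0 ~ /^[>@]/) seen[substr($1, 2)] = 1
      next
    }
    {                                     # second file: the wanted names
      name = $0
      sub(/^[>@]/, "", name)
      if (name != "" && !(name in seen)) print name
    }
  ' "$1" "$2"
}
```

For example, `missing_names file.fas diff.txt` prints nothing when every listed name has a matching header.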



          • #20
            krobison, I too like Perl one-liners.

            In the example below, sed bookends are used to add and remove blank lines for the regex search.

            sed 's/^>.*/\n&/' <in.fasta | perl -e ' while(<>){ print if(/^>chr1/.../^\n/); }' | sed '/^$/d' >patterns.fasta

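            Sed adds a blank line above each FASTA header (each line matching '^>.*') in the file in.fasta. The output is then piped to a Perl range operator (...) that prints every line from a header beginning with >chr1 through the next blank line (^\n), i.e. the header and all of its sequence lines.
            Finally, sed deletes the blank lines, and the matching records are saved to the output file, patterns.fasta. Inserting a newline with \n in the sed replacement works with GNU sed; older BSD versions of sed, such as the one on Mac OS X, may insert a literal "n" instead.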

            Hope that helps
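The range pattern /^>chr1/ matches any header that merely starts with >chr1, so it would also pull out >chr10, >chr11 and so on. A flag-based awk alternative compares the whole header name instead and needs no blank-line bookends. This is a sketch; `get_record` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print the record(s) whose header name (first word after ">")
# is exactly equal to $2, from the FASTA file $1.
get_record() {
  awk -v want="$2" '
    /^>/ { keep = (substr($1, 2) == want) }  # on each header, decide keep/skip
    keep                                     # print lines while keep is true
  ' "$1"
}
```

Usage: `get_record in.fasta chr1 > patterns.fasta`.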



            • #21
              Thanks.

              I didn't know about Biopieces. It is really useful; highly recommended for those whose programming ability is low.



              • #22
                A quick way to do it in BioPerl



                • #23
                  Hello everyone,

                  I am using the following Perl script to retrieve sequences in FASTA format:


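                  use strict;
                  use warnings;
                  use Bio::Perl;    # exports get_sequence() and write_sequence()

                  my $database = "genbank";
                  my $format   = "fasta";

                  open(my $in,  '<',  '1.txt')        or die "Cannot open 1.txt: $!";
                  open(my $chk, '>>', 'checking.txt') or die "Cannot open checking.txt: $!";

                  while (my $line = <$in>) {
                      chomp $line;
                      # Same ID rewriting as before: first space -> ":", first "|" -> space,
                      # first "g" -> "G", first "i" -> "I"
                      $line =~ s/ /:/;
                      $line =~ s/\|/ /;
                      $line =~ s/g/G/;
                      $line =~ s/i/I/;
                      my $id = $line;

                      # get_sequence() dies when the server answers with an error such as
                      # "502 Bad Gateway"; retry with a growing pause instead of aborting
                      # (3 attempts and the sleep times are arbitrary choices)
                      my $sequence;
                      for my $attempt (1 .. 3) {
                          $sequence = eval { get_sequence($database, $id) };
                          last if defined $sequence;
                          warn "Attempt $attempt for '$id' failed: $@";
                          sleep 5 * $attempt;
                      }
                      unless (defined $sequence) {
                          warn "Skipping '$id' after 3 failed attempts\n";
                          next;
                      }

                      my $status = write_sequence(">>sequences_1.txt", $format, $sequence);
                      $status = '' unless defined $status;
                      print $chk "$status\n";    # replaces the indented heredoc, which Perl cannot terminate
                      sleep 1;                   # pause between requests to go easy on the server
                  }

                  close $in;
                  close $chk;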



                  After retrieving some sequences, I get this error message:

                  -----------Exception-------------
                  MSG: WebDBSeqI Request Error:
                  HTTP/1.1 502 Bad Gateway
                  connection: close
                  Date:
                  ...
                  <?xml version="1.0" encoding="ISO-8859-1"?

                  The proxy server received an invalid response from an upstream server.

                  Please help me out.




                    • #25
                      Dear ......,

                      I followed the same steps, but it is not working.

                      Vivek

                      Originally posted by apc2010 View Post
                      If you need sequences extracted from a multi-FASTA and are open to using a pre-existing tool, I would also suggest either the faSomeRecords or faOneRecord command line utilities from UCSC.

                      They have versions of this tool for OSX and Linux. Here is a link to the executable downloads:



                      The difference between the two: faOneRecord takes the name of the sequence to extract on the command line, while faSomeRecords reads a file of one or more sequence names to extract from the multi-FASTA.

                      Usage:
                      ================================================================
                      ========   faOneRecord   ====================================
                      ================================================================
                      faOneRecord - Extract a single record from a .FA file
                      usage:
                         faOneRecord in.fa recordName
                      
                      ================================================================
                      ========   faSomeRecords   ====================================
                      ================================================================
                      faSomeRecords - Extract multiple fa records
                      usage:
                         faSomeRecords in.fa listFile out.fa
                      options:
                         -exclude - output sequences not in the list file.
                      Vivek Keshri
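For readers without the UCSC binaries, a rough awk stand-in for faSomeRecords as described above might look like this. It reads one bare name per line from the list file and prints the matching records, or the non-matching ones when given "exclude" (mirroring the -exclude option). It is a sketch, not the UCSC tool: `some_records` is a hypothetical name, and unlike faSomeRecords it writes to standard output.

```shell
#!/bin/sh
# Sketch: some_records in.fa listFile [exclude]
#   listFile: one bare sequence name per line (no ">")
#   exclude:  optional word; print records NOT named in the list
some_records() {
  awk -v mode="${3:-}" '
    FILENAME == ARGV[1] {            # first file: the list of wanted names
      sub(/\r$/, "")
      if ($0 != "") want[$0] = 1
      next
    }
    /^>/ {                           # FASTA header: name is the first word
      name = substr($1, 2)
      sub(/\r$/, "", name)
      keep = (name in want)
      if (mode == "exclude") keep = !keep
    }
    keep                             # print header and sequence lines of kept records
  ' "$2" "$1"
}
```

For example, `some_records in.fa listFile > out.fa` or `some_records in.fa listFile exclude > out.fa`. Names are matched exactly against the first word of each header.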



                      • #26
                        Leave the ">" out of the names in the list file, and faSomeRecords works well.


                        • #27
                          Originally posted by boetsie View Post
                          Hi,

                          I've attached a script which can do this. If I understand correctly, you have a file like:

                          >chr1
                          AGCTGATGATAGT...
                          >chr2
                          ACAAAATAGTCGAT....
                          >chr3
                          ....

                          And you would run the Perl script something like this:

                          perl extractSequence.pl genomefile.fa chr1

                          where 'chr1' corresponds to the sequence named chr1 (header line >chr1)?

                          Say you have a more complicated file like:

                          >chr1_coverage1000_length100
                          AGATGTATGTTAGA

                          You can do something like:

                          perl extractSequence.pl genomefile.fa chr1_.

                          which will extract all the sequences whose header contains chr1_

                          To store the results, do:

                          perl extractSequence.pl genomefile.fa chr1 > filename.txt

                          If this is what you want, you can use my script.

                          Boetsie
                          Seven years later, I have used your script. Thanks for sharing; it works a treat!



                          • #28

                            Hello,

                            Can you please tell me how I can fetch multiple identifiers, such as chr1, chr2, chr3 and chr5, listed in a single file, using your script? I believe the script doesn't accept a file with several identifiers; when I tried, it gave me a blank output file instead.

                            Thanks a lot if you can help.
