**kmcarr** · 08-08-2011, 12:18 PM

Originally posted by mghita:

I have added the program to my path and I set the permission right, but now I have another issue:
"You need the Rosetta software to run faSomeRecords. The Rosetta installer is in Optional Installs on your Mac OS X installation disc."

and I don't have Rosetta installed, or the CD for installation, so I don't know how to handle this problem. Any suggestions?

Thanks,
Madalina

Originally posted by GenoMax:

Madalina,

If you are connected to the internet, you should automatically be offered the option to download Rosetta and install it.

Do you have a PowerPC- or an Intel-based Mac? What OS are you running?

Originally posted by mghita:

I have Mac OS X 10.6.8, 3.06 GHz. I just get that message in bash, I don't get any install option. I tried to download it, but it doesn't work.

Madalina,

Your Mac has an Intel CPU, but the version of faSomeRecords which you are trying to run is compiled for PowerPC-based Macs. You could try to install Rosetta (a compatibility layer which allows PPC code to run on Intel Macs), but the easier course of action would be to install the proper version of the binary for your computer.

If you go back to the download site (http://hgdownload.cse.ucsc.edu/admin/exe/) you will see that there are two directories for macOSX software, one for PowerPC (macOSX.ppc) and one for Intel (macOSX.i386). Make sure to download and install the program from the macOSX.i386 directory.

**mghita** · 08-09-2011, 12:28 AM

Hi,

Yes, that seems to work, but the command itself doesn't. The reads in my fasta file (file.fas) are named @Frag_1, @Frag_2 .... @Frag_20000. I want to extract some of them - I have their names in a text file (diff.txt) saved like this

@Frag_93
@Frag_530
@Frag_2183
@Frag_3988
@Frag_7733

I used:

faSomeRecords file.fas diff.txt output.fas

and output.fas is empty. Any idea why this happens?

Thanks
Madalina

**GenoMax** · 08-09-2011, 04:34 AM

NOTE: Please use new names for the files as shown below on the command lines. This would preserve your original files as they are.

Madalina,

The program expects FASTA identifiers to start with ">" rather than "@". You can do the replacement with a program called "sed", which should be available on Mac OS X (I do not have a Mac handy to check).

Do this on the command line (note the single quotes):

sed 's/^@/>/' original_fasta_file > new_file.fas

"new_file.fas" should now have the leading "@" of every header line replaced by ">".

Remember that the file you supply for extraction needs the FASTA IDs without the ">" (or "@"). You can use the same "sed" program to strip the "@" signs from your list of identifiers like this:

sed 's/^@//' diff.txt > new_diff.txt

Now you can use the two new files you created to get the output:

faSomeRecords new_file.fas new_diff.txt output.fas
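
For readers without sed, the same two clean-up steps can be sketched in Python. This is only an illustration, not part of the UCSC tools: the function names are hypothetical, and it assumes the input really is FASTA with "@" in place of ">" on header lines (in a FASTQ file, quality lines can also start with "@", so this would not be safe there).

```python
def normalize_fasta_headers(lines):
    """Yield lines, changing a leading '@' on a line to '>'."""
    for line in lines:
        if line.startswith("@"):
            yield ">" + line[1:]
        else:
            yield line


def normalize_id_list(lines):
    """Return record names with leading '@' or '>' and surrounding whitespace removed."""
    names = []
    for line in lines:
        name = line.strip().lstrip("@>")
        if name:  # skip blank lines
            names.append(name)
    return names


if __name__ == "__main__":
    # Tiny in-memory example; real use would read file.fas and diff.txt instead.
    fasta = ["@Frag_93\n", "ACGT\n", "@Frag_94\n", "GGCC\n"]
    print("".join(normalize_fasta_headers(fasta)), end="")
    print(normalize_id_list(["@Frag_93\n", "@Frag_530\n"]))
```

Writing the results to new files, rather than overwriting the originals, keeps the advice in the NOTE above.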

**mghita** · 08-09-2011, 04:58 AM

I have given up. I replaced the @ with > and it still didn't work. I have combined a little awk and R, and that does the job just fine. Thanks a lot for the effort!

Madalina

**scopak** · 08-22-2011, 01:22 PM

krobison, I too like Perl one-liners.

In the example below, sed bookends are used to add and remove blank lines for the regex search.

sed 's/^>.*/\n&/' <in.fasta | perl -e ' while(<>){ print if(/^>chr1/.../^\n/); }' | sed '/^$/d' >patterns.fasta

Sed is used to add a blank line above each fasta record beginning with '>.*' in the file in.fasta. The stdout is then piped to a Perl range finder that searches for lines that begin with >chr1 and all sequence lines to the next blank line (^\n).
Finally, blank lines are removed with sed and the matching records are saved to the outfile, patterns.fasta.
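
The same "start at a matching header, stop at the next record" idea can be sketched in Python without the blank-line bookends. This is a minimal illustration with a hypothetical function name, not a translation of the one-liner's exact regex handling. Like `/^>chr1/`, a prefix of `chr1` also matches `chr10`, `chr11`, and so on.

```python
def records_with_prefix(lines, prefix):
    """Yield all lines of FASTA records whose header starts with '>' + prefix."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record begins: decide whether to keep it.
            keep = line[1:].startswith(prefix)
        if keep:
            yield line


if __name__ == "__main__":
    example = [">chr1\n", "AAAA\n", ">chr2\n", "CCCC\n", ">chr10\n", "GGGG\n"]
    print("".join(records_with_prefix(example, "chr1")), end="")
```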

Hope that helps

**julianeishida** · 03-12-2012, 11:51 PM

Thanks.

I didn't know about Biopieces. It is really useful. Highly recommended for those whose programming ability is low.

**swaraj** · 03-13-2012, 02:21 AM

A quick way to do it in BioPerl:

http://biostar.stackexchange.com/questions/1196/extracting-sequence-from-a-3gb-fasta-file

**pjyoti** · 05-06-2012, 08:47 PM

hello everyone...

I am using the following perl script for retrieving sequences in fasta format.....

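use strict;
use warnings;
use Bio::Perl;

my $database = "genbank";
my $format   = "fasta";
my $pipe     = "\\|";   # regex that matches a literal pipe character
my $space    = " ";

open(my $in, "<", "1.txt") or die "Cannot open 1.txt: $!";
while (my $line = <$in>) {
    chomp($line);
    # Reshape the input line into the ID string passed to get_sequence
    $line =~ s/$space/:/;
    $line =~ s/$pipe/$space/;
    $line =~ s/g/G/;
    $line =~ s/i/I/;
    my $id = $line;

    # Fetch the record from GenBank and append it to the output file
    my $sequence = get_sequence($database, $id);
    my $test = write_sequence(">>sequences_1.txt", $format, $sequence);

    # Log the return value of write_sequence
    open(my $chk, ">>", "checking.txt") or die "Cannot open checking.txt: $!";
    print $chk "$test\n";
    close($chk);
}
close($in);
exit;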

After retrieving some sequences, I get an error message:

-----------Exception-------------
MSG: WebDBSeqI Request Error:
HTTP/1.1 502 Bad Gateway
connection: close
Date:
.
.
.
.
.
.
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Bad Gateway!</title> <link rev="made" href="mailto:[email protected]"/>

The proxy server received an invalid response from an upstream server.

Please help me out.

**vivek** · 07-12-2012, 03:54 AM

Dear ......,

I followed the same steps, but it is not working.

Vivek

Originally posted by apc2010:

If you need sequences extracted from a multi-FASTA and are open to using a pre-existing tool, I would also suggest either the faSomeRecords or faOneRecord command line utilities from UCSC.

They have versions of this tool for OSX and Linux. Here is a link to the executable downloads:

http://hgdownload.cse.ucsc.edu/admin/exe/

The difference between the two: faOneRecord takes the sequence name to extract from the command line, faSomeRecords reads in a file of 1 or more sequence names to extract from the multi-FASTA.

Usage:

Code:

================================================================
========   faOneRecord   ====================================
================================================================
faOneRecord - Extract a single record from a .FA file
usage:
   faOneRecord in.fa recordName

================================================================
========   faSomeRecords   ====================================
================================================================
faSomeRecords - Extract multiple fa records
usage:
   faSomeRecords in.fa listFile out.fa
options:
   -exclude - output sequences not in the list file.
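
To make the list-file behaviour concrete, here is a rough Python sketch of this kind of extraction. It is not the UCSC implementation. The function name is hypothetical, and it assumes that a record's name is the first whitespace-delimited word after ">" and that the list file holds one bare name per line.

```python
def extract_records(fasta_lines, names, exclude=False):
    """Yield lines of FASTA records whose name is in `names`.

    With exclude=True, yield the records whose name is NOT in `names`.
    """
    wanted = {n.strip() for n in names if n.strip()}
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            record_name = fields[0] if fields else ""
            # Keep listed records normally, unlisted ones when excluding.
            keep = (record_name in wanted) != exclude
        if keep:
            yield line


if __name__ == "__main__":
    fasta = [">Frag_1 desc\n", "AAAA\n", ">Frag_2\n", "CCCC\n", ">Frag_3\n", "TTTT\n"]
    print("".join(extract_records(fasta, ["Frag_2\n"])), end="")
```

In this sketch, a list entry written as ">Frag_2" matches nothing, because the comparison is against the name without the ">" prefix.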

**yzzhang** · 01-31-2013, 01:27 PM

Don't include ">" in the list file; faSomeRecords then works well.

Originally posted by mghita:

I have given up. I replaced the @ with > and still didn't work. I have combined a little awk and R and does my job just fine. Thanks a lot for the effort!

Madalina

**ML1975** · 12-05-2017, 10:51 PM

Originally posted by boetsie:

Hi,

I've attached a script which can do this. If I understand it correctly you have a file like;

>chr1
AGCTGATGATAGT...
>chr2
ACAAAATAGTCGAT....
>chr3
....

And your perl script would be something like;

perl extractSequence.pl genomefile.fa chr1

where 'chr1' corresponds to a sequence named chr1 (indicated by chr1)?

Say you have a more complicated file like;

>chr1_coverage1000_length100
AGATGTATGTTAGA

You can do something like;

perl extractSequence.pl genomefile.fa chr1_.

which will extract all the sequences containing the header chr1_

To store the results, do;

perl extractSequence.pl genomefile.fa chr1 > filename.txt

If this is what you want, you can use my script.

Boetsie

7 years later and I have used your script - thanks for sharing

Works a treat!

**kausikmhg** · 04-23-2020, 12:32 PM

Hello,

Can you please tell me how I can fetch multiple identifiers (chr1, chr2, chr3, chr5, etc.) by putting them into a single file and using your script? I believe this script doesn't take a file with several identifiers; when I tried it, I got a blank output file instead.

Thanks a lot if you can help.
