
**maasha** · 05-16-2012, 03:02 AM

Have a look at Galaxy.

Alternatively you can use Biopieces like this:

Code:

read_fasta -i input.fasta |
grab -p scaffoldID -k SEQ_NAME |
extract_seq -b 437 -e 959 |
write_fasta -o output.fasta -x

Martin

**nexgengirl** · 05-16-2012, 03:19 AM

You could also use bedtools (code.google.com/p/bedtools/). I've used this tool to extract sub-sequence data before and I really like it because it's fast and efficient.

The tool in bedtools is called fastaFromBed (Creates FASTA sequences based on intervals in a BED/GFF/VCF file) and can extract sub-regions of a fasta by specifying those regions in a bed file.

The manual is present here: http://code.google.com/p/bedtools/do...-Manual.v3.pdf

Example of the command from the manual:

fastaFromBed [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF> -fo <output FASTA>
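
As a rough sketch of how this might look for the region discussed in this thread (bases 437 to 959 of one scaffold): BED intervals are 0-based and half-open, so the 1-based start is reduced by one while the end stays as it is. The helper name `to_bed`, the file names and `scaffoldID` are placeholders for illustration, not something taken from the bedtools manual.

```shell
#!/bin/sh
# Convert a 1-based, inclusive region to a BED line (0-based start, end exclusive).
# to_bed is a hypothetical helper; scaffoldID and the file names are placeholders.
to_bed() {
    printf '%s\t%d\t%d\n' "$1" "$(($2 - 1))" "$3"
}

# Bases 437..959 of scaffoldID become: scaffoldID <tab> 436 <tab> 959
to_bed scaffoldID 437 959 > region.bed

# Run the extraction only if bedtools is installed and the input file exists.
if command -v fastaFromBed >/dev/null 2>&1 && [ -f input.fasta ]; then
    fastaFromBed -fi input.fasta -bed region.bed -fo output.fasta
fi
```

Several regions can go into the same BED file, one per line, and are extracted in a single run.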

**struggler** · 05-16-2012, 06:22 AM

Thanks Maasha and NextGenGirl,
I could not install these tools on my system. The scaffold name and the sequence ID are the same. Could you please suggest a solution using only perl or something like grep? I am using biolinux.
Regards,
S

**GenoMax** · 05-16-2012, 08:40 AM

I assume you are perhaps missing a compiler (gcc)/libraries when you say that you could not install these tools.

Are you using a "live" image of biolinux to temporarily boot into a unix environment or are you using someone else's biolinux machine?

**struggler** · 05-16-2012, 08:56 AM

Thanks for your message.

This is on my own machine through VMware. I guess I can install these using the sudo command. It is less that I 'could not' and more that I was afraid or sceptical about installing these tools: if anything goes wrong, I don't have the know-how to fix it. So I don't want to tamper with my standard installation.
Regards,
S

**GenoMax** · 05-16-2012, 09:03 AM

Give it a try. This is something you need to learn if you are planning to keep using *nix in some form.

I doubt that you can cause major damage by installing bedtools ... but if you did manage to do that then perhaps you should not be using *nix in the first place.

I have not used VMware lately. Are there any tools that allow you to make a backup of the image, so that you can revert to the old image if something does go wrong?

**struggler** · 05-16-2012, 09:18 AM

My username reflects my level of knowledge, but with your encouragement I shall give it a try sometime later.
Regards,
S

**nexgengirl** · 05-16-2012, 03:57 PM

Hi struggler,

I agree with GenoMax. Try to install these tools. Otherwise, if you are concerned about that, maasha's suggestion of Galaxy is also good. There is a tool there under Fetch sequences called "Extract Genomic DNA", which is what I used before I learned how to use unix.

**twaddlac** · 05-16-2012, 05:45 PM

EMBOSS (http://emboss.sourceforge.net/) is probably the most useful package for basic sequence manipulation/analysis.

Note that to use stdin/stdout you need to pass the '-filter' flag; the '-auto' flag disables the parameter prompting. The manual on the website is very informative.

I hope this helps!

**nexgenboy** · 05-16-2012, 06:22 PM

@struggler .. try this

#fasta file: pa101.fasta

>gi|129295|sp|P01013|OVAX_CHICK GENE X PROTEIN (OVALBUMIN-RELATED)
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE
KMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTS
VLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHP
FLFLIKHNPTNTIVYFGRYWSP

#script: sequence_extractor.sh

#!/bin/bash

# The 1-based sequence extractor - sequence_extractor.sh
# No guarantees offered.

# usage:
# 1) download the script or copy the contents
# of the script and save it as sequence_extractor.sh
# 2) make it executable: chmod 755 sequence_extractor.sh
# 3) run the script with the fasta file, start and end
# given on the command line: ./sequence_extractor.sh pa101.fasta 4 6

# stop early if the input fasta file cannot be read
[ -r "$1" ] || exit 1

# drop the header line and merge the sequence lines;
# the input file itself is left untouched
temp_var1=$(sed -e '1d' "$1" | tr -d '\r\n') || exit $?

# length of the selected region (start and end are 1-based, inclusive)
temp_var2=$(($3 - $2 + 1)) || exit $?

# display the extracted sequence
echo "${temp_var1:$(($2-1)):$temp_var2}"
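
The script above works on a single-record file. For a fasta file with many scaffolds, a minimal sketch using nothing beyond awk might look like this. The function name `extract_region` and the format of the output header are illustrative choices, and it assumes the sequence ID is the first word of the header line.

```shell
#!/bin/sh
# extract_region FILE ID START END
# Prints bases START..END (1-based, inclusive) of the record whose header
# begins with ">ID". Hypothetical helper; assumes IDs contain no whitespace.
extract_region() {
    awk -v id="$2" -v start="$3" -v end="$4" '
        /^>/ {                      # header line: is this the wanted record?
            keep = ($1 == ">" id)
            next
        }
        keep {                      # sequence line of the wanted record
            sub(/\r$/, "")          # tolerate Windows line endings
            seq = seq $0
        }
        END {
            if (seq == "") exit 1   # ID not found
            print ">" id "_" start "_" end
            print substr(seq, start, end - start + 1)
        }
    ' "$1"
}
```

For the region asked about in this thread, the call would be something like `extract_region input.fasta scaffoldID 437 959 > output.fasta`.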

**Mark** · 05-18-2012, 05:08 AM

From the NCBI toolkit, formatdb and fastacmd work nicely.

First, format your sequence file:

formatdb -i <fasta sequence file> -p F -o T

This creates a blastable sequence db (a useful bonus). The "-o" flag makes it searchable by fastacmd.

then

fastacmd -d <fasta sequence file> -o <output file name> -p F -s <ID of record you want to retrieve> -L <start position,end_position>

Fastacmd can also retrieve many records at once. See the documentation.

**struggler** · 05-18-2012, 09:14 AM

Dear Mark,
Many many thanks! The fastacmd command worked like a bullet!!

I am also thankful to all others for their helpful suggestions.

Regards,
S

nucleotide sequence extraction
