SEQanswers

05-16-2012, 03:26 AM   #1
struggler
Member
 
Location: UK

Join Date: May 2012
Posts: 12
nucleotide sequence extraction

I want to extract part of a sequence from a particular sequence/scaffold ID, e.g. bases 437 to 959 of a 3 Mb scaffold.

I am more familiar with grep and have used it before, e.g.:
grep -A 1 scaffoldID sequencefasta.fa > saveoutput.fa

but I don't know how to extract a particular region of the sequence.

Could anyone help me with this, please?

S
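If every sequence in the file sits on a single line after its header (which `grep -A 1` already assumes), the grep approach above can be extended with `cut`, whose character positions are 1-based and inclusive. This is a minimal sketch under that assumption, not a general tool: the function name `extract_single_line` and the toy file `toy.fa` are hypothetical, and wrapped (multi-line) FASTA records would need a different approach.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print bases START..END (1-based, inclusive) of record ID,
# assuming each sequence is on ONE line directly after its header.
extract_single_line() {
    local fasta=$1 id=$2 start=$3 end=$4
    echo ">${id}:${start}-${end}"
    # -w stops "^>scaffold1" from also matching ">scaffold10";
    # tail keeps the sequence line; cut -c counts from 1, inclusive.
    grep -A 1 -w "^>${id}" "$fasta" | tail -n 1 | cut -c "${start}-${end}"
}

# Toy input with placeholder data (not from the thread).
printf '>scaffold1\nACGTACGTAC\n>scaffold10\nTTTTTTTTTT\n' > toy.fa
extract_single_line toy.fa scaffold1 3 6 > saveoutput.fa
cat saveoutput.fa
```

On the real data the call would be `extract_single_line sequencefasta.fa scaffoldID 437 959 > saveoutput.fa`.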
05-16-2012, 04:02 AM   #2
maasha
Senior Member
 
Location: Denmark

Join Date: Apr 2009
Posts: 153

Have a look at Galaxy.

Alternatively you can use Biopieces like this:

Code:
read_fasta -i input.fasta |
grab -p scaffoldID -k SEQ_NAME |
extract_seq -b 437 -e 959 |
write_fasta -o output.fasta -x

Martin
05-16-2012, 04:19 AM   #3
nexgengirl
Member
 
Location: Maryland

Join Date: Apr 2010
Posts: 31

You could also use bedtools (code.google.com/p/bedtools/). I've used this tool to extract sub-sequence data before, and I really like it because it's fast and efficient.

The relevant bedtools tool is fastaFromBed ("Creates FASTA sequences based on intervals in a BED/GFF/VCF file"); it extracts sub-regions of a FASTA file, with the regions specified in a BED file.

The manual is present here: http://code.google.com/p/bedtools/do...-Manual.v3.pdf

Example of the command from the manual:

Code:
fastaFromBed [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF> -fo <output FASTA>
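One detail worth checking before running fastaFromBed: BED intervals are 0-based and half-open, so bases 437 to 959 (1-based, inclusive) are written with start 436 and end 959. A minimal sketch that writes such a one-line BED file; the helper name `to_bed`, the ID `scaffoldID` and the file names are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: turn a 1-based, inclusive region into a BED line
# (0-based, half-open). Only the start moves down by one.
to_bed() {
    local id=$1 start=$2 end=$3
    printf '%s\t%d\t%d\n' "$id" "$(( start - 1 ))" "$end"
}

to_bed scaffoldID 437 959 > region.bed
cat region.bed
```

The BED file can then be passed to the command above, e.g. `fastaFromBed -fi sequencefasta.fa -bed region.bed -fo saveoutput.fa`. The first BED column has to match the sequence name used in the FASTA header.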
05-16-2012, 07:22 AM   #4
struggler
Member
 
Location: UK

Join Date: May 2012
Posts: 12

Thanks maasha and nexgengirl,
I could not install these tools on my system. The scaffold name and the sequence ID are the same. Could you please suggest a solution using only Perl (or something like grep)? I am using Bio-Linux.
Regards,
S
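The request was for Perl; this sketch swaps in awk instead, which is part of any standard Linux install and, unlike `grep -A 1`, copes with sequences wrapped over many lines. It is a minimal, untested-in-production sketch: the function name `extract_region` and the toy file are hypothetical, and the record ID is assumed to be the header text up to the first whitespace.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print bases START..END (1-based, inclusive) of record ID
# from a FASTA file, joining wrapped sequence lines first.
extract_region() {
    local fasta=$1 id=$2 start=$3 end=$4
    awk -v id="$id" -v s="$start" -v e="$end" '
        { sub(/\r$/, "") }                # tolerate Windows line endings
        /^>/ {
            if (found) exit               # wanted record is complete, stop reading
            found = (substr($1, 2) == id) # header ID = first word minus ">"
            next
        }
        found { seq = seq $0 }            # concatenate wrapped sequence lines
        END {
            if (seq == "") { print "not found: " id > "/dev/stderr"; exit 1 }
            print ">" id ":" s "-" e
            print substr(seq, s, e - s + 1)   # awk substr() is 1-based
        }
    ' "$fasta"
}

# Toy input with placeholder data (not from the thread).
printf '>scaffold1 demo\nACGTA\nCGTAC\n>scaffold10\nTTTTT\n' > toy_multi.fa
extract_region toy_multi.fa scaffold1 4 7
```

For the original question the call would be `extract_region sequencefasta.fa scaffoldID 437 959 > saveoutput.fa`. Holding one 3 Mb scaffold in an awk string is fine; for many regions across a large genome, the indexed tools suggested in this thread are likely to be faster.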
05-16-2012, 09:40 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,083

When you say you could not install these tools, I assume you are missing a compiler (gcc) or some libraries.

Are you using a "live" image of Bio-Linux to temporarily boot into a Unix environment, or are you using someone else's Bio-Linux machine?
05-16-2012, 09:56 AM   #6
struggler
Member
 
Location: UK

Join Date: May 2012
Posts: 12

Thanks for your message.

This is on my own machine, running in VMware. I guess I can install these with the sudo command. Rather than 'could not', it is more that I was afraid of, or sceptical about, installing these tools: if anything gets messed up, I don't have much know-how to fix it, so I don't want to play with my standard installation.
Regards,
S
05-16-2012, 10:03 AM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,083

Give it a try. This is something you need to learn if you are planning to keep using *nix in some form.

I doubt that you can cause major damage by installing bedtools... but if you did manage to, then perhaps you should not be using *nix in the first place.

I have not used VMware lately. Is there a tool that lets you back up the image, so that if something does go wrong you can revert to the old image?

Last edited by GenoMax; 05-16-2012 at 10:06 AM.
05-16-2012, 10:18 AM   #8
struggler
Member
 
Location: UK

Join Date: May 2012
Posts: 12

My username says a lot about my level of knowledge, but with your encouragement I shall give it a try sometime later.
Regards,
S
05-16-2012, 04:57 PM   #9
nexgengirl
Member
 
Location: Maryland

Join Date: Apr 2010
Posts: 31

Hi struggler,

I agree with GenoMax: try to install these tools. Otherwise, if you are concerned about that, maasha's suggestion of Galaxy is also good. Galaxy has a tool under Fetch sequences called "Extract Genomic DNA"; that is the tool I used before I learned how to use Unix.
05-16-2012, 06:45 PM   #10
twaddlac
Member
 
Location: Pittsburgh, PA

Join Date: Feb 2011
Posts: 49

EMBOSS (http://emboss.sourceforge.net/) is probably the most useful package for basic sequence manipulation/analysis; its extractseq program, for example, extracts regions from a sequence.

Note that to read from stdin and write to stdout you need the '-filter' flag, and the '-auto' flag turns off the interactive parameter prompts. The manual on their website is very informative.

I hope this helps!
05-16-2012, 07:22 PM   #11
nexgenboy
Junior Member
 
Location: USA

Join Date: May 2012
Posts: 1

@struggler, try this:

#fasta file: pa101.fasta
Code:
>gi|129295|sp|P01013|OVAX_CHICK GENE X PROTEIN (OVALBUMIN-RELATED)
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE
KMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTS
VLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHP
FLFLIKHNPTNTIVYFGRYWSP
#script: sequence_extractor.sh
Code:
#!/bin/bash

# The 1-based sequence extractor - sequence_extractor.sh
# No guarantees offered.
# Assumes the input file holds a single FASTA record.

# usage:
# 1) download the script or copy the contents
#    of the script and save it as sequence_extractor.sh
# 2) make it executable: chmod 755 sequence_extractor.sh
# 3) run the script: ./sequence_extractor.sh pa101.fasta 4 6
#    (arguments: FASTA file, 1-based start, 1-based inclusive end)

set -euo pipefail

if [ "$#" -ne 3 ]; then
    echo "usage: $0 file.fasta start end" >&2
    exit 1
fi

# drop the header line and merge the sequence lines into one string
# (the input file itself is left untouched)
seq=$(sed '1d' "$1" | tr -d '\r\n')

# length of the inclusive region start..end
len=$(( $3 - $2 + 1 ))

# display the extracted sequence (bash substring offsets are 0-based)
echo "${seq:$(( $2 - 1 )):$len}"
05-18-2012, 06:08 AM   #12
Mark
Member
 
Location: Raleigh, NC

Join Date: Nov 2008
Posts: 51

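From the NCBI toolkit, formatdb and fastacmd work nicely.

First, format your sequence file:

formatdb -i <fasta sequence file> -p F -o T

This creates a BLAST-searchable sequence database (a useful bonus). The -p F option marks the sequences as nucleotide, and the -o T option parses the sequence IDs and indexes them, which is what lets fastacmd retrieve records by ID.

Then:

fastacmd -d <fasta sequence file> -o <output file name> -p F -s <ID of record you want to retrieve> -L <start_position>,<end_position>

For the region in the original question that would be -L 437,959.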

Fastacmd can also retrieve many records at once. See the documentation.
05-18-2012, 10:14 AM   #13
struggler
Member
 
Location: UK

Join Date: May 2012
Posts: 12

Dear Mark,
Many thanks! The fastacmd command worked like a charm!

I am also thankful to all others for their helpful suggestions.

Regards,
S
All times are GMT -8.