#1 |
Member
Location: UK Join Date: May 2012
Posts: 12
I want to extract part of a sequence from a particular sequence/scaffold ID, for example bases 437 to 959 from a 3 Mb scaffold.
I am more familiar with grep and have used it before like this:
grep -A 1 scaffoldID sequencefasta.fa > saveoutput.fa
but I don't know how to extract a particular part of the sequence. Could anyone help me with this, please?
S
#3 |
Member
Location: Maryland Join Date: Apr 2010
Posts: 31
You could also use bedtools (code.google.com/p/bedtools/). I've used this tool to extract sub-sequence data before, and I really like it because it's fast and efficient.
The tool in bedtools is called fastaFromBed (creates FASTA sequences based on intervals in a BED/GFF/VCF file); it can extract sub-regions of a FASTA file when you specify those regions in a BED file. The manual is here: http://code.google.com/p/bedtools/do...-Manual.v3.pdf
Example of the command from the manual:
fastaFromBed [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF> -fo <output FASTA>
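For the region in the original question (bases 437 to 959, counted from 1), note that BED intervals use a 0-based start and an exclusive end, so only the start shifts by one. A minimal Python sketch of that conversion; the function name `bed_line` and the scaffold ID `scaffold_1` are made up for illustration and are not part of bedtools:

```python
def bed_line(seq_id: str, start_1based: int, end_1based: int) -> str:
    """Return one tab-separated BED line for an inclusive, 1-based region.

    BED intervals are 0-based and half-open, so the start is shifted by one
    and the end is left unchanged.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    return f"{seq_id}\t{start_1based - 1}\t{end_1based}"


if __name__ == "__main__":
    # Hypothetical scaffold ID; bases 437..959 become the interval [436, 959).
    print(bed_line("scaffold_1", 437, 959))
```

Saving that line to a file would give something to pass to the -bed option, with the scaffold ID replaced by the real one from the FASTA header.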
#4 |
Member
Location: UK Join Date: May 2012
Posts: 12
Thanks Maasha and NextGenGirl,
I could not install these tools on my system. The scaffold name and the sequence ID are the same. Could you please suggest a solution using Perl (or something like grep) only? I am using Bio-Linux.
Regards,
S
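If nothing extra can be installed, the extraction can also be done with a short script. What follows is a rough sketch in Python rather than Perl, using only the standard library; the function name `extract_region` is invented here, and it assumes the ID is the first word after ">" in the header and that positions are 1-based and inclusive:

```python
def extract_region(fasta_path, seq_id, start, end):
    """Return bases start..end (1-based, inclusive) of the record seq_id.

    Works for sequences wrapped over many lines and stops reading as soon
    as the requested region has been collected.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    parts = []         # pieces of the requested region, in order
    in_record = False  # True while reading the wanted record
    found = False      # True once the wanted header has been seen
    seen = 0           # number of bases of the record read so far
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if in_record:
                    break  # next record starts, so the wanted one is done
                header = line[1:].split()
                in_record = bool(header) and header[0] == seq_id
                found = found or in_record
                continue
            if not in_record or not line:
                continue
            # This line holds bases seen+1 .. seen+len(line) of the record.
            lo = max(start - 1 - seen, 0)
            hi = min(end - seen, len(line))
            if lo < hi:
                parts.append(line[lo:hi])
            seen += len(line)
            if seen >= end:
                break
    if not found:
        raise KeyError(seq_id)
    return "".join(parts)

# Example call with the file name from the first post and a placeholder ID:
# print(extract_region("sequencefasta.fa", "scaffoldID", 437, 959))
```

If the region runs past the end of the scaffold, this sketch simply returns whatever bases exist rather than raising an error.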
#5 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
I assume you are missing a compiler (gcc) or some libraries when you say that you could not install these tools.
Are you using a "live" image of Bio-Linux to temporarily boot into a Unix environment, or are you using someone else's Bio-Linux machine?
#6 |
Member
Location: UK Join Date: May 2012
Posts: 12
Thanks for your message.
This is on my own machine, running through VMware. I guess I can install these tools using sudo. Rather than 'could not', it is more that I was afraid to install them: if anything goes wrong, I don't have the know-how to fix it, so I don't want to tamper with my standard installation.
Regards,
S
#7 | |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
Give it a try. This is something you need to learn if you are planning to keep using *nix in some form.
I doubt that you can cause major damage by installing bedtools ... but if you did manage to do that, then perhaps you should not be using *nix in the first place.
I have not used VMware lately. Are there any tools that let you back up the image, so that if something does go wrong you can revert to the old image?
Last edited by GenoMax; 05-16-2012 at 10:06 AM. |
#8 |
Member
Location: UK Join Date: May 2012
Posts: 12
Although my username reflects my level of knowledge, with your encouragement I shall give it a try sometime later.
Regards,
S
#9 |
Member
Location: Maryland Join Date: Apr 2010
Posts: 31
Hi struggler,
I agree with GenoMax. Try to install these tools. Otherwise, if you are concerned about that, maasha's suggestion of Galaxy is also good. There is a tool there, under Fetch Sequences, called "Extract Genomic DNA"; that is the tool I used before I learned how to use Unix.
#10 |
Member
Location: Pittsburgh, PA Join Date: Feb 2011
Posts: 49
EMBOSS (http://emboss.sourceforge.net/) is probably the most useful package for basic sequence manipulation/analysis.
Note that to use stdin/stdout you need to pass the '-filter' flag, and that the '-auto' flag disables parameter prompting. The manual on their website is very informative.
I hope this helps!
#11 | ||
Junior Member
Location: USA Join Date: May 2012
Posts: 1
@struggler, try this:
#fasta file: pa101.fasta
#12 |
Member
Location: Raleigh, NC Join Date: Nov 2008
Posts: 51
From the NCBI toolkit, formatdb and fastacmd work nicely.
First, format your sequence file:
formatdb -i <fasta sequence file> -p F -o T
This creates a BLAST-searchable sequence database (a useful bonus). The "-o T" option makes it searchable by fastacmd.
Then:
fastacmd -d <fasta sequence file> -o <output file name> -p F -s <ID of record you want to retrieve> -L <start_position,end_position>
fastacmd can also retrieve many records at once. See the documentation.
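To repeat the two steps above for several regions, the commands can be assembled in a small script. A hedged Python sketch: the helper names (`formatdb_args`, `fastacmd_args`, `run_extraction`) are invented here, only the flags quoted in this post are used, and it assumes formatdb and fastacmd are on the PATH when `run_extraction` is actually called:

```python
import subprocess


def formatdb_args(fasta):
    """Argument list for indexing a nucleotide FASTA file (-p F, -o T)."""
    return ["formatdb", "-i", fasta, "-p", "F", "-o", "T"]


def fastacmd_args(fasta, seq_id, start, end, out):
    """Argument list for pulling start..end of one record into a file.

    The positions are passed through unchanged as "start,end" for -L.
    """
    return ["fastacmd", "-d", fasta, "-o", out, "-p", "F",
            "-s", seq_id, "-L", f"{start},{end}"]


def run_extraction(fasta, seq_id, start, end, out):
    """Run both steps in order; raises CalledProcessError if either fails."""
    subprocess.run(formatdb_args(fasta), check=True)
    subprocess.run(fastacmd_args(fasta, seq_id, start, end, out), check=True)

# Example call with the file names from the first post and a placeholder ID:
# run_extraction("sequencefasta.fa", "scaffoldID", 437, 959, "saveoutput.fa")
```

Passing the arguments as a list avoids shell quoting problems if a file name or ID contains spaces or other special characters.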
#13 |
Member
Location: UK Join Date: May 2012
Posts: 12
Dear Mark,
Many, many thanks! The fastacmd command worked like a bullet!! I am also thankful to everyone else for their helpful suggestions.
Regards,
S