Thread | Thread Starter | Forum | Replies | Last Post |
MAQ Scores & Quantify maq alignments? | AnamikaDarwin | Bioinformatics | 5 | 09-19-2015 08:24 AM |
Extract subset of Fastq sequences based on a list of IDs | pepperoni | Bioinformatics | 36 | 05-06-2013 01:38 AM |
How to extract sequences between adaptors ? | Giorgio C | Bioinformatics | 8 | 07-13-2011 12:10 AM |
who know extract linker/Primer sequences for HTS sequencing? | feng | Bioinformatics | 2 | 10-26-2010 02:19 PM |
Mapping reads with adapter sequences using MAQ | seq_GA | Bioinformatics | 1 | 11-05-2009 10:32 PM |
#1 |
Member
Location: california Join Date: Jan 2010
Posts: 22
How can I simply extract the sequences for a list of genes (~1000), in FASTA format, from a sequence database (~400MB) in FASTA format generated by MAQ?
#2 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
Are you starting with a ~400MB FASTA file, containing ~1000 sequences, and you just want the list of sequence identifiers ("gene names")?
Try something like this at the Unix command line:
Code:
grep "^>" my_database.fasta
The string "^>" is a regular expression that matches any line starting ("^") with the greater-than symbol.
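For anyone who prefers a script to grep, the same header listing can be sketched in Python with only the standard library. This is a minimal illustration, not part of the original reply; the function name `list_headers` and the demo data are made up for the example.

```python
import re

# Same regular expression as in the grep command: a line starting with ">".
HEADER = re.compile(r"^>")


def list_headers(lines):
    """Return the FASTA header lines, without the leading '>' or the newline."""
    return [line[1:].rstrip("\n") for line in lines if HEADER.search(line)]


if __name__ == "__main__":
    # Hypothetical in-memory data; a real run would pass an open file handle,
    # e.g. open("my_database.fasta").
    demo = [">gene1 some description\n", "ACGT\n", ">gene2\n", "GG>TT\n"]
    for header in list_headers(demo):
        print(header)
```

A ">" that appears in the middle of a line is ignored, because the "^" anchors the match to the start of the line.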
#3 |
Member
Location: california Join Date: Jan 2010
Posts: 22
Thanks for the reply. I actually need the IDs (headers) and sequences in FASTA format.
#4 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
How is your list of identifiers stored? e.g. a text file with one id per line?

I would suggest you write a simple script, e.g. using Perl (perhaps with BioPerl) or Python (perhaps with Biopython), or your preferred scripting language.
http://bioperl.org/wiki/HOWTO:SeqIO
http://biopython.org/wiki/SeqIO

Or, if you are happier just working at the command line, you can probably do this with EMBOSS seqret.
http://emboss.sourceforge.net/apps/r...ps/seqret.html
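As a rough idea of what such a script could look like, here is a minimal Python sketch that uses only the standard library rather than Biopython. It assumes the identifiers are stored one per line in a text file, and that a record's ID is the first whitespace-delimited word after the ">" on its header line. The function names and the file names in the usage note are hypothetical.

```python
def read_ids(lines):
    """Collect the wanted identifiers, one per line, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def filter_fasta(fasta_lines, wanted):
    """Yield the lines of every FASTA record whose ID is in `wanted`.

    Assumption: the record ID is the first word after '>' on the header line.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            yield line


def extract(fasta_path, ids_path, out_path):
    """Write the selected records from fasta_path to out_path."""
    with open(ids_path) as handle:
        wanted = read_ids(handle)
    # The FASTA file is streamed line by line, so a ~400MB file is never
    # loaded into memory all at once.
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        out.writelines(filter_fasta(fasta, wanted))
```

It could then be called as `extract("my_database.fasta", "my_ids.txt", "subset.fasta")`. Keeping the IDs in a set makes each lookup cheap, so ~1000 IDs against a large file should need only a single pass.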
#5 |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
You can use a couple of the utilities in the BLAST package from NCBI. Take your large FASTA file and create a BLAST database from it using formatdb. Then retrieve just the sequences you want from the BLAST database using the fastacmd tool. The -o T option asks formatdb to parse and index the sequence IDs, which fastacmd needs in order to look sequences up by ID.
Code:
%> formatdb -i <your.FASTA.file> -p F -o T -n <your.blast.db>
%> fastacmd -d <your.blast.db> -i <your.ID.file> > <output.file>
Tags |
maq, sequence |