Hi,
I'm trying to use blastn to BLAST an entire UniGene file (CDS only) in order to get gene names for my reference sequences. I then want to attach these gene names to the FASTA sequences and use this reference for a Cufflinks run. Hopefully, this would give me a good idea of the identities of differentially expressed genes without having to BLAST them individually.
Could anyone help me with blastn to ensure that only the top hit is returned, and that this hit contains the full source name?
I'm currently using the following command...
$ blastn -db trop -query blast.fa -outfmt 7 -max_target_seqs 1 -out results.out
...and get results like this:
BLASTN 2.2.23+
# Query: gnl|UG|Xl#S25665548 Xenopus laevis fascin, mRNA (cDNA clone MGC:114829 IMAGE:4970584), complete cds /cds=p(36,1490) /gb=BC097600 /gi=67678339 /ug=Xl.151 /len=2703
# Database: trop
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 2 hits found
gnl|UG|Xl#S25665548 gi|45360982|ref|NM_203797.1|_Xenopus_(Silurana)_tropicalis_fascin_homolog_1,_actin-bundling_protein_(fscn1),_mRNA 89.17 1672 159 12 9 1677 18 1670 0.0 2065
gnl|UG|Xl#S25665548 gi|45360982|ref|NM_203797.1|_Xenopus_(Silurana)_tropicalis_fascin_homolog_1,_actin-bundling_protein_(fscn1),_mRNA 83.02 907 92 42 1649 2527 1675 2547 0.0 765
Whereas I just want:
BLASTN 2.2.23+
# Query: gnl|UG|Xl#S25665548 Xenopus laevis fascin, mRNA (cDNA clone MGC:114829 IMAGE:4970584), complete cds /cds=p(36,1490) /gb=BC097600 /gi=67678339 /ug=Xl.151 /len=2703
# 1 hit found
gnl|UG|Xl#S25665548 gi|45360982|ref|NM_203797.1|_Xenopus_(Silurana)_tropicalis_fascin_homolog_1,_actin-bundling_protein_(fscn1),_mRNA 89.17 1672 159 12 9 1677 18 1670 0.0 2065
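Both rows in the current output name the same subject, so they look like two alignments (HSPs) against one target rather than two targets, which may be why -max_target_seqs 1 does not collapse them. As a stopgap, the results file could be post-filtered to keep only the first row per query. The following is a minimal sketch in Python, not a blastn feature. It assumes the default 12-column tabular layout shown in the Fields line, that rows for each query are listed best-first as in the output above, and that query ids contain no whitespace. The function name top_hits and the script name are hypothetical.

```python
import sys


def top_hits(lines):
    """Yield the first alignment row seen for each query id.

    Assumes the default 12-column tabular layout and that rows for a
    query are already ordered best-first, as in the output above.
    """
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the '#' comment lines written by -outfmt 7.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Skip anything that is not a full 12-column row (e.g. a bare banner line).
        if len(fields) < 12:
            continue
        query_id = fields[0]
        if query_id not in seen:
            seen.add(query_id)
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python top_hits.py results.out > top_hits.out
    with open(sys.argv[1]) as handle:
        for row in top_hits(handle):
            print(row)
```

This keeps the whole subject id column untouched, so the full source name stays attached to each retained row.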
Any tips or advice would be hugely appreciated,
Many thanks,
N