**tomc** · 04-07-2012, 11:34 PM

BLAST knows nothing about chromosomes, so if they appear in a BLAST report it is because your BLAST database happened to be built from a FASTA file with chromosome info in its deflines (the line above each sequence that begins with ">").
I think your choices are going to be either:
extract it from the report
(which can be messy, as there are many different conventions for writing deflines)

or find/make BLAST databases that are already per chromosome, so that any hits are
to the chromosome you are blasting against.
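
For the first option, a minimal Python sketch of pulling the chromosome out of the ">" deflines of a plain-text report might look like the following. The function name `chromosomes_from_report` and the regular expression are assumptions made for illustration; the pattern only covers deflines that spell out "chromosome N" or "chrN", and other defline conventions would need their own pattern. It also only inspects the line that starts with ">", so a chromosome label that wraps onto a continuation line would be missed.

```python
import re

# Hypothetical pattern: matches "chromosome X", "chromosome 12", "chr7", etc.
# Other defline conventions will need a different expression.
CHROM_RE = re.compile(r"\b(?:chromosome|chr)[ _]?([0-9]+|[XY]|MT|Un)\b", re.IGNORECASE)


def chromosomes_from_report(report_text):
    """Return (subject_id, chromosome) for every '>' defline in a text report.

    chromosome is None when the defline carries no recognizable label.
    """
    hits = []
    for line in report_text.splitlines():
        if not line.startswith(">"):
            continue  # only deflines are inspected
        subject_id = line[1:].split()[0]  # first token after '>'
        match = CHROM_RE.search(line)
        hits.append((subject_id, match.group(1) if match else None))
    return hits
```

Calling `chromosomes_from_report(open("report.txt").read())` (the file name is a placeholder) would give one pair per aligned subject, which can then be filtered or counted.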

**logicthief** · 04-09-2012, 08:56 AM

Tomc, thank you. Making such a database seems too challenging for me....

**logicthief** · 04-11-2012, 05:00 PM

Update.
I just got the answer from a helpful NCBI staff member:
Choose NCBI genome (chromosome) as the database for web megablast and specify the organism (here I used the mouse genome). You then get a lot of NC_ accessions (plus NT_, NW_, etc.) in the hit table. An NC_ accession is a complete chromosome: NC_000067 stands for chr1, NC_000085 for chr19, and NC_000086/NC_000087 for X/Y (I've left off the current .version number for these accessions). You can also get the coordinates of the alignment on the corresponding chromosome by looking at columns 9 and 10. For the hit table format, see http://www.ornl.gov/sci/techresource...me/blast.shtml
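
As a rough illustration of the recipe above, here is a Python sketch that reads a tab-separated hit table, keeps only the NC_ (complete chromosome) hits, and reports the chromosome together with columns 9 and 10. The function name `hits_with_chromosome` is hypothetical. The mapping assumes the mouse accessions between NC_000067 (chr1) and NC_000085 (chr19) are numbered consecutively, which is inferred from the two endpoints given above and should be checked against NCBI before relying on it.

```python
# Mouse chromosome accessions without the .version suffix.
# Given in the thread: NC_000067 = chr1, NC_000085 = chr19,
# NC_000086 = X, NC_000087 = Y.
# ASSUMPTION: chr2..chr18 are the consecutive accessions in between.
MOUSE_CHROMS = {f"NC_{66 + n:06d}": str(n) for n in range(1, 20)}
MOUSE_CHROMS.update({"NC_000086": "X", "NC_000087": "Y"})


def hits_with_chromosome(hit_table_text):
    """Return (query, chromosome, subject_start, subject_end) per NC_ hit."""
    rows = []
    for line in hit_table_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # not a full hit-table row
        query, subject = fields[0], fields[1]
        # The subject ID may look like "ref|NC_000086.7|" or "NC_000086.7".
        accession = next(
            (part for part in subject.split("|") if part.startswith("NC_")), None
        )
        if accession is None:
            continue  # NT_, NW_ and other non-chromosome records
        chrom = MOUSE_CHROMS.get(accession.split(".")[0])
        if chrom is None:
            continue  # NC_ accession outside the mouse mapping
        # Columns 9 and 10 (1-based) hold the alignment coordinates on the subject.
        rows.append((query, chrom, int(fields[8]), int(fields[9])))
    return rows
```

Dropping the version suffix before the lookup mirrors the note above about leaving off the .version number, so the mapping keeps working when the accession version changes.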

Originally posted by logicthief

Hi everyone,

I am a beginner in bioinformatics. Could anyone tell me how to extract the chromosome number of each BLAST hit for a batch of query sequences? I looked at the "hit table" but only found the start and end positions of each hit. I know there is chromosome info in the standard BLAST report (text format), as shown below, but it is hard to process on a large scale:

Query=
Length=59

Score E
Sequences producing significant alignments: (Bits) Value

ref|NT_039716.7| Mus musculus strain C57BL/6J chromosome X ge... 93.5 2e-17
ref|NW_001035178.1| Mus musculus strain mixed chromosome X ge... 93.5 2e-17

ALIGNMENTS
>ref|NT_039716.7| Mus musculus strain C57BL/6J chromosome X genomic contig, MGSCv37
C57BL/6J
Length=15097629

Score = 93.5 bits (50), Expect = 2e-17
Identities = 57/60 (95%), Gaps = 1/60 (2%)
Strand=Plus/Plus

Query 1 TACC-CTGTAGGGTTGATAAGCTTATGTTCACTATAACAATTAACACATTTGCCATTGAC 59
|||| ||||| | |||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 1486960 TACCTCTGTAAGATTGATAAGCTTATGTTCACTATAACAATTAACACATTTGCCATTGAC 1487019

>ref|NW_001035178.1| Mus musculus strain mixed chromosome X genomic scaffold, alternate
assembly Mm_Celera 232000009784844, whole genome shotgun
sequence
Length=8237495

Features flanking this part of subject sequence:
323432 bp at 5' side: uncharacterized protein LOC211208
337262 bp at 3' side: uncharacterized protein LOC73934

Score = 93.5 bits (50), Expect = 2e-17
Identities = 57/60 (95%), Gaps = 1/60 (2%)
Strand=Plus/Plus

Query 1 TACC-CTGTAGGGTTGATAAGCTTATGTTCACTATAACAATTAACACATTTGCCATTGAC 59
|||| ||||| | |||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 577313 TACCTCTGTAAGATTGATAAGCTTATGTTCACTATAACAATTAACACATTTGCCATTGAC 577372
I am wondering whether the ASN file would contain such information (it does not seem to be human-readable?). If not, the only way I can think of is to extract the chromosome number from the standard report with Perl or grep (another problem is that I don't know how to write Perl scripts)... Thanks a lot!

**Growlywolf** · 04-11-2012, 10:23 PM

Use the Bio::SearchIO module of BioPerl:

http://bioperl.open-bio.org/wiki/HOWTO:SearchIO

**logicthief** · 04-12-2012, 06:34 PM

Originally posted by Growlywolf

use the module Bio::SearchIO of Bioperl

http://bioperl.open-bio.org/wiki/HOWTO:SearchIO

Thanks, Growlywolf. It seems very powerful (although not very straightforward for my purpose). I will try it later.

Get chromosome number from BLAST results