SEQanswers

khunny7 02-01-2011 04:41 PM

Different starting position in Genbank blast result?
 
I am new to this field, and this is probably a very ignorant question.
When I ran a BLAST search with the sequence "TGTCTTTGGACATGTAAGAATTGGAGGAAAATAAATGTGGATTTGGGAAACTTTGAGG", BLAST returned the result below. The matched sequences are on the same chromosome, yet they seem to have different indices. Thinking these were repeated regions, I tried to locate them, but I could find only one match, not even two. My assumption is that those numbers are not really indices. Can anyone help me understand this? Thank you.


>ref|NT_167247.1| Homo sapiens chromosome 6 genomic contig, GRCh37.p2 reference
assembly alternate locus group ALT_REF_LOCI_5
Length=4833398


Features in this part of subject sequence:
large proline-rich protein BAT2

Score = 108 bits (58), Expect = 7e-22
Identities = 58/58 (100%), Gaps = 0/58 (0%)
Strand=Plus/Plus

Query  1        TGTCTTTGGACATGTAAGAATTGGAGGAAAATAAATGTGGATTTGGGAAACTTTGAGG  58
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  2970579  TGTCTTTGGACATGTAAGAATTGGAGGAAAATAAATGTGGATTTGGGAAACTTTGAGG  2970636


>ref|NT_167245.1| Homo sapiens chromosome 6 genomic contig, GRCh37.p2 reference
assembly alternate locus group ALT_REF_LOCI_3
Length=4610396


Features in this part of subject sequence:
large proline-rich protein BAT2

Score = 108 bits (58), Expect = 7e-22
Identities = 58/58 (100%), Gaps = 0/58 (0%)
Strand=Plus/Plus

Query  1        TGTCTTTGGACATGTAAGAATTGGAGGAAAATAAATGTGGATTTGGGAAACTTTGAGG  58
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  2876461  TGTCTTTGGACATGTAAGAATTGGAGGAAAATAAATGTGGATTTGGGAAACTTTGAGG  2876518


>ref|NT_113891.2| Homo sapiens chromosome 6 genomic contig, GRCh37.p2 reference
assembly alternate locus group ALT_REF_LOCI_2
Length=4795371


Features in this part of subject sequence:
large proline-rich protein BAT2

Score = 102 bits (55), Expect = 3e-20
Identities = 57/58 (99%), Gaps = 0/58 (0%)
Strand=Plus/Plus

Query  1        TGTCTTTGGACATGTAAGAATTGGAGGAAAATAAATGTGGATTTGGGAAACTTTGAGG  58
                |||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
Sbjct  3100500  TGTCTTTGGACACGTAAGAATTGGAGGAAAATAAATGTGGATTTGGGAAACTTTGAGG  3100557

seb567 02-02-2011 12:30 PM

The numbers you carefully colored in red (the ones on the Sbjct lines) are positions on the subject sequence, here each chromosome 6 contig.

They are the start and end positions of the alignments.
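
To check that reading of the numbers, here is a minimal Python sketch, assuming each alignment fits on a single Sbjct line as in the report above. `sbjct_span` is a hypothetical helper name, not part of BLAST. It treats the two numbers on the line as start and end coordinates on the subject sequence and computes how many bases they span.

```python
import re

# One "Sbjct" line of a BLAST text report: label, start, aligned sequence, end.
SBJCT_LINE = re.compile(r"^Sbjct\s+(\d+)\s+([A-Za-z-]+)\s+(\d+)$")


def sbjct_span(line):
    """Return (start, end, span) for one Sbjct line (hypothetical helper)."""
    match = SBJCT_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a single-line Sbjct record: %r" % line)
    start, end = int(match.group(1)), int(match.group(3))
    # Coordinates are inclusive, so the span is the difference plus one.
    return start, end, abs(end - start) + 1


# Sbjct lines copied from the three hits in the report.
SBJCT_LINES = [
    "Sbjct 2970579 TGTCTTTGGACATGTAAGAATTGGAGGAAAATAAATGTGGATTTGGGAAACTTTGAGG 2970636",
    "Sbjct 2876461 TGTCTTTGGACATGTAAGAATTGGAGGAAAATAAATGTGGATTTGGGAAACTTTGAGG 2876518",
    "Sbjct 3100500 TGTCTTTGGACACGTAAGAATTGGAGGAAAATAAATGTGGATTTGGGAAACTTTGAGG 3100557",
]

spans = [sbjct_span(line) for line in SBJCT_LINES]
```

For all three hits the span is 58, the length of the query, which is consistent with the numbers being coordinates on each subject sequence.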

-seb

khunny7 02-02-2011 12:45 PM

Thanks for your reply. I can see that there are many matches, but that sequence is not repeated along chromosome 6.
I ran a local BLAST against hg19 and could find only a single match.

seb567 02-02-2011 01:13 PM

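Each hit is for an alternate version of the human genome: every subject is a contig from a different alternate locus group (ALT_REF_LOCI_n) of GRCh37.p2, with its own coordinates, so these are not repeats within one sequence.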

ref|NT_167247.1| Homo sapiens chromosome 6 genomic contig, GRCh37.p2 reference
assembly alternate locus group ALT_REF_LOCI_5 Length=4833398

ref|NT_113891.2| Homo sapiens chromosome 6 genomic contig, GRCh37.p2 reference
assembly alternate locus group ALT_REF_LOCI_2 Length=4795371
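
One way to see this directly from the report is to key each hit by its subject accession and locus group. The following is a rough Python sketch, assuming deflines shaped like the ones above (`ref|ACCESSION|` followed somewhere by an `ALT_REF_LOCI_n` label). `subject_and_locus_group` is a hypothetical helper name, not a BLAST or NCBI function.

```python
import re

# Accession between "ref|" and the next "|", then the first ALT_REF_LOCI_<n> label.
DEFLINE = re.compile(r"ref\|([^|]+)\|.*?(ALT_REF_LOCI_\d+)", re.DOTALL)


def subject_and_locus_group(defline):
    """Return (accession, locus_group) from a hit defline (hypothetical helper)."""
    match = DEFLINE.search(defline)
    if match is None:
        raise ValueError("no accession or ALT_REF_LOCI label found")
    return match.group(1), match.group(2)


# (defline, Sbjct start) pairs copied from the report in this thread.
HITS = [
    (">ref|NT_167247.1| Homo sapiens chromosome 6 genomic contig, GRCh37.p2 "
     "reference assembly alternate locus group ALT_REF_LOCI_5", 2970579),
    (">ref|NT_167245.1| Homo sapiens chromosome 6 genomic contig, GRCh37.p2 "
     "reference assembly alternate locus group ALT_REF_LOCI_3", 2876461),
    (">ref|NT_113891.2| Homo sapiens chromosome 6 genomic contig, GRCh37.p2 "
     "reference assembly alternate locus group ALT_REF_LOCI_2", 3100500),
]

# One entry per distinct subject: three hits, three different contigs.
starts_by_subject = {subject_and_locus_group(d): start for d, start in HITS}
```

Each start position ends up under a different (accession, locus group) key, so the three numbers are coordinates on three different sequences and cannot be compared as positions along one chromosome.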

