SEQanswers




02-15-2011, 01:26 PM   #1
rdu
Member
 
Location: USA

Join Date: Aug 2010
Posts: 29
Human chromosome index

Hi,

I have alignment output in which each read has a human chromosome index and a position. I'm trying to convert these records to the corresponding gene IDs.

The problem is that the chromosome index in my output ranges from 1 to 25, but the reference table I downloaded from the Ensembl website lists these chromosome names:

"7" "17" "9" "6" "20" "5"
"14" "3" "2" "4" "22" "16"
"15" "18" "1" "12" "Y" "X"
"19" "11" "8" "10" "c6_QBL" "NT_113958"
"NT_113871" "13" "NT_113935" "21" "NT_113930" "NT_113888"
"NT_113924" "c6_COX" "NT_113932" "NT_113898" "NT_113954" "MT"
"NT_113926" "NT_113933" "NT_113880" "NT_113886" "NT_113925" "NT_113936"
"NT_113951" "NT_113965" "NT_113944" "NT_113923" "NT_113931" "NT_113870"
"NT_113899" "NT_113901" "NT_113956" "NT_113934" "NT_113915" "NT_113964"
"c5_H2" "NT_113946" "NT_113957" "NT_113916" "NT_113929" "NT_113874"
"NT_113890" "NT_113949" "NT_113884" "NT_113878" "NT_113917" "NT_113906"
"NT_113960" "NT_113911" "NT_113963" "NT_113872" "NT_113881" "NT_113912"
"NT_113910" "NT_113903" "NT_113953" "NT_113937" "NT_113889" "NT_113909"
"NT_113927" "NT_113902" "NT_113885" "NT_113961" "NT_113962" "NT_113908"
"NT_113943" "NT_113966" "NT_113939"

Does anyone know which index matches which name? Thanks
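A minimal sketch of the lookup described above (chromosome plus position to gene ID). It assumes the Ensembl table has been loaded as (chromosome, start, end, gene_id) rows with 1-based inclusive coordinates, and that the aligner's numeric indices have already been translated into the same names the table uses. `build_gene_index` and `genes_at` are hypothetical helpers, not part of any Ensembl or AB tool, and the gene rows are toy data.

```python
from bisect import bisect_right
from collections import defaultdict


def build_gene_index(gene_rows):
    """Group (chrom, start, end, gene_id) rows by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end, gene_id in gene_rows:
        by_chrom[chrom].append((start, end, gene_id))
    index = {}
    for chrom, genes in by_chrom.items():
        genes.sort()
        index[chrom] = ([start for start, _, _ in genes], genes)
    return index


def genes_at(index, chrom, pos):
    """Return IDs of genes whose [start, end] interval contains pos (1-based, inclusive)."""
    starts, genes = index.get(chrom, ([], []))
    # Genes at or past `cut` start after pos, so they cannot contain it.
    cut = bisect_right(starts, pos)
    # Simple scan of the remaining candidates; an interval tree would scale better.
    return [gene_id for start, end, gene_id in genes[:cut] if end >= pos]


# Hypothetical toy table, not real Ensembl coordinates.
rows = [("1", 100, 200, "GENE_A"), ("1", 150, 300, "GENE_B"), ("X", 10, 50, "GENE_C")]
index = build_gene_index(rows)
print(genes_at(index, "1", 160))   # ['GENE_A', 'GENE_B']
print(genes_at(index, "23", 20))   # [] because "23" is not a name in the table
```

A lookup with an untranslated index such as "23" simply finds nothing, which is exactly the naming mismatch this thread is about.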
02-15-2011, 02:17 PM   #2
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

The 1-22, X, Y and MT entries are straightforward: our favorite chromosomes that we learned about in school.

The other stuff is one of two things:

The NT_ entries are chunks of the genome that the jigsaw-puzzle folks piecing together the reference can't quite place. Typically the best they can do is know which chromosome a chunk belongs to, but not where on that chromosome it sits. Type the NT_ strings into Entrez at NCBI to find out more. A typical name for an NT_ contig is "Homo sapiens chromosome 4 unlocalized genomic contig, GRCh37.p2 reference primary assembly".

The c* entries (c5_H2, c6_COX, c6_QBL) are alternative versions of a chromosome segment, i.e. replacement parts for the jigsaw puzzle: both pieces fit the same slot, but for consistency many folks just use the pieces from the original box (chr1-22, X, Y, M). You can typically ignore these.
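Richard's three categories can be told apart from the names alone. The sketch below is a rough classifier that relies only on the naming patterns visible in the Ensembl list in the first post (plain 1-22/X/Y/MT, an `NT_` prefix, and `c<number>_<name>`). These patterns are an assumption drawn from that list, not an official Ensembl rule, and other assembly releases may name things differently.

```python
import re

# Primary assembly chromosomes as they appear in the Ensembl list.
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}
# Assumed pattern for the alternative segments, e.g. c5_H2, c6_COX, c6_QBL.
ALT_SEGMENT = re.compile(r"c\d+_\w+")


def classify_region(name):
    """Rough classification using only the naming patterns seen in this thread."""
    if name in PRIMARY:
        return "primary"
    if name.startswith("NT_"):
        return "unlocalized contig"
    if ALT_SEGMENT.fullmatch(name):
        return "alternative segment"
    return "unknown"


names = ["7", "MT", "NT_113958", "c6_QBL", "c5_H2", "X"]
print([classify_region(n) for n in names])
```

Keeping only the primary chromosomes is then `[n for n in names if classify_region(n) == "primary"]`.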
02-15-2011, 02:38 PM   #3
rdu
Member
 
Location: USA

Join Date: Aug 2010
Posts: 29

Thank you very much!

1:22 - 1:22
23 - X
24 - Y
25 - MT

Is this what you meant?
02-15-2011, 03:34 PM   #4
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

Not exactly.
Note that there is no 23, 24 or 25 in the Ensembl list.
There are X, Y and MT entries for the X chromosome, the Y chromosome and the mitochondrial genome. Many folks come up with their own indexing, using 23, 24 and 25 for chrX, chrY and chrM in other contexts, but there is no standard.

Last edited by Richard Finney; 02-15-2011 at 03:37 PM.
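Because there is no standard, one way to keep the translation honest is to write the assumed ordering down explicitly and fail loudly on any index it does not cover. This sketch assumes 1-based indices; `make_index_translator` is a hypothetical helper, and the 23 = X, 24 = Y, 25 = MT ordering is only the ad hoc convention mentioned above, not something the AB pipeline is known to use.

```python
def make_index_translator(names_in_index_order, first_index=1):
    """Return a function mapping an aligner's numeric chromosome index to a name.

    The ordering is an explicit assumption supplied by the caller.
    """
    table = {i: name for i, name in enumerate(names_in_index_order, start=first_index)}

    def translate(index):
        if index not in table:
            raise KeyError(f"chromosome index {index} is not covered by the assumed ordering")
        return table[index]

    return translate


# One ad hoc (non-standard) convention: 1-22 unchanged, then 23 = X, 24 = Y, 25 = MT.
NUMERIC_ORDER = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]
translate = make_index_translator(NUMERIC_ORDER)
print(translate(7), translate(23), translate(25))  # 7 X MT
```

Swapping in a different ordering (for example the one documented by the aligner) only means passing a different list.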
02-15-2011, 07:10 PM   #5
rdu
Member
 
Location: USA

Join Date: Aug 2010
Posts: 29

Thanks again. It's really helpful.
02-16-2011, 12:27 AM   #6
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

You don't say which program produced the output containing the indices, but you should probably look at its documentation to see how to do the translation.

In at least one aligner we tried, the output used these kinds of indices, but the chromosomes were numbered in alphabetical order, i.e.:

1
10
11
12
....
2
3
4
..
MT
X
Y

You'll probably find out pretty quickly if you're getting it wrong, as you'll start to see positions beyond the end of a chromosome, but I'd certainly be happier with a definitive answer than with guessing.
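If an aligner numbers chromosomes in alphabetical order as described above, the index-to-name table can be generated with a sort. This sketch assumes "alphabetical" means a plain string sort (which places 20, 21 and 22 between 2 and 3, so the aligner mentioned may have differed in detail), that indices start at 1, and that they cover exactly 1-22, X, Y and MT. Check the aligner's documentation before relying on it.

```python
# Primary chromosome names as they appear in the Ensembl list.
PRIMARY_NAMES = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]

# A plain string sort puts "10".."19" before "2", and digits before letters:
# 1, 10, 11, ..., 19, 2, 20, 21, 22, 3, ..., 9, MT, X, Y
ALPHABETICAL_ORDER = sorted(PRIMARY_NAMES)

# Assumes the aligner numbers chromosomes from 1 in this sorted order.
ALPHA_INDEX_TO_NAME = {i: name for i, name in enumerate(ALPHABETICAL_ORDER, start=1)}

for idx in (2, 12, 23, 24, 25):
    print(idx, ALPHA_INDEX_TO_NAME[idx])
```

Under this assumed ordering, 23, 24 and 25 would be MT, X and Y rather than X, Y and MT, and every index from 2 to 22 would point at a different autosome than under the numeric guess, so the two conventions give very different annotations.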
02-16-2011, 01:46 PM   #7
rdu
Member
 
Location: USA

Join Date: Aug 2010
Posts: 29

My output was generated by the AB WT Pipeline, in .max format.

I tried the mapping 1:22 - 1:22, 23 - X, 24 - Y, 25 - MT and got 755 reads annotated with gene IDs on "X", 57 on "Y" and 27 on "MT". Does that look suspicious?

Much appreciated!
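One way to test a candidate mapping, following the suggestion about positions running off the end of a chromosome, is to count hits whose position exceeds the length of the chromosome they were assigned to. This sketch assumes the hits have already been parsed from the .max file into (index, position) pairs with 1-based positions (parsing is not shown), and that chromosome lengths come from a samtools `.fai` index of the same reference using the same names as the Ensembl table. `read_fai_lengths` and `count_out_of_range` are hypothetical helpers. A low count does not prove a mapping right, since a misassigned read can still land inside a longer chromosome, but a high count is a strong sign it is wrong.

```python
def read_fai_lengths(lines):
    """Read sequence lengths from samtools .fai lines (name <TAB> length <TAB> ...)."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def count_out_of_range(hits, index_to_name, lengths):
    """Count hits whose position falls outside the chromosome they are mapped to.

    hits: iterable of (chromosome_index, position) pairs with 1-based positions.
    index_to_name: the candidate mapping being tested, e.g. {23: "X", ...}.
    """
    bad = 0
    for idx, pos in hits:
        name = index_to_name[idx]
        if not 1 <= pos <= lengths[name]:
            bad += 1
    return bad


# Toy lengths and hits for illustration only (not real chromosome sizes).
fai_lines = ["X\t1000\t0\t60\t61\n", "Y\t500\t1030\t60\t61\n", "MT\t100\t1550\t60\t61\n"]
lengths = read_fai_lengths(fai_lines)
hits = [(23, 900), (24, 450), (25, 90)]
guess_a = {23: "X", 24: "Y", 25: "MT"}
guess_b = {23: "MT", 24: "X", 25: "Y"}
print(count_out_of_range(hits, guess_a, lengths))  # 0
print(count_out_of_range(hits, guess_b, lengths))  # 1
```

With a real index, `read_fai_lengths` can be given an open .fai file handle instead of a list, since it only iterates over lines.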
All times are GMT -8.