SEQanswers

04-18-2012, 03:32 AM   #1
ymc
Senior Member

Location: Hong Kong

Join Date: Mar 2010
Posts: 495
Combine 1000genomes bams to get better coverage?

Hi all,

I downloaded the bams from this 1000genomes ftp site:

ftp://ftp.1000genomes.ebi.ac.uk/vol1...878/alignment/

I only used the Illumina data for my application, but at about 20x its coverage is not good enough. I noticed that there are also BAMs from 454 and SOLiD. Can I use samtools merge to build a combined BAM with better overall coverage?

Thanks!

PS I am not sure if doing this will give me enough coverage even if successful. Does anyone know other places I can download high coverage human fastqs or bams?
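For readers following along, here is a minimal sketch of what the merge asked about above could look like. It only builds the `samtools merge` and `samtools index` command lines; the BAM file names are hypothetical placeholders, not the actual names on the FTP site. It assumes every input is coordinate-sorted and aligned to the same reference.

```python
import shlex


def build_merge_commands(inputs, merged="NA12878.merged.bam"):
    """Return the samtools command lines for merging and indexing BAMs.

    The default output name is a hypothetical placeholder.
    """
    if len(inputs) < 2:
        raise ValueError("need at least two input BAMs to merge")
    # samtools merge takes the output file first, then the input files.
    merge_cmd = ["samtools", "merge", merged] + list(inputs)
    # Index the merged BAM so downstream tools can use random access.
    index_cmd = ["samtools", "index", merged]
    return [merge_cmd, index_cmd]


if __name__ == "__main__":
    # Hypothetical file names, one per sequencing platform.
    bams = ["NA12878.illumina.bam", "NA12878.454.bam", "NA12878.solid.bam"]
    for cmd in build_merge_commands(bams):
        print(" ".join(shlex.quote(part) for part in cmd))
```

The commands are printed rather than executed so they can be inspected first. After merging, it is probably worth checking the `@RG` lines of the merged header, since downstream callers may rely on read groups to tell the platforms apart.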

04-18-2012, 03:35 AM   #2
ymc
Senior Member

Location: Hong Kong

Join Date: Mar 2010
Posts: 495

It seems like Broad Institute has bams for NA12878 at 40x internally. Is this data available to outsiders?

04-21-2012, 07:47 AM   #3
laura
Senior Member

Location: Cambridge UK

Join Date: Sep 2008
Posts: 151

What are you trying to achieve? For variant calling, many callers can consider more than one BAM at once.
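As an illustration of passing several BAMs to one caller without merging them first, here is a hedged sketch that builds a GATK command line in the style of GATK versions 1 to 3, where `-I` can be repeated. The choice of `UnifiedGenotyper`, the jar path, and all file names are assumptions made for the example; other callers take multiple inputs in their own way.

```python
def build_caller_command(bams, reference, out_vcf,
                         gatk_jar="GenomeAnalysisTK.jar"):
    """Build a GATK 1.x-3.x style command that reads several BAMs at once.

    The jar path and the UnifiedGenotyper walker are illustrative choices.
    """
    if not bams:
        raise ValueError("at least one BAM is required")
    cmd = ["java", "-jar", gatk_jar, "-T", "UnifiedGenotyper", "-R", reference]
    # Repeat -I once per input BAM instead of merging the files beforehand.
    for bam in bams:
        cmd += ["-I", bam]
    cmd += ["-o", out_vcf]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names.
    command = build_caller_command(
        ["illumina.bam", "454.bam", "solid.bam"],
        "human_g1k_v37.fasta",
        "calls.vcf",
    )
    print(" ".join(command))
```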

04-22-2012, 11:36 PM   #4
ymc
Senior Member

Location: Hong Kong

Join Date: Mar 2010
Posts: 495

Quote:
Originally Posted by laura
What are you trying to achieve? For variant calling, many callers can consider more than one BAM at once.

I am trying the now-unsupported HLA Caller from the GATK package.

Supposedly you should get the following HLA calls if you use NA12878.bam from Broad and human_b36_both.fasta:
=================================================================================================================
Locus  A1    A2       Geno   Phase   Frq1   Frq2        L  Prob  Reads1  Reads2  Locus   EXP  White  Black  Asian
A      0101  1101  -1229.5   -15.2  -0.82  -0.73  -1244.7  1.00     180     191    229  1.62  -1.99  -3.13  -2.07
B      0801  5601   -832.3   -37.3  -1.01  -2.15   -872.1  1.00      58      59    100  1.17  -3.31  -4.10  -3.95
C      0102  0701  -1344.8   -37.5  -0.87  -0.86  -1384.2  1.00      91     139    228  1.01  -2.35  -2.95  -2.31
DPA1   0103  0201   -842.1    -1.8  -0.12  -0.79   -846.7  1.00      72      48    120  1.00  -0.90   -INF  -1.27
DPB1   0401  1401   -991.5   -18.4  -0.45  -1.55  -1010.7  1.00      64      48    113  0.99  -2.24  -3.14  -2.64
DQA1   0101  0501  -1077.5   -15.9  -0.90  -0.62  -1095.4  1.00     160      77    247  0.96  -1.53  -1.60  -1.87
DQB1   0201  0501   -709.6   -18.6  -0.77  -0.76   -729.7  0.95      50      87    137  1.00  -1.76  -1.54  -2.23
DRB1   0101  0301  -1513.8  -317.3  -1.06  -0.94  -1832.6  1.00      52      32    101  0.83  -1.99  -2.83  -2.34
=================================================================================================================

But if I use the aforementioned three BAMs (Illumina, 454 and SOLiD) and human_g1k_v37.fasta with updated HLA_EXONS.intervals, HLA_DICTIONARY.txt and HLA_POLYMORPHIC_SITES.txt, I get:

=================================================================================================================
Locus  A1    A2       Geno   Phase   Frq1   Frq2        L  Prob  Reads1  Reads2  Locus   EXP  White  Black  Asian
A      0101  1104  -1133.2   -40.7  -0.82  -6.00  -1173.9  1.00     133     138    177  1.53  -6.82  -7.31  -7.34
B      0820  5601  -1156.2   -43.5  -6.00  -2.15  -1201.4  1.00      62      71    111  1.20  -8.30  -8.70  -8.15
C      0102  0701  -1718.5  -150.9  -0.87  -0.86  -1871.5  1.00      46     106    155  0.98  -2.35  -2.95  -2.31
DPA1   0103  0201  -1443.8    -4.8  -0.12  -0.79  -1451.4  1.00      43      19     62  1.00  -0.90   -INF  -1.27
DPB1   0401  1401  -1102.9   -35.2  -0.45  -1.55  -1139.0  1.00      41       9     52  0.96  -2.24  -3.14  -2.64
DQA1   0105  0501  -1549.3   -26.2  -1.24  -0.62  -1582.4  1.00     145      57    202  1.00  -2.62  -1.94  -2.72
DQB1   0203  0501  -1266.4  -145.1  -2.05  -0.76  -1413.4  1.00      33      73    127  0.83  -3.68  -2.80  -3.82
DRB1   0101  0301  -1683.0  -279.3  -1.06  -0.94  -1965.9  0.83      20      41     96  0.64  -1.99  -2.83  -2.34
DRB1   0120  0301  -1678.8  -279.3  -6.00  -0.94  -1963.3  0.17      20      41     96  0.64  -6.94  -7.15  -7.00
=================================================================================================================

The result is close but not identical. I suspect the reason is that the Broad NA12878.bam is 40x, whereas the combined BAM I used is only about 35x.

Last edited by ymc; 04-22-2012 at 11:38 PM.
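To make "close but not identical" concrete, the sketch below compares the two sets of calls locus by locus. The allele pairs are the first three columns (Locus, A1, A2) copied from the two tables in this post; the function and variable names are made up for the illustration and are not part of the HLA Caller.

```python
# Locus, A1, A2 from the expected output (Broad NA12878.bam, b36 reference).
BROAD_CALLS = """
A 0101 1101
B 0801 5601
C 0102 0701
DPA1 0103 0201
DPB1 0401 1401
DQA1 0101 0501
DQB1 0201 0501
DRB1 0101 0301
"""

# Locus, A1, A2 from the combined 1000 Genomes BAMs (v37 reference).
COMBINED_CALLS = """
A 0101 1104
B 0820 5601
C 0102 0701
DPA1 0103 0201
DPB1 0401 1401
DQA1 0105 0501
DQB1 0203 0501
DRB1 0101 0301
DRB1 0120 0301
"""


def parse_calls(text):
    """Map each locus to the set of (A1, A2) allele pairs reported for it."""
    calls = {}
    for line in text.strip().splitlines():
        locus, a1, a2 = line.split()[:3]
        calls.setdefault(locus, set()).add((a1, a2))
    return calls


def differing_loci(expected, observed):
    """Return the loci whose sets of allele pairs are not identical."""
    loci = sorted(set(expected) | set(observed))
    return [locus for locus in loci
            if expected.get(locus) != observed.get(locus)]


if __name__ == "__main__":
    expected = parse_calls(BROAD_CALLS)
    observed = parse_calls(COMBINED_CALLS)
    for locus in differing_loci(expected, observed):
        print(locus,
              sorted(expected.get(locus, ())),
              "->",
              sorted(observed.get(locus, ())))
```

A locus counts as different if any allele pair changes or if an extra ambiguous pair is reported, which is why DRB1 is flagged even though its top call matches.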

04-28-2012, 03:50 AM   #5
glede
Junior Member

Location: Shanghai

Join Date: Sep 2011
Posts: 2

Hi ymc,

I am also experimenting with the HLA Caller, and I have a question. You say you updated HLA_DICTIONARY.txt. How did you get an updated HLA_DICTIONARY.txt? All the allele sequences in the original HLA_DICTIONARY.txt have the same length, but in the IMGT/HLA database the alleles' lengths actually differ. How do you handle that?

Thanks.

04-29-2012, 08:01 PM   #6
ymc
Senior Member

Location: Hong Kong

Join Date: Mar 2010
Posts: 495

Quote:
Originally Posted by glede
Hi ymc,

I am also experimenting with the HLA Caller, and I have a question. You say you updated HLA_DICTIONARY.txt. How did you get an updated HLA_DICTIONARY.txt? All the allele sequences in the original HLA_DICTIONARY.txt have the same length, but in the IMGT/HLA database the alleles' lengths actually differ. How do you handle that?

Thanks.

I only updated the positions. I don't know whether the allele sequences also need to be updated.
All times are GMT -8.