  • ymc
    Senior Member
    • Mar 2010
    • 496

    Combine 1000genomes bams to get better coverage?

    Hi all,

    I downloaded the bams from this 1000genomes ftp site:

    ftp://ftp.1000genomes.ebi.ac.uk/vol1...878/alignment/

    I only used the Illumina data for my application, but at about 20x its coverage was not high enough. I noticed that there are also BAMs from 454 and SOLiD. Can I use samtools merge to combine them into a single BAM with better overall coverage?

    Thanks!

    PS: I am not sure that doing this will give me enough coverage even if it works. Does anyone know of other places where I can download high-coverage human FASTQs or BAMs?
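For reference, the merge asked about above could be scripted roughly as follows. This is a minimal sketch, not a recommendation: the file names are hypothetical placeholders, the helper `build_merge_command` is made up for illustration, and it assumes the input BAMs are already coordinate-sorted against the same reference. It only builds the `samtools merge` command line; whether pooling reads from different platforms suits a particular caller is a separate question.

```python
import shlex
from typing import List, Sequence


def build_merge_command(output_bam: str,
                        input_bams: Sequence[str],
                        threads: int = 1) -> List[str]:
    """Build an argument list for `samtools merge OUT IN1 IN2 ...`.

    Hypothetical helper: it does not run samtools, it only assembles
    the command so it can be inspected before execution.
    """
    if len(input_bams) < 2:
        raise ValueError("need at least two input BAMs to merge")
    cmd = ["samtools", "merge"]
    if threads > 1:
        # -@ asks samtools for additional threads
        cmd += ["-@", str(threads)]
    cmd.append(output_bam)
    cmd.extend(input_bams)
    return cmd


# Placeholder names, NOT the actual files on the 1000 Genomes FTP site.
inputs = [
    "NA12878.illumina.bam",
    "NA12878.454.bam",
    "NA12878.solid.bam",
]
command = build_merge_command("NA12878.merged.bam", inputs)
print(shlex.join(command))
# To execute for real (requires samtools on PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```

After merging, the output would normally be indexed (`samtools index`) before being passed to downstream tools. It is also worth checking that the read-group information from each platform survives the merge, since header handling differs between samtools versions.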
  • ymc
    Senior Member
    • Mar 2010
    • 496

    #2
    It seems that the Broad Institute has BAMs for NA12878 at 40x internally. Is this data available to outsiders?


    • laura
      Senior Member
      • Sep 2008
      • 151

      #3
      What are you trying to achieve? For variant calling, many callers can consider more than one BAM at once.


      • ymc
        Senior Member
        • Mar 2010
        • 496

        #4
        Originally posted by laura:
        "What are you trying to achieve? For variant calling, many callers can consider more than one BAM at once."

        I am trying the now-unsupported HLA Caller from the GATK package.

        Supposedly you should get the following HLA calls if you use NA12878.bam from Broad and human_b36_both.fasta:
        ===============================================
        Locus  A1    A2    Geno     Phase   Frq1   Frq2   L        Prob  Reads1  Reads2  Locus  EXP   White  Black  Asian
        A      0101  1101  -1229.5  -15.2   -0.82  -0.73  -1244.7  1.00  180     191     229    1.62  -1.99  -3.13  -2.07
        B      0801  5601  -832.3   -37.3   -1.01  -2.15  -872.1   1.00  58      59      100    1.17  -3.31  -4.10  -3.95
        C      0102  0701  -1344.8  -37.5   -0.87  -0.86  -1384.2  1.00  91      139     228    1.01  -2.35  -2.95  -2.31
        DPA1   0103  0201  -842.1   -1.8    -0.12  -0.79  -846.7   1.00  72      48      120    1.00  -0.90  -INF   -1.27
        DPB1   0401  1401  -991.5   -18.4   -0.45  -1.55  -1010.7  1.00  64      48      113    0.99  -2.24  -3.14  -2.64
        DQA1   0101  0501  -1077.5  -15.9   -0.90  -0.62  -1095.4  1.00  160     77      247    0.96  -1.53  -1.60  -1.87
        DQB1   0201  0501  -709.6   -18.6   -0.77  -0.76  -729.7   0.95  50      87      137    1.00  -1.76  -1.54  -2.23
        DRB1   0101  0301  -1513.8  -317.3  -1.06  -0.94  -1832.6  1.00  52      32      101    0.83  -1.99  -2.83  -2.34
        ===============================================

        But if I use the three aforementioned BAMs and human_g1k_v37.fasta with updated HLA_EXONS.intervals, HLA_DICTIONARY.txt and HLA_POLYMORPHIC_SITES.txt, I get:

        ===============================================
        Locus  A1    A2    Geno     Phase   Frq1   Frq2   L        Prob  Reads1  Reads2  Locus  EXP   White  Black  Asian
        A      0101  1104  -1133.2  -40.7   -0.82  -6.00  -1173.9  1.00  133     138     177    1.53  -6.82  -7.31  -7.34
        B      0820  5601  -1156.2  -43.5   -6.00  -2.15  -1201.4  1.00  62      71      111    1.20  -8.30  -8.70  -8.15
        C      0102  0701  -1718.5  -150.9  -0.87  -0.86  -1871.5  1.00  46      106     155    0.98  -2.35  -2.95  -2.31
        DPA1   0103  0201  -1443.8  -4.8    -0.12  -0.79  -1451.4  1.00  43      19      62     1.00  -0.90  -INF   -1.27
        DPB1   0401  1401  -1102.9  -35.2   -0.45  -1.55  -1139.0  1.00  41      9       52     0.96  -2.24  -3.14  -2.64
        DQA1   0105  0501  -1549.3  -26.2   -1.24  -0.62  -1582.4  1.00  145     57      202    1.00  -2.62  -1.94  -2.72
        DQB1   0203  0501  -1266.4  -145.1  -2.05  -0.76  -1413.4  1.00  33      73      127    0.83  -3.68  -2.80  -3.82
        DRB1   0101  0301  -1683.0  -279.3  -1.06  -0.94  -1965.9  0.83  20      41      96     0.64  -1.99  -2.83  -2.34
        DRB1   0120  0301  -1678.8  -279.3  -6.00  -0.94  -1963.3  0.17  20      41      96     0.64  -6.94  -7.15  -7.00
        ===============================================

        The result is close but not identical. I suspect the reason might be that the Broad NA12878.bam is 40x, whereas the combined BAM I used is only about 35x.
        Last edited by ymc; 04-22-2012, 10:38 PM.
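To make "close but not identical" concrete, the two sets of calls in this post can be compared locus by locus. The following is a rough sketch under a few assumptions: the allele pairs are copied by hand from the two tables in this post, only the higher-probability DRB1 call (Prob 0.83) is used for the merged data, and the helper names `differing_loci` and `to_two_digit` are made up for illustration. The two-digit comparison assumes the usual convention that the first two digits of a four-digit HLA name identify the allele group.

```python
from typing import Dict, List, Tuple

Call = Tuple[str, str]

# Expected calls (Broad NA12878.bam, human_b36_both.fasta), from the first table.
broad_calls: Dict[str, Call] = {
    "A": ("0101", "1101"), "B": ("0801", "5601"),
    "C": ("0102", "0701"), "DPA1": ("0103", "0201"),
    "DPB1": ("0401", "1401"), "DQA1": ("0101", "0501"),
    "DQB1": ("0201", "0501"), "DRB1": ("0101", "0301"),
}

# Calls from the merged BAMs (human_g1k_v37.fasta), from the second table.
# For DRB1 only the call with the higher Prob (0.83) is kept.
merged_calls: Dict[str, Call] = {
    "A": ("0101", "1104"), "B": ("0820", "5601"),
    "C": ("0102", "0701"), "DPA1": ("0103", "0201"),
    "DPB1": ("0401", "1401"), "DQA1": ("0105", "0501"),
    "DQB1": ("0203", "0501"), "DRB1": ("0101", "0301"),
}


def differing_loci(expected: Dict[str, Call],
                   observed: Dict[str, Call]) -> List[str]:
    """Return loci whose unordered allele pair differs between the two sets."""
    return [
        locus for locus in expected
        if sorted(expected[locus]) != sorted(observed.get(locus, ()))
    ]


def to_two_digit(calls: Dict[str, Call]) -> Dict[str, Call]:
    """Truncate each four-digit allele name to its first two digits."""
    return {locus: (a1[:2], a2[:2]) for locus, (a1, a2) in calls.items()}


print(differing_loci(broad_calls, merged_calls))
# -> ['A', 'B', 'DQA1', 'DQB1']
print(differing_loci(to_two_digit(broad_calls), to_two_digit(merged_calls)))
# -> []
```

Four of the eight loci differ at four-digit resolution, and in each case only the last two digits change, so the calls agree at the two-digit level. This does not by itself show whether coverage, the reference build, or the updated interval files cause the difference.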


        • glede
          Junior Member
          • Sep 2011
          • 2

          #5
          Hi ymc,

          I am also experimenting with the HLA Caller, and I have a question. You say you updated the file HLA_DICTIONARY.txt. How did you get an updated HLA_DICTIONARY.txt? All the allele sequences in the original HLA_DICTIONARY.txt have the same length, but in the IMGT/HLA database the alleles actually have different lengths. How do you handle that?

          Thanks.


          • ymc
            Senior Member
            • Mar 2010
            • 496

            #6
            Originally posted by glede:
            "You say you updated the file HLA_DICTIONARY.txt. How did you get an updated HLA_DICTIONARY.txt? All the allele sequences in the original HLA_DICTIONARY.txt have the same length, but in the IMGT/HLA database the alleles actually have different lengths. How do you handle that?"

            I only updated the positions. I don't know whether the allele sequences also need to be updated.

