  • Combine 1000genomes bams to get better coverage?

    Hi all,

    I downloaded the bams from this 1000genomes ftp site:

    ftp://ftp.1000genomes.ebi.ac.uk/vol1...878/alignment/

    I only used the Illumina data for my application, but its coverage is only about 20x, which is not enough. I noticed that there are also bams from 454 and SOLiD. Can I use samtools merge to produce a combined bam with better overall coverage?

    Thanks!

    PS: I am not sure that doing this will give me enough coverage even if it works. Does anyone know of other places where I can download high-coverage human fastqs or bams?
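
For reference, a minimal sketch of how the merge asked about above could be assembled from Python. It assumes the general `samtools merge <out.bam> <in1.bam> <in2.bam> ...` form, that all inputs are coordinate-sorted and aligned to the same reference, and that samtools is installed. The file names and the `build_merge_command` helper are hypothetical, not taken from the FTP site or from samtools.

```python
import shlex
from typing import List


def build_merge_command(output_bam: str, input_bams: List[str]) -> List[str]:
    """Return the argv for `samtools merge <out.bam> <in1.bam> <in2.bam> ...`."""
    if len(input_bams) < 2:
        raise ValueError("need at least two input BAMs to merge")
    return ["samtools", "merge", output_bam, *input_bams]


# Hypothetical local file names, one per sequencing platform.
inputs = ["NA12878.illumina.bam", "NA12878.454.bam", "NA12878.solid.bam"]
cmd = build_merge_command("NA12878.merged.bam", inputs)

# Print the command for inspection instead of running it.
print(shlex.join(cmd))

# To actually run it (requires samtools on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Whether the merged file is suitable for a given caller still depends on the read groups and on how each platform's reads were aligned, so this only covers the mechanical step.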

  • #2
    It seems like Broad Institute has bams for NA12878 at 40x internally. Is this data available to outsiders?

    • #3
      What are you trying to achieve? For variant calling, many callers can consider more than one bam at once.

      • #4
        Originally posted by laura
        What are you trying to achieve? For variant calling, many callers can consider more than one bam at once.
        I am trying the now-unsupported HLA Caller from the GATK package.

        Supposedly you should get the following HLA calls if you use NA12878.bam from Broad and human_b36_both.fasta:
        =================================================================================================================
        Locus  A1    A2    Geno     Phase   Frq1   Frq2   L        Prob  Reads1  Reads2  Locus  EXP   White  Black  Asian
        A      0101  1101  -1229.5  -15.2   -0.82  -0.73  -1244.7  1.00  180     191     229    1.62  -1.99  -3.13  -2.07
        B      0801  5601  -832.3   -37.3   -1.01  -2.15  -872.1   1.00  58      59      100    1.17  -3.31  -4.10  -3.95
        C      0102  0701  -1344.8  -37.5   -0.87  -0.86  -1384.2  1.00  91      139     228    1.01  -2.35  -2.95  -2.31
        DPA1   0103  0201  -842.1   -1.8    -0.12  -0.79  -846.7   1.00  72      48      120    1.00  -0.90  -INF   -1.27
        DPB1   0401  1401  -991.5   -18.4   -0.45  -1.55  -1010.7  1.00  64      48      113    0.99  -2.24  -3.14  -2.64
        DQA1   0101  0501  -1077.5  -15.9   -0.90  -0.62  -1095.4  1.00  160     77      247    0.96  -1.53  -1.60  -1.87
        DQB1   0201  0501  -709.6   -18.6   -0.77  -0.76  -729.7   0.95  50      87      137    1.00  -1.76  -1.54  -2.23
        DRB1   0101  0301  -1513.8  -317.3  -1.06  -0.94  -1832.6  1.00  52      32      101    0.83  -1.99  -2.83  -2.34
        =================================================================================================================

        But if I use the aforementioned three bams and human_g1k_v37.fasta with updated HLA_EXONS.intervals, HLA_DICTIONARY.txt and HLA_POLYMORPHIC_SITES.txt, I get:

        =================================================================================================================
        Locus  A1    A2    Geno     Phase   Frq1   Frq2   L        Prob  Reads1  Reads2  Locus  EXP   White  Black  Asian
        A      0101  1104  -1133.2  -40.7   -0.82  -6.00  -1173.9  1.00  133     138     177    1.53  -6.82  -7.31  -7.34
        B      0820  5601  -1156.2  -43.5   -6.00  -2.15  -1201.4  1.00  62      71      111    1.20  -8.30  -8.70  -8.15
        C      0102  0701  -1718.5  -150.9  -0.87  -0.86  -1871.5  1.00  46      106     155    0.98  -2.35  -2.95  -2.31
        DPA1   0103  0201  -1443.8  -4.8    -0.12  -0.79  -1451.4  1.00  43      19      62     1.00  -0.90  -INF   -1.27
        DPB1   0401  1401  -1102.9  -35.2   -0.45  -1.55  -1139.0  1.00  41      9       52     0.96  -2.24  -3.14  -2.64
        DQA1   0105  0501  -1549.3  -26.2   -1.24  -0.62  -1582.4  1.00  145     57      202    1.00  -2.62  -1.94  -2.72
        DQB1   0203  0501  -1266.4  -145.1  -2.05  -0.76  -1413.4  1.00  33      73      127    0.83  -3.68  -2.80  -3.82
        DRB1   0101  0301  -1683.0  -279.3  -1.06  -0.94  -1965.9  0.83  20      41      96     0.64  -1.99  -2.83  -2.34
        DRB1   0120  0301  -1678.8  -279.3  -6.00  -0.94  -1963.3  0.17  20      41      96     0.64  -6.94  -7.15  -7.00
        =================================================================================================================

        The results are close but not identical. I suspect the reason is that the Broad NA12878.bam is 40x, whereas the combined bam I used is only about 35x.
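
To make "close but not identical" concrete, here is a small sketch that diffs the allele calls of the two runs. Only the first three columns (Locus, A1, A2) of the two tables above are copied in; the `parse_calls` and `discordant_loci` helpers are hypothetical names introduced for illustration and are not part of GATK.

```python
from typing import Dict, List, Tuple

Calls = Dict[str, List[Tuple[str, str]]]

# Locus, A1, A2 from the Broad NA12878.bam / human_b36_both.fasta run.
BROAD_CALLS = """\
A 0101 1101
B 0801 5601
C 0102 0701
DPA1 0103 0201
DPB1 0401 1401
DQA1 0101 0501
DQB1 0201 0501
DRB1 0101 0301
"""

# Locus, A1, A2 from the merged-bam / human_g1k_v37.fasta run.
MERGED_CALLS = """\
A 0101 1104
B 0820 5601
C 0102 0701
DPA1 0103 0201
DPB1 0401 1401
DQA1 0105 0501
DQB1 0203 0501
DRB1 0101 0301
DRB1 0120 0301
"""


def parse_calls(text: str) -> Calls:
    """Map each locus to the list of (A1, A2) allele pairs reported for it."""
    calls: Calls = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        locus, a1, a2 = fields[:3]
        calls.setdefault(locus, []).append((a1, a2))
    return calls


def discordant_loci(expected: Calls, observed: Calls) -> List[str]:
    """Return loci whose reported allele pairs differ between the two runs."""
    return [locus for locus in expected if observed.get(locus) != expected[locus]]


diff = discordant_loci(parse_calls(BROAD_CALLS), parse_calls(MERGED_CALLS))
print(diff)  # ['A', 'B', 'DQA1', 'DQB1', 'DRB1']
```

C, DPA1 and DPB1 agree. DRB1 is flagged only because the second run reports an extra candidate pair (0120/0301, Prob 0.17) alongside the matching 0101/0301 call.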
        Last edited by ymc; 04-22-2012, 10:38 PM.

        • #5
          Hi ymc,

          I am also experimenting with the HLA Caller and have a question. You say you updated HLA_DICTIONARY.txt. How did you get an updated HLA_DICTIONARY.txt? All the allele sequences in the original HLA_DICTIONARY.txt have the same length, but in the IMGT/HLA database the alleles' lengths actually differ. How did you handle that?

          Thanks.

          • #6
            Originally posted by glede
            Hi ymc,

            I am also experimenting with the HLA Caller and have a question. You say you updated HLA_DICTIONARY.txt. How did you get an updated HLA_DICTIONARY.txt? All the allele sequences in the original HLA_DICTIONARY.txt have the same length, but in the IMGT/HLA database the alleles' lengths actually differ. How did you handle that?

            Thanks.
            I only updated the positions. I don't know if the allele sequences also need to be updated.
