  • Introducing BAM-matcher

    I'd like to introduce BAM-matcher (https://bitbucket.org/sacgf/bam-matcher), a simple tool for determining whether two BAM files contain reads sequenced from the same sample or patient by counting genotype matches at common SNPs.
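A minimal sketch of the core idea, counting genotype matches at SNPs genotyped in both BAMs. It assumes genotypes have already been called (e.g. by GATK, Freebayes, or VarScan2) and encoded as alternate-allele counts keyed by (chromosome, position); the function name and encoding are illustrative only, not BAM-matcher's actual code or scoring.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, position)


def genotype_concordance(a: Dict[Site, int], b: Dict[Site, int]) -> Tuple[int, int, float]:
    """Count genotype matches at sites genotyped in both samples.

    Genotypes are encoded as alternate-allele counts: 0 = hom ref, 1 = het, 2 = hom alt.
    Returns (matches, sites compared, fraction matching).
    """
    shared = a.keys() & b.keys()
    matches = sum(1 for site in shared if a[site] == b[site])
    fraction = matches / len(shared) if shared else 0.0
    return matches, len(shared), fraction


if __name__ == "__main__":
    bam1 = {("1", 1000): 1, ("1", 2000): 2, ("2", 500): 0, ("3", 42): 1}
    bam2 = {("1", 1000): 1, ("1", 2000): 2, ("2", 500): 1}
    matches, compared, frac = genotype_concordance(bam1, bam2)
    print(f"{matches}/{compared} sites match ({frac:.0%})")  # 2/3 sites match (67%)
```

Only sites present in both call sets are compared, so the denominator shrinks when coverage overlap between the two BAMs is small.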

    We wrote this tool for our sequencing facility to look for mislabelled samples. It checks whether two BAM files came from the same patient/individual by comparing genotypes at sites with global minor allele frequencies ~0.5 (using 1000 Genomes Project data).
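One way to see why sites with allele frequency near 0.5 are used: assuming Hardy-Weinberg equilibrium, the chance that two unrelated people share a genotype at a biallelic site is (1-p)^4 + (2p(1-p))^2 + p^4, which is lowest (0.375) at p = 0.5. This is a back-of-the-envelope illustration, not necessarily the reasoning in the BAM-matcher manuscript; the multi-site figure additionally assumes the sites are independent (unlinked).

```python
def random_match_probability(p: float) -> float:
    """Chance that two unrelated individuals share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg equilibrium with alternate-allele frequency p.
    """
    q = 1.0 - p
    hom_ref, het, hom_alt = q * q, 2 * p * q, p * p
    return hom_ref ** 2 + het ** 2 + hom_alt ** 2


if __name__ == "__main__":
    for p in (0.05, 0.1, 0.25, 0.5):
        print(f"p = {p:.2f}: chance match per site = {random_match_probability(p):.3f}")
    # Assuming independent (unlinked) sites, all n sites match by chance with probability P**n.
    n = 20
    print(f"{n} sites at p = 0.5: {random_match_probability(0.5) ** n:.1e}")
```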

    It works best when there are multiple samples from the same patient/individual, and bypasses the need for independent SNP array data.

    The tool is very simple to use and very fast (~2 minutes per sample pair, but ~1 second with cached data). For genotype calling, it uses external variant callers (currently GATK, Freebayes, or VarScan2).
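The gap between ~2 minutes and ~1 second suggests the expensive step is genotyping each BAM, which only needs to happen once per file. The sketch below is a hypothetical illustration of per-BAM caching; `cached_genotypes` and the `call_genotypes` callback are made-up names, and BAM-matcher's own cache layout is not described in this thread.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict

Genotypes = Dict[str, int]  # "chrom:pos" -> alt-allele count (JSON keys must be strings)


def cached_genotypes(bam_path: str, cache_dir: str,
                     call_genotypes: Callable[[str], Genotypes]) -> Genotypes:
    """Return genotype calls for bam_path, running the slow caller only on a cache miss.

    The cache key combines the absolute path, file size and modification time,
    so a BAM that is replaced or rewritten gets genotyped again.
    """
    st = os.stat(bam_path)
    key_src = f"{os.path.abspath(bam_path)}|{st.st_size}|{st.st_mtime_ns}"
    key = hashlib.sha1(key_src.encode()).hexdigest()
    cache_file = Path(cache_dir) / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    calls = call_genotypes(bam_path)  # the expensive step (variant calling)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(calls))
    return calls


if __name__ == "__main__":
    def slow_caller(path: str) -> Genotypes:
        # Hypothetical stand-in for running GATK/Freebayes/VarScan2 on the BAM.
        print(f"genotyping {path} ...")
        return {"1:1000": 1, "2:500": 0}

    with tempfile.TemporaryDirectory() as tmp:
        bam = Path(tmp) / "sample.bam"
        bam.write_bytes(b"placeholder")
        cache = str(Path(tmp) / "cache")
        cached_genotypes(str(bam), cache, slow_caller)  # cache miss: runs the caller
        cached_genotypes(str(bam), cache, slow_caller)  # cache hit: no genotyping
```

With per-BAM results cached, comparing a new pair only costs two file reads plus the site-by-site comparison.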

    Download here: https://bitbucket.org/sacgf/bam-matcher

  • #2
    I have BAM files for whole-exome sequencing (100 bp paired-end) and ChIP-seq (36 bp single-end) from the same sample. Can I still use the tool?
    ChIP-seq only has signal where the protein or histone modification is enriched, while whole-exome sequencing is targeted to exonic regions.

    To generate the BAM files, do you recommend using bwa mem for mapping both WES and ChIP-seq?

    Thanks,
    Ming



    • #3
      Hi Ming,

      Thanks for trying out BAM-matcher!

      In theory, 36-bp reads should be fine. However, you may need to use more sites for comparison, as the overlap in coverage between WES and ChIP-seq may be quite small.

      We have successfully compared BAMs from WES and RNA-seq, but the number of sites that can be compared is probably around 1/2 to 2/3 of that in a WES-WES comparison.

      As ChIP-seq is DNA-based, I don't think allele-specific expression will be a problem, unlike with RNA-seq. But again, you may want to be careful when choosing the sites for comparison.

      If you can generate BED files of some high-coverage regions for your WES and ChIP data, I can help you select better sites for comparison.

      Re: mapping, we found bwa-mem to be better than bwa-aln for WES data (PE 100 bp). However, for shorter reads (36 bp ChIP), I believe bwa-aln would be better.

      This may be relevant:
      Summary: BWA-MEM is a new alignment algorithm for aligning sequence reads or long query sequences against a large reference genome such as human. It automatically chooses between local and end-to-end alignments, supports paired-end reads and performs chimeric alignment. The algorithm is robust to sequencing errors and applicable to a wide range of sequence lengths from 70bp to a few megabases. For mapping 100bp sequences, BWA-MEM shows better performance than several state-of-art read aligners to date. Availability and implementation: BWA-MEM is implemented as a component of BWA, which is available at http://github.com/lh3/bwa. Contact: [email protected]


      Please let me know if you have any problems with setting up the configuration for BAM-matcher.


      Paul



      • #4
        Thanks for your answer, Paul.
        For ChIP-seq, I can call peaks and find the regions with enough depth (DP > 15, as suggested on the Bitbucket page).

        I can also find regions with enough reads for WES. So I just need to intersect the regions and feed them into BAM-matcher? I agree that there will be fewer sites to compare.

        For ChIP-seq, I usually map with bowtie1, as bowtie2 is better suited to longer reads.
        For WES and WGS, I use BWA-MEM.

        Do you think using different mappers here will make any difference (bowtie1 for ChIP-seq and BWA-MEM for WES)?
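Intersecting the two region sets is what `bedtools intersect` does on the command line. For illustration, a minimal pure-Python sketch of the same two-pointer overlap on BED-style (0-based, half-open) intervals; it assumes each input has already been merged so that intervals within one set do not overlap (e.g. with `bedtools merge`). The function name `intersect_regions` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style: 0-based, half-open


def intersect_regions(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping parts of two region sets.

    Each input must be free of self-overlaps (merge it first); sorting is done here.
    """
    a, b = sorted(a), sorted(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if ca != cb:
            # The interval on the chromosome that sorts first cannot overlap anything left.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        start, end = max(sa, sb), min(ea, eb)
        if start < end:
            out.append((ca, start, end))
        # Advance whichever interval ends first; the other may still overlap the next one.
        if ea < eb:
            i += 1
        else:
            j += 1
    return out


if __name__ == "__main__":
    chip_peaks = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
    wes_targets = [("chr1", 150, 350), ("chr2", 60, 80)]
    print(intersect_regions(chip_peaks, wes_targets))
    # [('chr1', 150, 200), ('chr1', 300, 350)]
```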



        • #5

          The DP threshold determines whether a genotype comparison is carried out at a site, i.e. a site is ignored if the coverage in either sample is below the threshold.

          Sometimes we lower this threshold a bit if sample coverage is low. Individual genotype calls will be less reliable, but the comparison can still reach a reliable determination.
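A sketch of the depth rule described above: a site is compared only if both samples reach the DP threshold. The `(genotype, depth)` encoding and the inclusive `>=` comparison are assumptions; 15 is the value mentioned earlier in this thread, not a confirmed BAM-matcher default. Lowering `min_dp` admits more sites at the cost of noisier calls.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]   # (chromosome, position)
Call = Tuple[int, int]   # (genotype as alt-allele count, read depth DP)


def comparable_sites(a: Dict[Site, Call], b: Dict[Site, Call], min_dp: int = 15) -> List[Site]:
    """Sites present in both samples with depth >= min_dp in each; all others are ignored."""
    return sorted(s for s in a.keys() & b.keys()
                  if a[s][1] >= min_dp and b[s][1] >= min_dp)


def compare(a: Dict[Site, Call], b: Dict[Site, Call], min_dp: int = 15) -> Tuple[int, int]:
    """Return (genotype matches, sites compared) after applying the depth filter."""
    sites = comparable_sites(a, b, min_dp)
    matches = sum(1 for s in sites if a[s][0] == b[s][0])
    return matches, len(sites)


if __name__ == "__main__":
    wes = {("1", 100): (1, 30), ("1", 200): (2, 10), ("2", 50): (0, 40)}
    chip = {("1", 100): (1, 20), ("1", 200): (2, 25), ("2", 50): (1, 16)}
    print(compare(wes, chip))             # (1, 2): chr1:200 skipped, WES depth 10 < 15
    print(compare(wes, chip, min_dp=10))  # (2, 3): lower threshold admits chr1:200
```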

          As for regions, BAM-matcher doesn't use BED files. It uses a VCF file of pre-determined positions to compare. So if you know which regions are more likely to be highly covered (and shared by both WES and ChIP-seq), and the number of sites compared is too low with the supplied VCF files, I can help you extract different sites to compare. The default sites were selected from coding regions.
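If you already have a BED file of regions well covered in both WES and ChIP-seq, one way to build a custom site list is to keep only the VCF records that fall inside those regions. A minimal sketch, assuming merged (non-overlapping), tab-delimited BED intervals; note the shift between 1-based VCF `POS` and 0-based, half-open BED coordinates. The function names are hypothetical, and whether BAM-matcher needs anything beyond a standard VCF of sites is not covered here. `bedtools intersect` can do the same filtering.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Regions = Dict[str, List[Tuple[int, int]]]


def load_bed(lines: Iterable[str]) -> Regions:
    """Parse BED lines into per-chromosome sorted (start, end) lists (0-based, half-open)."""
    regions: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions[chrom].append((int(start), int(end)))
    for intervals in regions.values():
        intervals.sort()
    return dict(regions)


def in_regions(regions: Regions, chrom: str, pos1: int) -> bool:
    """True if a 1-based VCF position lies inside one of the (merged) BED intervals."""
    intervals = regions.get(chrom, [])
    pos0 = pos1 - 1  # convert VCF 1-based POS to a BED 0-based coordinate
    # Index of the last interval whose start <= pos0.
    idx = bisect.bisect_right(intervals, (pos0, float("inf"))) - 1
    return idx >= 0 and intervals[idx][0] <= pos0 < intervals[idx][1]


def filter_vcf(vcf_lines: Iterable[str], regions: Regions) -> Iterator[str]:
    """Yield header lines unchanged, plus only the records that fall inside the regions."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        if not line.strip():
            continue
        chrom, pos, _ = line.split("\t", 2)
        if in_regions(regions, chrom, int(pos)):
            yield line


if __name__ == "__main__":
    bed = load_bed(["chr1\t99\t200\n"])
    vcf = ["#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
           "chr1\t100\t.\tC\tT\t.\t.\t.\n",
           "chr1\t201\t.\tG\tA\t.\t.\t.\n"]
    # Keeps the header and the chr1:100 record (BED 99-200 covers 1-based 100..200).
    print("".join(filter_vcf(vcf, bed)), end="")
```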

          If you are interested in how BAM-matcher works, I'd be happy to email you the manuscript we recently submitted.

          As far as I know, we haven't seen much difference between bwa-aln and bowtie, but I can't say for sure as we haven't done a lot of ChIP-seq.



          • #6
            Thanks, I would like to read the manuscript. Please send it to [email protected]

            I will be testing bam-matcher in the near future, and I will come back to you when I have questions.

            Thanks again,
            Ming



            • #7
              The pre-print is now available: http://bioinformatics.oxfordjournals...tw239.abstract
