  • ethnicity check

    Hello, I have sequenced 20 exomes with the Ion Proton system and have to do an ethnicity check on all samples as a quality control step. I have the reported ethnicities of all the samples.

    Is there any way I can use the variants from these samples to compare them with the variants in the 1000 Genomes dataset? For instance, can I run a genotype concordance check between the variants from my samples and those of the 1000 Genomes European, African, Asian, etc. populations?


    Thanks in advance!
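The genotype-concordance idea can be sketched for its simplest case: two single-sample VCFs, for example one of your exomes and one 1000 Genomes individual extracted into its own VCF. This is an illustrative bash/awk sketch, not what the Similarity tool discussed in this thread does, and the function name `gt_concordance` is hypothetical. Sites are matched on CHROM, POS, REF and ALT. Because single-sample VCFs usually omit 0/0 rows, only sites called in both files are compared, so the number is a rough similarity measure rather than a full concordance.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: genotype concordance between two single-sample VCFs.
# Output: "<shared sites> <identical genotypes> <fraction identical>".
# Sites are matched on CHROM, POS, REF and ALT; only sites present in both files count.
gt_concordance() {
  awk -F'\t' '
    # Return the GT value of the current record. "|" is replaced by "/", so phasing
    # is ignored; allele order is kept, so "1/0" and "0/1" count as different.
    function genotype(   n, i, keys, vals, g) {
      n = split($9, keys, ":")
      split($10, vals, ":")
      g = ""
      for (i = 1; i <= n; i++) if (keys[i] == "GT") g = vals[i]
      gsub(/\|/, "/", g)
      return g
    }
    /^#/ { next }                                  # skip header lines in both files
    FILENAME == ARGV[1] {                          # first VCF: remember each genotype
      first[$1 SUBSEP $2 SUBSEP $4 SUBSEP $5] = genotype()
      next
    }
    {                                              # second VCF: compare at shared sites
      key = $1 SUBSEP $2 SUBSEP $4 SUBSEP $5
      if (key in first) {
        shared++
        if (first[key] == genotype()) same++
      }
    }
    END {
      if (shared == 0) { print "0 0 NA"; exit }
      printf "%d %d %.4f\n", shared, same, same / shared
    }
  ' "$1" "$2"
}
```

For example, `gt_concordance sample1.vcf NA12878.vcf` (both file names hypothetical) prints the number of shared sites, how many of them have identical genotypes, and the ratio of the two.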

  • #2
    I asked the same question and found the Similarity tool on GeneTalk's website helpful for this. You can download it here, and it is easy to use:

    https://gene-talk.de/qc (and see referenced paper in the post below)



    I was able to assign exomes to 1000 Genomes ethnic groups, and this works well if the exome is from an ethnic group represented in the 1000 Genomes data. Problems arise when the exome is from an ethnic group that is not represented, e.g. in my case Aboriginal Australians.

    Filipino, Tongan and other Pacific Islander samples tend to group loosely with Asians, which seems reasonable. British samples group with GBR in most cases, but some group more closely with CEU.
    Last edited by rbagnall; 10-03-2014, 02:54 PM.



    • #3
      This is great, thank you very much.
      I am getting an error message: "unable to access similarity.jar"
      I downloaded the QC software and the jar file is present.

      Did you come across this?



      • #4
        You need to run it from inside the Similarity folder, where the jar file is located:

        1. Change directory to the Similarity folder:

        cd path/to/Similarity_05022013

        2. Make a new folder for the results, called ethnicity:

        mkdir ethnicity

        3. Call variants from a single BAM file into a VCF file in the ethnicity folder, restricting calling to the regions in the '20110225.exome.consensus.bed' file that comes with the Similarity tool (I use GATK):

        java -jar /path/to/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar -T UnifiedGenotyper -nt 10 -R /path/to/GRCh.37.fasta -I /path/to/bamfile.bam -o ethnicity/file1.vcf -L 20110225.exome.consensus.bed -G StandardAnnotation -stand_emit_conf 10.0 -stand_call_conf 20.0 -dcov 200 -l INFO -rf BadCigar -glm SNP

        4. Run similarity.jar on the ethnicity folder containing file1.vcf:

        java -Xmx6g -jar similarity.jar -d ethnicity -o file1.txt

        5. Make the plot, as described in the manuscript:

        Rscript --vanilla R/MDS.R ethnicity/file1.txt ethnicity/file1.pdf
        Last edited by rbagnall; 10-04-2014, 03:27 AM.
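Since the question involves 20 exomes, steps 3 to 5 above can be wrapped in a per-sample loop. This is a hypothetical bash wrapper: the function names `run` and `process_sample` and the one-folder-per-sample layout (`ethnicity/<sample>/`) are assumptions, and the GATK and reference paths are the placeholders from the post. It also assumes, as steps 4 and 5 imply, that similarity.jar writes the file named by `-o` inside the folder passed with `-d`. Setting `DRY_RUN=1` prints the commands instead of running them, which is a cheap way to check the paths first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder paths taken from the post; override them via environment variables.
GATK=${GATK:-/path/to/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar}
REF=${REF:-/path/to/GRCh.37.fasta}
BED=${BED:-20110225.exome.consensus.bed}

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Steps 3-5 of the post for one BAM file. Each sample gets its own folder
# (ethnicity/<sample>/), mirroring the single-VCF ethnicity folder in the post.
process_sample() {
  local bam=$1 name out
  name=$(basename "$bam" .bam)
  out="ethnicity/$name"
  run mkdir -p "$out"
  run java -jar "$GATK" -T UnifiedGenotyper -nt 10 -R "$REF" -I "$bam" \
    -o "$out/$name.vcf" -L "$BED" -G StandardAnnotation \
    -stand_emit_conf 10.0 -stand_call_conf 20.0 -dcov 200 -l INFO -rf BadCigar -glm SNP
  run java -Xmx6g -jar similarity.jar -d "$out" -o "$name.txt"
  # Assumes the -o file is written inside the -d folder, as steps 4 and 5 imply.
  run Rscript --vanilla R/MDS.R "$out/$name.txt" "$out/$name.pdf"
}
```

With the functions loaded, running `for bam in /path/to/bams/*.bam; do process_sample "$bam"; done` from inside the Similarity folder processes every sample in turn.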



        • #5
          Thanks again, this is very helpful. I already have VCF files, called from my BAM files with the Torrent Variant Caller. Do I need to call the variants again, restricting calling to the provided .bed file with the -L argument?

          Thank you!



          • #6
            You could instead restrict the variants in your VCF files to the regions in the provided .bed file using BEDTools (intersectBed).
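With BEDTools installed, this filtering is typically a one-liner such as `bedtools intersect -header -u -a file1.vcf -b 20110225.exome.consensus.bed > file1.target.vcf`, where `-header` keeps the VCF header and `-u` reports each variant once. For a machine without BEDTools, the same idea can be sketched in bash and awk. The function `vcf_in_bed` is hypothetical, and it is a simplification: it keeps a record only if its start position (POS) lies inside a BED interval, so a deletion that starts just before a target and extends into it is dropped, unlike with `bedtools intersect`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the VCF header plus records whose POS lies in a BED interval.
# Usage: vcf_in_bed regions.bed input.vcf > on_target.vcf
vcf_in_bed() {
  local bed=$1 vcf=$2 sorted
  sorted=$(mktemp)
  # Sort by chromosome, then numerically by start, so intervals can be merged.
  LC_ALL=C sort -k1,1 -k2,2n "$bed" > "$sorted"
  awk -F'\t' '
    # First file (sorted BED): merge overlapping or touching intervals per chromosome.
    FILENAME == ARGV[1] {
      if ($0 ~ /^(#|track|browser)/ || NF < 3) next
      c = $1; s = $2 + 0; e = $3 + 0
      if ((c in n) && s <= en[c, n[c]]) {
        if (e > en[c, n[c]]) en[c, n[c]] = e
      } else {
        n[c]++; st[c, n[c]] = s; en[c, n[c]] = e
      }
      next
    }
    # Second file (VCF): keep header lines unchanged.
    /^#/ { print; next }
    {
      c = $1; p = $2 - 1              # BED is 0-based half-open, VCF POS is 1-based
      if (!(c in n)) next
      lo = 1; hi = n[c]; hit = 0
      while (lo <= hi) {              # binary search: last interval with start <= p
        mid = int((lo + hi) / 2)
        if (st[c, mid] <= p) { hit = mid; lo = mid + 1 } else { hi = mid - 1 }
      }
      if (hit && p < en[c, hit]) print
    }
  ' "$sorted" "$vcf"
  rm -f "$sorted"
}
```

Merging the intervals first means the binary search only has to check one candidate interval per variant, so the VCF itself does not need to be sorted.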



            • #7
              Hi again, may I ask which version of Java you used to run similarity.jar? I am using 1.7 and getting a 'java.lang.NullPointerException'.

              Also, did your VCFs contain homozygous reference (0/0) calls, or just heterozygous variant (0/1) and homozygous variant (1/1) calls?

              Thanks
              Last edited by Rabu; 10-06-2014, 10:36 AM.



              • #8
                java version "1.7.0_02"

                My VCF files were single-sample files, so they contained no 0/0 calls.

                Perhaps post the full command you ran.
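If a VCF does contain hom-ref rows (for example, one produced by GATK UnifiedGenotyper with `--output_mode EMIT_ALL_SITES`), they can be dropped before running similarity.jar. This is a minimal awk sketch assuming a single-sample VCF with the sample in column 10; the function name `drop_hom_ref` is hypothetical, and whether similarity.jar actually needs 0/0 rows removed is an assumption not confirmed in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: drop 0/0 and 0|0 records from a single-sample VCF.
# Header lines and records without a GT field are kept. Reads files or stdin.
drop_hom_ref() {
  awk -F'\t' '
    /^#/ { print; next }              # keep all header lines
    {
      n = split($9, keys, ":")        # FORMAT column, e.g. GT:DP
      split($10, vals, ":")           # sample column, e.g. 0/1:25
      gt = ""
      for (i = 1; i <= n; i++) if (keys[i] == "GT") gt = vals[i]
      if (gt != "0/0" && gt != "0|0") print
    }
  ' "$@"
}
```

For example, `drop_hom_ref file1.vcf > ethnicity/file1.vcf` writes the filtered calls into the results folder (file names illustrative).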



                • #9
                  Hi,
                  I seem to have got everything working; there was a small issue in my command line. I ran similarity.jar without first intersecting my VCFs with the provided consensus.bed file, and my genotype accuracies are quite low (<0.9999). I imagine that intersecting my VCFs with the consensus.bed file would improve the genotyping accuracy, since variants outside the consensus regions would then be excluded from the analysis. Is this correct?

                  Thanks again!
                  Last edited by Rabu; 10-08-2014, 10:04 AM.
