  • Annotating a genome based on CNV calls.

    I have a BED file of read locations that looks like this:

    chr1 193244 193246
    chr2 293244 293246
    etc.

    I want to identify where they lie on hg19 (which gene, and whether intronic or exonic, etc.). How do I automate this? The data span several chromosomes, but I would still like a visual, UCSC-like output, as well as a text one if possible.
    Last edited by pepsimax; 06-15-2012, 09:40 AM.

  • #2
    Hi

    Did you find a solution to this problem? I am having the same problem at the moment and would like to know.

    Thanks

    Uma


    • #3
      VarioWatch (http://genepipe.ncgm.sinica.edu.tw/variowatch/main.do) provides visual output for up to one thousand variants online in real time, and text output for millions of variants. However, it does not accept the BED format, so you can only try it if your BED file can be converted first.


      • #4
        I can recommend an easy way to find the overlap between CNVs and human genes.

        I - Create an initial genetrack.RefSeq.GRCh37.txt file
        Go to the UCSC genome table browser:


        There are many output options. Here are the settings that you'll need:
        clade: Mammal
        genome: Human
        assembly: (choose the appropriate assembly for the reference you're using)
        group: Genes and Gene Prediction Tracks
        track: RefSeq Genes
        table: refGene
        region: genome

        Choose the output filename:
        genetrack.RefSeq.GRCh37.txt

        Click the get output button.

        You now have your initial RefSeq file, which is not sorted and contains non-standard contigs (contigs other than the standard 1-22, X, Y, MT).



        II - Remove non-standard contigs and sort the file in karyotypic order:
        Create the extract.tcl file. It looks like this:

        #!/usr/bin/env tclsh

        # Remove contigs other than the standard 1-22,X,Y,MT
        # and sort the file in karyotypic order.

        # Return the whole content of a file (empty string if no name is given).
        proc ContentFromFile {{Fichier ""}} {
            if {[string equal $Fichier ""]} {return ""}
            set f [open $Fichier r]
            set Texte [read -nonewline $f]
            close $f
            return $Texte
        }

        # Return the content of a file as a list of lines.
        proc LinesFromFile {{Fichier ""}} {
            return [split [ContentFromFile $Fichier] "\n"]
        }

        # Append a text (followed by a newline) to a file.
        proc WriteTextInFile {texte fichier} {
            set fifi [open $fichier a]
            puts $fifi $texte
            close $fifi
            return 1
        }

        # Comparator for lsort -command: order tab-separated lines by column N
        # (column 4 of refGene is txStart). Returns -1, 0 or 1, as lsort expects.
        proc IncreasingSortOnElement4 {X Y {N 4}} {
            set x [lindex [split $X "\t"] $N]
            set y [lindex [split $Y "\t"] $N]
            return [expr {($x > $y) - ($x < $y)}]
        }

        ## Checking and displaying parameter
        set geneFile [lindex $argv 0]
        if {![file exists $geneFile]} {
            puts "$geneFile doesn't exist. Exit"
            exit 1
        }

        ## Defining output file (strip only a trailing ".txt")
        regsub {\.txt$} $geneFile "" outputFile
        set outputFile "$outputFile.sorted.txt"
        file delete -force $outputFile
        puts "...creation of $outputFile"

        ## Grouping the lines by chromosome
        set i_chr -1
        foreach L [LinesFromFile $geneFile] {
            set Ls [split $L "\t"]
            if {[regexp "^#" $L]} {
                # Header line: copy it and locate the chrom column
                WriteTextInFile $L $outputFile
                set i_chr [lsearch -exact $Ls "chrom"]
                if {$i_chr == -1} {puts "Bad header line syntax. chrom column not found - Exit"; exit 1}
                continue
            }
            if {$i_chr == -1} {puts "No header line found before the data - Exit"; exit 1}

            regsub -all " " [lindex $Ls $i_chr] "" chrom
            lappend linelist($chrom) $L
        }

        ## Writing the standard contigs in karyotypic order, sorted by txStart
        foreach val {1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M MT} {
            if {![info exists linelist(chr$val)]} {continue}
            WriteTextInFile [join [lsort -command IncreasingSortOnElement4 $linelist(chr$val)] "\n"] $outputFile
        }


        Then run the extract.tcl script:
        tclsh extract.tcl genetrack.RefSeq.GRCh37.txt
        -> creates genetrack.RefSeq.GRCh37.sorted.txt
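
For anyone without Tcl installed, step II can also be sketched in Python. This is only a rough equivalent, not the script above: it assumes the file starts with a header line beginning with # that names the chrom and txStart columns, and the name sort_refgene is made up for illustration.

```python
# Standard contigs in karyotypic order (both M and MT are accepted).
STANDARD = ["chr" + str(c) for c in list(range(1, 23)) + ["X", "Y", "M", "MT"]]
RANK = {name: i for i, name in enumerate(STANDARD)}


def sort_refgene(lines):
    """Keep rows on standard contigs, sorted by contig and then by txStart.

    `lines` is an iterable of tab-separated rows; the header line
    (starting with '#') must come before the data rows.
    """
    header = None
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header = line
            cols = line.lstrip("#").split("\t")
            i_chr = cols.index("chrom")
            i_start = cols.index("txStart")
            continue
        if header is None:
            raise ValueError("no header line found before the data")
        fields = line.split("\t")
        if fields[i_chr] in RANK:  # drops the non-standard contigs
            rows.append(fields)
    if header is None:
        raise ValueError("no header line found")
    rows.sort(key=lambda f: (RANK[f[i_chr]], int(f[i_start])))
    return [header] + ["\t".join(f) for f in rows]
```

Calling sort_refgene(open("genetrack.RefSeq.GRCh37.txt")) and writing the returned lines to a new file should give the same kind of sorted table as the Tcl script.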


        III - Create the genetrack.RefSeq.GRCh37.sorted.bed file (chrom, txStart, txEnd, gene name; the # header line is skipped):
        awk -F"\t" -v OFS="\t" '!/^#/ {print $3, $5, $6, $13}' genetrack.RefSeq.GRCh37.sorted.txt > genetrack.RefSeq.GRCh37.sorted.bed


        IV - Annotate your CNV bed file:
        Sample CNV bed file = CNVsample.bed
        chr7 5952473 5978460

        Using Bedtools, run the intersection like so (-wb appends the overlapping gene entry to each intersected interval):
        intersectBed -a CNVsample.bed -b genetrack.RefSeq.GRCh37.sorted.bed -wb > CNVsample.annotated

        chr7 5952473 5965603 chr7 5938340 5965603 CCZ1
        chr7 5965776 5978460 chr7 5965776 6010314 RSPH10B
        chr7 5965776 5978460 chr7 5965776 6010314 RSPH10B2



        Of course, you can use Bedtools to run the intersection with other bed files!
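
The intersection above only says which gene is hit, while the original question also asked about intronic versus exonic. The refGene table carries exonStarts and exonEnds columns (comma-separated lists), so one option is to classify each interval against a transcript's exons. A minimal sketch under that assumption; the function names and the example coordinates are invented for illustration.

```python
def parse_exons(exon_starts, exon_ends):
    """Turn refGene's comma-separated exonStarts/exonEnds strings
    (which end with a trailing comma) into a list of (start, end)."""
    starts = [int(s) for s in exon_starts.split(",") if s]
    ends = [int(e) for e in exon_ends.split(",") if e]
    return list(zip(starts, ends))


def classify(start, end, tx_start, tx_end, exons):
    """Label a half-open interval relative to a single transcript."""
    if end <= tx_start or start >= tx_end:
        return "outside"
    for ex_start, ex_end in exons:
        if start < ex_end and end > ex_start:
            return "exonic"  # touches at least one exon
    return "intronic"  # inside the transcript but between exons


# Made-up transcript spanning 100-600 with three exons.
exons = parse_exons("100,300,500,", "200,400,600,")
for start, end in [(150, 152), (250, 252), (700, 702)]:
    print(start, end, classify(start, end, 100, 600, exons))
```

A gene usually has several transcripts in refGene, so the same interval can be exonic for one row and intronic for another; how to combine those labels is a choice left open here.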


        • #5
          AnnotSV: An integrated tool for Structural Variations annotation

          Hi,

          I'm annotating my human CNV/SV events with the AnnotSV tool.
          PMID: 29669011 DOI: 10.1093/bioinformatics/bty304

          It combines a broad panel of datasets to provide high-quality structural variation (SV) / CNV annotation:
          - Gene annotations
          - Promoters annotations
          - DGV Gold Standard annotations
          - DECIPHER gene annotations
          - 1000 genomes annotations
          - GC content annotations
          - Repeated sequences annotations
          - TAD annotations
          - OMIM annotations
          - Gene intolerance annotations
          - Haploinsufficiency annotations
          - Homozygous and heterozygous SNV/indel annotations
          - ...

          AnnotSV starts by detecting the genomic overlaps between the input and the annotation features.

          Also of interest: the tool builds one annotation based on the full-length SV and another annotation for each gene within the SV.

          Really easy to install and to use!

          Input format: VCF or BED

          Otherwise, if you have CNV calls from different CNV callers, I advise you to identify and merge the CNVs common to your callers before annotating. For that, I would treat CNVs that share a 70% reciprocal overlap, measured by length and position (> 70% shared length), as the same event (as done in DGV).
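
To make the 70% rule concrete: the shared segment has to cover that fraction of each of the two CNVs, not just one of them. A minimal sketch, assuming half-open BED-style coordinates; the function names are mine and not part of AnnotSV or DGV, and whether the threshold itself counts as a match is my assumption.

```python
def reciprocal_overlap(a, b):
    """Return the smaller of overlap/len(a) and overlap/len(b).

    a and b are (chrom, start, end) tuples; 0.0 if they do not overlap.
    """
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def same_cnv(a, b, threshold=0.7):
    """True if the two calls overlap reciprocally by at least `threshold`."""
    return reciprocal_overlap(a, b) >= threshold


call_1 = ("chr1", 100, 200)
call_2 = ("chr1", 120, 210)   # shares 80 bp with call_1
call_3 = ("chr1", 100, 1000)  # contains call_1 but is nine times longer
print(same_cnv(call_1, call_2), same_cnv(call_1, call_3))
```

The third call shows why the test is reciprocal: it covers call_1 entirely, yet call_1 covers only about 11% of it, so the two are not merged.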
          Last edited by lgmSeq; 07-09-2018, 10:21 PM.
