  • circos karyotype file

    Hello, I'm trying to run Circos to compare my bacterial genomes, but I can't understand how it works. How can I make the karyotype file from my FASTA and GBK files? Can anyone help me?

    Thanks

  • #2
    The karyotype file lays out the 'backbone' co-ordinates for your plot. For example, if your bacterial genome has been assembled into a single 3000000 bp contig, the karyotype file could just contain a single line:

    chr - chr1 1 0 3000000 black

    If your assembly contains multiple contigs, then you add another line for each contig and give its size. Then the other plotting options (coverage, etc.) are aligned to these karyotype backbones. For example, to add gene regions to the plot, you would have another file with those particular co-ordinates:

    chr1 1573 3503
    chr1 11322 13754
    chr1 14687 18718
    etc.

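    Note that this format is pretty similar to a 'gtf' file. If one of those is available for your genome, you could just cut out the columns that give the contig name and the start and end positions of each gene. The contig names in this file have to match the IDs used in the karyotype file (chr1 in the example above).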
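As a minimal sketch of the column-cutting step described above, assuming a standard tab-separated GTF file in which column 1 is the contig name, column 3 the feature type, and columns 4 and 5 the start and end positions. The function name `gtf_to_circos` and the example records are made up for illustration.

```python
def gtf_to_circos(gtf_lines, feature="gene"):
    """Return 'contig start end' strings for each GTF record of the given feature type."""
    out = []
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Skip malformed lines and features we are not interested in.
        if len(cols) < 5 or cols[2] != feature:
            continue
        out.append(f"{cols[0]} {cols[3]} {cols[4]}")
    return out


# Made-up example records; a real file would be read with open(path).
example = [
    "# example header\n",
    "chr1\texample\tgene\t1573\t3503\t.\t+\t.\tgene_id \"g1\";\n",
    "chr1\texample\tCDS\t1573\t3503\t.\t+\t0\tgene_id \"g1\";\n",
    "chr1\texample\tgene\t11322\t13754\t.\t-\t.\tgene_id \"g2\";\n",
]
result = gtf_to_circos(example)
print("\n".join(result))
```

The contig names are passed through unchanged, so they only work in Circos if they are the same IDs that appear in the karyotype file.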

    Also check out the Circos tutorials (http://www.circos.ca/documentation/tutorials/), which are very useful.

    Good luck!

    Matt.

    • #3
      Thank you very much for your reply, but how can I create a GTF file from FASTA or GBK files? I don't know the start or the end of my contigs.

      • #4
        The start of your contigs will be "0", and the end will be the length of the contig. For example, the karyotype file should look like:

        chr - chr1 1 0 3000000 black
        chr - chr2 2 0 256120 black
        chr - chr3 3 0 230776 black

        Do you know the lengths of your contigs? They should be available on NCBI, or a little Python script can work them out from the FASTA file.
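One way such a script might look, using only the Python standard library. It counts the bases under each FASTA header and prints one karyotype line per contig in the same layout as the example above. The function names are made up, and the contigs are renamed chr1, chr2, ... in file order (an assumption that follows the example; any other data file must then use the same IDs).

```python
def fasta_lengths(lines):
    """Return [(name, length), ...] for each record, in file order."""
    lengths = []
    name, size = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                lengths.append((name, size))
            parts = line[1:].split()
            name = parts[0] if parts else f"contig{len(lengths) + 1}"
            size = 0
        elif name is not None:
            size += len(line)
    if name is not None:
        lengths.append((name, size))
    return lengths


def karyotype_lines(lengths, color="black"):
    """Build 'chr - ID LABEL START END COLOR' lines, one per contig."""
    return [
        f"chr - chr{i} {i} 0 {size} {color}"
        for i, (_, size) in enumerate(lengths, 1)
    ]


# Tiny made-up example; a real file would be read with open("genome.fasta").
fasta = [">contigA some description", "ACGT", "AC", ">contigB", "GGG"]
lengths = fasta_lengths(fasta)
karyotype = karyotype_lines(lengths)
print("\n".join(karyotype))
```

Writing the printed lines to a text file gives something that can be used as the karyotype file.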

        If you want to indicate genes in your plot, then the GBK file should have this information. Again I would use Python, and in particular Biopython (biopython.org), because it has functions to read GBK files and output gene co-ordinates, etc.
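A rough sketch of the Biopython route, assuming Biopython is installed and that the records in the GBK file are in the same order as the contigs in the karyotype file (so record 1 becomes chr1, and so on). The helper names are made up. Note that Biopython reports start positions counting from 0.

```python
from types import SimpleNamespace


def gene_lines(records, feature_type="gene"):
    """Return 'chrN start end' strings for matching features in each record."""
    lines = []
    for i, record in enumerate(records, 1):
        for feat in record.features:
            if feat.type == feature_type:
                start = int(feat.location.start)  # 0-based in Biopython
                end = int(feat.location.end)
                lines.append(f"chr{i} {start} {end}")
    return lines


def gene_lines_from_genbank(path):
    """Read a GBK file with Biopython and return gene co-ordinate lines."""
    # Imported here so gene_lines() can be tried out without Biopython.
    from Bio import SeqIO
    return gene_lines(SeqIO.parse(path, "genbank"))


# Stand-in objects that mimic the attributes used above, for a quick check
# without a real GBK file.
def _feat(kind, start, end):
    return SimpleNamespace(type=kind, location=SimpleNamespace(start=start, end=end))


demo_records = [
    SimpleNamespace(features=[_feat("source", 0, 3000000), _feat("gene", 1572, 3503)]),
    SimpleNamespace(features=[_feat("gene", 99, 400)]),
]
demo = gene_lines(demo_records)
print("\n".join(demo))
```

With a real file, `gene_lines_from_genbank("my_genome.gbk")` (the file name is hypothetical) would return the lines to write out.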

        I think that to use Circos effectively, knowledge of a scripting language is really useful for getting the data into the right format.

        Best,

        Matt.

        • #5
          Hello Matt,
          Thank you for taking the time to reply.
          I'm still a bit confused about Circos; I couldn't find anywhere how to insert my sequence files.
          Does it use only the karyotype file, without any other information about the sequences? If so, how does it build the mapping circles?

          Thanks again

          • #6
            Hi Meriem,

            Yes, the backbone of the plot is created using only the karyotype file. Other information, such as gene locations, is added using separate files. You need a 'config' file, which specifies the location of the karyotype file and any other files that you want to include.
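To make the role of the config file a bit more concrete, here is a small Python sketch that writes one out. The layout follows the minimal configuration shown in the Circos tutorials as far as I recall it; the file names (karyotype.txt, genes.txt), radii and colour are placeholder assumptions, so check the values against the tutorials before relying on them.

```python
def minimal_circos_conf(karyotype_path, highlights_path=None):
    """Return the text of a small Circos config file (placeholder values)."""
    conf = f"""karyotype = {karyotype_path}

<ideogram>
<spacing>
default = 0.005r
</spacing>
radius    = 0.9r
thickness = 20p
fill      = yes
</ideogram>
"""
    if highlights_path is not None:
        # Optional track that draws the regions listed in a 'contig start end' file.
        conf += f"""
<highlights>
<highlight>
file       = {highlights_path}
r0         = 0.80r
r1         = 0.85r
fill_color = blue
</highlight>
</highlights>
"""
    # Standard include files that ship with Circos.
    conf += """
<image>
<<include etc/image.conf>>
</image>

<<include etc/colors_fonts_patterns.conf>>
<<include etc/housekeeping.conf>>
"""
    return conf


conf_text = minimal_circos_conf("karyotype.txt", "genes.txt")
print(conf_text)
```

Saved as circos.conf, this would typically be run with `circos -conf circos.conf`.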

            I would highly recommend going through the Circos tutorials at the link above and working through the examples.

            Best,

            Matt.
