  • Calculation of pan- and core-genome

    Hi,
    I was hoping someone here could point me in the direction of a good tool for calculating pan- and core-genomes in prokaryotes. I am looking for one or several tools/scripts that do a number of things:
    I have a bucket full of bacterial genome data (mostly contigs) from the same species and would like, based on various groupings of these isolates, to first determine their overall pan- and core-genomes.

    Besides just the number of genes in each group, it would also be very useful to get some sort of gene list as output for further analysis.

    Finally, I would like to see how the calculated core/pan-genome of one group differs from that of another defined set of isolates: again, not only the number of genes but an actual list of genes or gene sequences.

    The contigs have not been analysed for CDSs or annotated in any way, but I can do this in another pipeline prior to the pan/core calculation if needed.

    Thanks!!!
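    Once genes have been predicted and clustered into gene families, the pan-genome of a group is the union of its isolates' gene sets and the core-genome is their intersection; group-versus-group differences are then plain set differences, which directly give gene lists rather than just counts. A minimal sketch, assuming each isolate has already been reduced to a set of gene-family IDs (the isolate names and family IDs below are made up for illustration):

```python
"""Pan/core genome sets per group from a gene-family presence table.

Assumes CDS prediction and clustering into gene families were done
beforehand; the isolate -> gene-family mapping below is hypothetical.
"""
from typing import Dict, Iterable, Set


def pan_genome(presence: Dict[str, Set[str]], isolates: Iterable[str]) -> Set[str]:
    """Union: every gene family found in at least one isolate of the group."""
    result: Set[str] = set()
    for iso in isolates:
        result |= presence[iso]
    return result


def core_genome(presence: Dict[str, Set[str]], isolates: Iterable[str]) -> Set[str]:
    """Intersection: gene families found in every isolate of the group."""
    sets = [presence[iso] for iso in isolates]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical example: isolate -> set of gene-family IDs.
presence = {
    "isoA": {"g1", "g2", "g3", "g4"},
    "isoB": {"g1", "g2", "g3", "g5"},
    "isoC": {"g1", "g2", "g6"},
    "isoD": {"g1", "g3", "g6", "g7"},
}
group1 = ["isoA", "isoB"]
group2 = ["isoC", "isoD"]

core1, core2 = core_genome(presence, group1), core_genome(presence, group2)
pan1, pan2 = pan_genome(presence, group1), pan_genome(presence, group2)

print("core group1:", sorted(core1))
print("core group2:", sorted(core2))
print("core only in group1:", sorted(core1 - core2))
print("pan only in group2:", sorted(pan2 - pan1))
```

    Writing the sorted lists (or the sequences of those families) to a file gives the gene-list output asked for above.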

  • #2
    Can I just clarify - do you have a bunch of reads which are labelled with which isolate of the same species they come from, and on the basis of that you want to pull out the pan (everything) and core (shared) genomes?


    • #3
      Hi Zam,
      I have assembled the reads into contigs, and each contig is named names_contigID, indicating the species and the specific isolate it comes from. And yes, it is from these that I would like to extract the information.


      • #4
        Well then, one approach is to assemble a "multicoloured" graph of your data (one colour per isolate), and then dump contigs with information about how many isolates share each contig. Then you can split things however you like - pull out the contigs that everyone shares, 95% share, etc. Software for this is here:

        and the paper contains an example of something similar:


        > Finally, I would like to see what difference there is between the calculated core/pan genome in 1 group compared to another defined set of isolates in another group - again not only a number of genes but an actual list of genes or gene sequences.

        You can do any comparisons you like between any subsets you like in this manner. Feel free to contact me directly (zam AT well.ox.ac.uk)
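        The splitting step described above only needs, for each contig, the set of isolates (colours) that contain it. The sketch below does not use or reproduce the output format of the software mentioned (its link is not preserved here); it assumes the contig dump has already been parsed into a hypothetical mapping from contig ID to a set of isolate names, and shows threshold-based sharing (100%, 95%, etc.) plus one example of a subset comparison.

```python
from typing import Dict, List, Set


def shared_by_percent(contig_colours: Dict[str, Set[str]],
                      isolates: Set[str], min_percent: int) -> List[str]:
    """Contigs present in at least min_percent % of the given isolates."""
    selected = []
    for contig, colours in contig_colours.items():
        n_present = len(colours & isolates)
        # Integer arithmetic avoids floating-point edge cases at e.g. 95%.
        if 100 * n_present >= min_percent * len(isolates):
            selected.append(contig)
    return sorted(selected)


def specific_to(contig_colours: Dict[str, Set[str]],
                in_group: Set[str], out_group: Set[str]) -> List[str]:
    """Contigs shared by every isolate in in_group and absent from all of out_group."""
    return sorted(c for c, colours in contig_colours.items()
                  if in_group <= colours and not (colours & out_group))


# Hypothetical parsed dump: contig ID -> set of isolates (colours) containing it.
contig_colours = {
    "c1": {"isoA", "isoB", "isoC", "isoD"},
    "c2": {"isoA", "isoB", "isoC"},
    "c3": {"isoA"},
    "c4": {"isoC", "isoD"},
}
everyone = {"isoA", "isoB", "isoC", "isoD"}

print(shared_by_percent(contig_colours, everyone, 100))  # ['c1']
print(shared_by_percent(contig_colours, everyone, 75))   # ['c1', 'c2']
print(specific_to(contig_colours, {"isoC", "isoD"}, {"isoA", "isoB"}))  # ['c4']
```

        Passing a subset of isolates instead of `everyone` gives the core of that subset, so any grouping can be compared against any other.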


        • #5
          Good references for how to do the calculations are Kittichotirat W et al., PLoS ONE, July 2011, and Tettelin H et al., PNAS 2005, 102:13950-13955, if you want to try doing the analysis or scripting your own tools. There's also Panseq, which you can try, but I haven't really been able to get it to work all that well for my purposes.
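          Pan/core analyses of this kind typically report how the core shrinks and the pan-genome grows as genomes are added one at a time, averaged over random addition orders. A minimal sketch of that counting, assuming a gene-family presence table as input (the data are made up); it is not a reproduction of either paper's exact procedure and does no curve fitting.

```python
import random
from typing import Dict, List, Set, Tuple


def accumulation_curves(presence: Dict[str, Set[str]], n_perm: int = 100,
                        seed: int = 0) -> Tuple[List[float], List[float]]:
    """Mean core and pan sizes after adding 1..N genomes, over random orders."""
    isolates = list(presence)
    if not isolates:
        return [], []
    rng = random.Random(seed)
    n = len(isolates)
    core_sum = [0] * n
    pan_sum = [0] * n
    for _ in range(n_perm):
        order = isolates[:]
        rng.shuffle(order)
        core = set(presence[order[0]])
        pan: Set[str] = set()
        for k, iso in enumerate(order):
            core &= presence[iso]  # genes still shared by all genomes so far
            pan |= presence[iso]   # all distinct genes seen so far
            core_sum[k] += len(core)
            pan_sum[k] += len(pan)
    return [c / n_perm for c in core_sum], [p / n_perm for p in pan_sum]


# Hypothetical example: isolate -> set of gene-family IDs.
presence = {
    "isoA": {"g1", "g2", "g3", "g4"},
    "isoB": {"g1", "g2", "g3", "g5"},
    "isoC": {"g1", "g2", "g6"},
    "isoD": {"g1", "g3", "g6", "g7"},
}
core_curve, pan_curve = accumulation_curves(presence)
print("mean core size by #genomes:", core_curve)
print("mean pan size by #genomes:", pan_curve)
```

          With all four genomes included the values no longer depend on order: a core of 1 family and a pan-genome of 7.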


          • #6
            Thanks both of you!!

            And Zam, I may take you up on that offer. And congratulations on that paper.


            • #7
              At the risk of being accused of shameless self-promotion, I will point out that this is something that Mauve, and specifically progressiveMauve, has supported for years. Have a look at the .backbone file output (documentation here).
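              A minimal sketch of turning a .backbone file into core vs. accessory segments. It assumes the layout as I understand it from the Mauve documentation: a tab-delimited file whose header row holds seqN_leftend/seqN_rightend column pairs, one backbone segment per following row, both coordinates 0 when the segment is absent from that genome, and negative coordinates for reverse-strand matches. Check this against the documentation for your Mauve version; the example rows are invented.

```python
import csv
import io
from typing import Iterable, List, Set, Tuple


def backbone_presence(lines: Iterable[str]) -> Tuple[int, List[Set[int]]]:
    """Return (number of genomes, per-segment set of genome indices containing it)."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    n_genomes = len(header) // 2  # one leftend/rightend pair per genome
    segments: List[Set[int]] = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        coords = [int(x) for x in row]
        # Assumed convention: 0/0 means absent from this genome; negative
        # values mark reverse-strand matches, so test != 0 rather than > 0.
        present = {i for i in range(n_genomes)
                   if coords[2 * i] != 0 or coords[2 * i + 1] != 0}
        segments.append(present)
    return n_genomes, segments


# Invented three-genome example in the assumed format.
example = io.StringIO(
    "seq0_leftend\tseq0_rightend\tseq1_leftend\tseq1_rightend\tseq2_leftend\tseq2_rightend\n"
    "1\t500\t1\t498\t10\t510\n"
    "600\t900\t0\t0\t620\t920\n"
    "1000\t1200\t-1300\t-1100\t0\t0\n"
)
n, segments = backbone_presence(example)
core = [i for i, s in enumerate(segments) if len(s) == n]
accessory = [i for i, s in enumerate(segments) if len(s) < n]
print("core segments:", core)            # [0]
print("accessory segments:", accessory)  # [1, 2]
```

              For a real file, pass an open file handle instead of the `StringIO` example; the coordinates of the core rows can then be used to extract the shared sequence from each genome.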


              • #8
                Koadman - Good for you! (I'm certainly in no position to criticise self-promotion)
                Stegger - thanks!


                • #9
                  Please self-promote all you can; that just allows me to come back with potential questions to the right people.


                  • #10
                    I am also trying to do a pan/core-genome analysis, but the software mentioned above is meant for someone who is comfortable working on a Linux-based system.

                    Is there a simple way for a non-bioinformatician to do this kind of analysis?

                    Cheers!
                    Shashank
