  • SDPA_Pet
    Senior Member
    • Apr 2013
    • 222

    Which R package can do this

    Hi, I have a dataset listing gene names and the number of hits for each gene. I would like to find over-represented genes in R.

    I was wondering which R package can do this.

    Can anyone recommend some good R packages for analyzing and plotting metagenomics data (ideally for microbes)?

    Thanks.
  • JackieBadger
    Senior Member
    • Mar 2009
    • 385

    #2
    Bioconductor should do the trick http://www.bioconductor.org


    • dariober
      Senior Member
      • May 2010
      • 311

      #3
      Originally posted by SDPA_Pet
      Hi, I have a dataset listing gene names and the number of hits for each gene. I would like to find over-represented genes in R.

      I was wondering which R package can do this.

      I think what you want is to apply the hypergeometric test, which in R is implemented in the function phyper:

      Code:
      phyper(q, m, n, k, ...)
      
      x, q 	vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls.
      m 	the number of white balls in the urn.
      n 	the number of black balls in the urn.
      k 	the number of balls drawn from the urn.
      ...
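A minimal sketch of how the urn model maps onto read counts, with invented numbers for illustration only: pool all reads from all samples; the white balls (m) are the reads of one gene, the black balls (n) are all other reads, the balls drawn (k) are the reads of the sample of interest, and q is that gene's count in that sample. Because phyper(..., lower.tail = FALSE) returns P(X > q), the over-representation p-value P(X >= q) is obtained by passing q - 1.

```r
# Invented counts, for illustration only
gene_in_sample <- 30    # reads of gene G in the sample of interest (q)
gene_total     <- 50    # reads of gene G across all samples (white balls, m)
all_reads      <- 1000  # all reads across all samples
sample_total   <- 400   # all reads in the sample of interest (balls drawn, k)

m <- gene_total
n <- all_reads - gene_total   # black balls: all other reads
k <- sample_total
q <- gene_in_sample

# P(X >= q): lower.tail = FALSE gives P(X > q), so pass q - 1
p_over <- phyper(q - 1, m, n, k, lower.tail = FALSE)

# Count of gene G expected in the sample if it were not over-represented
expected <- k * m / (m + n)

print(c(expected = expected, p_value = p_over))
```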


      • SDPA_Pet
        Senior Member
        • Apr 2013
        • 222

        #4
        Originally posted by JackieBadger
        Bioconductor should do the trick http://www.bioconductor.org

        Hi, I checked Bioconductor, which includes lots of packages. Can you tell me which one can help me find over-represented genes?

        Also, which package can help me draw a heat map?

        Thank you.


        • mikep
          Member
          • Feb 2011
          • 45

          #5
          Originally posted by SDPA_Pet
          Hi, I checked Bioconductor, which includes lots of packages. Can you tell me which one can help me find over-represented genes?

          Also, which package can help me draw a heat map?

          Thank you.

          For all those people who find it more convenient to bother you with their question rather than to Google it for themselves.


          However, you're probably better off with MEV.


          • Simon Anders
            Senior Member
            • Feb 2010
            • 995

            #6
            Maybe, if you explained more about what kind of data you have, you might get more helpful responses. "Metagenomics data" could be anything from a bunch of FASTQ files to a list of species.


            • SDPA_Pet
              Senior Member
              • Apr 2013
              • 222

              #7
              Hi Simon,

              Sorry for the confusion. I have a table generated from metagenomic data. For each sample, I have two columns: one with the gene name and the other with the number of hits. In total, I have 10 samples. I would like to find out which genes are over-represented.

              That is it.


              • Blahah404
                Member
                • Dec 2011
                • 48

                #8
                In that case, dariober's suggestion of the hypergeometric test is appropriate.


                • SDPA_Pet
                  Senior Member
                  • Apr 2013
                  • 222

                  #9
                  Hi Blahah, I am a newbie. If I want to do a hypergeometric test, which package should I use? Can you give me the R package name? That's all I want to know.

                  Someone just told me to use Bioconductor, but it includes hundreds of packages.


                  • Blahah404
                    Member
                    • Dec 2011
                    • 48

                    #10
                    @SDPA_Pet read @dariober's post above... he tells you the R function is phyper. You don't need a package - it's in the R base installation.


                    • SDPA_Pet
                      Senior Member
                      • Apr 2013
                      • 222

                      #11
                      OK, Thanks.


                      • SDPA_Pet
                        Senior Member
                        • Apr 2013
                        • 222

                        #12
                        BTW, at what functional level should I do the analysis? I can use the COG function level (the lowest) or the COG categories (the highest).

                        If I use the lowest level, there will be thousands of functional genes.
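One way to keep both options open, sketched with hypothetical COG IDs, categories and counts: run the test at the function level, or first sum the function-level counts into categories with base R's rowsum() and test those instead (fewer, broader tests). This assumes each COG is assigned to exactly one category; COGs listed under several category letters would need an explicit rule.

```r
# Hypothetical function-level counts (rows = COG IDs, columns = samples); numbers invented
fun_counts <- matrix(c(12, 8, 5, 20,
                        9, 7, 6, 18),
                     ncol = 2,
                     dimnames = list(c("COG_f1", "COG_f2", "COG_f3", "COG_f4"),
                                     c("Sample_A", "Sample_B")))

# Hypothetical mapping: each COG ID -> one COG category letter
category <- c(COG_f1 = "E", COG_f2 = "E", COG_f3 = "H", COG_f4 = "J")

# Sum the counts of all COGs sharing a category; rows come out sorted by category
cat_counts <- rowsum(fun_counts, group = category[rownames(fun_counts)])
print(cat_counts)
```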


                        • SDPA_Pet
                          Senior Member
                          • Apr 2013
                          • 222

                          #13
                          Hi, I still don't know how to find over-represented genes with phyper (which someone recommended) or another R package. I attached a CSV file as an example. Can anyone write an R script for my dataset?

                          In my dataset, the first row contains the sample names, the first column is the COG category ID, and the numbers are gene counts.

                          Thank you.
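A minimal sketch of loading a table laid out as described (first row = sample names, first column = COG IDs, cells = counts). The attached file is not available here, so the sample names other than OSP_8 100 Spring Plain, the COG IDs, the counts and the file name cog_counts.csv are all hypothetical; the inline text simply stands in for the file.

```r
# Hypothetical stand-in for the attached CSV (numbers invented)
csv_text <- "COG,OSP_8 100 Spring Plain,Sample_B,Sample_C
C,120,80,95
E,60,70,65
G,30,45,40"

# With the real file, use the path instead (hypothetical file name):
# counts <- read.csv("cog_counts.csv", row.names = 1, check.names = FALSE)
# check.names = FALSE keeps sample names with spaces unchanged
counts <- read.csv(text = csv_text, row.names = 1, check.names = FALSE)
counts <- as.matrix(counts)   # rows = COG IDs, columns = samples

print(dim(counts))
print(colSums(counts))        # total counts per sample
```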

                          • Simon Anders
                            Senior Member
                            • Feb 2010
                            • 995

                            #14
                            Most people here seemed to have jumped to the conclusion that you want to do an enrichment test, and there, in fact, the hypergeometric test (also known as Fisher's exact test) is the customary thing to do, usually with the R function 'fisher.test', which internally calls 'phyper'.
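To illustrate the relation between the two functions, a sketch with invented counts: for a 2x2 table, the one-sided p-value from fisher.test(..., alternative = "greater") equals the hypergeometric upper tail computed with phyper from the table's margins.

```r
# Invented 2x2 table of read counts
#                gene G   other genes
# sample A          30          370
# other samples     20          580
tab <- matrix(c(30, 20, 370, 580), nrow = 2,
              dimnames = list(c("sampleA", "others"), c("geneG", "otherGenes")))

ft <- fisher.test(tab, alternative = "greater")

# Same test via phyper, using the table margins
m <- sum(tab[, "geneG"])       # white balls: reads of gene G overall
n <- sum(tab[, "otherGenes"])  # black balls: all other reads
k <- sum(tab["sampleA", ])     # balls drawn: reads in sample A
p_hyper <- phyper(tab[1, 1] - 1, m, n, k, lower.tail = FALSE)

print(c(fisher = ft$p.value, phyper = p_hyper))
```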

                            I really don't see how this applies here. Please explain your setting again: By number of "hits" in your table, you mean the number of sequencing reads that mapped to this gene, right?

                            Now, what do you mean by "overrepresented"? Are you looking for genes that appear more often in one kind of sample than in the other? (E.g., you have 5 samples from shallow water and 5 from deep water: which genes differ in their abundance between these two types?)

                            What kind of samples are we talking about?


                            • SDPA_Pet
                              Senior Member
                              • Apr 2013
                              • 222

                              #15
                              Hi Simon,

                              I am sorry I didn't explain it clearly.

                              By the number of "hits" in my table, I mean the number of sequencing reads that mapped to this gene (you are right).
                              In the file that I attached, I am interested in the 2nd column (OSP_8 100 Spring Plain). I want to compare the 2nd column to the 3rd and 4th columns.

                              "Over-represented": I want to find which genes in sample OSP_8 100 Spring Plain are more abundant (or different) than in the other 2 samples.

                              Do you know how to write the code?
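One possible sketch, not a definitive analysis: for each COG, a one-sided Fisher's exact test (fisher.test) of that COG's share of reads in OSP_8 100 Spring Plain against its share in the pooled comparison samples, with Benjamini-Hochberg adjustment from p.adjust because one test is run per COG. The counts, the COG IDs and the names Sample_B and Sample_C are invented. Pooling treats every read as an independent observation and ignores variation between samples, so with a single sample of interest these p-values are best read as a ranking rather than as firm evidence.

```r
# Invented counts (rows = COG IDs, columns = samples)
counts <- matrix(c(120, 60, 30, 10,
                    80, 70, 45, 12,
                    95, 65, 40, 11),
                 ncol = 3,
                 dimnames = list(c("C", "E", "G", "J"),
                                 c("OSP_8 100 Spring Plain", "Sample_B", "Sample_C")))

focal  <- counts[, 1]                          # sample of interest
others <- rowSums(counts[, -1, drop = FALSE])  # comparison samples, pooled

# Per COG: is its share of reads higher in the focal sample than in the others?
p_values <- sapply(rownames(counts), function(g) {
  tab <- matrix(c(focal[g],  sum(focal)  - focal[g],
                  others[g], sum(others) - others[g]),
                nrow = 2, byrow = TRUE)
  fisher.test(tab, alternative = "greater")$p.value
})

result <- data.frame(
  focal_share  = focal / sum(focal),
  others_share = others / sum(others),
  p_value      = p_values,
  p_adj_BH     = p.adjust(p_values, method = "BH")  # one test per COG
)
print(result[order(result$p_value), ])
```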
