SEQanswers

Old 01-26-2018, 06:37 PM   #1
Jackpd96
Junior Member
 
Location: UK

Join Date: Jan 2018
Posts: 1
Counting the number of paralogues for mouse genes gives me the wrong frequency

I am trying to count the number of paralogues for the mouse homologues of the human protein-coding genes using BioMart. However, for the 'PLIN4' gene, for example, it counts 35,000 paralogues instead of 4.

We think this is because some genes have one-to-many paralogues, which produces repeated rows. When I run a single gene, it gives me back the correct number of paralogues. Is there a way either to remove these repeats from the results, or to work around this so that BioMart doesn't output them?
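To illustrate what I mean by removing the repeats, here is a minimal sketch on a toy data frame. The gene and paralogue names are invented, and the `paralogue` column is a stand-in for the real BioMart attribute; the idea is just to keep one row per gene/paralogue pair before counting.

```r
# Toy stand-in for a BioMart result; the rows are invented for illustration.
toy <- data.frame(
  external_gene_name = c("GeneA", "GeneA", "GeneA", "GeneA", "GeneB"),
  paralogue          = c("P1", "P1", "P2", "P2", "P3"),
  stringsAsFactors   = FALSE
)

# Keep one row per (gene, paralogue) pair, dropping exact repeats.
toy_unique <- unique(toy[, c("external_gene_name", "paralogue")])

# Count distinct paralogues per gene.
paralogue_counts <- table(toy_unique$external_gene_name)
print(paralogue_counts)
```

Whether this is enough for the real query depends on where the repeats come from, which I have not confirmed.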

I have also thought of running one gene at a time and counting the result, by setting up some sort of loop so that it goes through all of the genes in the list automatically.
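A rough sketch of that loop idea is below. `lookup_paralogues()` is a hypothetical helper that I made up for illustration; in real use it would wrap a single-gene `getBM()` call, but here it reads from a small invented list so the sketch runs offline.

```r
# Hypothetical lookup: in real use this would wrap a getBM() call for one gene.
# Here it reads from a small invented list so the sketch runs offline.
toy_paralogues <- list(GeneA = c("P1", "P2", "P1"), GeneB = character(0))

lookup_paralogues <- function(gene) {
  hits <- toy_paralogues[[gene]]
  if (is.null(hits)) character(0) else hits
}

genes <- c("GeneA", "GeneB", "GeneC")

# One query per gene; count distinct, non-empty paralogue names.
counts <- vapply(genes, function(g) {
  p <- lookup_paralogues(g)
  length(unique(p[!is.na(p) & p != ""]))
}, integer(1))

print(counts)
```

My worry is that one query per gene would probably be slow for the full list of protein-coding genes, so I would prefer a de-duplication approach if one exists.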

The code I have written so far is:


Code:
# Load the biomaRt package:

library(biomaRt)
ensembl_hsapiens <- useMart("ensembl", 
                          dataset = "hsapiens_gene_ensembl")
ensembl_mouse <- useMart("ensembl", 
                       dataset = "mmusculus_gene_ensembl")

# Get all human protein coding genes:

hsapien_PC_genes <- getBM(attributes = c("ensembl_gene_id", 
                                         "external_gene_name"), 
                          filters = "biotype", 
                          values = "protein_coding", 
                          mart = ensembl_hsapiens)


ensembl_gene_ID <- hsapien_PC_genes$ensembl_gene_id

# Get mouse homologues

mouse_homologues <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", 
                                       "mmusculus_homolog_associated_gene_name"), 
                        filters = "ensembl_gene_id", 
                        values = c(ensembl_gene_ID), 
                        mart = ensembl_hsapiens)

# Get mouse external gene name 

mouse_homologues_external_gene_names <- mouse_homologues$mmusculus_homolog_associated_gene_name


mouse_paralogues <- getBM(attributes = c("hsapiens_homolog_associated_gene_name",
                                       "external_gene_name",
                                       "mmusculus_paralog_associated_gene_name"), 
                        filters = "external_gene_name", 
                        values = c(mouse_homologues_external_gene_names) , mart = ensembl_mouse)

# Remove genes with no paralogues (NA or empty paralogue name)
mouse_paralogs_data <- mouse_paralogues[!(is.na(mouse_paralogues$mmusculus_paralog_associated_gene_name)
                                          | mouse_paralogues$mmusculus_paralog_associated_gene_name == ""), ]

# Count paralogues per gene

library(plyr)
count_mouse_paralogues <- count(mouse_paralogs_data, "external_gene_name")
View(count_mouse_paralogues)

Tags
bioinformactics, biomart
