Hello everyone,
I'm working with the programming language R and want to use DEXSeq.
First I executed tophat on my reads to map them against a human reference genome.
In R I created a function to load the bam file of tophat:
I call this function for each replicate and this gives me a dataset with raw counts. These raw counts I get with these code:
Then I create a dataframe with all counts:
And a dataframe to describe the design:
And now I stuck...
To create a exonCountset, I want to use this function:
My question is, how do I get the gene IDs of these counts?
And another question I have is whether it is possible to use the tx_name instead of the tx_id.
I'm working with the programming language R and want to use DEXSeq.
First I executed tophat on my reads to map them against a human reference genome.
In R I created a function to load the bam file of tophat:
Code:
getCounts <- function( alignmentName, run, tx ){ fileName <- paste( "/pathToDir/", alignmentName, "/", alignmentName, "run", run, ".merged.bam", sep="" ) #my directory structure alignment <- readBamGappedAlignments( fileName ) newReadNames <- gsub( "([0-9(MT|X|Y)])" , "chr\\1" , rname( alignment ) ) alignment <- GRanges( seqnames = newReadNames, ranges = IRanges( start = start( alignment ), end=end(alignment)), strand=strand(alignment)) alignmentCounts <- suppressWarnings( countOverlaps( tx,alignment ) ) return( alignmentcounts ) }
Code:
txdb <- makeTranscriptDbFromUCSC( genome='hg19', tablename='ensGene' ) tx_by_exon <- transcriptsBy( txdb, 'exon' ) tx_by_gene <- transcriptsBy( txdb, 'gene') pos1_1 <- getCounts( "pos", 1, tx_by_exon) pos1_2 <- getCounts( "pos", 2, tx_by_exon) neg1_1 <- getCounts( "neg", 1, tx_by_exon) neg1_2 <- getCounts( "neg", 2, tx_by_exon)
Code:
allCounts <- data.frame( pos1_1 = pos1_1, pos1_2 = pos1_2, neg1_1 = neg1_1, neg1_2 = neg1_2, )
Code:
design <- data.frame( row.names = colnames(allCounts), condition = c( "pos", "pos", "neg", "neg" ), libType = c( "paired-end", "paired-end", "paired-end", "paired-end"))
To create a exonCountset, I want to use this function:
Code:
data <- newExonCountSet(allCounts, design, ?,rownames(allCounts))
And another question I have is whether it is possible to use the tx_name instead of the tx_id.