I am trying to use the countOverlaps function from the GenomicRanges package on my paired-end RNA-seq data (aligned to hg19), but I keep getting an error. Here is the code I used:
txdb <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "refGene")
exonRangesList <- exonsBy(txdb, "gene")
> exonRangesList[[1]]
files <- list.files(pattern = ".bam")
aligns <- readBamGappedAlignments(files[1])
> aligns <- readBamGappedAlignments(files[1])
> aligns
GappedAlignments of length 49018719
rname strand cigar qwidth start end width ngap
[1] chr1 - 50M 50 557 606 50 0
[2] chr1 + 50M 50 1177 1226 50 0
[3] chr1 - 50M 50 1187 1236 50 0
[4] chr1 - 50M 50 1203 1252 50 0
[5] chr1 + 50M 50 1207 1256 50 0
[6] chr1 - 50M 50 1336 1385 50 0
[7] chr1 + 50M 50 1337 1386 50 0
[8] chr1 + 50M 50 1337 1386 50 0
[9] chr1 - 50M 50 1447 1496 50 0
... ... ... ... ... ... ... ... ...
[49018711] chrM + 50M 50 16522 16571 50 0
[49018712] chrM + 50M 50 16522 16571 50 0
[49018713] chrM + 50M 50 16522 16571 50 0
[49018714] chrM + 50M 50 16522 16571 50 0
[49018715] chrM + 50M 50 16522 16571 50 0
[49018716] chrM - 50M 50 16522 16571 50 0
[49018717] chrM - 50M 50 16522 16571 50 0
[49018718] chrM - 50M 50 16522 16571 50 0
[49018719] chrM - 50M 50 16522 16571 50 0
seqlengths
chr1 chr2 chr3 ... chrX chrY chrM
247249719 242951149 199501827 ... 154913754 57772954 1657
countsInit50nm <- countOverlaps(exonRangesList, aligns)
Error in queryHits(findOverlaps(query, subject, maxgap = maxgap, minoverlap = minoverlap, :
error in evaluating the argument 'x' in selecting a method for function 'queryHits': Error in mergeNamedAtomicVectors(seqlengths(x), seqlengths(y), what = c("sequence", :
sequences chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21 have incompatible seqlengths:
- in 'x': 249250621, 243199373, 198022430, 191154276, 180915260, 171115067, 159138663, 155270560, 146364022, 141213431, 135534747, 135006516, 133851895, 115169878, 107349540, 102531392, 90354753, 81195210, 78077248, 63025520, 59373566, 59128983, 51304566, 48129895
- in 'y': 247249719, 242951149, 199501827, 191273063, 180857866, 170899992, 158821424, 154913754, 146274826, 140273252, 135374737, 134452384, 132349534, 114142980, 106368585, 100338915, 88827254, 78774742, 76117153, 62435964, 57772954, 63811651, 49691432, 46944323
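For reference, the mismatch reported above can be reproduced with plain named vectors. This is a minimal base R sketch, not the GenomicRanges internals: the vector names `x_len` and `y_len` are made up for illustration, and the numbers are copied from the error message for the first three chromosomes only. With the real objects, the same comparison would presumably start from `seqlengths(exonRangesList)` and `seqlengths(aligns)`. Note that the 'y' value for chr1 (247249719) is the chr1 length in the older hg18 assembly, whereas the 'x' value (249250621) is the hg19 length, which hints that the BAM header may have come from an hg18 reference.

```r
# Sequence lengths copied from the error message (first three chromosomes only).
# 'x' corresponds to the annotation (exonRangesList), 'y' to the BAM file (aligns).
# x_len and y_len are illustrative names, not objects from the original session.
x_len <- c(chr1 = 249250621, chr2 = 243199373, chr3 = 198022430)
y_len <- c(chr1 = 247249719, chr2 = 242951149, chr3 = 199501827)

# Compare only the sequence names present in both objects.
shared <- intersect(names(x_len), names(y_len))

# Names whose lengths disagree between the two objects.
mismatch <- shared[x_len[shared] != y_len[shared]]
print(mismatch)

# Size of the disagreement for each mismatching sequence.
print(x_len[mismatch] - y_len[mismatch])
```

If every shared chromosome shows up in `mismatch`, as it does here, the two objects most likely describe different genome builds rather than differing by a few stray contigs.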
I then realized that my RNA-seq library was not strand-specific, so I tried again with this code:
txGRanges <- unlist(txRangesList)
names(txGRanges) <- elementMetadata(txGRanges)[,"tx_id"]
strand(txGRanges) <- "*"
txRangesList <- split(txGRanges)
counts <- countOverlaps(txRangesList, aligns)
Warning message:
In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chr13, chr6_ssto_hap7, chr6_mcf_hap5, chr6_cox_hap2, chr6_mann_hap4, chr6_apd_hap1, chr6_qbl_hap6, chr6_dbb_hap3, chr17_ctg5_hap1, chr4_ctg9_hap1, chr1_gl000192_random, chrUn_gl000225, chr4_gl000194_random, chr4_gl000193_random, chr9_gl000200_random, chrUn_gl000222, chrUn_gl000212, chr7_gl000195_random, chrUn_gl000223, chrUn_gl000224, chrUn_gl000219, chr17_gl000205_random, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chr9_gl000199_random, chrUn_gl000211, chrUn_gl000213, chrUn_gl000220, chrUn_gl000218, chr19_gl000209_random, chrUn_gl000221, chrUn_gl000214, chrUn_gl000228, chrUn_gl000227, chr1_gl000191_random, chr19_gl000208_random, chr9_gl000198_random, chr17_gl000204_random, chrUn_gl000233, chrUn_gl000237, chrUn_gl000230, chrUn_gl000242, chrUn_gl000243, chrUn_gl000241, chrUn_gl000236, chrUn_gl000240, chr17_gl000206_random, chrUn_gl000232, chrUn_gl000234, chr11_gl000202_random, chrUn_gl000238, chrUn_gl00 [... truncated]
Error in queryHits(findOverlaps(query, subject, maxgap = maxgap, minoverlap = minoverlap, :
error in evaluating the argument 'x' in selecting a method for function 'queryHits': Error in mergeNamedAtomicVectors(seqlengths(x), seqlengths(y), what = c("sequence", :
sequences chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21 have incompatible seqlengths:
- in 'x': 249250621, 243199373, 198022430, 191154276, 180915260, 171115067, 159138663, 155270560, 146364022, 141213431, 135534747, 135006516, 133851895, 107349540, 102531392, 90354753, 81195210, 78077248, 63025520, 59373566, 59128983, 51304566, 48129895
- in 'y': 247249719, 242951149, 199501827, 191273063, 180857866, 170899992, 158821424, 154913754, 146274826, 140273252, 135374737, 134452384, 132349534, 106368585, 100338915, 88827254, 78774742, 76117153, 62435964, 57772954, 63811651, 49691432, 46944323
I would really appreciate some input on what the issue is here, as I am new to Bioconductor and still finding my way around. Ultimately, I would like to run statistical analyses using the DESeq package.
Thanks!