Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • GAGE pathway analysis (R/Bioconductor): summarizeOverlaps error

    I am trying to use the GAGE pathway analysis tutorial with 3 samples of paired-end RNA-Seq reads (placed in a "tophat_all" directory) that I mapped with TopHat (.bam files have been indexed).

    An error occurred at the read count step:

    Code:
    > library(TxDb.Hsapiens.UCSC.hg19.knownGene)
    > exByGn <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")
    > library(GenomicAlignments)
    > fls <- list.files("tophat_all/", pattern="bam$", full.names =T)
    > bamfls <- BamFileList(fls)
    > flag <- scanBamFlag(isSecondaryAlignment=FALSE, isProperPair=TRUE)
    > param <- ScanBamParam(flag=flag)
    > gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union", ignore.strand=TRUE, single.end=FALSE, param=param)
    Error in validObject(.Object) : 
      invalid class "SummarizedExperiment0" object: 'assays' nrow differs from 'mcols' nrow

    When I tried the same with only one sample I get a different error:
    Code:
    > gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union", ignore.strand=TRUE, single.end=FALSE, param=param)
    Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
      'data' must be of a vector type, was 'NULL'
    Could someone explain me the error?
    Does the error come from my reads (potential unpaired reads, different length...)?

    Thanks !

    Notes:
    (1) I am using the last version of R, Biconductor and its packages. Could it come from the newer versions of the packages that would not fit the tutorial anymore?

    Code:
    > sessionInfo()
    R version 3.2.2 (2015-08-14)
    Platform: x86_64-pc-linux-gnu (64-bit)
    
    locale:
    [1] C
    
    attached base packages:
    [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
    [8] methods   base     
    
    other attached packages:
     [1] GenomicAlignments_1.6.1                
     [2] Rsamtools_1.22.0                       
     [3] Biostrings_2.38.0                      
     [4] XVector_0.10.0                         
     [5] SummarizedExperiment_1.0.1             
     [6] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
     [7] GenomicFeatures_1.22.4                 
     [8] AnnotationDbi_1.32.0                   
     [9] Biobase_2.30.0                         
    [10] GenomicRanges_1.22.1                   
    [11] GenomeInfoDb_1.6.1                     
    [12] IRanges_2.4.1                          
    [13] S4Vectors_0.8.2                        
    [14] BiocGenerics_0.16.1                    
    
    loaded via a namespace (and not attached):
     [1] zlibbioc_1.16.0      BiocParallel_1.4.0   tools_3.2.2         
     [4] DBI_0.3.1            lambda.r_1.1.7       futile.logger_1.4.1 
     [7] rtracklayer_1.30.1   futile.options_1.0.0 bitops_1.0-6        
    [10] RCurl_1.95-4.7       biomaRt_2.26.0       RSQLite_1.0.0       
    [13] XML_3.98-1.3
    (2) Here is the traceback after the error occurred:
    Code:
    > traceback()
    26: stop(msg, ": ", errors, domain = NA)
    25: validObject(.Object)
    24: initialize(value, ...)
    23: initialize(value, ...)
    22: new("SummarizedExperiment0", NAMES = names, elementMetadata = rowData, 
            colData = colData, assays = assays, metadata = as.list(metadata))
    21: new_SummarizedExperiment0(from@assays, names(from@rowRanges), 
            mcols(from@rowRanges), from@colData, from@metadata)
    20: asMethod(object)
    19: as(object, superClass)
    18: slot(x, "assays")
    17: is(slot(x, "assays"), "Assays")
    16: .valid.SummarizedExperiment0.assays_current(x)
    15: valid.func(object)
    14: validityMethod(as(object, superClass))
    13: anyStrings(validityMethod(as(object, superClass)))
    12: validObject(.Object)
    11: initialize(value, ...)
    10: initialize(value, ...)
    9: new("RangedSummarizedExperiment", rowRanges = rowRanges, colData = colData, 
           assays = assays, elementMetadata = elementMetadata, metadata = as.list(metadata))
    8: .new_RangedSummarizedExperiment(assays, rowRanges, colData, metadata)
    7: .local(assays, ...)
    6: SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features, 
           colData = colData)
    5: SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features, 
           colData = colData)
    4: .dispatchBamFiles(features, reads, mode, match.arg(algorithm), 
           ignore.strand, inter.feature = inter.feature, singleEnd = singleEnd, 
           fragments = fragments, param = param, preprocess.reads = preprocess.reads, 
           ...)
    3: .local(features, reads, mode, algorithm, ignore.strand, ...)
    2: summarizeOverlaps(exByGn, bamfls, mode = "Union", ignore.strand = TRUE, 
           single.end = FALSE, param = param)
    1: summarizeOverlaps(exByGn, bamfls, mode = "Union", ignore.strand = TRUE, 
           single.end = FALSE, param = param)
    Last edited by user 31888; 11-16-2015, 05:06 PM.

  • #2
    This error suggests that one of your two input arguments is empty. Try the following two commands to see if anything looks odd:
    Code:
    > str(exByGn)
    > str(bamfls)

    Comment


    • #3
      Thanks ginger for your reply !

      Here are the outputs:
      Code:
      > str(exByGn)
      Formal class 'GRangesList' [package "GenomicRanges"] with 5 slots
        ..@ unlistData     :Formal class 'GRanges' [package "GenomicRanges"] with 6 slots
        .. .. ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
        .. .. .. .. ..@ values         : Factor w/ 93 levels "chr1","chr2",..: 19 8 20 18 1 23 3 14 15 11 ...
        .. .. .. .. ..@ lengths        : int [1:19380] 15 2 12 17 17 4 1 11 10 21 ...
        .. .. .. .. ..@ elementMetadata: NULL
        .. .. .. .. ..@ metadata       : list()
        .. .. ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
        .. .. .. .. ..@ start          : int [1:272776] 58858172 58858719 58859832 58860934 58861736 58862757 58863649 58864294 58864658 58864770 ...
        .. .. .. .. ..@ width          : int [1:272776] 224 288 663 1084 282 297 273 270 36 96 ...
        .. .. .. .. ..@ NAMES          : NULL
        .. .. .. .. ..@ elementType    : chr "integer"
        .. .. .. .. ..@ elementMetadata: NULL
        .. .. .. .. ..@ metadata       : list()
        .. .. ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
        .. .. .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 2 1 2 1 2 1 2 1 2 1 ...
        .. .. .. .. ..@ lengths        : int [1:11717] 15 2 46 5 11 87 1 4 1 8 ...
        .. .. .. .. ..@ elementMetadata: NULL
        .. .. .. .. ..@ metadata       : list()
        .. .. ..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
        .. .. .. .. ..@ rownames       : NULL
        .. .. .. .. ..@ nrows          : int 272776
        .. .. .. .. ..@ listData       :List of 2
        .. .. .. .. .. ..$ exon_id  : int [1:272776] 250809 250810 250811 250812 250813 250814 250815 250816 250817 250818 ...
        .. .. .. .. .. ..$ exon_name: chr [1:272776] NA NA NA NA ...
        .. .. .. .. ..@ elementType    : chr "ANY"
        .. .. .. .. ..@ elementMetadata: NULL
        .. .. .. .. ..@ metadata       : list()
        .. .. ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
        .. .. .. .. ..@ seqnames   : chr [1:93] "chr1" "chr2" "chr3" "chr4" ...
        .. .. .. .. ..@ seqlengths : int [1:93] 249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022 141213431 135534747 ...
        .. .. .. .. ..@ is_circular: logi [1:93] NA NA NA NA NA NA ...
        .. .. .. .. ..@ genome     : chr [1:93] "hg19" "hg19" "hg19" "hg19" ...
        .. .. ..@ metadata       : list()
        ..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
        .. .. ..@ rownames       : NULL
        .. .. ..@ nrows          : int 23459
        .. .. ..@ listData       : Named list()
        .. .. ..@ elementType    : chr "ANY"
        .. .. ..@ elementMetadata: NULL
        .. .. ..@ metadata       : list()
        ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
        .. .. ..@ end            : int [1:23459] 15 17 29 46 63 67 68 79 89 110 ...
        .. .. ..@ NAMES          : chr [1:23459] "1" "10" "100" "1000" ...
        .. .. ..@ elementType    : chr "integer"
        .. .. ..@ elementMetadata: NULL
        .. .. ..@ metadata       : list()
        ..@ elementType    : chr "GRanges"
        ..@ metadata       :List of 1
        .. ..$ genomeInfo:List of 19
        .. .. ..$ Db type                                 : chr "TxDb"
        .. .. ..$ Supporting package                      : chr "GenomicFeatures"
        .. .. ..$ Data source                             : chr "UCSC"
        .. .. ..$ Genome                                  : chr "hg19"
        .. .. ..$ Organism                                : chr "Homo sapiens"
        .. .. ..$ Taxonomy ID                             : chr "9606"
        .. .. ..$ UCSC Table                              : chr "knownGene"
        .. .. ..$ Resource URL                            : chr "http://genome.ucsc.edu/"
        .. .. ..$ Type of Gene ID                         : chr "Entrez Gene ID"
        .. .. ..$ Full dataset                            : chr "yes"
        .. .. ..$ miRBase build ID                        : chr "GRCh37"
        .. .. ..$ transcript_nrow                         : chr "82960"
        .. .. ..$ exon_nrow                               : chr "289969"
        .. .. ..$ cds_nrow                                : chr "237533"
        .. .. ..$ Db created by                           : chr "GenomicFeatures package from Bioconductor"
        .. .. ..$ Creation time                           : chr "2015-10-07 18:11:28 +0000 (Wed, 07 Oct 2015)"
        .. .. ..$ GenomicFeatures version at creation time: chr "1.21.30"
        .. .. ..$ RSQLite version at creation time        : chr "1.0.0"
        .. .. ..$ DBSCHEMAVERSION                         : chr "1.1"
      > str(bamfls)
      Formal class 'BamFileList' [package "Rsamtools"] with 4 slots
        ..@ listData       :List of 3
        .. ..$ sample_1.bam :Reference class 'BamFile' [package "Rsamtools"] with 8 fields
        .. .. ..$ .extptr         :<externalptr> 
        .. .. ..$ path            : chr "tophat_all//sample_1.bam"
        .. .. ..$ index           : chr "tophat_all//sample_1.bam.bai"
        .. .. ..$ yieldSize       : int NA
        .. .. ..$ obeyQname       : logi FALSE
        .. .. ..$ asMates         : logi FALSE
        .. .. ..$ qnamePrefixEnd  : chr NA
        .. .. ..$ qnameSuffixStart: chr NA
        .. .. ..and 14 methods.
        .. ..$ sample_2.bam    :Reference class 'BamFile' [package "Rsamtools"] with 8 fields
        .. .. ..$ .extptr         :<externalptr> 
        .. .. ..$ path            : chr "tophat_all//sample_2.bam"
        .. .. ..$ index           : chr "tophat_all//sample_2.bam.bai"
        .. .. ..$ yieldSize       : int NA
        .. .. ..$ obeyQname       : logi FALSE
        .. .. ..$ asMates         : logi FALSE
        .. .. ..$ qnamePrefixEnd  : chr NA
        .. .. ..$ qnameSuffixStart: chr NA
        .. .. ..and 14 methods.
        .. ..$ sample_3.bam:Reference class 'BamFile' [package "Rsamtools"] with 8 fields
        .. .. ..$ .extptr         :<externalptr> 
        .. .. ..$ path            : chr "tophat_all//sample_3.bam"
        .. .. ..$ index           : chr "tophat_all//sample_3.bam.bai"
        .. .. ..$ yieldSize       : int NA
        .. .. ..$ obeyQname       : logi FALSE
        .. .. ..$ asMates         : logi FALSE
        .. .. ..$ qnamePrefixEnd  : chr NA
        .. .. ..$ qnameSuffixStart: chr NA
        .. .. ..and 14 methods.
        ..@ elementType    : chr "BamFile"
        ..@ elementMetadata: NULL
        ..@ metadata       : list()
      Last edited by user 31888; 11-17-2015, 01:27 PM. Reason: corrected from 'sample_2.bai' to 'sample_2.bam.ai'

      Comment


      • #4
        There's something odd about sample_2.bam:

        Code:
          .. .. ..$ path            : chr "GAGE//sample_2.bam"
          .. .. ..$ index           : chr "GAGE//sample_2.bai"
        I think this should be:

        Code:
          .. .. ..$ path            : chr "GAGE//sample_2.bam"
          .. .. ..$ index           : chr "GAGE//sample_2[b].bam.[/b]bai"
        If that's not the problem, I can't see anything else obviously out of place that might indicate an empty list.

        Comment


        • #5
          My mistake, sorry.

          It is actually 'sample_2.bam.bai' (I edited my previous post)

          Comment


          • #6
            Stupid thought: could it be something like EOF or other formatting issue that differs between my sample.bam files and TxDb.Hsapiens.UCSC.hg19.knownGene?

            (I am using a Linux cluster)

            Comment


            • #7
              Hello everyone

              Hi! user 31888, Does your issue being solved? if it does ,colud you please provide me some advice? I meet the same question ,Thank you
              Last edited by Enmity_LD; 01-19-2016, 07:01 PM.

              Comment


              • #8
                maybe see:
                https://support.bioconductor.org/p/74677/#75002

                But I ended up doing a fresh install of pretty much everything (R, Python, Bioconductor and associated packages), and it worked.

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Strategies for Sequencing Challenging Samples
                  by seqadmin


                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                  03-22-2024, 06:39 AM
                • seqadmin
                  Techniques and Challenges in Conservation Genomics
                  by seqadmin



                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                  Avian Conservation
                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                  03-08-2024, 10:41 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, Yesterday, 06:37 PM
                0 responses
                7 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, Yesterday, 06:07 PM
                0 responses
                7 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 03-22-2024, 10:03 AM
                0 responses
                49 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 03-21-2024, 07:32 AM
                0 responses
                66 views
                0 likes
                Last Post seqadmin  
                Working...
                X