
  • Trying to apply overlap to each row of a "GRangesList"

    Dear Experts,

    I have a "GRangesList" object like this:

    GRangesList of length 1:
    $TYPE1
    GRanges with 700000 ranges and 1 metadata column:
    seqnames ranges strand | id
    <Rle> <IRanges> <Rle> | <factor>
    [1] chr1 [ 0, 10000] * | Factor1
    [2] chr1 [ 9600, 20000] * | Factor2
    [3] chr1 [ 24000, 30000] * | Factor2



    And I am trying to overlap each row of this list with each row of a GRanges object like this:

    GRanges with 200 ranges and 1 metadata column:
    seqnames ranges strand | name
    <Rle> <IRanges> <Rle> | <factor>
    rs1 chr1 [0, 1] * | rs1
    rs2 chr1 [9700, 9701] * | rs2


    My goal is to get a data frame containing the overlap count of each range in the GRanges object with each row of the GRangesList, like this:

    rs1_Factor1 0 0 0
    rs_Factor2 0 1 0

    I can do this for one value at a time of the Factors of GRangesList like this:

    hits=countOverlaps(obj1, objList[1], type=c("within"))

    But how do I apply this to each row of GRangesList?

    I have tried unsuccessfully with mapply (Error in (function (query, subject, maxgap = 0L, minoverlap = 1L, type = c("any") error in evaluating the argument 'query' in selecting a method for function 'countOverlaps': Error in dots[[1L]][[1L]] : this S4 class is not subsettable)


    Thanks so much!

    -fra

  • #2
    You wanted lapply() rather than mapply:
    Code:
    lapply(objList, function(x,y) {countOverlaps(y,x,type="within")}, obj1)
    Having said that, I'm pretty sure you can give countOverlaps() a GRangesList and that that'll be faster than applying a function.
    Last edited by dpryan; 09-27-2015, 11:23 PM. Reason: Forgot a ")"


    • #3
      Hi, thanks, the command as you have it gives me an error but this works: lapply(objList, function(x) countOverlaps(obj1, x, type=c("within")))

      However, it does not give me what I am looking for: it loops through the list and gives me the overlap between each list element and the GRanges object, whereas I would like to loop through each row of the list, not each element...


      • #4
        I could try to do this with a double loop, with the first loop over the elements of the list and the second loop over the rows of each element, like this:

        for (i in names(objList)) {
          # seq_along() visits every row; length() alone would only visit the last one
          for (j in seq_along(objList[[i]])) {
            # note: t is overwritten on every iteration
            t = as.data.frame(countOverlaps(obj1, objList[[i]][j], type = "within"))
          }
        }

        But is there really no better way?


        • #5
          Why not just:
          Code:
          countOverlaps(obj1, unlist(objList), type="within")
          That would seem to give you what you want.


          • #6
            I had tried that but it doesn't work either... That gives me the overlap of obj1 and objList as a whole, not the overlap of obj1 with each row of the list object. What I am trying to get is the overlap of the elements of obj1 with the first row of the first element in the list, then the overlap of the elements of obj1 with the second row of the first element in the list, etc.
            Last edited by francy; 09-28-2015, 12:50 PM.
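
A small sketch of the distinction described above, assuming the GenomicRanges package is installed. The ranges are a reduced toy version of the ones in this thread, and the variable name `per_query` is made up for illustration. With an unlisted subject, countOverlaps() returns one total per query range, summed over all subject rows, so it cannot say which row was hit.

```r
library(GenomicRanges)

# Toy data modelled on the thread: one list element with three rows.
TYPE1 <- GRanges(seqnames = "chr1",
                 ranges = IRanges(start = c(0, 9600, 24000),
                                  end = c(10000, 20000, 30000)),
                 id = c("Factor1", "Factor2", "Factor3"))
objList <- GRangesList(TYPE1 = TYPE1)
obj1 <- GRanges(seqnames = "chr1",
                ranges = IRanges(start = c(0, 9700), end = c(1, 9701)),
                id = c("rs1", "rs2"))

# One count per range in obj1, aggregated over every row of the subject.
per_query <- countOverlaps(obj1, unlist(objList), type = "within")
print(per_query)  # 1 2
```

The per-row breakdown needs the individual (query, subject) pairs, which is what findOverlaps() reports.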


            • #7
              Just give a small example of what you would like (i.e., give an example GRanges object, a GRangesList object and the output you would like).


              • #8
                Yes sure, I am sorry for not having done that before.
                Here are the objList and obj1:

                TYPE1 <- GRanges(seqnames = c("chr1", "chr1", "chr1"), ranges=IRanges(start=c(0,9600,24000),
                end=c(10000, 20000, 30000)), id=c("Factor1", "Factor2", "Factor3"))

                TYPE2 <- GRanges(seqnames = c("chr2", "chr2", "chr2"), ranges=IRanges(start=c(0,9000,14000),
                end=c(13000, 20500, 30100)), id=c("Factor1", "Factor2", "Factor3"))

                objList <- GRangesList("TYPE1" = TYPE1, "TYPE2" = TYPE2)

                obj1 <- GRanges(seqnames = c("chr1", "chr1"), ranges=IRanges(start=c(0,9700), end=c(1, 9701)), id=c("rs1", "rs2"))


                And this is what I have working now with loops (to find the overlap of obj1 with each row of objList):

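                nameList = names(objList)

                # one row per range in obj1; columns are appended inside the loop
                output = data.frame(row.names = c("rs1", "rs2"))
                for (name in nameList) {
                  id = objList[[name]]$id
                  for (i in seq_along(id)) {
                    # overlap of every obj1 range with row i of this list element
                    dftemp = as.data.frame(countOverlaps(obj1, objList[[name]][i], type = "within"))
                    output = cbind.data.frame(output, dftemp)
                  }
                }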

                There must be a better way though, since the loops take a very long time...
                Thank you!


                • #9
                  I'm not sure what the point of that is, but
                  Code:
                  findOverlaps(obj1, unlist(objList), type='within')
                  would give you the same information faster. The output is also easier to deal with than what will likely be a gigantic and unwieldy (not to mention sparse) data frame.


                  • #10
                    Hi again...the output of that again gives me this, which is not what I am looking for.

                    > findOverlaps(obj1, unlist(objList), type='within')
                    Hits object with 3 hits and 0 metadata columns:
                    queryHits subjectHits
                    <integer> <integer>
                    [1] 1 1
                    [2] 2 1
                    [3] 2 2
                    -------
                    queryLength: 2
                    subjectLength: 6

                    This is what I am trying to obtain (please see the example in R above):

                    > output
                    Factor1_TYPE1 Factor2_TYPE1 Factor3_TYPE1 Factor1_TYPE2 Factor2_TYPE2 Factor3_TYPE2
                    rs1 1 0 0 0 0 0
                    rs2 1 1 0 0 0 0
                    Last edited by francy; 09-29-2015, 03:35 AM.


                    • #11
                      Yes, as I said, what I wrote produces the same information much faster. If you want the data frame then just make a matrix of 0s and change values to 1 according to the output of findOverlaps.


                      • #12
                        Hi dpryan, thank you. Can you please explain in more detail how I would go from the output of findOverlaps or countOverlaps (3 hits, each pairing a queryHits entry with a subjectHits entry) to an output with 6 columns indicating binary overlap for each query? I am sorry for the confusion...thank you very much for your help.


                        • #13
                          Something like:

                          Code:
                          o <- findOverlaps(obj1, unlist(objList), type='within')
                          m <- matrix(0, nrow=length(obj1), ncol=length(unlist(objList)))
                          m[cbind(queryHits(o), subjectHits(o))] <- 1
                          Something along those lines.
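
For reference, here is a fuller sketch of the same idea run on the example objects posted earlier in the thread, with row and column labels added so the result resembles the requested data frame. It assumes GenomicRanges is installed. The names `flat`, `type_of_row` and `output` are illustrative, and the "id_TYPE" column naming is an assumption based on the desired output shown in the thread.

```r
library(GenomicRanges)

# Example objects as posted in the thread.
TYPE1 <- GRanges(seqnames = c("chr1", "chr1", "chr1"),
                 ranges = IRanges(start = c(0, 9600, 24000),
                                  end = c(10000, 20000, 30000)),
                 id = c("Factor1", "Factor2", "Factor3"))
TYPE2 <- GRanges(seqnames = c("chr2", "chr2", "chr2"),
                 ranges = IRanges(start = c(0, 9000, 14000),
                                  end = c(13000, 20500, 30100)),
                 id = c("Factor1", "Factor2", "Factor3"))
objList <- GRangesList("TYPE1" = TYPE1, "TYPE2" = TYPE2)
obj1 <- GRanges(seqnames = c("chr1", "chr1"),
                ranges = IRanges(start = c(0, 9700), end = c(1, 9701)),
                id = c("rs1", "rs2"))

# Flatten the list once and remember which list element each row came from.
flat <- unlist(objList, use.names = FALSE)
type_of_row <- rep(names(objList), elementNROWS(objList))

# One (query, subject) pair per overlap.
o <- findOverlaps(obj1, flat, type = "within")

# Zero matrix with one row per obj1 range and one column per list row.
m <- matrix(0L, nrow = length(obj1), ncol = length(flat),
            dimnames = list(obj1$id, paste(flat$id, type_of_row, sep = "_")))

# Two-column matrix indexing sets every hit to 1 in a single step.
m[cbind(queryHits(o), subjectHits(o))] <- 1L

output <- as.data.frame(m)
print(output)
```

On this example, rs1 gets a 1 only under Factor1_TYPE1, and rs2 gets a 1 under Factor1_TYPE1 and Factor2_TYPE1. All TYPE2 columns stay 0 because those ranges are on chr2.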

                          Edit: BTW, you might need to use a sparse matrix, depending on how much memory you have and how large your objects are.
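
A minimal sketch of the sparse variant, assuming the Matrix package (shipped with standard R installations) and GenomicRanges are available. The objects `query`, `subject` and `sm` are toy stand-ins, not the poster's real data. Only the hit positions are stored, so memory use grows with the number of overlaps rather than with rows times columns.

```r
library(GenomicRanges)
library(Matrix)

# Toy stand-ins for the unlisted GRangesList (subject) and the GRanges (query).
subject <- GRanges(seqnames = "chr1",
                   ranges = IRanges(start = c(0, 9600, 24000),
                                    end = c(10000, 20000, 30000)))
query <- GRanges(seqnames = "chr1",
                 ranges = IRanges(start = c(0, 9700), end = c(1, 9701)))

o <- findOverlaps(query, subject, type = "within")

# Build the indicator matrix directly from the hit coordinates.
sm <- sparseMatrix(i = queryHits(o), j = subjectHits(o), x = 1,
                   dims = c(length(query), length(subject)))
print(sm)
```

Row and column labels can be supplied through the `dimnames` argument of sparseMatrix() if they are needed.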


                          • #14
                            Ah I see! That is AMAZINGLY faster than my loop... Thank you so much dpryan for explaining this trick, very very thankful!!!
