  • deKoch13
    Member
    • Mar 2019
    • 12

    UpSet R plot, input data format wrong?

    Hi!

    I processed 3 BAM files that were generated by 3 different pipelines (9 BAM files in total) with scripts written in Bash and Python. I extracted the mapped reads from the BAM files and stored them in Python sets. Then I performed pairwise intersections to see which reads are shared between BAM files despite the different pipelines.

    The resulting 3x3 matrix was written to a TSV file:

    14659 14659 14647
    14659 15731 15709
    14647 15709 15709

    Each number is the count of reads shared by the corresponding pair of files.
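
For readers who want to reproduce this kind of matrix directly in R, here is a minimal sketch using base R only. The set names and read IDs are made up for illustration; they are not the poster's data. Note that a pairwise matrix like this does not determine the three-way overlap, which is one reason UpSet plots are usually built from per-read set membership rather than from pairwise counts.

```r
# Toy read IDs standing in for the three pipelines (hypothetical data).
read_sets <- list(
  reconstructed = c("r1", "r2", "r3", "r4"),
  shuffled      = c("r2", "r3", "r4", "r5", "r6"),
  trimmed       = c("r3", "r4", "r5", "r6")
)

# Entry [i, j] is the number of IDs shared by set i and set j;
# the diagonal therefore holds each set's own size.
pairwise_counts <- sapply(read_sets, function(a) {
  sapply(read_sets, function(b) length(intersect(a, b)))
})

print(pairwise_counts)
```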

    Now I want to load the matrix into R and create an UpSetR plot. I know that a Venn diagram would also work, but I will have more pipelines to compare later on, so I chose UpSetR. I tried this code:

    upset(test_df, sets = 'reconstructed', 'shuffled', 'trimmed',
    number.angles = 30, point.size = 3.5, line.size = 2,
    mainbar.y.label = "Read Intersections", sets.x.label = "Blabla",
    text.scale = c(1.3, 1.3, 1, 1, 2, 0.75), mb.ratio = c(0.55, 0.45),
    order.by = 'sets', keep.order = TRUE)

    But an error occurred:
    Error in start_col:end_col : argument of length 0

    Unfortunately, I am only a beginner in R, without much experience.
    Maybe someone here has more experience with R or the UpSetR package.

    Greetings!
  • Meyana
    Member
    • Sep 2017
    • 40

    #2
    I run UpSetR by inputting individual sets as a list and then the program calculates overlap itself (I am not aware whether it allows you to "manually" input the overlaps, never tried that).

    #make input
    list.Input = list(set1=data1,set2=data2,set3=data3)
    #run upsetr
    upset(fromList(list.Input),sets=c("set1","set2","set3"))

    ...and then just add further arguments (keep.order, nintersects, etc.) as needed.
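
To make the list input more concrete, the sketch below builds by hand the kind of 0/1 membership table that a list of sets corresponds to: one row per distinct observation, one column per set. This is a base R illustration with made-up observations, not UpSetR's actual implementation of fromList.

```r
# Hypothetical observation vectors, in the style of data1/data2/data3.
data1 <- c("A344D", "B120K", "C777P")
data2 <- c("B120K", "C777P", "D901L")
data3 <- c("C777P", "E555Q")

list_input <- list(set1 = data1, set2 = data2, set3 = data3)

# One row per distinct observation, one 0/1 column per set.
all_ids <- unique(unlist(list_input, use.names = FALSE))
membership <- as.data.frame(
  lapply(list_input, function(s) as.integer(all_ids %in% s))
)
rownames(membership) <- all_ids

print(membership)
```

Each row shows which sets contain that observation, which is the information an UpSet plot needs and a matrix of pairwise counts does not carry.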

    • deKoch13
      Member
      • Mar 2019
      • 12

      #3
      Tried it out, but...

      Thank you, Meyana.

      I tried your idea, but it still doesn't work.
      What does your input data look like?

      I just passed in 3 text files that each contain one column (the read identifiers from the BAM files).
      The UpSet plot shows me the three sets, but no intersections.
      Any suggestions?

      Many greetings

      • Meyana
        Member
        • Sep 2017
        • 40

        #4
        My data1/data2/data3 are just vectors of the observations, which I then store in the list list.Input, nothing special. The observations themselves can have any format; mine look something like "A344D".

        Did you store your data in the list?

        • deKoch13
          Member
          • Mar 2019
          • 12

          #5
          This is what I've done:

          #imported
          library(UpSetR)

          #make input
          list.Input = list(set1 = "trimmed_bismark_bt2_pe.bam_mapped_reads.txt",
          set2 = "shuffled_bismark_bt2_pe.bam_mapped_reads.txt",
          set3 = "econstructed_bismark_bt2_pe.bam_mapped_reads.txt")

          upset(fromList(list.Input), sets = c("set1", "set2", "set3"),
          number.angles = 30, point.size = 3.5, line.size = 2,
          mainbar.y.label = "Read Intersections", sets.x.label = "Blabla",
          text.scale = c(1.3, 1.3, 1, 1, 2, 0.75), mb.ratio = c(0.55, 0.45),
          order.by = 'freq', keep.order = TRUE)

          So I think I stored the sets in a list; I also checked this with print(class(list.Input)).
          Maybe the package does not accept my input: three text files, one column each, containing only read identifiers.
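
One thing visible in the code above: the list holds the file names as strings, not the contents of the files, so each "set" has exactly one element and no two sets can overlap. The sketch below illustrates the difference in base R with two small temporary files; the file names and read IDs are invented for the example.

```r
# Write two small ID files to a temporary directory (hypothetical contents).
tmp <- tempdir()
file_a <- file.path(tmp, "a_mapped_reads.txt")
file_b <- file.path(tmp, "b_mapped_reads.txt")
writeLines(c("read1", "read2", "read3"), file_a)
writeLines(c("read2", "read3", "read4"), file_b)

# Storing the paths: each "set" is a single string, so nothing overlaps.
path_sets <- list(set1 = file_a, set2 = file_b)
lengths(path_sets)                                 # 1 1
length(intersect(path_sets$set1, path_sets$set2))  # 0

# Storing the contents: each set is the vector of IDs read from the file.
id_sets <- lapply(path_sets, readLines)
lengths(id_sets)                                   # 3 3
length(intersect(id_sets$set1, id_sets$set2))      # 2
```

A quick lengths() call on the list before plotting is a cheap way to confirm that the sets contain what you expect.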

          • Meyana
            Member
            • Sep 2017
            • 40

            #6
            Your code works fine on my data.
            Could you post a snippet of your data?

            • deKoch13
              Member
              • Mar 2019
              • 12

              #7
              Works now!

              Hi Meyana,

              It works now!
              You were absolutely right about building a list of sets and using the fromList function.
              I was not aware that fromList creates a binary data frame that is compatible with the UpSetR package.

              For other forum users, here is my working code:

              library(UpSetR)

              trimmed_df <- read.csv(file = "tri.txt", header = FALSE, sep = "\n")
              shuffled_df <- read.csv(file = "shu.txt", header = FALSE, sep = "\n")
              reconstructed_df <- read.csv(file = "rec.txt", header = FALSE, sep = "\n")

              trimmed <- as.vector(trimmed_df$V1)
              shuffled <- as.vector(shuffled_df$V1)
              reconstructed <- as.vector(reconstructed_df$V1)

              read_sets <- list(
                trimmed_reads = trimmed,
                shuffled_reads = shuffled,
                reconstructed_reads = reconstructed)

              upset(fromList(read_sets),
                    sets = c("trimmed_reads", "shuffled_reads", "reconstructed_reads"),
                    number.angles = 20, point.size = 2.5, line.size = 1.5,
                    mainbar.y.label = "read intersection", sets.x.label = "read set size",
                    text.scale = c(1.5, 1.5, 1.25, 1.25, 1.5, 1.5), mb.ratio = c(0.65, 0.35),
                    group.by = "freq", keep.order = TRUE)

              Again, thank you Meyana!
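
A note for anyone comparing such a plot with a pairwise matrix like the one in the first post: as far as I know, the bars of a default UpSetR plot count elements that belong to exactly the indicated combination of sets, whereas a pairwise intersection also includes elements shared with the remaining sets. The base R sketch below, on made-up read IDs, shows how the two counts can differ; it is an illustration of the counting, not of UpSetR's internals.

```r
# Hypothetical read IDs for the three sets.
read_sets <- list(
  trimmed_reads       = c("r1", "r2", "r3", "r4"),
  shuffled_reads      = c("r2", "r3", "r4", "r5"),
  reconstructed_reads = c("r3", "r4", "r6")
)

# Logical membership matrix: one row per distinct read, one column per set.
all_ids <- unique(unlist(read_sets, use.names = FALSE))
membership <- sapply(read_sets, function(s) all_ids %in% s)

# Label each read with the exact combination of sets that contain it.
combo <- apply(membership, 1, function(row) {
  paste(colnames(membership)[row], collapse = " & ")
})
exclusive_counts <- table(combo)
print(exclusive_counts)

# Pairwise count for comparison: also includes reads found in the third set.
pairwise_ts <- length(intersect(read_sets$trimmed_reads,
                                read_sets$shuffled_reads))
print(pairwise_ts)
```

Here only one read is in trimmed_reads and shuffled_reads alone, while the pairwise intersection of those two sets has three reads, because two of them are also in reconstructed_reads.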

              • Meyana
                Member
                • Sep 2017
                • 40

                #8
                Great, happy to see it working for you!

                In addition to UpSetR, there is also the SuperExactTest package, which you may find interesting (though its graphical output is not the prettiest).

                • guri
                  Junior Member
                  • May 2019
                  • 1

                  #9
                  Upset error

                  Hi,

                  I have tried to make an UpSet plot for three VCF files from different pipelines. I extracted the variant column (SNPs) and used these one-column CSV files for import into R. I used this code:

                  set1 <- read.csv("set1.vcf", sep="")
                  set2 <- read.csv("set2.vcf", sep="")
                  set3 <- read.csv("set3.vcf", sep="")

                  set1 <- as.vector(set1$V1)
                  set2 <- as.vector(set2$v1)
                  set3 <- as.vector(set3$V1)

                  read_sets = list(set1_reads = set1,
                  set2_reads = set2,
                  set3_reads = set3)

                  upset(fromList(read_sets),
                  sets = c("set1_reads", "set2_reads", "set3_reads"),
                  number.angles = 20, point.size = 2.5, line.size = 1.5,
                  mainbar.y.label = "read intersection", sets.x.label = "read set size",
                  text.scale = c(1.5, 1.5, 1.25, 1.25, 1.5, 1.5), mb.ratio = c(0.65, 0.35),
                  group.by = "freq", keep.order = TRUE)

                  It produces an intersection plot, but the SNP counts in the UpSet plot are much lower than the vcf-compare results for the same VCF files. I am not sure why the UpSet plot gives different numbers.
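
Without seeing the files it is hard to say what causes the difference, but a few details in the code above could shrink the counts, so they may be worth checking: read.csv treats the first line as a header by default, R column names are case sensitive (set2$v1 is not the same as set2$V1), and sets are compared on distinct values, so a column holding only alleles collapses to a handful of elements. The sketch below demonstrates each point in base R on invented data; the "chrom:pos:ref:alt" key format is an assumption for illustration, not something taken from the post.

```r
# Hypothetical one-column file of variant keys, without a header line.
vcf_col <- tempfile(fileext = ".txt")
writeLines(c("chr1:100:A:G", "chr1:250:C:T", "chr2:75:G:A"), vcf_col)

# Default header = TRUE consumes the first record as the column name.
with_header <- read.csv(vcf_col)
nrow(with_header)          # 2, one variant went missing

# header = FALSE keeps every record and names the column V1.
no_header <- read.csv(vcf_col, header = FALSE, stringsAsFactors = FALSE)
nrow(no_header)            # 3

# Column names are case sensitive: $v1 does not match V1.
is.null(no_header$v1)      # TRUE
length(no_header$V1)       # 3

# A bare allele column collapses to very few distinct values.
alt_only <- c("G", "T", "A", "G", "T")
length(unique(alt_only))   # 3
```

Checking length() and head() of each vector before calling fromList should quickly show whether any of these applies.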
