  • UpSet R plot, input data format wrong?

    Hi!

    I processed 3 BAM files that were generated from 3 different pipelines, so in total 9 BAM files by writing scripts in bash and python. I extracted the mapped reads from the BAM files and stored them in python sets. Then, I performed pair-wise intersection operations to see which reads are common in which BAM files (despite different pipelines).

    The output 3x3 matrix was written into a tsv file:

    14659 14659 14647
    14659 15731 15709
    14647 15709 15709

    Numbers correspond to the number of reads that are in one intersection between 2 files.
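For readers following along, a pairwise matrix of this kind could be computed in R roughly as below. This is only a sketch: the read identifiers and the name `pairwise_counts` are made up for illustration and are not the poster's actual data or script. Note that such a matrix only summarises pairs of sets; it does not record which reads belong to which sets.

```r
# Toy read identifiers standing in for the real per-file read sets (hypothetical values).
read_sets <- list(
  trimmed       = c("r1", "r2", "r3", "r4"),
  shuffled      = c("r2", "r3", "r4", "r5"),
  reconstructed = c("r3", "r4", "r5", "r6")
)

# Count shared identifiers for every pair of sets; the diagonal holds the set sizes.
pairwise_counts <- sapply(read_sets, function(a) {
  sapply(read_sets, function(b) length(intersect(a, b)))
})

print(pairwise_counts)
```

The result is symmetric, like the tsv matrix above, because intersecting A with B gives the same reads as intersecting B with A.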

    Now, I wanted to load the matrix into R and create an UpSet R plot. I know that a Venn diagram would also work, but later on I will have more pipelines to compare, so I chose UpSet R plots. I tried this code:

    upset(test_df, sets = 'reconstructed', 'shuffled', 'trimmed',
    number.angles = 30, point.size = 3.5, line.size = 2,
    mainbar.y.label = "Read Intersections", sets.x.label = "Blabla",
    text.scale = c(1.3, 1.3, 1, 1, 2, 0.75), mb.ratio = c(0.55, 0.45),
    order.by = 'sets', keep.order = TRUE)

    But an error occurred:
    Error in start_col:end_col : argument of length 0

    Unfortunately, I am only a beginner in R w/o experience.
    Maybe, someone has more experience in R or the UpSet package.

    Greetings!

  • #2
    I run UpSetR by inputting individual sets as a list and then the program calculates overlap itself (I am not aware whether it allows you to "manually" input the overlaps, never tried that).

    #make input
    list.Input = list(set1=data1,set2=data2,set3=data3)
    #run upsetr
    upset(fromList(list.Input),sets=c("set1","set2","set3"))

    .. and then just adding additional commands (keep.order, nintersects, etc...) as needed.


    • #3
      Tried it out, but...

      Thank you, Meyana.

      I tried your idea, but it still won't work.
      What do your input data look like?

      I just input 3 text files that each contain one column (read identifier from BAM files).
      The upset output plot shows me the three sets, but no intersections.
      Any suggestions?

      Many greetings


      • #4
        My data1/data2/data3 are just vectors of the observations, which I then store in the list list.Input, nothing special. The observations themselves can have any format; mine look something like "A344D".

        Did you store your data in the list?


        • #5
          This is what I've done:

          #imported
          library(UpSetR)

          #make input
          list.Input = list(set1 = "trimmed_bismark_bt2_pe.bam_mapped_reads.txt",
          set2 = "shuffled_bismark_bt2_pe.bam_mapped_reads.txt",
          set3 = "reconstructed_bismark_bt2_pe.bam_mapped_reads.txt")

          upset(fromList(list.Input), sets = c("set1", "set2", "set3"),
          number.angles = 30, point.size = 3.5, line.size = 2,
          mainbar.y.label = "Read Intersections", sets.x.label = "Blabla",
          text.scale = c(1.3, 1.3, 1, 1, 2, 0.75), mb.ratio = c(0.55, 0.45),
          order.by = 'freq', keep.order = TRUE)

          So, I think that I stored the sets in a list. I also checked it with print(class(list.Input)).
          Maybe, the package does not accept my input... three text files, one column each, just read identifier...
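A note for readers: a list built this way holds the file names themselves, not the identifiers inside the files, so each "set" contains exactly one string and nothing can overlap. The sketch below illustrates the difference. The helper `read_id_file` and the temporary files are hypothetical stand-ins, assuming one identifier per line.

```r
# The list from the post holds file names, so each "set" is a single string.
name_sets <- list(
  set1 = "trimmed_bismark_bt2_pe.bam_mapped_reads.txt",
  set2 = "shuffled_bismark_bt2_pe.bam_mapped_reads.txt"
)
print(lengths(name_sets))  # each element has length 1

# Hypothetical helper: read one identifier per line, drop blank lines and duplicates.
read_id_file <- function(path) {
  ids <- trimws(readLines(path))
  unique(ids[nzchar(ids)])
}

# Temporary files standing in for the real read lists (made-up identifiers).
tmp1 <- tempfile(fileext = ".txt")
tmp2 <- tempfile(fileext = ".txt")
writeLines(c("read_1", "read_2", "read_3"), tmp1)
writeLines(c("read_2", "read_3", "read_4"), tmp2)

# Now each set is a character vector of identifiers that can actually intersect.
id_sets <- list(set1 = read_id_file(tmp1), set2 = read_id_file(tmp2))
print(length(intersect(id_sets$set1, id_sets$set2)))
```

A list like `id_sets` is the kind of input that `fromList()` in post #2 is given.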


          • #6
            Your code works fine on my data.
            Could you post a snippet of your data?


            • #7
              Works now!

              Hi Meyana,

              it works now!
              You were absolutely right about generating a list of sets and using the fromList function.
              I was not aware that fromList creates a binary data frame that is compatible with the UpSet package.
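To make the idea of a binary data frame concrete, here is a base-R sketch of such a membership table: one row per distinct element and one 0/1 column per set. The helper `to_membership` and the toy identifiers are hypothetical; this mirrors the idea only and is not the package's own implementation of `fromList`.

```r
# Toy sets (hypothetical identifiers).
read_sets <- list(
  trimmed_reads       = c("r1", "r2", "r3"),
  shuffled_reads      = c("r2", "r3", "r4"),
  reconstructed_reads = c("r3", "r4", "r5")
)

# Hypothetical helper: one row per distinct element, one 0/1 column per set.
to_membership <- function(sets) {
  elements <- unique(unlist(sets, use.names = FALSE))
  membership <- lapply(sets, function(s) as.integer(elements %in% s))
  df <- data.frame(membership, check.names = FALSE)
  rownames(df) <- elements
  df
}

membership_df <- to_membership(read_sets)
print(membership_df)
```

Each row's pattern of 0s and 1s says which combination of sets an element belongs to, which is the information a pairwise count matrix lacks.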

              Just for other forum users, my functional code:

              library(UpSetR)

              trimmed_df <- read.csv(file = "tri.txt", header = FALSE, sep = "\n")
              shuffled_df <- read.csv(file = "shu.txt", header = FALSE, sep = "\n")
              reconstructed_df <- read.csv(file = "rec.txt", header = FALSE, sep = "\n")

              trimmed <- as.vector(trimmed_df$V1)
              shuffled <- as.vector(shuffled_df$V1)
              reconstructed <- as.vector(reconstructed_df$V1)

              read_sets = list(
              trimmed_reads = trimmed,
              shuffled_reads = shuffled,
              reconstructed_reads = reconstructed)

              upset(fromList(read_sets),
              sets = c("trimmed_reads", "shuffled_reads", "reconstructed_reads"),
              number.angles = 20, point.size = 2.5, line.size = 1.5,
              mainbar.y.label = "read intersection", sets.x.label = "read set size",
              text.scale = c(1.5, 1.5, 1.25, 1.25, 1.5, 1.5), mb.ratio = c(0.65, 0.35),
              group.by = "freq", keep.order = TRUE)

              Again, thank you Meyana!


              • #8
                Great, happy to see it working for you!

                In addition to the UpSetR package, there's also the SuperExactTest package, which you may also find interesting (though the graphical output is not the prettiest)


                • #9
                  Upset error

                  hi,

                  I have tried using upset plot for three vcf files from different pipelines. I extracted the variant column (SNPs) and used these csv files (with one column) for R import. I have used this code:

                  set1 <- read.csv("set1.vcf", sep="")
                  set2 <- read.csv("set2.vcf", sep="")
                  set3 <- read.csv("set3.vcf", sep="")

                  set1 <- as.vector(set1$V1)
                  set2 <- as.vector(set2$v1)
                  set3 <- as.vector(set3$V1)

                  read_sets = list(set1_reads = set1,
                  set2_reads = set2,
                  set3_reads = set3)

                  upset(fromList(read_sets),
                  sets = c("set1_reads", "set2_reads", "set3_reads"),
                  number.angles = 20, point.size = 2.5, line.size = 1.5,
                  mainbar.y.label = "read intersection", sets.x.label = "read set size",
                  text.scale = c(1.5, 1.5, 1.25, 1.25, 1.5, 1.5), mb.ratio = c(0.65, 0.35),
                  group.by = "freq", keep.order = TRUE)

                  It gives an intersection plot, but the numbers of SNPs in the UpSet plot are really low compared with the vcf-compare results for the same VCF files. I am not sure why I am getting different numbers with the UpSet plot.
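One thing that may be worth checking, offered as an assumption and not a diagnosis of this data: set operations work on distinct values, so if the extracted column is not unique per variant (for example an allele or a non-unique ID), many variants collapse into a few elements and the counts shrink. The sketch below uses a made-up variant table with the standard VCF columns CHROM, POS, REF and ALT, and builds a composite key per site; the name `variant_keys` is hypothetical.

```r
# Toy variant table (hypothetical values) using the standard VCF columns.
variants <- data.frame(
  CHROM = c("chr1", "chr1", "chr2", "chr2"),
  POS   = c(100L, 250L, 100L, 300L),
  REF   = c("A", "G", "A", "T"),
  ALT   = c("G", "A", "G", "C"),
  stringsAsFactors = FALSE
)

# A single column such as ALT collapses to a handful of distinct values.
print(length(unique(variants$ALT)))

# A composite key keeps each site distinct.
variant_keys <- with(variants, paste(CHROM, POS, REF, ALT, sep = ":"))
print(length(unique(variant_keys)))
```

Vectors of keys like `variant_keys`, one per VCF file, could then be stored in the list passed to `fromList()`.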
