SEQanswers

Old 03-21-2019, 07:39 AM   #1
deKoch13
Member
 
Location: HD

Join Date: Mar 2019
Posts: 12
UpSet R plot, input data format wrong?

Hi!

I processed 3 BAM files from each of 3 different pipelines, so 9 BAM files in total, using Bash and Python scripts. I extracted the mapped reads from the BAM files and stored them in Python sets. Then I performed pairwise intersections to see which reads are shared between which BAM files (despite the different pipelines).

The output 3x3 matrix was written into a tsv file:

14659 14659 14647
14659 15731 15709
14647 15709 15709

Each number is the count of reads shared by two files.

Now I wanted to load the matrix into R and create an UpSetR plot. I know that a Venn diagram would also work, but I will have more pipelines to compare later on, so I chose UpSetR plots. I tried this code:

upset(test_df, sets = 'reconstructed', 'shuffled', 'trimmed',
number.angles = 30, point.size = 3.5, line.size = 2,
mainbar.y.label = "Read Intersections", sets.x.label = "Blabla",
text.scale = c(1.3, 1.3, 1, 1, 2, 0.75), mb.ratio = c(0.55, 0.45),
order.by = 'sets', keep.order = TRUE)

But an error occurred:
Error in start_col:end_col : argument of length 0

Unfortunately, I am a beginner in R with no real experience.
Maybe someone here has more experience with R or the UpSetR package.

Greetings!
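
A note for later readers, offered as a hedged sketch rather than a confirmed diagnosis: in the call above, `sets` receives only 'reconstructed', while 'shuffled' and 'trimmed' are passed as separate positional arguments, so the names need wrapping in c(). In addition, upset() appears to expect one row per element (here, per read) with a 0/1 column per set, not a matrix of pairwise counts. The toy table below uses invented read names and memberships to show that shape.

```r
# Toy membership table: one row per read, one 0/1 column per set.
# All read names and values here are invented for illustration.
test_df <- data.frame(
  reconstructed = c(1, 1, 1, 0),
  shuffled      = c(1, 1, 0, 1),
  trimmed       = c(1, 0, 0, 1),
  row.names     = c("read1", "read2", "read3", "read4")
)

# Pairwise counts like those in the TSV can be derived from this table,
# but the table cannot be rebuilt from the pairwise counts alone.
pairwise <- crossprod(as.matrix(test_df))
print(pairwise)

# Only attempt the plot if UpSetR is installed.
if (requireNamespace("UpSetR", quietly = TRUE)) {
  # sets is a single character vector built with c()
  print(UpSetR::upset(test_df,
                      sets = c("reconstructed", "shuffled", "trimmed"),
                      order.by = "freq", keep.order = TRUE))
}
```

The diagonal of `pairwise` holds the set sizes and the off-diagonal entries hold the two-way overlaps, which is the same layout as the 3x3 matrix in the post.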
Old 03-21-2019, 09:57 PM   #2
Meyana
Member
 
Location: Japan

Join Date: Sep 2017
Posts: 37

I run UpSetR by inputting individual sets as a list and then the program calculates overlap itself (I am not aware whether it allows you to "manually" input the overlaps, never tried that).

#make input
list.Input = list(set1=data1,set2=data2,set3=data3)
#run upsetr
upset(fromList(list.Input),sets=c("set1","set2","set3"))

...and then I just add further arguments (keep.order, nintersects, etc.) as needed.
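
For readers wondering what fromList() hands to upset(): as far as I understand it, the result is a data frame with one row per distinct element and one 0/1 column per set. The base-R sketch below approximates that idea with invented IDs; it is not UpSetR's actual implementation, and the row order it produces may differ.

```r
# Invented example sets; in practice these would be vectors of read IDs.
list.Input <- list(
  set1 = c("r1", "r2", "r3"),
  set2 = c("r2", "r3", "r4"),
  set3 = c("r3", "r5")
)

# Every distinct element across all sets becomes one row.
all_ids <- unique(unlist(list.Input))

# For each set, mark which elements it contains (1) or lacks (0).
membership <- as.data.frame(
  sapply(list.Input, function(s) as.integer(all_ids %in% s))
)
rownames(membership) <- all_ids

print(membership)
print(colSums(membership))  # column sums are the set sizes
```

Elements present in several sets get a 1 in several columns, and that is what lets the plot count every intersection, including the three-way one.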
Old 03-22-2019, 01:05 AM   #3
deKoch13
Member
 
Location: HD

Join Date: Mar 2019
Posts: 12
Tried it out, but...

Thank you, Meyana.

I tried your idea, but it still won't work.
What do your input data look like?

I just input 3 text files that each contain one column (read identifier from BAM files).
The upset output plot shows me the three sets, but no intersections.
Any suggestions?

Many greetings
Old 03-22-2019, 01:13 AM   #4
Meyana
Member
 
Location: Japan

Join Date: Sep 2017
Posts: 37

My data1/data2/data3 are just vectors of the observations, which I then store in the list list.Input, nothing special. The observations themselves can have any format; mine look something like "A344D".

Did you store your data in the list?
Old 03-22-2019, 01:31 AM   #5
deKoch13
Member
 
Location: HD

Join Date: Mar 2019
Posts: 12

This is what I've done:

#imported
library(UpSetR)

#make input
list.Input = list(set1 = "trimmed_bismark_bt2_pe.bam_mapped_reads.txt",
set2 = "shuffled_bismark_bt2_pe.bam_mapped_reads.txt",
set3 = "reconstructed_bismark_bt2_pe.bam_mapped_reads.txt")

upset(fromList(list.Input), sets = c("set1", "set2", "set3"),
number.angles = 30, point.size = 3.5, line.size = 2,
mainbar.y.label = "Read Intersections", sets.x.label = "Blabla",
text.scale = c(1.3, 1.3, 1, 1, 2, 0.75), mb.ratio = c(0.55, 0.45),
order.by = 'freq', keep.order = TRUE)

So, I think that I stored the sets in a list. I also checked it with print(class(list.Input)).
Maybe, the package does not accept my input... three text files, one column each, just read identifier...
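
A likely explanation, sketched under the assumption that each file holds one identifier per line: the list above stores the file names themselves, so every set is a single string and the sets share nothing. Reading the files first gives the actual identifiers. The temporary files and read names below are invented so the example is self-contained.

```r
# Write two tiny example files (contents invented).
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("readA", "readB", "readC"), f1)
writeLines(c("readB", "readC", "readD"), f2)

# Storing the paths: each "set" is just one file name.
by_name <- list(set1 = f1, set2 = f2)
print(lengths(by_name))
print(length(intersect(by_name$set1, by_name$set2)))

# Reading the files: each set now holds the identifiers.
by_content <- lapply(by_name, readLines)
print(lengths(by_content))
shared <- intersect(by_content$set1, by_content$set2)
print(shared)
```

This would match the symptom described earlier in the thread, where the plot showed the three sets but no intersections.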
Old 03-24-2019, 04:37 PM   #6
Meyana
Member
 
Location: Japan

Join Date: Sep 2017
Posts: 37

Your code works fine on my data.
Could you post a snippet of your data?
Old 03-25-2019, 02:40 AM   #7
deKoch13
Member
 
Location: HD

Join Date: Mar 2019
Posts: 12
Works now!

Hi Meyana,

It works now!
You were absolutely right about generating a list of sets and using the fromList function.
I was not aware that fromList creates a binary data frame that is compatible with the UpSetR package.

For other forum users, here is my working code:

library(UpSetR)

trimmed_df <- read.csv(file = "tri.txt", header = FALSE, sep = "\n")
shuffled_df <- read.csv(file = "shu.txt", header = FALSE, sep = "\n")
reconstructed_df <- read.csv(file = "rec.txt", header = FALSE, sep = "\n")

trimmed <- as.vector(trimmed_df$V1)
shuffled <- as.vector(shuffled_df$V1)
reconstructed <- as.vector(reconstructed_df$V1)

read_sets = list(
trimmed_reads = trimmed,
shuffled_reads = shuffled,
reconstructed_reads = reconstructed)

upset(fromList(read_sets),
sets = c("trimmed_reads", "shuffled_reads", "reconstructed_reads"),
number.angles = 20, point.size = 2.5, line.size = 1.5,
mainbar.y.label = "read intersection", sets.x.label = "read set size",
text.scale = c(1.5, 1.5, 1.25, 1.25, 1.5, 1.5), mb.ratio = c(0.65, 0.35),
order.by = "freq", keep.order = TRUE)

Again, thank you Meyana!
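
As a sanity check, the pairwise matrix from the first post can be recomputed in R from the same list that goes into fromList(), and compared with the numbers from the Python script. The helper name `pairwise_counts` and the toy read IDs are invented for this sketch; with real data, `read_sets` from the code above would take the place of `toy_sets`.

```r
# Hypothetical helper: count shared elements for every pair of sets.
pairwise_counts <- function(sets) {
  sapply(sets, function(a) {
    sapply(sets, function(b) length(intersect(a, b)))
  })
}

# Invented stand-in for read_sets.
toy_sets <- list(
  trimmed_reads       = c("r1", "r2", "r3"),
  shuffled_reads      = c("r1", "r2", "r3", "r4"),
  reconstructed_reads = c("r2", "r3", "r4")
)

counts <- pairwise_counts(toy_sets)
print(counts)
```

If the diagonal and off-diagonal values agree with the TSV, the sets were read in correctly. Note that intersect() drops duplicates, so repeated identifiers within a file are counted once.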
Old 03-25-2019, 03:58 PM   #8
Meyana
Member
 
Location: Japan

Join Date: Sep 2017
Posts: 37

Great, happy to see it working for you!

In addition to the UpSetR package, there is also the SuperExactTest package, which you may find interesting (though its graphical output is not the prettiest).
Old 05-10-2019, 10:46 AM   #9
guri
Junior Member
 
Location: USA

Join Date: May 2019
Posts: 1
Upset error

Hi,

I tried making an UpSet plot for three VCF files from different pipelines. I extracted the variant column (SNPs) from each and imported the resulting one-column CSV files into R with this code:

set1 <- read.csv("set1.vcf", sep="")
set2 <- read.csv("set2.vcf", sep="")
set3 <- read.csv("set3.vcf", sep="")

set1 <- as.vector(set1$V1)
set2 <- as.vector(set2$v1)
set3 <- as.vector(set3$V1)

read_sets = list(set1_reads = set1,
set2_reads = set2,
set3_reads = set3)

upset(fromList(read_sets),
sets = c("set1_reads", "set2_reads", "set3_reads"),
number.angles = 20, point.size = 2.5, line.size = 1.5,
mainbar.y.label = "read intersection", sets.x.label = "read set size",
text.scale = c(1.5, 1.5, 1.25, 1.25, 1.5, 1.5), mb.ratio = c(0.65, 0.35),
group.by = "freq", keep.order = TRUE)

It produces an intersection plot, but the SNP counts in the UpSet plot are much lower than the vcf-compare results for the same VCF files. I am not sure why the UpSet plot gives different numbers.
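
Two things in the code above might contribute, though this is a guess that has not been checked against the actual files. First, read.csv() treats the first line as a header by default, so a one-column file without a header loses its first variant and the column is not named V1. Second, R names are case-sensitive, so `set2$v1` would be NULL even where `set2$V1` exists. The sketch below shows both effects on an invented three-line file.

```r
# Invented one-column file with no header line.
f <- tempfile(fileext = ".csv")
writeLines(c("chr1:100:A>G", "chr1:200:C>T", "chr2:50:G>A"), f)

# Default header = TRUE: the first variant becomes the column name.
with_header <- read.csv(f, sep = "")
print(names(with_header))
print(nrow(with_header))
print(is.null(with_header$V1))

# header = FALSE keeps all rows and names the column V1.
no_header <- read.csv(f, sep = "", header = FALSE, stringsAsFactors = FALSE)
print(is.null(no_header$v1))  # lowercase v1 does not match V1
variants <- no_header$V1
print(length(variants))
```

Checking length() of each vector before calling fromList() is a quick way to spot an empty or shortened set.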

Tags
r programming, upset plots

All times are GMT -8.
