Old 06-29-2012, 09:12 AM   #1
Location: Houston TX USA

Join Date: Jun 2012
Posts: 13
How do I automate the graphing of these data?

Hi, everybody,

I have result files generated by blastn which then were sorted based on the second field. A typical file looks like:

360 miR156a
1 miR156a
9 miR156a
1 miR156a
10 miR156a
7 miR156a
1 miR156a
705 miR157a
2 miR157a
1 miR157a
5 miR157a
4 miR157a
67 miR157a
5 miR157a
11 miR157a
2 miR157a
34 miR159
3 miR162
3 miR166a
17 miR166a
4 miR166a
103 miR167a
1 miR167a
... .....

The first column is the deep-sequencing read count for each unique sequence. The second column is the miR ID that the sequence aligns to.
I would like to:
Sum the read counts for each miR ID (e.g. for miR156a, sum rows 1-7);
Generate a bar graph showing the total read count for each miR ID.

I have more than 20 files like this, so I would like to automate the process. R came to mind.
But I have not used R before. Can you give me some tips or suggestions about which R packages or tools to use? (I can then learn those and figure out the rest.)

If possible, I would also like to generate a table that summarizes the total read counts from all 20 files.
The table I would like to have looks like this:

miRID sample1 sample2 sample3 ......... sample 20
miR156 103 300 450 .......... 33
miR157 205 300 ..........
miR167 .....
.... .......

Thanks a lot!!


Last edited by yangjianhunt; 06-29-2012 at 09:14 AM.
Old 06-29-2012, 12:15 PM   #2
Senior Member
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

For 1), the bar plot part is easy in R; just use barplot()!

Summing the counts can be done in a lot of different ways. Here is one that is maybe a bit cryptic but will teach you the table() command. Assume you have the table you pasted in a text file called mirna.txt. Try to run the following in R, with the mirna.txt file in the current working directory:

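m <- read.table("mirna.txt")   # V1 = read count, V2 = miR ID
q <- table(m)                  # rows: distinct count values; columns: miR IDs; cells: how often each pair occurs
# Multiply each distinct count value by its frequency and add up: total reads per miR ID
totcounts <- as.numeric(rownames(q)) %*% q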

There are of course more transparent ways of summing the counts, but I'm too lazy to type them out :-)
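
One of those more transparent ways, as a minimal sketch: tapply() sums the first column within each miR ID, and barplot() draws the result. The column names count and mirna and the inline sample rows are assumptions made for illustration only; with a real file, read.table() would point at mirna.txt instead.

```r
# Sample rows copied from the first post; with real data use
# m <- read.table("mirna.txt", col.names = c("count", "mirna"))
m <- read.table(text = "
360 miR156a
1 miR156a
9 miR156a
34 miR159
3 miR162
", col.names = c("count", "mirna"))  # column names are illustrative

# Sum the read counts within each miR ID
totcounts <- tapply(m$count, m$mirna, sum)
print(totcounts)

# One bar per miR ID; las = 2 rotates the labels so long IDs stay readable
barplot(totcounts, las = 2, ylab = "Total read count")
```

Unlike the table() version, this gives a named vector, which can be passed to barplot() directly.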
kopi-o
Old 06-29-2012, 02:02 PM   #3
Location: Houston TX USA

Join Date: Jun 2012
Posts: 13

Thanks a lot, kopi-o.

This looks awesome. I will try it out.

yangjianhunt
Old 07-03-2012, 07:20 AM   #4
Location: Houston TX USA

Join Date: Jun 2012
Posts: 13
solved

I eventually used:
list.files() to get all the files
lapply() to process multiple files
read.table() to read each file into a data frame
tapply(SeqCounts, miRNA, sum) to get the total count for each "class"
write.table() with append=TRUE to write the data to a file
paste() and cat() to write a name before each appended block
barplot() to draw the plot
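
The steps above might fit together roughly as follows. This is a sketch and not the exact script used: the temporary directory, the file names, the output file name and the helper sum_counts() are all made up for illustration, and two tiny files stand in for the 20 real ones.

```r
# Illustrative setup: two small files in a temporary directory stand in
# for the real result files (names and contents are made up).
dir <- tempfile("mirna_")
dir.create(dir)
writeLines(c("360 miR156a", "1 miR156a", "34 miR159"), file.path(dir, "sample1.txt"))
writeLines(c("705 miR157a", "2 miR157a", "3 miR162"),  file.path(dir, "sample2.txt"))

# Collect all input files
files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical helper: read one file and return the total read count per miR ID
sum_counts <- function(f) {
  d <- read.table(f, col.names = c("SeqCounts", "miRNA"))
  tapply(d$SeqCounts, d$miRNA, sum)
}

# Apply the helper to every file; name each result after its file
totals <- lapply(files, sum_counts)
names(totals) <- basename(files)

# Append each file's totals to one output file, preceded by the file name,
# and draw one bar plot per file
out <- file.path(dir, "summary.out")
for (n in names(totals)) {
  cat(paste0("# ", n, "\n"), file = out, append = TRUE)
  tab <- data.frame(miRNA = names(totals[[n]]), total = as.vector(totals[[n]]))
  write.table(tab, file = out, append = TRUE,
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  barplot(totals[[n]], main = n, las = 2, ylab = "Total read count")
}
```

When run with Rscript, the plots go to the default PDF device; in an interactive session they open in the plot window one after another.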

It took me a couple of days to learn the introductory basics of R. But it was fun and will be useful in the future I hope.

Again, thanks to kopi-o for pointing the way. I haven't learned how to use the table() function yet, but I feel confident that I can learn it now.
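
For the miRID-by-sample table asked for in the first post, xtabs(), a close relative of table(), may be worth a look. The sketch below assumes the per-file data have already been stacked into one long data frame with an added sample column (for example with rbind()); the data frame long, its column names, its values and the output file name are all made up for illustration.

```r
# Made-up long-format data: one row per (sample, sequence) pair
long <- data.frame(
  sample = c("sample1", "sample1", "sample1", "sample2", "sample2"),
  miRNA  = c("miR156a", "miR156a", "miR159",  "miR156a", "miR162"),
  count  = c(360, 1, 34, 10, 3)
)

# xtabs() sums `count` within each miRNA/sample combination;
# combinations that never occur are filled with 0
wide <- xtabs(count ~ miRNA + sample, data = long)
print(wide)

# Save as a tab-delimited table; col.names = NA leaves an empty
# header cell above the row names (output path is illustrative)
write.table(as.data.frame.matrix(wide),
            file.path(tempdir(), "mirna_by_sample.txt"),
            sep = "\t", quote = FALSE, col.names = NA)
```

A miR ID that is absent from a sample shows up as 0 here, which may or may not be the desired way to mark missing values.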
yangjianhunt
