SEQanswers

Old 04-04-2012, 10:57 AM   #1
kopardev
Member
 
Location: VA, USA

Join Date: Oct 2011
Posts: 18
Contig size distribution visualizer

Is there something like AMOS Hawkeye that works on any assembly FASTA file?
Old 04-04-2012, 12:24 PM   #2
kopardev
Member
 
Location: VA, USA

Join Date: Oct 2011
Posts: 18

I am looking to visualize an assembly. That is, I want to make a histogram of contig sizes and possibly color the bins by number of N's. Is there a way to do this quickly?
Old 04-05-2012, 01:18 AM   #3
sphil
Senior Member
 
Location: Stuttgart, Germany

Join Date: Apr 2010
Posts: 192

I do not know of a program, but that should be doable in R quite easily...
Old 04-05-2012, 04:55 AM   #4
kopardev
Member
 
Location: VA, USA

Join Date: Oct 2011
Posts: 18

Sphil, do you have an R script for this lying around that you can post here?

I got this after hours of googling:


# Contig sizes are read from column 2 of the input table.
data <- read.table("contigs.bases.sizes")
y <- data[[2]]

# Draw a histogram as a step outline; extra arguments (e.g. log="y") go to plot().
myhist <- function(x, ..., breaks = "Sturges",
                   main = paste("Histogram of", xname),
                   xlab = xname,
                   ylab = "Frequency") {
  # Defaults for main/xlab are evaluated lazily, so xname is defined by then.
  xname <- paste(deparse(substitute(x), 500), collapse = "\n")
  h <- hist(x, breaks = breaks, plot = FALSE)
  plot(h$breaks, c(NA, h$counts), type = "S", main = main,
       xlab = xlab, ylab = ylab, axes = FALSE, ...)
  axis(1)
  axis(2)
  lines(h$breaks, c(h$counts, NA), type = "s")
  lines(h$breaks, c(NA, h$counts), type = "h")
  lines(h$breaks, c(h$counts, NA), type = "h")
  lines(h$breaks, rep(0, length(h$breaks)), type = "S")
  invisible(h)
}

# breaks=100 is only a suggested number of bins; hist() may adjust it.
myhist(y, log = "y", breaks = 100, xlab = "contig length (bp)")
q()


But this does not give me any control over the size of the histogram bins or any way to color bins by the number of N's in them. I am not an R expert; can someone help me out?
Thanks!
Old 04-05-2012, 05:11 AM   #5
sphil
Senior Member
 
Location: Stuttgart, Germany

Join Date: Apr 2010
Posts: 192

Hey,

Unfortunately I do not have one. Maybe one of my colleagues does. I am out of the office right now; I will contact you if so...

Best,
Phil
Old 04-05-2012, 05:57 AM   #6
sisch
Member
 
Location: Dusseldorf, Germany

Join Date: Jun 2011
Posts: 29

Hey,

I still often do this the old-school way, using the count_fasta.pl Perl script by Joseph Fass: http://wiki.bioinformatics.ucdavis.e...Count_fasta.pl

Usage for a bin width of 20:
Code:
perl count_fasta.pl -i 20 infile.fasta
This gives you a histogram table you can use to visualize in your preferred spreadsheet software. Furthermore it provides information about N50 and GC content, which is useful for judging the assembly quality.
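
For reference, N50 is commonly defined as the contig length L such that contigs of length L or longer together cover at least half of the total assembly length. Below is a minimal R sketch of that definition. It is not taken from count_fasta.pl, which may differ in details, and the function name n50 is made up for this example.

```r
# N50 sketch: longest-first cumulative sum, stop at half the total length.
# 'lengths' is assumed to be a numeric vector of contig lengths in bp.
n50 <- function(lengths) {
  stopifnot(length(lengths) > 0)
  # as.numeric avoids integer overflow on large assemblies
  sorted <- sort(as.numeric(lengths), decreasing = TRUE)
  cum <- cumsum(sorted)
  # First contig at which the running total reaches half the assembly
  sorted[which(cum >= sum(sorted) / 2)[1]]
}

# Toy example: total is 54, half is 27, and 10 + 9 + 8 reaches 27
print(n50(c(2, 3, 4, 5, 6, 7, 8, 9, 10)))
```

On real data the lengths would come from whatever table you already have, for example column 2 of a sizes file.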

Best,
Simon

Edit:
No support for differential coloring within the script, of course.
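
If you want fixed-width bins and coloring by N content in R, something along these lines might work. It is only a sketch: it assumes you already have, per contig, its length and its number of N's as two vectors. The function names bin_contigs and plot_contig_bins and the toy numbers are invented for illustration.

```r
# Sum x within each bin; bins with no contigs get 0 instead of NA.
sum_by_bin <- function(x, bin) {
  s <- as.vector(tapply(x, bin, sum))
  s[is.na(s)] <- 0
  s
}

# Bin contigs by length using a fixed bin width (in bp) and
# tally contigs, total bases and N bases per bin.
bin_contigs <- function(lengths, n_counts, bin_width = 1000) {
  stopifnot(length(lengths) == length(n_counts), bin_width > 0)
  # Edges run from 0 to the first multiple of bin_width >= max length
  edges <- seq(0, ceiling(max(lengths) / bin_width) * bin_width,
               by = bin_width)
  bin <- cut(lengths, breaks = edges, include.lowest = TRUE, dig.lab = 10)
  data.frame(bin = levels(bin),
             contigs = as.vector(table(bin)),
             bases = sum_by_bin(lengths, bin),
             n_bases = sum_by_bin(n_counts, bin),
             stringsAsFactors = FALSE)
}

# Bar plot of contigs per bin, colored by the fraction of N's in the bin.
plot_contig_bins <- function(binned) {
  frac <- ifelse(binned$bases > 0, binned$n_bases / binned$bases, 0)
  ramp <- colorRampPalette(c("grey85", "red"))(100)  # few N's -> many N's
  cols <- ramp[pmax(1, ceiling(frac * 100))]
  barplot(binned$contigs, names.arg = binned$bin, col = cols, las = 2,
          ylab = "number of contigs")
}

# Toy data (made up): six contigs with their N counts
lengths  <- c(150, 420, 980, 1500, 2600, 2900)
n_counts <- c(0, 10, 0, 300, 0, 29)
binned <- bin_contigs(lengths, n_counts, bin_width = 1000)
print(binned)
plot_contig_bins(binned)
```

The bin width is set directly through bin_width, and the color scale is the fraction of N bases among all bases in each bin, so a bin of short, gap-heavy contigs stands out even if it holds few contigs.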