SEQanswers

Old 01-06-2013, 09:02 PM   #1
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140
why aren't they using mutation pictures

Typically you get a set of m organisms with similar RNA or DNA of length n1,
and you want to determine the relationships, i.e. the mutations.
The p1 positions where all (or all but c) sequences are the same are not interesting
and can be omitted. Let n = n1 - p1.
Then you choose one reference sequence, the common ancestor, typically
the average (consensus) of the sequences.
Then you make the binary m*n mutation matrix whose rows are the
sequences (organisms) and whose columns are the positions, with
a 1 at position (i,j) iff organism i differs from
the reference at position j.
Then you draw the mutation picture, where 1s are black pixels and 0s are white
pixels.
Then the rows and columns can be re-ordered to show the best grouping
of the pixels into connected areas, so that related organisms and mutations are
placed next to each other. It looks like this:
http://magictour.free.fr/seq/mitogi3.GIF
This is straightforward, isn't it? How else could it be done? How else would you visualize
the evolution of that set?
Yet I have never seen it. I don't even know what others call this "mutation picture".
Instead, people make "phylo" trees to assign the organisms to groups,
with lines and sequence names in them. But a tree takes more space, is not
rectangular, and doesn't visualize the distances between the groups as well.
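
A minimal sketch of the matrix construction described above, assuming the sequences are already aligned to equal length and that the "average" reference is taken as the per-column consensus. The function names (`consensus`, `mutation_matrix`, `render`) and the toy sequences are made up for illustration, and the text rendering only stands in for the black/white pixels:

```python
from collections import Counter

def consensus(seqs):
    """Most common character per column (ties go to the first one seen)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def mutation_matrix(seqs, c=0):
    """Return (kept_positions, matrix) for equal-length aligned sequences.

    matrix[i][j] is 1 iff sequence i differs from the consensus at
    kept_positions[j]. Columns where at most c sequences differ are dropped.
    """
    ref = consensus(seqs)
    n1 = len(ref)
    full = [[int(s[j] != ref[j]) for j in range(n1)] for s in seqs]
    kept = [j for j in range(n1) if sum(row[j] for row in full) > c]
    return kept, [[row[j] for j in kept] for row in full]

def render(matrix):
    """Text stand-in for the picture: '#' = black pixel, '.' = white pixel."""
    return "\n".join("".join("#" if v else "." for v in row) for row in matrix)

# Toy alignment, invented for this example.
seqs = ["ACGTAC", "ACGTAC", "ACTTAC", "ACTTAA", "GCGTAC"]
kept, M = mutation_matrix(seqs)
print(kept)
print(render(M))
```

Setting `c=1` keeps only the positions where more than one sequence mutates. Re-ordering rows and columns is a separate step and is not done here.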

What's the reason that we don't see mutation pictures in papers
or on webpages? Is it the way our science is organized, with grants, papers
and peer review, so that there is just no money to be made from it?

I don't understand.
(I mainly work with influenza sequences.)

-------------------------------------------------

What should it be called, how can I find it, and do you have better suggestions?
What are suitable keywords to search for it?
In fact, a Google image search for "mutation picture" matrix positions sequences
gave this as the top hit, my own previous post:
http://seqanswers.com/forums/showthread.php?t=25554

Last edited by gsgs; 01-06-2013 at 10:14 PM.
Old 01-07-2013, 10:28 AM   #2
aaronh
Member
 
Location: California

Join Date: Sep 2008
Posts: 45

I like your image for small datasets, but I don't think it is very useful for large-scale analysis. People did just show the segregating sites in old papers, when they had only one gene to sequence. This kind of representation is good for one or a few genes, but the number of sites per pixel will hide more than it reveals once you get to tens of kb. Plus, the question most people really want to answer is the relationship between the sequences, and phylogenetic clustering is the best way to get that answer.
Old 01-07-2013, 10:51 AM   #3
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140

I've been using it a lot for influenza mutations, on sets of some thousand sequences,
listing only the positions where more than one sequence mutates.

Over phylo trees it has these advantages: you can quickly detect reassortments and
recombinations, the size is smaller, you can see how and where
distant groups are related, and you can see in which regions the mutations cluster
(with no sorting of columns for that purpose).

For human DNA, which recombines a lot, you can see the recombination frequency
and the mixing of haplogroups in the regions.
Old 01-07-2013, 11:35 AM   #4
aaronh
Member
 
Location: California

Join Date: Sep 2008
Posts: 45

Sure, for exploring the data it could be useful. But most of the time a raw view is not going to make your major point, and it is still limited to a maximum number of segregating sites. The only place I've seen it used well is in "On the Origin and Spread of an Adaptive Allele in Deer Mice".
Old 11-25-2015, 11:08 PM   #5
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140

When I have many sequences, I just pick 1000 or so at random and
order them horizontally (nucleotide positions) and vertically (sequences)
to get the black 2D "regions", which show the clustering
and evolution better (IMO) than the phylo trees.
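
The random pick mentioned above could look something like this. It is only a sketch: the name `subsample` and the fixed seed are assumptions added here so that the same picture comes out on every run.

```python
import random

def subsample(items, k=1000, seed=0):
    """Pick at most k items uniformly at random, without replacement."""
    items = list(items)
    if len(items) <= k:
        return items                 # nothing to thin out
    rng = random.Random(seed)        # fixed seed: reproducible picture
    return rng.sample(items, k)

# Hypothetical sequence identifiers, just to show the call.
ids = [f"seq{i}" for i in range(5000)]
picked = subsample(ids)
print(len(picked))
```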
Old 11-27-2015, 01:24 AM   #6
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

We've got around 800 mitochondrial DNA sequences, and I've generated plots like this. Unfortunately, they never look good.

There are patches of the genome that are hypervariable, and then long stretches of conserved sequences. You can treat each variant as the same length in the mutation plot, but then that overemphasises the hypervariable regions and doesn't give you a good idea of the genomic context. You can put a single mutation at the precise variant points, but then that makes it difficult to see rare variants in the middle of a conserved region. You can expand out the variant blocks to occupy the entirety of the sequence up to half-way to the next variant, but that overemphasises the variant-poor regions and hides the hypervariable regions.
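
To make the three layouts above concrete, here is a rough sketch that computes the horizontal span each variant column would occupy. The function name `column_spans`, the mode names and the example positions are invented for illustration, and the genome is treated as linear (a circular mitochondrial genome would need wrap-around handling):

```python
def column_spans(positions, genome_length, mode):
    """Return one (start, end) x-interval per variant position.

    mode "equal":    every variant gets the same width (index space)
    mode "exact":    one unit at the precise genomic coordinate
    mode "midpoint": each block extends half-way to its neighbours
    """
    positions = sorted(positions)
    if mode == "equal":
        return [(i, i + 1) for i in range(len(positions))]
    if mode == "exact":
        return [(p, p + 1) for p in positions]
    if mode == "midpoint":
        mids = [(a + b) / 2 for a, b in zip(positions, positions[1:])]
        edges = [0] + mids + [genome_length]
        return list(zip(edges, edges[1:]))
    raise ValueError(f"unknown mode: {mode}")

# Two close variants and one isolated variant on a 200-base sequence.
for mode in ("equal", "exact", "midpoint"):
    print(mode, column_spans([10, 12, 100], 200, mode))
```

In the "midpoint" output the isolated variant at 100 takes up most of the width while the two close variants are squeezed together, which is the over-emphasis of variant-poor regions described above.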

Then there's the issue of clustering. Without proper clustering, the mutation plots are a jumbled mess. Unfortunately, in most cases it's impossible to assign a perfect linear order to genotyped individuals based on variant similarity, and a lot of time can be wasted trying to get it looking a little bit better than what an automated method can do.
Old 11-27-2015, 02:01 AM   #7
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140

Hmm, to improve it, just "multiply" every mutation by the hypervariability of the region.
Create new artificial nucleotide positions, which may get new artificial mutations, to extend the
nonvariable regions relative to the variable ones. Use random numbers to decide this,
so as to reduce the total number of locations.
For sorting (= clustering?), both horizontally and vertically, I use a "traveling salesman" algorithm
to minimize the sum of the neighbor distances. So far that has worked well for me, i.e. with influenza.
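
The post does not say which traveling-salesman algorithm is used, so the following is only one possible sketch: a nearest-neighbour tour improved by 2-opt, with Hamming distance assumed as the neighbor distance. All function names and the toy matrix are made up for illustration.

```python
def hamming(a, b):
    """Number of positions where two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def path_cost(vectors, order):
    """Sum of distances between neighbours along the given order."""
    return sum(hamming(vectors[a], vectors[b]) for a, b in zip(order, order[1:]))

def nearest_neighbour_order(vectors, start=0):
    """Greedy open path: always step to the closest unvisited vector."""
    if not vectors:
        return []
    unvisited = set(range(len(vectors))) - {start}
    order = [start]
    while unvisited:
        last = vectors[order[-1]]
        nxt = min(unvisited, key=lambda i: (hamming(last, vectors[i]), i))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def two_opt(vectors, order):
    """Reverse segments of the path for as long as that lowers the cost."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if path_cost(vectors, candidate) < path_cost(vectors, order):
                    order, improved = candidate, True
    return order

def sort_rows_and_columns(matrix):
    """Order rows, then columns, each by its own heuristic tour."""
    row_order = two_opt(matrix, nearest_neighbour_order(matrix))
    columns = [list(col) for col in zip(*matrix)]
    col_order = two_opt(columns, nearest_neighbour_order(columns))
    return row_order, col_order

# Toy 4x4 mutation matrix, invented for this example.
matrix = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 0]]
row_order, col_order = sort_rows_and_columns(matrix)
sorted_matrix = [[matrix[i][j] for j in col_order] for i in row_order]
print(row_order, col_order)
```

Both steps are heuristics, so the result is a good ordering rather than a guaranteed optimum, and this naive 2-opt recomputes the full path cost for every candidate, which would be too slow for thousands of sequences without further work.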

You could sort the sequences in 3 or more dimensions and then show 2D projection pictures (binary cuboids), rotating in any direction on key presses or mouse clicks ...
I haven't tried that yet.

---edit-----------
or just do it separately for the regions: k pictures for the k regions in the k-th
hypervariability group

Last edited by gsgs; 11-27-2015 at 02:23 AM.
All times are GMT -8.