Exploring the wild beasts of the layout jungle

There was a bug submitted to the CDK sourceforge tracker (bug number 2783741) with a list of molecules that are laid out badly. I had a look at some of them with the help of bioclipse. For example, this calix[4]arene:

or this:

which is a clearer case of something going wrong. More difficult is structures that are fully 3D, like:

Can you guess what it is? :) Try the 3D version (also made with bioclipse, using the CDK 3D layout):

It's a paracyclophane! The phenyl rings are lost in the 2D layout because there is a bigger 'ring'. Perhaps a chemist would look at the 3D structure and think that those chains are linkers, not parts of a ring, but the algorithm doesn't know this.

I think that it is difficult to have general rules for this. Of course, any fully 3D structure will be difficult to lay out in 2D (if it is not embeddable in the plane then it is impossible) so things like this:

are truly awful.

Comments

Egon Willighagen said…

These larger ring systems are a problem indeed. I recently added some templates to cover some larger ring systems of some size, giving them the more common 120 degree angles, but more are needed...

Or, an algorithm to layout ring systems of 10 bonds and larger into a 120 angle layout... sort of like a multi-phenyl system, but then without the inner atoms, and just the edge...

This is when I found that the template matcher took into account bond orders, basically matching these templates only match the cylco-foo-ane, and not the variants with -[di|tri|etc]-enes, -ynes, etc...

1 May 2009 at 06:23

Unknown said…

The current standard in SDG is set by the guys from CCG (http://dx.doi.org/10.1021/ci050550m). Those large rings, btw, can be easily laid-out with a honey-comb-embedding algorithm and need no templates.

4 May 2009 at 12:10

Egon Willighagen said…

Yes, indeed, no honey-cumb templates needed once we have an implementation of that algorithm.

4 May 2009 at 12:15

Adamantane, Diamantane, Twistane

After cubane, the thought occurred to look at other regular hydrocarbons. If only there was some sort of classification of chemicals that I could use look up similar structures. Oh wate, there is . Anyway, adamantane is not as regular as cubane, but it is highly symmetrical, looking like three cyclohexanes fused together. The vertices fall into two different types when colored by signature: The carbons with three carbon neighbours (degree-3, in the simple graph) have signature (a) and the degree-2 carbons have signature (b). Atoms of one type are only connected to atoms of another - the graph is bipartite . Adamantane connects together to form diamondoids (or, rather, this class have adamantane as a repeating subunit). One such is diamantane , which is no longer bipartite when colored by signature: It has three classes of vertex in the simple graph (a and b), as the set with degree-3 has been split in two. The tree for signature (c) is not shown. The graph is still bipartite accordin

Király's Method for Generating All Graphs from a Degree Sequence

After posting about the Hakimi-Havel theorem, I received a nice email suggesting various relevant papers. One of these was by Zoltán Király called " Recognizing Graphic Degree Sequences and Generating All Realizations ". I have now implemented a sketch of the main idea of the paper, which seems to work reasonably well, so I thought I would describe it. See the paper for details, of course. One focus of Király's method is to generate graphs efficiently , by which I mean that it has polynomial delay. In turn, an algorithm with 'polynomial delay' takes a polynomial amount of time between outputs (and to produce the first output). So - roughly - it doesn't take 1s to produce the first graph, 10s for the second, 2s for the third, 300s for the fourth, and so on. Central to the method is the tree that is traversed during the search for graphs that satisfy the input degree sequence. It's a little tricky to draw, but looks something like this: At the top

General Graph Layout : Putting the Parts Together

An essential tool for graph generation is surely the ability to draw graphs. There are, of course, many methods for doing so along with many implementations of them. This post describes one more (or perhaps an existing method - I haven't checked). Firstly, lets divide a graph up into two parts; a) the blocks, also known as ' biconnected components ', and b) trees connecting those blocks. This is illustrated in the following set of examples on 6 vertices: Trees are circled in green, and blocks in red; the vertices in the overlap between two circles are articulation points. Since all trees are planar, a graph need only have planar blocks to be planar overall. The layout then just needs to do a tree layout on the tree bits and some other layout on the embedding of the blocks. One slight wrinkle is shown by the last example in the image above. There are three parts - two blocks and a tree - just like the one to its left, but sharing a single articulation point. I had

Some Stuff

Search This Blog