cross-posted from: https://mander.xyz/post/41224832
Just finished another visualization of the entire taxonomy tree. The previous one is buried here: GBIF ToL.
The main concept is very simple: each taxon is a point, and each taxon has a clockwise-bent arc from its parent taxon.
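For anyone who wants to reproduce the arcs: the post does not say how they are drawn, so this is only a minimal sketch of one way to do it, using a quadratic Bezier curve. The function name `clockwise_arc`, the `bend` parameter and the y-axis-up convention are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def clockwise_arc(parent: Point, child: Point,
                  bend: float = 0.25, steps: int = 16) -> List[Point]:
    """Sample a quadratic Bezier from parent to child that turns clockwise.

    Assumes the y axis points up. On a screen with y pointing down,
    flip the sign of `bend` to keep the same visual direction.
    """
    (px, py), (cx, cy) = parent, child
    dx, dy = cx - px, cy - py
    # Control point: chord midpoint pushed along the left-hand normal (-dy, dx).
    # Bulging to the left of the travel direction makes the heading turn clockwise.
    qx = (px + cx) / 2 - bend * dy
    qy = (py + cy) / 2 + bend * dx
    points = []
    for i in range(steps + 1):
        t = i / steps
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((a * px + b * qx + c * cx, a * py + b * qy + c * cy))
    return points


arc = clockwise_arc((0.0, 0.0), (1.0, 0.0), bend=0.5, steps=2)
```

With `bend=0` the arc collapses to a straight line; larger values give a more pronounced curve.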
The trick is to place those points in a meaningful way. At first, I was using a force-directed algorithm to do it. In general, it succeeded in grouping points by clades, but introduced a lot of branch overlapping (check how purple Echinodermata “intrudes” into Arthropoda in the GBIF version).
Force-directed algorithms can lay out not only trees, but basically any graph, and I thought: maybe a tree-specific algorithm will produce a better result? I found out there is a cool Voronoi Treemap algorithm which, for any given tree, can build a set of nested polygons, one polygon for each node in the tree. Not only does it eliminate the branch overlapping problem, it also ensures those branches fit into convex polygons, and you can even add gaps between adjacent branches. So I’ve built a CLI wrapper around a Java implementation I found on GitHub.
At first, I used it for the NCBI database, but I didn’t use gaps and haven’t published an interactive version yet (but there are PNGs on Wikimedia Commons). Then I made a treemap for ITIS. Points are points, and the polygons are used for the mouse hover feature. When I was making the force-directed GBIF version, I had to separately compute those polygons for each clade of given ranks. Now both points and polygons are computed by one algorithm, which is nice.
What do you think?


I will try to check if there are any discrepancies between the visualization and the ITIS db, and between the ITIS db and other taxonomic sources.
Hymenoptera in ITIS has two direct children: Apocrita and Symphyta, with 1904 and 39 genera respectively.
Hymenoptera in insecta.pro has three direct children (the third is a dead end, so I will ignore it): Apocrita and Symphyta, with 6768 and 153 genera respectively.
Apocrita is ~45 times larger than Symphyta in both databases (1904/39 ≈ 49 and 6768/153 ≈ 44), so ITIS is representative in this case. In the visualisation each clade gets as much space as it needs to fit all its leaf nodes (taxa without children). Apocrita probably got ~45 times more space than Symphyta, which is what I’d expect.
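The "space follows leaf count" rule can be sketched in a few lines. This is not the Voronoi treemap itself, only the weights one would feed into it, and the tree below is a made-up toy (the names `SubA`, `SubB` and both helper functions are hypothetical).

```python
from typing import Dict, List


def leaf_counts(children: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Count leaves under every node, iteratively so deep trees do not overflow."""
    counts: Dict[str, int] = {}
    stack = [(root, False)]
    while stack:
        node, visited = stack.pop()
        kids = children.get(node, [])
        if not kids:
            counts[node] = 1  # a taxon without children is a leaf
        elif visited:
            counts[node] = sum(counts[k] for k in kids)  # all kids are done by now
        else:
            stack.append((node, True))  # come back after the kids
            stack.extend((k, False) for k in kids)
    return counts


def sibling_shares(children: Dict[str, List[str]],
                   counts: Dict[str, int], parent: str) -> Dict[str, float]:
    """Fraction of the parent's area each direct child would get."""
    total = counts[parent]
    return {k: counts[k] / total for k in children[parent]}


# Toy tree, not real ITIS data.
toy = {"Order": ["SubA", "SubB"], "SubA": ["a1", "a2", "a3"], "SubB": ["b1"]}
toy_counts = leaf_counts(toy, "Order")
toy_shares = sibling_shares(toy, toy_counts, "Order")
```

Here `SubA` has three leaves and `SubB` has one, so `SubA` gets three quarters of the parent's area.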
Also, I’ve tried to find Symphyta in LifeMap, but the NCBI page for Hymenoptera (LifeMap is based on NCBI) has a comment about Symphyta being a paraphyletic group, and therefore NCBI doesn’t have this suborder at all.
There are ~100 species in the Microgaster clade and ~62 in Symphyta, not a big difference, and they got a comparable amount of space. I think that is also as expected.
As an LLM would’ve said, “you’ve got to the heart of how Voronoi Treemaps work”. GBIF does not keep track of intermediate taxa at all, therefore Apidae and Halictidae in their system are equally related to Hymenoptera; there’s nothing else to group them together. ITIS, on the other hand, does have a lot of taxa with intermediate ranks, including Aculeata and Apoidea. These two additional links prevented Apidae and Halictidae from spreading to opposite ends of Hymenoptera as they do in GBIF. I’ve decided to color points only by the six main ranks, and I’ve made zoom jump between these ranks, therefore intermediate polygons are somewhat obscure, but they already did their job, and you’ve noticed that, cool! When I continue to work on these maps, I probably will not consider using GBIF as a data source because of this exact detail you’ve mentioned: some branches can be placed further from each other than you expect.
Yes, I was also looking at them. Usually it’s artificial groups like “unclassified Lepidoptera” with a lot of taxa which don’t even have a name; they have a code instead, like “BOLD:ACO0165”. You can find such groups in GBIF as well, e.g. in Lepidoptera there is a huge ball in the center with a lot of unnamed taxa squeezed together. This is somewhat similar. I think next time I will nuke them because they are not interesting, take a lot of space and don’t add to the structure or readability.
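Nuking those groups before the layout step could look roughly like this. The pattern is an assumption generalized from the single example code above (three letters followed by four digits), and `is_placeholder` and `prune` are hypothetical helper names; the tree is a toy, not real database rows.

```python
import re
from typing import Dict, List

# Assumed shape of the codes, based only on the "BOLD:ACO0165" example.
BOLD_CODE = re.compile(r"^BOLD:[A-Z]{3}\d{4}$")


def is_placeholder(name: str) -> bool:
    """True for artificial buckets and for taxa that only have a code."""
    return name.lower().startswith("unclassified ") or bool(BOLD_CODE.match(name))


def prune(children: Dict[str, List[str]], root: str) -> Dict[str, List[str]]:
    """Copy the tree, dropping placeholder taxa together with their subtrees."""
    kept: Dict[str, List[str]] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        kids = [k for k in children.get(node, []) if not is_placeholder(k)]
        if kids:
            kept[node] = kids
        stack.extend(kids)  # descendants of dropped taxa are never visited
    return kept


toy_tree = {
    "Lepidoptera": ["Noctuidae", "unclassified Lepidoptera"],
    "unclassified Lepidoptera": ["BOLD:ACO0165"],
    "Noctuidae": ["Agrotis"],
}
pruned = prune(toy_tree, "Lepidoptera")
```

Pruning before computing leaf counts means the removed groups no longer claim any space in the treemap.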
Also, you can check out this foamtree demo, which is also a treemap, but it displays polygons instead of points, and you have to move through all the intermediate taxa by double-clicking to get anywhere. On the right you can switch to Metazoa. They don’t use space as efficiently: Korarchaeota has a single known species but got a huge polygon anyway. I am not affiliated with foamtree; they’re trying to sell a visualisation library, and to showcase it they’ve made a demo with a taxonomy tree.
I guess one cannot achieve everything with a visualization like this and has to prioritize what’s more important.
Ah true, it’s used in iNaturalist so I’m used to it as a group. But yeah, these are probably just all basal hymenopterans.
I do not doubt that the visualization is correct, but rather that the underlying data is. The vast majority of Symphyta species seem to be missing from the dataset, making the disparity so apparent.
Ah yes, forgot about that. Probably better to exclude them, I agree.
I wonder if there is a way to somehow combine datasets to fill in the gaps. Like adding more intermediate ranks to the GBIF dataset by using the other ones. Looking at your tables, one could probably quite easily achieve this (although probably with some gaps). I didn’t find your code though, was wondering how you have written this :)
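A rough sketch of what such a merge might look like, assuming both datasets are reduced to child-to-parent maps and that taxa can be matched by name (a big assumption, since names differ between databases). The function `add_intermediate_ranks` is hypothetical, and the lineages below are simplified toy data, not actual GBIF or ITIS rows.

```python
from typing import Dict


def add_intermediate_ranks(gbif_parent: Dict[str, str],
                           itis_parent: Dict[str, str]) -> Dict[str, str]:
    """Insert ITIS taxa that sit between a GBIF taxon and its GBIF parent."""
    merged = dict(gbif_parent)
    for taxon, gbif_p in gbif_parent.items():
        chain = []
        seen = {taxon}
        node = itis_parent.get(taxon)
        # Walk up the ITIS lineage until the GBIF parent shows up.
        while node is not None and node != gbif_p and node not in seen:
            chain.append(node)
            seen.add(node)
            node = itis_parent.get(node)
        if node != gbif_p or not chain:
            continue  # ITIS never reaches the GBIF parent, or nothing to insert
        if any(mid in gbif_parent for mid in chain):
            continue  # GBIF already places this taxon elsewhere; leave the gap
        prev = taxon
        for mid in chain:
            merged[prev] = mid
            prev = mid
        merged[prev] = gbif_p
    return merged


gbif = {"Apidae": "Hymenoptera", "Halictidae": "Hymenoptera",
        "Hymenoptera": "Insecta"}
itis = {"Apidae": "Apoidea", "Halictidae": "Apoidea", "Apoidea": "Aculeata",
        "Aculeata": "Hymenoptera", "Hymenoptera": "Insecta"}
merged = add_intermediate_ranks(gbif, itis)
```

Taxa whose ITIS lineage never reaches their GBIF parent keep their original link, which is where the expected gaps would come from.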
Or maybe use the style of the GBIF visualization with the ITIS dataset?
Oof, I didn’t like this at all! It’s very hard to find anything in there. Tried to go for Araceae and could only find it by searching below for subfamilies. Apparently Araceae isn’t in their dataset as a rank, although other plant families are?
It would’ve been zero fun and the same amount of success. Basically, it means creating a new taxonomy database while a lot of them already exist. I didn’t expect there to be so many taxonomy databases, almost all of them backed by scientific organizations and freely accessible and downloadable. Other areas (books, movies, history) are not even close to this diversity of data sources.
Apart from Gephi Commander (already on GitHub), which is used for generating PNG tiles when you already have x and y for every taxon, there is also a CLI tool to build the Voronoi treemap (assign x,y) and another CLI tool to split those points across zoom levels and PBF vector tiles. Neo4j serves as the database and PowerShell brings all of this to life.
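The tile-splitting step is not shown in the thread, so the following is only a guess at its general shape: a minimal sketch that buckets points into a quadtree-style grid of 2^zoom by 2^zoom tiles, assuming coordinates are normalized to the unit square. The names `tile_of` and `bucket_by_tile` are hypothetical, and encoding the buckets as PBF is left out entirely.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Tile = Tuple[int, int, int]  # (zoom, column, row)


def tile_of(x: float, y: float, zoom: int) -> Tile:
    """Tile containing a point whose coordinates are normalized to [0, 1]."""
    n = 1 << zoom  # tiles per side at this zoom level
    col = min(int(x * n), n - 1)  # clamp so x == 1.0 lands in the last tile
    row = min(int(y * n), n - 1)
    return zoom, col, row


def bucket_by_tile(points: Dict[str, Tuple[float, float]],
                   zoom: int) -> Dict[Tile, List[str]]:
    """Group taxon names by the tile they fall into at one zoom level."""
    tiles: Dict[Tile, List[str]] = defaultdict(list)
    for name, (x, y) in points.items():
        tiles[tile_of(x, y, zoom)].append(name)
    return dict(tiles)


toy_points = {"a": (0.1, 0.1), "b": (0.2, 0.2), "c": (0.9, 0.1)}
toy_tiles = bucket_by_tile(toy_points, zoom=1)
```

Running the bucketing once per zoom level, with only the ranks visible at that level, would give one set of tiles per zoom.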
Not a fan either. There was another tool that looked similar to Voronoi, made by a person working in a scientific organization, but I can’t find it right now… There is a lot of interesting work on this topic.
Oh right, now I see that you made very different network graphs based on all kinds of example data. I come from the opposite direction. I worked with a lot of ecological datasets, analyzing and plotting them. But I haven’t messed around with network graphs a lot. Maybe I’ll try to do my own version in R or Python (I don’t know any Java, so I cannot really understand your code). Because I’m really fascinated by the idea of having a nice rendering of the tree of life!