PostgreSQL Ltree and LATERAL

Let’s assume you want to find all node leaves. To keep it simple for all examples we will use DB from previous post. Leaf is a node without children. If you have parent_id it seems easy to find all leaves

SELECT * FROM "properties" WHERE "id" NOT IN
  (SELECT DISTINCT "parent_id" FROM "properties" WHERE "parent_id" IS NOT NULL);

We just excluding all nodes that are parents for any node and that’s it. But what if you are prefer to use pure ltree without parent_id? And here come trebles. All that I can find by googling this problem is how to find node with most children count:

SELECT subpath("path", 0, 1), count(*)
  FROM "properties"
  GROUP BY 1 ORDER BY 2 DESC;

And to be honest, I’m envy guy who invented it. Pretty simple. With help of subpath we are retrieving root element for each node then group by it and count how much such elements. But that is no our problem.

As I said before, leaf is a node without children. We can find all children nodes with ltree <@ ltree.

ltree <@ ltree boolean is left argument a descendant of right (or equal)?

It will return all children and self. As leaf have zero children and self it will be one result for leaf ancestors. As we have statement for filtering leaves we need to check each node. As I just learned LATERAL queries my first thought was to use it. From documentation:

When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row of the FROM item providing the cross-referenced column(s)

Each row, see?

SELECT * FROM (
  SELECT "path", "children_count" FROM "properties",
  LATERAL (
    SELECT COUNT("path") AS "children_count"
    FROM  "properties" AS "subproperties"
    WHERE "properties"."path" @> "subproperties"."path"
  ) AS "subproperties"
) AS "path_with_children" WHERE "path_with_children"."children_count" = 1
GROUP BY 1, 2;

But it seems that there are too much overhead. Something is wrong here. Let’s try to use INNER JOIN with statement instead of additional WHERE.

SELECT * FROM "properties"
  INNER JOIN LATERAL (
    SELECT COUNT("path") AS "children_count"
    FROM  "properties" AS "subproperties"
    WHERE "properties"."path" @> "subproperties"."path"
  ) AS "path_with_children" ON "path_with_children"."children_count" = 1 ;

Seems better but still complex. Wait, is it really necessary to calculate table and then filter values from it? Seems like we can just filter each node with simple query.

SELECT count("path") = 1 FROM "properties" WHERE "path" <@ '1'; -- f
SELECT count("path") = 1 FROM "properties" WHERE "path" <@ '1.4'; -- t
SELECT count("path") = 1 FROM "properties" WHERE "path" <@ '1.2.3'; -- f

And apply it for each node:

SELECT * FROM "properties" AS "main" WHERE (
  SELECT count("path") = 1 FROM "properties" WHERE "path" <@ "main"."path");

Simple and elegant way, what we need!

When I show it to Andrei, he noticed that it will not work for nodes with same path. And if it has not so much sense for path with ids it’s normal for path with hierarchical structures.

To reproduce this bug on our data we will add node with same path but another value.

insert into properties (name, path, parent_id) VALUES ('H', '1.3.7', 9);

 id | name | path  | parent_id
----+------+-------+-----------
  4 | D    | 1.4   |         1
  5 | E    | 1.2.5 |         2
  6 | F    | 1.2.6 |         2

We missed two leaves with path 1.3.7 because <@ operator will return all ancestors including self and nodes with same pathes. To fix it we just need to add DISTINCT keyword in right place.

SELECT * FROM "properties" AS "main" WHERE (
  SELECT count(DISTINCT "path") = 1 FROM "properties" WHERE "path" <@ "main"."path");

 id | name | path  | parent_id
----+------+-------+-----------
  4 | D    | 1.4   |         1
  5 | E    | 1.2.5 |         2
  6 | F    | 1.2.6 |         2
  7 | G    | 1.3.7 |         3
  8 | H    | 1.3.7 |         9

As consiquence: if your solution is too complex usualy it’s not right solution.

P.S. If you want to see nice example of using LATERAL keyword you can check this article.