Most efficient method of self referencing tree using Entity Framework

So I have a SQL table which is basically

ID, ParentID, MenuName, [Lineage, Depth]

The last two columns are auto-computed to help with searching so we can ignore them for now.

I'm creating a drop down menu system with multiple categories.

Unfortunately EF I don't think plays nice with Self referencing tables more than 1 level deep. So I'm left with a few options

1) Create query, order by depth and then create a custom class in C#, populating it one depth at a time.

2) Find some way to eager load the data in EF, I don't think it is possible for an unlimited amount of levels, only a fixed amount.

3) Some other way I'm not even sure about.

Any inputs would be welcomed!

Answers


I have successfully mapped hierarchical data using EF.

Take for example an Establishment entity. This can represent a company, university, or some other unit within a larger organizational structure:

public class Establishment : Entity
{
    public string Name { get; set; }
    public virtual Establishment Parent { get; set; }
    public virtual ICollection<Establishment> Children { get; set; }
    ...
}

Here is how the Parent / Children properties are mapped. This way, when you set the Parent of 1 entity, the Parent entity's Children collection is automatically updated:

// ParentEstablishment 0..1 <---> * ChildEstablishment
HasOptional(d => d.Parent)
    .WithMany(p => p.Children)
    .Map(d => d.MapKey("ParentId"))
    .WillCascadeOnDelete(false); // do not delete children when parent is deleted

Note that so far I haven't included your Lineage or Depth properties. You are right, EF doesn't work well for generating nested hierarchical queries with the above relationships. What I finally settled on was the addition of a new gerund entity, along with 2 new entity properties:

public class EstablishmentNode : Entity
{
    public int AncestorId { get; set; }
    public virtual Establishment Ancestor { get; set; }

    public int OffspringId { get; set; }
    public virtual Establishment Offspring { get; set; }

    public int Separation { get; set; }
}

public class Establishment : Entity
{
    ...
    public virtual ICollection<EstablishmentNode> Ancestors { get; set; }
    public virtual ICollection<EstablishmentNode> Offspring { get; set; }

}

While writing this up, hazzik posted an answer that is very similar to this approach. I'll continue writing up though, to provide a slightly different alternative. I like to make my Ancestor and Offspring gerund types actual entity types because it helps me get the Separation between the Ancestor and Offspring (what you referred to as Depth). Here is how I mapped these:

private class EstablishmentNodeOrm : EntityTypeConfiguration<EstablishmentNode>
{
    internal EstablishmentNodeOrm()
    {
        ToTable(typeof(EstablishmentNode).Name);
        HasKey(p => new { p.AncestorId, p.OffspringId });
    }
}

... and finally, the identifying relationships in the Establishment entity:

// has many ancestors
HasMany(p => p.Ancestors)
    .WithRequired(d => d.Offspring)
    .HasForeignKey(d => d.OffspringId)
    .WillCascadeOnDelete(false);

// has many offspring
HasMany(p => p.Offspring)
    .WithRequired(d => d.Ancestor)
    .HasForeignKey(d => d.AncestorId)
    .WillCascadeOnDelete(false);

Also, I did not use a sproc to update the node mappings. Instead we have a set of internal commands that will derive / compute the Ancestors and Offspring properties based on the Parent & Children properties. However ultimately, you end up being able to do some very similar querying as in hazzik's answer:

// load the entity along with all of its offspring
var establishment = dbContext.Establishments
    .Include(x => x.Offspring.Select(y => e.Offspring))
    .SingleOrDefault(x => x.Id == id);

The reason for the bridge entity between the main entity and its Ancestors / Offspring is again because this entity lets you get the Separation. Also, by declaring it as an identifying relationship, you can remove nodes from the collection without having to explicitly call DbContext.Delete() on them.

// load all entities that are more than 3 levels deep
var establishments = dbContext.Establishments
    .Where(x => x.Ancestors.Any(y => y.Separation > 3));

You could use supporting hierarchy table to do eager loading of unlimited levels of tree.

So, you need to add two collections Ancestors and Descendants, both collection should be mapped as many-to-many to supporting table.

public class Tree 
{
    public virtual Tree Parent { get; set; }
    public virtual ICollection<Tree> Children { get; set; }
    public virtual ICollection<Tree> Ancestors { get; set; }
    public virtual ICollection<Tree> Descendants { get; set; }
}

Ancestors will contain all ancestors (parent, grand-parent, grand-grand-parent, etc.) of the entity and Descendants will contain all the descendants (children, grand-children, grand-grand-children, etc) of the entity.

Now you have to map it with EF Code First:

public class TreeConfiguration : EntityTypeConfiguration<Tree>
{
    public TreeConfiguration()
    {
        HasOptional(x => x.Parent)
            .WithMany(x => x.Children)
            .Map(m => m.MapKey("PARENT_ID"));

        HasMany(x => x.Children)
            .WithOptional(x => x.Parent);

        HasMany(x => x.Ancestors)
            .WithMany(x => x.Descendants)
            .Map(m => m.ToTable("Tree_Hierarchy").MapLeftKey("PARENT_ID").MapRightKey("CHILD_ID"));

        HasMany(x => x.Descendants)
            .WithMany(x => x.Ancestors)
            .Map(m => m.ToTable("Tree_Hierarchy").MapLeftKey("CHILD_ID").MapRightKey("PARENT_ID"));
    }    
}

Now with this structure you could do eager fetch like following

context.Trees.Include(x => x.Descendants).Where(x => x.Id == id).SingleOrDefault()

This query will load entity with id and all of it descenadnts.

You could populate the supporting table with following stored procedure:

CREATE PROCEDURE [dbo].[FillHierarchy] (@table_name nvarchar(MAX), @hierarchy_name nvarchar(MAX))
AS
BEGIN
    DECLARE @sql nvarchar(MAX), @id_column_name nvarchar(MAX)
    SET @id_column_name = '[' + @table_name + '_ID]'
    SET @table_name = '[' + @table_name + ']'
    SET @hierarchy_name = '[' + @hierarchy_name + ']'

    SET @sql = ''
    SET @sql = @sql + 'WITH Hierachy(CHILD_ID, PARENT_ID) AS ( '
    SET @sql = @sql + 'SELECT ' + @id_column_name + ', [PARENT_ID] FROM ' + @table_name + ' e '
    SET @sql = @sql + 'UNION ALL '
    SET @sql = @sql + 'SELECT e.' + @id_column_name + ', e.[PARENT_ID] FROM ' + @table_name + ' e '
    SET @sql = @sql + 'INNER JOIN Hierachy eh ON e.' + @id_column_name + ' = eh.[PARENT_ID]) '
    SET @sql = @sql + 'INSERT INTO ' + @hierarchy_name + ' ([CHILD_ID], [PARENT_ID]) ( '
    SET @sql = @sql + 'SELECT [CHILD_ID], [PARENT_ID] FROM Hierachy WHERE [PARENT_ID] IS NOT NULL '
    SET @sql = @sql + ') '

    EXECUTE (@sql)
END
GO

Or even you could map supporting table to a view:

CREATE VIEW [Tree_Hierarchy]
AS
    WITH Hierachy (CHILD_ID, PARENT_ID) 
    AS 
    (
        SELECT [MySuperTree_ID], [PARENT_ID] FROM [MySuperTree] AS e
        UNION ALL
        SELECT e.[MySuperTree_ID], e.[PARENT_ID] FROM [MySuperTree] AS e 
            INNER JOIN Hierachy AS eh ON e.[MySuperTree_ID] = eh.[PARENT_ID]
    )

    SELECT [CHILD_ID], [PARENT_ID] FROM Hierachy WHERE [PARENT_ID] IS NOT NULL
GO

I've already spent a while trying to fix a bug in your solution. The stored procedure really don't generate children, grandchildren, etc. Below you will find fixed stored procedure:

CREATE PROCEDURE dbo.UpdateHierarchy AS
BEGIN
  DECLARE @sql nvarchar(MAX)

  SET @sql = ''
  SET @sql = @sql + 'WITH Hierachy(ChildId, ParentId) AS ( '
  SET @sql = @sql + 'SELECT t.Id, t.ParentId FROM dbo.Tree t '
  SET @sql = @sql + 'UNION ALL '
  SET @sql = @sql + 'SELECT h.ChildId, t.ParentId FROM dbo.Tree t '
  SET @sql = @sql + 'INNER JOIN Hierachy h ON t.Id = h.ParentId) '
  SET @sql = @sql + 'INSERT INTO dbo.TreeHierarchy (ChildId, ParentId) ( '
  SET @sql = @sql + 'SELECT DISTINCT ChildId, ParentId FROM Hierachy WHERE ParentId IS NOT NULL '
  SET @sql = @sql + 'EXCEPT SELECT t.ChildId, t.ParentId FROM dbo.TreeHierarchy t '
  SET @sql = @sql + ') '

  EXECUTE (@sql)
END

Mistake: wrong reference. Translating @hazzik code it was:

  SET @sql = @sql + 'SELECT t.ChildId, t.ParentId FROM dbo.Tree t '

but should be

  SET @sql = @sql + 'SELECT h.ChildId, t.ParentId FROM dbo.Tree t '

also I've added code that allows you to update TreeHierarchy table not only when you will populate it.

  SET @sql = @sql + 'EXCEPT SELECT t.ChildId, t.ParentId FROM dbo.TreeHierarchy t '

And the magic. This procedure or rather TreeHierarchy allows you to load Children just by including Ancestors (not Children and not Descendants).

 using (var context = new YourDbContext())
 {
      rootNode = context.Tree
           .Include(x => x.Ancestors)
           .SingleOrDefault(x => x.Id == id);
 } 

Now the YourDbContext will return a rootNode with loaded children, children of rootName's children (grandchildren), and so on.


I knew that there must be something wrong with this solution. It is not simple. Using this solution, EF6 require another package of hacks to manage a simple tree (fe. deletions). So finally I've found a simple solution but combined with this approach.

First of all leave entity simple: just Parent and list of Children is enough. Also mapping should be simple:

 HasOptional(x => x.Parent)
    .WithMany(x => x.Children)
    .Map(m => m.MapKey("ParentId"));

 HasMany(x => x.Children)
    .WithOptional(x => x.Parent);

Then add migration (code first: migrations: package console: Add-Migration Hierarchy) or in other ways a stored procedure:

CREATE PROCEDURE [dbo].[Tree_GetChildren] (@Id int) AS
BEGIN
WITH Hierachy(ChildId, ParentId) AS (
    SELECT ts.Id, ts.ParentId 
        FROM med.MedicalTestSteps ts
    UNION ALL 
    SELECT h.ChildId, ts.ParentId 
        FROM med.MedicalTestSteps ts
        INNER JOIN Hierachy h ON ts.Id = h.ParentId
) 
SELECT h.ChildId
    FROM Hierachy h
    WHERE h.ParentId = @Id
END

Then when you will try to receive your tree nodes from database just do it in two steps:

//Get children IDs
var sql = $"EXEC Tree_GetChildren {rootNodeId}";
var children = context.Database.SqlQuery<int>(sql).ToList<int>();

//Get root node and all it's children
var rootNode = _context.TreeNodes
                    .Include(s => s.Children)
                    .Where(s => s.Id == id || children.Any(c => s.Id == c))
                    .ToList() //MUST - get all children from database then get root
                    .FirstOrDefault(s => s.Id == id);

It all. This query helps you to get a root node and load all children. Without playing with introducing Ancestors and Descendants.

Remember also when you will try to save sub-node, then do it just this way:

var node = new Node { ParentId = rootNode }; //Or null, if you want node become a root
context.TreeNodess.Add(node);
context.SaveChanges();

Do it that way, not by adding children to root node.


Another implementation option that I've recently worked on...

My tree is very simple.

public class Node
{
    public int NodeID { get; set; }
    public string Name { get; set; }
    public virtual Node ParentNode { get; set; }
    public int? ParentNodeID { get; set; }
    public virtual ICollection<Node> ChildNodes { get; set; }
    public int? LeafID { get; set; }
    public virtual Leaf Leaf { get; set; }
}
public class Leaf
{
    public int LeafID { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Node> Nodes { get; set; }
}

My requirements, not so much.

Given a set of leaves and a single ancestor, show children of that ancestor who have descendants that have leaves within the set

An analogy would be a file structure on disk. The current user has access to a subset of files on the system. As the user opens nodes in the file system tree, we only want to show that user nodes that will, eventually, lead them to the files they can see. We don't want to show them file paths to files they do not have access to (for security reasons, e.g., leaking the existence of a document of a certain type).

We want to be able to express this filter as an IQueryable<T>, so we can apply it to any node query, filtering out unwanted results.

To do this, I created a Table Valued Function that returns the descendants for a node in the tree. It does this via a CTE.

CREATE FUNCTION [dbo].[DescendantsOf]
(   
    @parentId int
)
RETURNS TABLE 
AS
RETURN 
(
    WITH descendants (NodeID, ParentNodeID, LeafID) AS(
        SELECT NodeID, ParentNodeID, LeafID from Nodes where ParentNodeID = @parentId
        UNION ALL
        SELECT n.NodeID, n.ParentNodeID, n.LeafID from Nodes n inner join descendants d on n.ParentNodeID = d.NodeID
    ) SELECT * from descendants
)

Now, I'm using Code First, so I had to use

https://www.nuget.org/packages/EntityFramework.Functions

in order to add the function to my DbContext

[TableValuedFunction("DescendantsOf", "Database", Schema = "dbo")]
public IQueryable<NodeDescendant> DescendantsOf(int parentID)
{
    var param = new ObjectParameter("parentId", parentID);
    return this.ObjectContext().CreateQuery<NodeDescendant>("[DescendantsOf](@parentId)", param);
}

with a complex return type (couldn't reuse Node, looking into that)

[ComplexType]
public class NodeDescendant
{
    public int NodeID { get; set; }
    public int LeafID { get; set; }
}

Putting it all together allowed me, when the user expands a node in the tree, to get the filtered list of child nodes.

public static Node[] GetVisibleDescendants(int parentId)
{
    using (var db = new Models.Database())
    {
        int[] visibleLeaves = SuperSecretResourceManager.GetLeavesForCurrentUserLol();

        var targetQuery = db.Nodes as IQueryable<Node>;

        targetQuery = targetQuery.Where(node =>
                node.ParentNodeID == parentId &&
                db.DescendantsOf(node.NodeID).Any(x => 
                                visibleLeaves.Any(y => x.LeafID == y)));

        // Notice, still an IQueryable.  Perform whatever processing is required.
        SortByCurrentUsersSavedSettings(targetQuery);

        return targetQuery.ToArray();
    }
}

It's important to note that the function is executed on the server, not in the application. Here's the query that gets executed

SELECT 
    [Extent1].[NodeID] AS [NodeID], 
    [Extent1].[Name] AS [Name], 
    [Extent1].[ParentNodeID] AS [ParentNodeID], 
    [Extent1].[LeafID] AS [LeafID]
    FROM [dbo].[Nodes] AS [Extent1]
    WHERE ([Extent1].[ParentNodeID] = @p__linq__0) AND ( EXISTS (SELECT 
        1 AS [C1]
        FROM ( SELECT 
            [Extent2].[LeafID] AS [LeafID]
            FROM [dbo].[DescendantsOf]([Extent1].[NodeID]) AS [Extent2]
        )  AS [Project1]
        WHERE  EXISTS (SELECT 
            1 AS [C1]
            FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]
            WHERE [Project1].[LeafID] = 17
        )
    ))

Note the function call within the query above.


@danludwig thanks for your answer

I write some function for update Node, It work's perfect. My code is it good or I should write it in other way?

    public void Handle(ParentChanged e)
    {
        var categoryGuid = e.CategoryId.Id;
        var category = _context.Categories
            .Include(cat => cat.ParentCategory)
            .First(cat => cat.Id == categoryGuid);

        if (null != e.OldParentCategoryId)
        {
            var oldParentCategoryGuid = e.OldParentCategoryId.Id;
            if (category.ParentCategory.Id == oldParentCategoryGuid)
            {
                throw new Exception("Old Parent Category mismatch.");
            }
        }

        (_context as DbContext).Configuration.LazyLoadingEnabled = true;

        RemoveFromAncestors(category, category.ParentCategory);

        var newParentCategoryGuid = e.NewParentCategoryId.Id;
        var parentCategory = _context.Categories
            .First(cat => cat.Id == newParentCategoryGuid);

        category.ParentCategory = parentCategory;

        AddToAncestors(category, category.ParentCategory, 1);

        _context.Commit();
    }

    private static void RemoveFromAncestors(Model.Category.Category mainCategory, Model.Category.Category ancestorCategory)
    {
        if (null == ancestorCategory)
        {
            return;
        }

        while (true)
        {
            var offspring = ancestorCategory.Offspring;
            offspring?.RemoveAll(node => node.OffspringId == mainCategory.Id);

            if (null != ancestorCategory.ParentCategory)
            {
                ancestorCategory = ancestorCategory.ParentCategory;
                continue;
            }
            break;
        }
    }

    private static int AddToAncestors(Model.Category.Category mainCategory,
        Model.Category.Category ancestorCategory, int deep)
    {
        var offspring = ancestorCategory.Offspring ?? new List<CategoryNode>();
        if (null == ancestorCategory.Ancestors)
        {
            ancestorCategory.Ancestors = new List<CategoryNode>();
        }

        var node = new CategoryNode()
        {
            Ancestor = ancestorCategory,
            Offspring = mainCategory
        };

        offspring.Add(node);

        if (null != ancestorCategory.ParentCategory)
        {
            deep = AddToAncestors(mainCategory, ancestorCategory.ParentCategory, deep + 1);
        }

        node.Separation = deep;

        return deep;
    }

Need Your Help

Intellij ctrl+w equivalent shortcut in Eclipse

eclipse intellij-idea

In IntelliJ, if the editor cursor is in the middle of a word like this w|ord and you press Ctrl+W, it will highlight the whole word which allows you to highlight the whole word without moving the ...