Calculating differences between a DataTable and a temporary store
I'm setting up a new system for DataTable comparison. Here's what happens:
My software queries a third-party database in which we have no option to write data or alter the structure. The result is a DataTable. At the moment, this DataTable is synced and then saved to disk as XML. On the next sync, the query again returns a DataTable. We then load the stored DataTable from disk and calculate a diff: new, altered and deleted records. Only the diff records are synced, and the stored XML is replaced by the result of the latest query.
This works OK, but performance is poor for larger queries. So we want to switch to storing the data in a SQL Server (2008 and up) database instead, and we're looking for best practices. One idea is to calculate some sort of hash per record and use that to check whether the record was changed. The system has to work generically for all kinds of DataTables, so there is no rigid column structure.
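Once both the previous snapshot and the latest query result live in SQL Server tables, the new/changed/deleted classification can be done set-based rather than row by row in code. A minimal sketch, assuming two hypothetical tables `Snapshot` (previous state) and `Staging` (latest query), both keyed on `Id` with example columns `Col1` and `Col2` (a generic system would have to generate this SQL per query):

```sql
-- Sketch only: table and column names (Snapshot, Staging, Id, Col1, Col2)
-- are assumptions for illustration.
SELECT COALESCE(s.Id, p.Id) AS Id,
       CASE WHEN p.Id IS NULL THEN 'new'
            WHEN s.Id IS NULL THEN 'deleted'
            ELSE 'changed'
       END AS ChangeType
FROM   Staging s
FULL OUTER JOIN Snapshot p ON p.Id = s.Id
WHERE  p.Id IS NULL                            -- new
   OR  s.Id IS NULL                            -- deleted
   OR  s.Col1 <> p.Col1 OR s.Col2 <> p.Col2;   -- altered (NULLs need extra care)
```

A per-row hash column, as suggested below, would replace the column-by-column comparison in the last line with a single hash comparison.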
Does SQL Server support such a thing? Or is it better to calculate the hash in code and store that in SQL? We figure it's not necessary to store the entire record: new records will be in the source table, for modified records the hash is enough to detect changes, and for removed records storing the primary key is enough to delete the record in the target system.
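SQL Server does support this natively: `HASHBYTES` computes a cryptographic hash over a single value, and `CHECKSUM`/`BINARY_CHECKSUM` compute a cheap checksum over a column list (faster, but collision-prone, so risky for change detection). A sketch using the same hypothetical `Staging` table; note that `HASHBYTES` takes one string/binary input, so columns must be cast and concatenated, that `SHA2_256` requires SQL Server 2012 while 2008 offers `SHA1`, and that before SQL Server 2016 the input is limited to 8000 bytes:

```sql
-- Per-row hash for change detection. Column names are assumptions;
-- the separator avoids ('ab','c') colliding with ('a','bc').
SELECT Id,
       HASHBYTES('SHA1',
           ISNULL(CAST(Col1 AS NVARCHAR(MAX)), N'') + N'|' +
           ISNULL(CAST(Col2 AS NVARCHAR(MAX)), N'')) AS RowHash
FROM   Staging;
```

Since the column list differs per query, a generic system would build this expression dynamically from the DataTable's column metadata.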
SQL Server allows you to define triggers that fire on INSERT, UPDATE, and DELETE statements against a table. Inside the trigger, SQL Server exposes two virtual tables, "inserted" and "deleted": an INSERT populates only inserted, a DELETE populates only deleted, and an UPDATE populates both (the old values in deleted, the new values in inserted). Comparing the primary keys between these two tells you which operation is happening.
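As an illustration, a sketch of such a trigger; the table names (`dbo.Snapshot`, `dbo.ChangeLog`) and the single-column `Id` key are assumptions:

```sql
-- Sketch: log every change to dbo.Snapshot into dbo.ChangeLog (Id, Op, ChangedAt).
CREATE TRIGGER trg_Snapshot_Audit
ON dbo.Snapshot
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- inserted/deleted are the virtual tables described above:
    -- INSERT fills only inserted, DELETE only deleted, UPDATE fills both.
    INSERT INTO dbo.ChangeLog (Id, Op, ChangedAt)
    SELECT COALESCE(i.Id, d.Id),
           CASE WHEN i.Id IS NOT NULL AND d.Id IS NOT NULL THEN 'U'
                WHEN i.Id IS NOT NULL THEN 'I'
                ELSE 'D'
           END,
           GETDATE()
    FROM inserted i
    FULL OUTER JOIN deleted d ON d.Id = i.Id;
END;
```

Note the trigger body is written set-based, since a single statement can affect many rows and the trigger fires once per statement, not once per row.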
I would recommend creating a transaction history table into which the trigger inserts a record for each primary key that passes through. A sync process would then later delete and re-insert the needed records for the synced DataTable and clear out the transaction history.
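The sync step could then read the logged keys, push only those records to the target, and clear what it has processed. A sketch, again with assumed names, using a cutoff timestamp so changes logged while the sync runs are not lost:

```sql
-- Read the logged keys for the sync job, then clear only what was read.
DECLARE @cutoff DATETIME = GETDATE();

SELECT c.Id, c.Op
FROM   dbo.ChangeLog c
WHERE  c.ChangedAt <= @cutoff;   -- feed these keys to the sync process

DELETE FROM dbo.ChangeLog
WHERE  ChangedAt <= @cutoff;
```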