A better deletion approach than soft delete

A better deletion approach than soft delete


3 min read

You might have heard about soft deletes in backend applications. We add a column in DB table indicating whether the record can be marked for deletion. We still have those records in DB, just that we may not consider them. Seems like a clean approach at first glance with just one column change.

Now, lets consider the below use-cases:

  • You want to delete records, but also want to retain them for n number of days, just for a safer side against accidental deletion.

  • You want to exclude some records (permanent retain) under explicit requirements, even if they match the criteria of an eligible record to be deleted.

  • You don't want to delete the actual resource before returning the resource back. Basically, deletion before returning the value is not desired.

  • You don't want to modify existing table schema(s) to accommodate soft delete key.

  • You want the tables as loosely coupled as possible without having to worry about deletion logic.

You would immediately notice that soft delete keys can bring some bottleneck in the system if you're going to consider some of the above requirements. So, lets discuss the alternate approach.

Think about having two tables: deletion_data and deletion_exclude

deletion_data table keeps the data that are to be deleted. This table has schema-independent format. Primary key of this table would be {{tableName} +'-'+ {primaryKeyOftable}} . So, if Employee table has a primary key e1 , then the primary key would be Employee-e1. We can also have a column called deletion_schedule indicating the time at which the batch job will execute permanent deletion. This comes handy if you want custom handling in deletion. Then, there would be another column query_json which would have the command to insert the data (row) back to the original table. This is for the customised backup mechanism to work.

deletion_exclude table keeps the primary keys of data that are to be excluded for deletion. Primary key for this table is same as that of first table. When the deletion batch jobs runs on main table, it will check this deletion_exclude table and filter out the ones mentioned there. Remaining eligible ones are moved to deletion_data table for permanent deletion. There can be a separate batch job that will do permanent deletion on this deletion_data table alone.

Advantages of the above approach are:

1) No need to change the schema of all tables with soft delete column(s).

2) Deletion tables are schema-independent. So, it can be used for deleting any table.

3) In case of soft delete keys, we might need to use IN clause against main table when there are multiple values possible. This is a performance bottleneck. In our case, we just need to fetch primary keys from deletion_exclude table and then use it as a reduction function to yield the final data that are moved to deletion_data table.

4) Permanent deletion is moved away from the main table of the application.

5) In case the move operation (main table to deletion_data table) is failed and the record is lost, we can use the query_json column to get the query/command to get the data.

A potential disadvantage would be:

1) If main table schema is changed, then we need to make changes in data-restoration code. Because the command could be supporting only the previous script version.