Maybe the DELETE command should be banned from SQL. It was probably invented when the disks were tiny and it might not be suitable for our times, especially nowadays when we talk about things like BIG DATA.
But, one can argue that not deleting can be dangerous, too! One can claim that by refusing to delete we might end up with a lot of old, duplicated, bad quality, uncleaned, irrelevant and untrustworthy data.
Maybe the worse thing that can happen to data is to refuse to delete it enough. Perhaps because of this lack of enough deletes and fear of deletes we end up with all the 'bad data' found in some database systems today.
Whatever you choose to do, delete or not delete, one thing's for sure and that is that you should know what you are deleting or not deleting. Choosing confidently what to delete comes from understanding and knowing your data model, and the implications a delete will have on the data model and never because the data is not required for your project!
5 comments:
A lot of stored personal data is illegal.Sexual orientation for eample. What about dead customers or closed accouts in social networks ? I think, we need delete and bulk delete commands, because a lot of data models are 'old school'. No event/actor modell, no encrypted history data partitions,...
Best regards Karsten
I have worked on systems which don't need any of the data from 3 days ago. It is totally irrelevant and pretty much useless. All value has been mined and extracted by them. However, the system needs to be fast. Very fast - tens of millions of transactions every day, with a total end-to-end transaction time of less than 25ms. That's client-to-webserver-to-db-and-back. Deleting data keeps the database very small and the system very fast. Still think delete is useless? I don't.
I can see why, in some system, you might want to keep everything. But you would need to design such that "deleted" data is logically (and maybe physically) separate from non-deleted data otherwise it'll just get in the way, slow things down.
ChandlerDBA
I agree with you in most of the cases.
Information Lifecycle Management (ILM) comes to mind straight away. Where you age out old data on cheap disks elsewhere but still accessible.
I understand tuning can be hard on big sets of data, but on the other hand with In Memory databases nowadays most of tuning on I/O becomes irrelevant. You have a point, you need to look at the database from 2 angles still. Is it OLTP transaction based full ACID system? Is it DSS a data warehouse? Then you decide your architecture accordingly.
In my post I just wanted to look at it from theoretical/logical point of you. Should I delete or not? Without at all being constrained by physical limits. The only limit is hardware! :-)
Thank you for your comment.
Best Regards
Kubilay
Karsten
What do you mean data-modles are 'old school'? Can you explain a bit?
Algebra is old school but it still works.
Many thanks for your comment.
Best Regards
Kubilay
Post a Comment