Storing user data: foreign keys or strings?

So you need to store user data in your SQL database. Do you store a foreign key to your identity table? An Active Directory (or other) login string? Their full name?

Let me first say that while I have some experience with audit records in various environments, this post doesn’t relate to anything specific. Rather, it stems from a discussion on the topic with some colleagues, so I thought it would be useful to publish my thoughts on the matter.

In my mind there are rarely any absolutes… and there are usually options, even when you think there are none :).

What about if data must be deleted? Are multiple versions being stored of your records? Do you need auditing? How detailed do the audit records need to be? What business rules, or legislative requirements impact the audit process? Would warehousing be required to provide workable audit reports? How does scale impact the architecture and audit requirements? What about non-repudiation requirements?

Take the windows audit log as an example. It stores the user name string of a failed login attempt. It doesn’t store the SID of the user, because it is not possible to be 100% certain that the SID will resolve.

Business rules or legislation may dictate that deletes must be recorded, but the content itself does not need to be kept. In some cases the data (not metadata) may actually need to be removed, while the metadata is retained. Some of these rules apply to legal document archives, where documents have destruction dates. Documents could either be physical or virtual, but the audit records for those documents are often kept beyond the life of the record itself. The record could easily be an SQL record (or records) that are deleted, in which case creating an audit record would make sense. If the records are entirely in SQL, in most cases using an ‘IsDeleted’ column is always a good option that provides a lot of flexibility.

I think there are circumstances where it wouldn’t be unreasonable to store minimal metadata information in the simplest form possible that is separate to ‘regular’ data. This may be a completely separate database environment, dedicated to audit only, where transactions (at the application or DB layer) simply funnel off a copy of certain metadata to the audit service. This could be for a number of reasons, for example if storage really is an issue (due to storage of BLOBs, or simply the number of records), or scale, or even non-repudiation. In the latter case, it may even make sense to store a checksum along with the audit record, in which case you certainly don’t want any foreign keys in that record unless you’re check summing the related data as well. Of course checksums and hashes are best when re-used, so if you’re relying on this blog post you probably shouldn’t be rolling your own solution :).

In the end it all comes down to business requirements, and whether or not you need to tackle scaling issues.

In the vast majority of cases, starting with the simplest data structure makes the most sense, and you would not implement these strategies from day one.

Have you implemented any complex auditing scenarios? Please post your experiences in the comments.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s