For many businesses that won't be straightforward; it will be rocket science.
Even if your business does have the understanding and technical ability to do it, it adds a whole extra layer of complexity and unreliability to what used to be simple, plain text logs -- exactly the kind of information you probably need to access quickly and reliably if you're in the middle of fixing a major fault, for example.
> Even if your business does have the understanding and technical ability to do it, it adds a whole extra layer of complexity and unreliability to what used to be simple, plain text logs -- exactly the kind of information you probably need to access quickly and reliably if you're in the middle of fixing a major fault, for example.
But do you need to keep them forever? If you delete logs after 30 days, you are unlikely to be impacted anyway.
We have server logs going back for years. Moreover, those older server logs have provided valuable information on several occasions for detecting abuse of our systems, attempted fraud, etc., so they are demonstrably useful for legitimate business purposes.
We also have backups going back on a staged basis: more frequent backups from the recent past, less frequent ones going back further. This has been useful more than once for retrieving older information that someone had accidentally modified or deleted and not noticed immediately, so it is also demonstrably useful for legitimate business purposes.
Both of these appear to be at risk of conflict with the right to erasure under GDPR, in that, for example, old backups of emails will inevitably contain customer correspondence that can't readily be isolated.
How can you possibly know that without knowing anything about our business, what our logs contain, or what kinds of threats we face where evidence from the logs supports us in detecting abuse, countering formal disputes, or even legal proceedings?
The trouble is that the definition of PII could be interpreted so broadly as to include almost anything useful ever logged on a server, because if a log record references any data that could be associated with a specific individual, including in combination with other data, then it counts. Given what we already know about de-anonymisation of supposedly anonymous data sets, even just based on quite simple patterns and correlations in the data, any approach based on pseudonymisation is likely to be simplistic and open to challenge.
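To make the linkability point concrete, here is a minimal sketch, assuming the common approach of replacing each user ID with a keyed hash. The key, the user names, the paths and the function names are all made up for illustration. Nothing here is a statement about what GDPR does or does not accept; it only shows that a stable pseudonym still ties every record for one person together, so the access pattern survives even though the name does not.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret; in practice it would live outside the log store.
SECRET_KEY = b"hypothetical-key"


def pseudonymise(user_id: str) -> str:
    """Replace a user ID with a stable keyed hash (HMAC-SHA256, truncated)."""
    digest = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


# Made-up log records: (user ID, requested path).
raw_log = [
    ("alice", "/account/settings"),
    ("bob", "/pricing"),
    ("alice", "/help/cancel-subscription"),
    ("alice", "/stores/london"),
]

# The "pseudonymised" log no longer contains the original user IDs...
pseudonymised_log = [(pseudonymise(user), path) for user, path in raw_log]


def group_by_pseudonym(records):
    """Rebuild a per-pseudonym browsing profile from the records."""
    profiles = defaultdict(list)
    for pseudonym, path in records:
        profiles[pseudonym].append(path)
    return dict(profiles)


# ...but every record for the same person still shares one pseudonym,
# so the full pattern of behaviour can be reassembled and correlated.
profiles = group_by_pseudonym(pseudonymised_log)
```

Using a random token per record instead would break that linkage, but it would also break any analysis that needs to follow one account over time.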
In any case, what is reasonable for a business to want to do with that sort of data? Can we analyse which content has been most popular on a web site over the past week/month/year/decade? Can we analyse which content a particular paying customer has been accessing, in order to promote a new plan as they approach a limit or suggest a more cost-effective one if they aren't using what they're paying for right now? Can we analyse access patterns for an account over a period of time to detect shared use of an account contrary to our terms? There are plenty of legitimate business purposes for which you might want to know the entire history of access for a particular account and which do not violate the privacy of the user in any unjustified or unfairly exploitative way, but again, any method based on log pseudonymisation will immediately fail to satisfy those requirements.
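As an illustration of the shared-account case, here is a rough sketch, assuming each log record carries an account ID, a client IP address and a timestamp. The record layout, the threshold of three addresses and the function name are hypothetical, and a real check would need to be far more careful about mobile networks, VPNs and so on. The point is only that the analysis needs the whole access history for one identifiable account.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def accounts_with_many_ips(records, window_end, window=timedelta(days=30), max_ips=3):
    """Return accounts seen from more than max_ips distinct IPs in the window.

    records is an iterable of (account_id, ip_address, timestamp) tuples.
    """
    window_start = window_end - window
    ips_by_account = defaultdict(set)
    for account_id, ip_address, timestamp in records:
        if window_start <= timestamp <= window_end:
            ips_by_account[account_id].add(ip_address)
    return sorted(
        account_id
        for account_id, ips in ips_by_account.items()
        if len(ips) > max_ips
    )


# Made-up records using documentation-only IP ranges.
now = datetime(2018, 5, 1)
records = [
    ("acct-1", "192.0.2.1", now - timedelta(days=1)),
    ("acct-1", "192.0.2.2", now - timedelta(days=2)),
    ("acct-1", "198.51.100.7", now - timedelta(days=3)),
    ("acct-1", "203.0.113.9", now - timedelta(days=4)),
    ("acct-2", "192.0.2.50", now - timedelta(days=1)),
    ("acct-3", "192.0.2.60", now - timedelta(days=1)),
    ("acct-3", "192.0.2.61", now - timedelta(days=90)),  # outside the window
    ("acct-3", "192.0.2.62", now - timedelta(days=91)),  # outside the window
    ("acct-3", "192.0.2.63", now - timedelta(days=92)),  # outside the window
]

flagged = accounts_with_many_ips(records, window_end=now)
```

Note that the result depends directly on how far back the retained logs go: with a shorter retention period, slow patterns of sharing simply never show up.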
Nothing about this is even remotely as clear and straightforward as some in this discussion are suggesting.