Potential improvements for perfect replayability #615
Comments
If you're thinking about privacy, it's probably a good idea to check out this nice feature called "Sensitive Value Attribute" brought into PHP 8.2. I can easily see a lot of other languages adopting their idea in the near future.

Example taken from: https://stitcher.io/blog/new-in-php-82#redact-parameters-in-back-traces-rfc.

function login(
    string $user,
    #[\SensitiveParameter] string $password
) {
    // …
    throw new Exception('Error');
}
login('root', 'root');

Stack Trace:

Fatal error: Uncaught Exception: Error in login.php:8
Stack trace:
#0 login.php(11): login('root', Object(SensitiveParameterValue))
#1 {main}
  thrown in login.php on line 8

Note the
Huh! I've occasionally seen some special type-system usage for untrusted user input, but I've never seen it for sensitive data, and never considered making the language aware of it so it could do things like that. That's a really interesting notion!

<thinking out loud>

Marking them as sensitive would indeed let us strip them out of the logs, but that wouldn't quite let us do the recording. Some code paths might depend on the values of the sensitive data, I would think. But I might be wrong. Most sensitive data is things like SSNs, passwords, etc., which are pass-through most of the time.

There's a related idea (not sure where) for replayability which will let us represent "opaque types" in the language, whose values are only moved around but never read for any calculations (can't add them, hash them, etc.). Opaque types are elided from the recording, and because they never factor into any calculations, we can deterministically replay the entire recording without knowing their values. So, we could have a special Sensitive which basically acts like an opaque str.

It could be complicated to completely strip this, because normally replayability records anything crossing the FFI boundary, and things will probably cross the network boundary before we can even put them into a Sensitive. I wonder if we could mark the entire incoming buffer as Sensitive, to move the boundary slightly and record anything coming out of it.

Come to think of it, representing boundaries is literally what regions are for. Perhaps we can have a

</thinking out loud>

I think we can do something with this! In today's replayability, we record anything going from the Let's add a global "

This could let us run recordings on the client without recording any PII. There are some open questions about how deserializing functions will move data from

Thanks a bunch @spartanatreyu for the intel and inspiration!
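To make the opaque-type idea a bit more concrete, here's a rough sketch in Python (not Vale, and not how the compiler would actually do it). It assumes a wrapper class named Sensitive, as floated above; `record_event`, `forward`, and the `ELIDED` marker are hypothetical names made up for the illustration. The wrapper refuses any operation that would let its value affect a calculation, and the recorder writes a placeholder instead of the value.

```python
from typing import Any, List

ELIDED = "<elided>"  # hypothetical placeholder written into the recording


class Sensitive:
    """Hypothetical opaque wrapper: the value can be moved around, never read."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def __repr__(self) -> str:
        # Safe to show in logs and stack traces.
        return "Sensitive(<redacted>)"

    def _blocked(self, *args: Any, **kwargs: Any) -> Any:
        raise TypeError("Sensitive values are opaque")

    # Block operations that would let the value factor into a calculation.
    __str__ = __eq__ = __hash__ = __add__ = __len__ = _blocked


def record_event(log: List[Any], value: Any) -> None:
    """Record a value crossing the boundary, eliding opaque ones."""
    log.append(ELIDED if isinstance(value, Sensitive) else value)


def forward(password: Sensitive) -> Sensitive:
    # Pass-through code: moves the value without inspecting it.
    return password


log: List[Any] = []
record_event(log, "root")
record_event(log, forward(Sensitive("hunter2")))
```

Because nothing can branch on the wrapped value, a replay could substitute any dummy for the elided entry and still take the same code paths. A real design would also need some controlled way to hand the value back out at the FFI boundary, which this sketch leaves out.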
To make replaying compliant with privacy in production: you could have an HTTP server run without tracing by default. If a request comes in with a special flag (indicating "this user is trying to help reproduce a bug, and fully consents to data recording as part of that"), you enable recording from there, capturing all known state plus a trace until the request is done. At that point you stop recording and output the file, to be sent off to a logging server.
(will need to finish figuring out how to record certain areas / threads / time slices of programs)
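A minimal sketch of the opt-in flow, in Python for illustration only. The header name `X-Replay-Consent`, the `Recorder` class, and the `sink` list standing in for the logging server are all assumptions, and the "work" is a stand-in; a real version would also snapshot program state at the point recording starts.

```python
from typing import Dict, List, Optional

CONSENT_HEADER = "X-Replay-Consent"  # hypothetical flag name


class Recorder:
    """Hypothetical per-request recorder that collects trace events."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def record(self, event: str) -> None:
        self.events.append(event)


def handle_request(headers: Dict[str, str], body: str,
                   sink: List[List[str]]) -> str:
    # Recording is off unless this request carries the consent flag.
    recorder: Optional[Recorder] = None
    if headers.get(CONSENT_HEADER) == "1":
        recorder = Recorder()
        recorder.record(f"request:{body}")
    try:
        response = body.upper()  # stand-in for the real request handling
        if recorder is not None:
            recorder.record(f"response:{response}")
        return response
    finally:
        # Stop recording when the request is done and ship the trace.
        if recorder is not None:
            sink.append(recorder.events)
```

Requests without the flag never allocate a recorder, so the default path records nothing at all.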
When replay mode lets a whitelisted FFI call go through (instead of substituting the recorded result), let's use a checksum to make sure that the incoming data is actually the same as it was during the recorded run.
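Roughly, the checksum check could look like the sketch below (Python, illustration only). SHA-256 is just an assumed choice of checksum, and `record_call`, `replay_call`, and `ReplayMismatch` are hypothetical names. The recording stores only the digest of what the whitelisted call returned, and the replay compares against it.

```python
import hashlib
from typing import Callable, Dict


class ReplayMismatch(Exception):
    """Raised when a whitelisted call returns different data during replay."""


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_call(recording: Dict[str, str], name: str,
                ffi: Callable[[], bytes]) -> bytes:
    # Recorded run: let the call through and remember only its checksum.
    data = ffi()
    recording[name] = checksum(data)
    return data


def replay_call(recording: Dict[str, str], name: str,
                ffi: Callable[[], bytes]) -> bytes:
    # Replay: the whitelisted call really runs, so verify its result matches.
    data = ffi()
    if checksum(data) != recording[name]:
        raise ReplayMismatch(f"{name}: data differs from the recorded run")
    return data
```

Storing only the digest keeps the recording small and avoids copying the data itself into the trace, at the cost of only being able to detect a mismatch, not repair it.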
(Thanks to 5225225 for these ideas!)