Thursday, January 05, 2006

Re: st: Protecting data sets

On Jan 5, 2006, at 8:30 AM, Fred Wolfe wrote:

> We sometimes loan our set of research Stata dta files and programs
> we have developed to investigators to work with. We have some
> concern that the files could be passed on to others that shouldn't
> have access to them or that they might just sit on a computer and
> be accessible to others. It occurred to us that one way to protect
> files would be to have a password and date requirement. For
> example, after 6 months the dta files would not be accessible
> without a (changing) password (which we would have to supply). I
> wonder if anyone else has thought about issues such as these or if
> Stata Corp has some plans or ideas on the subject.

In order to share sensitive files with another individual, you must trust that individual to handle those files properly. This includes disposing of them securely when he or she is finished using them. The expiration date you suggest could be easily subverted in a number of ways (e.g., one could simply read the data into memory and then save a copy, or alternatively, simply open a log file, list the data, and then read it back out of the log).

Sharing files via symmetric encryption (i.e., using a common password) is inherently problematic. First, the password itself must be shared (and stored) in a secure manner, and this can be difficult to do. Second, you need to generate a different password each time you want to share with a different person or set of persons. These problems are avoided with public key (or asymmetric) encryption. We use public key encryption extensively to store and distribute data files securely for several collaborative projects.
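To make the asymmetry concrete, here is a toy sketch in Python of textbook RSA with tiny primes. The numbers and the function names are illustrative assumptions of mine, and nothing here is secure: real tools such as GnuPG use far larger keys along with padding and hybrid encryption. The point is only that the pair (n, e) can be handed to anyone who wants to send you data, while d never has to be shared.

```python
# Toy textbook RSA with tiny primes: for illustration only, NOT secure.
p, q = 61, 53                  # secret primes (hypothetical, far too small)
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)        # used only to derive the private exponent
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+)


def encrypt(message: int) -> int:
    """Anyone holding the public pair (n, e) can do this."""
    return pow(message, e, n)


def decrypt(ciphertext: int) -> int:
    """Only the holder of the private exponent d can do this."""
    return pow(ciphertext, d, n)


ciphertext = encrypt(65)       # the message must be an integer below n
assert decrypt(ciphertext) == 65
```

Each collaborator publishes his or her own public pair, so no password ever has to travel between you.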

Take a look at From there you can download GnuPG -- a widely used open source program for public key encryption. To learn more about public key cryptography, you might try the "Links" page on that site and/or Wikipedia. If you're instead interested in a commercial solution, see

-- Phil

P.S. One drawback with using gpg to encrypt data files is that you must decrypt them manually each time you want to use them, and then remember to wipe the decrypted version when you're done. A possible way to facilitate this would be for -use- to recognize that a file is encrypted and then call gpg automatically to decrypt it (prompting the user for his or her passphrase), piping the decrypted stream directly into Stata (this way, a decrypted version of the file is never written to disk and thus you don't have to worry about remembering to delete it). I believe GnuPG can be built on all platforms supported by Stata. Perhaps I'll raise this as a "wish" at the next Users Group Meeting.
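As a rough sketch of the piping idea, here is how a wrapper might call gpg and keep the decrypted bytes in memory rather than in a file. It is written in Python purely for illustration, since Stata's -use- does nothing like this today. The function names and the file name are hypothetical, and it assumes a gpg executable on the search path that will prompt for the passphrase itself.

```python
import subprocess


def gpg_decrypt_command(path, gpg="gpg"):
    """Build the gpg call; --decrypt writes plaintext to stdout
    when no --output option is given."""
    return [gpg, "--quiet", "--decrypt", path]


def read_decrypted(path, gpg="gpg"):
    """Return the decrypted contents as bytes, captured from a pipe,
    so this code never writes a plaintext copy to disk."""
    result = subprocess.run(
        gpg_decrypt_command(path, gpg),
        stdout=subprocess.PIPE,   # capture the decrypted stream
        check=True,               # raise if gpg exits with an error
    )
    return result.stdout


# Hypothetical usage (file name is an assumption):
# data = read_decrypted("survey.dta.gpg")
```

The operating system may still page memory out to disk, so this reduces the exposure rather than eliminating it.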

-- Phil
