Pgsodium: Modern cryptography for PostgreSQL using libsodium (github.com/michelp)
74 points by sbuttgereit on June 12, 2020 | hide | past | favorite | 31 comments


Author here. I'd like to put it out there that pgsodium is relatively new work that hasn't seen any serious battle testing. For now, you should consider it riskier to use than pgcrypto.

That being said, pgsodium is far less code than pgcrypto: it contains no algorithm implementations or magic numbers of its own. It's a straight, no-frills interface to libsodium, which is a very well-tested library based on solid cryptography.

Also pgsodium has some features that pgcrypto doesn't have:

  - Public Key Signatures
  - Anonymous Sealed Boxes
  - Key Derivation Functions
  - Key Exchange
  - Server Managed Secret Keys
Feel free to drop me any questions here!


This is HN, where commenters liberally espouse variants and competitors.

Props to you, the author of pgsodium, for being the first to mention pgcrypto.

The extra features of pgsodium look handy. Is the feature set the main reason you created pgsodium? Or are there other reasons, like performance or interfaces, for example?

New-entrant risk aside, are there other reasons to maybe prefer pgcrypto at the moment?


I wrote pgjwt, a JWT implementation for Postgres, and my primary motivation was that pgcrypto does not have public-key signing, which is needed for JWT. Once I started experimenting with that one feature it sort of snowballed from there; I still haven't implemented public-key signing in pgjwt!

To keep using pgcrypto? Well, it's got a lot of history, so it's easy to find examples of its use. The documentation is decent. And it comes with most cloud providers, where you are generally restricted from installing extensions like pgsodium.

To not use pgcrypto? I'll quote what a core PostgreSQL developer said recently on the pgsql-hackers mailing list:

"I'd strongly advise against having any new infrastructure depend on pgcrypto. Its code quality imo is well below our standards and contains serious red flags like very outdated copies of cryptography algorithm implementations. I think we should consider deprecating and removing it, not expanding its use. It certainly shouldn't be involved in any potential disk encryption system at a later stage."


> In general it is a bad idea to store secrets in the database itself

Perhaps my ignorance is showing but do hashed passwords count as secrets? Because those are usually stored in databases.


I see lots of people are arguing that it's not a secret. Here's a thought experiment to try if you're considering taking that side of the argument:

If a web site you use offers all their password hashes as a free download, would their users be unaffected? Would you come to HN to write about how that's actually fine because those aren't secrets after all?

To be clear, there are credentials that work that way. The RSA key your browser automatically uses to check that this really is https://news.ycombinator.com/ is public; it is right there for us all to see. Knowing it doesn't help you impersonate Hacker News.

A closer comparison to your password would be U2F/WebAuthn credentials. Here are my 100% genuine WebAuthn credentials for a particular web site, Base64 encoded; this is what the site stores in its database to authenticate me:

id: tmVUNeAST4JhA5HMN61Ddk1G0FVK+O8K3gY+/z/HE6WWyjny9cDCWY1LsmNcLP63qNdugU2itZRyBM1LNAJfFA1TZ1qxOoiqXMK7R3KqPg++UwIrdqdr0na4BWP2uPm1

public key: pQECAyYgASFYIGMcwjIjCbidUjT3hSEvZOAme++M+uQ1+I36am+GgiICIlgguso4MWdGHQRs82kcQJjaGN7Lf8NUHNYld+XktlD8hBg=

You can't do anything with that; you can't even use it on your own web site to authenticate me. It's useless.

But passwords and even a password hash are not like that, they have to be kept secret or else an attacker gains a significant advantage.


And...you didn't explain the most important part of your ramble:

> they have to be kept secret or else an attacker gains a significant advantage.

Can you please explain? Why is this the case? Sorry to be snarky, but after reading your elaborate response, I was left thirsty.


Without the hashes an attacker's only way to find out the password for an account is to guess. They need to try logging into the account with a guessed password.

Maybe they have a very good guess and it'll be right the first time; more likely they need a few tries, or even thousands or millions of tries, to find the right password.

Every time they try, that's another opportunity to detect the attack, to stop them or slow them down. That makes the attack less likely to be ultimately successful. Even if the system doesn't go out of its way to try to do any of those things, chances are you just can't do millions of guesses this way in a reasonable time.

But if they have a password hash for the account they can check as many guesses as they can afford compute power against that hash. Millions per second on a cheap home PC is common, maybe it'd be as low as thousands per second with the best possible choice of password hash, tuned to be aggressively expensive. Much more and "authentication" becomes annoyingly expensive for the legitimate system, so it is never done in practice.
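The offline-vs-online gap can be sketched with the standard library. PBKDF2 and the iteration count here are illustrative stand-ins for whatever hash a real site uses:

```python
import hashlib
import os

# An attacker holding the salt + hash checks guesses locally: no login
# endpoint, no rate limit, no detection in the loop.
salt = os.urandom(16)
stolen_hash = hashlib.pbkdf2_hmac("sha256", b"Sup3rman", salt, 100_000)

wordlist = [b"password", b"123456", b"letmein", b"Sup3rman"]
cracked = next(
    (guess for guess in wordlist
     if hashlib.pbkdf2_hmac("sha256", guess, salt, 100_000) == stolen_hash),
    None,
)
print(cracked)  # b'Sup3rman': throughput limited only by local compute
```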

If the password to be guessed is ten random alphanumerics from a decent password manager then it doesn't matter, neither approach gets the attacker in. But while my passwords are like that, my mother's passwords certainly aren't and neither are those of most normal users.


The fact you can take faster guesses at them to try and recover the secret doesn't change the fact that they are, in themselves, not secrets. The whole purpose of the hashing is to make them not-secrets. They may be more sensitive in some way but they are not meaningfully 'secrets'.


> The whole purpose of the hashing is to make them not-secrets.

I put that thought experiment there for a reason.

The choice to use password hashing is in effect an admission that they'll still be secrets. We are only using a specialist password hash because we know at least some fraction of our users will be using poor quality human memorable passwords. Less fUTz2uIHExKCHbLxMgNHWhnU and more Sup3rman.

If we believed users were choosing random alphanumerics like fUTz2uIHExKCHbLxMgNHWhnU then we wouldn't need a password hash; you could just use a plain cryptohash like SHA256. It's safe, it's faster, there's nothing not to like about it. You can publish that hash; ain't no way anybody is going to reverse it to get the password back, because there are too many possibilities to try.

Try it: cbc08cf5e1039d686879535794dc3616b020aed6ff92c52bdfe0f33360eb167b

But we can't do that, because we know real users aren't using such passwords. The users are picking human-memorable passwords, and so even after they've been stretched as much as we can stomach, too many of them are still weak, and so they are still secret.
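Back-of-envelope, the gap between those two kinds of password is enormous (the wordlist size for human passwords is an assumed ballpark):

```python
import math

# Search space of a random 24-char alphanumeric vs. a generous wordlist
# of human-memorable passwords.
random_space = 62 ** 24      # e.g. fUTz2uIHExKCHbLxMgNHWhnU
human_space = 10 ** 7        # assumed size of a big common-password list

print(f"random: ~2^{math.log2(random_space):.0f} guesses")  # ~2^143
print(f"human:  ~2^{math.log2(human_space):.0f} guesses")   # ~2^23
```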


The thought experiment fails for the same reason, which you appear to agree with: the hashes are not secrets, it's just that passwords are pretty bad secrets. If I gave you an RSA modulus where you know one factor was chosen from a range of only a few trillion primes, that public key could be broken too once published. The fact that typical user-chosen passwords are iffy secrets doesn't make the hashes secrets as well.

> The choice to use password hashing is in effect an admission that they'll still be secrets. We are only using a specialist password hash

No, as you say yourself, we'd still hash even if the passwords were random strings, which turns them into non-secrets. The specialized hashing is an admission that many passwords are guessable. It sort of sounds like you want to forklift your own terminology into this using the fact that user-selected passwords can be guessed.


> If I gave you an RSA modulus in which you know one factor is chosen from a range of a few trillion primes, the public key could be broken if public.

And so public CAs are obliged to forbid such practices to the extent they're able to detect them. As a result, on the whole, users who try to use crap moduli get denied.

Go try it: mint yourself an RSA pair such that the public modulus is 11 x n or something easy, and ask Let's Encrypt to issue for that key in a name you own. They'll refuse, explaining that your proposed key is crap. Here's a nickel, kid, go buy yourself a better semi-prime.

In contrast, the vast majority of password-protected user accounts don't even check the Pwned Passwords list. I can sign up for a lot of stuff literally using Sup3rman as my password.

> The fact that typical user-chosen passwords are iffy secrets doesn't make the hashes secrets as well.

Yes, in practice it does.

> as you say yourself, we'd still hash if the passwords were random strings. Which turns them into non-secrets.

That works on random strings, but it doesn't work on non-random strings.

No amount of processing can magically make the non-random strings be random and so Argon2id("Sup3rman") is going to need to be treated as a secret or else any fool can just try the most popular passwords and reverse it.


Yes, it does. If I have a properly salted/hashed password dump, I can easily check common passwords against all users.

Assume 5m users. The odds of finding some user with the password "Password2020" are pretty high.

But you'd need 5m requests against the service to do the same thing.

There's no rate limit on local, offline attacks.

Are salted hashes better than plain hashed or plain-text passwords? Yes!

But there are still more attacks possible with access to the data.

This gets much worse if the password/secret is limited, say a 4-digit PIN, or a social security/government ID number with error-correcting digits. For example, the Norwegian government ID is 11 digits, but 2 are error-correcting and 6 are a date, so it's pretty much impossible to store them hashed in a secure fashion: you only need to try about 365.25 * ~60 years * 1000 numbers to find/recover them all, a far cry from 10^11 (which is pretty low anyway).
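The arithmetic above, spelled out (the ~60-year birth-date window is the parent's assumption):

```python
# Effective search space for the Norwegian ID example: 6 digits are a
# birth date, 2 are check digits, leaving ~1000 individual numbers per
# plausible date.
naive_space = 10 ** 11                     # all 11-digit strings
effective_space = int(365.25 * 60 * 1000)  # ~60 years of dates * 1000

print(effective_space)                 # 21915000, about 2.2e7
print(naive_space // effective_space)  # structure shrinks the space ~4500x
```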

If the passwords are all truly random, well over 64 bits of entropy, and never re-used and/or leaked... then properly hashed, a dump would not leak much information.

But there'll always be re-use, and at that point exposing the salted hash along with a linked login name will expose information (of course, sometimes the login will be the same as well, but it's still valuable to know a login exists before trying to log in to a service).


You should perhaps read the rest of this thread if you haven't had a chance since you seem to be rehashing (whee!) an argument that was just had.


Yes, they are secret, which is why databases usually don't store them; they store a salted hash. The secret, the key in a sense, is in the user's head, or a password manager I guess these days. An attacker cannot get the password from a correctly hashed and salted password because that information is not in there.

This is different from storing a key in the database that can turn encrypted data into unencrypted data. That is what I meant by not storing secrets. Instead of storing keys, pgsodium usage encourages storing key ids (of a given length, in a context). The keys themselves can be used to reveal plaintext, so they shouldn't be stored, only derived.

So the trick to avoiding storing keys is to store the key id and derive the key, being careful not to leak it in the logs. This can be done by passing it directly from pgsodium_derive() straight into a function that takes a key. I'm considering adding a "mirror" api that doesn't take bytea keys but instead only key ids and always derives. I'm still kicking that idea around.
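The derive-don't-store pattern can be sketched in Python. This mirrors the construction libsodium's crypto_kdf is built on (keyed BLAKE2b with the key id as salt and a context as personalization), but the exact parameters here are illustrative assumptions, not pgsodium's verbatim internals:

```python
import hashlib
from struct import pack

def derive_key(master_key: bytes, key_id: int,
               context: bytes = b"pgsodium", size: int = 32) -> bytes:
    # Keyed BLAKE2b: the key id becomes the salt, the context becomes
    # the personalization string (illustrative analogue of crypto_kdf).
    return hashlib.blake2b(
        b"",
        digest_size=size,
        key=master_key,
        salt=pack("<Q", key_id).ljust(16, b"\x00"),
        person=context.ljust(16, b"\x00"),
    ).digest()

master = bytes(32)                  # stand-in for the server-managed key
k1 = derive_key(master, 1)
k2 = derive_key(master, 2)
assert k1 != k2                     # distinct ids give independent keys
assert derive_key(master, 1) == k1  # deterministic: store only the id
```

Because derivation is deterministic, the database row only ever needs the integer key id; the key material exists transiently at call time.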

But your point is good; I should be less cavalier with my definitions of secret and key. I'm working on improvements to the "encrypted column" section that will explain more of this and show how pgsodium can work hand in hand with Row Level Security. Stay tuned.


Ah, I should have added "salted" to my original description.

What are your opinions of the `crypt` and `gen_salt` functions from the pgcrypto module[0]?

https://www.postgresql.org/docs/current/pgcrypto.html#id-1.1...


They're just fine; they do the job. Note that PostgreSQL 10 updated the user password storage and challenge mechanism to SCRAM:

https://info.crunchydata.com/blog/how-to-upgrade-postgresql-...


Passwords usually get hashed in the app layer, so the DB never has the key. In the same way, encryption is usually done entirely outside DBs, as they tend to leak or get breached. So ciphertext, without any potential access to keys, is generally acceptable for storing in a DB, ideally isolated for one purpose.
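A minimal sketch of app-layer hashing using stdlib scrypt; the parameters are illustrative, not a tuning recommendation:

```python
import hashlib
import hmac
import os

# The database only ever stores (salt, digest), never the password.
def _kdf(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode(), salt=salt,
                          n=2**14, r=8, p=1, maxmem=2**26, dklen=32)

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)
    return salt, _kdf(password, salt)   # store both columns in the DB

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(_kdf(password, salt), digest)

salt, digest = hash_password("Sup3rman")
assert verify_password("Sup3rman", salt, digest)
assert not verify_password("Batman", salt, digest)
```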


This is the correct answer. Hashing can happen at many different layers in the app, or even as an ephemeral Postgres function call in the database. Storing hashes is not storing secrets. If one has to worry about exposing the lock to the public, then there is something wrong with the security of that lock. Use a better lock.


Correctly hashed (with salt and a memory+time hard hash) passwords are taken to be brute-force hard to crack.

In that sense it's as safe to publish such hashed passwords on the internet as it is to publish a website's public key. In fact, it's good practice to set hash parameters such that it's slower to brute-force passwords than asymmetric keys (e.g. TLS certs).

However, the big difference is that TLS private keys are randomly generated, and of a fixed length, whereas passwords are user chosen. So an attacker could do a dictionary attack and probably uncover a number of passwords using that (e.g. just try out "password" on all the hashed passwords). Hashed passwords are only as hard to crack as the passwords themselves.


Publishing a salted hash (even if it's a memory+time hard hash) is the same as allowing unlimited login attempts.

Limiting login attempts by ip, username, and time is the best way to mitigate attacks.

Even a weak password is hard to crack with 5 attempts per day :)


Given a salted hash, you can test passwords many orders of magnitude faster than you can do online. As some attackers can control a botnet, limiting attempts by ip has limited value. If you limit by username and time, you open the door to a denial-of-service attack: I could lock you out of your account by simply trying to log in as you repeatedly.

There are few easy answers in security.


Presumably, one could retort that pragmatic practice isn't optimal practice. Which is to say, if a service could derive authorization without secrets, that would present a smaller attack surface than one that needed plain or hashed secrets (assuming a risk profile that includes data exfiltration). The service could, for example, outsource that to an OAuth provider [0], albeit at the cost of accepting the provider's level of service as its absolute maximum.

[0] https://blog.codinghorror.com/the-god-login/


I feel like hashed passwords are somewhere in between “completely okay to publish” and secret.

With good crypto, it should be difficult to make a password hash useful even if it is public. (Although weak passwords can make this worse if salts are also made public.) However as part of a defense in depth strategy, you probably don’t want to release them publicly.

I feel like there are different levels of secrecy at play here. I think the article means things more like ssh keys or api keys specifically.


No, the purpose of the hash is to be able to verify a password. It has to be reasonably quick - it's in the login path.

This is different from a public key: it's infeasible to derive a secret key from a public key.

It's by design trivial to verify a correct password guess against a salted hash.

In general, you won't have a lot of candidate secret keys to try against public keys - but all you need to get candidate passwords is to offer up a service that "checks if your password is secure/compromised/etc".

You might not crack root@box, but you can easily verify that you have access to ceo@box...


Properly hashed/salted passwords are considered safe to store in a database. You still wouldn’t want them to be stolen, but they are not in plaintext. There are other times when you do need to store secrets that you need to be able to decrypt so you can use them.


It all depends on two factors:

a) minimal required entropy of the passwords themselves.

b) how computationally expensive the hash function itself is.

The threat model in the password hashing scenario is that the attacker wants to discover the password and then use it to impersonate the user.

Thus the attacker needs to guess the password by searching through the space of all possible passwords (using heuristics such as dictionaries, common letter/number replacements, and common positions of the symbols mandated by many password policies), and then the candidate password needs to be hashed and compared to the known good hash.

By using an expensive hash function such as bcrypt you can limit the attacker's ability to exhaustively search the space of all possible passwords. But if a user uses a weak password, the attack will succeed, and leaking the hashes will make the search go several orders of magnitude faster than online brute forcing (i.e. real login attempts, which can be throttled).
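Factor (b) in orders of magnitude; both rates are assumed ballpark figures, not benchmarks:

```python
# How the hash choice scales an offline dictionary run over the same
# wordlist. Rates are illustrative assumptions.
wordlist_size = 10 ** 7
fast_hash_rate = 10 ** 9   # raw SHA-256-style guesses/sec (assumed)
bcrypt_rate = 10 ** 4      # tuned expensive-hash guesses/sec (assumed)

print(wordlist_size / fast_hash_rate)  # 0.01   seconds for the whole list
print(wordlist_size / bcrypt_rate)     # 1000.0 seconds, still offline-fast
```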


c) the fact that in the real world, some users will re-use "secure" passwords.


In the real world this reduces password entropy, thus it falls under (a) :-)

EDIT: perhaps it would be easier to just think of it in terms of predictability. Often tech jargon gets in the way.


Yes and no. You could argue the infamous Debian openssl bug reduced entropy [1], but it really is about compromised keys. Or in this case compromised passwords.

[1] In a sense, this is of course obviously and literally true (that it is about entropy). But I think the more interesting distinction is between leaked secrets and "bad" secrets; in that light the "Debian keys" all form a group of "leaked" secret keys. There's nothing inherently wrong with the generated keys, but you'd have to be improbably unlucky to "accidentally" generate any given compromised key using a good RNG. In the same way, the password "TGSCfkL$seG2@tn5DowiFJ$nPj%KW#vMT" that Bitwarden just generated for me isn't inherently bad - until now, that it's been published/leaked.

So I'm not sure I agree that it's all/only about entropy.

I think it illustrates well that secrets, and especially those managed by people (such as passwords and passphrases), have a systemic component: the system seen as a whole incorporates users and other systems as well as any one given system (Gmail is affected by leaked Hotmail accounts, and vice versa).


I'm not disagreeing.

I just wanted to highlight that to break a password you need (a) to guess it and (b) to verify it.

Leaking the hashes affects (b).


Nothing is considered safe, and everything is considered borrowed time if you're storing encrypted data. Retrievable data isn't any less encrypted than hashes; it's just bidirectional encryption.



