Does the end justify the means?
We all, without exception, have suffered from AI invading our privacy.
It’s a silent attacker, but it’s present in almost everything we do nowadays.
Sadly, unbeknownst to many, AI controls social media, and social media controls and drives our lives, to the point that it knows us better than we know ourselves.
It knows our secret desires, our hidden personality traits… before we even acknowledge some of them ourselves.
For years, it’s had a free pass to the deepest secrets of our lives, things not even our closest friends or family know.
Suddenly, now the world has decided that privacy must be protected at all costs.
About time!
Consequently, a series of regulations across the world are seriously jeopardizing the progress of AI, a technology that needs data more than anything.
AI, without data, is basically nuttin’.
Nuttin’!
Inevitably, after a mildly successful 2022, AI has arrived at a crossroads, and it needs to act quickly if it wants to keep disrupting our world the way it has been promised.
The answer to its problems? Surprisingly, the answer is cryptography.
Privacy is non-negotiable
“The end justifies the means.”
— Niccolò Machiavelli
This now-immortal quote is attributed to the historical figure of Machiavelli, a man who influenced many of the most powerful men and women in human history, Napoleon being a clear example.
Considering that the man who wrote it thought rulers should be “brutal, calculating, and, when necessary, immoral”, I feel pretty confident claiming that the end, more often than not, doesn’t justify the means.
AI? Great, but privacy-centric
AI shouldn’t be allowed to progress if that means stamping over our rights.
It’s time to grow AI, but not without control. However, this is not a simple matter.
AI needs your data to know you, and thus to train algorithms that learn from us in order to serve us, or to manipulate us, depending on the use case.
Consequently, we find ourselves in an utter contradiction: How do we allow AI to grow if we put hurdles on the element — data — that allows it to progress?
Luckily for us, the solution is now in the palm of our hands.
The power of proving without showing
Imagine you’re a research scientist trying to cure cancer.
For whatever reason, you’ve made an incredible discovery: with certain patient data, you could potentially predict cancer at a very early stage and thus prevent it from spreading, or at least slow it down.
Sounds enticing, right? Well, unsurprisingly, the element that can allow for this is none other than data.
But there’s a problem.
Patient data is protected by very strict confidentiality clauses, so hospitals can’t simply hand it out like candy. It’s not surprising, then, that this data remains siloed in the hospitals’ data centers.
This fact, inevitably, hinders the capacity of AI to disrupt the healthcare sector, a sector that has enormous potential for AI-based use cases that would save lives around the world.
Up until now, this problem was unsolvable. How do we leverage this data without compromising patients’ right to privacy?
Let’s not get blinded by Machiavelli’s quote; people have every right not to disclose their illnesses to society.
Therefore, privacy is non-negotiable, and private data can’t be used to train the AI models that, ironically, could potentially save the lives of those patients whose privacy we’re protecting.
Death for the sake of privacy seems like nonsense, but let’s not forget that we have no certainty these models would actually work.
Thus, the more suitable dichotomy is “Surrendering privacy for the sake of life… maybe?”.
As no evidence clearly suggests that the AI models would work (over 90% of AI models fail to achieve the expected results), we can’t forsake privacy in return for a “maybe”.
Naturally, this problem remains unsolvable unless we find a way to process data while preserving privacy.
Seems impossible, right?
Well, it isn’t, anymore.
The case of federated learning
At first, one might think of federated learning.
With federated learning, we can train AI models in a distributed manner, so raw data never needs to be shared.
However, federated learning has several disadvantages that seriously cripple its capacity to deliver.
Model consolidation of weights and parameters still has to be done in a centralized manner, and the different research teams have to trust that the other teams are properly executing their training (teams can tamper with data to skew results, manipulate weights, etc.).
This unsurprisingly leads to discrepancies and power battles between researchers, as a trust-based working model will always make people skeptical of what others are doing.
Needless to say, researchers are tempted to tamper with their models to achieve greater success among their peers.
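To make that trust gap concrete, here is a minimal, hypothetical sketch of federated averaging (the aggregation scheme behind most federated learning) using toy one-parameter models; the clients, data, and learning rate are all invented for illustration. Raw data never leaves each client, yet notice that the server has no way to verify the weights a client reports.

```python
def local_train(weight, data, lr=0.02, steps=20):
    """Toy local training: gradient descent on mean squared error
    for the model y = w * x, using only this client's private data."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

def federated_average(weights, sizes):
    """Server-side aggregation, weighted by each client's dataset size.
    Nothing here verifies that clients trained honestly."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(weights, sizes)) / total

# Two hypothetical hospitals hold private samples of y = 3x; the raw
# (x, y) pairs never leave them, only the trained weights do.
client_a = [(1.0, 3.0), (2.0, 6.0)]
client_b = [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)]

w_global = 0.0
for _ in range(5):  # communication rounds
    w_a = local_train(w_global, client_a)
    w_b = local_train(w_global, client_b)
    w_global = federated_average([w_a, w_b], [len(client_a), len(client_b)])

print(round(w_global, 2))  # converges toward 3.0
```

A dishonest client could return any number from `local_train` and the server would average it in blindly, which is exactly the trust problem described above.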
But what if we had a way to ensure homogeneous execution by enabling trustless environments where teams can collaborate in a decentralized manner, or, in an even more powerful scenario, share that data among parties while still protecting privacy?
The answer is zero
Zero-knowledge proofs are a cryptographic primitive that allows proving, with high certainty, the veracity of a statement, without showing any additional information besides the fact that the statement is true.
In other words, it’s managing to convince an entity that a certain statement is true, while revealing no other information.
Quickly, a question comes to mind: how can we prove something without showing why it is true?
Let’s see this with a quick example:
You have two pens, a green one and a red one. They are identical in everything else: form, shape, touch, weight, and so on. The only way to tell them apart is by color.
You want to prove to me that, despite their countless similarities, they are actually different.
But there’s a catch: I’m color-blind.
Thus, I have no way to tell the pens apart, because I am incapable of seeing that they are different colors.
Consequently, the only way I could know with full certainty that they differ is if you told me they are, indeed, different colors. Naturally, you’re inclined to think this is the only way you can convince me they are different, right?
Well, you’re wrong. You can make me play a game.
You give me the two pens and tell me to put each one of them in a hand and put both hands behind my back. This way, I have the capacity to switch pens in my hands without you seeing the switch.
As I am missing a critical piece of information, right now I am convinced they are the same and that this will simply be a guessing game for you.
I show my hands and you automatically detect I’ve switched pens between hands. I’m intrigued, but nevertheless not convinced.
There was a 50% chance you would guess my switch, right? Automatically, you offer to play the game again.
The result?
This time I didn’t switch the pens, but you still guessed right. Now I’m starting to feel annoyed: if you were merely guessing, your chance of being right twice in a row was only 25%.
We play the game five more times.
All five times you’ve guessed right, one time after the other.
Now I’m actually amazed: you’ve guessed the correct result seven straight times. Each round gives a mere guesser a 50% chance, so if you were guessing, the probability of being right seven times in a row was 0.5⁷ ≈ 0.78%.
Therefore, although we can’t reach full certainty, you have somehow convinced me that those two pens are different, without revealing why they’re different.
And that is a zero-knowledge proof: the capacity to prove a statement with extremely high certainty, without revealing any information besides the fact that the statement is, indeed, true.
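The pen game is easy to simulate. The sketch below is a toy model, with a "can the prover see colors?" flag standing in for actually knowing the secret; it shows why repetition builds certainty: an honest prover always passes, while a cheating (guessing) prover survives all seven rounds only about 0.78% of the time.

```python
import random

def run_protocol(prover_can_see_colors, rounds=7):
    """One full interactive proof: the verifier secretly switches (or not)
    each round; the prover must say whether a switch happened.
    Returns True only if the prover answers correctly every round."""
    for _ in range(rounds):
        switched = random.random() < 0.5       # verifier's secret choice
        if prover_can_see_colors:
            answer = switched                  # honest prover always knows
        else:
            answer = random.random() < 0.5     # cheater can only guess
        if answer != switched:
            return False
    return True

random.seed(0)
trials = 100_000
cheater_wins = sum(run_protocol(False) for _ in range(trials)) / trials
honest_wins = sum(run_protocol(True) for _ in range(trials)) / trials
print(f"honest prover convinces: {honest_wins:.3f}")     # always 1.000
print(f"cheating prover convinces: {cheater_wins:.4f}")  # ≈ 0.5**7 ≈ 0.0078
```

Each extra round halves the cheater’s chances, so the verifier can drive the soundness error as low as desired simply by playing more rounds.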
Understanding this, and comprehending the actual limitations of AI, how can we bridge both concepts to achieve the outcome we’re after?
The options are limitless
With zk-proofs, AI suddenly has the tools to protect privacy while still training models in a business-as-usual style.
Let’s see this with a handful of examples:
- Zero-knowledge proofs can allow multi-team projects to collaborate in a trustless environment, as each team can train their data separately and include a zk-proof that proves to the rest of the teams that the model has been trained accordingly, and with very high certainty that results haven’t been tampered with.
- Data can be shared among silos by completely anonymizing it while including zk-proofs that give high certainty that the data is, indeed, real. This is particularly important because data anonymization today more often than not causes a heavy loss of granularity that hurts model performance. With zk-proofs, data can be shared at full granularity, anonymized, and without fear of it being false.
- Zk-proofs also allow training models while storing the important parameters on a blockchain. Storing data on-chain is expensive, so you keep only the most security-critical data on-chain while outsourcing execution off-chain. By including a zk-proof, you can verify that the off-chain computations were performed to the required standard, without having to store the data and run the execution on the blockchain itself.
- Zk-proofs can also be combined with Fully Homomorphic Encryption, a technique that allows data to be processed and used for training while remaining encrypted, so that highly confidential data can be shared among separate teams in a safe environment.
This revolution will allow technology to progress with ease while respecting the barriers that ethics rightly imposes on it.