decentriq's Confidential ML Inference allows deploying machine learning models such that they can be used in a privacy-preserving way. When performing inference with Confidential ML Inference, the data and the model are provably kept confidential from all parties, including decentriq and any infrastructure provider.
Through Python APIs, Confidential ML Inference integrates seamlessly into existing workflows without compromising speed or scalability. This opens up fundamentally new ways for model owners to protect the privacy of their users while enjoying all the advantages of a cloud deployment.
Confidential ML Inference is aimed at companies that want to deploy their ML models in the cloud while offering their users the strongest data privacy guarantees.
To perform inference, both the model and the input data must be on the same computer. In many cases, however, the input data are privacy-sensitive and the model users are reluctant to share them.
Current options fall into two categories, and both require one party to compromise on security or privacy.
Input data shared with model owner - The first option is for the data owner to share their input data with the model owner. This poses privacy risks for the data owner and data-breach risks for the model owner, who must now safeguard sensitive third-party data.
Model shared with model user - The second option is to deploy the model on the model user's premises. However, local deployments cause friction, do not scale, expose the model to the user, and may not be possible at all due to missing infrastructure on the model user's side.
decentriq offers a model inference framework based on confidential computing using the latest advancements in trusted hardware. Confidential ML Inference enables organizations to run ML inference with sensitive input data on public cloud infrastructure without ever exposing the model or the input data to any party, including decentriq and any infrastructure provider.
Model owners upload ML models through a dedicated Python API; data owners obtain inference results through a web application or a Python API. All models supported by the TensorFlow framework can be used, and inference execution time is similar to native non-GPU execution.
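To make the trust model concrete, the toy simulation below sketches the flow described above: each party encrypts its artifact, and only the enclave, which holds the keys after attestation, ever sees plaintexts. All class and function names here are illustrative, not the real decentriq API, and the XOR "cipher" and one-weight "model" are deliberately trivial stand-ins for hardware-backed encryption and a TensorFlow model.

```python
# Toy illustration of confidential inference. NOT the decentriq API:
# names, the XOR "cipher", and the one-weight "model" are all stand-ins.
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for encryption: XOR with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


class Enclave:
    """Models the trusted hardware: the only place plaintexts meet."""

    def __init__(self):
        self._keys = {}

    def provision_key(self, party: str) -> bytes:
        # In reality a key is exchanged only after remote attestation
        # proves the enclave runs the expected code.
        key = secrets.token_bytes(16)
        self._keys[party] = key
        return key

    def infer(self, enc_model: bytes, enc_input: bytes) -> bytes:
        # Decryption happens only inside the enclave.
        weight = int(xor_bytes(enc_model, self._keys["model_owner"]).decode())
        x = int(xor_bytes(enc_input, self._keys["data_owner"]).decode())
        result = str(weight * x).encode()  # toy "model": multiply by weight
        # The result is re-encrypted for the data owner before leaving.
        return xor_bytes(result, self._keys["data_owner"])


enclave = Enclave()

# Model owner: encrypts the model before upload; the cloud never sees it.
model_key = enclave.provision_key("model_owner")
enc_model = xor_bytes(b"3", model_key)

# Data owner: encrypts the input; only the enclave can decrypt it.
data_key = enclave.provision_key("data_owner")
enc_input = xor_bytes(b"14", data_key)

enc_result = enclave.infer(enc_model, enc_input)
print(xor_bytes(enc_result, data_key).decode())  # -> 42
```

The key property the sketch shows is that neither party's plaintext, nor the inference result, is ever visible outside the enclave; the model owner and data owner each decrypt only what is encrypted under their own key.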