TEEval - Trustworthy Evaluation of Private LLMs on Private Datasets

GPU TEE

Secure evaluation environment for trustworthy benchmarking

Protecting both model weights and evaluation integrity during benchmarking

Leaderboard

Please select a dataset to view the leaderboard

Rank	Model	Status	Accuracy
1	GPT-4	Completed	94.2%
2	Claude 3	Completed	93.8%
3	Llama 3	Completed	91.5%
4	Mistral	Completed	89.7%
5	PaLM	Pending	-

Rank	Model	Status	Perplexity
1	GPT-4	Completed	14.2
2	Claude 3	Completed	16.3
3	Llama 3	Completed	18.7
4	Mistral	Completed	20.4
5	PaLM	Pending	-
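
The two tables above order models differently: accuracy ranks highest first, perplexity ranks lowest first, and pending runs sit at the bottom. A minimal sketch of that ordering follows. The `Entry` type, the `rank` function, and the model names are hypothetical and are not taken from TEEval.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    model: str
    score: Optional[float]  # None while the evaluation is still pending


def rank(entries: List[Entry], higher_is_better: bool) -> List[Entry]:
    """Order entries for display: completed runs by score, pending runs last."""
    done = [e for e in entries if e.score is not None]
    pending = [e for e in entries if e.score is None]
    # Accuracy sorts descending; perplexity sorts ascending.
    done.sort(key=lambda e: e.score, reverse=higher_is_better)
    return done + pending


# Hypothetical entries, used only to illustrate the ordering.
entries = [Entry("model-a", 20.4), Entry("model-b", None), Entry("model-c", 14.2)]
by_perplexity = [e.model for e in rank(entries, higher_is_better=False)]
by_accuracy = [e.model for e in rank(entries, higher_is_better=True)]
```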

Verify Evaluation Results

Upload a result file to verify the evaluation certificate

Verification Certificate

Dataset

Please upload to verify...

Hash: Please upload to verify...

Model

Please upload to verify...

Architecture: Please upload to verify...

Hash: Please upload to verify...

Evaluation Results

Please upload to verify...

Certification

Evaluated on: Please upload to verify...

Certificate ID: Please upload to verify...
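
The certificate fields above suggest that a result file records a hash for the dataset and for the model. As a rough illustration, a local check could recompute those hashes and compare them. This is a sketch under stated assumptions: the JSON layout, the field names, and the use of SHA-256 are hypothetical, and the page does not document the actual result-file format. It also does not validate any TEE signature or attestation.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def check_result(result_json: str, dataset_bytes: bytes, model_bytes: bytes) -> dict:
    """Compare hashes recorded in a result file with locally computed ones.

    Assumed (hypothetical) layout:
    {"dataset": {"hash": ...}, "model": {"hash": ...},
     "certification": {"certificate_id": ...}}
    """
    result = json.loads(result_json)
    certification = result.get("certification", {})
    return {
        "dataset_hash_matches": result["dataset"]["hash"] == sha256_hex(dataset_bytes),
        "model_hash_matches": result["model"]["hash"] == sha256_hex(model_bytes),
        "has_certificate_id": bool(certification.get("certificate_id")),
    }


# Usage with made-up bytes standing in for a dataset and model weights.
dataset, weights = b"example dataset", b"example weights"
result_file = json.dumps({
    "dataset": {"hash": sha256_hex(dataset)},
    "model": {"hash": sha256_hex(weights)},
    "certification": {"certificate_id": "example-id"},
})
report = check_result(result_file, dataset, weights)
tampered = check_result(result_file, dataset, b"different weights")
```

A matching hash only shows that the file refers to the same bytes. Trust in the result still depends on the certificate issued by the TEE.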

TEE Verified

How It Works

Trusted Execution Environment (TEE) on GPU

A GPU Trusted Execution Environment (TEE) provides hardware-level isolation for secure computation. It creates a secure enclave in which sensitive data and models are processed without being exposed to the host system or to other processes. Each evaluation runs in its own isolated environment, which protects the privacy and integrity of both the private dataset and the model weights for the duration of the evaluation.

Seamless End-to-End Security with Zero Configuration

All communication with the TEE is encrypted end to end over HTTPS, and the encryption extends across the TEE boundary. A unique certificate is generated within the TEE itself, so no external system can intercept your data. This approach gives the recipient write-only access, which prevents submitted data from leaking back out. The process can be verified directly in your browser with no additional setup, and it protects the full data lifecycle from submission to results retrieval.
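
One common way to check that a connection ends at a specific certificate is to compare the certificate's fingerprint with a value obtained through a trusted channel. The sketch below illustrates that general technique (certificate pinning) with the Python standard library. It is not a description of TEEval's browser-side verification, which this page does not detail. The helper names are hypothetical, and how the expected fingerprint would be published is an assumption.

```python
import hashlib
import hmac
import ssl


def fingerprint_sha256(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, expected_hex: str) -> bool:
    """Check a certificate against an expected fingerprint.

    Accepts fingerprints written with or without colons, in either case.
    """
    normalized = expected_hex.replace(":", "").lower()
    return hmac.compare_digest(fingerprint_sha256(der_cert), normalized)


def fetch_server_cert_der(host: str, port: int = 443) -> bytes:
    """Fetch the certificate a server presents and return it DER-encoded."""
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)


# Offline demonstration with placeholder bytes instead of a real certificate.
placeholder_cert = b"placeholder certificate bytes"
pin = fingerprint_sha256(placeholder_cert)
pin_ok = matches_pin(placeholder_cert, pin)
pin_rejected = matches_pin(b"some other certificate", pin)
```

In practice the expected fingerprint would have to come from something the client already trusts, such as an attestation report. That part is outside this sketch.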

Plug-and-Play Evaluation with Modular Design

TEEval's modular architecture supports a range of datasets and model types with minimal configuration. It integrates with popular frameworks such as Hugging Face and PyTorch, so you can evaluate models regardless of their source. The credential management system lets you keep using familiar tools while transitioning to secure, verifiable versions of them.
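
To make the modular idea concrete, here is a minimal sketch of a plug-in registry in which datasets are registered by name and any callable can act as a model. All names here (`DATASETS`, `register_dataset`, `evaluate`, the toy dataset) are hypothetical and do not reflect TEEval's actual interfaces.

```python
from typing import Callable, Dict, Iterable, Tuple

Example = Tuple[str, str]                    # (prompt, expected answer)
DatasetLoader = Callable[[], Iterable[Example]]
Model = Callable[[str], str]                 # maps a prompt to an answer

DATASETS: Dict[str, DatasetLoader] = {}


def register_dataset(name: str) -> Callable[[DatasetLoader], DatasetLoader]:
    """Decorator that adds a dataset loader to the registry under a name."""
    def decorator(loader: DatasetLoader) -> DatasetLoader:
        DATASETS[name] = loader
        return loader
    return decorator


@register_dataset("toy-arithmetic")
def load_toy_arithmetic() -> Iterable[Example]:
    # Made-up examples, used only for illustration.
    return [("1+1", "2"), ("2+2", "4")]


def evaluate(model: Model, dataset_name: str) -> float:
    """Return exact-match accuracy of a model on a registered dataset."""
    examples = list(DATASETS[dataset_name]())
    correct = sum(1 for prompt, answer in examples if model(prompt) == answer)
    return correct / len(examples)


# A trivial stand-in model that always answers "2".
accuracy = evaluate(lambda prompt: "2", "toy-arithmetic")
```

Under this design, a model loaded through Hugging Face or PyTorch would only need to be wrapped in a function that takes a prompt and returns a string.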