Proposal Details

Proposal #261

Passed

Proposal title

Verifiable Compute for Akash Network

Submit time

Deposit end time

Voting start time

Voting end time

Tally result

83.82%

Proposal #261 description

Introduction

Verifiable computing is a class of algorithms and systems in which a particular portion of the compute stack is verifiable/provable in a trustless manner to participants within a decentralized network. Verifiable computing can take many forms, including:

Verifiable provisioning of hardware: This corresponds to the case where we want to verify the nature and extent to which a piece of hardware is provisioned for the Akash network.

Specifically, if a 4090 GPU were to be incorporated in the Akash network, verifiable provisioning ensures that it indeed matches its hardware specifications, and it is genuinely allocated for functions on the Akash network.

Verifiable execution of program/software: This corresponds to the case where a program (any AI program, ranging from inference to training) is correctly executed on a node or set of nodes in the Akash network; for example, that a particular piece of code was executed correctly on a cluster of 4090s on the Akash network. Verifiable execution of programs/software also comes in multiple flavors, including:

Non-real-time: An offline verification mechanism that presents a proof in non-real-time, where the proof has no time or size constraints.

Optimistic, real-time proofs: An optimistic proof mechanism that can be verified or contested in (near) real time.

Zero knowledge, real-time proofs: A zero knowledge proof mechanism that does not reveal anything about the inputs but can still be verified in (near) real time.

In this proposal, which covers the first year of the project, we focus only on the first type of verifiability: the provisioning of hardware. After this first portion of the project is complete, a further proposal will be submitted on non-real-time and, subsequently, real-time verifiable computing within the Akash network. Please review the discussions on GitHub here.

Benefits to Akash Network

The need for verifiable provisioning of hardware is significant for a variety of reasons, including the elimination/reduction of Sybil attacks, and of other forms of misrepresentation and abuse in the network.

Verifiable Hardware Provisioning

Verifiable hardware provisioning can be achieved in a variety of ways, for example by using schemes uniquely associated with particular types of hardware, or by using access patterns and footprints associated with a particular make and model. However, these schemes depend on hardware configurations and do not necessarily generalize well. To develop a scalable, universal solution, we take a trusted enclave (trusted execution environment, TEE) approach as follows:

Akash providers that intend to be “hardware verifiable” are equipped with a TEE configured by Akash (such as Trusty [1]; for more information on TEEs, see the tutorial [2]). This TEE contains a physically unclonable function (PUF, see [3]) that can securely sign transactions. To ensure uniformity, the TEE will be designed as a USB A/C dongle that can be attached to any hardware configuration.

We will verify that the USB A/C dongle can be attached to any hardware configuration and provide a detailed set of instructions to install and use this dongle to enable each provider to become “hardware verifiable” on Akash.

This TEE will periodically perform the following two tasks, based on an internal pseudo-random timer:

Identification task

Following a pseudo-random clock, the TEE will query every GPU at the given Akash provider for its status and device-level details.

Provisioning task

At pseudo-random intervals, a randomly chosen machine learning task will be assigned to the GPUs within this provider. These provisioning tasks are based on existing, well-known benchmarks of GPU performance on deep learning tasks, including particular types of models [4], more general deep learning models [5], and other well-known GPU benchmarks [6].

After each of these pseudo-randomly scheduled tasks concludes, the TEE will securely sign a message with the outcome and share the signed message with the Akash network.
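The schedule-then-sign cycle described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the proposal does not specify the timer distribution, the message format, or the signature scheme. Here the delay is drawn from an exponential distribution, the report is serialized as JSON, and an HMAC-SHA256 over a hypothetical device secret stands in for the PUF-backed signature. The names `next_delay_seconds`, `sign_report`, `verify_report`, and `DEVICE_SECRET` are hypothetical.

```python
import hashlib
import hmac
import json
import random
import secrets

# Hypothetical stand-in for a key derived from the TEE's PUF.
# A real TEE would keep this inside the enclave and would likely use an
# asymmetric signature so that verifiers never hold the secret.
DEVICE_SECRET = secrets.token_bytes(32)

# Pseudo-random generator seeded once from OS randomness (an assumption;
# the proposal only says the timer is pseudo-random).
_rng = random.Random(secrets.randbits(64))


def next_delay_seconds(mean_seconds: float = 3600.0) -> float:
    """Draw the wait before the next task (assumed exponential spacing)."""
    return _rng.expovariate(1.0 / mean_seconds)


def sign_report(task: str, payload: dict, key: bytes = DEVICE_SECRET) -> dict:
    """Serialize a task result deterministically and attach an HMAC-SHA256 tag."""
    body = json.dumps({"task": task, "payload": payload}, sort_keys=True)
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_report(report: dict, key: bytes = DEVICE_SECRET) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(
        key, report["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, report["tag"])
```

With a shared-key HMAC the verifier must hold the same secret as the signer, which is why a deployed design would more plausibly use a public-key signature; the HMAC is used here only because it is available in the standard library.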

The tasks are used to ensure the following properties:

Identification task

The identification task sets up the base configuration for each GPU cluster and assigns that cluster a unique signature associated with the TEE. Because the identification is performed at the operating system level, it can potentially be spoofed; the provisioning/benchmarking tasks are therefore also required.

Provisioning task

The provisioning/benchmarking tasks verify the identification while simultaneously ensuring that the associated GPUs are dedicated to the Akash network and are not prioritizing other tasks. GPUs that are not provisioned for the Akash network will fail the provisioning task.
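One way to read the pass/fail rule above is as a comparison between a measured benchmark result and a reference value for the GPU model reported during identification. The sketch below is an illustration under assumptions, not the proposal's design: the reference throughput figures are placeholders in arbitrary units, the 20% tolerance is arbitrary, and the names `REFERENCE_THROUGHPUT` and `passes_provisioning` are hypothetical.

```python
# Hypothetical reference throughput per GPU model, in arbitrary units.
# These numbers are placeholders, not measured benchmark results.
REFERENCE_THROUGHPUT = {
    "example-gpu-a": 100.0,
    "example-gpu-b": 250.0,
}


def passes_provisioning(
    claimed_model: str, measured: float, tolerance: float = 0.2
) -> bool:
    """Return True if the measured throughput reaches at least
    (1 - tolerance) of the reference for the claimed model.

    A GPU that is a different model than claimed, or that is busy with
    other work, would be expected to fall short of the reference.
    """
    reference = REFERENCE_THROUGHPUT.get(claimed_model)
    if reference is None:
        return False  # unknown model: nothing to compare against
    return measured >= reference * (1.0 - tolerance)
```

For example, a device claiming to be `example-gpu-a` that measures 95.0 would pass, while the same claim with a measurement of 60.0 would fail.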

A key point is that the entire system (user and operating system) cannot differentiate between a provisioning/benchmarking task and a regular AI workload provided by the Akash network, and therefore cannot selectively serve a particular type of workload/task. This ensures that the GPUs are both correctly identified and made available to Akash network-centric tasks at all times.

Team

The team for this project is led by Prof. Sriram Vishwanath of The University of Texas at Austin and includes Shruti Raghavan, a PhD candidate in Computer Science at UT Austin. They are working together with Harvard Medical School and MITRE on the design of new foundation/base models in healthcare, with causal learning incorporated into such a platform.

Sriram Vishwanath received the B.Tech. degree in Electrical Engineering from the Indian Institute of Technology (IIT) Madras, India, in 1998, the M.S. degree in Electrical Engineering from the California Institute of Technology (Caltech), Pasadena, USA, in 1999, and the Ph.D. degree in Electrical Engineering from Stanford University, Stanford, CA, USA, in 2003. He is currently a Professor in the Chandra Department of Electrical and Computer Engineering at The University of Texas at Austin and, more recently, a Technical Fellow for Distributed Systems and Machine Learning at MITRE Labs.

Timeline

Open Discussions: Starting end of June 2024

Governance Proposal: Through first half of July 2024

Design Phase: Through Q3 and Q4 2024

Hacknet TEE Phase: Q1 2025

Devnet TEE Phase: Q2 2025

Conclusion of Hardware Provisioning testing and handover to Akash Team: End of Q2 2025

Note: This is subject to change based on feedback.

Deliverables

Q3 2024 - High Level Design

Q4 2024 - Design Specification

Q1 2025 - Initial Hacknet Prototype

Q2 2025 - Devnet and Conclusion of Testing

Budget

The tentative budget for this project is presented in the spreadsheet attached here.

The high-level breakdown for the budget is:

R&D Costs (Student salaries + tuition + University Overhead): $146,547

Akash Computing/Hardware Costs: $75,000

Volatility and Liquidation Buffer (10%): $22,154.70

Total budget requested: $243,701.70 or 68,842.28 AKT

Wallet Address: akash1sa5quyrpmf3l2acfrwgsy9t34yxpkvwrnqdmm0
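The figures above can be cross-checked with a few lines of arithmetic. The snippet below uses only numbers stated in the proposal; the implied USD/AKT rate is derived from them and is not stated in the original, so treat it as an inference.

```python
from decimal import Decimal

rd_costs = Decimal("146547")        # R&D costs, USD
hardware_costs = Decimal("75000")   # computing/hardware costs, USD

subtotal = rd_costs + hardware_costs   # 221547
buffer = subtotal * Decimal("0.10")    # 10% volatility and liquidation buffer
total_usd = subtotal + buffer          # total requested, USD

total_akt = Decimal("68842.28")        # total requested, AKT
implied_rate = total_usd / total_akt   # derived USD per AKT (not in the source)

print(f"Subtotal: ${subtotal}")
print(f"Buffer:   ${buffer}")
print(f"Total:    ${total_usd}")
print(f"Implied rate: ${round(implied_rate, 2)} per AKT")
```

The computed buffer and total match the stated $22,154.70 and $243,701.70, and the two totals together imply a rate of roughly $3.54 per AKT.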

Disbursement:

Disbursement will happen in two increments, each a few weeks before the beginning of a semester: Fall 2024 (on July 22, 2024) and Spring 2025 (on December 15, 2024).

References

[1] Trusty TEE: Android Open Source Project. https://source.android.com/docs/security/features/trusty

[2] TEE 101 White Paper. https://www.securetechalliance.org/wp-content/uploads/TEE-101-White-Paper-FINAL2-April-2018.pdf

[3] Shamsoshoara, Alireza, et al. "A survey on physical unclonable function (PUF)-based security solutions for Internet of Things." Computer Networks 183 (2020): 107593.

[4] Wang, Yu Emma, Gu-Yeon Wei, and David Brooks. "Benchmarking TPU, GPU, and CPU platforms for deep learning." arXiv preprint arXiv:1907.10701 (2019).

[5] Shi, Shaohuai, et al. "Benchmarking state-of-the-art deep learning software tools." 2016 7th International Conference on Cloud Computing and Big Data (CCBD). IEEE, 2016.

[6] Araujo, Gabriell, et al. "NAS Parallel Benchmarks with CUDA and beyond." Software: Practice and Experience 53.1 (2023): 53-80.

Proposal #261 overview

Total votes

2,338

Voters

2,323

Total deposit

1,000 AKT

Proposal #261 votes

#	Validator	Options
1	Stakewolle.com \| Auto-compound	Yes
2	Chainflow	Yes
3	Akash Af	Yes
4	Cypher Core	Yes
5	Bi23	Yes
6	Kalia Network	Abstain
7	chainvibes	No
8	Dorminik	Yes
9	c29r3	Yes
10	WeStaking	Yes
11	CLOSED - PLEASE REDELEGATE	Abstain
12	SpacePotato	No
13	europlots	Yes
14	Kahuna	Yes
15	0base.vc	Yes
16	Ping	Yes
17	EZ Staking	No
18	Bitoven	Abstain
19	Quasarch	Yes
20	Jormungand \| 0y	Yes
21	GATA HUB	Yes
22	Chorus One	Yes
23	Aurora Staking	No
24	Stakecito	Yes
25	Atomstaking	Yes
26	Arcturian Tech	No
27	Nocturnal Labs	Yes
28	Smart Stake	Abstain
29	Vitwit (Previously Witval)	Yes
30	Nodeasy.com	Yes
31	AutoStake 🛡️ Slash Protected	Abstain
32	ValidatorNode	Yes
33	Ariel Akash Insider \| Powered by NextNet.Works	Yes
34	cosmosrescue	Yes
35	Allnodes	Yes
36	Coinage x DAIC	Yes
37	Kleomedes	Abstain
38	Komichain.com	Yes
39	[Sunsetting, please redelegate] Informal Systems	Yes
40	strangelove	Yes
41	Foundry-USA	Yes
42	Praetor App	Yes
43	Blocc Dynamics	No
44	Neta DAO	Yes
45	Dora Factory Closed	Yes
46	Cloudmos	Yes
47	Easy 2 Stake	Yes
48	Chandra Station	Yes
49	16psyche	Yes
50	5.0 Validator \| Airdrop	Yes
51	ChainodeTech	Yes
52	Crypto and Coffee	No
53	PrithviDevs	Yes
54	Oldcat - airdrop DHK every month	Yes
55	Anonstake	Yes
56	Cosmonaut Stakes 🤖	Abstain
57	Army IDs	Yes
58	Cosmostation	Yes
59	DO NOT DELEGATE	Yes
60	Meria	Yes
61	Active Nodes	Yes
62	Stakin by The Tie	Abstain
63	WhisperNode 🤐	Yes
64	polkachu.com	Yes
65	Undelegate Please	Yes
66	Cosmic Validator \| Auto Compound	Abstain
67	ECO Stake 🌱 \| REStake.app	Yes
68	w3coins	Abstain
69	Stake Frites 🥩 🍟	Yes
70	Lavender.Five Nodes 🐝	Yes
71	Frens (🤝,🤝)	Yes
72	Nansen \| Deprecating - Please redelegate	Yes
73	POSTHUMAN 🧬 StakeDrop	Yes
74	Nodeify	Yes
75	Imperator.co	Abstain
