Bounty - EVM OpCodes and Precompiles in Motoko
Austin Fatheree, February 09 2024
These bounties were funded by the IC community through the Gitcoin Grants Season 19 initiative. Thank you all for your contribution.
We are funding the creation of a set of Motoko EVM opcodes and precompiles. Implementing Ethereum Virtual Machine (EVM) opcodes in Motoko is an educational and potentially foundational exercise. The benefits to the broader ecosystem are, first, allowing Motoko canisters to simulate transactions and, longer term, building a Motoko EVM that can be used to simulate other networks or bootstrap new, purpose-built EVMs.
Please consult this list of opcodes and this list of precompiles for more information about EVM Opcodes and Precompiles. For more information on the implementation of each opcode and precompile
We do not currently have a fully functioning EVM in Motoko. For this bounty we will assume the following execution context as necessary to complete the bounties. In the future a full EVM engine can populate these fields:
import Stack "mo:base/Stack";
import Vec "mo:vector"; // https://github.com/research-ag/vector
// The Map and Trie modules are not pinned to a package in this bounty. Import the map and trie modules you choose under the names Map and Trie.
type Address = Blob;
type Byte = Nat8;
type Word = Nat; // 256-bit for EVM compliance. Opcodes must validate that results do not exceed 256 bits and must take overflows into consideration.
type OpCode = (Nat8, ?Blob); // Opcodes range from 0x00 to 0xFF, plus an optional set of bytes that can be included.
// A simplified representation of the stack element in EVM.
type StackElement = Nat; // May need to represent 256-bit integers.
// A simplified structure for representing EVM memory.
// Uses https://github.com/research-ag/vector
type Memory = Vec.Vector<Byte>;
// Represents the EVM storage, mapping 32-byte keys to 32-byte values.
type Storage = Map.Map<[Nat8], [Nat8]>;
type LogEntry = {
  topics : Vec.Vector<Blob>; // Topics are usually the hashed event signature and indexed parameters.
  data : Blob; // Non-indexed event parameters.
};
type Logs = Vec.Vector<LogEntry>;
type StorageSlotChange = {
  key : Blob; // Storage key, typically a 32-byte array.
  originalValue : ?[Nat8]; // Optional, the value before the change. `null` can indicate the slot was empty.
  newValue : ?[Nat8]; // Optional, the value after the change. `null` can indicate a deletion.
};
type CodeChange = {
  key : Blob; // Key of the code entry, typically a 32-byte array.
  originalValue : [OpCode]; // The code before the change.
  newValue : ?[OpCode]; // Optional, the code after the change. `null` can indicate a deletion.
}; // Code may not be changeable, only deletable.
// The execution context of an EVM call.
// Fields without `var` are immutable in Motoko. An implementation that updates fields such as programCounter or currentGas in place would need to declare them `var`.
type ExecutionContext = {
  origin : Blob; // Originator of the transaction.
  code : [OpCode]; // Array of opcodes constituting the smart contract code.
  programCounter : Nat; // Points to the current instruction in the code.
  stack : Stack.Stack<StackElement>; // The stack used for instruction params and return values.
  memory : Memory; // Memory accessible during execution.
  contractStorage : Storage; // Persistent storage for smart contracts.
  caller : Address; // Address of the call initiator.
  callee : Address; // Address of the contract being executed.
  currentGas : Nat; // Amount of gas available for the current execution.
  gasPrice : Nat; // Current gas price.
  incomingEth : Nat; // Amount of ETH included with the call.
  balanceChanges : Vec.Vector<{
    from : Blob;
    to : Blob;
    amount : Nat;
  }>; // Keeps track of ETH balance changes to commit at the end. Each new context will have to adjust balances based on this vector.
  storageChanges : Map.Map<Blob, StorageSlotChange>;
  codeAdditions : Map.Map<Blob, CodeChange>; // Storage DB for EVM code stored by hash key.
  blockHashes : Vec.Vector<(Nat, Blob)>; // Up to the last 256 block numbers and hashes.
  codeStore : Map.Map<Blob, [OpCode]>; // Storage DB for EVM code stored by hash key.
  storageStore : Trie.Trie<Blob, Blob>; // Storage DB for contract storage stored by hash key. CALL implementors will need to keep track of storage changes and revert storage if necessary.
  accounts : Trie.Trie<Blob, Blob>; // A Merkle Patricia tree storing [binary_nonce, binary_balance, storage_root, code_hash] as RLP-encoded data. The account bounty hunter will need to create encoders/decoders for use with the trie - https://github.com/relaxed04/rlp-motoko - https://github.com/f0i/merkle-patricia-trie.mo
  logs : Logs; // Logs produced during execution.
  var totalGas : Nat; // Used for keeping track of gas.
  var gasRefund : Nat; // Used for keeping track of gas refunded.
  var return : ?Blob; // Set for return. Note: `return` is a reserved word in Motoko, so a compiling implementation must pick another name for this field.
  blockInfo : {
    number : Nat; // Current block number.
    gasLimit : Nat; // Current block gas limit.
    difficulty : Nat; // Current block difficulty.
    timestamp : Nat; // Current block timestamp.
    coinbase : Blob;
    chainId : Nat;
  };
  calldata : Blob; // Input data for the contract execution.
};
Your opcode functions should take in the execution context as an input variable and update it as demanded by the opcode.
This document categorizes and lists the opcodes in numerical order within each category, providing a structured schema for implementation in the Motoko programming language.
01 ADD
02 MUL
03 SUB
04 DIV
05 SDIV
06 MOD
07 SMOD
08 ADDMOD
09 MULMOD
0A EXP
0B SIGNEXTEND
10 LT
11 GT
12 SLT
13 SGT
14 EQ
15 ISZERO
16 AND
17 OR
18 XOR
19 NOT
1A BYTE
1B SHL
1C SHR
1D SAR
The Basic Math and Bitwise Logic section encompasses a collection of EVM opcodes dedicated to performing elementary arithmetic operations, such as addition, subtraction, multiplication, and division, as well as bitwise operations including AND, OR, XOR, and NOT. These opcodes serve as the foundational building blocks for more complex contract logic and computations on the Ethereum Virtual Machine (EVM), and replicating their functionality within the Motoko environment is essential for EVM compatibility.
Implementing these opcodes within the Motoko programming language requires careful attention to the specifics of each operation, including handling overflows, underflows, and division by zero in accordance with the EVM’s behavior. Each opcode function should accept the ExecutionContext as an input variable and modify this context as dictated by the opcode’s semantics, ensuring changes are reflected across the stack, memory, and any intermediate computations.
For arithmetic operations such as ADD, SUB, MUL, and DIV, the opcodes should manage 256-bit integer arithmetic, respecting the bounds of these operations as per the EVM specification. Bitwise operations (AND, OR, XOR, NOT) operate directly on the binary representations of these integers, enabling manipulation of data at the bit level.
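As a rough illustration of the wrapping behavior described above, the following sketch models EVM words as Motoko Nat values reduced modulo 2^256. The names MODULUS, add, sub, mul, div, and bitNot are illustrative assumptions, not part of the bounty specification. The helpers assume their inputs are already below 2^256 and leave out stack handling and gas accounting.

```motoko
// All EVM words live in the range [0, 2^256).
let MODULUS : Nat = 2 ** 256;

// ADD wraps around on overflow.
func add(a : Nat, b : Nat) : Nat { (a + b) % MODULUS };

// SUB wraps around on underflow. Adding MODULUS first keeps the
// Nat subtraction non-negative for inputs below MODULUS.
func sub(a : Nat, b : Nat) : Nat { (MODULUS + a - b) % MODULUS };

// MUL keeps only the low 256 bits of the product.
func mul(a : Nat, b : Nat) : Nat { (a * b) % MODULUS };

// DIV yields 0 on division by zero instead of trapping, as the EVM does.
func div(a : Nat, b : Nat) : Nat { if (b == 0) 0 else a / b };

// NOT flips all 256 bits.
func bitNot(a : Nat) : Nat { MODULUS - 1 - a };
```

Because Nat is unbounded in Motoko, the reduction step is what enforces the 256-bit limit; a real opcode function would pop its operands from the context's stack and push the reduced result.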
30 ADDRESS
31 BALANCE
32 ORIGIN
33 CALLER
34 CALLVALUE
35 CALLDATALOAD
36 CALLDATASIZE
37 CALLDATACOPY
38 CODESIZE
39 CODECOPY
3A GASPRICE
3B EXTCODESIZE
3C EXTCODECOPY
3D RETURNDATASIZE
3E RETURNDATACOPY
3F EXTCODEHASH
40 BLOCKHASH
41 COINBASE
42 TIMESTAMP
43 NUMBER
44 DIFFICULTY
45 GASLIMIT
46 CHAINID
47 SELFBALANCE
48 BASEFEE
The Environmental Information category of opcodes allows smart contracts to access information about the blockchain environment in which they are executed. These opcodes provide functionalities to retrieve details such as the address of the caller, the contract itself, the balance of an account, input data of a call, and more. Implementing these opcodes in Motoko is crucial for building an EVM-compatible environment, enabling smart contracts to make decisions based on their current execution context.
When implementing these opcodes, the bounty hunter should consider that each opcode is designed to retrieve specific environmental information and interact with the ExecutionContext data structure accordingly. The structure of your opcode calls should be as follows:
Input Handling: Each opcode function should accept the ExecutionContext as an input. This context contains all necessary information about the current state of execution, such as caller’s address, contract address, call value, and more.
Operation Execution: Based on the opcode being implemented, extract the required information from the appropriate field within the ExecutionContext. For example, CALLER would retrieve the address of the initiator of the current call from the caller field.
Stack Update: After retrieving the required information, push the result onto the stack contained within the ExecutionContext. Ensure that the data type and size are compliant with the EVM standards (e.g., addresses and balances should be represented as 256-bit integers).
Return and Update: The opcode function should not return any value directly. Instead, it updates the ExecutionContext that was passed as an input, reflecting changes on the stack, and any other relevant modifications based on the opcode’s logic.
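The four steps above can be sketched for CALLER roughly as follows. MiniContext is a hypothetical cut-down stand-in for the full ExecutionContext, and blobToNat and opCaller are illustrative names; gas deduction is omitted.

```motoko
import Stack "mo:base/Stack";
import Blob "mo:base/Blob";
import Nat8 "mo:base/Nat8";

// Hypothetical subset of ExecutionContext, just enough for CALLER.
type MiniContext = {
  caller : Blob;
  stack : Stack.Stack<Nat>;
};

// Interprets a big-endian byte string as an unsigned integer.
func blobToNat(b : Blob) : Nat {
  var n = 0;
  for (byte in Blob.toArray(b).vals()) {
    n := n * 256 + Nat8.toNat(byte);
  };
  n
};

// CALLER: read the caller field, widen it to a stack word, push it.
// Nothing is returned; the context's stack is updated in place.
func opCaller(ctx : MiniContext) {
  ctx.stack.push(blobToNat(ctx.caller));
};
```

A 20-byte address always fits in a 256-bit word, so no reduction is needed after the conversion.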
Balance Queries (BALANCE): When implementing opcodes like BALANCE, ensure you’re querying the correct information source within the ExecutionContext. For instance, you might need to access both the accounts data structure and the balanceChanges vector to accurately calculate the current balance of a queried address, taking into account any in-flight transactions or changes during execution.
Environmental Data (CALLDATALOAD, CALLDATASIZE, CALLDATACOPY): For opcodes that interact with call data, make sure to handle data offsets and lengths accurately, ensuring safe access to the calldata field without risking out-of-bounds errors.
Gas and Blockchain Context (GASPRICE, BLOCKHASH, CHAINID): Implementing these opcodes requires careful handling of the ExecutionContext’s fields related to blockchain context and execution gas. Remember to access the correct information, considering potential updates during transaction execution.
Block information opcodes like TIMESTAMP, NUMBER, and DIFFICULTY are useful for operations that depend on blockchain specifics, like generating randomness or enforcing time-dependent conditions.
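To make the point about safe calldata access concrete, here is a minimal sketch of the read performed by CALLDATALOAD: it takes 32 bytes starting at an offset and treats any byte past the end of the calldata as zero, so an out-of-range offset never traps. The function name callDataLoad is an assumption for illustration, and stack and gas handling are left out.

```motoko
import Blob "mo:base/Blob";
import Nat8 "mo:base/Nat8";

// Reads a 32-byte big-endian word from calldata at `offset`.
// Bytes beyond the end of the calldata are read as zero.
func callDataLoad(calldata : Blob, offset : Nat) : Nat {
  let bytes = Blob.toArray(calldata);
  var word = 0;
  var i = 0;
  while (i < 32) {
    let pos = offset + i;
    let byte = if (pos < bytes.size()) Nat8.toNat(bytes[pos]) else 0;
    word := word * 256 + byte;
    i += 1;
  };
  word
};
```

Checking the position before indexing is what avoids the out-of-bounds trap that a direct array access would cause in Motoko.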
50 POP
51 MLOAD
52 MSTORE
53 MSTORE8
54 SLOAD
55 SSTORE
56 JUMP
57 JUMPI
58 PC
59 MSIZE
5A GAS
5B JUMPDEST
The Memory Operations category encompasses a variety of EVM opcodes designed to interact with and manipulate the memory space available during the execution of smart contracts. Memory in the EVM is a volatile data storage area that is erased between external function calls and transactions. The primary purpose of these operations is to enable the reading, writing, and management of data in memory during contract execution, allowing for dynamic data manipulation within the scope of a single transaction or function call.
Implementation of memory operations opcodes in Motoko requires understanding and manipulation of the Memory data structure within the ExecutionContext. Memory operations include loading data from memory (MLOAD), storing data in memory (MSTORE, MSTORE8), and querying the size of the active memory space (MSIZE). Additionally, there are opcodes dedicated to jumping within the contract code, based on conditions (JUMP, JUMPI, JUMPDEST), and accessing or modifying the program counter (PC).
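For the jump opcodes mentioned above, destination checking might look roughly like the sketch below. It assumes that programCounter indexes into the decoded OpCode array from the execution context (rather than into raw bytecode), and the names isValidJumpDest and jumpi are hypothetical. A null result stands for an invalid jump that the caller should treat as an execution error.

```motoko
type OpCode = (Nat8, ?Blob);

let JUMPDEST : Nat8 = 0x5B;

// True only if `dest` lies inside the code and points at a JUMPDEST.
func isValidJumpDest(code : [OpCode], dest : Nat) : Bool {
  dest < code.size() and code[dest].0 == JUMPDEST
};

// Computes the next program counter for JUMPI.
// A zero condition falls through to the next instruction;
// a nonzero condition jumps, but only to a valid JUMPDEST.
func jumpi(code : [OpCode], pc : Nat, dest : Nat, condition : Nat) : ?Nat {
  if (condition == 0) { ?(pc + 1) }
  else if (isValidJumpDest(code, dest)) { ?dest }
  else { null }
};
```

JUMP is the same check without the condition, so an implementation could reuse isValidJumpDest for both opcodes.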
Memory Access: The Memory type, represented as a vector of bytes (Vec<Byte>), should be accessed and modified by memory opcodes. For instance, MLOAD reads a specific location from memory, while MSTORE writes to a given location.
Data Encoding and Decoding: Memory operations might require encoding data into EVM’s big-endian format before storing and decoding it back into Motoko’s native types upon loading. Careful management of data sizes and alignments according to the specification is crucial.
Dynamic Memory Expansion: The size of the Memory should dynamically increase to accommodate writes to previously unallocated areas. This expansion should be reflected in the MSIZE operation and factored into gas calculations, as memory expansion incurs gas costs.
Jump Operations: Implementations of JUMP and JUMPI must validate the destination against a list of valid jump destinations (denoted by JUMPDEST opcodes) within the contract code. This is a critical security mechanism to prevent unauthorized jumps to arbitrary code locations.
Memory Initialization: Initially, memory is empty. The first store operation should allocate memory space dynamically, conforming to the EVM gas costing for memory expansion.
Bounds and Safety Checks: Memory operations should include bounds checking to prevent overflows and underflows. For instance, attempting to read beyond the current memory size should either result in an error or return zeros (consistent with EVM behavior).
Program Counter Management: For jump operations (JUMP, JUMPI), updating the programCounter within the ExecutionContext accurately is critical to ensure proper execution flow. Validation against JUMPDEST instructions ensures that jumps are only made to authorized points in the code.
Gas Calculation: The implementation must calculate the gas costs for memory operations, particularly for memory expansion. This includes updating the currentGas within the ExecutionContext, following the EVM’s gas pricing structure.
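The memory points above (big-endian encoding, expansion in 32-byte words, zero-filled reads) can be sketched as follows. To stay self-contained, this sketch uses Buffer from the Motoko base library as a stand-in for the vector library named in the execution context; the function names expand, mstore, and mload are assumptions, and memory-expansion gas is deliberately not charged here.

```motoko
import Buffer "mo:base/Buffer";
import Nat8 "mo:base/Nat8";

// Grows memory with zero bytes so that it covers [0, end),
// rounded up to a multiple of 32 bytes. MSIZE would be mem.size().
func expand(mem : Buffer.Buffer<Nat8>, end : Nat) {
  let target = ((end + 31) / 32) * 32;
  while (mem.size() < target) { mem.add(0) };
};

// MSTORE: writes `value` as a 32-byte big-endian word at `offset`.
func mstore(mem : Buffer.Buffer<Nat8>, offset : Nat, value : Nat) {
  expand(mem, offset + 32);
  var v = value;
  var i = 32;
  while (i > 0) {
    i -= 1;
    mem.put(offset + i, Nat8.fromNat(v % 256));
    v /= 256;
  };
};

// MLOAD: reads a 32-byte big-endian word at `offset`,
// expanding memory first so unwritten bytes read as zero.
func mload(mem : Buffer.Buffer<Nat8>, offset : Nat) : Nat {
  expand(mem, offset + 32);
  var word = 0;
  var i = 0;
  while (i < 32) {
    word := word * 256 + Nat8.toNat(mem.get(offset + i));
    i += 1;
  };
  word
};
```

A full implementation would compute the expansion cost from the size before and after expand and deduct it from currentGas before touching memory.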
5F PUSH0
60 PUSH1
61 PUSH2
62 PUSH3
63 PUSH4
64 PUSH5
65 PUSH6
66 PUSH7
67 PUSH8
68 PUSH9
69 PUSH10
6A PUSH11
6B PUSH12
6C PUSH13
6D PUSH14
6E PUSH15
6F PUSH16
70 PUSH17
71 PUSH18
72 PUSH19
73 PUSH20
74 PUSH21
75 PUSH22
76 PUSH23
77 PUSH24
78 PUSH25
79 PUSH26
7A PUSH27
7B PUSH28
7C PUSH29
7D PUSH30
7E PUSH31
7F PUSH32
80 DUP1
81 DUP2
82 DUP3
83 DUP4
84 DUP5
85 DUP6
86 DUP7
87 DUP8
88 DUP9
89 DUP10
8A DUP11
8B DUP12
8C DUP13
8D DUP14
8E DUP15
8F DUP16
90 SWAP1
91 SWAP2
92 SWAP3
93 SWAP4
94 SWAP5
95 SWAP6
96 SWAP7
97 SWAP8
98 SWAP9
99 SWAP10
9A SWAP11
9B SWAP12
9C SWAP13
9D SWAP14
9E SWAP15
9F SWAP16
PUSH1 (0x60) to PUSH32 (0x7F): These opcodes push 1 to 32 bytes onto the stack, respectively. The number of bytes to push is determined by the opcode. For example, PUSH1 pushes 1 byte and PUSH32 pushes 32 bytes onto the stack. The bytes are read from the program immediately following the opcode.
DUP1 (0x80) to DUP16 (0x8F): These opcodes duplicate the 1st to 16th stack element to the top of the stack, respectively. For example, DUP1 duplicates the top stack element, and DUP16 duplicates the 16th stack element from the top.
SWAP1 (0x90) to SWAP16 (0x9F): These opcodes swap the top stack element with one of the 1st to 16th elements below it. For example, SWAP1 swaps the top two elements of the stack, and SWAP16 swaps the top element with the 16th element below it.
... DUP operation. Attempting to duplicate an element not present should result in an error.
... currentGas in the ExecutionContext for each operation performed.
A0 LOG0
A1 LOG1
A2 LOG2
A3 LOG3
A4 LOG4
Logging operations in the EVM are essential for emitting events that can be consumed by external entities monitoring blockchain activity. These opcodes (LOG0 to LOG4) allow smart contracts to record indexed information and data blobs, which external applications can use to track contract events, state changes, or any notable occurrences dictated by the contract logic.
Implementing logging opcodes in Motoko requires interaction with the ExecutionContext to update the logs vector with new log entries. Each LOG opcode differs in the number of topics it allows for indexing, ranging from zero (LOG0) to four (LOG4). The data portion of the log is a binary blob, which can contain arbitrary data from the contract’s execution environment.
Bounty hunters implementing these opcodes should structure their operations as follows:
... LOG opcode being executed.
... LogEntry by packaging the extracted topics and data. The structure of a LogEntry includes a list of topics (Vec<Blob>) and the data blob itself (Blob).
... LogEntry to the logs vector within the ExecutionContext. This ensures that all emitted logs are captured in the context of the current transaction execution.
... currentGas field in the ExecutionContext appropriately ensures that gas usage reflects the computational and storage resources consumed by logging.
... LOG opcode used. Ensure that your implementation respects these limits and properly handles cases where too many topics are provided.
... ExecutionContext.
Logging operations do not produce a direct output on the stack but modify the execution context’s state by appending new entries to the logs vector. This indirect output is instrumental for off-chain applications and tools to monitor, index, and interpret contract activity, making these operations crucial for contract transparency and external integration.
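The packaging step for a log entry could be sketched as below. For a self-contained example, topics are held in a plain array and the log list in a base-library Buffer, both stand-ins for the vector type used in the execution context. The names wordToBlob and makeLog are hypothetical, and reading the data from memory, popping the stack, and charging gas are omitted.

```motoko
import Array "mo:base/Array";
import Blob "mo:base/Blob";
import Buffer "mo:base/Buffer";
import Nat8 "mo:base/Nat8";

// Simplified LogEntry with an array of topics instead of a vector.
type LogEntry = { topics : [Blob]; data : Blob };

// Encodes a stack word as a 32-byte big-endian blob.
func wordToBlob(word : Nat) : Blob {
  let bytes = Array.tabulate<Nat8>(
    32,
    func(i : Nat) : Nat8 { Nat8.fromNat((word / (256 ** (31 - i))) % 256) },
  );
  Blob.fromArray(bytes)
};

// Builds a log entry from topic words and data bytes.
// Returns null if more than four topics are supplied (LOG4 is the maximum).
func makeLog(topicWords : [Nat], data : [Nat8]) : ?LogEntry {
  if (topicWords.size() > 4) { return null };
  ?{
    topics = Array.map<Nat, Blob>(topicWords, wordToBlob);
    data = Blob.fromArray(data);
  }
};

// Appending to the log list mirrors the update of ctx.logs.
let logs = Buffer.Buffer<LogEntry>(4);
switch (makeLog([1, 2], [0xAA, 0xBB])) {
  case (?entry) { logs.add(entry) };
  case null {};
};
```

In an opcode function, the number of topics would be fixed by the opcode itself (LOG0 to LOG4), so the size check acts as a guard against a dispatcher bug rather than against contract input.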
00 STOP
FD REVERT
FE INVALID
FF SELFDESTRUCT
F0 CREATE
F1 CALL
F2 CALLCODE
F3 RETURN
F4 DELEGATECALL
F5 CREATE2
FA STATICCALL
FB TXHASH
FC CHAINID
This category encompasses a range of EVM opcodes designed for controlling contract execution flow, system-level interactions, and the creation and management of contracts. These operations are critical for implementing contract logic that responds to execution conditions, interfaces with other contracts, and dynamically generates new contracts. Understanding the nuances of these opcodes is essential for building compliant and secure smart contracts on an EVM-compatible platform like the Internet Computer (IC) using Motoko.
Execution and system operations, within the context of Motoko and the Internet Computer’s architecture, must meticulously manage the ExecutionContext to accurately reflect changes in state, control flow, and contract interactions. The design of these opcode functions demands careful consideration of the IC’s unique features, such as cycles management and canister interactions, while adhering to EVM specifications. Here are crucial points to consider while implementing these opcodes:
Flow Control: Opcodes like STOP, RETURN, and REVERT are fundamental in managing the execution flow. They dictate the end of execution, returning data to the caller or reverting state changes, respectively. Implementations must ensure that these opcodes accurately update the ExecutionContext, particularly setting the return field where appropriate, and managing gas accounting for partial or full executions.
Error and Exception Handling: The INVALID opcode represents an explicit exception in contract execution, typically leading to the termination of execution and reverting of all changes. Motoko implementations must correctly signal errors and ensure that state changes are not persisted in such cases, aligning with the atomic transaction model of the EVM.
Contract Creation and Interaction: The CREATE, CREATE2, CALL, and DELEGATECALL opcodes facilitate the dynamic creation of contracts and interaction between contracts. These require intricate handling of the ExecutionContext to simulate nested transactions/calls within the IC’s environment. Implementors must manage the creation of new ExecutionContexts for each call or contract creation, accurately passing gas, value, and data between contexts, and correctly merging state changes upon successful completion.
System Level Information and Operations: Opcodes like CHAINID and TXHASH provide access to blockchain-specific data. Implementations must derive these values from the IC’s environment or simulate equivalent values where direct analogues might not exist.
State Isolation and Commitment: Ensure isolation between execution contexts for CALL and CREATE operations, committing state changes only upon successful execution. Revert state changes on operation failures.
Gas Management: Accurately calculate and deduct gas costs for execution and system operations, updating currentGas in ExecutionContext. Implement gas forwarding rules for calls and creations, respecting gas stipends for calls with value transfers.
Return Data Handling: For the RETURN and REVERT operations, properly set the return field in the ExecutionContext to manage returning data to the caller or reverting state with an error message.
Secure Contract Interaction: Validate destination addresses for CALL and DELEGATECALL operations, ensuring they reference valid contracts or precompiles. For CREATE and CREATE2, implement address generation according to EVM specifications and ensure non-collision with existing addresses.
SELFDESTRUCT Implementation: Implement self-destruct functionality with caution, considering the permanence of such an action within the IC’s architecture. This might involve marking contracts as inactive rather than deleting them, reflecting the EVM’s semantics while aligning with the IC’s model.
Security and Compliance: Rigorously test opcode implementations against common security pitfalls and compliance with EVM specifications. This includes handling deep call stacks, stack underflows/overflows, and ensuring atomicity of transactions.
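As one concrete piece of the gas management point above, the sketch below computes how much gas a CALL passes to the callee. It assumes the rule introduced by EIP-150 (at most all but one 64th of the remaining gas is forwarded) and the 2300 gas stipend added for calls that transfer value. The function name forwardedGas is hypothetical, and `available` is taken to be the gas left after the call's own costs have been deducted.

```motoko
import Nat "mo:base/Nat";

// Stipend added to the callee's gas when the call transfers value.
let CALL_STIPEND : Nat = 2300;

// Gas handed to the callee of a CALL.
// `available` is the caller's gas after the call's own cost,
// `requested` is the gas argument taken from the stack.
func forwardedGas(available : Nat, requested : Nat, value : Nat) : Nat {
  // The caller always retains one 64th of what it has left.
  let cap = available - available / 64;
  let base = Nat.min(requested, cap);
  if (value > 0) base + CALL_STIPEND else base
};
```

The new ExecutionContext for the nested call would start with this amount as its currentGas, and whatever it leaves unused would be returned to the caller when the call completes.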
0001 ECDSA Recovery (Elliptic Curve Digital Signature Algorithm)
0002 SHA-256 Hash Function
0003 RIPEMD-160 Hash Function
0004 Identity Function (Data Copy)
0005 Modular Exponentiation
0006 Elliptic Curve Addition
0007 Elliptic Curve Scalar Multiplication
0008 Elliptic Curve Pairing Check
0009 Blake2 Compression Function F
Precompiled contracts in Ethereum are a set of contracts provided as part of the Ethereum protocol. These contracts are implemented at the protocol level but are presented and interacted with as if they are smart contracts at specific addresses. Precompiles are designed to perform specific, computationally intensive operations such as cryptographic operations or hashing at a lower gas cost than if they were implemented in EVM bytecode in a regular contract. This makes certain operations practical and efficient within the blockchain context, which otherwise would be prohibitively expensive and slow.
When implementing precompile operations in Motoko for the Internet Computer (IC) Ecosystem, bounty hunters should structure their opcode calls to simulate the behavior of these precompiled contracts closely. Given that the IC does not natively support these operations as precompiles, developers must create efficient Motoko implementations that mimic their Ethereum counterparts. The precompiles address a range of operations, from cryptographic functions like ECDSA recovery to various hashing functions, each having a unique precompile address in Ethereum. Various libraries may be imported to support the implementation of these precompiles. In instances where an existing library does not exist, the developer should implement it.
The bounty hunter should intercept any CALL (or related) opcodes that reference the precompile addresses and route the operation through the developed precompile functions.
Efficiency and Accuracy: Implementations must focus on both computational efficiency and accuracy. Since the primary advantage of precompiles lies in their low gas cost for complex operations, your Motoko implementation should aim to be as optimized as possible while producing the correct results.
Execution Context Interaction: Similar to EVM opcodes, precompile function calls must accept and interact with the ExecutionContext. While they may not alter the execution context as extensively as some opcodes do, they must accurately calculate and deduct their gas costs based on the input size and operation complexity, updating the ExecutionContext’s currentGas.
Return Values and Error Handling: Precompile calls are expected to return values or throw errors in specific cases, much like regular smart contract functions. Successful operations should return their result in a manner expected by EVM semantics (typically pushing the result onto the stack), and errors or exceptions should revert any changes made during their execution, preserving the atomicity of transactions.
Interface Definition: Define a clear and consistent interface for each precompile, considering the input parameters and expected output. This assists in abstracting the precompile’s implementation and facilitating future optimizations or revisions.
Gas Costing: Implement gas costing according to the predefined rules set by the Ethereum specifications for each precompile. Accurate gas calculation ensures the economic equivalence of precompile operations across Ethereum and the IC Ecosystem.
Testing and Validation: Rigorous testing is essential. Compare your outputs against known outputs generated by Ethereum’s precompiles to validate correctness. Consider edge cases and input extremes to ensure reliability and robustness.
Documentation: Provide comprehensive documentation for each precompile implementation, detailing the operation performed, expected inputs, outputs, and any limitations or deviations from Ethereum’s behavior. Documentation supports maintainability and usability by other developers in the ecosystem.
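A possible shape for a precompile interface, shown for the simplest case, is sketched below. It implements the identity function at address 0004, using the Ethereum gas rule for that precompile of 15 gas plus 3 gas per 32-byte word of input. The PrecompileResult type and the identity function signature are assumptions for illustration; the bounty does not prescribe an interface.

```motoko
import Blob "mo:base/Blob";

// Hypothetical common result type for all precompiles.
type PrecompileResult = {
  #ok : { output : Blob; gasUsed : Nat };
  #outOfGas;
};

// Identity precompile (address 0004): returns its input unchanged.
// Cost: 15 gas base plus 3 gas per 32-byte word, rounded up.
func identity(input : Blob, gasAvailable : Nat) : PrecompileResult {
  let words = (input.size() + 31) / 32;
  let cost = 15 + 3 * words;
  if (cost > gasAvailable) { #outOfGas } else {
    #ok({ output = input; gasUsed = cost })
  }
};

// Usage: three bytes of input occupy one word, so the cost is 15 + 3.
let sample : [Nat8] = [1, 2, 3];
let result = identity(Blob.fromArray(sample), 100);
```

Giving every precompile the same input and result types would let the CALL implementation dispatch on the destination address and handle gas deduction and failure in one place.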
Testing is a crucial aspect of software development, ensuring that each module of your code behaves as expected under various conditions. For the implementation of EVM opcodes and precompiles in Motoko, we recommend a robust approach to unit testing, covering each opcode’s functionality comprehensively. This section outlines key considerations and recommendations for writing effective unit tests for the opcodes implementation.
ADD, MUL, SUB, DIV, etc.: Apart from normal operation, include tests for special cases like division by zero, multiplication resulting in overflow, and subtraction resulting in underflow.
MLOAD, MSTORE, SLOAD, SSTORE: It’s crucial to test not only successful reads and writes but also attempts to access invalid memory or storage locations (e.g., out-of-bounds or unallocated memory).
JUMP, JUMPI: Test for both valid and invalid jump destinations, ensuring that JUMPDEST validation is correctly implemented.
For JUMPI, include scenarios where the jump should and should not be taken, based on the condition provided.
The bounty associated with the implementation of EVM opcodes and precompiles in Motoko will be assigned to a single individual or team. This approach allows for concentrated effort and a consistent vision throughout the development process. However, we recognize the variability of personal circumstances and the challenges that a project of this depth may present.
Code will be pushed to https://github.com/icdevsorg/evm.mo.
The project is structured to allow for modular completion. Contributors can achieve progress in distinct blocks or stages, aligning with the separate categories or sets of opcodes and precompiles laid out in the specifications. This structure facilitates manageable goals and provides clear checkpoints for progress assessment.
We understand that unforeseen circumstances may prevent a contributor from completing the entire bounty. In such cases, compensation will be awarded proportionally based on the percentage of the project that has been completed and approved by the project’s review team. This ensures that all efforts are recognized and valued, even if the project’s full scope cannot be realized by the initial assignee.
Should the original assignee be unable to complete the bounty, a clear and structured handover process will be implemented. This process involves: