OCaml Workshop 2020 – OCaml Under The Hood: SmartPy

Hi, I'm Seb, I'm here virtually to present
an OCaml project called SmartPy, which is a Python library and a meta
programming framework designed to implement distributed
applications on the Tezos blockchain. It's quite a polyglot project that also
uses Python and Javascript extensively. So who am I? So hello this is me, I'm a software engineer at TQ Tezos, the
company whose role is to improve the Tezos Ecosystem in general.
I myself work on the core tezos code base. I also develop and help
people develop Smart Contracts. On top of treasures i also help our
partners getting up to speed and build things on
Tezos. I also contribute to the Standardization efforts
around smart contracts, to improve interoperability.
And I contribute to tooling development and that includes SmartPy.
To this topic, I'm not the main developer at SmartPy. Currently the team is four or less people. The project started and lead by François Maurel of LexiFi fame. Roland also joined,
a recovering haskeller, who's working deeply in the compiler or de-compiler
and has done many more things. Rodrigo joined more recently first on
infrastructure and then also on the web UI and, finally, me.

I've been doing a bit of everything over the past couple of years
more the testing side. The testing framework that the users use
and our testing and CI setup. Also on the implementation of cryptographic and
blockchain primitives of the language.
So let's define a few things because before we lose all the audience.
So Tezos is a cryptocurrency that is usually pretty well-known in the OCaml
world because we have been… there have been a few talks here and
there and we have been hiring all OCaml and formal methods people for a while.
A cryptocurrency, in one sentence, is a distributed system where entities that
do not trust each other agree on a common history, usually,
using financial incentives and very clever
cryptography. What's particular about Tezos is that
it's based on a proof-of-stake concession algorithm so we use baking
instead of mining. Tezos was actually the first
large-scale proof of scale for stake to launch. Proof of stake means that
block producers are selected using their balance in the cryptocurrency
not the hashing power, as in bitcoin.

It also means that, yeah, not like bitcoin
it's… or ethereum, it's actually energy
efficient. Another particular… particularity of Tezos is the on-chain
governance. Its self-amending protocol Tz can upgrade itself
and stakeholders vote on upgrades to the core protocol so
we often say that Tezos can steal other new good ideas from other projects
without centralized hard forks because we can upgrade. Then Tezos like
Ethereum is based on Smart Contracts we just… we don't just have transfers of
money between owners and private keys, we have accounts that are programmable.
They can execute pieces of code that are called with parameters. Some examples of programmable accounts can be
the spending limit contract, which prevents somebody from
spending more than 1,000 tz per day, or multisig contract, a contract that requires more than one person to sign
to agree to spend money or to do an operation.
And, of course, the other particularity is, the OCaml implementation,
the main implementation of the node/client/baking software is written in

And the protocol itself, the thing that
is being voted on by the stakers, it's a hash of a tarball of OCaml
files. So OCaml is really central. Those files define a big functor that
is used to decide what are valid blocks so the
contracts, mentioned above, are actually pieces of Michelson.
Michelson is a VM or a scripting language of Tezis, It is
the language that the programs that the whole network of validators has
to run to reproduce the transactions to everybody understand each other.
It has a stack-based and functional core. it's a stack language. It has clear
semantics and typing rules, they are defined inside the protocol with the giant… gigantic GADT. You can even keep track of
all the types on the stack, and this is, by far, not the only typing
hack in the code base, or on the ecosystem. There is a strong emphasis on
enabling formal methods and on being a proper
compilation target, because actually writing smart contracts
is very hard.

Bugs are very expensive (you don't want
your money to be locked or stolen from
an account). And in the same time, VM languages, like Michelson, are very low-level. Nobody wants to
handle the stack. Very few people actually write it, there
is an Emacs mode but not very successful. This has led to the
proliferation of many high-level language… languages that compile to Michelson. There's been liquidity,
ligo, scaml, lorentz, juvix, and many of the popular ones
as SmartPy is likely the most popular. And actually as I said, it's a library
designed to do meta programming of smart contracts. It's not the language
itself so SmartPy allows you to generate Michelson and test
those contracts. It provides all the tooling you may want and there is an IDE
and command-line tools. You can run simulation of contract interactions.
You can also deploy and interact with a contract on a test network
from the WebIDE. It is a big mix of OCaml, Python, and Javascript,
OCaml being most of the logic, Python being the user facing meta-language,
and Javascript being the glue between everything and also part of the
user interface.

So why Python? And why Python at OCaml 2020? Well it is one of the most popular
languages in the world, so the intuitive syntax, for most people. It has
pretty good meta-programming capabilities,
so the AST is programmable. It has decorations that are programmable.
You can see that success in the machine-learning ecosystem with like
torch and stuff like this. And, moreover, people believe they already know Python
so it's also like a Tezos gateway drug. Users usually get started and hooked even if they don't realize that the really hard part
is actually designing secure distributed applications the meta meta language doesn't matter that much. So what happens in the implementation, SmartPy programs generate SmartML
expressions and SmartML is an imperative type-inferred
intermediate representation. It's an MLish language. SmartML is
implemented as an OCaml library itself. It can compile to native using
for the tests or for command-line tools. Or to javascript, which is what runs inside
the WebIDE or in the end-user command application.
The library provides type inference, other program analysis
like code transformations
and optimizations. It also implements the interpreter.

The interpreter for the test scenarios. SmartML is technically more than a
smart-contract language since it also does the client and off-chain side of
things for testing. Then of course there is a compiler to
Michleson and quite a bunch of Michleson to Michleson sound optimizations.
And for now work in progress there is a decompiler to generate
Python from Michleson to be able to study existing projects. So let's look at a minimal example of a smart contract. Sorry for putting Python code in OCaml2020's slide 🙂 It's a pretty dumb contract that just stores
a string value which can be updated only by one account,
defined only once at initialization time. So on the first line you see that SmartPy is the library.

It's imported like any
other library there's no magic there. Then we see that we define a class
called HelloWorld that inherits from the concept of
contract and it has an __init__ method and that is
the initialization of the contract and it corresponds to the initial state of
the contract that is actually posted on the blockchain when
we create the real contract. ?????
One of the parameters of that function is the administrator the other is that
the account… the address of the account that will be
allowed to modify the storage and the other one is the string
the mem field. So then there is an entry point and it is a function
or a method that anyone it can call. So if you need any form of access control,
of course, you need to program it yourself. That's one of the things that
make blockchain so nice and so dangerous!
So here we do the check that the thing called sender, the sender of the
operation, is the admin that we initialize the
contract with; and if the..

If not – we just fail the transaction.
We see the little SmartPy specific syntactic sugar: that so the keyword "if" is already taken by Pythons "if", it is useful for meta programming you can generate different things,
depending on conditions, so here we use sp.if which is a function that takes a
block of code to generate it and uh.. there is a little
bit of syntactic sugar there to make it nicer you can also
do without it, then line 13 we see how we create test scenarios
so this is also essentially meta programming uh..
functions that generate SmartML scenarios they can run in the SmartML
interpreter or in a sandbox blockchain or even on a public test
network. Line 16, we see how we create a
cryptographic key-pair for testing, it's a real but deterministic
key pair from the seed that we see there ??? that can sign but it can sign messages and everything and it works in the
interpreter just as in the real world so we are fully
interoperable so then we call a few times the contract
and uh… we verify our condition about the state of the contract at the end.
So let's look at a little demo to see how it looks for the end users in the

So we see the the contract that I just
showed with the test in the WebIDE with syntax
hilighting and everything. The user can compile and run the program. We see the
initial state. We see the test scenario results, some successful and expected unsuccessful entry points. And we see the
nice OCaml? 2020. you see also that we can deploy the Michleson contract,
means that we can go and see it. We can see the sizes of the contract,
which are important because you pay for the size of a contract you put on chain.
You can see the heavily commented Michleson code. It's also good for
teaching people how Michleson works. We can then deploy the contract in
another page on the main network or most likely on a
test network.

You can import many kinds of private
keys and you can interact with the contract.
There is also a block explorer like a contract explorer you can, like,
add operations to a contract already existing by creating them from the…
from the web UI. There is also more like a wallet functionality you can, like,
import your keys and manage them and then talk to various contracts there's also Michleson itself editor so
for now for studying Michleson and soon for
decompiling Michleson but already it's useful to like see the
typing live in the code at comments
and then there is an extensive help set of pages the reference manual that is very complete and at the very end of it we see the other aspect which is the command-line
interface use SmartPy that you can install and that we're going to discuss
a bit later so let's go through what happens at the beginning of the demo we saw just before the pipeline code is executed by the bride on interpreter which is an
implementation of Python in Javascript then the only thing the code does it
actually constructs serialized SmartML programs that's everything that
the Python does then the contract enters the JSON programmer world the smart…

Smart demand library does the
type inference, the type checking, it runs the scenario in the interpreter,
reports the errors, as it finds them, and it does the
compilation to Michleson and all the other tabs that we have seen.
And it gives the results back to the UI that itself constructs an
html right pane that we saw in the demo.
So as we saw also in the demo there is a command-line interface to SmartPy.
The point of it is to be extremely portable so that it uses node.js and npm.
And that way it allows anyone using Emacs, VIM, or any
editor to edit SmartPy scripts and, like, test them and compile them. So how do we
make another portable OCaml command an
application with Javascript? So let's look inside this thing, it's a
bit crazy but it works.

So SmartPy.sh is a bash script, it knows
how to install the smartpy-cli dependencies the python and javascript
files under the npm dependencies the javascript is the output of js_of_ocaml
plus some hacks on top of it so SmartPt.sh then calls the OCaml
application application when it's installed
and it's a command-line application the Javascript main
is the concatenation of a prelude.js that loads npm packages and the result
of just working with compilation
though overall it makes something that it's a bit slow especially starting it
up but it does the job and people seem
happy not not that we need to call anyway the Python
interpreter so it's not a pure OCaml super fast executable.
So let's see how this thing is made with a beautiful dune rule
that uses cat and sed to be the linker of the output of the js_of_ocaml
well the call to sed will be removed once the
we integrate the next version of js_of_ocaml it will be cleaner.
So let's look at the what that prelude.js looks like.
It's about 100 lines of requiring npm packages and then doing that we are async hack to make sure that those libraries are
initialized when the js_of_ocaml process takes over there.

May be cleaner
ways of doing this, I mean using a library that works both
in the browser and in node, but note that that cryptographic library
is actually a bunch of C codes compiled to js and this is how it's brought to the OCaml side so we are happy users of the
gen_js_api project we use an interface file with attributes
that make up the js_of_ocaml bindings here we have the bindings to the
verify_signature and sign from libsodium. It's all very clean and
very nice to use. We also have bindings to base-58
encodings for a few hash functions like blake2b and sha256.

We can see that we did not bother and we use the js.global instead of properly passing around the sodium
object. It just works, we also had to implement
a compliant version of Michleson serialization primitives, so pack and
unpack, because they are useful, for instance, for
signing things and verifying signatures. And most Tezos libraries are not yet
usable with js_of_ocaml. But it's getting there, there is work
being done so this is. How it looks, of course, it should be on the slower
side. But it's not important in our case, if it
fits in the blockchain it's small data.

Also a few words about error
messages and location so it's known hard problem in the OCaml
world, multiplied by a lot in our cases so we need to report errors from the
Python interpreter which is actually the easiest, because it's the surface,
and there's also the SmartML type inference and type checking. It gets
trickier because of the meta-programming, but it works still pretty well, because
thanks to Python's programmability of the meta-programming there's also the
interpreter that can report errors. That's still SmartML but
we have two locations now the number… the line number with the
test that failed and the line number with the place in the contract, where the
test failed. And then there's the compiler we usually
just display the error in the Michleson and we don't interrupt
the user we show the errors in the makeup zone
and we try to get rid of those errors in improving SmartML type checking
because we want them to be caught as early as possible but that's work in
progress. And then they are like the worst ones:
the type the Tezos type checker and contract origination errors.
We want to avoid those at all cost so if we get one of those we — consider that
it's a bug in the compiler, but can still happen.

There are other
challenges with a Tower of Babel of languages,
Python and Javascript are easy as languages but we need to actually know
a lot about the language implementations like the Brython versus regular Python interpreters that are like not exactly
the same. We need to understand when we run
javascript in the browser in node.js and what js_of_ocaml does with it
so for that we can do proper abstractions using the module system
for instance the real Tezos client test interpreter,
the one that is not yet in the WebIDE it's used to run the sandbox test
and in the command-line and it's there in the command app so it's factorized
over the concept of Tezos client it works in native and node.js and it's
not yet actually implemented in the browser but
it will be. Also we have observable slight culture clash between our OCaml styles
and Haskell and traditional ml. It's hard to impose rules in that kind of project
there's already a bunch of code.

How do people homogenize when the team is understaffed… understuffed.
So, for example, I'm in the Jane Street style camp, with labels everywhere
and long names and so on. But how do I impose it to everybody? One
nice success is the convincing people to switch to ocamlformat.
We have a 16 line .ocamlformat config file! Which, in fact, is good it,
means that even if I don't care about the style I
can convince people to switch to ocamlformat thanks to
all those options.

Also, as I saidm we need to deal with this, known as
slowness of javascript, it's noticeable especially at the startup of the
application but usually users seem pretty good with
some delay especially since the right-pane has a lot of value for users.
Let me talk also about something called FA2. Without much detail it's part of a
standardization effort around a common multiple asset contract
interface. One of the reference implementations is SmartPY
and it's a good occasion to abuse the meta-programming features.
So it's at the address there but it's also available in the SmartIDE
other template so it really uses meta-programming there are like 12
boolean configuration switches that drive the code generation.
They also those are not completely independent so it's not like 2 to the
power of 12, but still there are many contract variants
for many use-cases that are meaningful and that are not worth discussing here.
But also allow to do heavy benchmarking to try various implementation strategies.
For example, we ruled out an optimization that people think is
often useful and it turns out it's very negligible for a bigger contract.
The blog post about the benchmarks should be out by the time this
airs, by the way uh…

Not really SmartPy-related but if you like
meta programming with big build system hacks there's a lot of OCaml code generation done from the Michleson output
and it's used to build a mini-wallet benchmarks command-line application in real or OCaml this time???
So, of course, when the contract changes we get type errors in the OCaml code,
which is what we all live for so for now we are quite
successful. SmartPy has quite some good popularity
within the Tezos ecosystem The Telegram channel, twitter accounts
are very active There are like online courses and actual
applications being built with SmartPy, other tools in the ecosystem also, like,
directly support SmartPy. But, of course, there's always a ton of
things to do. Among them we want to work on, like, making sandbox tests…
testing more available to the end users. It is available in the command
application, but we haven't had much feedback so far.
It's not that easy to get started with it and many users are allergic to the
docker desktop dependency so we are also working on the web
version of it, which is talking to a node from the WebIDE.
We are also working on the Michleson decompilation so that users can
study and integrate contracts that they find in the wild.

We want to
also integrate other analyses like doing abstract interpretation to
follow data ownership, or work some form of gas usage
prediction and, aside from Michleson, we want to add
other generation targets like ways of parsing or interact… or
interacting with a contract of storage or adding more formal targets
like WhyML or Coq. So this is the end, Thank you for listening and
happy to answer any questions!.

You May Also Like