
emgeee · 2025-04-24T16:51:02 1745513462

I agree. Tenures may be short but careers are long and tech is (surprisingly) small. Credibility builds trust, and trust between people is ultimately what businesses run on. "Do right by people" is a good strategy.

emgeee · 2025-04-09T22:17:31 1744237051

fellow co-founder here! One fun thing about this project is the entire frontend was vibe-coded using Bolt in a few days.

skeptrune · 2025-04-10T01:22:47 1744248167

Very awesome. Not having to burn time on a UI that looks and feels nice is a huge win.

emgeee · 2025-04-09T21:16:06 1744233366

This is a pretty cool idea but I'm trying to think of the advantage of WASM vs other execution engines.

It seems to me one of the main use-cases for WASM is to execute lambdas, which are often short-lived (like 500ms timeout limits). Maybe this could have a place in embedded systems?

tomasol · 2025-04-09T21:22:03 1744233723

The biggest motivator for me is that the WASM sandbox provides true deterministic execution. Contrary to engines like Temporal, using hashmaps is 100% deterministic here. Attempting to spawn a thread is a compile error. It also performs well - the bottleneck is the write throughput of SQLite. Last but not least - all the interfaces between workflows and activities are type safe, described in a WIT schema.

AlotOfReading · 2025-04-10T04:10:31 1744258231

WASM isn't quite deterministic. An easy example is NaN propagation, which can be nondeterministic in certain circumstances. Obelisk itself seems to allow nondeterminism via the sleep() function. Just create a race condition among a join set. I imagine that might even get easier once the TODO to implement sleep jitter is completed.

It's certainly close enough that calling it deterministic isn't misleading (though I'd stop short of "true determinism"), but there are still sharp edges here with things like hashmaps (e.g. by recompiling: https://dev.to/gnunicorn/hunting-down-a-non-determinism-bug-...).

tomasol · 2025-04-10T10:42:31 1744281751

Thanks for bringing that up. Regarding the NaN canonicalization, there is a flag for it in wasmtime [1], I should probably make sure it is turned on.

Although I don't expect it to be an issue in practice, Obelisk checks that the replay is deterministic and fails the workflow when an unexpected event is triggered. It should also be possible to add an automatic replay of each finished execution to verify determinism, e.g. while testing.
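As an illustration of the "fail on an unexpected event" check and the replay-after-finish idea, here is a minimal Python sketch. It is not Obelisk's implementation: `ReplayContext`, `RecordingContext`, `verify_replay` and the activity names are all hypothetical. A first run records each call and its result; a replay feeds the recorded results back and raises if the workflow asks for a different call than the log contains.

```python
from dataclasses import dataclass


class NondeterminismError(Exception):
    """Raised when a replayed workflow diverges from the recorded log."""


@dataclass
class Event:
    call: str       # name of the requested child execution / activity
    result: object  # persisted result of that call


class RecordingContext:
    """First run: executes calls (here via a lookup table) and logs them."""

    def __init__(self, activities):
        self.activities = activities
        self.log = []

    def call(self, name):
        result = self.activities[name]()
        self.log.append(Event(name, result))
        return result


class ReplayContext:
    """Replay: serves results from the log and checks the call order."""

    def __init__(self, log):
        self.log = log
        self.pos = 0

    def call(self, name):
        if self.pos >= len(self.log):
            raise NondeterminismError(f"unexpected call {name!r} past end of log")
        event = self.log[self.pos]
        if event.call != name:
            raise NondeterminismError(
                f"expected {event.call!r} at position {self.pos}, got {name!r}"
            )
        self.pos += 1
        return event.result


def workflow(ctx):
    # Business logic only: decisions depend solely on returned results.
    if ctx.call("start-vm") == "ok":
        return ctx.call("configure-bgp")
    return "failed"


def verify_replay(wf, log):
    """Replay a finished execution and check it consumes exactly the same log."""
    ctx = ReplayContext(log)
    out = wf(ctx)
    if ctx.pos != len(log):
        raise NondeterminismError("replay finished without consuming the whole log")
    return out
```

Running `verify_replay(workflow, recorder.log)` after every finished execution in a test suite would catch a workflow that changed its call sequence between runs.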

[1] https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html#...

Edit: Enabling the flags here: https://github.com/obeli-sk/obelisk/pull/67

tomasol · 2025-04-10T11:11:23 1744283483

> Just create a race condition among a join set.

All responses and completed delays are stored in a table with an auto-incremented id, so the `-await-next` will always resolve to the same value.
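To make the auto-increment argument concrete, here is a sketch using Python's built-in `sqlite3`. The table layout and the `await_next` helper are hypothetical, not Obelisk's schema. Once responses are inserted, their ids fix the order, so asking for "the next response after id N" always returns the same row.

```python
import sqlite3


def open_log():
    """In-memory stand-in for the persistence layer."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE responses (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            join_set TEXT NOT NULL,
            value TEXT NOT NULL
        )
        """
    )
    return db


def record_response(db, join_set, value):
    # Whoever commits first gets the lower id; that order is then permanent.
    db.execute("INSERT INTO responses (join_set, value) VALUES (?, ?)", (join_set, value))
    db.commit()


def await_next(db, join_set, after_id):
    """Return (id, value) of the first response in join_set with id > after_id, or None."""
    return db.execute(
        "SELECT id, value FROM responses WHERE join_set = ? AND id > ? ORDER BY id LIMIT 1",
        (join_set, after_id),
    ).fetchone()
```

Calling `await_next(db, "js1", 0)` repeatedly returns the same first row no matter when it is called, which is the property replay relies on.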

As you mention, putting a persistent sleep and a child execution into the same join set is not yet implemented.

AlotOfReading · 2025-04-10T14:39:18 1744295958

I get that, the nondeterminism would come from the completion order of the join set. If the children sleep appropriately, they'll race to be inserted after completing, and order of the result set will depend on the specifics of the implementation. It's possible this could happen deterministically, but probably not reasonably.

tomasol · 2025-04-11T09:14:25 1744362865

Sorry for the late reply.

The actual order in which child workflows finish and their results hit the persistence layer is indeed nondeterministic in real-time execution. Trying to force deterministic completion order would likely be complex and defeat the purpose of parallelism, as you noted.

However, this external nondeterminism is outside the scope of the workflow execution's determinism required for replay.

When the workflow replays, it doesn't re-run the race. It consumes events from the log. The `-await-next` operation during replay simply reads the next recorded result, based on the fixed order. Since the log provides the same sequence of results every time, the workflow's internal logic proceeds identically, making the same decisions based on that recorded history.

Determinism is maintained within the replay context by reading the persisted, ordered outcomes of those nondeterministic races.
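The distinction between a nondeterministic live race and a deterministic replay can be sketched in a few lines. This is a toy model and the function names are hypothetical: threads with random sleeps append to a shared log in whatever order they finish. The workflow's decision is then derived from that log, and replaying from the log never re-runs the race.

```python
import random
import threading
import time


def run_children_live(names):
    """Children finish in a nondeterministic order and append to the log on completion."""
    log = []
    lock = threading.Lock()

    def child(name):
        time.sleep(random.uniform(0, 0.01))  # simulated work of varying length
        with lock:
            log.append(name)  # append order = persisted completion order

    threads = [threading.Thread(target=child, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


def workflow_decision(next_results):
    """Workflow logic: act on whichever child completed first."""
    first = next(next_results)
    return f"winner={first}"


def replay(log):
    # No race here: "-await-next" just reads the recorded results in order.
    return workflow_decision(iter(log))
```

Different live runs may pick different winners, but every replay of a given log reproduces that log's decision.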

genuine_smiles · 2025-04-10T10:25:44 1744280744

> An easy example is NaN propagation, which can be nondeterministic in certain circumstances.

Which circumstances?

xmcqdpt2 · 2025-04-10T11:37:57 1744285077

See for example

https://github.com/WebAssembly/design/issues/1463

In general, if NaN1 and NaN2 are different (in a 32-bit float there are 23 bits in a NaN that can be set to an arbitrary value, the NaN payload), then combining them isn't deterministic. NaN1 + NaN2 might produce NaN1 or NaN2 depending on the processor model, instructions etc. IEEE754 only says that the result must be one of the two input NaNs (or at least it did last time I checked the standard.)
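To see that two NaNs can differ, here is a sketch that works directly on 32-bit patterns with Python's `struct`. The two constants are arbitrary example payloads. It deliberately does not compute `NaN1 + NaN2`, because which payload survives depends on the hardware, which is exactly the point above.

```python
import math
import struct

NAN1 = 0x7FC00001  # quiet NaN, sign bit clear, fraction field 0x400001
NAN2 = 0x7FC00002  # quiet NaN, sign bit clear, fraction field 0x400002


def is_f32_nan(bits: int) -> bool:
    """A float32 pattern is NaN when the exponent is all ones and the fraction is nonzero."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return exponent == 0xFF and fraction != 0


def nan_payload(bits: int) -> int:
    """The 23-bit fraction field, which carries the payload (including the quiet bit)."""
    return bits & 0x7FFFFF


def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a float32 (widened to a Python float)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Both patterns decode to a value for which `math.isnan` is true, yet `nan_payload(NAN1) != nan_payload(NAN2)`, so code that inspects the bits can observe which one a CPU returned.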

In practice, NaN payloads are seldom used so it doesn't matter much. NaN canonicalization involves transforming all NaNs to specific NaN values, and AFAIK it's expensive because technically you need to check for NaN all the time.
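Here is a sketch of what canonicalization does and where its cost comes from. It operates on float32 bit patterns. `hw_a` and `hw_b` are hypothetical toy stand-ins for two CPUs that propagate a different operand in "NaN + NaN". The canonical value used is the positive quiet NaN `0x7FC00000`, which is what WASM calls a canonical f32 NaN.

```python
CANONICAL_F32_NAN = 0x7FC00000  # quiet bit set, rest of the payload zero, sign positive


def is_f32_nan(bits: int) -> bool:
    """Exponent all ones and a nonzero fraction."""
    return (bits >> 23) & 0xFF == 0xFF and bits & 0x7FFFFF != 0


def canonicalize_f32(bits: int) -> int:
    """Replace any NaN with the canonical NaN; leave other values untouched."""
    return CANONICAL_F32_NAN if is_f32_nan(bits) else bits


def canonicalizing(op):
    """Wrap a float32 operation so every result passes through the NaN check.
    That extra check on every result is the overhead mentioned above."""
    def wrapped(*args):
        return canonicalize_f32(op(*args))
    return wrapped


# Toy models of "NaN + NaN" on two CPUs (only meaningful for NaN inputs).
def hw_a(a: int, b: int) -> int:
    return a if is_f32_nan(a) else b  # propagates the first NaN operand


def hw_b(a: int, b: int) -> int:
    return b if is_f32_nan(b) else a  # propagates the second NaN operand
```

Without canonicalization, `hw_a(n1, n2)` and `hw_b(n1, n2)` return different bit patterns. With it, both return `CANONICAL_F32_NAN`.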

jcmfernandes · 2025-04-09T21:31:59 1744234319

Somewhat similar to Golem - https://github.com/golemcloud/golem - correct?

So, I like this idea, I really do. At the same time, in the short term, WASM is relatively messy and, in my opinion, immature (as an ecosystem) for prime time. But with that out of the way (it will eventually come), you'll have to tell people that they can't use any code that relies on threads, so they had better know whether any of the libraries they use does. How do you foresee navigating this? Runtime errors suck, especially in this context, as fixing them requires either live-patching code or migrating execution logs to new code versions.

tomasol · 2025-04-09T21:54:52 1744235692

Yeah, looks like Golem went similar route - using WASM Component Model and wasmtime.

There is always this chicken and egg problem on a new platform, but I am hoping that LLMs can solve it partially - the activities are just HTTP clients with no complex logic.

Regarding the restrictions required for determinism, they only apply to workflows, not activities. Workflows should be describing just the business logic. All the complexities of retries, failure recovery, replay after server crash etc. are handled by the runtime. The WASM sandbox makes it impossible to introduce non-determinism - it would cause a compile error so no need for runtime checks.

jcmfernandes · 2025-04-09T22:36:05 1744238165

I understand what you mean by being able to fully sandbox things and guarantee determinism, a must for the workflows and not the activities (using temporal lingo).

When you say that the runtime handles, for example, retries, doesn't that require me to depend on your HTTP client component? Or do I also need to compile activities to WASM and have obelisk running them because they are essentially background jobs (that is, you have workers pulling)?

Finally, do you see the component's interface as the right layer for capturing IO? I'm imagining people attempting to run managed code (Java, Python, Ruby, etc.). The VMs can do thousands of syscalls before they start executing the user's code. Logging them one by one seems crazy, but I also don't see an alternative.

EDIT:

I RTFM and found the answers to my first two questions in the README :)

tomasol · 2025-04-09T22:54:55 1744239295

> do I also need to compile activities to WASM

Yes, currently all activities must conform to the WASI 0.2 standard. This is the simplest option for deployment, as you only need the obelisk executable and a TOML config file. The webhooks, workflows and activities are pulled from an OCI registry on startup.

To support native code I plan to add external activities as well, with an interface similar to what Netflix Conductor uses for its workers.
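The comment does not describe the planned interface, but the general Conductor-style worker shape is: a worker polls a server for tasks, runs them natively, and reports results back. A minimal in-memory sketch of that loop follows. `InMemoryTaskServer`, `Task` and `run_worker` are hypothetical and are neither Obelisk's nor Conductor's API.

```python
import queue
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    activity: str
    params: dict


class InMemoryTaskServer:
    """Hypothetical stand-in for a server that hands activity tasks to external workers."""

    def __init__(self):
        self.pending = queue.Queue()
        self.results = {}

    def submit(self, task: Task):
        self.pending.put(task)

    def poll(self, timeout: float = 0.1):
        try:
            return self.pending.get(timeout=timeout)
        except queue.Empty:
            return None

    def complete(self, task_id: int, result):
        self.results[task_id] = result


def run_worker(server, handlers, max_idle_polls: int = 1):
    """Native worker loop: poll, execute the matching handler, report the result.
    Stops after max_idle_polls consecutive empty polls (handy for a demo)."""
    idle = 0
    while idle < max_idle_polls:
        task = server.poll()
        if task is None:
            idle += 1
            continue
        idle = 0
        result = handlers[task.activity](**task.params)
        server.complete(task.task_id, result)
```

A real worker would run indefinitely, poll over the network, and report failures as well as successes.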

> Finally, do you see the component's interface as the right layer for capturing IO?

An activity must encapsulate something much higher level than a single IO operation. So something like "Configure BGP on a router", "Start a VM" etc. It needs to be able to handle retries and thus be idempotent.
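Idempotency matters because a retry can happen after the side effect has already succeeded, for example when a timeout hits after the VM was created. A minimal sketch, assuming a hypothetical VM API that deduplicates by a caller-supplied request id, shows why the retry does not create a second VM. Backoff between attempts is omitted to keep it short.

```python
class TransientError(Exception):
    pass


class FlakyVmApi:
    """Hypothetical API: the first `timeouts` calls time out *after* creating the VM."""

    def __init__(self, timeouts: int = 1):
        self.timeouts = timeouts
        self.vms = {}  # request_id -> vm id
        self.calls = 0

    def start_vm(self, request_id: str, name: str) -> str:
        self.calls += 1
        if request_id not in self.vms:  # dedupe: a repeated request reuses the existing VM
            self.vms[request_id] = f"{name}-{len(self.vms) + 1}"
        if self.calls <= self.timeouts:
            raise TransientError("timed out after the VM was created")
        return self.vms[request_id]


def start_vm_activity(api, request_id: str, name: str, max_attempts: int = 5) -> str:
    """Retry transient failures; safe only because start_vm is idempotent per request_id."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return api.start_vm(request_id, name)
        except TransientError as err:
            last_error = err
    raise last_error
```

With `timeouts=2` the activity succeeds on the third attempt and exactly one VM exists, because every attempt reused the same request id.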

Regarding performance, a workflow execution can call 500-700 child executions serially, or around 1400 child executions concurrently per second.
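Converting those throughput figures into time per child execution gives a sense of the overhead. This is plain arithmetic on the quoted numbers, nothing measured here. 500-700 serial calls per second is roughly 1.4-2 ms per round trip, and 1400 concurrent completions per second is about 0.71 ms each, amortized.

```python
def per_call_ms(calls_per_second: float) -> float:
    """Average time per call, in milliseconds, at a given throughput."""
    return 1000.0 / calls_per_second


serial_range_ms = (per_call_ms(700), per_call_ms(500))  # fastest and slowest serial case
concurrent_ms = per_call_ms(1400)                       # amortized, concurrent case
```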

jcmfernandes · 2025-04-09T23:11:49 1744240309

> An activity must encapsulate something much higher level than a single IO operation. So something like "Configure BGP on a router", "Start a VM" etc. It needs to be able to handle retries and thus be idempotent.

I was referring to the workflows, that is, writing the workflows in managed languages, not the activities.

Out of curiosity, are you working full-time on this? I'm working part-time on the same problem, looking to go full time soon, and it's interesting to see how the same ideas are popping up somewhat independently across different projects :) let me know if you're interested in chatting!

tomasol · 2025-04-09T23:33:17 1744241597

> I was referring to the workflows, that is, writing the workflows in managed languages, not the activities.

Ah, understood. I have no plans to support native workflow executors.

> Out of curiosity, are you working full-time on this?

Yes, currently I work on it full time.

> I'm working part-time on the same problem, looking to go full time soon, and it's interesting to see how the same ideas are popping up somewhat independently across different projects :)

Nice website! I also see the ideas of determinism, replayability etc more and more.

> let me know if you're interested in chatting!

Sure, my email is visible in the Git commit history.

emgeee · 2025-03-27T02:28:16 1743042496

great point, thanks for sharing

emgeee · 2025-03-12T20:30:26 1741811426

I've used uv to work on the Feast feature store project to great success.

emgeee · on Feb 14, 2025

I never really thought about this perspective but in some ways it makes sense. I think the ironic part is that LinkedIn now provides built-in AI tools that make you sound more like a bot.

Maybe they could fingerprint slop generated with their tools and allow it through to incentivize upgrading.

soco · on Feb 14, 2025

But "our" bots are always the good ones. Why does this sound like literature...

emgeee · on Dec 22, 2024

Check out WarpStream (recently acquired by Confluent).

emgeee · on Nov 13, 2024

awesome, happy to answer any questions and would love any feedback!

emgeee · on Nov 12, 2024

Commenting code and generating documentation.

I like to copy entire python modules into the context window and say something like "add docstrings to all methods, classes, and functions".

You can then feed the code into something like sphinx or pdoc to get a nice webpage.
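For reference, this is the kind of output that prompt aims for: a Google-style docstring that pdoc renders and that Sphinx autodoc can read via its napoleon extension. The `chunk` function is just a hypothetical example. `inspect.getdoc` shows the same text those tools extract.

```python
import inspect


def chunk(items: list, size: int) -> list:
    """Split a list into consecutive chunks of at most ``size`` items.

    Args:
        items: The list to split.
        size: Maximum number of items per chunk; must be positive.

    Returns:
        A list of lists; only the last chunk may be shorter than ``size``.

    Raises:
        ValueError: If ``size`` is not positive.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    # Print the docstring as documentation tools will see it.
    print(inspect.getdoc(chunk))
```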

emgeee · on Nov 12, 2024

I've found them incredibly useful for writing Dockerfiles or other bits of infra config like K8s YAML.