I was experimenting with snark proof generation inside a browser for mobile phones and made a benchmark: I’ve tested a simple circuit (5 rounds of pedersen commitment) written on Circom and ported to browser using browserify, and the same circuit written on Bellman and compiled to WebAssembly; I also compared them to their native counterparts.
MacBook Core-i7 @ 2.6Ghz circom (node.js): 53685ms circom (chrome): 53266ms circom (websnark): 1408ms bellman (webassembly): 1151ms bellman (rust, 1 core): 235ms bellman (rust, 8 cores): 85ms iPhone XS bellman (webassembly): 1126ms
As we can see Bellman compiled to wasm is already 50x faster and still has a room for more than 10x improvement as WebAssembly execution speed should be pretty close to native speeds when everything is done right. The full support for wasm compilation was recently merged into Matter labs fork of Bellman, and it was uploaded to crates.io and can now be incuded as
bellman_ce, you can see wasm-bellman test repo for example usage. On modern phones the execution speeds are similar to laptop speeds, so in the current state wasm compilation seems to be good enough for many simple snark proofs on mobile.
The main bottleneck is that when compiled to wasm Bellman is single threaded like Circom. If we take advantage of multiple CPU cores (most modern phones have 8), we can speed this up significantly. WebAssembly in Chrome already supports multiple threads with shared memory as experimental feature, so we are waiting when the rust compiler will support it (alternatively it can be done via importing WebWorker api). Additionally, the browser profiler ,  shows that most CPU time is spent in __multi3 that is called from mul_assign followed by internals of mul_assign itself, so maybe there are some inefficiencies in int64 multiplication implementation on wasm. When compiled to native mobile app, Bellman can already take advantage of all the cores and calculate proofs even faster than in browser. @shamatar from Matter labs did a few tests on that and got some impressive results.
We currently work on Circom -> Bellman export so that we can combine the ease of use of Circom language and performance and portability of Bellman. The goal is to make a toolchain that allows you to write snarks in Circom and get super fast .wasm + js bindings as compilation result, hiding the complexity of Bellman under the hood.
The circuit verifies that a hash corresponds to a private 256 bit preimage after taking 5 rounds of pedersen commitment. Resulting circuits on Circom and Bellman have similar number of constraints:
3459 bellman 3540 circom
Circom has a little bit more partially because it calculates hash in 256 bit field compared to 254 bit in Bellman, but the difference is negligible.
The tests were done on a 5 year old laptop with 2.6GHz Core i7 CPU, and an iPhone XS.
In this last repo you can see an example of how to build your snark that will work within a browser.
update: added websnark benchmark for the same circuit