Enforceable Human-Readable Transactions: how to solve Bybit-like hacks

1° That was not my point. My point was that creating a malicious contract that a user might trust is now easier than before with your description. Since the description would be correct, the underlying code could, of course, do something else.

2° I looked it up, and Ledger actually has such a system already. They call it clear signing with EIP-712. It works similarly to what I suggested—no guesswork needed. However, I think they use a central database to query on the fly to save space on the device. If they stored it entirely on the device for some high-stakes contracts, it could work fine along with some contract address whitelisting, even with the correct translation for the user.
www.ledger.com/blog/securing-message-signing

Probably, the lack of standards, frequent upgrades, and extensibility for these existential use cases has prevented hardware manufacturers from implementing this correctly. Which was my point all along—if you read my previous posts—that these existential use cases should be standardized.

I’m sorry, but reading this reply, especially the second point, makes me think you haven’t read the OP.

Can you describe exactly why you think this is the case? What steps are you imagining an attacker would take and the user would take, and how would this proposed change affect that process?

The change here does not propose anything that would alter the mechanism by which a contract establishes initial trust with the user and convinces them to send their money to the contract in the first place. It only makes it so that once a user has entrusted a contract with their money, they can trustlessly interact with it from an offline device without needing to worry about man-in-the-middle attacks like we saw with ByBit.

@GCdePaula Have you read the Ledger documentation? This is exactly what Ledger has been able to do for years by merely using EIP-712, and what @MicahZoltu claimed was impossible. Also, translations can easily be supported with that solution, whereas your proposal would struggle significantly with heavy string manipulations. If hardware wallets implemented this entirely on the device with some whitelisting of contract addresses, and if every signer used a different brand of hardware wallet, this would surely have prevented the Bybit hack. We don’t need even more complexity that increases the attack surface and requires extra signing. We need more standardization for these high-stakes use cases.

@MicahZoltu I think I’ve explained it pretty well in three posts already.

We simply disagree, and that’s fine. I’ll leave it at that.
Best of luck with your proposal.

If you had read the post, you’d know about EIP-712 (typed data) and clear signing, why it is not enough alone, and that my proposal uses typed data and clear signing as part of the solution.


I don’t know where you’re getting this. The K&R from 1978 could already format strings to the necessary degree. The added compute costs and mitigations are addressed in the original post.


The users interacted with the correct address, but signed a transaction (to the correct address) with a spoofed payload. It’s not possible that a transaction sent to contract A can make contract A change the state of contract B, as @MicahZoltu has already explained.


Not really, no.


I think this is the end of the conversation from my side. Cheers!

After this, I think we should note that this proposal is not limited to transactions with 712 signatures. Normal transactions can also benefit by adding the description text field as the first argument of every public function.

As of today, I don’t think this proposal makes sense to me, but I have some questions as to my understanding.

I think the only way to add human-readable transactions will be to:

  1. Decode the calldata (as nested and far down as possible)
  2. Have a local AI/LLM transcribe the calldata to the human operator

At this time, I think embedding such a description directly into the application is:

  1. Redundant
  2. Because it is redundant, it is needlessly gas-inefficient
  3. Potentially even misleading

Redundant and needlessly gas-inefficient

It is redundant because the calldata already has all the information required, which means a user would essentially be signing the same thing twice (basically in 2 different “languages” one being the “calldata language” and another being the “human language”). It is potentially misleading because language is constantly evolving and often very confusing.

Potentially misleading

Let’s say I want to do the following:

  1. Approve my ERC20 token to be deposited into Aave with the approve function
  2. Deposit my ERC20 into Aave using the supply function
  3. Using a batch transaction with my Safe{Wallet} smart contract wallet

Typically, this would require an EIP-712 message to a user’s wallet for signing. How would you suggest this works? I’ll lay out what I think it looks like based on your outline of the technique here: (Please correct me if I’m wrong here)

UI queries the application’s canonical decoder and populates the description field

Is the application’s canonical decoder the smart contract? So would the Safe{Wallet} contract need a new function called “decoder”, which would do… what exactly? Would every smart contract need to decode calldata → a string? This seems like a MASSIVE amount of gas.

Then, since the Safe{Wallet} is just part of the transaction, it would then need to call the description function of the Aave contract, assuming one such exists, otherwise it would fail? And by convention, we’d have to tell users that they should not interact with any contract that has not correctly implemented this?

Then, on the topic of language, for the Aave contract, let’s say the calldata is decoded to:

this transaction will supply USDC to the Aave contract to gain yield on behalf of you

One could argue this is not specific enough, so we may need to change this to:

this transaction will supply the USDC token at address XXX to the Aave contract at address YY to gain ZZ yield on behalf of you at address AAA

But even this is misleading… What if you get front run and the interest is different? Would it then need to be:

this transaction will supply the USDC token at address XXX to the Aave smart contract at address YY to gain ZZ yield on behalf of you at address AAA, unless you get front run in which case you will get BBB interest...

And then, there are many cases where language evolves, and maybe in 10 years we have changed the name smart contract to mean something else, does someone need to upgrade the contract because of this?

And this is just the Aave contract; this isn’t even including the description from the Safe{Wallet} contract. I expect these “human-readable” descriptions people need to sign would turn into massive Apple iOS-style terms of service that people ignore and sign, and we are back to the original problem of people not verifying what they are signing.

If the intent is to hash the description for people to sign to save gas, then yes, it wouldn’t be as gas horrendous as I thought, but it would still cost extra gas (as the OP pointed out), but without the security guarantees intended.

More thoughts

I think the intent of this is to enshrine a human-readable description into the smart contracts. I feel like what might be nicer would be to have external Solidity functions return a string description for all functions, but then we are back to the gas issue, which we could probably mitigate by pushing developers to make these concise, and then have some supporting documentation to explain what they mean? But it feels like we are probably still being quite redundant here…

Instead

Going through @MicahZoltu’s GitHub I came across The Interceptor, which looks more appropriate to what I think we should do.

Calldata already has all the information we need to understand the transaction. It might be more interesting to upload a documentation contract to the blockchain with just straight text that an LLM can read and use to help explain calldata. Then, all contracts can have a getDocs function which returns the documentation. Or, we could do something similar to an NFT like getDocsURI, which returns a URI to the robot-enabled documentation.

Thinking through this out loud, I sort of like this idea more. From a security perspective, this has issues - “what if the LLM does something stupid?” but I think if we are already nervous about people not being able to understand decoded calldata, then I think it’s reasonable to assume we could standardize an LLM to be able to decipher the data, and those who are extra security conscious can still verify the data themselves…

The problem with approaches like this, and The Interceptor, is that they require substantial hardware to run and they need direct access to the EVM to verify. Ideally, we want to make it so an offline low-power device (e.g., hardware wallet or AirGap.it mobile device) can present data to the user that doesn’t require they trust the device that the data was sourced from. A user should be confident that as long as their offline signing tool is secure, they are secure, even if their online device is fully compromised.

An ideal solution is one that allows the online device to provide some metadata to the offline device and then the offline device signs over that and provides it to the contract. This class of solution makes it so the offline device doesn’t need to trust anything, and all they need to do is have the ability to render a string from a template and hash some stuff.
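To make "render a string from a template and hash some stuff" concrete, here is a minimal sketch of what the offline device would have to do. The template text, the field names, and the helper names are all hypothetical, and SHA-256 is used only because keccak256 is not in Python's standard library; a real implementation would presumably hash with keccak256 inside an EIP-712 structure.

```python
import hashlib
from string import Template

# Hypothetical template an application might publish for one action.
SUPPLY_TEMPLATE = Template(
    "Supply $amount $symbol to $protocol on behalf of $beneficiary."
)


def render_description(fields: dict) -> str:
    """Fill the template; raises KeyError if a field is missing."""
    return SUPPLY_TEMPLATE.substitute(fields)


def description_digest(description: str) -> str:
    """Hash the rendered text (SHA-256 as a stand-in for keccak256)."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


# Illustrative metadata handed over by the (untrusted) online device.
fields = {
    "amount": "1",
    "symbol": "USDC",
    "protocol": "Aave",
    "beneficiary": "0xF8Cade19b26a2B970F2dEF5eA9ECcF1bda3d1186",
}
description = render_description(fields)
digest = description_digest(description)
print(description)  # what the user reads on the offline screen
print(digest)       # what gets signed over alongside the payload
```

The point of the design is that the device shows exactly the string it hashes, so a compromised online machine cannot change what the user sees without also changing what is signed.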

Doesn’t a QR code (or any type of fast scannable scheme we choose) solve this? You get the QR code as the first screen of the hardware wallet signing sequence.

  • Can be read/scanned on air-gapped version of some app on a bigger device or a dedicated reader. (even one not air-gapped first if you want to have a rapid sense of danger)
  • Easy to see simulation results if any.
  • Readers should be cross wallets compatible for more trust.
  • Maybe readers are also built in current wallet apps.
  • Other screens are the tx bytes because people love lists of bytes.

It’s probably not trustless, but it’s a step forward. I would have liked the Bybit signers to have had such a process to go through.

  • Ledger Nano X: 128 x 64 pixels, so maybe around 600 bytes of data can be output per code (padding and such need to be considered).
    If the tx is too large, the QR code can be animated (looping over the QR chunks until user interaction).
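The chunked (animated) QR idea could be sketched roughly like this. The 600-character frame budget comes from the rough estimate above, and the "index/total:" frame header is an invented format for illustration, not any existing wallet standard.

```python
import base64


def to_qr_frames(payload: bytes, max_chars: int = 600) -> list[str]:
    """Base64-encode the payload and split it into 'index/total:chunk' frames."""
    encoded = base64.b64encode(payload).decode("ascii")
    body = max_chars - 12  # assumed room reserved for the "index/total:" header
    chunks = [encoded[i:i + body] for i in range(0, len(encoded), body)] or [""]
    total = len(chunks)
    return [f"{i + 1}/{total}:{chunk}" for i, chunk in enumerate(chunks)]


def from_qr_frames(frames: list[str]) -> bytes:
    """Reassemble frames scanned in any order; fail if any frame is missing."""
    parts: dict[int, str] = {}
    total = None
    for frame in frames:
        header, chunk = frame.split(":", 1)  # ':' never occurs in base64 text
        index, count = (int(x) for x in header.split("/"))
        if total is not None and count != total:
            raise ValueError("frames belong to different payloads")
        total = count
        parts[index] = chunk
    if total is None or set(parts) != set(range(1, total + 1)):
        raise ValueError("missing frames")
    return base64.b64decode("".join(parts[i] for i in range(1, total + 1)))


# Example: a 2 KB payload needs several frames at this budget.
tx_bytes = bytes(range(256)) * 8
frames = to_qr_frames(tx_bytes)
print(len(frames), "frames")
assert from_qr_frames(frames) == tx_bytes
```

Because the reader loops until it has every index, the scan order does not matter, which is what lets the device cycle through frames until the user interacts.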

Most TXs wouldn’t go over 5-10 KB of calldata.

Simulating on an airgapped device in a trustless way is very hard.

I don’t think anyone loves lists of bytes?

Anything that is not trustless would not have protected ByBit. The ByBit attack was incredibly sophisticated and it caused the signers to see an attacker-provided payload on their online machine. The only way to protect against this would be for the offline signing device to trustlessly present the data in a human-readable way on the offline signing device directly. Anything that involves trusting the online device is susceptible to ByBit-style attacks.

Ah good point.

I feel like @Elyx0’s QR code point holds water, though. Basically, you base64 encode the data for the transaction and show it as a QR code. Then, someone can use whatever device they want to make sure the transaction is good.

You could have a second QR code “decoder” hardware device, or some hardware devices could opt in to being connected to the internet to help decode and “translate” the data.

To me, it feels like we either:

  1. Show a QR code to make it easier to extract the data off hardware devices. This could be base64 encoded, so the security assumptions would be the same as normally showing the screen itself, and just make sure the base64 encoding is good.

  2. We hash the data to a calldata digest or EIP-712 digest and let the user compare it to an expected one.

  3. We have smart contract developers add a new function to their contracts where they hash a description that users have to sign in addition to their original transactions.

TBH, I feel like #1 is actually the best option, #2 is less engineering lift for hardware wallets (and they can still do #1), and #3 is redundant and, in my opinion, more cumbersome and gas-inefficient in the long run.

I’d imagine if we go with #3, eventually descriptions will get so long and confusing, that we will just wind up back here because no one reads them because they are too confusing anyways. My argument here is that now that I “can read” calldata (I feel like I’ve learned the language of the hex) it feels very weird for me to have to sign another transaction which has essentially the same information just “in another language”. Granted, the calldata doesn’t have the ABI/documentation, but then I’d still rather do like a docsURI feature and let the user find that.

Hardware wallets already have a 2-way communication channel with at least one device. Why not use that communication channel instead of a separate one (QR codes)?

Regardless of the communication channel used to send the transaction data to an external device, the core problem is still that any online device that the offline hardware wallet is communicating with may be compromised. The offline device is the only device that can be “trusted” to present data directly to the user, as any other intermediate online device can be compromised.

We should be targeting regular everyday users, not power users who learn to read hex. Regular users can read basic English if you are lucky, some other language if you are unlucky. Anything that presents ABIs, function calls, hex strings, big numbers, etc. to a user is effectively unreadable for the vast majority of potential users.

2-way communication channel with at least one device

The reason we are showing the QR code is that we are nervous the original device is compromised. I’m not sure it makes sense to do our verification with the same device.

any online device that the offline hardware wallet is communicating with may be compromised

Agreed.

We should be targeting regular everyday users

Agreed, but we have a mismatch of expectations. I do not think showing a description that a user has to sign will fix the problem for them.

I do not think adding human-readable text to a transaction will actually solve much.
Decoded calldata is basically 90% of the way there (maybe even more since it’s so specific). The user just needs to understand what each parameter means, and it’s much less effort on the smart contract developers and the entire ecosystem. I think when we start getting into specific examples, we see where this will start to break down.

Example

This is a decoded supply transaction on the Aave contract.

supply(address asset, uint256 amount, address onBehalfOf, uint16 referralCode):
- asset: 0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48
- amount: 1000000000000000000 [1e18]
- onBehalfOf: 0xF8Cade19b26a2B970F2dEF5eA9ECcF1bda3d1186
- referralCode: 0

And here is what it “probably” would be with the text description idea:

“You are depositing 1000000000000000000 [1e18] of asset 0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48 onBehalfOf 0xF8Cade19b26a2B970F2dEF5eA9ECcF1bda3d1186. This will return you the aToken associated with 0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48, which will gain yield in your wallet.”

But the user has to also sign this, even though the decoded calldata tells you exactly this already? I don’t see how this is advantageous. The only nice part about this is the additional part: “This will return you the aToken associated with...”. However, a regular user should probably still refer to the documentation to understand everything that’s going on here. For example, a normal user will go:

  • “1000000000000000000 tokens! That’s too many!”

So they need to do some research to understand decimals in ERC20s.

  • 0xF8Cade19b26a2B970F2dEF5eA9ECcF1bda3d1186: yeah that looks like my address…

Now we are back to the user potentially falling victim to an address poisoning attack!
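The decimals confusion is something a wallet can remove before the user ever sees the number. A minimal sketch, assuming the wallet already knows the token's decimals (6 and 18 below are just example values, and format_units is a hypothetical helper name):

```python
def format_units(raw: int, decimals: int) -> str:
    """Convert a raw unsigned token amount into a human-readable string."""
    whole, frac = divmod(raw, 10 ** decimals)
    # Left-pad the fractional part to `decimals` digits, then drop trailing zeros.
    frac_text = str(frac).rjust(decimals, "0").rstrip("0") if decimals else ""
    return f"{whole:,}" + (f".{frac_text}" if frac_text else "")


raw_amount = 1000000000000000000  # the amount from the decoded example
print(format_units(raw_amount, 18))  # with an 18-decimal token
print(format_units(raw_amount, 6))   # with a 6-decimal token
```

The same raw number reads as 1 token or as a trillion tokens depending on the decimals, which is exactly why neither raw decoded calldata nor a description that echoes the raw integer is enough on its own.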

The users (IMO) will have the same questions with the “signed description” as the decoded calldata. A hardware wallet could very easily have a list of known ABIs (many already do) to decode the calldata for the user.
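As a rough illustration of how little is needed for the "list of known ABIs" approach, here is a sketch that decodes calls whose arguments are all static types. The two selectors are the ones that appear in the calldata quoted in this thread; the table layout and the decode_call name are made up for the example, and dynamic types (bytes, arrays) are deliberately not handled.

```python
# Known ABIs keyed by 4-byte selector (hex). Static argument types only.
KNOWN_ABIS = {
    "095ea7b3": ("approve", [("spender", "address"), ("amount", "uint256")]),
    "617ba037": ("supply", [("asset", "address"), ("amount", "uint256"),
                            ("onBehalfOf", "address"), ("referralCode", "uint16")]),
}


def decode_call(calldata: bytes) -> tuple[str, dict]:
    """Decode a call with static arguments using the KNOWN_ABIS table."""
    selector = calldata[:4].hex()
    if selector not in KNOWN_ABIS:
        raise ValueError(f"unknown selector 0x{selector}")
    name, params = KNOWN_ABIS[selector]
    args = calldata[4:]
    if len(args) != 32 * len(params):
        raise ValueError("unexpected argument length")
    decoded = {}
    for i, (param_name, param_type) in enumerate(params):
        word = args[32 * i: 32 * (i + 1)]
        if param_type == "address":
            decoded[param_name] = "0x" + word[12:].hex()  # low 20 bytes
        else:
            decoded[param_name] = int.from_bytes(word, "big")
    return name, decoded


# The approve call from the batch example, rebuilt from its decoded fields.
approve_calldata = (
    bytes.fromhex("095ea7b3")
    + bytes(12) + bytes.fromhex("78e30497a3c7527d953c6b1e3541b021a98ac43c")
    + (50 * 10 ** 18).to_bytes(32, "big")
)
name, decoded = decode_call(approve_calldata)
print(name, decoded)
```

Rejecting unknown selectors outright, rather than showing raw hex, is a design choice in this sketch; a real wallet might fall back to blind signing with a warning.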

Bigger Example

Now imagine a user wants to do the same thing (supply a token to Aave) but through a Safe wallet, and batch the approval first.

Full (example) calldata:

0x6a761202000000000000000000000000f220d3b4dfb23c4ade8c88e526c1353abacbc38f00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000140000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000034000000000000000000000000000000000000000000000000000000000000001c48d80ff0a00000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000000172005a7d6b2f92c77fad6ccabd7ee0624e64907eaf3e00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000044095ea7b300000000000000000000000078e30497a3c7527d953c6b1e3541b021a98ac43c000000000000000000000000000000000000000000000002b5e3af16b18800000078e30497a3c7527d953c6b1e3541b021a98ac43c00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000084617ba0370000000000000000000000005a7d6b2f92c77fad6ccabd7ee0624e64907eaf3e000000000000000000000000000000000000000000000002b5e3af16b18800000000000000000000000000009467919138e36f0252886519f34a0f8016ddb3a300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000041000000000000000000000000f8cade19b26a2b970f2def5ea9eccf1bda3d118600000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000

Here is the (nested) decoded calldata. So the multiSend contract accepts “packed” bytes to save gas, and I have to decode the bytes sent to that contract differently than a normal decoder.

function execTransaction(
        address to,
        uint256 value,
        bytes data,
        Enum.Operation operation,
        uint256 safeTxGas,
        uint256 baseGas,
        uint256 gasPrice,
        address gasToken,
        address payable refundReceiver,
        bytes signatures
    )
- to: 0xf220D3b4DFb23C4ade8C88E526C1353AbAcbC38F
- value: 0
- data: 
---- (decoded A): "multiSend(bytes)":
-------- (decoded B [TX 1]): 
------------ requiredValue: 0x00
------------ to: 5a7d6b2f92c77fad6ccabd7ee0624e64907eaf3e
------------ value: 0
------------ dataLength:  68
------------ (decoded C 1): "approve(address,uint256)":
---------------- spender: 0x78e30497a3c7527d953c6B1E3541b021A98Ac43c
---------------- amount: 50000000000000000000 [5e19]
-------- (decoded B [TX 2]): 
------------ requiredValue: 0x00
------------ to: 0x78e30497a3c7527d953c6b1e3541b021a98ac43c
------------ value: 0
------------ dataLength: 132
------------ (decoded C 2): "supply(address,uint256,address,uint16)"
---------------- asset: 0x5A7d6b2F92C77FAD6CCaBd7EE0624E64907Eaf3E
---------------- amount: 50000000000000000000 [5e19]
---------------- onBehalfOf: 0x9467919138E36f0252886519f34a0f8016dDb3a3
---------------- referralCode: 0
- operation: 1
- safeTxGas: 0
- baseGas: 0
- gasPrice: 0
- gasToken: 0x0000000000000000000000000000000000000000
- refundReceiver: 0x0000000000000000000000000000000000000000
- signatures: 0x000000000000000000000000f8cade19b26a2b970f2def5ea9eccf1bda3d1186000000000000000000000000000000000000000000000000000000000000000001
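For anyone who wants to reproduce the "decoded B" level, the packed multiSend layout can be unpacked with a short loop. This sketch assumes the Safe MultiSend packing as I understand it: 1 byte operation, 20 bytes to, 32 bytes value, 32 bytes data length, then the data. The first byte shown as requiredValue in the listing above is treated here as the operation flag (0 = call, 1 = delegatecall), which is an assumption. The sample bytes are rebuilt from the decoded fields rather than copied from the hex blob.

```python
def decode_multisend(packed: bytes) -> list[dict]:
    """Split MultiSend-style packed bytes into individual calls."""
    calls = []
    pos = 0
    while pos < len(packed):
        header_end = pos + 1 + 20 + 32 + 32
        if header_end > len(packed):
            raise ValueError("truncated header")
        data_length = int.from_bytes(packed[pos + 53: pos + 85], "big")
        data = packed[header_end: header_end + data_length]
        if len(data) != data_length:
            raise ValueError("truncated data")
        calls.append({
            "operation": packed[pos],
            "to": "0x" + packed[pos + 1: pos + 21].hex(),
            "value": int.from_bytes(packed[pos + 21: pos + 53], "big"),
            "data": data,
        })
        pos = header_end + data_length
    return calls


def word(value: int) -> bytes:
    """Encode an integer as one 32-byte big-endian ABI word."""
    return value.to_bytes(32, "big")


def pack_call(to_hex: str, data: bytes) -> bytes:
    """Pack one call with operation 0 and value 0 (as in the example)."""
    return bytes([0]) + bytes.fromhex(to_hex) + word(0) + word(len(data)) + data


TOKEN = "5a7d6b2f92c77fad6ccabd7ee0624e64907eaf3e"
POOL = "78e30497a3c7527d953c6b1e3541b021a98ac43c"
BENEFICIARY = "9467919138e36f0252886519f34a0f8016ddb3a3"
AMOUNT = 50 * 10 ** 18

approve_data = bytes.fromhex("095ea7b3") + word(int(POOL, 16)) + word(AMOUNT)
supply_data = (bytes.fromhex("617ba037") + word(int(TOKEN, 16)) + word(AMOUNT)
               + word(int(BENEFICIARY, 16)) + word(0))
packed = pack_call(TOKEN, approve_data) + pack_call(POOL, supply_data)

for call in decode_multisend(packed):
    print(call["to"], len(call["data"]), call["data"][:4].hex())
```

The rebuilt payload comes out at 0x172 bytes, matching the length field in the full calldata above.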

How do you possibly turn this into English while making it easier for someone to understand? And since there are nested calls between contracts, does the final description have to make a lot of view calls between contracts?

Here is what AI (Claude Sonnet 4) told me:

"""
This is a DeFi lending transaction that performs two operations in sequence:

  1. Token Approval: The wallet first approves a lending protocol (contract 0x78e30497...) to spend 50 tokens (50,000,000,000,000,000,000 wei) of a specific ERC-20 token (contract 0x5a7d6b2f...).
  2. Supply/Deposit: The wallet then calls the lending protocol’s supply function to deposit those same 50 tokens into the protocol on behalf of address 0x9467919138....

In summary: This transaction is depositing 50 tokens into a DeFi lending protocol (likely Aave or a similar platform) to earn yield. The tokens will be supplied to the lending pool on behalf of the specified beneficiary address.

The transaction uses Safe’s multiSend functionality to batch these two operations together atomically - meaning both the approval and the supply happen in a single transaction, or neither happens at all.

The signature indicates this transaction was authorized by the Safe wallet owner at address 0xf8cade19....
"""

IMO this is the easiest possible explanation that could be given (because an AI is understanding it on a holistic level), and it still is confusing. As transactions get bigger and bigger, these walls of text will get worse and worse. AND there are still a lot of “gotchas” here.

  1. The user isn’t informed of the refund address if the transaction fails. Should we explain that?
  2. The user isn’t informed of the gas token. Do we add that to the explainer?
  3. call vs delegatecall isn’t explained.

Each contract would have their own different explainer, making the “final” transaction very confusing. I don’t see a world where this helps. Personally, I think transactions as of today are technical in nature, and security people at least should be able to understand them before we even consider the masses.

If all hardware wallets will have LLMs in them in the future, then great, but we should still have a way for security-conscious people to have higher assurance of the transactions they are sending.

Summary

Arguments against English transactions enshrined in the smart contracts

Argument 1: Human eyes

Users don’t have a good way to extract data from their wallets, therefore, their human eyes are highly likely to mischeck something (an address, a 0 on a number, etc) even if we have an English description of the transaction.

Argument 2: Gas

Signing multiple transactions is a waste of gas, not to mention it puts a lot of extra effort on smart contract developers.

Solutions

Big one

No matter how you look at it, as of today, we need hardware wallets to have a better way to extract the data from them.

Let’s walk before we run. Right now, security researchers have a hard time verifying information on hardware wallets. I know because most people are not getting perfect scores on my wise-signer test.

Solutions

Based on argument 1, no matter how you slice it, we need a way to extract data from a hardware device, or make verification easier. Also, users should probably check the documentation of the tools either way, because descriptions are often confusing. The two proposed solutions:

  1. A QR code with all the data (IMO this is the best option, then a user could easily do #2 here)
  2. A digest where all the data is hashed and a user can compare it to an expected hash.
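Option 2 is simple enough to sketch in a few lines. SHA-256 is used here only as a stand-in because keccak256 is not in Python's standard library; the function names are hypothetical, and the expected digest would have to be computed on a machine the user trusts more than the one that built the transaction.

```python
import hashlib
import hmac


def calldata_digest(calldata: bytes) -> str:
    """Digest of the raw calldata (SHA-256 as a stand-in for keccak256)."""
    return hashlib.sha256(calldata).hexdigest()


def matches_expected(calldata: bytes, expected_hex: str) -> bool:
    """Compare against an expected digest, ignoring case and a '0x' prefix."""
    normalized = expected_hex.lower().removeprefix("0x")
    return hmac.compare_digest(calldata_digest(calldata), normalized)


# Illustrative payloads: an approve selector followed by two argument words.
good = bytes.fromhex("095ea7b3") + bytes(64)
tampered = bytes.fromhex("095ea7b3") + bytes(63) + b"\x01"

expected = "0x" + calldata_digest(good)
print(matches_expected(good, expected))      # True
print(matches_expected(tampered, expected))  # False
```

A single changed byte produces a completely different digest, so the user only needs to compare one short string instead of the whole payload.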

In both solutions, wallets could decode them, or we could encourage users to go to the documentation.

All of this being said:

  1. We should have tools that convert transactions to text
  2. I don’t think embedding this in the smart contracts as a transaction itself makes sense

Let me clarify the goals of this proposal. First, as is typical in engineering, there are no silver bullets or panaceas. Likewise, it’s impossible to build foolproof systems that are user-facing (such as signing interfaces)—users will always be able to shoot themselves in the foot.

This proposal specifically addresses attacks like the one ByBit suffered, which targeted their signers, who were most definitely tech-savvy. The aim is to remove trust in front-ends without requiring protocol changes or specialized signing devices. App developers adopting this approach only need some programming effort, and their users need a trusted signing device that supports EIP-712.

I don’t think the fact that languages evolve, nor the impossibility of fully educating users, are real issues here. Again, the goal is to remove the requirement to trust front-ends, as imposed by blind signing.

The only (huge) issue is gas costs, and I agree this is significant. This technique (while not exclusive to them) was designed for app-specific rollups, which should have more available blockspace to compensate. In that context, the trade-off might be worth it to avoid losing $1 billion.

Let’s look at it practically though, for this transaction (this is a valid transaction on the ZKsync Era chain btw, all addresses are real.)

For this example:

Full calldata:

0x6a761202000000000000000000000000f220d3b4dfb23c4ade8c88e526c1353abacbc38f00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000140000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000034000000000000000000000000000000000000000000000000000000000000001c48d80ff0a00000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000000172005a7d6b2f92c77fad6ccabd7ee0624e64907eaf3e00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000044095ea7b300000000000000000000000078e30497a3c7527d953c6b1e3541b021a98ac43c000000000000000000000000000000000000000000000002b5e3af16b18800000078e30497a3c7527d953c6b1e3541b021a98ac43c00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000084617ba0370000000000000000000000005a7d6b2f92c77fad6ccabd7ee0624e64907eaf3e000000000000000000000000000000000000000000000002b5e3af16b18800000000000000000000000000009467919138e36f0252886519f34a0f8016ddb3a300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000041000000000000000000000000f8cade19b26a2b970f2def5ea9eccf1bda3d118600000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000

The transaction does the following:

  1. User called execTransaction on their Safe{Wallet} with a payload that:
  2. Sends to the multiSend contract, with a payload that:
  3. Calls approve on the ZK token on the ZKsync Era chain
  4. Calls supply on the Aave contract on the ZKsync Era chain

There are four contracts involved. How do you propose that the description be created? And what would the description be?

First, do you agree that there’s a subset of attacks (including the one of ByBit) that could be prevented by this technique?

This is why I’m asking what such a description would look like for this complex transaction.

At this time, I’m not convinced the technique you laid out in the OP would help, and I think the one that @Elyx0 and I laid out would have a bigger impact with less work.

The transaction I proposed is a real-world scenario, and relatively uncomplicated compared to some of the other batch transactions I’ve seen. If your technique can actually simplify it, I’m happy to reverse my opinion.

I love constructive discourse; this is how we reach a better consensus.

If an attacker can compromise one online device, why do you think they won’t be able to compromise the other online device? If the user has an online device that is lower risk (e.g., they don’t download game cracks off MegaUpload with it), why don’t they just use that device for the whole transaction end-to-end, why involve the high risk device at all?

This is not at all what I am imagining. I’m imagining something like the following:

Supply 1 USDC to Aave for lending in exchange for yield.

Through a SAFE wallet:

Use your safe to do the following two operations in sequence:

  1. Approve Aave to withdraw 1 USDC.
  2. Supply 1 USDC to Aave for lending in exchange for yield.

The above is possible with this proposal. I think it is also possible with EIP-712, with a bit of a question mark on fetching data from external contracts trustlessly. That is possible with advanced hardware wallets that can do Merkle proof and block verification if they have an anchor, which is a non-trivial problem.
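As a sketch of how short descriptions like the ones above could be composed for a batch, each inner call renders its own sentence and the wallet-level description just lists them. The templates, field names, and function names are invented for illustration, and the sketch skips the hard part: it assumes "Aave" and "USDC" have already been resolved from addresses in a way the signer can trust.

```python
# Hypothetical per-action templates; a real system would get these from the app.
TEMPLATES = {
    "approve": "Approve {spender} to withdraw {amount} {token}.",
    "supply": "Supply {amount} {token} to {protocol} for lending in exchange for yield.",
}


def describe_call(call: dict) -> str:
    """Render one call from its action name and already-resolved fields."""
    return TEMPLATES[call["action"]].format(**call["fields"])


def describe_batch(wallet: str, calls: list[dict]) -> str:
    """Describe a single call directly, or a batch as a numbered list."""
    if len(calls) == 1:
        return describe_call(calls[0])
    lines = [f"Use your {wallet} to do the following {len(calls)} operations in sequence:"]
    lines += [f"  {i}. {describe_call(call)}" for i, call in enumerate(calls, 1)]
    return "\n".join(lines)


batch = [
    {"action": "approve", "fields": {"spender": "Aave", "amount": "1", "token": "USDC"}},
    {"action": "supply", "fields": {"amount": "1", "token": "USDC", "protocol": "Aave"}},
]
print(describe_batch("Safe", batch))
```

The description grows by one line per inner call rather than by one paragraph per contract, which is the property being argued for here.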
