Trustodial: An Ontological Dilemma

Bitcoin Magazine

A lot of criticism has been circulating since the recent announcement that Wallet of Satoshi will shortly return to the United States thanks to its integration of Lightspark’s new “Spark” system. The criticism focuses on trust models, and on whether the new version of Wallet of Satoshi constitutes a noncustodial wallet or not.

Spark is a system based on statechains. Statechains don’t have the most clear-cut trust model. Spark is essentially the channel factory version of statechains, with numerous statechains nested inside a transaction tree built on a single on-chain UTXO.

Statechains are a Layer 2 system that allows entire UTXOs to be freely transferred off-chain with no liquidity constraints, but with the requirement of accepting some trust tradeoffs. You must trust that an operator, essentially the service provider, will delete private key material every time the statechain is transferred.

So let’s look at what makes something noncustodial.

  • A user has unilateral control over their funds, or the ability to regain it. 
  • No other party (or parties) has the ability to prevent the user from spending their funds, or regaining their ability to, or to spend them without the involvement of the user. 
The first quality definitively applies to statechains. Just like in a Lightning channel, a user can use a pre-signed transaction to reclaim their funds after a timelock period, which ensures honest settlement. Whether the second quality applies is not so clear-cut.

The statechain protocol requires the operator and original user to collaboratively generate a key that neither party ever has full knowledge of. Using their shares, they can collaborate to pre-sign the user’s withdrawal transaction. When the original user transfers the statechain to someone else, the original user, new user, and operator all collaborate to “regenerate” the same key, but with a different set of shares held by the new user and the operator.

After signing the new user’s withdrawal transaction, the operator is then supposed to delete the share it generated with the original user. This prevents the operator from ever signing a new transaction with the original user, and the shorter timelock on the new user’s transaction guarantees that they can spend theirs before the original user can spend his.

If the operator does not delete old key shares, then it would be possible for them to collaborate with any past user who kept their key share to steal the funds in the statechain.

The Operator

If the operator is doing what they are supposed to and deleting their old key shares every time the statechain is transferred, they are not a custodial system. They are physically incapable of signing any transactions in collaboration with anyone except the current and rightful owner of the statechain. The decrementing timelocks on the pre-signed transactions guarantee that the current owner can always confirm their withdrawal transaction before any previous owner.

Operators can even run their software in an SGX enclave or other secure computing environment, and have the enclave enforce the correct behavior of the software.
The enclave can even provide proofs of this that others can verify (granted you trust the environment to not be broken).

They also have a strong incentive to operate the protocol honestly, because in doing so they are not required to comply with the regulations that come along with being a custodial service holding other people’s money.

The Users

End users have a unilateral withdrawal transaction. This can be used any time after the timelock for their ownership expires and before the timelock for the previous owner’s time window expires. If the operator stops responding or disappears, they have this option.

But they have to trust that the operator is operating the protocol honestly and deleting past key shares. There is no way for them to really verify that. As mentioned above, something like the SGX enclave could handle security for the operator’s software and sign proofs it is running honest software. But all that does is move the point of trust away from the operator and onto Intel, the maker of SGX.

Even when dealing with a truly honest operator, who has only ever run honest software and never cheated a single user, a user can never actually know that they are an honest operator. They can only see that the operator has been honest, and hope they will continue to be.

So….?

There is no real clear-cut answer. In the situation where an operator is actually being honest, it fits all the criteria I laid out above to be noncustodial. The user has an unimpeded ability to gain full access to their funds, and no one else is able to stop them from doing that or steal their funds.

The problem is that it isn’t verifiable.

There is no way to trustlessly verify as a user that you have trustless control over your funds. Even if you actually do.

So there is a problem with labeling it as noncustodial, because even if it is, it is not possible for a user to ever truly verify it.
But there is also a problem with calling it custodial, because the operator cannot do anything to move funds without collaborating with another user, and the current user has a unilateral withdrawal transaction. This creates a dilemma in terms of categorizing tools in the space.

I don’t know what the solution is, but I think the first step is acknowledging the technical realities before jumping to label things one way or another because of your own incentives (why not a new category?). These types of questions, especially in an environment of glacially slow Bitcoin protocol changes, will become more frequent as developers struggle with the tradeoffs of Bitcoin’s current limitations. Bitcoin is programmable money, and the ways people will program it won’t always fit neatly into our predefined boxes.

This post Trustodial: An Ontological Dilemma first appeared on Bitcoin Magazine and is written by Shinobi.
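
As a rough illustration of the decrementing timelocks discussed in the article above, here is a minimal Python sketch. It is not taken from Spark or any statechain implementation: the `Owner` type, the `transfer` and `who_can_broadcast` helpers, the use of absolute block heights, and the numbers (a first timelock at height 1000, decremented by 100 blocks per transfer) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Owner:
    name: str
    timelock_height: int  # hypothetical: height at which this owner's pre-signed withdrawal becomes valid


def transfer(owners: List[Owner], new_name: str,
             initial_timelock: int = 1000, step: int = 100) -> List[Owner]:
    """Return the ownership history with a new owner appended.

    Each new owner's pre-signed withdrawal gets a timelock `step` blocks
    earlier than the previous owner's, so the newest owner can always
    broadcast first.
    """
    lock = initial_timelock if not owners else owners[-1].timelock_height - step
    if lock <= 0:
        raise ValueError("timelocks exhausted; the statechain must be settled on-chain")
    return owners + [Owner(new_name, lock)]


def who_can_broadcast(owners: List[Owner], height: int) -> List[str]:
    """Names of all past and present owners whose withdrawal is valid at `height`."""
    return [o.name for o in owners if height >= o.timelock_height]


owners: List[Owner] = []
for name in ["alice", "bob", "carol"]:
    owners = transfer(owners, name)

# carol is the current owner; between heights 800 and 899 only she can withdraw.
print(who_can_broadcast(owners, 850))
print(who_can_broadcast(owners, 950))
```

In this toy model the current owner has an exclusive window before any earlier owner's transaction becomes valid. What the sketch cannot show is the part the article identifies as unverifiable: whether the operator really deleted the old key shares.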

The Scroll: A Brief History of Wallet Clustering

Bitcoin Magazine

Our previous post in this series introduced the basic idea behind wallet or address clustering, the trivial case of address reuse, and the merging of clusters based on the common input ownership heuristic (CIOH), also known as the multi-input heuristic. Today, we’ll expand on more sophisticated clustering methods, briefly summarizing several notable papers. The content here mostly overlaps with a live stream on this topic, which is a companion to this series. Note that the list of works cited is by no means exhaustive.

Early Observational Studies – 2011-2013

As far as I’m aware, the earliest published academic study that deals with clustering is Fergal Reid and Martin Harrigan’s An Analysis of Anonymity in the Bitcoin System (PDF). This work studies the anonymity properties of bitcoin more broadly. In its discussion of the on-chain transaction graph, it introduced the notion of a “User Network” to model the relatedness of a single user’s coins based on CIOH. Using this model, the authors critically examined WikiLeaks’ claim that it “accepts anonymous Bitcoin donations.”

Another study, which was not published as a paper, was Bitcoin – An Analysis (YouTube) by Kay Hamacher and Stefan Katzenbeisser, presented at 28c3. They studied money flows using transaction graph data and made some remarkably prescient observations about bitcoin.

In Quantitative Analysis of the Full Bitcoin Transaction Graph (PDF), Dorit Ron and Adi Shamir analyzed a snapshot of the entire transaction graph.
Among other things, they note a curious pattern, which may be an early attempt at subverting CIOH:

“We discovered that almost all these large transactions were the descendants of a single large transaction involving 90,000 bitcoins [presumably b9a0961c07ea9a28…] which took place on November 8th, 2010, and that the subgraph of these transactions contains many strange looking chains and fork-merge structures, in which a large balance is either transferred within a few hours through hundreds of temporary intermediate accounts, or split into many small amounts which are sent to different accounts only in order to be recombined shortly afterward into essentially the same amount in a new account.”

Another early confounding of this pattern was due to MtGox, which allowed users to upload their private keys. Many users’ keys were used as inputs to batch sweeping transactions constructed by MtGox to service this unusual pattern of deposits. The naive application of CIOH to those transactions resulted in cluster collapse, specifically the cluster previously known as MtGoxAndOthers on walletexplorer.com (now known as CoinJoinMess). Ron and Shamir seem to note this, too:

“However, there is a huge variance in [these] statistics, and in fact one entity is associated with 156,722 different addresses. By analyzing some of these addresses and following their transactions, it is easy to determine that this entity is Mt.Gox”

Although change identification is mentioned (Ron & Shamir refer to these as “internal” transfers), the first attempt at formalization appears to be in Evaluating User Privacy in Bitcoin (PDF) by Elli Androulaki, Ghassan O. Karame, Marc Roeschlin, Tobias Scherer, and Srdjan Capkun. They used the term “Shadow Addresses,” which these days are more commonly referred to as “change outputs.” This refers to self-spend outputs, typically one per transaction, controlled by the same entity as the inputs of the containing transaction.
The paper introduces a heuristic for identifying such outputs to cluster them with the inputs. Subsequent work has iterated on this idea extensively, with several proposed variations. One example, based on the amounts in two-output transactions: if one output’s value is close to a round number when denominated in USD (based on historical exchange rates), that output is likely to be a payment, indicating that the other output is the change.

This early phase of Bitcoin privacy research saw the theory of wallet clustering become established as a foundational tool for the study of Bitcoin privacy. While this wasn’t entirely theoretical, evidential support was limited, necessitating relatively strong assumptions to interpret the observable data.

Empirical Results – 2013-2017

Although researchers attempted to validate the conclusions of these papers, for example, by interviewing Bitcoin users and asking them to confirm the accuracy of the clustering of their wallets or using simulations as in Androulaki et al.’s work, little information was available about the countermeasures users were utilizing.

A fistful of bitcoins: characterizing payments among men with no names (PDFs: 1, 2) by Sarah Meiklejohn, Marjori Pomarole, Grant Jordan, Kirill Levchenko, Damon McCoy, Geoffrey M. Voelker, and Stefan Savage examined the use of Bitcoin mixers, and put the heuristics to the test by actually using such services with real Bitcoin. On the more theoretical side, they defined a more general and accurate change identification heuristic than previous work.

In his thesis, Data-Driven De-Anonymization in Bitcoin, Jonas Nick was able to validate the CIOH and change identification heuristics using information obtained from a privacy bug in the implementation of BIP 37 bloom filters, mainly used by light clients built with bitcoinj.
The underlying privacy leak was described in On the privacy provisions of Bloom filters in lightweight bitcoin clients (PDF) by Arthur Gervais, Srdjan Capkun, Ghassan O. Karame, and Damian Gruber. The leak demonstrated that the clustering heuristics were rather powerful, a finding which was elaborated on in Martin Harrigan and Christoph Fretter’s The Unreasonable Effectiveness of Address Clustering (PDF).

Attackers have also been observed sending bitcoin, not through a mixer as in the fistful of bitcoins papers, but in small amounts to addresses that have already appeared on-chain. This behavior is called dusting or dust[1] attacks and can deanonymize the victim in two ways. First, the receiving wallet may spend the funds, resulting in address reuse. Second, older versions of Bitcoin Core used to rebroadcast received transactions, so an attacker who was also connected to many nodes on the p2p network could observe if any node was rebroadcasting its dusting transactions and link that node’s IP address to the cluster.[2]

Although Is Bitcoin gathering dust? An analysis of low-amount Bitcoin transactions (PDF) by Matteo Loporchio, Anna Bernasconi, Damiano Di Francesco Maesa, and Laura Ricci offered insights into dust attacks in 2023, the data set they analyzed only extends to 2017. This work looked at the effectiveness of such attacks in revealing clusters:

“This means that the dust attack transactions, despite being only 4.86% of all dust creating transactions, allow to cluster 66.43% of all dust induced clustered addresses. Considering the whole data set, the transactions suspected of being part of dust attacks are only 0.008% of all transactions but allow to cluster 0.14% of all addresses that would have otherwise remained isolated.”

This period of research was marked by a more critical examination of the theory of wallet clustering.
It became increasingly clear that, in some cases, users’ behaviors can be easily and reliably observed and that privacy assurances are far from perfect, not just in theory but also based on a growing body of scientific evidence.

Wallet Fingerprinting – 2021-2024

Wallet fingerprints are identifiable patterns in transaction data that may indicate the use of particular wallet software. In recent years, researchers have applied wallet fingerprinting techniques to wallet clustering. A single wallet cluster is typically created using the same software throughout, so any observable fingerprints should be fairly consistent within the cluster.[3]

As a simple example of wallet fingerprinting, every transaction has an nLockTime field, which can be used to post-date transactions.[4] This can be done by specifying a height or a time. When no post-dating is required, any value representing a point in time that is already in the past can be used, typically 0, but such transactions haven’t been post-dated when they were signed. To avoid revealing intended behavior and address the fee sniping concern, some wallets will randomly specify a more recent nLockTime value. However, since some wallets always specify a value of 0, when it’s not clear which output of a transaction is a payment and which is change, that information might be revealed by subsequent transactions. For example, suppose all of the transactions associated with the input coins specify an nLockTime of 0, but the spending transaction of one of the outputs does not. In this case, it would be reasonable to conclude that output was a payment to a different user.

There are many other known fingerprints. Wallet Fingerprints: Detection & Analysis by Ishaana Misra is a comprehensive account.

Malte Möser and Arvind Narayanan’s Resurrecting Address Clustering in Bitcoin (PDF) applied fingerprinting to the clustering problem. They used it as the basis for refinements to change identification.
They relied on fingerprints to train and evaluate improved change identification using machine learning techniques (random forests). Shortly thereafter, in How to Peel a Million: Validating and Expanding Bitcoin Clusters (PDF), George Kappos, Haaroon Yousaf, Rainer Stütz, Sofia Rollet, Bernhard Haslhofer and Sarah Meiklejohn extended and validated this approach using cluster data for a sample of transactions provided by a chain analytics company, indicating that the wallet fingerprinting approach is dramatically more accurate than only using CIOH and simpler change identification heuristics. Taking fingerprints into account when clustering makes deanonymization much easier. Likewise, taking fingerprints into account in wallet software can improve privacy. A recent paper, Exploring Unconfirmed Transactions for Effective Bitcoin Address Clustering (PDF) by Kai Wang, Yakun Cheng, Michael Wen Tong, Zhenghao Niu, Jun Pang, and Weili Han analyzed patterns in the broadcast of transactions before they are confirmed. For example, different fee-bumping behaviors can be observed, both via replacement or with child-pays-for-parent. Such patterns, while not strictly fingerprints derived from the transaction data, can still be thought of as wallet fingerprints but about more ephemeral patterns related to certain wallet software, observable when connected to the Bitcoin P2P network but not apparent in the confirmed transaction history that is recorded in the blockchain. Similar to the Bitcoin P2P layer, the Lightning network’s gossip layer shares information about publicly announced channels. This is not typically framed as a wallet fingerprint but might be loosely considered as such, in addition to the on-chain fingerprint lightning transactions have. Lightning channels are UTXOs, and they form the edges of a graph connecting Lightning nodes, which are identified by their public key. 
Since a node may be associated with several channels, and channels are coins, this is somewhat analogous to address reuse.[5] Christian Decker has publicly archived historical graph data. One study that looks at clustering in this context is Cross-Layer Deanonymization Methods in the Lightning Protocol (PDF) by Matteo Romiti, Friedhelm Victor, Pedro Moreno-Sanchez, Peter Sebastian Nordholt, Bernhard Haslhofer, and Matteo Maffei.

Clustering techniques have improved dramatically over the last decade and a half. Unfortunately, widespread adoption of Bitcoin privacy technologies is still far from being a reality. Even if it were, the software has not yet caught up to the state of the art in attack research.

Not The Whole Story

As we have seen, starting from the humble beginnings of address reuse and the CIOH described by Satoshi, wallet clustering is a foundational idea in Bitcoin privacy that has seen many developments over the years. A wealth of academic literature has called into question some of the overly optimistic characterizations of Bitcoin privacy, starting with WikiLeaks describing donations as anonymous in 2011. There are also many opportunities for further study and for the development of privacy protections.

Something to bear in mind is that clustering techniques will only continue to improve over time. “[R]emember: attacks always get better, they never get worse.”[6] Given the nature of the blockchain, patterns in the transaction graph will be preserved for anyone to examine more or less forever.

Light wallets that use the Electrum protocol will leak address clusters to their Electrum servers. Ones that submit xpubs to a service will leak clustering information of all past and future transactions in a single query. Given the nature of the blockchain analysis industry, proprietary techniques are at a significant advantage, likely benefiting from access to KYC information labeling a large subset of transactions.
This and other kinds of blockchain-extrinsic clustering information are especially challenging to account for since, despite being shared with 3rd parties, this information is not made public, unlike clustering based on on-chain data. Hence, these leaks aren’t as widely observable.

Also, bear in mind that control over one’s privacy isn’t entirely in the hands of the individual. When one user’s privacy is lost, that degrades the privacy of all other users. Through the process of elimination, which suggests a linear progression of privacy decay, every successfully deanonymized user can be discounted as a possible candidate when attempting to deanonymize the transactions of the remaining users. In other words, even if you take precautions to protect your privacy, there will be no crowd to blend into if others don’t take precautions, too.

However, as we shall see, assuming linear decay of privacy is often too optimistic; exponential decay is a safer assumption. This is because divide-and-conquer tactics also apply to wallet clustering, much like in the game of 20 questions. CoinJoin transactions are designed to confound the CIOH, and the topic of the next post will be a paper that combines wallet clustering with intersection attacks, a concept borrowed from the mixnet privacy literature, to deanonymize CoinJoins.

[1] Not to be confused with a different kind of dust attack, such as this example analyzed taking clustering into account by LaurentMT and Antoine Le Calvez.
[2] A notable and somewhat related attack on Zcash and Monero nodes (Remote Side-Channel Attacks on Anonymous Transactions by Florian Tramer, Dan Boneh and Kenny Paterson) was able to link node IP addresses to viewing keys by exploiting timing side channels on the P2P layer.
[3] More precisely: fingerprint distributions should be consistent within a cluster, as some wallets deliberately randomize certain attributes of transactions.
[4] Note that for nLockTime to be enforced, the nSequence value of at least one input of the transaction must also be non-final, which complicates things both for post-dating and in terms of the different observable patterns this gives rise to.
[5] Channel funds are shared by both parties to the channel, but the closing transaction resembles a payment from the funder of a channel. Dual-funded channels may confound CIOH, similarly to PayJoin transactions.
[6] New Attack on AES – Schneier on Security

This post The Scroll: A Brief History of Wallet Clustering first appeared on Bitcoin Magazine and is written by Yuval Kogman.
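
To make the common input ownership heuristic (CIOH) referred to throughout the article above more concrete, here is a minimal Python sketch that merges addresses into clusters with a union-find structure. It is not taken from any of the cited papers. The transaction format (a dict with an `input_addresses` list) and the function names are assumptions made for illustration.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure over arbitrary hashable items (here: addresses)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_by_cioh(transactions):
    """Naive CIOH: all input addresses of one transaction share an owner."""
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["input_addresses"]  # hypothetical field name
        for addr in inputs:
            uf.find(addr)  # register the address even if it never co-spends
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return sorted(sorted(members) for members in clusters.values())


txs = [
    {"input_addresses": ["A", "B"]},  # A and B are spent together
    {"input_addresses": ["B", "C"]},  # B reappears, pulling C into the same cluster
    {"input_addresses": ["D"]},       # D is never co-spent with anything
]
print(cluster_by_cioh(txs))
```

Because the merge is transitive, a single transaction that mixes inputs from unrelated users (a CoinJoin, or the MtGox sweeps described in the article) glues their clusters together. That is the cluster collapse the naive heuristic suffers from, and part of why later work layers change identification and fingerprinting on top of it.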

The Bitcoin Mempool: Relay Network Dynamics

Bitcoin Magazine

In the last Mempool article, I went over the different kinds of relay policy filters, why they exist, and the incentives that ultimately decide how effective each class of filter is at preventing the confirmation of different classes of transactions. In this piece I’ll be looking at the dynamics of the relay network when some nodes on the network are running different relay policies compared to other nodes.

All else being equal, when nodes on the network are running homogeneous relay policies in their mempools, all transactions should propagate across the entire network, provided that they pay the minimum feerate necessary not to be evicted from a node’s mempool during times of large transaction backlogs. This changes when different nodes on the network are running heterogeneous policies.

The Bitcoin relay network operates on a best-effort basis, using what is called a flood-fill architecture. This means that when a transaction is received by one node, it is forwarded to every other node it is connected to except the one that it received the transaction from. This is a highly inefficient network architecture, but in the context of a decentralized system it provides a high degree of guarantee that the transaction will eventually reach its intended destination, the miners.

Introducing filters in a node’s relay policy to restrict the relaying of otherwise valid transactions in theory introduces friction to the propagation of that transaction, and degrades the reliability of the network’s ability to perform this function. In practice, things aren’t that simple.

How Much Friction Prevents Propagation

Let’s look at a simplified example of different network node compositions. In the following graphics, blue nodes represent ones that will propagate some arbitrary class of consensus valid transactions, and red nodes represent ones that will not propagate those transactions.
The collective set of miners is denoted in the center as a simple representation of where transacting users ultimately want their transactions to wind up so as to eventually be confirmed in the blockchain.

This is a model of the network in which the nodes refusing to propagate these transactions are a clear minority. As you can clearly see, any node on the network that accepts them has a clear path to relay them to the miners. The two nodes attempting to restrict the transactions’ propagation across the network have no effect on their eventual receipt by miners’ nodes.

In this diagram, you can see that almost half of the example network is instituting filtering policies for this class of transactions. Despite this, only part of the network that propagates these transactions is cut off from a path to miners. The rest of the nodes not filtering still have a clear path to miners. This has introduced some degree of friction for a subset of users, but the others can still freely engage in propagating these transactions.

Even for the users that are affected by filtering nodes, only a single connection to the rest of the network nodes that are not cut off from miners (or a direct connection to a miner) is necessary in order for that friction to be removed. If the real relay network were to have a similar composition to this example, all it would take is a single new connection to alleviate the problem.

In this scenario, only a tiny minority of the network is actually propagating these transactions. The rest of the network is engaging in filtering policies to prevent their propagation. Even in this case, however, those nodes that are not filtering still have a clear path to propagate them to miners.

Only this tiny minority of non-filtering nodes is necessary in order to ensure their eventual propagation to miners. One mechanism for this is preferential peering logic, i.e. functionality to ensure that your node prefers peers that run the same software version or relay policies.
These types of solutions can guarantee that peers who will propagate something to others will find each other and maintain connections amongst themselves across the network.

The Tolerant Minority

As you can see looking at these different examples, even in the face of an overwhelming majority of the public network engaging in filtering of a specific class of transactions, all that is necessary for them to successfully propagate across the network to miners is a small minority of the network to propagate and relay them.

These nodes will essentially, through whatever technical mechanism, create a “sub-network” within the larger public relay network in order to guarantee that there are viable paths from users engaging in these types of transactions to the miners willing to include them in their blocks.

There is essentially nothing that can be done to counter this dynamic except to engage in a sybil attack against all of these nodes, and a sybil attack is completely defeated by a single honest connection. In addition, an honest node creating a very large number of connections with other nodes on the network can raise the cost of such a sybil attack exorbitantly. The more connections it creates, the more sybil nodes must be spun up in order to consume all of its connection slots.

What If There Is No Minority?

So what if there is no Tolerant Minority? What will happen to this class of transactions in that case?

If users still want to make them and pay fees to miners for them, they will be confirmed. Miners will simply set up an API. The role of miners is to confirm transactions, and the reason they do so is to maximize profit. Miners are not selfless entities, nor are they morally or ideologically motivated; they are businesses. They exist to make money.
If users exist that are willing to pay them money for a certain type of transaction, and the entirety of the public relay network is refusing to propagate those transactions to miners in order to include them in blocks, miners will create another way for users to submit those transactions to them.

It is simply the rational move to make as a profit-motivated actor when customers exist that wish to pay you money.

Relay Policy Is Not A Replacement For Consensus

At the end of the day, relay policy cannot successfully censor transactions if they are consensus valid, users are willing to pay for them, and miners do not have some extenuating circumstances to turn down the fees users are willing to pay (such as causing material damage or harm to nodes on the network, i.e. crashing nodes, propagating blocks that take hours to verify on a consumer PC, etc.).

If some class of transactions is truly seen as undesirable by Bitcoin users and node operators, there is no solution to stopping them from being confirmed in the blockchain short of enacting a consensus change to make them invalid.

If it were possible to simply prevent transactions from being confirmed by filtering policies implemented on the relay network, then Bitcoin would not be censorship resistant.

This post The Bitcoin Mempool: Relay Network Dynamics first appeared on Bitcoin Magazine and is written by Shinobi.
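
The flood-fill argument in the article above can be checked on a toy graph. The following Python sketch is a simplified model, not Bitcoin Core's relay logic. The `reaches_miner` function, the node names, and the example topology are all hypothetical, and it assumes that filtering nodes simply drop the transaction while every other node forwards it to all of its peers.

```python
from collections import deque


def reaches_miner(peers, filtering, origin, miners):
    """Breadth-first flood-fill from `origin`.

    `peers` maps each node to the nodes it is connected to. Nodes in
    `filtering` refuse to accept or relay the transaction. Returns True
    if any node in `miners` ends up receiving it.
    """
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if node in miners:
            return True
        for peer in peers.get(node, ()):
            if peer not in seen and peer not in filtering:
                seen.add(peer)
                queue.append(peer)
    return False


# Hypothetical topology: the user has two filtering peers (f1, f2) and one
# tolerant peer (t1), which reaches the miner through another tolerant node (t2).
peers = {
    "user": ["f1", "f2", "t1"],
    "f1": ["user", "miner"],
    "f2": ["user", "miner"],
    "t1": ["user", "t2"],
    "t2": ["t1", "miner"],
    "miner": ["f1", "f2", "t2"],
}

print(reaches_miner(peers, {"f1", "f2"}, "user", {"miner"}))        # tolerant path exists
print(reaches_miner(peers, {"f1", "f2", "t1"}, "user", {"miner"}))  # every path is filtered

# A single new connection to a non-filtering node (here the miner itself)
# removes the friction again.
peers_with_direct = dict(peers, user=peers["user"] + ["miner"])
print(reaches_miner(peers_with_direct, {"f1", "f2", "t1"}, "user", {"miner"}))
```

Under these assumptions the transaction is blocked only when every path to a miner passes through a filtering node, which matches the article's point that one non-filtering connection is enough.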