Designing Reliable Payment Systems: A Practical Low-Level Guide for Backend Engineers

At its heart, a payment application is a digital bridge. Imagine you are at a coffee shop. You tap your phone to pay for a latte. Behind that simple vibration on your device, a complex sequence of events occurs. Money doesn’t actually ‘move’ like a physical object; instead, a series of records is updated across different databases. One number goes down, another goes up. The primary goal of a payment system is to ensure that these numbers are updated accurately, securely, and reliably, even if the power goes out or the internet connection drops halfway through the process.

When we talk about Low-Level Design (LLD) for payments, we aren’t just talking about the ‘happy path’ where everything works perfectly. We are designing for the ‘unhappy path.’ We are designing for the moment a user clicks ‘Pay’ twice, or the moment a bank’s server takes too long to respond. In the world of finance, ‘almost correct’ is the same as ‘completely wrong.’ This guide focuses on how to build a system that handles money with the precision it deserves.

The Core Components of a Payment System

To build a robust payment application, we need to break it down into specialized building blocks. Each block has one specific job. This separation of concerns makes the system easier to debug, test, and scale.

The API Layer is the front door of your application. Its job is to receive requests from the mobile app or website, perform basic checks (like making sure the request is formatted correctly), and hand the work off to the internal services. It acts as a shield, ensuring that only valid requests enter your system.

The Payment Service is the brain. This is where your business logic lives. It decides which steps need to happen next. If a user wants to send money, the Payment Service coordinates with the other components to make sure the sender has enough money and the receiver is eligible to get it. It manages the lifecycle of a payment from start to finish.

The User and Account Service acts as the vault’s registrar. It keeps track of who the users are, what their current balances are, and what their limits are (for example, a user might only be allowed to spend $500 a day). It provides a quick way to check if a transaction is even possible before we start doing the heavy lifting.

The Ledger Service is perhaps the most critical part for auditing. While the Account Service tells you the current balance, the Ledger Service tells you the history. It records every single movement of money as a line item. If the Account Service says a user has $50, the Ledger Service should be able to prove it by adding up every deposit and subtracting every withdrawal since the account was opened.

The Database is the permanent memory. This is where we store the state of every transaction. We typically use a Relational Database (like PostgreSQL or MySQL) because it provides ‘ACID’ transactions (Atomicity, Consistency, Isolation, Durability). Atomicity in particular ensures that a group of database operations either all succeed together or all fail together.
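To make the all-or-nothing behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL or MySQL. The table layout, the `transfer` helper, and the choice to store amounts as integer cents are assumptions made for illustration. The credit is deliberately written first so that the rollback is visible when the debit is rejected.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real relational store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 5000), ("bob", 0)])
conn.commit()


def transfer(conn, sender, receiver, amount_cents):
    """Apply both updates in one database transaction.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance_cents = balance_cents + ? WHERE id = ?",
                (amount_cents, receiver),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance_cents = balance_cents - ? WHERE id = ?",
                (amount_cents, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account_id: balance_cents} for every account."""
    return dict(conn.execute("SELECT id, balance_cents FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 9000)` returns False and leaves both balances untouched, because the failed debit undoes the credit that preceded it.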

The Message Queue (like Kafka or RabbitMQ) is used for tasks that don’t need to happen instantly. For example, sending a confirmation email or updating a loyalty points balance doesn’t need to happen before the user sees the ‘Success’ screen. We put these tasks in a queue to be processed a few seconds later.

Finally, External Integrations are the bridges to the outside world. This includes connections to UPI providers, card networks (like Visa or Mastercard), or traditional banks. These are often the most unpredictable parts of the system because you don’t control their uptime or speed.

The Data Model: Transactions vs. Ledgers

When designing the database, it is tempting to just have a ‘Users’ table with a ‘balance’ column. However, in a real payment system, that is not enough. You need a detailed trail of breadcrumbs. We focus on three main tables: Users, Transactions, and Ledger Entries.

The Transactions Table records the ‘intent.’ When User A tries to send $10 to User B, we create a record here. This record stays with the payment through its entire life. It has a ‘Status’ column that moves from PENDING to SUCCESS or FAILED. This table is what the user sees when they look at their ‘Recent Activity’ screen.
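The status lifecycle described above can be sketched as a small state machine. The names `Status`, `ALLOWED`, and `next_status` are hypothetical, and treating SUCCESS and FAILED as terminal states is an assumption that follows from the description rather than a rule the source states outright.

```python
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


# Allowed moves. Assumption: SUCCESS and FAILED are terminal states.
ALLOWED = {
    Status.PENDING: {Status.SUCCESS, Status.FAILED},
    Status.SUCCESS: set(),
    Status.FAILED: set(),
}


class InvalidTransition(Exception):
    """Raised when code tries to move a transaction to a forbidden status."""


def next_status(current: Status, target: Status) -> Status:
    """Return the target status if the move is allowed, otherwise raise."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target
```

Routing every status change through one function like this makes it harder for a late retry or a buggy handler to flip a FAILED payment to SUCCESS.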

The Ledger Table records the ‘fact.’ Every transaction usually results in at least two ledger entries: a debit from the sender and a credit to the receiver. This is based on double-entry bookkeeping. If User A sends $10 to User B, the ledger will show ‘-$10’ for User A and ‘+$10’ for User B. The sum of these two entries must always be zero. This makes it very easy to spot errors; if the sum isn’t zero, money has effectively ‘vanished’ or ‘appeared’ out of nowhere, which indicates a bug.
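A minimal in-memory sketch of the zero-sum rule follows. The `LedgerEntry` fields and helper names are illustrative assumptions, and amounts are kept as integer cents to avoid floating-point rounding.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    transaction_id: str
    account_id: str
    amount_cents: int  # negative = debit, positive = credit


def entries_for_transfer(transaction_id, sender, receiver, amount_cents):
    """Build the debit/credit pair for a single transfer."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    return [
        LedgerEntry(transaction_id, sender, -amount_cents),
        LedgerEntry(transaction_id, receiver, amount_cents),
    ]


def unbalanced_transactions(entries):
    """Return the ids of transactions whose entries do not sum to zero."""
    totals = defaultdict(int)
    for entry in entries:
        totals[entry.transaction_id] += entry.amount_cents
    return sorted(tid for tid, total in totals.items() if total != 0)


def balance_of(entries, account_id):
    """Derive an account balance purely from its ledger history."""
    return sum(e.amount_cents for e in entries if e.account_id == account_id)
```

A periodic job could run `unbalanced_transactions` over recent entries and raise an alert on any non-empty result, and `balance_of` is the check that lets the ledger prove what the Account Service reports.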

Why do we need both? Because a transaction might fail halfway through. The Transactions table tells us what we tried to do, while the Ledger table tells us what actually happened to the money. If a transaction is marked as SUCCESS in the Transactions table but has no corresponding entries in the Ledger, we know we have a data consistency problem that needs fixing.
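The consistency check mentioned above can be expressed as a single query. This sketch assumes hypothetical `transactions` and `ledger_entries` tables in SQLite; the column names and sample rows are made up for illustration.

```python
import sqlite3

# Assumption: simplified schema and sample data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transactions (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE ledger_entries (
        transaction_id TEXT NOT NULL REFERENCES transactions(id),
        account_id TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO transactions VALUES ('t1', 'SUCCESS'), ('t2', 'SUCCESS'), ('t3', 'FAILED');
    INSERT INTO ledger_entries VALUES ('t1', 'alice', -1000), ('t1', 'bob', 1000);
    """
)


def successes_without_ledger(conn):
    """Return ids of SUCCESS transactions that have no ledger entries at all."""
    rows = conn.execute(
        """
        SELECT t.id
        FROM transactions AS t
        LEFT JOIN ledger_entries AS l ON l.transaction_id = t.id
        WHERE t.status = 'SUCCESS'
        GROUP BY t.id
        HAVING COUNT(l.transaction_id) = 0
        ORDER BY t.id
        """
    )
    return [row[0] for row in rows]
```

With the sample data, only `t2` is reported: it claims SUCCESS but has nothing in the ledger to back it up.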

The Step-by-Step Payment Flow

Let’s walk through a practical example: Alice wants to send $50 to Bob using a wallet app. This process needs to be handled in a very specific order to ensure no money is lost.

First, Alice’s app sends a request to the API Layer. The API checks if Alice is logged in and if the request is valid. Then, it passes the request to the Payment Service. The Payment Service immediately creates a record in the Transactions Table with a status of PENDING. This is vital; even if the system crashes in the next millisecond, we have a record that Alice tried to pay.

Next, the Payment Service asks the Account Service: “Does Alice have $50?” If yes, the service places a ‘hold’ on that money or proceeds to the debit. We then create a Ledger Entry to debit $50 from Alice’s account. This is the point of no return for the sender. Once the debit is successful, we attempt to credit Bob’s account. We create another Ledger Entry for +$50 for Bob.

Finally, once both ledger entries are written successfully, the Payment Service updates the Transactions Table status to SUCCESS. The user is then notified. If the credit to Bob fails (perhaps his account is frozen), the system must trigger a ‘Compensation’ flow, which means it must reverse the debit to Alice so she gets her money back. We never just leave the money in limbo.
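The whole flow, including the compensation step, can be sketched in memory as follows. The `Wallet` class, the `AccountFrozen` exception, and the decision to record the reversal as a new ledger entry (rather than deleting the debit) are assumptions made for illustration. A real implementation would persist each step in the database.

```python
class AccountFrozen(Exception):
    """Raised when a credit targets a frozen account (hypothetical rule)."""


class Wallet:
    def __init__(self, balances, frozen=()):
        self.balances = dict(balances)  # account_id -> balance in cents
        self.frozen = set(frozen)
        self.ledger = []                # (transaction_id, account_id, amount_cents)
        self.status = {}                # transaction_id -> status

    def _post(self, txn_id, account, amount_cents):
        """Apply one ledger entry; credits to frozen accounts are rejected."""
        if amount_cents > 0 and account in self.frozen:
            raise AccountFrozen(account)
        self.balances[account] += amount_cents
        self.ledger.append((txn_id, account, amount_cents))

    def pay(self, txn_id, sender, receiver, amount_cents):
        self.status[txn_id] = "PENDING"               # 1. record the intent first
        if self.balances[sender] < amount_cents:      # 2. does the sender have the funds?
            self.status[txn_id] = "FAILED"
            return "FAILED"
        self._post(txn_id, sender, -amount_cents)     # 3. debit the sender
        try:
            self._post(txn_id, receiver, amount_cents)  # 4. credit the receiver
        except AccountFrozen:
            # 5. compensation: reverse the debit with a new entry
            self._post(txn_id, sender, amount_cents)
            self.status[txn_id] = "FAILED"
            return "FAILED"
        self.status[txn_id] = "SUCCESS"               # 6. mark success only at the end
        return "SUCCESS"
```

Because the reversal is appended rather than erased, the ledger still shows that the debit happened and was undone, which keeps the audit trail intact.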

Idempotency: Preventing Double Payments

One of the most common issues in payment systems is the ‘double-click’ problem. A user is in an elevator with a weak signal, clicks ‘Pay,’ nothing happens, so they click ‘Pay’ again. Without idempotency, they might be charged twice. Idempotency is a fancy word for a simple concept: no matter how many times you perform the same operation, the result should be the same as if you did it once.

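We solve this using an Idempotency Key. This is a unique string (like a UUID) generated by the client (the mobile app) once for every new payment attempt and reused on every retry of that same attempt. When the API receives a request, it checks the database: “Have I seen this Idempotency Key before?” To keep that check safe when two copies of the request arrive at the same moment, the key column should carry a unique constraint so the database itself rejects the duplicate.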

If the key is new, we process the payment. If the key already exists, we don’t process it again. Instead, we simply return the result of the previous attempt. This ensures that even if the network fails and the app retries the request automatically, the user is only charged once. It is one of the most important safety features in any payment LLD.
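Here is a minimal sketch of that check, assuming a hypothetical `payments` table whose primary key is the idempotency key. SQLite stands in for the production database, and the `charges` list stands in for the real money movement.

```python
import sqlite3
import uuid  # clients would typically generate keys with uuid.uuid4()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    " idempotency_key TEXT PRIMARY KEY,"  # uniqueness enforced by the database
    " amount_cents INTEGER NOT NULL,"
    " status TEXT NOT NULL)"
)

charges = []  # stand-in for the actual debit/credit work


def pay(conn, idempotency_key, amount_cents):
    """Process a payment at most once per idempotency key."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO payments VALUES (?, ?, 'PENDING')",
                (idempotency_key, amount_cents),
            )
    except sqlite3.IntegrityError:
        # Key already seen: return the stored outcome instead of charging again.
        row = conn.execute(
            "SELECT status FROM payments WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        return row[0]

    charges.append(amount_cents)  # the money moves exactly once
    with conn:
        conn.execute(
            "UPDATE payments SET status = 'SUCCESS' WHERE idempotency_key = ?",
            (idempotency_key,),
        )
    return "SUCCESS"
```

Inserting first and handling the constraint violation avoids the gap between "check" and "insert" in which two simultaneous requests could both see a new key. In this sketch, a duplicate that arrives while the first request is still in flight would see PENDING; how to respond in that case is a design decision the sketch leaves open.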

Concurrency and Consistency

In a popular app, thousands of people might be moving money at the same time. What if Alice tries to send her last $50 to Bob and Charlie at the exact same time? If two different server threads read her balance as $50 simultaneously, they might both allow the transaction, leaving Alice with a balance of -$50. This is called a ‘race condition.’

To prevent this, we use Database Locking, often called pessimistic locking. When the system starts processing Alice’s payment, it ‘locks’ her row in the database. Any other process trying to lock or update that row has to wait until the first process is finished. Note that in many databases a plain read is not blocked by a row lock, so the balance check itself must be a locking read. This ensures that transactions are processed one after another for that specific user, maintaining a consistent and accurate balance.
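In PostgreSQL or MySQL, a row lock is typically taken with `SELECT ... FOR UPDATE` inside a transaction. Since that needs a running database server, the sketch below uses an in-process analogy instead: a `threading.Lock` per account plays the role of the row lock. The `Account` class and helper names are assumptions for illustration only.

```python
import threading
import time


class Account:
    def __init__(self, balance_cents):
        self.balance_cents = balance_cents
        self.lock = threading.Lock()  # plays the role of the database row lock


def withdraw(account, amount_cents, results):
    """Check and debit while holding the lock, so the two steps cannot interleave."""
    with account.lock:
        current = account.balance_cents
        if current < amount_cents:
            results.append(False)
            return
        time.sleep(0.01)  # widen the window in which an unlocked race would occur
        account.balance_cents = current - amount_cents
        results.append(True)


def run_concurrent_withdrawals(balance_cents, amount_cents, workers=2):
    """Start several withdrawals at once; return (final balance, sorted outcomes)."""
    account = Account(balance_cents)
    results = []
    threads = [
        threading.Thread(target=withdraw, args=(account, amount_cents, results))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance_cents, sorted(results)
```

With the lock in place, two simultaneous attempts to spend Alice's last $50 end with one success and one rejection. Without it, both threads could read the same balance and both succeed.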

Another approach is Optimistic Locking. Here, we don’t lock the row immediately. Instead, when we go to update the balance, we check: “Is the row still what I thought it was when I started?” In practice this is usually done with a version number on the row rather than by comparing the balance itself, because a balance can change and then return to the same value. If the version changed, we know someone else moved money in the meantime, so we abort the current attempt and try again. This is often faster for systems where users don’t frequently perform simultaneous actions.
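A minimal sketch of the optimistic approach, assuming a hypothetical `version` column on the accounts table and using SQLite as a stand-in database. The update only applies if the version still matches what the caller read earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance_cents INTEGER NOT NULL,"
    " version INTEGER NOT NULL)"  # assumption: incremented on every update
)
conn.execute("INSERT INTO accounts VALUES ('alice', 5000, 1)")
conn.commit()


def read_account(conn, account_id):
    """Return (balance_cents, version) as seen right now."""
    return conn.execute(
        "SELECT balance_cents, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


def try_debit(conn, account_id, amount_cents, seen_balance, seen_version):
    """Debit only if the row is unchanged since it was read.

    Returns True if the update applied; False if funds were insufficient
    or another writer got there first (the caller should re-read and retry).
    """
    if seen_balance < amount_cents:
        return False
    with conn:
        cursor = conn.execute(
            "UPDATE accounts SET balance_cents = ?, version = version + 1"
            " WHERE id = ? AND version = ?",
            (seen_balance - amount_cents, account_id, seen_version),
        )
    return cursor.rowcount == 1
```

If two requests both read version 1, only the first update matches `version = 1`. The second affects zero rows, re-reads the account, and then fails the balance check.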

Handling Failures Gracefully

In payments, failure is not an option, but it is a certainty. External APIs will time out, databases will occasionally be slow, and networks will flicker. The key is how you handle these interruptions. We use a combination of Retries and Compensations.

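If an external bank API doesn’t respond, we don’t immediately tell the user ‘Failed.’ We might retry the request three times with a short delay in between. However, you must be careful: only retry if the error is ‘retriable’ (like a timeout). If the error is ‘Invalid Account Number,’ retrying won’t help. Keep in mind that a timeout is ambiguous: the bank may have processed the request even though the response never arrived, so each retry should carry the same Idempotency Key as the original attempt.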
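The retry rule can be sketched as a small helper. The exception classes and `call_with_retries` are hypothetical names, and doubling the delay after each failure (exponential backoff) is one common choice added here for illustration; the text above only asks for a short delay.

```python
import time


class RetriableError(Exception):
    """Transient failure such as a timeout; trying again may help."""


class PermanentError(Exception):
    """Failure such as an invalid account number; trying again cannot help."""


def call_with_retries(operation, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call operation(), retrying only on RetriableError.

    The delay doubles after each failed attempt. PermanentError and any
    other exception propagate immediately without a retry.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetriableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists so that tests can pass in a fake and run instantly. In production the default `time.sleep` would be used, usually with a larger base delay.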

If a transaction fails after the money has already been debited from the sender, we need a Compensation Flow. This is an automated process that ‘undoes’ the previous steps. If Alice was debited but Bob couldn’t be credited, the system must trigger a refund to Alice. This is often handled asynchronously using a Message Queue to ensure that even if the refund fails initially, the system keeps trying until Alice has her money back.
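A queue-driven refund loop might look like the sketch below. A `collections.deque` stands in for the real Message Queue, and the job fields, the attempt cap, and the dead-letter list for jobs that need human attention are all assumptions added for illustration. The `refund` callable is assumed to be idempotent, so running it again after a partial failure is safe.

```python
from collections import deque


def drain_refunds(queue, refund, max_attempts=5):
    """Process refund jobs, requeueing failures until max_attempts is reached.

    Returns (completed transaction ids, jobs parked for manual review).
    """
    completed, dead_letter = [], []
    while queue:
        job = queue.popleft()
        try:
            refund(job["transaction_id"], job["account_id"], job["amount_cents"])
            completed.append(job["transaction_id"])
        except Exception:
            # Any failure counts as one attempt; the job goes to the back of the queue.
            job = {**job, "attempts": job.get("attempts", 0) + 1}
            if job["attempts"] >= max_attempts:
                dead_letter.append(job)  # stop retrying and escalate to a human
            else:
                queue.append(job)
    return completed, dead_letter
```

Capping the attempts is a trade-off: the system keeps trying through transient outages, but a refund that can never succeed ends up in front of an operator instead of looping forever.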

Asynchronous Processing for Better Performance

Not everything in a payment flow needs to happen while the user is staring at a loading spinner. The ‘Core Flow’ (debiting and crediting) must be synchronous because the user needs to know if the payment was accepted. However, ‘Secondary Tasks’ should be moved to the background.

For example, once a payment is successful, we might need to send a push notification, generate an invoice PDF, and update a marketing dashboard. If we did all of this in the main request, the user would be waiting for several seconds. Instead, we publish a ‘PaymentSuccess’ message to a Message Queue. Other small services listen to this queue and perform their tasks independently. This makes the main payment flow much faster and more reliable.
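The publish-and-consume pattern can be sketched with Python's standard `queue.Queue` standing in for Kafka or RabbitMQ. The event shape, the handler names, and the `run_worker` loop are illustrative assumptions. A real consumer would run in a separate process and acknowledge messages to the broker.

```python
import queue

events = queue.Queue()  # stand-in for the real message broker


def complete_payment(transaction_id, amount_cents):
    """Finish the core flow, then publish an event instead of doing extra work inline."""
    # ... the synchronous debit and credit would happen here ...
    events.put(
        {
            "type": "PaymentSuccess",
            "transaction_id": transaction_id,
            "amount_cents": amount_cents,
        }
    )
    return "SUCCESS"  # respond to the user without waiting for secondary tasks


def send_push(event, log):
    log.append(f"push:{event['transaction_id']}")


def generate_invoice(event, log):
    log.append(f"invoice:{event['transaction_id']}")


# Each event type maps to the independent tasks that react to it.
HANDLERS = {"PaymentSuccess": [send_push, generate_invoice]}


def run_worker(events, log):
    """Drain the queue, running every handler registered for each event."""
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return
        for handler in HANDLERS.get(event["type"], []):
            handler(event, log)
        events.task_done()
```

`complete_payment` returns as soon as the event is queued. The notification and invoice work only happens later, when `run_worker` picks the event up.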

Security and Scaling

Security in a payment app is about more than just passwords. It involves Encryption of sensitive data both while it’s moving across the network and while it’s sitting in the database. A common mistake is logging sensitive info. You should never, ever see a credit card number or a PIN in your application logs. Use ‘masking’ to ensure that logs only show things like **** **** **** 1234.
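As a rough sketch of masking, the function below replaces anything that looks like a card number before the text reaches a log. The pattern is an assumption: it treats any run of 13 to 19 digits, optionally separated by spaces or hyphens, as a card number. That can over-mask other long numbers, which is usually the safer direction to fail in.

```python
import re

# Assumption: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def mask_card_numbers(text):
    """Replace card-like digit runs, keeping only the last four digits."""

    def _mask(match):
        digits = re.sub(r"\D", "", match.group())
        return "**** **** **** " + digits[-4:]

    return CARD_PATTERN.sub(_mask, text)
```

In practice a function like this would be applied centrally, for example in a logging filter, so that individual call sites cannot forget it. PINs and other secrets are better kept out of log statements entirely than masked after the fact.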

As your app grows from 1,000 users to 1,000,000, you will hit bottlenecks. Most of these will be in the database. To scale, we often separate ‘Reads’ from ‘Writes.’ We have one primary database for processing payments (Writes) and several ‘Replica’ databases for showing users their transaction history (Reads). Since users check their balance much more often than they send money, this offloads a huge amount of work from the main database. Replicas can lag slightly behind the primary, so they are suitable for display purposes only; any balance check that decides whether a payment may proceed should read from the primary.

Building a payment system is a journey of managing edge cases. It requires a mindset where you assume things will fail and build safety nets to catch those failures. By focusing on clear transaction states, strict idempotency, and a reliable ledger, you create a foundation of trust. In the end, the code you write isn’t just about moving bits and bytes; it’s about ensuring that when a person sends their hard-earned money, it arrives exactly where it’s supposed to, every single time. Payment systems are not complex because of code, but because money must always be correct.