Rethinking E2E Tests and Contract-Based Testing in Distributed Systems

Imagine an online store. A user buys something on the store's website, paying with money they hold at a financial institution (we'll call it Bank). For the sake of example, the Bank has a direct software integration with the Store. So we have two parties participating in the process: the backend system of the Store and the backend system of the Bank. The entire workflow—from when the buyer clicks "Pay" to the very end (when proceeds of the sale are sent to the Store's bank)—is a multi-step process involving a chain of API calls between the two parties.

Some of the requirements of such a system are as follows:

  1. When a payment is made, it must first be approved (or denied) after verifications such as checking the payer's payment limits and performing checks against online fraud.
  2. The payer's money must be reserved for a period of time, to protect the Store's interests against fraudulent buyers.
  3. The payment can be refunded to the buyer if a problem with the sale occurs.

Each of these steps is achieved through API communication between the two systems. Whether the APIs are real-time remote calls (RPC), messages sent to a message broker, or files placed into a persistent store doesn't matter—the idea stays the same.

In this article, I want to lay out some of the challenges of creating automated tests for such systems, and a solution that I personally like.

To process payments like this, both the Store and the Bank use internal subsystems. These are made up of microservices, monoliths, or a mix of both. Each subsystem contains software suites (also called projects) composed of classes, structs, or functions. A class, struct, or function can be thought of as a unit. Smaller units combine into larger ones, and all of them together form the entire project.

The units are usually tested by unit tests: tests that run the code in complete isolation by using mocking or stubbing of dependencies and the environment.
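
As a minimal sketch of what such a unit test might look like in Python: the `within_limit` function and its `limits` dependency are hypothetical names invented for this illustration, and the dependency is replaced with a stub from the standard `unittest.mock` module so that nothing outside the unit is touched.

```python
from decimal import Decimal
from unittest.mock import Mock


def within_limit(account_id: str, amount: Decimal, limits) -> bool:
    """Return True if the amount does not exceed the account's daily limit.

    `limits` is any object with a `daily_limit(account_id)` method
    (a hypothetical dependency, e.g. a database-backed repository).
    """
    return amount <= limits.daily_limit(account_id)


def test_within_limit() -> None:
    # Stub the dependency so the test needs no database or network.
    limits = Mock()
    limits.daily_limit.return_value = Decimal("500.00")

    assert within_limit("12345678", Decimal("100.00"), limits)
    assert not within_limit("12345678", Decimal("500.01"), limits)
    limits.daily_limit.assert_called_with("12345678")


test_within_limit()
```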

To test how each unit works with others, we can write higher-level automated tests—local integration tests. This is good practice and, in my opinion, something every project should have. But it only covers one software suite at a time. Since we're dealing with a payment flow that involves multiple systems and multiple software suites within each, it's not enough to test everything in isolation. We also need to make sure the systems talk to each other in the right way. The communication between different systems is just as important as how the systems behave on their own.

Automated End-to-End (E2E) Tests

One way to test the communication between systems is by writing automated end-to-end (E2E) tests. These are tests where you spin up several systems and make them call each other just like they would in real life. In our case, that would mean setting up a test environment where the Store and the Bank can call each other’s APIs and we test the whole process.

E2E testing can be useful, but it comes with significant costs. The tests tend to be slow and hard to write, and their failures are often hard to diagnose. The environments of all participating systems need to stay in sync, and test data has to be prepared consistently on every side. When something breaks, it can take a long time to figure out what went wrong and in which system.

A Different Way: Contract-Based Testing

Instead of doing full E2E testing, I prefer to define strict API contracts between the systems. These contracts describe in detail how each request and response should look. We cover as many scenarios as we can think of: successful cases, different types of failures, and edge cases.

With these contracts in place, each team can test their own system without needing to call the other one. The Store can simulate how the Bank will respond, and the Bank can simulate how the Store will call it. As long as both sides follow the contract, we can be confident that the systems will work together in production.

To make this work, I had to write a lot of emails to the partner team. I asked them questions about how their system behaves in different situations. We discussed different scenarios, and I had to explain what my system needs to do. Sometimes they had to make changes on their side, and sometimes I had to change something on mine. This back-and-forth was a big part of the work. But once the contract was agreed on, things became much simpler.


Example Payment Flow

The following is a simple example of a payment integration between the backend of an online store (Store) and a financial institution (Bank) holding funds of a Customer making a purchase:

  1. The Customer clicks "Pay".
  2. The Store registers a new payment and sends a verification request to the Bank.
  3. The Bank, upon receiving the request:
    1. Verifies the payment by checking the status of the account to be debited, spending limits, etc.
    2. If any problems are found, it responds with a specific rejection code.
    3. If everything looks fine:
      1. Registers the payment in its own system.
      2. Responds with an approval code.
  4. The Store, upon receiving the response:
    1. If the code is a rejection, it shows an error to the user and ends the process.
    2. If the code is an approval:
      1. Asks the user to confirm or cancel the payment.
      2. Sends the Bank the user's choice.
  5. The Bank, upon receiving the user's choice:
    1. If confirmed, it reserves the funds for the transaction.
    2. If canceled, the payment is marked as canceled in both systems.
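
The steps above can be read as a small state machine on the Store side. The sketch below is one possible way to encode it, not a prescribed design: the state names, the event names, and the `next_state` helper are all assumptions made for illustration.

```python
from enum import Enum


class PaymentState(Enum):
    REGISTERED = "registered"            # step 2: Store registered the payment
    REJECTED = "rejected"                # step 4.1: Bank returned a rejection code
    AWAITING_CHOICE = "awaiting_choice"  # step 4.2: Bank approved, user must decide
    RESERVED = "reserved"                # step 5.1: user confirmed, funds reserved
    CANCELED = "canceled"                # step 5.2: user canceled


# Allowed transitions, keyed by (current state, event). Event names are hypothetical.
TRANSITIONS = {
    (PaymentState.REGISTERED, "bank_rejected"): PaymentState.REJECTED,
    (PaymentState.REGISTERED, "bank_approved"): PaymentState.AWAITING_CHOICE,
    (PaymentState.AWAITING_CHOICE, "user_confirmed"): PaymentState.RESERVED,
    (PaymentState.AWAITING_CHOICE, "user_canceled"): PaymentState.CANCELED,
}


def next_state(state: PaymentState, event: str) -> PaymentState:
    """Return the state that follows `event`, or raise if the flow forbids it."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state.name}") from None


# Walk the happy path: approval followed by user confirmation.
state = PaymentState.REGISTERED
for event in ("bank_approved", "user_confirmed"):
    state = next_state(state, event)
assert state is PaymentState.RESERVED
```

Writing the transitions down as data makes the forbidden paths explicit too: a confirmation that arrives for a rejected payment raises an error instead of silently reserving funds.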

Sample API Contracts

Here are some basic examples of what the API contract might look like.

Payment Verification Request

POST /api/payments/verify
{
  "accountId": "12345678",
  "amount": 100.00,
  "currency": "EUR",
  "referenceId": "abc-xyz-123"
}

Possible Responses

// Success
200 OK
{
  "status": "approved",
  "paymentId": "pay-789"
}

// Rejected due to insufficient funds
400 Bad Request
{
  "status": "rejected",
  "code": "INSUFFICIENT_FUNDS",
  "message": "Not enough balance"
}

// Rejected due to fraud suspicion
400 Bad Request
{
  "status": "rejected",
  "code": "FRAUD_SUSPECTED",
  "message": "Transaction flagged for review"
}
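
On the Store side, the responses above can be replayed by a stand-in for the Bank, so the Store's handling is tested without any network call. The following Python sketch assumes a hypothetical `FakeBank` class and a hypothetical `start_payment` function that returns a short string describing what the Store would do next; neither name comes from a real system.

```python
from dataclasses import dataclass

# The sample request from the contract, used verbatim.
SAMPLE_REQUEST = {
    "accountId": "12345678",
    "amount": 100.00,
    "currency": "EUR",
    "referenceId": "abc-xyz-123",
}


@dataclass
class BankResponse:
    http_status: int
    body: dict


class FakeBank:
    """Stands in for the Bank and replays one canned, contract-shaped response."""

    def __init__(self, response: BankResponse) -> None:
        self.response = response
        self.requests = []  # recorded so tests can inspect what the Store sent

    def verify(self, request: dict) -> BankResponse:
        self.requests.append(request)
        return self.response


def start_payment(bank, request: dict) -> str:
    """Hypothetical Store logic: verify the payment and decide the next step."""
    response = bank.verify(request)
    body = response.body
    if response.http_status == 200 and body["status"] == "approved":
        return f"confirm:{body['paymentId']}"  # ask the user to confirm or cancel
    if response.http_status == 400 and body["status"] == "rejected":
        return f"error:{body['code']}"  # show an error and end the process
    raise RuntimeError(f"response outside the contract: {response}")


approved = FakeBank(BankResponse(200, {"status": "approved", "paymentId": "pay-789"}))
assert start_payment(approved, SAMPLE_REQUEST) == "confirm:pay-789"

fraud = FakeBank(BankResponse(400, {
    "status": "rejected",
    "code": "FRAUD_SUSPECTED",
    "message": "Transaction flagged for review",
}))
assert start_payment(fraud, SAMPLE_REQUEST) == "error:FRAUD_SUSPECTED"
```

Raising on anything outside the agreed responses is a deliberate choice in this sketch: a response the contract does not describe is treated as a defect to investigate, not as something to guess at.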

Contract Rules

  • Every request must include a valid account ID, amount, currency, and reference ID.
  • The Bank must respond with the correct status and code for each situation.
  • The Store must handle each type of response properly.
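
The first rule can be turned into an executable check that either side runs against its own requests. The sketch below is only one interpretation: the contract does not say what "valid" means, so the checks (non-empty strings, a positive numeric amount) are assumptions, as is treating `currency` as required because it appears in the sample request. The function name `request_violations` is hypothetical.

```python
from numbers import Real

REQUIRED_FIELDS = ("accountId", "amount", "currency", "referenceId")


def request_violations(request: dict) -> list[str]:
    """Return a list of contract violations; an empty list means the request conforms."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in request]

    # Assumed rule: identifier-like fields must be non-empty strings.
    for name in ("accountId", "currency", "referenceId"):
        if name in request:
            value = request[name]
            if not (isinstance(value, str) and value.strip()):
                problems.append(f"{name} must be a non-empty string")

    # Assumed rule: the amount must be a positive number.
    if "amount" in request:
        amount = request["amount"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(amount, bool) or not isinstance(amount, Real) or amount <= 0:
            problems.append("amount must be a positive number")

    return problems


valid = {"accountId": "12345678", "amount": 100.00, "currency": "EUR", "referenceId": "abc-xyz-123"}
assert request_violations(valid) == []
assert request_violations({**valid, "amount": -5}) == ["amount must be a positive number"]
```

The Store could run such a check on every request it builds in its tests, and the Bank could run the same check on incoming requests, so both sides fail in the same way when the contract is broken.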

Conclusion

Instead of building tests that go through the whole payment flow in real time between systems, we can define strict contracts and test each side separately. This way, each team can work and test independently. It takes effort to define these contracts and get both sides to agree on them, but in the long run it saves time and makes testing simpler and more reliable.
