Case study · June 16, 2026

How GolfNext Restored Mobile Payment Reliability at Peak Load

GolfNext had a payment problem that looked like a business problem first.

Customers could not always complete mobile payments. Transactions failed or timed out under load. Venues started losing trust because a basic promise was breaking: a customer should be able to pay, get access, and continue.

For the business, the symptoms were clear. Revenue was being lost. Customers were frustrated. Some started to reconsider the product because payments were not reliable enough. The hard part was that the root cause was not obvious.

GolfNext did not need another generic performance tune-up. It needed someone to find why a working payment system became unreliable when enough customers used it at the same time.

Mavka diagnosed the hidden bottleneck and rebuilt the payment confirmation flow so GolfNext could process Nordic mobile payments at higher scale without adding a large number of backend instances.

Key results

Increased payment polling throughput by more than 10x.
Reduced the backend instances needed for the payment workload from roughly 10 to about 2.
Lowered infrastructure cost by reducing the number of instances needed for the same payment volume.
Removed a payment polling bottleneck that could starve the Spring backend of available threads.
Shortened deploy and rollback cycles by removing the need to wait for long-running payment polling requests to drain.
Made the payment architecture better prepared for expansion beyond Denmark, Norway, and Sweden.

Case snapshot

Client: GolfNext
Industry: golf venue technology, payments, connected infrastructure
Markets affected: Denmark, Norway, Sweden
Payment methods: MobilePay, Vipps, Swish, and similar QR-based mobile payment flows
System: Spring backend handling mobile payment confirmation
Pattern: blocking polling replaced with Amazon SQS delayed-message workflow
Business risk: failed transactions, lost revenue, customer churn, blocked market expansion

The business problem

GolfNext supports local mobile payment systems used across Nordic markets. These payment methods are important because they match how customers already pay in Denmark, Norway, and Sweden.

The payment flow looked simple from the outside:

A customer scanned a QR code.
GolfNext created a payment order.
The customer confirmed the payment in a mobile payment app.
GolfNext waited for the payment provider to confirm the result.

Under normal load, this design held up. Under heavier load, the same design became dangerous.

If enough customers started payments at the same time, the backend could run out of available capacity. Some payments did not complete reliably. Other backend requests also started to slow down. To the business, this looked like broad platform instability, not one isolated payment implementation detail.

That made the problem expensive. Failed payments directly affected revenue. Unreliable payment experiences damaged trust with venues. And because the symptoms appeared across the backend, it was hard to explain why the system was failing or which part needed to be fixed.

Why the cause was hard to see

The payment providers were not simply broken. The backend was not simply too small. The deeper issue was how the system waited for payment confirmation.

The original implementation used blocking polling. After creating a payment order, the backend repeatedly checked the payment provider API until the customer completed the payment or the flow timed out.

In the worst case, when a payment ran all the way to the timeout, one backend thread stayed busy for roughly two minutes. Even an average payment occupied a thread for around a minute.
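To make that cost concrete, here is a minimal Python sketch of the blocking pattern. It is an illustration, not GolfNext's code (the real backend is a Spring service). The `check_provider` callable, the 5-second interval, and the 120-second timeout are assumptions chosen to match the rough figures above. The sleep function is injected so the example runs instantly.

```python
from typing import Callable, Tuple


def blocking_poll(check_provider: Callable[[], str],
                  sleep: Callable[[float], None],
                  interval_s: float = 5.0,
                  timeout_s: float = 120.0) -> Tuple[str, float]:
    """Poll until the payment leaves PENDING or the timeout is reached.

    Returns (final_status, seconds_the_calling_thread_was_held).
    The calling thread stays occupied the whole time, including the sleeps.
    """
    held_s = 0.0
    while held_s < timeout_s:
        status = check_provider()   # hypothetical provider API call
        if status != "PENDING":
            return status, held_s
        sleep(interval_s)           # the thread does nothing but is still occupied
        held_s += interval_s
    return "TIMED_OUT", held_s


# A customer who never confirms: the thread is held for the full timeout.
status, held = blocking_poll(lambda: "PENDING", sleep=lambda s: None)
print(status, held)    # TIMED_OUT 120.0

# A customer who confirms on the 13th check, about a minute in.
answers = iter(["PENDING"] * 12 + ["PAID"])
status2, held2 = blocking_poll(lambda: next(answers), sleep=lambda s: None)
print(status2, held2)  # PAID 60.0
```

In both runs almost all of the held time is spent sleeping between checks, not talking to the provider.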

That mattered because those were not special disposable payment threads. They competed with the same Spring backend capacity used to serve the rest of the system.

When many payments were pending at once, payment confirmation consumed the backend's ability to respond to other requests. The platform could look generally slow or unstable even though the root cause was concentrated in the payment polling model.
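A back-of-the-envelope calculation shows how quickly this saturates. By Little's law, the average number of busy threads equals the arrival rate multiplied by the time each request holds a thread. The inputs below are assumptions for illustration: 200 request threads per instance (the default maximum for embedded Tomcat in Spring Boot; GolfNext's actual setting is not stated in this case study) and the roughly one-minute average hold time described earlier.

```python
def busy_threads(payments_per_second: float, hold_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate * time in system."""
    return payments_per_second * hold_seconds


def max_payment_rate(thread_pool_size: int, hold_seconds: float) -> float:
    """Arrival rate at which pending payments alone fill the thread pool."""
    return thread_pool_size / hold_seconds


POOL_SIZE = 200     # assumed request threads per instance
AVG_HOLD_S = 60.0   # roughly one minute per payment, per the text above

threads_at_2_per_s = busy_threads(2.0, AVG_HOLD_S)
saturation_rate = round(max_payment_rate(POOL_SIZE, AVG_HOLD_S), 2)

print(threads_at_2_per_s)  # 120.0
print(saturation_rate)     # 3.33
```

Under these assumptions, two new payments per second keep 120 of 200 threads occupied, and a little over three per second leaves no threads for the rest of the API.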

Scaling the number of instances helped only up to a point. It treated the symptom by adding more capacity for blocked threads. It did not fix the architecture that allowed one pending payment to hold backend capacity for minutes.

The deployment problem

The same design also slowed releases.

Before replacing an instance, GolfNext had to stop sending new traffic to it and wait for active payment polling requests to finish. That drain period could take around two minutes. Only after that could the instance shut down and a new one start.

In practice, replacing a single instance for a release or rollback took roughly five minutes or more: wait for payment polling to drain, terminate the old instance, start the new one, and let it establish connections.

For a payment system, slow rollback is a business risk. If a release affects transactions, the team needs to recover quickly. The old polling model made that harder because payment waiting lived inside the instance.

Mavka's diagnosis

The core problem was not the existence of polling. For these QR-based mobile payment methods, waiting for an external confirmation is part of the business flow.

The problem was where the waiting happened.

Waiting inside a Spring request thread meant every pending payment consumed backend capacity. The system was using expensive application resources to do nothing for most of the payment window.

Mavka's diagnosis was simple:

The payment result could be checked repeatedly, but the idle time between checks should not live in a backend thread.

It should live in infrastructure designed for waiting.

What Mavka changed

Mavka replaced blocking payment polling with an Amazon SQS delayed-message workflow.

Instead of keeping a backend thread occupied while waiting for the next provider check, the service created a short task and placed it in SQS with a delay.

The new flow worked like this:

GolfNext creates the payment order.
The backend places a "check payment status" task into SQS with a delay, such as three to five seconds.
The backend thread is released.
When the delayed message becomes available, one instance picks it up.
The instance calls the external payment provider API.
If the payment is still pending, the service places another delayed check into SQS.
The loop continues until the payment succeeds, fails, or reaches its timeout.
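The steps above can be sketched as a small simulation. This is a hedged illustration in Python, not the production Spring code: `InMemoryDelayQueue` is a hypothetical stand-in for SQS (where the per-message delay is set with the `DelaySeconds` parameter of `SendMessage`), the task fields are invented for the example, and time is a virtual clock so the script finishes immediately.

```python
import heapq
import itertools


class InMemoryDelayQueue:
    """Hypothetical stand-in for a queue that supports per-message delay."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker for equal visibility times

    def send(self, body, now, delay_s):
        heapq.heappush(self._heap, (now + delay_s, next(self._seq), body))

    def receive(self):
        """Return (visible_at, body) for the next message, or None if empty."""
        if not self._heap:
            return None
        visible_at, _, body = heapq.heappop(self._heap)
        return visible_at, body


def run_checks(queue, check_provider, check_delay_s=5.0):
    """Process checks, re-enqueuing a delayed check while a payment is pending."""
    results = {}
    item = queue.receive()
    while item is not None:
        now, task = item  # the virtual clock jumps to the message's visibility time
        if now >= task["deadline"]:
            results[task["payment_id"]] = "TIMED_OUT"
        else:
            status = check_provider(task["payment_id"])  # short, active work
            if status == "PENDING":
                queue.send(task, now=now, delay_s=check_delay_s)  # the wait lives in the queue
            else:
                results[task["payment_id"]] = status
        item = queue.receive()
    return results


queue = InMemoryDelayQueue()
queue.send({"payment_id": "p1", "deadline": 120.0}, now=0.0, delay_s=5.0)
queue.send({"payment_id": "p2", "deadline": 120.0}, now=0.0, delay_s=5.0)

calls = {"p1": 0, "p2": 0}


def fake_provider(payment_id):
    """p1 is confirmed on its third check; p2 is never confirmed."""
    calls[payment_id] += 1
    return "PAID" if payment_id == "p1" and calls["p1"] >= 3 else "PENDING"


results = run_checks(queue, fake_provider)
print(results)  # {'p1': 'PAID', 'p2': 'TIMED_OUT'}
```

No worker sleeps anywhere in this loop. Each pass does one short provider call and either finishes the payment or hands the wait back to the queue.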

The important change was not cosmetic. The waiting moved out of the application thread and into the queue.

Now a backend thread was busy only during the short moments when it created a task or called a provider API. The waiting between checks, which added up to a minute or two per payment, no longer consumed Spring request capacity.

Because the work was stored in SQS, any healthy instance could process the next check. A payment was no longer tied to the lifecycle of one backend instance.
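For any instance to pick up the next check, the message itself has to carry everything the check needs. The sketch below shows one possible shape for such a message in Python. The field names (`payment_id`, `provider`, `deadline_epoch_s`, `attempt`) are hypothetical and not taken from GolfNext's system.

```python
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class PaymentCheckTask:
    """Self-contained description of one pending payment check (hypothetical fields)."""
    payment_id: str
    provider: str           # e.g. "mobilepay", "vipps", "swish"
    deadline_epoch_s: int   # absolute deadline, so no instance needs local timers
    attempt: int = 1

    def to_body(self) -> str:
        """Serialize to a JSON string suitable for a queue message body."""
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_body(body: str) -> "PaymentCheckTask":
        return PaymentCheckTask(**json.loads(body))

    def next_attempt(self) -> "PaymentCheckTask":
        """Task to enqueue when the provider still reports the payment as pending."""
        return replace(self, attempt=self.attempt + 1)


task = PaymentCheckTask("p1", "vipps", deadline_epoch_s=1_800_000_120)
body = task.to_body()

# A different instance can rebuild the task from the message alone.
restored = PaymentCheckTask.from_body(body)
follow_up = restored.next_attempt()
print(restored == task, follow_up.attempt)  # True 2
```

Storing an absolute deadline in the message, instead of a countdown in memory, is what lets an instance be replaced mid-payment without losing track of the timeout.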

Why this changed the business outcome

The old model made growth expensive and fragile. More payments meant more blocked threads. More blocked threads meant more instances. More instances meant higher cost, slower operational recovery, and more pressure during releases.

The new model changed the scaling unit.

GolfNext no longer had to size backend capacity around payment requests that were mostly waiting. The system could process the same workload with far fewer instances because each instance spent its time doing active work instead of holding idle waits.
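A rough comparison of thread time per payment shows why the instance count could drop. The inputs are illustrative assumptions, not measurements from GolfNext: a payment that stays pending for 60 seconds, a check every 5 seconds, and 250 ms of thread time per provider call.

```python
import math


def thread_seconds_blocking(pending_s: float) -> float:
    """Blocking model: the thread is held for the whole pending period."""
    return pending_s


def thread_seconds_queued(pending_s: float, check_interval_s: float, call_s: float) -> float:
    """Queued model: a thread is busy only for the duration of each provider call."""
    checks = math.ceil(pending_s / check_interval_s)
    return checks * call_s


PENDING_S = 60.0         # assumed time the customer takes to confirm
CHECK_INTERVAL_S = 5.0   # assumed delay between checks
PROVIDER_CALL_S = 0.25   # assumed thread time per provider call

blocking = thread_seconds_blocking(PENDING_S)
queued = thread_seconds_queued(PENDING_S, CHECK_INTERVAL_S, PROVIDER_CALL_S)
print(blocking, queued, blocking / queued)  # 60.0 3.0 20.0
```

The 20x figure is arithmetic on assumed inputs. The real gain depends on provider latency, queue overhead, and the rest of the workload, and the measured improvement reported in this case study is more than 10x.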

That is what made the business operable again. Payments could be processed more reliably at peak load. Backend performance stopped being dominated by pending payment confirmations. Releases and rollbacks became less constrained by in-flight payment polling.

Most importantly, the architecture could support growth.

The original issue appeared in Nordic payment markets: Denmark, Norway, and Sweden. But the same pattern would become more expensive in a broader international rollout. If payment volume increased by an order of magnitude, the old model could have required many more backend instances just to survive blocked polling. The SQS-based model gave GolfNext a path to scale payment volume without multiplying infrastructure in the same way.

The result

After the change, the payment workload could be handled with much less backend capacity.

For the same class of payment load, GolfNext moved from needing roughly 10 backend instances to around 2. That reduced infrastructure pressure and cut the cost of serving the payment confirmation workload.

Payment polling throughput increased by more than 10x because pending payments no longer occupied backend threads for one or two minutes at a time.

Deploys and rollbacks also became faster. Instances no longer had to wait for long-running payment polling requests to finish before being replaced. The release cycle moved closer to the actual time needed to start a new instance and establish its required connections.

The visible business outcome was not "we added a queue." The outcome was that GolfNext could accept mobile payments more reliably, reduce the cost of doing so, and prepare the same payment architecture for more markets.

Why it worked

Mavka did not try to solve the issue by adding more servers first.

That would have hidden the bottleneck and increased cost. It also would have left the business exposed to the same failure pattern at the next level of growth.

Instead, the team changed the architecture around the slowest part of the payment flow: waiting for external confirmation.

SQS was a good fit because the system needed delayed, repeatable, distributed work. The payment check could happen later. It could be retried. It could be picked up by another instance. And it did not need to hold a request thread while the customer was still confirming payment in a mobile app.
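One consequence of retryable, distributed checks is that the same check may run more than once: SQS standard queues deliver messages at least once. A check handler therefore has to be safe to repeat. The Python sketch below shows one way to do that under stated assumptions. `PaymentStore` is a hypothetical in-memory stand-in for the payments table, and `grant_access` is a hypothetical side effect. In a real system the "finalize once" step would be a conditional database update, not a dictionary write.

```python
class PaymentStore:
    """In-memory stand-in for persistent payment state (hypothetical)."""

    def __init__(self):
        self._final = {}

    def status(self, payment_id):
        return self._final.get(payment_id, "PENDING")

    def finalize(self, payment_id, status):
        """Record a final status once; return False if one was already recorded."""
        if payment_id in self._final:
            return False
        self._final[payment_id] = status
        return True


def handle_check(store, payment_id, check_provider, grant_access):
    """Handle one delivered check message; safe to call repeatedly."""
    if store.status(payment_id) != "PENDING":
        return "SKIPPED"          # duplicate delivery after the payment was finalized
    status = check_provider(payment_id)
    if status == "PENDING":
        return "RESCHEDULE"       # caller enqueues another delayed check
    if store.finalize(payment_id, status) and status == "PAID":
        grant_access(payment_id)  # runs at most once per payment
    return status


store = PaymentStore()
granted = []

first = handle_check(store, "p1", lambda pid: "PAID", granted.append)
second = handle_check(store, "p1", lambda pid: "PAID", granted.append)
print(first, second, granted)  # PAID SKIPPED ['p1']
```

The second delivery is recognized and skipped, so the customer is granted access exactly once.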

The fix worked because it matched the real shape of the problem.

The business needed reliable transactions. The technical system needed to stop spending backend threads on idle waiting.

Lessons for similar companies

Payment reliability problems are not always payment provider problems.

If your business depends on external payment confirmation, especially QR-based or mobile-app confirmation flows, look carefully at where the waiting happens.

Ask:

Does one pending payment occupy one backend thread?
What happens when hundreds or thousands of payments are pending at once?
Do payment checks compete with normal API traffic?
Can an instance shut down safely without waiting for long polling requests to finish?
Are you scaling infrastructure for active work, or for idle waiting?

If the answer is "we hold backend capacity while waiting for the provider," the system may work at low volume and fail at the exact moment the business needs it most.

About Mavka. We help companies fix the engineering problems that block revenue: payment reliability, backend scalability, cloud cost, release safety, and senior architecture work that turns unclear failures into operable systems.
