An HCE environment manages payments transparently inside the installation of a bank or payment processor. It has three critical modules: the issuer interface, which handles issuer actions such as creating a card, disabling it, or managing customer devices; the customer interface, which talks directly to the mobile device so payments can be performed; and the most important and time-critical one, the payment authorizer.
WHY IS A SPECIAL ARCHITECTURE REQUIRED?
On the payment interface it is critical to respond in under 100 ms, so that the other steps on the critical path have time to produce the right answer. The POS can decline a valid operation if the response does not arrive in time.
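As a rough illustration of the 100 ms budget, here is a minimal Python sketch that runs an authorization call under a deadline and returns an explicit timeout result when the budget is exceeded. The names `authorize_with_deadline`, `fast_backend` and `slow_backend` are hypothetical and are not part of our platform; the real authorizer is not shown here.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

DEADLINE_SECONDS = 0.100  # the 100 ms budget mentioned in the text

# Shared worker pool; the size is an arbitrary value for this sketch.
_pool = ThreadPoolExecutor(max_workers=4)


def authorize_with_deadline(authorize, request, deadline=DEADLINE_SECONDS):
    """Run authorize(request), giving up once the deadline has passed."""
    future = _pool.submit(authorize, request)
    try:
        return future.result(timeout=deadline)
    except FutureTimeout:
        # The worker may still be running; we only stop waiting for it.
        future.cancel()
        return "TIMEOUT"


def fast_backend(request):
    """Hypothetical backend that answers immediately."""
    return "APPROVED"


def slow_backend(request):
    """Hypothetical backend that overruns the budget."""
    time.sleep(0.3)
    return "APPROVED"
```

With these stand-ins, `authorize_with_deadline(slow_backend, {})` gives up after roughly 100 ms instead of waiting the full 300 ms, which is the behaviour you want in front of a POS that will not wait.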
The other interfaces require atomic transactions, because in a payment environment it is critical to follow the all-or-nothing rule. This avoids, for example, creating a virtual card and then failing to send it to the customer, which would leave the card activated but not owned by the customer.
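The all-or-nothing rule can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table names, columns and the `issue_card` function are assumptions made up for the example, and SQLite stands in for whatever database is really used.

```python
import sqlite3


def create_schema(conn):
    """Create two hypothetical tables: cards and their owners."""
    conn.execute(
        "CREATE TABLE cards ("
        "id INTEGER PRIMARY KEY, pan_token TEXT NOT NULL, active INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE card_owners ("
        "card_id INTEGER NOT NULL REFERENCES cards(id), customer_id TEXT NOT NULL)"
    )


def issue_card(conn, pan_token, customer_id):
    """Create a card and assign it to a customer as one atomic unit."""
    # 'with conn' commits if the block succeeds and rolls back on any exception,
    # so a card is never left activated without an owner.
    with conn:
        cur = conn.execute(
            "INSERT INTO cards (pan_token, active) VALUES (?, 1)", (pan_token,)
        )
        card_id = cur.lastrowid
        conn.execute(
            "INSERT INTO card_owners (card_id, customer_id) VALUES (?, ?)",
            (card_id, customer_id),
        )
    return card_id
```

If the second insert fails (for instance because `customer_id` is missing), the first insert is rolled back too, so no orphan card remains in the `cards` table.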
We need a system that is fault tolerant, because hardware issues do happen. It must be load balanced and near real time. This is not an easy task, so it is better to plan ahead than to regret it later.
Provisioning is one of the important requirements of this kind of system. When you are running one or two nodes there is no problem, but what happens when you have 100 nodes? How can you manage them? If provisioning takes about a week per node, and it is getting embarrassing to see one step or another being missed so that the software ends up not working… It’s something to think about.
The database is the most critical point, because it is where all customer data is stored and no failures are allowed there. We also had to choose one of the existing solutions, because there was no time to adapt or enhance one. We soon discovered that plain PostgreSQL was not the way to go, since the most stable cluster solution works in active/passive mode and we would fall short of our performance requirements very quickly.
MySQL Cluster is another option, but it requires at least 4 dedicated machines to build a basic cluster. We built one and it really is working without issues, but we are not sure this is the way to go in the future.
Then MongoDB and other NoSQL databases came up, and we started to think about a combination of Kafka and Cassandra for our high-performance workloads. Everyone says they are a perfect match, and they are cluster friendly from the beginning, as well as highly available and load balanced. So we tend to think this is the right way to go.
The first version of the software was a monolithic, hard-to-maintain piece of software. I split it into smaller functional areas and linked everything with RabbitMQ queues, and now the platform is very easy to maintain and run. The problems with the current solution are deployment management, HA and HSM management. It is still based on JBoss for historical reasons (even though we no longer need it to run), which means you have to deploy a full stack of preconfigured software just to run one node.
I then investigated Apache Storm. Man, this is a great piece of software. I could take away our queue management system and concentrate on bolts, so we are designing the software to go that way. Next year we will be able to run the whole system on a Storm cluster, with high availability, resilience and great performance.
We will no longer have to care about where to place nodes or how many nodes each working module needs, because the modules are now split into functional areas and we can tell Storm to place as many nodes as required.
I want to design a system that takes at most 2 hours from the installation of the machine to it becoming a fully functional node. This is a big improvement.
The perfect system will provision the hardware with Ubuntu MAAS: the operator only has to decide which role is assigned to the machine, and the OS will get there over the network. Once the machine is up and running, it will become a Puppet node, either through the MAAS installation or manually by an operator; the former is preferred. Once the node joins the Puppet network, the following components will be installed and configured:
- Nagios monitoring – For monitoring of the hardware.
- Daemontools – For supervision of the software.
- HSM drivers – If HSM node.
- Docker – For images.
This can take minutes. Once this is done, the provisioning Docker cluster will start downloading and starting Docker images based on what kind of node it is.
It can download a Cassandra image + Kafka image + HCE image, or just a Cassandra image + HSM processing node image, etc.
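The role-to-image selection could look something like the following Python sketch. The role names, image names and registry host are all hypothetical placeholders; only the `docker run -d --name` invocation is a real Docker CLI form, and the function builds the commands without running them.

```python
# Hypothetical mapping from node role to the images that role should run.
ROLE_IMAGES = {
    "hce": ["cassandra", "kafka", "hce"],
    "hsm": ["cassandra", "hsm-processing"],
}


def images_for_role(role):
    """Return the list of image names for a node role."""
    try:
        return ROLE_IMAGES[role]
    except KeyError:
        raise ValueError(f"unknown node role: {role!r}")


def docker_commands(role, registry="registry.example.internal", tag="latest"):
    """Build (but do not execute) one 'docker run' command per image."""
    return [
        ["docker", "run", "-d", "--name", image, f"{registry}/{image}:{tag}"]
        for image in images_for_role(role)
    ]
```

Keeping the mapping as plain data means adding a new kind of node is a one-line change, and the provisioning step stays the same for every role.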
Since nodes are added to the cluster automagically, once Docker starts them they will automatically appear and start processing.
You have a fully working cluster in just a few hours, not days, completely managed by software and with almost zero maintenance. Software updates are very easy: we just deploy a new image to the Docker repository, and you just have to restart the Docker containers one by one. You can even add new nodes with the new version and then shut down the old ones, so there is zero downtime.
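The one-by-one update can be sketched as a small loop. This is only an outline under assumptions: `restart` and `is_healthy` are hypothetical callbacks that you would supply (for example wrapping your Docker tooling and a health check), and nothing here talks to Docker itself.

```python
def rolling_update(nodes, restart, is_healthy):
    """Restart nodes one at a time, stopping at the first unhealthy one.

    nodes      -- ordered list of node names
    restart    -- hypothetical callback that restarts one node on the new image
    is_healthy -- hypothetical callback returning True once the node is serving
    """
    updated = []
    for node in nodes:
        restart(node)
        if not is_healthy(node):
            # Stop here so the remaining nodes keep running the old version.
            raise RuntimeError(f"node {node} unhealthy after update; rollout stopped")
        updated.append(node)
    return updated
```

Because only one node is out of service at any moment, the rest of the cluster keeps processing, and a bad image is caught after the first node instead of after all of them.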
And that’s all!