Keyhole Labs is proud to announce the release of Trouble Maker version 2.0.0.
Trouble Maker is a platform-agnostic tool that randomly takes down services to test stability. It also provides an ad hoc console for producing common troublesome issues in your platform, so you can test durability on demand.
Trouble Maker v2.0.0 introduces specific performance improvements implemented with Spring Boot and Java WebSockets. This release also updates the look and feel of the Trouble Maker dashboard user interface.
The new user interface has been built from the ground up using Angular 2. To provide a more interactive user interface, the Spring Boot server was enhanced to use Spring-based WebSockets.
Users open the dashboard to view the trouble event log. The dashboard also lets them select and kill services on demand, or invoke other troublemaking issues against those services.
How Trouble Maker Works
Trouble Maker is a Java Spring Boot application that communicates with a client service through a small servlet registered with that Java API-based service application. By default, Trouble Maker accesses Eureka to discover services and, on a cron schedule, randomly selects a service to kill (i.e. shut down).
By default, once Trouble Maker is started, a random service is selected and killed once per day, Monday through Friday. This behavior can be turned off, and the schedule on which it occurs can be configured.
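As a rough illustration of the default behavior described above, the following plain-Java sketch checks that a given day falls Monday through Friday and then picks one service name at random. It is not Trouble Maker's actual code: the class name TroubleSketch, the methods isWeekday and pickVictim, and the sample service names are all hypothetical, and the real tool discovers services through Eureka and fires on a cron schedule rather than a manual check.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Optional;
import java.util.Random;

public class TroubleSketch {

    /** True on Monday through Friday, mirroring the default weekday window. */
    static boolean isWeekday(LocalDateTime when) {
        DayOfWeek day = when.getDayOfWeek();
        return day != DayOfWeek.SATURDAY && day != DayOfWeek.SUNDAY;
    }

    /** Picks one service name at random; empty if no services were discovered. */
    static Optional<String> pickVictim(List<String> services, Random random) {
        if (services.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(services.get(random.nextInt(services.size())));
    }

    public static void main(String[] args) {
        // Hypothetical service names standing in for a registry lookup.
        List<String> discovered = List.of("orders", "billing", "inventory");
        if (isWeekday(LocalDateTime.now())) {
            pickVictim(discovered, new Random())
                .ifPresent(name -> System.out.println("Would kill: " + name));
        }
    }
}
```

Passing the Random in as a parameter is a design choice for this sketch only: it lets a test supply a seeded generator so the selection is repeatable.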
These options are defined in the property file located at
###When to invoke trouble maker KILL service, default 2:00 pm Monday through Friday
trouble.cron=0 0 14 * * MON-FRI
### Access token that trouble maker client
trouble.token=abc123
###Operation timeout in milliseconds, 0 means forever, default is 5 minutes
trouble.timeout=300000
###Threads to spawn when Blocking trouble is invoked, default is 200
blocking.threads=200
###Trouble Service name, defaults to trouble.maker
trouble.service.name = trouble.maker
###use https when accessing client servlet api
trouble.ssl=false
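These properties can also be set from the command line as JVM system properties (-D arguments). The -D arguments must come before -jar so that the JVM reads them instead of passing them to the application as program arguments. An example is shown below:
java -Dtrouble.timeout=200000 -jar khs-trouble-maker.jar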
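To picture how a command-line value can take priority over the property file, here is a minimal plain-Java sketch that looks for a JVM system property first, then the file value, then a fallback. It is an illustration only, not Trouble Maker's implementation (Spring Boot handles this resolution itself); the class TroubleSettings and the method resolve are hypothetical names, and the file contents are inlined as a string to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class TroubleSettings {

    /** Resolution order: JVM system property, then file value, then fallback. */
    static String resolve(Properties file, String key, String fallback) {
        String override = System.getProperty(key);
        if (override != null) {
            return override;
        }
        return file.getProperty(key, fallback);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the property file; the keys match the listing above.
        Properties file = new Properties();
        file.load(new StringReader("trouble.timeout=300000\nblocking.threads=200\n"));

        long timeoutMs = Long.parseLong(resolve(file, "trouble.timeout", "300000"));
        int threads = Integer.parseInt(resolve(file, "blocking.threads", "200"));
        System.out.println("timeout=" + timeoutMs + " ms, blocking threads=" + threads);
    }
}
```

Running this class with -Dtrouble.timeout=200000 placed before the class name would make resolve return the overridden timeout, while blocking.threads would still come from the file value.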
Why Cause Trouble?
Failure is going to happen; it is just a matter of when. Common causes include memory utilization and leaks, port exhaustion, connection pool timeouts, too many resource file handles, and numerous others.
Even more potential issues are introduced when distributed systems such as Microservices are adopted, as the entire system has more moving parts than a standard monolithic web application. Service registries, load balancing and failover, and redundancy are essential, so there is even more surface area for these types of potential failures. Handling these types of failures is a characteristic of system stability.
System stability can be tested and validated outside of production. Unfortunately, it’s difficult (and expensive) to do this. Having to create, maintain, and then apply similar usage and loads to emulate production is complex and costly.
Our answer: treat failure as a use case, and engineer failures into your platform’s production environment purposefully.
A use case outlines a system’s behavior as it responds to a request. Instead of waiting for a failure to occur and seeing how durable and resilient your platform is, we suggest that you be proactive and make failure a USE CASE of your platform.
Netflix has been a pioneer of this purposeful error strategy, using a framework called Chaos Monkey, which can be configured to randomly take down AWS resources (e.g. load balancers) during normal business hours. Unlike Chaos Monkey (which is based upon the Amazon EC2 API), Trouble Maker is not dependent upon the cloud and can easily be used within an enterprise environment.
If you know failures are occurring, yet pagers are not going off at 3 a.m. and the help desk is not being called, then you know your system is durable. Causing trouble can actually help!
Trouble Maker is an open source tool created by the Keyhole Labs team. We encourage all enterprises implementing a Microservices style of architecture to have tools like Trouble Maker in place to test durability and stability.