Monday, October 9, 2017

Flex5: A New Scalable Backend

Reposting this from our main company blog at: https://www.flexrentalsolutions.com/blog/

Hello, everyone! This is Roger Diller, Technical Lead here at Flex. Let’s discuss Flex5 and its new scalable backend. This aspect is perhaps more important than the new UI as it lays down the foundation for Flex5.

We have been methodically crafting the Flex5 strategy for over a year now. It all began with the knowledge that Flash was on its way out and we needed to rewrite the UI in order to stay relevant. After quite a bit of discussion, we settled on a “mobile first” strategy for two main reasons.

Adopting the mobile first strategy


First, we didn’t have a large mobile presence outside of our warehouse iOS app, so gaining a stronger mobile presence was important to us. Second, we knew that if we started with a less complex mobile UI, we could more easily focus on our design there and then scale up to desktop, rather than the other way around. We decided to start with a tablet UI and later move on to phone and desktop UIs. It took a lot of work to come up with a tablet UI design that would transition well to phone. By late spring of 2016, we had completed the bulk of the mobile design strategy. We began experimental coding of a tablet app and started a new REST API inside the existing Flex4 backend.

After the success of our tablet UI design and app proof of concept, we still had concerns about the Flex4 backend. Could it stand up as a backend for the long term? We weren’t sure. We thought we could possibly refactor our way to a faster and more scalable Flex4 backend. In the fall of 2016, we made substantial improvements to Flex4 performance, but we could see we weren’t going to be able to reach the level of performance needed to support future growth.


Rewriting the Flex backend architecture



It took some time to figure out a solid approach for rewriting the backend. How could we accomplish that without a risky all-or-nothing rewrite? In late 2016, we came up with a proof of concept Flex5 backend that would coexist with the Flex4 backend.

Let me take a moment to explain the key difference between the Flex4 and Flex5 backends. Flex4 was designed to run as one process per customer. This means we don’t have any way to run a second process for a customer to provide service redundancy in case one process goes down.

The Flex5 backend, on the other hand, is a cluster of at least two Flex5 processes that work together via a load balancer to provide the Flex service. This means one process can fail and the other process will still be there to provide the service without the user knowing anything happened.
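
As a rough illustration of the failover idea (not Flex's actual code or load balancer), the sketch below routes requests round-robin across backend processes and skips any process that is down. The class names, node names, and the exception-based health signal are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for one backend process in the cluster.
class BackendNode {
    final String name;
    volatile boolean healthy = true;

    BackendNode(String name) {
        this.name = name;
    }

    String handle(String request) {
        if (!healthy) {
            throw new IllegalStateException(name + " is down");
        }
        return name + " handled " + request;
    }
}

// Round-robin balancer that falls over to the next node when one fails.
class RoundRobinBalancer {
    private final List<BackendNode> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<BackendNode> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    String route(String request) {
        // Try each node at most once, starting from the next one in rotation.
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            int index = Math.floorMod(next.getAndIncrement(), nodes.size());
            try {
                return nodes.get(index).handle(request);
            } catch (IllegalStateException down) {
                // This node is unavailable; fall through to the next one.
            }
        }
        throw new IllegalStateException("no healthy backend available");
    }
}
```

With two nodes, marking one as unhealthy means every request is still answered by the other, which is the property a single process per customer cannot provide.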

Improved reliability and performance


With this new architecture, we will be able to horizontally scale the Flex5 service. If demand goes up, we can add new application servers to the cluster to handle the load. This is huge and will allow us to support the high demand that Flex5 is going to bring. 

By early 2017, we had gained confidence that the new backend was the way forward. We began to incrementally build new APIs in the Flex5 backend while continuing to call “hard to rebuild” APIs in Flex4 (such as search and availability) until our schedule allows us to build them in the new backend.

We didn’t know it at the time, but later realized we were following the “strangler” rewrite pattern. It sounds kind of strange, but basically it means the new application grows beside or around an existing one and over time it takes over more and more of the work until eventually the old system is not used at all. This dramatically reduces risk and allows access to the new system much sooner and throughout the migration process. An important point I want to emphasize is that you will be able to use Flex4 & Flex5 side by side until the feature migration is complete. This means both systems point to the same database, so changes in one system are visible in the other.
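
To make the strangler idea concrete, here is a minimal sketch, assuming a routing facade that sends already-migrated paths to the new backend and everything else to the old one. The class name, the path prefixes, and the "flex4"/"flex5" labels are hypothetical and only illustrate the pattern.

```java
import java.util.Set;

// Hypothetical routing facade illustrating the strangler pattern.
class StranglerRouter {
    // Path prefixes already rebuilt in the new backend (illustrative only).
    private final Set<String> migratedPrefixes;

    StranglerRouter(Set<String> migratedPrefixes) {
        this.migratedPrefixes = Set.copyOf(migratedPrefixes);
    }

    // Returns which backend should serve the given request path.
    String backendFor(String path) {
        for (String prefix : migratedPrefixes) {
            if (path.startsWith(prefix)) {
                return "flex5";
            }
        }
        // Anything not yet migrated keeps going to the old system.
        return "flex4";
    }
}
```

As features move over, the set of migrated prefixes grows until nothing falls through to the old backend.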

This approach was working well, but we were still missing one piece. We needed a way to coordinate events between Flex4 and Flex5. For example, if Flex5 saved a new inventory model, Flex4 was completely unaware that it had been inserted into the database, so its caches and search index were stale. In the spring of 2017, we found an inter-process communication tool to solve this problem. The tool keeps each system aware of events happening in the other system and enables each system to respond to remote events.
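
The sketch below shows the general shape of that coordination, assuming a simple publish/subscribe channel. It runs in a single process with an in-memory channel, so it is only a stand-in for whatever inter-process tool is actually used; the class names and the event string format are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-memory stand-in for an inter-process event channel.
class EventChannel {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    void subscribe(Consumer<String> listener) {
        listeners.add(listener);
    }

    void publish(String event) {
        for (Consumer<String> listener : listeners) {
            listener.accept(event);
        }
    }
}

// Hypothetical cache on the old system that reacts to remote save events.
class ModelCache {
    private static final String SAVED = "inventory-model-saved:";
    private final Map<String, String> entries = new HashMap<>();

    ModelCache(EventChannel channel) {
        channel.subscribe(this::onRemoteEvent);
    }

    void put(String id, String value) {
        entries.put(id, value);
    }

    boolean contains(String id) {
        return entries.containsKey(id);
    }

    private void onRemoteEvent(String event) {
        // Evict the stale entry so the next read goes back to the database.
        if (event.startsWith(SAVED)) {
            entries.remove(event.substring(SAVED.length()));
        }
    }
}
```

When the other system publishes a save event for a model, the cache drops only that entry and leaves the rest untouched.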

The road ahead for Flex


In summary, all of the key pieces for shipping the tablet version of Flex5 are in place. We are still rounding out some less critical pieces, but we are getting closer and closer. We expect Flex5 to continue to gain momentum the rest of this year with 2018 being a year of heavy code lifting. It’s very exciting!

Saturday, August 19, 2017

Adobe Flash Withers as Flex 5 Comes to Life

Reposting this from our main company blog at: https://www.flexrentalsolutions.com/blog/

Hello, everyone! This is Roger Diller, Technical Lead here at Flex. The purpose of this blog post is to briefly address the recent announcement from Adobe on ending updates for Adobe Flash at the end of 2020.

Some of you have likely already seen Adobe’s announcement a couple of weeks ago that it will end Flash updates at the end of 2020; this is the official end-of-life date for Flash in terms of updates and support from Adobe. For us at Flex, this wasn’t a big surprise. We’ve known for years that Flash’s days were numbered. Now we have an exact date, more than three years out, which gives us a nice window to build out the Flex5 platform.

As you all know, we use Flash technology for our Flex4 frontend, so you will naturally have questions about our HTML5-based Flex5 timeline in light of the Adobe news. In short, we are already on the right path. The news from Adobe just validates the path that we have been hard at work on for the last year or so. Our development team is focused on completing the Flex5 tablet MVP (minimum viable product). We plan to begin incrementally rolling out Flex5 Tablet around the end of the year. From there, in 2018, we will ship incremental updates for Flex5, some for tablet and phone, and we also plan to release some core desktop UIs sometime in 2018. The bulk of the UI transition will be completed in the next couple of years, well ahead of the end-of-2020 Flash end-of-life date.

Not only are we reinventing the frontend and getting off the Flash platform, but we are also reinventing the backend. I plan to write about the Flex5 backend re-architecture in a separate blog post in the next couple of weeks. In short, we have created an all-new Flex5 multi-tenant backend that will run alongside the Flex4 backend for the next couple of years. Eventually, all functionality will be migrated over, and the Flex4 backend will be decommissioned entirely. The new backend is designed to be faster, more scalable, and more reliable from the ground up and will serve as the foundation of the Flex platform for many years to come.

In summary, we are on a path to fundamentally transform the Flex product, from transitioning to HTML5-based UIs for phone, tablet, and desktop to building a new backend that will be able to handle the additional load that Flex5 will bring. We believe we will be entirely off the Flash platform well before the end of 2020.

Tuesday, March 7, 2017

Choosing a HTML5 Web Toolkit for Flex 5

We just posted the "Choosing a HTML5 Web Toolkit for Flex 5" blog post on our main corporate site. In this post, I talk about how and why we chose the HTML5 toolkit that we did and our "mobile first" approach to Flex 5.

The link is below. Enjoy!

Choosing a HTML5 Web Toolkit for Flex 5

Flex 5: Everything you’ve been wanting to know

We posted our first Flex 5 blog post on our main corporate website the other month. I'm re-posting the blog link here so that folks following this blog will be aware of it.

We plan to post all the Flex 5 blog series on the main corporate blog, and repost the links here. We will still post engineering blogs here, but since Flex 5 is an important topic to our customers, we wanted to post the Flex 5 blog series on our main website.

Without any further ado, here is the first Flex 5 blog post:


Flex 5: Everything you’ve been wanting to know

Friday, January 20, 2017

We are hiring Full Stack Engineers!

We are hiring full stack engineers to help us build out the new Flex 5 platform! The job description is posted below. While the job description states we are seeking candidates that are close enough to Carlisle, PA to commute to the office once a week, we will consider candidates from anywhere in the United States if you have industry experience (e.g. you have used Flex software, worked in the AV industry, etc).

If you are interested or know somebody that is, please go to our job post at Indeed and apply asap!

Regarding Flex 5 development, we are planning to start a blogging series soon to flesh out all the exciting things we have been doing with Flex 5 over the last while! Stay tuned!


-------------------------------------------------------------------------------------------------

Job Description


We are looking for developers to join our growing engineering team.
We are looking for developers to help build out our exciting next generation SaaS platform. Our engineering office is located in Carlisle, PA. We are looking for local or semi remote candidates with the ability to commute to Carlisle once a week.

Our developers work up and down the stack. We have developers that prefer backend or frontend development. We think it’s awesome to have a forte, however, all of our developers are expected to be able to work anywhere in the stack to get things done.

We integrate our code regularly and ship often. We believe in short incremental development cycles so we expect our developers to be committing code at least daily. Our next generation platform is being driven by automated tests and a continuous deployment philosophy. We expect developers to work autonomously, and want candidates with an ability to identify, communicate, and solve problems.

Skills & Requirements


You are fluent with the JVM ecosystem. You will write code in Groovy, Java, JavaScript, & SQL but you are open minded about other languages & ecosystems. You know something about ORMs such as Hibernate and the tricky tradeoffs that come along with them. It is normal for you to not use the ORM for everything and write pure SQL for complex queries. You know something about Spring Boot and have coded on Spring applications before.

You are an avid unit tester. You already practice test driven development, primarily with unit tests. It is normal for you to commit code with accompanying unit tests. You are able to articulate principles of unit testing to other team members.

You view yourself as a software craftsman. You are constantly improving your craft and know how to stay relevant in the changing technology landscape. You embrace failure as a learning opportunity. You love collaboration and transparency. You look for simple solutions to complex problems.

You are self motivated. At Flex, we won’t tell you when to work. We just expect that you’ll love coding and will naturally have a bias towards getting things done.

You value agile & lean development. At Flex, we don’t subscribe to dogmatic views on agile processes. We happen to use some elements of Scrum but really we just value fast feedback, collaboration, quick iterations, test driven development, continuous integration & deployment.


About Flex Rental Solutions


Our team of dedicated engineers develops rental and event management software for over 600 business customers from all over the world. Our customers are primarily within the Professional Audio Visual, Concert Touring, Live Event, Staging, and Production market segments. Flex offers a flexible and dynamic work environment, opportunity to work on interesting projects and technical challenges, paid company holidays, paid vacation, and health, dental, and vision insurance, along with other benefits.


Monday, November 7, 2016

Making Flex Faster

Speed is one of the top engineering priorities at Flex, even as we transition more and more effort towards the Flex 5 HTML5 rewrite. We recognize and understand that as awesome as it will be to one day be free from the old Flex 4 Flash-based platform, our users still need to get their work done between now and then.

We have been making small incremental speed improvements in most releases throughout this year, but it is difficult to really move the needle. We have also worked on memory leak issues, but those tend to be quite a bit easier. Normally with memory leaks you just take a memory snapshot from a customer who is struggling with them, and with a little detective work in a tool like YourKit, you can usually find what is hogging the memory and fix it fairly easily. Speed is a completely different ballgame, as it's hard to get good visibility and the fixes are much more difficult and time consuming.

For us, our speed issues revolve almost entirely around database IO (input/output). Most of the time it's one of the following types of problems.

  • N+1 SQL queries. Queries in a loop, where you get one database trip per iteration in a loop.
  • Large SQL queries. Just sheer size, like many MB's of SQL output.
  • Bad SQL queries. E.g. lack of indexes, bad joins, etc

Focused Speed Work

In light of all this, back in late September we spent at least a week of focused developer time to try to get some speed relief on some targeted high use areas of Flex. We started by having the support team compile a list of top slow areas from customers. Development then took that detailed information and began the process of trying to understand what was really happening in those slow areas.

So even though we have known about these problems for some time, the fixes are not so straightforward. The reason for this is a bit hard to explain, but the short answer is we have a huge domain model (think all the many tables & fields in the database) and use an ORM (Object-Relational Mapping) tool called Hibernate inside our Java application that maps database tables & columns to Java objects.

Hibernate is a "great" tool when you start because it allows a developer to rapidly add new tables & fields and the SQL for them will automatically be generated when you ask the database for something (such as a Quote or Inventory Item). However, this really accumulates over time and gets completely out of hand. You might just be after one or two fields from a table for your business logic, but Hibernate will fetch everything because it has no idea what we are really after.

In the end, you just end up with tons of overhead, with the database getting absolutely hammered by the sheer volume of SQL (e.g., I've seen 30MB or more of SQL generated for a single line item edit action), most of which is of no use and just gets garbage collected inside the Java application.

So again, we've known for a while what the general problems are. The hard part is getting the right kind of visibility and knowing what to change to fix them.

The Breakthrough

The breakthrough back in September was bringing in a tool called P6Spy. It's an open source tool you can plug in without the application even knowing about it. Basically it intercepts all of the raw SQL that is being sent to the database. It has many configuration options, but one of the coolest settings you can enable is the application stack trace. With that enabled, in addition to being able to see the raw SQL output, you can see the exact line of code inside of the application that generated the SQL!
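
To show why pairing SQL with a stack trace is so useful, here is a minimal sketch of the idea in plain Java. It is not how P6Spy is implemented (P6Spy sits at the JDBC layer and is enabled through configuration); the SqlCallSiteLog and QuoteService classes and the SQL string are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical log that records each SQL statement with its call site.
class SqlCallSiteLog {
    final List<String> entries = new ArrayList<>();

    void record(String sql) {
        // Frame 0 is this method; frame 1 is the code that issued the SQL.
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        entries.add(caller.getClassName() + "." + caller.getMethodName()
                + " -> " + sql);
    }
}

// Hypothetical application code that triggers a query.
class QuoteService {
    static void loadQuote(SqlCallSiteLog log) {
        log.record("select * from quote where id = ?");
    }
}
```

Each log entry ties a statement back to the method that caused it, which is what makes unexpected queries in a loop easy to spot.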

With this in place, it was like we suddenly had eyes into what was going on. We rapidly discovered some obvious issues, like some unexpected N+1 select issues that were going on. Usually these fixes involved some caching tweaks so that a database hit wasn't needed or moving find by id fetches into some kind of one time batch fetch.

Collection Batching

We fixed those obvious ones and then moved on to other improvements. Specifically, we started doing "collection batching". Let me explain... in the application you can have a Java domain object with a collection hanging off of it (e.g. an Inventory Item has a collection of Serial Numbers), and with Hibernate those collections are lazily loaded by default. That means you could pull that item from the database, but the serial numbers won't load from the database until you actually call getSerialNumbers() on the item object.

This is fine sometimes, but what if you were looping over 1000 inventory items? Yeah, you'd be hitting the database every time you call getSerialNumbers() on an inventory item. That is what we call an N+1 select issue, and they are an absolute performance hog.
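
The effect can be simulated without Hibernate. The sketch below is a simplified model, not Hibernate's implementation: a counter stands in for the database, and each item loads its serial numbers on first access. The class names and the fake serial number values are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Counts simulated database round trips.
class QueryCounter {
    int trips = 0;
}

// Item whose serial numbers are loaded lazily, on first access.
class LazyItem {
    private final int id;
    private final QueryCounter db;
    private List<String> serialNumbers; // null until first access

    LazyItem(int id, QueryCounter db) {
        this.id = id;
        this.db = db;
    }

    List<String> getSerialNumbers() {
        if (serialNumbers == null) {
            db.trips++; // one select for this item's collection
            serialNumbers = List.of("SN-" + id);
        }
        return serialNumbers;
    }
}

class NPlusOneDemo {
    // Loads itemCount items, touches every collection, returns total trips.
    static int tripsForLoop(int itemCount) {
        QueryCounter db = new QueryCounter();
        db.trips++; // the "1": a single query that loads all the items
        List<LazyItem> items = new ArrayList<>();
        for (int i = 0; i < itemCount; i++) {
            items.add(new LazyItem(i, db));
        }
        for (LazyItem item : items) {
            item.getSerialNumbers(); // the "N": one trip per item
        }
        return db.trips;
    }
}
```

For 1000 items this model makes 1001 trips: one for the items plus one per serial number collection.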

The nice thing is we discovered a little-known Hibernate setting known as "collection batching". Say you have the serial number collection described above and you set its batch size to 100. When you call getSerialNumbers() on one item, Hibernate fetches up to 100 serial number collections (for items that are already in the Hibernate session) in a single call. This means for the 1000 inventory items, you might only get 10 hits to get all the serial numbers. See https://docs.jboss.org/hibernate/orm/3.3/reference/en-US/html/performance.html#performance-fetching-batch for more info on this batch size setting.

That is a 100-fold reduction in database trips: instead of 1000 individual hits, it might take as few as 10 trips to load all the serial number collections. This was a huge breakthrough, and we implemented this strategy in key document editing areas.
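
The arithmetic can be checked with a small simulation. This is a simplified model of batch fetching, not Hibernate's code: touching one unloaded collection loads it together with other pending collections, up to the batch size, in one trip. In Hibernate itself the setting is typically applied with the @BatchSize annotation or the batch-size mapping attribute; which of those was used here is not stated.

```java
// Simplified model of a session that batch-loads lazy collections.
class BatchingSession {
    private final int batchSize;
    private final boolean[] loaded;
    int trips = 0;

    BatchingSession(int itemCount, int batchSize) {
        this.batchSize = batchSize;
        this.loaded = new boolean[itemCount];
    }

    // Simulates calling getSerialNumbers() on the item at the given index.
    void touchCollection(int index) {
        if (loaded[index]) {
            return; // already fetched as part of an earlier batch
        }
        trips++;
        int fetched = 0;
        // Load this collection plus other pending ones, up to batchSize.
        for (int i = index; i < loaded.length && fetched < batchSize; i++) {
            if (!loaded[i]) {
                loaded[i] = true;
                fetched++;
            }
        }
    }

    // Touches every collection in order and returns the number of trips.
    static int tripsFor(int itemCount, int batchSize) {
        BatchingSession session = new BatchingSession(itemCount, batchSize);
        for (int i = 0; i < itemCount; i++) {
            session.touchCollection(i);
        }
        return session.trips;
    }
}
```

With 1000 items, a batch size of 100 gives 10 trips, while a batch size of 1 (no batching) gives 1000.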

We rolled out these speed fixes in version 4.18.2 in mid-October. We have heard directly that it is faster, specifically with document editing, which is exactly what we were after.

Where do we go from here?

Monitoring is our next big step. We have set up a tool stack with InfluxDB, Grafana, and Telegraf that will collect metrics sent from Flex. We have version 4.19.0 queued up for deployment this week. With that release, Flex will begin shipping metrics to this new tool stack. We will be able to set up dashboards that will give us all kinds of visibility into the application. We will use this info to make more targeted fixes.
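
For a sense of what a shipped metric can look like, the sketch below formats a single point in InfluxDB's line protocol, a text format that both InfluxDB and Telegraf can accept. Whether Flex sends metrics in this format is an assumption, and the measurement, tag, and field names are invented for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Formats one metric point as an InfluxDB line protocol string.
class LineProtocol {
    static String point(String measurement, Map<String, String> tags,
                        String fieldKey, double fieldValue,
                        long timestampNanos) {
        // Measurement names escape commas and spaces.
        StringBuilder line = new StringBuilder(escape(measurement, ", "));
        // Tags are written in sorted key order; keys and values escape
        // commas, equals signs, and spaces.
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            line.append(',').append(escape(tag.getKey(), ",= "))
                .append('=').append(escape(tag.getValue(), ",= "));
        }
        line.append(' ').append(escape(fieldKey, ",= "))
            .append('=').append(fieldValue)
            .append(' ').append(timestampNanos);
        return line.toString();
    }

    // Prefixes each special character with a backslash.
    private static String escape(String text, String special) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (special.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

A point tagged by customer and version, for instance, becomes one line of text that a dashboard in Grafana can later filter and group on.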

So we intend to rinse & repeat with targeted fixes, until we get Flex 4 to reasonable speed levels in the "day to day" high use areas. With the Flex 5 rewrite, we are contemplating a whole new way of database access which will be fast and be the ultimate solution to Flex 4's speed woes.

Hopefully we'll have a blog post here in the future on the new monitoring/metric tool stack and the results we get from that! Stay tuned!

Saturday, September 24, 2016

The Engineering Blog is Back!

We are bringing back this engineering blog as a dedicated place to talk about some of the behind the scenes engineering at Flex. We have a lot of exciting things to post about in the coming weeks and months! First up is a post on Flex speed and what we are doing to make Flex faster! Stay tuned!