Wednesday, February 29, 2012

Automated GUI Testing

One of the things we struggle with at Flex is the tradeoff between QA and quick turnaround times on bug fixes and new features, which is a pretty common problem in software development.  The more time you spend checking your work, the more releases and bug fixes get delayed.

This tradeoff reared its ugly head today with the release of 4.4.23.  Our testing missed three or four fairly serious bugs and our frustration was compounded by the fact that they were all quick fixes that could have been easily corrected if only we'd known about them before release time.  We invest more technology and resources into software testing than any other rental software provider, but even our process doesn't catch everything.

Our customers are very demanding and we have a large backlog of new features to work on, so in general our philosophy leans toward quick release times.  To minimize how this tradeoff impacts quality assurance, our strategy is to use as much automated regression testing as possible, which enables us to have a high level of test coverage without requiring manual test plans and the time it takes to do manual testing prior to a release.  Our continuous integration platform is a pretty good safety net, but every net has holes in it.

In our case, these holes fall into two different areas.  The first hole is just those things we didn't think of when assembling our automated tests, meaning weird cases that might pop up in the real world but aren't tested explicitly.  Usually when we find a bug that the test automation doesn't catch, we start by adding a regression test that successfully duplicates the bug before fixing it.  This is a habit borrowed from test driven development: write a failing test first, then make it pass.  It ensures that our test really works and patches that particular hole in the safety net so we don't have to worry about that bug coming back.
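As a rough illustration of that workflow, here is a minimal sketch in plain Java.  The RentalPricing class, its lineTotal method and the rounding bug are all hypothetical stand-ins rather than Flex code; the point is only that the test is written while the bug still exists, fails, and then stays in the suite after the fix.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical pricing helper, invented for this example; not actual Flex code.
final class RentalPricing {
    // Fixed version: multiply first, then round once.
    // The imagined bug rounded the unit price before multiplying,
    // which turned 0.335 x 3 into 1.02 instead of 1.01.
    static BigDecimal lineTotal(BigDecimal unitPrice, int quantity) {
        return unitPrice.multiply(BigDecimal.valueOf(quantity))
                .setScale(2, RoundingMode.HALF_UP);
    }
}

// Regression test written before the fix, so it failed first and now guards the fix.
final class RentalPricingRegressionTest {
    static void lineTotalRoundsAfterMultiplying() {
        BigDecimal total = RentalPricing.lineTotal(new BigDecimal("0.335"), 3);
        if (total.compareTo(new BigDecimal("1.01")) != 0) {
            throw new AssertionError("expected 1.01 but got " + total);
        }
    }
}
```

In a real project the check would live in whatever test framework the continuous integration build already runs; the hand-rolled assertion here just keeps the sketch self-contained.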

The second hole is a big one, and it's user interface testing.  Our existing test automation tests the service layer of Flex, which includes business logic like pricing math, availability and the scan process.  We do extensive manual user interface testing, but because this testing is manual and therefore time consuming, it's usually focused on testing new features or bugs that were explicitly worked on during a sprint.  A full UI regression test for a system like Flex, if done manually, could take weeks, which we think is an unacceptable delay for our customers.

This would appear to leave us with two options: either push releases without UI regression testing (which is what we're doing now) or develop manual regression test plans and invest the days or weeks it would take to regression test the UI for every release.

Today's experience has reignited our interest in pursuing a middle way: automated UI regression testing.  In this kind of testing, our QA team would develop test plans that simulate mouse moves and keystrokes, and validate that the user interface behaves correctly in response.  This would validate that UI specific logic works correctly and provide a second layer of test coverage for the core functionality, almost like a backup for our existing test coverage.

We're looking at a tool called Selenium, a popular open source testing tool for situations like ours.  There are lots of other tools, and with any of them front end test automation is a notoriously difficult thing to set up.  You need a specialist (luckily we have one) and it takes time to get the process up and running, but I think it's worth it.  Over the weeks and months to come I imagine we'll start with automated test coverage for core functionality (quotes and the warehouse process) and expand our test coverage into lesser used areas as the UI regression test suite matures.

There is no process that can catch every bug, but we can always improve, and given our current process I can think of no better way to get some big quality improvements than adding UI test automation.  It will be time consuming at first, and require some discipline to get up and running, but in the end it will mean fewer bugs in releases and fewer morning-after-release surprises for customers.




Monday, February 27, 2012

Transparent Field Encryption/Decryption

Recently we have been working on expanding our integration with QuickBooks to include the Online Edition. While working on the transport pipe for QBOE (QuickBooks Online Edition), we realized we needed to store a QBOE connection ticket. Something like this is usually a very simple task: add a new field to the appropriate domain object.

However, in this case one of the security rules that QBOE imposes on our application is that the connection ticket must be stored in an encrypted state. This is a common security requirement. Since the ticket acts as a key to QBOE, we don't want anybody with database access to get their hands on the actual ticket.

The first thought was to simply create a utility class to handle the encryption/decryption, possibly called from a getter or setter. Another thought was to create a custom Hibernate type that would handle the encryption/decryption by making use of an encryption utility class.

After some searching we came upon Jasypt: http://www.jasypt.org/ Jasypt is a Java library that makes encryption extremely easy at the data layer. In fact, all you need to do is get the appropriate jars on the classpath and add a little configuration to the Hibernate mapping, and voilà, encryption/decryption just works.

Here is an example of setting up the custom type in the Hibernate configuration:

<typedef name="quickbooksEncryptedString" class="org.jasypt.hibernate3.type.EncryptedStringType">
      <param name="algorithm">PBEWithMD5AndDES</param>
      <param name="password">encryption-password</param>
      <param name="keyObtentionIterations">1000</param>
</typedef>

And the corresponding property mapping:

<property column="qb_online_connection_ticket" name="quickbooksOnlineConnectionTicket" type="quickbooksEncryptedString" length="1000"/>

Besides getting the Jasypt jars on the classpath, that is really all there is to it. With the above configuration the "quickbooksOnlineConnectionTicket" property is automatically encrypted when it is saved to the database and then automatically decrypted when it is retrieved. So from a coding perspective nothing changes. You can simply "set" the ticket and "get" the ticket whenever you need to.
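To give a feel for what a password-based encrypted type does on the way in and out of the database, here is a minimal sketch that uses only the JDK's javax.crypto classes with the same algorithm name and iteration count as the typedef above. The TicketCipherSketch class is hypothetical and is not Jasypt's implementation; the random-salt-plus-Base64 storage layout is an assumption made for the example, and Jasypt's actual internals may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.PBEParameterSpec;

// Hypothetical helper for illustration only; not part of Flex or Jasypt.
final class TicketCipherSketch {
    private static final String ALGORITHM = "PBEWithMD5AndDES"; // same name as in the typedef
    private static final int ITERATIONS = 1000;                 // same as keyObtentionIterations
    private static final int SALT_LENGTH = 8;                   // this algorithm expects an 8-byte salt

    // Builds a cipher whose key is derived from the password, salt and iteration count.
    private static Cipher cipher(int mode, char[] password, byte[] salt)
            throws GeneralSecurityException {
        SecretKey key = SecretKeyFactory.getInstance(ALGORITHM)
                .generateSecret(new PBEKeySpec(password));
        Cipher cipher = Cipher.getInstance(ALGORITHM);
        cipher.init(mode, key, new PBEParameterSpec(salt, ITERATIONS));
        return cipher;
    }

    // Returns Base64(salt + ciphertext), the value that would be written to the column.
    static String encrypt(String plainText, char[] password) {
        try {
            byte[] salt = new byte[SALT_LENGTH];
            new SecureRandom().nextBytes(salt);
            byte[] encrypted = cipher(Cipher.ENCRYPT_MODE, password, salt)
                    .doFinal(plainText.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[salt.length + encrypted.length];
            System.arraycopy(salt, 0, out, 0, salt.length);
            System.arraycopy(encrypted, 0, out, salt.length, encrypted.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    // Splits the stored value back into salt and ciphertext, then decrypts it.
    static String decrypt(String stored, char[] password) {
        try {
            byte[] in = Base64.getDecoder().decode(stored);
            byte[] salt = Arrays.copyOfRange(in, 0, SALT_LENGTH);
            byte[] encrypted = Arrays.copyOfRange(in, SALT_LENGTH, in.length);
            byte[] plain = cipher(Cipher.DECRYPT_MODE, password, salt).doFinal(encrypted);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }
}
```

With a helper like this, decrypt(encrypt(ticket, password), password) returns the original ticket, and the stored value differs from run to run because the salt is random. The appeal of the Hibernate type is that none of this plumbing shows up in the domain object.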

Wednesday, February 22, 2012

Flex Is Hiring

As Flex grows, the demand for custom reports and integration projects is mounting quickly.  With this in mind, we've decided to create a new role for a data and business analytics specialist.  I've posted the job description below, but what we're really looking for is a report specialist with some basic coding skills and a good handle on statistical methods.


Data Analyst

Flex Rental Solutions is looking for a data analyst to help customers with their reports and business analytics requirements.  The successful candidate will start on a part time trial basis and ramp up to full time if the trial period goes well.  This is a telecommuting position.

Key Responsibilities

  • Develop custom reports backed by SQL and XML based data sources.
  • Work with customers and project management team to develop reports and reporting tools per customer requirements.
  • Work with the development team to develop new standard reports.
  • Coordinate reporting requirements with developers to ensure that all necessary data is available and accessible to reporting platforms.
  • Work with the support team and new customers to prepare legacy data for importing.

Required Qualifications

  • Bachelor's Degree in Computer Science, Mathematics, Management Information Systems or related field - or equivalent experience.
  • At least 2 years working with SQL, including a knowledge of joins, aggregate functions and subqueries.
  • At least 2 years working with some kind of visual report designer like Crystal Reports or iReport.
  • At least 2 years experience with XML and XSD, ideally with a working knowledge of xpath.
  • At least 2 years experience working with a source code configuration management system like CVS or Subversion.
  • Excellent written and verbal communications skills in the English language.
  • Experience working directly with customers to analyze and interpret requirements.
  • Experience with Agile Development / Scrum.


Preferred Qualifications

  • At least 2 years working with MySQL 5.1 or later.
  • At least 2 years designing reports for Jasper Reports with iReport.
  • Working knowledge of the Java programming language and the Spring framework.
  • A working knowledge of statistical methods including distributions, covariance, correlation, and standard deviation.
  • Experience with Service Oriented Architecture and web service transport protocols like SOAP and REST.
  • Knowledge of standard Java build tools like Ant or Maven.
  • Experience with entertainment technology or event production.


Interested parties should send a cover letter and resume to jobs [at] flexrentalsolutions [dot] com.

Flex Rental Solutions cannot provide H1B sponsorship at this time.

Wednesday, February 8, 2012

Tales of the Infinite

Any control structure which goes on and on infinitely wastes server resources and can ultimately crash a server.  At Flex, if we see a remote function call from the Flash workbench lock up and no client error is thrown and there's no exception in the server logs, we usually start to suspect infinite trouble of one kind or another.

In software development there are two main types of control structures subject to the perils of infinity: infinite loops and infinite recursion. 

Infinite Loops

An infinite loop is a loop whose test condition is always true, like these:

while (2 + 2 == 4) {
     //do something
}

while (true) {
     //do something
}

We use loops all over the place, usually for each loops or for loops, which are seldom subject to infinite loop problems.  It can happen, however.  While loops like this next one are a great way to crawl up a tree or acyclic graph structure without resorting to a recursive function:

TreeNode node = startNode;  //any node in the tree
while (node.getParentNode() != null) {
    //do something
    node = node.getParentNode();
}

This is all well and good; in a regulation acyclic tree you'll eventually reach the root node, whose parent is null, so the loop's test condition will be false and the loop will exit.  But what if this isn't true?  What if what should be the root node somehow thinks that one of its descendants in the tree structure is its parent?  In graph theory, this is called a cycle.  If the underlying tree in this example had a cycle, it would be impossible to find a node whose parent was null and thus this loop would continue executing over the same closed path until someone stopped it.
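One way to make a parent walk like this safe is to remember which nodes have already been visited and bail out as soon as one shows up twice.  The sketch below assumes a bare-bones TreeNode class and a hypothetical ParentWalk.findRoot helper, both invented for the example; it is not Flex code.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical stand-in for the TreeNode type used in the loop above.
final class TreeNode {
    private final String name;
    private TreeNode parent;

    TreeNode(String name) { this.name = name; }

    TreeNode getParentNode() { return parent; }
    void setParentNode(TreeNode parent) { this.parent = parent; }
    String getName() { return name; }
}

final class ParentWalk {
    // Walks up the parent chain and returns the root.
    // Throws instead of looping forever if the chain contains a cycle.
    static TreeNode findRoot(TreeNode start) {
        // Identity-based set: two distinct nodes never count as the same node.
        Set<TreeNode> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        TreeNode node = start;
        visited.add(node);
        while (node.getParentNode() != null) {
            node = node.getParentNode();
            if (!visited.add(node)) {
                // add() returns false when the node was already in the set.
                throw new IllegalStateException("Cycle detected at node " + node.getName());
            }
        }
        return node;
    }
}
```

The extra set costs a little memory per walk, which seems a fair trade for turning a hung request into an exception that shows up in the server logs.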

Infinite Recursion

Another technique for traversing directed graphs is recursion, which is what we call it when a function or method calls itself.  We use this technique for traversing data structures throughout Flex.  A stripped down example of recursion looks like this:

public int calculatePricing(FinancialDocumentLineItem lineItem) {

     //price of this line on its own
     int result = lineItem.getQuantity() * lineItem.getPrice();

     //add the price of every child line, recursing into grandchildren and beyond
     if (lineItem.getChildren() != null) {
            for (FinancialDocumentLineItem childLine : lineItem.getChildren()) {
                   result += calculatePricing(childLine);
            }
     }

     return result;

}

This example takes a line item and uses recursion to drill down into child line items.  If we have a nested line item structure that goes several levels deep, eventually each unique path through the tree will have a leaf node (a node without children) and recursion will stop.  But what if a single node is both an ancestor and a descendant?  What if we have a line item that contains a line item that contains itself?  Again, we have a cycle.  There will be at least one path through the tree for which the conditions for continued recursion would always be met.  Usually this kind of infinity is eventually stopped by the Java Virtual Machine with a StackOverflowError.  Even so, infinite recursion is bad and has to be addressed before it gets to the point of overloading the call stack.

Cycle Proof Recursion

Courtney recently found an infinite recursion issue related to cycles in suggestion configuration, where somewhere down the line a suggestion ends up suggesting itself.  If these same suggestions also happened to be auto-include suggestions, the auto include logic would recurse infinitely and eventually crash the session.

Obviously we need to add logic that detects cycles to the validation process when configuring suggestions, but I always like to add an extra measure of safety in situations like this, such that even if there's a cycle in the graph the recursive function can detect the cycle and exit.

This usually means adding a parameter to the function that maintains some information about what's happening elsewhere on the call stack.  In this case, our addSuggestedItem() function, which is recursive for auto-include suggestions, needed to maintain a set of items for which suggestions had already been processed.  This set gets checked on every method call to ensure that suggestions for the given item have not already been processed.  Here's a snippet from the Flex codebase (edited for clarity):

    protected boolean addSuggestedItem(
         FinancialDocumentLineItem baseLineItem,
         ManagedResource resource,
         Set<String> antiCycle) throws Exception {

        //first call on the stack: create the set and seed it with the base item
        if (antiCycle == null) {
            antiCycle = new HashSet<String>();
            if (baseLineItem.getResource() != null) {
                antiCycle.add(baseLineItem.getResource().getObjectIdentifier());
            }
        }

        //already processed this resource, so we've hit a cycle: stop recursing
        if (antiCycle.contains(resource.getObjectIdentifier())) {
            return false;
        }
        antiCycle.add(resource.getObjectIdentifier());

        //do normal suggestion processing

        //recursive call (pseudocode: the check and the lookup of suggestedItem are elided)
        if (resource has auto include child suggestions) {
            addSuggestedItem(baseLineItem, suggestedItem, antiCycle);
        }

        return true;
    }

This check starts by initializing the antiCycle set if the caller doesn't provide it, which is what happens when the method call is the first one on the call stack.  The antiCycle set is then checked to see whether suggestions for this resource have already been processed; if they have, the call is redundant, so it short circuits out of the function and breaks the recursion.

If the function call is valid, the item for which it's being called adds itself to the antiCycle set to guard against infinite recursion further down the stack.
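For anyone who wants to try the technique outside of Flex, here is a self-contained sketch of the same antiCycle idea.  SuggestibleItem and SuggestionExpander are hypothetical classes made up for the example, and the real suggestion processing is reduced to collecting item ids.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a resource that can auto-include other resources.
final class SuggestibleItem {
    final String id;
    final List<SuggestibleItem> autoIncludes = new ArrayList<>();

    SuggestibleItem(String id) { this.id = id; }
}

final class SuggestionExpander {
    // Returns the ids of every item pulled in by auto-include suggestions,
    // each at most once, even if the suggestion graph contains a cycle.
    static List<String> expand(SuggestibleItem base) {
        List<String> added = new ArrayList<>();
        Set<String> antiCycle = new HashSet<>();
        antiCycle.add(base.id); // the base item must never end up suggesting itself
        for (SuggestibleItem suggestion : base.autoIncludes) {
            addSuggestedItem(suggestion, antiCycle, added);
        }
        return added;
    }

    private static boolean addSuggestedItem(
            SuggestibleItem item, Set<String> antiCycle, List<String> added) {
        if (!antiCycle.add(item.id)) {
            return false; // already processed somewhere up the stack: break the recursion
        }
        added.add(item.id); // stands in for the normal suggestion processing
        for (SuggestibleItem child : item.autoIncludes) {
            addSuggestedItem(child, antiCycle, added);
        }
        return true;
    }
}
```

If a console suggests a cable, the cable suggests a stand and the stand suggests the console again, expand(console) returns the cable and the stand and then stops, because the console's id is already in the set.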

Bad Data

This method doesn't eliminate the cycle in the underlying data; it merely limits how much damage the bad data can do.  The next step for this particular issue is to add cycle checks to the suggestion configuration screens.  More to come on that front.
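As a rough idea of what such a configuration-time check could look like, here is a sketch that refuses a new suggestion if it would let an item eventually suggest itself.  The map-of-ids representation and the SuggestionCycleCheck name are assumptions made for the example; the actual validation in Flex may end up looking quite different.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical validator; suggestions are modeled as item id -> ids it suggests.
final class SuggestionCycleCheck {
    // True if adding "from suggests to" would create a cycle, meaning
    // "from" is already reachable by following suggestions starting at "to".
    static boolean wouldCreateCycle(
            Map<String, List<String>> suggestions, String from, String to) {
        Deque<String> pending = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        pending.push(to);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (current.equals(from)) {
                return true; // following suggestions leads back to the starting item
            }
            if (!seen.add(current)) {
                continue; // already explored this item
            }
            for (String next : suggestions.getOrDefault(current, Collections.emptyList())) {
                pending.push(next);
            }
        }
        return false;
    }
}
```

The search is iterative rather than recursive, so the validator itself cannot blow the call stack on a deep or already-cyclic configuration.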

Friday, February 3, 2012

Update Notifications

Like many busy software shops, we often suffer from the cobbler's-children-have-no-shoes problem at Flex, meaning we're usually so busy working on new features and customer driven work that we neglect the internal projects we need in order to keep doing our work effectively.  This often means living with less than ideal solutions to internal problems because the customers don't see those problems.

The most common way this problem manifests itself at Flex is that common tasks are done manually that could be automated.  For example, we used to do all software updates manually, which in the Java space means logging into the server, downloading a new war file and restarting the app server -- and this had to be done for every customer.

Growing Pains

As the number of customers grew, this process became unwieldy and we wrote crude shell scripts to automate parts of the process, beginning with the basic stop app server, download war, deploy war, start app server process.  This saved some time, but that script still had to be manually started for each customer instance.  We're up to 140+ customers now, so that process would no longer work without an unreasonable expenditure of human effort.

Then we added a new script that invoked the old script for every customer on a server.  This shaved off some time.  When the number of servers grew - we now manage around 18 customer and cloud servers - even this became too cumbersome.  We broke down and developed a deployment tool called Cluster Bomb that allows all instances of Flex to be updated with a single command using message queues.

One aspect of our deployment process that's been lacking is the notification process: the process for informing customers when an update is pending and what changes or fixes are in the update.  Chris has been doing the notifications using the same tool we use for doing newsletters and other forms of marketing spam (Oops, I meant opt-in premium content).

This was better than nothing, but it required a close degree of coordination between Chris and the Engineering Team - and as busy as everyone is, especially Chris, it's sometimes hard to make that work.  It was also imprecise.  It gave customers a vague notion that an update was coming, but no specific warning when the update started and no confirmation when it completed.

In addition to annoying customers, it also generated support tickets, which cost us time and support hours.


Automated Notifications

Yesterday we broke down and added automated notifications to Cluster Bomb.  This meant adding distribution lists to each customer instance configuration, a system for assembling release notes, and email text that includes those release notes.  (Our first test of this system was responsible for a brief outage yesterday afternoon.)

From now on, with no coordination with Chris, customers will receive notification emails whenever we start the deployment process and another email when the process completes for their instance.  The big change here is that it's now physically impossible to trigger an update without also triggering notifications.  The process also requires us to provide release notes or it won't permit us to start a deployment.  And we've looped Twitter into the process and will send a status update to our @frssupport Twitter account when a deployment starts, with the version number and a link to the release notes.
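The general shape of that guarantee can be sketched in a few lines of Java.  This is not Cluster Bomb's code; the Notifier interface and Deployment class are hypothetical, and the sketch only shows the idea of making release notes a precondition and making the update reachable solely through a method that also sends the notifications.

```java
// Hypothetical notification channel (email, Twitter, etc.); invented for this sketch.
interface Notifier {
    void send(String message);
}

// Hypothetical deployment wrapper; not the real Cluster Bomb implementation.
final class Deployment {
    private final String version;
    private final String releaseNotes;
    private final Notifier notifier;

    // A deployment cannot even be constructed without release notes.
    Deployment(String version, String releaseNotes, Notifier notifier) {
        if (releaseNotes == null || releaseNotes.trim().isEmpty()) {
            throw new IllegalArgumentException("Release notes are required to start a deployment");
        }
        this.version = version;
        this.releaseNotes = releaseNotes;
        this.notifier = notifier;
    }

    // The only way to run an update, so notifications cannot be skipped.
    void run(Runnable update) {
        notifier.send("Update to " + version + " is starting. Release notes: " + releaseNotes);
        update.run();
        notifier.send("Update to " + version + " is complete.");
    }
}
```

A fuller version would also need to decide what to send when the update itself fails; the sketch leaves that out.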

We're going to seed the notification system by making some logical guesses based on customer email addresses we have on file.  This is likely to miss people who'd like to get the notifications and spam people who don't care about them.  I believe Chris is going to do an email blast and ask everyone to provide their preferred distribution list for updates, but just in case, feel free to send an email to support@flexrentalsolutions.com with your company name and the email addresses that should (or should not) get notifications.  Once that feedback comes in, we'll fine tune the list.

Throw Away Code

We've talked about doing this before.  The reason we didn't, other than the cobbler's-children-have-no-shoes problem, is that we're about to radically change our deployment architecture and any work we do on Cluster Bomb will become obsolete once the new architecture goes live.  This is why we didn't go the Full Monty and provide a web based way to subscribe and unsubscribe to notifications.  We would also have liked to make pretty HTML release notes and email notifications.  Still, high availability Flex, the TrueCLOUD as we call it, is about a year away, and one day's work is a small price to pay for a year of better communication.