SOA in Practice: The Art of Distributed System Design

Thanks to the hour a day I spend riding mass transit to and from work (instead of the two hours a day I'd spend sitting in my car), I get to read a lot. I'd like to think that I'm spending most of this reading time on contemporary fiction, but that's really not the case. I'd estimate that a good 80% of my time is spent reading technology articles or books, or Entertainment Weekly. =)

Call me crazy, but I do read a good number of technical books on the subway. I've not yet delved into the arena of book reviews, but there's no good reason for that, especially given some of the excellent books I've read of late.

"SOA in Practice: The Art of Distributed System Design," by Nicolai M. Josuttis, is a great introduction to the complexities of designing and deploying a service-oriented architecture for your business application needs. It's largely devoid of code, which some developers may find frustrating. I think that's a good thing, as SOA is about approach and not necessarily about the code it takes to set up and run those services.

I suppose you could call this a book for managers, or system architects who are just looking to get into service-oriented architecture design, but I think it provides such a thorough, clear introduction to the idea of SOA (free from the marketing jargon of companies trying to sell you their SOA "solutions") that it's valuable for any developer who wants to create more loosely coupled services inside a single application or across applications. After reading endless issues of eWeek and InformationWeek and CIO Insight, I certainly knew some of the principles behind the idea, some of the SOA bus solutions available, and how important governance was to SOA rollouts, but as for the practicalities of how one might go about doing this in the real world? Not so much.

Aside from the clear insight Josuttis brings to the book from his own SOA experiences, it's the lessons learned the hard way that I found most useful. Josuttis raises a lot of issues with messaging and idempotence that I had simply not considered before and that now form a cornerstone of how I'd approach developing a truly service-oriented architecture. That alone is worth the cost of admission.
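Josuttis mostly leaves the code to the reader. The following is not code from the book. It is a minimal, generic sketch of one common way to get idempotent message handling: the receiver remembers the ID of every message it has already applied and ignores repeats. That way a sender that retries after a timeout can't apply the same credit twice. The class name, the message IDs, and the in-memory set are illustrative assumptions. A real service would keep the processed IDs in durable storage, ideally updated in the same transaction as the business change.

```python
class IdempotentReceiver:
    """Applies each message at most once, keyed by a sender-assigned message ID."""

    def __init__(self):
        # In-memory only for this sketch; a real receiver would persist these IDs.
        self._processed_ids = set()
        self.balance = 0

    def handle(self, message_id, amount):
        """Apply a credit message; return False if it was a duplicate delivery."""
        if message_id in self._processed_ids:
            return False  # a retry or redelivery: the effect was already applied
        self.balance += amount
        self._processed_ids.add(message_id)
        return True


if __name__ == "__main__":
    receiver = IdempotentReceiver()
    receiver.handle("msg-001", 100)
    receiver.handle("msg-001", 100)  # same message delivered twice
    print(receiver.balance)  # 100
```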

Perhaps its lack of code will lead some reviewers to say this book is more for pointy-headed managers than developers (even I wanted to see some code examples showing how he handled the messaging patterns he raised, or some kind of concrete solution rather than "This is out there, deal with it"), but I think it's a darn fine and thorough look at getting started with SOA.

News and Tidbits on ColdFusion 9 (aka Centaur)

I just came across two blog posts which highlight some probable and some possible features for the next version of ColdFusion. My friend Adam Lehman (on Adobe's CF team) made an interesting post about the process that Adobe uses to develop new features for new versions of their products. It's a worthwhile read, and the Synchronous Development process they use sounds promising. You can't get the full details on that process from the SyncDev Web site because, well, their business is to sell consulting services. The key process idea of "Sell, Design, Build" (rather than "Design, Build, Sell") is a compelling one, as it ensures that your customers (and the people you want as your customers) would actually want to buy your product before you ever build it. That's pretty powerful stuff for ensuring that your customer base will actually go out and buy your product. This is especially true of software, where versions above #4 tend to be about adding "nice" features once the product has already solved the core problems it was meant to solve, or where the product has a free (often open-source) alternative.

While Adam doesn't talk about anything that hasn't been announced elsewhere, it's an interesting and valuable read to understand not only Adobe's product development process, but where the ColdFusion team, specifically, is coming from in determining the product's future.

Brian Rinaldi has put together a much more extensive look at all the public knowledge about ColdFusion 9. From process improvements to actual feature descriptions, his overview is a great read for anyone interested in the next version of the product and the language. I found it particularly interesting that it appears that Adobe has not gone the route of the Active Record pattern with their Hibernate integration, which I think may not save developers a ton of time but will instead allow for more flexible, robust uses of Hibernate from within ColdFusion 9. Hibernate is going to save developers a lot of time as it is, so I don't think there's a big loss in productivity by not going the Active Record route. (Implicit getters and setters will be a huge time saver, however.)

The big unknowns are the "management features," as they're sometimes called. Those are the easy-to-grasp, sexy, obvious features which make it much easier to sell a product upgrade to managers. This could include a real ColdFusion IDE, or audio/video management tools, workflow and BPEL engine integration, or who knows.

We'll know a lot more by the time MAX is over at the end of November. I'm looking forward to the time we'll finally get to play with CF9!

Doing the Upsert

In the process of developing the contextual guidance API, I tried to push as much of the work as I could to the database. I knew that I would have to do multiple queries to check and see if a user had an existing record, when their last visit was to the application, and more. I had read some time ago about the concept of the upsert, and thought this was a good time to put it into practice.

In many applications, you need to check to see if a record exists in a table, and do an UPDATE if a record does exist, or an INSERT if one does not exist. This is a common pattern in application development. Many developers do something like this:

Query the table to see if a record exists which matches the passed ID
IF match found {
   Run an UPDATE query
} else {
   Run an INSERT query
}

That's fine, but it ultimately requires two trips to the database, or more, depending on what other conditional processing you need to do. The upsert combines all of this into a single query, sent to the database in one trip. In addition, I needed to run some other database-related logic that would affect the SQL. If a user hadn't visited the application in more than 180 days (6 months), I wanted to treat them as a new user and delete their previous record in the log table. (You could set an "isDeleted" flag or delete the record; that's up to you.)

My solution is as follows, and combines some conditional logic in ColdFusion with conditional logic in the SQL as well. This is for MS SQL Server, but the concept applies to pretty much any database.

<!--- Look up this user's most recent visit to this application (and section, if one was passed). @lastVisitDate stays NULL if there's no record yet. --->
DECLARE @lastVisitDate datetime
SET @lastVisitDate =
   (
   SELECT lastVisit
   FROM contextualGuidanceLog
   WHERE userID = <cfqueryparam cfsqltype="cf_sql_integer" value="#arguments.userID#" />
   AND appName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationName#" />
   <cfif Len(arguments.applicationSection)>
      AND sectionName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationSection#" />
   <cfelse>
      AND sectionName IS NULL
   </cfif>
   )

<!--- Record a visit only if there's no record yet or the last visit was more than 30 minutes ago. Anything more recent belongs to the same session, so do nothing. --->
IF @lastVisitDate IS NULL OR DateDiff(n, @lastVisitDate, getDate()) > 30
   BEGIN
      <!--- If the user hasn't visited the app in more than 180 days, treat them as a brand-new user by deleting their record and starting fresh. DateDiff returns NULL for a new user, so this block is skipped for them. --->
      IF DateDiff(d, @lastVisitDate, getDate()) > 180
         BEGIN
            DELETE FROM contextualGuidanceLog
            WHERE userID = <cfqueryparam cfsqltype="cf_sql_integer" value="#arguments.userID#" />
            AND appName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationName#" />
            <cfif Len(arguments.applicationSection)>
               AND sectionName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationSection#" />
            <cfelse>
               AND sectionName IS NULL
            </cfif>
         END

      <!--- Try the update first. If there's nothing to update, we need to do an insert instead. --->
      UPDATE contextualGuidanceLog
      SET totalVisits = totalVisits + 1,
         <!--- SQL Server evaluates every SET expression against the row's pre-update values, so this copies the old lastVisit without a subquery. --->
         previousLastVisit = lastVisit,
         <!--- Use the database clock, the same one the DateDiff() checks compare against. --->
         lastVisit = getDate()
      WHERE userID = <cfqueryparam cfsqltype="cf_sql_integer" value="#arguments.userID#" />
      AND appName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationName#" />
      <cfif Len(arguments.applicationSection)>
         AND sectionName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationSection#" />
      <cfelse>
         AND sectionName IS NULL
      </cfif>

      <!--- If nothing was updated, the resulting @@rowcount value will be zero. We need to do an insert in this case. --->
      IF @@rowcount = 0
         BEGIN
            INSERT INTO contextualGuidanceLog (
               userID,
               appName
               <cfif Len(arguments.applicationSection)>, sectionName</cfif>
               )
               <!--- totalVisits, lastVisit and previousLastVisit have defaults created in the table by the db --->
               VALUES (
                  <cfqueryparam cfsqltype="cf_sql_integer" value="#arguments.userID#" />,
                  <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationName#" />
                  <cfif Len(arguments.applicationSection)>
                     , <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationSection#" />
                  </cfif>
               )
         END
   END <!--- End the "IF @lastVisitDate IS NULL OR ..." block --->

This may look a bit complex, but the key thing here is the UPDATE followed by the IF @@rowcount = 0 statement. If no records were updated (meaning there was no record to match the passed values), then we do an insert. This is just like our original logic, except we're doing everything in one query rather than two. Because I also needed to check the last time a record for this person was updated and make sure we're not updating it more than once every 30 minutes, I combined three potential <cfquery> calls into a single one.
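Here is the same update-first idea stripped down to its core. It is a minimal sketch that uses Python's built-in sqlite3 module rather than ColdFusion and SQL Server, and the function name and the trimmed-down table are illustrative. The steps are: run the UPDATE, check how many rows it touched, and INSERT only if the answer is zero. The cursor's rowcount plays the role of @@rowcount. Unlike the single <cfquery> batch, this sends two statements from application code. With an in-process database like SQLite, that costs nothing extra.

```python
import sqlite3


def upsert_visit(conn, user_id, app_name, visit_time):
    """Record a visit: update the user's row if it exists, otherwise insert one."""
    cur = conn.execute(
        "UPDATE contextualGuidanceLog "
        "SET totalVisits = totalVisits + 1, "
        # Right-hand sides see the row's old values, so this keeps the prior visit.
        "previousLastVisit = lastVisit, "
        "lastVisit = ? "
        "WHERE userID = ? AND appName = ?",
        (visit_time, user_id, app_name),
    )
    if cur.rowcount == 0:  # nothing matched: same role as IF @@rowcount = 0
        conn.execute(
            "INSERT INTO contextualGuidanceLog (userID, appName, totalVisits, lastVisit) "
            "VALUES (?, ?, 1, ?)",
            (user_id, app_name, visit_time),
        )
        return "inserted"
    return "updated"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contextualGuidanceLog "
        "(userID INTEGER, appName TEXT, totalVisits INTEGER, lastVisit TEXT, previousLastVisit TEXT)"
    )
    print(upsert_visit(conn, 42, "gradebook", "2008-10-14 09:00"))  # inserted
    print(upsert_visit(conn, 42, "gradebook", "2008-10-15 09:00"))  # updated
```

One caveat applies to any update-then-insert approach. Two simultaneous first visits could both see a zero row count, and both would then insert. A common way to close that gap is to run the pair inside a transaction and, in SQL Server, add the UPDLOCK and HOLDLOCK table hints to the UPDATE.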

There's one issue, though, that I'm not too happy with:

I realize that I'm repeating a bunch of code to generate the WHERE clause in the SELECT, DELETE and UPDATE statements. I can't generate this text dynamically as a string and then <cfoutput> the WHERE clause because I'm using <cfqueryparam>, as I damn well better be. I guess I could build a dynamic string and then wrap it in the Evaluate() function, but that seems pretty ugly to me (and may not work because of how CF's engine parses the query text and does the binding to the values in a <cfqueryparam> tag). Though I guess using Evaluate() isn't any uglier than the repeated code for the WHERE clause.
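The repeated WHERE clause points to a general technique: build the clause once while still passing every value as a bound parameter. That technique can be sketched outside ColdFusion. This minimal Python/sqlite3 version uses a hypothetical function name and returns the SQL fragment and its parameter list together. It applies the same NULL-section branch as the <cfif> blocks in the query. Within SQL Server itself, a similar effect should be possible by assigning each <cfqueryparam> value to a T-SQL variable declared once at the top of the batch, then referring to those variables in every WHERE clause.

```python
import sqlite3


def visit_where_clause(user_id, app_name, section=None):
    """Return (sql_fragment, params) for the WHERE clause every statement shares.

    Values never get pasted into the SQL text; they travel separately as bound
    parameters, the same protection <cfqueryparam> provides.
    """
    clauses = ["userID = ?", "appName = ?"]
    params = [user_id, app_name]
    if section:  # same test as <cfif Len(arguments.applicationSection)>
        clauses.append("sectionName = ?")
        params.append(section)
    else:
        clauses.append("sectionName IS NULL")
    return " AND ".join(clauses), params


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contextualGuidanceLog (userID INTEGER, appName TEXT, sectionName TEXT, lastVisit TEXT)")
    conn.execute("INSERT INTO contextualGuidanceLog VALUES (42, 'gradebook', NULL, '2008-10-14 09:00')")
    where, params = visit_where_clause(42, "gradebook")
    # One fragment and one parameter list serve both the SELECT and the DELETE.
    print(conn.execute(f"SELECT lastVisit FROM contextualGuidanceLog WHERE {where}", params).fetchone())
    conn.execute(f"DELETE FROM contextualGuidanceLog WHERE {where}", params)
```

With positional ? placeholders, order matters. For an UPDATE, the SET parameters come first and the WHERE parameters are appended after them.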

I like the upsert approach. I'll be using it from here on out in these situations. If anyone has any suggestions about the repeated WHERE clause code, I'd love to hear them!

Wrap Up on the Contextual Guidance API

Given all the background thought I've done on the contextual guidance API, it was a breeze to implement. Planning before coding really does pay off!

The table I'm using to store the data for the tool looks like this:
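A table along these lines would support the upsert query in the "Doing the Upsert" post. The sketch below is reconstructed from the column names that query uses, and it targets SQLite through Python's sqlite3 module so it can be tried anywhere. The column types, lengths, and defaults are assumptions, not the actual table definition.

```python
import sqlite3

# Column names match the upsert query; types, sizes, and defaults are assumptions.
SCHEMA = """
CREATE TABLE contextualGuidanceLog (
    userID            INTEGER      NOT NULL,
    appName           VARCHAR(100) NOT NULL,
    sectionName       VARCHAR(100),                 -- NULL when no section is tracked
    totalVisits       INTEGER      NOT NULL DEFAULT 1,
    lastVisit         TIMESTAMP    DEFAULT CURRENT_TIMESTAMP,
    previousLastVisit TIMESTAMP    DEFAULT CURRENT_TIMESTAMP
)
"""


def create_log_table(conn):
    conn.execute(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_log_table(conn)
    # Only the key columns are supplied; the rest come from the defaults.
    conn.execute("INSERT INTO contextualGuidanceLog (userID, appName) VALUES (?, ?)", (42, "gradebook"))
    print(conn.execute("SELECT totalVisits, sectionName FROM contextualGuidanceLog").fetchone())  # (1, None)
```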

As you can see, I've simplified things and haven't gone the route of more advanced, multivariate analysis of usage patterns for users within an application. I certainly could have, but given that determining a user's experience level with an application is an act of generalization and not a precise science, this seemed right. Or maybe I'm lazy, or like things simple, or both.

In addition to a totalVisits value, I'm storing two date values per user, per application, per section of the application (if that's passed). This allows me to place some context on the user's activity within the application. Instead of just saying "oh, well, they haven't visited this application in 30 days, they should be given the beginner materials again," I can instead say "Well if they haven't visited in more than 30 days, but less than 90, and their total visits were greater than 50, give them the intermediate materials because they probably remember some of what's going on here." Again, in an ideal setup, I should look at not only the number of visits (or visits per section), but average visits over periods of time, the length of breaks between visits, and the exact actions they took within the application to provide them with highly specific contextual guidance. I think that for my users and my applications, being able to look at their activity within sections of a complex application given a rudimentary temporal context will be just fine. If anything, this simpler version of the logic will result in a more consistent display of help/guidance as the user moves from beginner to intermediate user to expert. When you try to get super-specific about guessing what a user is trying to do and provide help each step of the way, you are in danger of ending up with Clippy.

There will be one more related post about the API. It has to do with the magic of upserts and how I thought moving some business logic into the query itself would save me code, but may not have.

The Logic of the Contextual Guidance API

In my last post, I provided background on this project, and why it's useful to be able to provide users with help that fits their experience with the application. I'd like to shift now to looking at the business logic that would make this happen.

I've already covered the method signature for this very simple API, but how does the API know to return a value of "beginner," "intermediate," or "advanced"? To make this determination, we have to think beyond a simple, reductive value. I could certainly write something simple to say that "If someone has logged in to the app more than 20 times, they've got to be an intermediate user." That's easy, but misleading.

People tend to forget things. I work in academia, and work here is highly cyclical. There's a flurry of activity at the start of each academic term, and the place is a ghost town in the months of May and June (well, faculty and students can't be found; staff is working as hard as ever). Faculty and TAs working on courses tend to do a lot of work in a concentrated period of time, then walk away and don't look back until the next time they work on a course. I think that many users are the same way: they get what they need to get done ASAP, then walk away without a look back until the next time they need to use the app. There are exceptions, of course (email clients, social networking and financial sites come to mind). Most of the time, though, a user will need to use your application for a little bit and then come back to it only when needed again. During that time away, users forget a whole lot of stuff. Humans are creatures of habit and need reinforcement to retain processes. If someone uses your application for two weeks and then doesn't use it again for eight months, they're going to have forgotten how to do things. They won't be starting entirely from scratch, as, ideally, bits and pieces of how the application works will come back to them, but they'll definitely need a refresher.

The contextual guidance API needs to take simple forgetfulness into account. We can't just say "If someone has logged in to the app more than 20 times, they've got to be an intermediate user." We have to take time into account, specifically the time between their last visit and their current one. But we can't stop there. What if they visited our app a lot nine months ago, but then came back to it for the first time yesterday? Are they still an intermediate user? Have they remembered all the basic stuff that they needed to remember? Probably not. So we need to look a bit at their history and factor that into our decision.

One possible version of the business logic would work as follows:

  1. Store a record of a request for a page in the app
    • Insert if it's a new user
    • Update if it's an existing user
  2. Look at two stored values: lastVisit and totalVisits
  3. Run a simple decision tree based on these values (see below for discussion)
  4. Return "beginner," "intermediate," or "advanced," as appropriate.

So what would the decision tree look like? There are a number of ways of approaching this, but I plan to say something like this:

  • If lastVisit < 14 days and totalVisits < 10 ===> beginner
  • If lastVisit < 14 days and totalVisits between 10 and 25 ===> intermediate
  • If lastVisit < 14 days and totalVisits > 50 ===> advanced
  • If lastVisit > 14 days and totalVisits < 50 ===> intermediate
  • If lastVisit > 30 days and totalVisits < 75 ===> intermediate
  • If lastVisit > 180 days ===> beginner
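As a rough sketch, these rules could be folded into one Python function. It checks the rules in order, longest absence first, so the overlapping ranges resolve predictably, and it returns the API's three values, with "beginner" standing in for a new user. The list leaves some cases open, so a few choices here are assumptions:

- A recent visitor with more than 50 visits is treated as "advanced."
- The 26-to-50 visit range falls back to "intermediate."
- Anyone away between 14 and 180 days also falls back to "intermediate."
- Exactly 14 days counts as away.

```python
def experience_level(days_since_last_visit: float, total_visits: int) -> str:
    """Sketch of the lastVisit/totalVisits decision tree; rules are checked in order."""
    if days_since_last_visit > 180:
        return "beginner"  # gone more than 180 days: start over
    if days_since_last_visit < 14:
        if total_visits < 10:
            return "beginner"
        if total_visits > 50:
            return "advanced"  # assumption: frequent, recent use means advanced
        return "intermediate"  # 10-50 visits (the list only spells out 10-25)
    # 14-180 days away: the list maps these users to intermediate
    # (assumption: including heavy users, whom the list doesn't cover)
    return "intermediate"
```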

There are lots of choices you can make here in terms of the "rules" for determining someone's experience with an application. But we're only making decisions on two pieces of data. It's useful, as Edward Tufte might say, to be multivariate. We need to factor in previous visits to the application and not just look at the lastVisit value. What if someone's lastVisit was yesterday, their totalVisits was 64, but the previous visit before yesterday was 9 months ago? Should they really still be labeled an "intermediate" user even though they haven't been using the app for more than a single day in the past nine months?
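Here is one way to act on that question. The sketch assumes both lastVisit and previousLastVisit are on hand, since the table stores both. It flags users who are active again but whose visit before that was more than 180 days earlier, the same cutoff the upsert uses for its reset. The function name and the 14-day "recently active" window are illustrative choices, not part of the API.

```python
from datetime import datetime, timedelta


def returning_after_long_break(last_visit, previous_last_visit, now,
                               recent_days=14, break_days=180):
    """True when the user has been active recently but the visit before that was long ago."""
    if previous_last_visit is None:  # only one visit on record
        return False
    recently_active = now - last_visit <= timedelta(days=recent_days)
    long_break = last_visit - previous_last_visit > timedelta(days=break_days)
    return recently_active and long_break


if __name__ == "__main__":
    now = datetime(2008, 10, 15)
    # Visited yesterday, but the visit before that was about nine months earlier.
    print(returning_after_long_break(datetime(2008, 10, 14), datetime(2008, 1, 10), now))  # True
```

A caller could treat a True result as "beginner" regardless of totalVisits. That would catch the user with 64 visits who comes back after nine months away.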

Beyond examining simple date values, we could also look at the amount of time the user spent working on the app in the past few days. "The amount of time" is always a highly flawed piece of data on the Web, as a user can sit on a Web page for hours but not do anything during that time. Instead of tracking hours, minutes, and seconds, we can look at the number (or even kind) of requests a user is making in our application and factor that in. You could even go so far as to weight certain, difficult tasks more than others. I'm not going to go quite that far, but it's another option. Looking at all the requests a user made and not just their number of logins can help determine if they're really using the app, or just browsing.
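Weighting requests by task could be as simple as a lookup table summed over a recent window. In this sketch the task names, the weights, and the seven-day window are all hypothetical placeholders. An application would substitute its own request log and its own judgment about which tasks are difficult.

```python
from datetime import datetime, timedelta

# Hypothetical task weights: harder tasks count for more.
TASK_WEIGHTS = {"view": 1, "edit": 2, "bulk_import": 5}


def activity_score(requests, now, days=7, weights=None, default_weight=1):
    """Sum task weights for requests made in the last `days` days.

    `requests` is an iterable of (timestamp, task_name) pairs; tasks missing
    from the weight table count as `default_weight`.
    """
    if weights is None:
        weights = TASK_WEIGHTS
    cutoff = now - timedelta(days=days)
    return sum(weights.get(task, default_weight)
               for timestamp, task in requests if timestamp >= cutoff)
```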

What about in-house staff or "experts"? Should they be a part of this decision tree, or should they always receive the "advanced" view? That's probably outside of the scope of this simple API, as the application that's calling this API can decide if the user is an expert and always return "advanced," if appropriate.

So that's a look at the business logic of the contextual guidance API. Next up will be a look at the (simple) database table structure, and some sample code in ColdFusion, my development language of choice.

Getting Users Up to Speed, But Not Getting in Their Way

I've mentioned Robert Hoekman, Jr.'s excellent book "Designing the Obvious" a number of times in previous posts. Hoekman devotes an entire chapter ("Turn Beginners Into Intermediates, Immediately") to the subject of getting users up to speed with applications as quickly and unobtrusively as possible. He argues that most of the users who stick around long enough to use your application don't ever get beyond being "intermediate"-level users, and that:

We need to implement some tools that stick around long enough to help users learn what they need to know, but then disappear once their purpose has been served.

So how do we do this? How do we know whether someone is a real beginner, an intermediate user, or an expert user (for those rare people who really want to take the time to learn all the nooks and crannies of the application)? How do we then track that they are gaining experience with the application so we no longer treat them as beginners but as intermediate users (or advanced users, as appropriate)?

I'm working on a simple API to help provide the appropriate level of contextual guidance to users of an application. The focus here is really on the backend tracking and determination of a user's experience level with an application. The contextual guidance API would be called before a page-view is rendered and the view itself would determine what to display based on the "experience level" of the user that has been passed along to it. With this system, we can hopefully begin to provide users instructive text (aka help) in a way that's useful and appropriate to their experience with the application, and that adapts over time to get out of their way or change to show them new tools at their disposal.

The core function would look something like this:

function getExperienceLevel(userID, applicationName [, applicationSection]):string {}

The arguments would be:

  • userID (numeric): The PKEY userID of the user.
  • applicationName (string): The string name of the application, so that multiple applications could be tracked within the same table, if appropriate. This could also be an applicationID value, if you have an applications table to work with which defined each application and gave it an ID.
  • Optional: applicationSection (string): The string value of the section of the application in which the user is working. This is useful in larger applications, where a user may spend most of their time in sections A and B but not in C. The user would need more help in section C of the application, and less in A and B. You need a way to be able to discern this, and this is the proposed way.

The function would return a string containing one of three values:

  • beginner
  • intermediate
  • advanced

You could return a numeric value that somehow translated into a range (e.g., 1-10 = super noob, 11-20 = inexperienced, 21-30 = lower intermediate, and so on), but that level of granularity is probably more trouble than it's worth, especially in the view, where you'd have to write a whole lot of conditional logic to handle the appropriate display for each of those value ranges.

So those are the basics of the method. In the next post on this topic, I'll talk about internal logic and the fact that we can't reduce experience to simple numbers. We have to take into account the fallibility of memory in our business logic.

If you have any questions or comments, I'd love to hear them!

A Surprisingly Clear Slide Presentation on Unobtrusive JavaScript and jQuery

One of the many RSS feeds I consume is Ajaxian's, and there's a good read there at least once a week. Today there's a post about, and an embed of, a presentation on Unobtrusive JavaScript and jQuery.

I point this presentation out for a few reasons:

  1. The presentation is clear, and makes sense without the speaker audio. That's very rare, especially for a presentation on code.
  2. I didn't really "get" the idea of unobtrusive JavaScript. Now I do. Those first 20 slides are great.
  3. It provides a surprisingly good introduction to jQuery in very little text. Code examples are well thought-out and can be followed easily.

I recently finished a project where I used jQuery as the primary JavaScript library. I learned a lot from using it, and I'll be posting soon about some "issues" I had with the library. It's a good, but wildly uneven, library.

If You Need to Explain, You've Got a Design Problem

The title of this post is a wee bit binary, but it's a topic worth discussing. I was just reading a post by Rob Adams, who is working on Adobe's Thermo project, about the value he finds in paper prototypes, and some of the problems.

I'm in the middle of developing a new application with my super-great team, and we're moving from paper prototypes to a clickable, HTML prototype. There's a lot of discussion with the client during this time, and they've now seen four versions of the application on paper. At one point in his post, Adams makes the following salient point:

Correct any mistaken assumptions that are an artifact of the low-fidelity paper, but make sure you record points where you have to help them or explain something - these are design problems you need to fix.

It's very, very easy, when settling into a routine of back-and-forth discussion with a client, to explain away a question they have about the interface (and therefore the workflow) without realizing that the question, in all likelihood, points to a design problem that needs to be solved. If they're asking you "How will this work?" or "What is this for?" or "Will it do [x]?" when that task is represented on the prototype (or is supposed to be represented or implied in the prototype), then you've got a design issue that you need to handle. If they're asking about how data is stored or background workflows not covered by the prototype at hand, then that's not necessarily a design issue that needs to be addressed.

Sometimes you're so focused on getting through the key aspects of the prototype in the hour the client allows you that it's easy to quickly explain something they don't understand about how the application will work. Have a second person with you to note all the questions (something else that Adams recommends) so that you can go back later and say "They didn't understand this, and it's supposed to be self-evident. Let's take a look at how we can improve the design."

If someone in the prototyping process has a question about how the app is supposed to work, then you can be sure that there will be lots of people who use the app who have the same question. By listening to the workflow or application operation questions being asked and not quickly explaining them away, you'll find the design flaws and be able to fix them before they become a headache in production.

Getting the Query String When You're Using SES URL Rewriting

I just finished up an administrative application (a process that's going to lead to a couple of posts in the near future), and I came across an issue that I hadn't seen before, so I thought I'd post about it.

If you're using Search Engine Safe (SES) URL rewriting in your application, the QUERY_STRING value of the CGI scope on the server is no longer available to you. You might want to capture this, as I do, for logging purposes. It's often useful to know the URL variables passed to a given request for auditing and security, or for debugging a user problem. If your application is rewriting all of your URLs to make them search engine safe, you don't get this information anymore. For example, a typical URL in a Web application passes its variables as name=value pairs after a ?, separated by ampersands.

Search engines haven't, historically, liked such URLs and have tended to stop right at the end of the "path" listed in the URL (e.g., http://www.myapp.com/) and ignore everything after the ? (the standard query string delimiter). That's changing, but most Web application developers will tell you to make such URLs "search engine safe" by moving the query string's variables into the path itself as slash-separated segments.

Search engines will typically follow such a URL because it lacks a query string, which can result in better search results at the end of the day.

If you want to capture that query string, however, you can't rely on the old server CGI scope standby, QUERY_STRING. That's because in an SES URL, there is no query string: there is no ? to delimit it. So what do you do?

The CGI specification indicates that a PATH_INFO value should be passed in addition to a number of other variables, including QUERY_STRING. PATH_INFO contains "The extra path information, as given by the client. In other words, scripts can be accessed by their virtual pathname, followed by extra information at the end of this path. The extra information is sent as PATH_INFO." In effect, PATH_INFO takes the place of QUERY_STRING, and you can capture and parse it as you see fit for your logging/auditing/processing purposes.
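Once PATH_INFO is captured, turning it back into name/value pairs takes only a few lines. This Python sketch assumes the common SES convention of alternating name and value segments separated by slashes. Under that convention, a hypothetical http://www.myapp.com/index.cfm/event/showUser/userID/42 would yield a PATH_INFO of /event/showUser/userID/42. Some frameworks may use other delimiters, so check what yours actually emits.

```python
from urllib.parse import unquote, urlencode


def parse_path_info(path_info):
    """Turn '/name1/value1/name2/value2' into {'name1': 'value1', 'name2': 'value2'}."""
    segments = [unquote(s) for s in path_info.strip("/").split("/") if s]
    params = {}
    for i in range(0, len(segments) - 1, 2):
        params[segments[i]] = segments[i + 1]
    if len(segments) % 2:  # odd count: a trailing name with no value
        params[segments[-1]] = ""
    return params


if __name__ == "__main__":
    params = parse_path_info("/event/showUser/userID/42")
    print(params)             # {'event': 'showUser', 'userID': '42'}
    print(urlencode(params))  # event=showUser&userID=42
```

Re-encoding the pairs with urlencode gives a familiar query-string form for a log entry.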

In my case, I'm using Mach-II as my MVC framework, and it handles the SES URL rewriting for me (and a whole lot of other things as well). Anything that comes after the /index.cfm in the URL path then populates the CGI.PATH_INFO value, and that's what I use to capture my "query string" for logging, manipulation, or anything else I see fit.

Why So Silent?

I've been more than lax of late in posting here. It's not for want of material (for once!), but more for want of time. An acquaintance of mine sent me a message the other day telling me that he missed my movie reviews and liked them very much. This got me thinking about my deliberate choice to stop writing about the movies I had seen and focus more on software development. The trouble with that plan is that when I'm developing software all the time (and I've been cranking out a new app in the past few months), I don't feel like writing about it so much. Thus the lack of postings of late.

So I'll start the movie reviews again, and I've got a bunch of postings in the works about Web application development too. I've been using jQuery on my latest project and have some very good and some less good things to say about it. (I've also used Script.aculo.us and Ext 1.1 on previous projects.) I'm developing an API for smarter guidance/help systems in Web applications, so expect a series of postings on that. I've also got to start work on a major overhaul of a large application, focusing on expanding the flexibility of the core business object structure to allow for a greater range of functionality while simultaneously improving performance in the face of object instantiation costs in ColdFusion. Maybe it's time to look at Transfer?

So that's the plan. Let's see if I can stick with it.



Creative Commons License
The content on http://www.iterateme.com/ is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.