Converting a Numeric Field to a Numeric Field

When we create the first version of an application, we're prescient enough to always know the exact functional and storage needs of the application, no? I recently ran in to a bit of a problem caused by some poor planning on my part (or maybe it was the guy who wrote the original version of the app before me), and ran up against an unpleasant datatype conversion issue in SQL Server.

There's a field in the database for this particular app which stores relatively small numeric values — point values for each question in a quiz. This field is currently defined as a NUMERIC(4,2) in SQL Server. For some of you, this may have immediately raised a red flag. Why is this field limited to such a small range of numbers? Sometimes my long experience developing Web-based applications gets the better of me, and I concern myself with the limits of storage from an old-school "an 8GB hard drive costs $400!" point of view. And there are reasonable arguments to be made for limiting the size of fields within a database. However, my original reasoning was something like "Well they're not going to need values above 1000 nor decimal accuracy past two places, so why bother?"

You can see where this is going, right?

Now I have to support decimal accuracy to three places (and, sooner, or later, someone will ask me for four places) because in the world of scores for quizzes and exams, there's a big difference between 3.125 and 3.13.

So my problem was this: how to change the definition of the database field from NUMERIC(4,2) to a more standard NUMERIC(18,5)? Well you can't just fire up SQL Server Management Studio, right click the table, select "Design" and change the datatype. You'll get an internal conversion error from the database engine.

The next approach was to create a new column with the correct datatype and copy the contents of the old column in to the new one, drop the old column, and rename the new column with the old column name. In SQL, it's pretty trivial to copy one column to another. You just run:

UPDATE table
SET column1 = column2

However, when copying from a NUMERIC to a NUMERIC field, the database engine again throws the internal conversion error.

In this article from the SQL Server docs on CAST and CONVERT, there's a figure about 1/2 way down that shows which datatype conversions are supported in SQL Server. You'll notice that NUMERIC to NUMERIC is supported but it "Requires explicit CAST to prevent loss of precision or scale that might occur in an implicit conversion." So all I have to do is CAST() the origin field and we'll be OK, right?

Nope. Yet more internal conversion errors.

What I ended up doing was this, and I probably should have done everything in the database, but I was done wasting time and knew I could get this one time conversion done quickly using ColdFusion. I simply <cfquery>ied the database to pull all values from the original column, but CASTing the values as FLOATs. I then <cfloop>ed over that query and UPDATEd the table with the new values. SQL Server had no issues in going from FLOAT to NUMERIC(18,5). I was then able to drop the old column, rename the new column, and everything was good again.

I'm sure that if my SQL-Fu was better, I could have created a temp table, copied the data in there, done the CAST to FLOAT in the copy, then pulled from that temp table and updated the new column in the original table with the new values from the temp table. If anyone would like to provide me some samples to get started, or has alternate suggestions, I'm more than willing to give them a go.

How Do You Stop Users from Creating Duplicate Accounts?

I'm prepping to build out a consolidated registration site for a couple of the services that we provide at my place of work. Instead of replicating a registration process across a couple systems, we're building a centralized registration system and users can then take those credentials and log in to the services that they need. (We're not replacing existing user accounts with OpenID and OAuth or anything like that — sorry!)

There's a lot of work we'll be doing to make the process simpler and more elegant than the current set of processes. One major hurdle remains, however: how to prevent users from creating duplicate accounts.

When users register for a site — any site — and if they don't access the site right away or within the next couple days, they often forget their account information. If they wait long enough, they'll even forget the email address with which they registered. This is particularly problematic in online education, as someone may take a class, then not take another for a year or two, but return to take another class at some distant point in the future when their email (or physical) address may have changed. We don't want them to make new accounts because then we don't have a unified history of their activity (and this can cause problems with things like prerequisite courses).

A common approach I've seen and we've used in the past is to check against the provided email address. If there's already an account in the system with the given email address, then you let the user know that they already have an account and provide them a link to retrieve that information.

However, if the user is trying to register with a different email address, they're going to make a duplicate account. The system isn't going to know that bsimpson@hotmail.com and santoslhalper@gmail.com are the same person. So what to do?

I'm considering the following, but am very open to suggestions:

  • Check the first and last name: if there's a match to the first and last name, suggest display a list of email addresses that match and say "Hey, if one of these is your email address, go ahead and retrieve your account info."

  • Check the first and last name and city and state or country: It's very unlikely that there are two Janet Halzipools in Buffalo, New York, but one never knows. (In parts of South India, for example, a name like Omar Khan is very common, though less so in Reno, Nevada so maybe this approach won't work particularly well.)

  • Check another unique identifier: We don't/won't/never will store Social Security Numbers, but if you have another unique identifier, you can always check against that. The University that I work for distributes unique identifiers to some, but not all, of our user base (hence the need for a separate set of authentication data). This would work pretty well, but if the user no longer has access to the email address which is associated in the system with this unique identifier, how do they retrieve their account information? Do you ask them to engage in the manual task of contacting support and then waiting for a response from support to get information updated in the system?

Once of my key considerations about any of the above strategies is this: if you find what you think is a match, how do you prompt the user to see if this is them without making them think their account security is at risk?

Again, any suggestions are welcome. Even if I can't remove duplicate account creation, if I can significantly reduce it with one or more of the above strategies, I'll be happy.

Using SQL's COALESCE to Return a Comma-Delimited List in a Query

As part of the standard database normalization process, you want to store related items of data in a join table rather than a comma-separated list in a single row within a table. Instead of putting a list of a user's favorite colors ("red, orange, aubergine") in a single field in the database, you'd store each item in the list as a separate row in the join table. This results in easier querying and faster performance when trying to access this data -- except in one specific instance.

Oftentimes, you will need to run a query that returns a good deal of information about a user, including an item such as the user's favorite colors. If you store the user's favorite colors in a single field in the database, this is trivial, but it's not good form and eventually will result in all sorts of problems when you want to find all users who have, say, red as one of their favorite colors. So you store that information in a join table instead. However, when you want to get a complete set of data on a bunch of users at once, including the favorite color for each user, things start to get a little complex.

You can do a standard SELECT with a whole bunch of WHERE clauses to filter the data and do the join between the user information and the favorite color join table, but you'll end up with one result for every item in the favorite color join table. So if a user has three favorite colors in the join table, you get three rows in your select. How, then, do you make it so you can retrieve a single row for each user and have a comma-separated list of their favorite colors?

The answer, as I found it, came from the smart guys over at 4guysfromrolla.com. I'm not going to rehash the entire article, but instead point directly to the solution.

The solution is this: create a user-defined function in the database which uses the SQL COALESCE function to create your comma-delimited string. For example:

CREATE FUNCTION dbo.udf_getFavoriteColorsList(@userID int)
RETURNS VARCHAR(1000) AS

BEGIN

DECLARE @ColorsList varchar(1000)

select @ ColorsList = COALESCE(@ColorsList + ', ', '') + userColors.colorName
from userColors
where userColors.userID = @userID

RETURN @ColorsList

END

So this function takes a userID value and returns a comma-separated list of all the matches from the userColors table where the userID is a match. You can add additional WHERE statements in this function to filter the results further. The magic is the COALESCE statement, which takes all the results and combines them together into a single value.

This has to be set up as a user-defined function. You can't do this inline with other SQL in your main query.

Let's say your main query is "Give me all users in California and their favorite colors." You'd code the query like this:

SELECT userID, firstName, lastName, email, dbo.udf_getFavoriteColorsList(userID) as ['Favorite Colors']
FROM users
WHERE state='CA'

The userID value in each row in the query gets passed off to the getFavoriteColorsList function that you created earlier, and it returns a comma-separated list of that user's favorite colors.

This is much faster than doing it in the application tier (ie; your ColdFusion or Java or PHP app). The drawbacks are twofold: it's not terribly generalizable — you'd have to create a user-defined function in your database for each such list you wanted — and you can't do a whole lot of fancy formatting as you might be able to in your application tier. Perhaps you could make the function generalizable (that is, pass in the column and table names as function arguments), but I don't believe that works. Please correct me if I'm wrong!

More Tidbits on ColdFusion 9

Sean Corfield covered the keynote at the Wee Dram of Scotch conference yesterday, where Adam Lehman talked about some of the possible new features in ColdFusion 9. To quote Sean:

He said that Adobe is considering how to expose all of the "services" inside CF as SOAP and AMF remote services. The idea is that it would allow clients to access the engines that drive CF's query, mail, document (PDF), imaging, charting, Exchange services etc. They are considering making AS3 libraries available that would allow Flex developers to easily call any of these services directly, making CF the back end of choice because of the rich functionality it adds to Flex. An interesting approach. He also said they are considering integrating BlazeDS more deeply into CF, not like the current optional LCDS Express install, but as a core part of CF which would open up the possibility of direct message handling via CFCs as well as potential improvements to AMF performance (since it would be tightly integrated with CF).

The key point here is that Adobe is, potentially, looking at ColdFusion as the "hub" for a lot of development for a lot of different platforms. If core CF services (such as querying, mail, image manipulation and so on) are exposed as remote services, you can have PHP or .NET or J2EE applications calling these services. Interesting idea, though most other platforms have libraries which do many of these things, and do them natively. That may not be the case for everything (particularly AMF-based querying) and for every platform (Ruby and Perl don't, for example), but I think that developers who primarily code on these other platforms would say "Why use CF when I can grab a native PHP/.NET/J2EE library which does this for me?"

Exposing CF functionality as a series of AS3 libraries really does bind (pun intended) Flex to CF much more closely. That's definitely a good thing, as it increases the capabilities of Flex and keeps development strong within the Adobe family, while the more general Web services exposure of core CF services makes it available to other platforms.

And I'd personally love to see BlazeDS integrated deeply in to CF. It would make for faster, easier application building in CF and really help promote the use of the platform for data synchronization in rich Web applications. BlazeDS and LCDS is really nifty stuff, but it's a bit of a pain to configure, and there are some issues with CF object marshalling and management which need to be addressed. Deep integration of BlazeDS in to CF would knock down those two major barriers, and give the platform another needed bost.

Passing Functions as Arguments, Who Knew?

It's good to learn something new every day. You can always do better. You can always learn more. Today I learned something that I suppose should have been obvious to me: given that ColdFusion is a weakly-typed, dynamic scripting language, you can pass functions as arguments to other functions.

This can be really useful for custom renderers: say you have a series of products but each need to be output in a slightly different way. You could write separate functions with a lot of repeated code to handle the rendering of each of the products, or you could be a good developer and pull out the repeated code in to a single function. You could be an even better developer and pass in a custom rendering function to the generic rendering function to handle the unique processing that each product needs for proper rendering. The generic function then calls the custom function that you passed in as an argument. This leads to simpler syntax, more flexibility, and reuse of functionality. Harel has an example of passing in a function as an argument on his blog.

News and Tidbits on ColdFusion 9 (aka Centaur)

I just came across two blog posts which highlight some probably and some possible features for the next version of ColdFusion. My friend Adam Lehman (on Adobe's CF team) made an interesting post about the process that Adobe uses to develop new features for new versions of their products. It's a really interesting read, and the Synchronous Development process they use sounds quite interesting. You can't get the full details on that process from the SyncDev Web site because, well, their business is to sell consulting services. The key process idea of "Sell, Design, Build" (rather than "Design, Build, Sell") is a really interesting one as it ensures that your customers (and people you want as your customers) would actually want to buy your product before you ever build it. That's pretty powerful stuff for ensuring that your customer base will actually go out and buy your product. This is especially true of software, where versions above #4 tend to be about adding "nice" features when the product has already solved the core problems it was meant to solve or where the product has a free (often open-source) alternative.

While Adam doesn't talk about anything that hasn't been announced elsewhere, it's an interesting and valuable read to understand not only Adobe's product development process, but where the ColdFusion team, specifically, is coming from in determining the product's future.

Brian Rinaldi has put together a much more extensive look at all the public knowledge about ColdFusion 9. From process improvements to actual feature descriptions, his overview is a great read for anyone interested in the next version of the product and the language. I found it particularly interesting that it appears that Adobe has not gone the route of the Active Record pattern with their Hibernate integration, which I think may not save developers a ton of time but will instead allow for more flexible, robust uses of Hibernate from within ColdFusion 9. Hibernate is going to save developers a lot of time as it is, so I don't think there's a big loss in productivity by not going the Active Record route. (Implicit getters and setters will be a huge time saver, however.)

The big unknowns are the "management features," as they're sometimes called. Those are the easy-to-grasp, sexy, obvious features which make it much easier to sell a product upgrade to managers. This could include a real ColdFusion IDE, or audio/video management tools, workflow and BPEL engine integration, or who knows.

We'll know a lot more by the time MAX is over at the end of November. I'm looking forward to the time we'll finally get to play with CF9!

Yet Another Reason CF Needs a Quality IDE: Refactoring

I'm currently in the process of taking a relatively old application and refactoring it to support a real service layer in the model. Learning object-oriented development is a process, and my early apps were all about encapsulation and not so much about architecture. Hence, I have DAOs in one package calling gateways and other business objects in other packages, and other not-so-good coding practices. A good service layer also opens my app up to the kind of flexibility and expansion I need, especially for providing more AJAX-based client services.

Now I could just go in and do a simple find and replace for all the component or function names I'm changing, but that's rather limited. It's difficult to preview all the changes (because sometimes you may not need or want to make a specific change when you refactor), and there's no way to see this in CFEclipse/Eclipse. A good refactoring tool will go one step further and assist you when you change method signatures as well, updating method signatures in other objects as well or warning you where there's an incompatibility. Right now, I have to do all of this by hand, and it's tedious and time-consuming.

A good ColdFusion IDE with good refactoring support (for at least the two items mentioned above) would make a huge difference to me in this project. It would also give a professional level development tool to those who develop in CF. I know that there are difficulties in doing this in a a weakly-typed language, but if the Adobe team can solve this one, it would be great for the higher-end CF developers who are looking to (or moving to) other languages with widespread support for this kind of work.

Big News on the Future of ColdFusion from CFUnited

A number of sources were blogging live from CFUnited, which began today. At this morning's opening keynote, Ben Forta and my friend Adam Lehman from Adobe's ColdFusion team dropped some pretty big news about how the product is moving forward. As I do a majority of my development work in ColdFusion, this is clearly important to me.

The biggest news with the most immediate impact is that ColdFusion Enterprise is now free for qualifying educational institutions. That's pretty huge for folks like me who use ColdFusion to develop educational applications in a non-commercial environment. I'm waiting to see the licensing and how it works, but if it's like the "free for education" Flex 3 licenses (which is what Ben said it would be), then that means that we save a huge amount of money without having to worry about additional costs in deploying additional ColdFusion servers for new projects. I have one project on my plate that will suddenly be much easier to get funded without a $5K price tag for the application server. That's a pretty big win for Adobe, for adoption of ColdFusion, and the future of the platform. I guess government and commercial license sales are doing pretty darn well for them to open up CF to education in this manner.

The next biggest item, for me, was Hibernate integration. This is great news, and helps keep CF competitive with Ruby (with Rails) and Groovy or other dynamic scripting languages that handle object-relational mapping (ORM) out of the box. CF already has something like this with Mark Mandel's excellent Transfer framework, but now an industry standard is baked right in to CF9. I'm not as familiar with Hibernate as I should be, but I hope they expose a mechanism similar to Transfer's decorators for handling those situations where you need to extend the default behavior (or queries) of Hibernate. This will be a big time-saver for a lot of developers who create enterprise-level applications, and will be a great gateway to using more Java directly in CF for many of us.

Another excellent area of improvement comes in how components (objects) will be handled in CF9: using any method named "init" as a default constructor and implicit getters and setters. Both of these items, while pretty technical in nature and not a feature you can easily sell to managers, will be big time savers for the growing number of people who write CFC-based applications.

The Flex and AIR integration demonstrated is interesting, especially the new <cf:> prefix tag for ColdFusion-like functionality within Flex applications. My suspicion, however, is that this will be useful for very simple displays of data within Flex, but that most Flex developers using ColdFusion will quickly outgrow any usefulness provided by these Flex tags. If the Flex tags do automatic ActionScript object creation and let you treat queries as first-class citizens within Flex (rather than having to translate them into ArrayCollections or the like), then that could be quite useful.

All of this was demonstrated with the standard "this may or may not make it in to CF9" disclaimer, but the free educational license is here right now. That's a big win and makes it a big day for CF, at least to me.

UPDATE: According to Ben Forta, from a post on his blog, the free CF licenses for education are "for educational and training use - classrooms, student work, and so on. The free license is not for IT organizations within education." So that means even if you build student-centric applications in CF, you don't qualify for the free license. It's really just for teaching and learning purposes within the context of a classroom where you're learning about software development. Ah well...

Doing the Upsert

In the process of developing the contextual guidance API, I tried to push as much of the work as I could to the database. I knew that I would have to do multiple queries to check and see if a user had an existing record, when their last visit was to the application, and more. I had read some time ago about the concept of the upsert, and thought this was a good time to put it in to practice.

In many applications, you need to check to see if a record exists in a table, and do an UPDATE if a record does exist, or an INSERT if one does not exist. This is a common pattern in application development. Many developers do something like this:

Query the table to see if a record exists which matches the passed ID
IF match found {
   Run an UPDATE query
} else {
   Run an INSERT query
}

That's fine, but ultimately requires two trips to the database, or more, depending on what other conditional processing you need to do. The upsert combines all of this in to a single SQL statement. In addition, I needed to run some other database-related logic that would affect the SQL statement. If a user hadn't visited the application in more than 180 days (6 months), I wanted to treat them as a new user and delete their previous record in the log table. (You could set an "isDeleted" flag or delete the record, that's up to you.)

My solution is as follows, and combines some conditional logic in ColdFusion with conditional logic in the SQL as well. This is for MS SQL Server, but the concept applies to pretty much any database.

<!--- First check and see if a record for this user has been updated in the last 30 minutes. If so, don't do anything as that's not really a new session. --->
DECLARE @lastVisitDate datetime
set @lastVisitDate =
   (
   SELECT lastVisit
   FROM logTable
   WHERE userID = <cfqueryparam cfsqltype="cf_sql_integer" value="#arguments.userID#" />
   AND appName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationName#" />
   <cfif Len(arguments.applicationSection)>
      AND sectionName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationSection#" />
   <cfelse>
      AND sectionName IS NULL
   </cfif>
   AND DateDiff(n, lastVisit, getDate()) > 30
   )
         
<!--- If the @lastVisitDate value is not null, a record was found matching the criteria, so we can do an update or insert, as needed. --->
IF @lastVisitDate IS NOT NULL
   BEGIN
      <!--- If the user hasn't visited the app in more than 180 days, treat them as a brand-new user by deleting their record and starting fresh. --->
      IF DateDiff(d, @lastVisitDate, getDate()) > 180
       BEGIN
            DELETE FROM logTable
            WHERE userID = <cfqueryparam cfsqltype="cf_sql_integer" value="#arguments.userID#" />
            AND appName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationName#" />
            <cfif Len(arguments.applicationSection)>
               AND sectionName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationSection#" />
            <cfelse>
               AND sectionName IS NULL
            </cfif>
       END
               
<!--- Try the update first. If there's nothing to update, we need to do an insert instead. --->
      UPDATE contextualGuidanceLog
      SET totalVisits = totalVisits + 1,
         lastVisit = <cfqueryparam cfsqltype="cf_sql_timestamp" value="#CreateODBCDateTime(Now())#" />,
       previousLastVisit = (
                  SELECT lastVisit FROM contextualGuidanceLog
                  WHERE userID = <cfqueryparam cfsqltype="cf_sql_integer" value="#arguments.userID#" />
                  AND appName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationName#" />
                  <cfif Len(arguments.applicationSection)>
                     AND sectionName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationSection#" />
                  <cfelse>
                     AND sectionName IS NULL
                  </cfif>
                  )
         WHERE userID = <cfqueryparam cfsqltype="cf_sql_integer" value="#arguments.userID#" />
       AND appName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationName#" />
         <cfif Len(arguments.applicationSection)>
            AND sectionName = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationSection#" />
         <cfelse>
            AND sectionName IS NULL
         </cfif>
               
      <!--- If nothing was updated, then the resulting @@rowcount value will be zero. We need to do an insert in this case. --->
      IF @@rowcount = 0
         BEGIN
            INSERT INTO contextualGuidanceLog (
               userID,
               appName
               <cfif Len(arguments.applicationSection)>, sectionName</cfif>
               )
               <!--- totalVisits, lastVisit and previousLastVisit have defaults created in the table by the db --->
               VALUES (
                  <cfqueryparam cfsqltype="cf_sql_integer" value="#arguments.userID#" />,
                  <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationName#" />
                  <cfif Len(arguments.applicationSection)>
                        , <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.applicationSection#" />
                  </cfif>
               )
       END
   END <!--- End the "IF @lastVisitDate IS NOT NULL" IF --->

This may look a bit complex, but the key thing here is the UPDATE followed by the IF @@rowcount = 0 statement. If no records were updated (meaning there was no record to match the passed values), then we do an insert. This is just like our original logic, except we're doing everything in one query, rather than two. Because I also needed to check the last time a record for this person was updated and make sure we're not updating it more than once every 30 minutes, I combined 3 potential <cfquery> calls in to a single one.

There's one issue, though, that I'm not too happy with:

I realize that I'm repeating a bunch of code to generate the WHERE clause in these SELECT statements. I can't generate this text dynamically as a string and then <cfoutput> the WHERE clause because I'm using <cfqueryparam>, as I damn well better be. I guess I could build a dynamic string and then wrap it in the Evaluate() function, but that seems pretty ugly to me (and may not work because of how CF's engine parses the query text and does the binding to the values in a <cfqueryparam> tag). Though I guess using Evaluate() isn't any uglier than the repeated code for the WHERE statement.

I like the upsert approach. I'll be using it from here on out in these situations. If anyone has any suggestions about the repeated WHERE clause code, I'd love to hear them!

Hooray! Endless Errors Have Gone Away!

If you've been a visitor here before, you may have run in to more than your fair share of unexpected errors. I'm on a shared server for this blog, and, unfortunately, you always take your chances when you use shared servers.

After having talked to some colleagues who were experiencing similar "unexplainable" errors on servers upgraded to ColdFusion 8 at HostMySite, I was under the mistaken impression that the errors were caused by an issue with HMS's server virtualization technology and upgraded CF8 machines. I thought I had to wait until the CF 8.01 patch to get things resolved. Turns out, I was dead wrong.

The seemingly endless errors on this site were generated by an issue with maintaining connections to the MySQL database on the back end. It turns out that I couldn't change the setting to not maintain connections, but the good people at HostMySite.com (who, surprise, host this site) turned that off and the errors have magically ceased.

Woot!

More Entries

BlogCFC was created by Raymond Camden. This blog is running version 5.8.001.