2014 Spring President’s Message

Contents

Message From the President’s Desk – Michael Olin

Spring 2014

Owning up to my Technical Prejudices

There is really no other way to describe it. I’m a database bigot. The other day, I was asked to look at a query that was causing poor performance in a vendor product that my client is about use to go live. I was warned that, “It has a bunch of ‘COALESCE’s in it. Do you think you might be able to suggest a way to improve the performance?” My immediate response was: “How about not trying to run SQL Server queries against an Oracle database?” Of course Oracle supports the COALESCE function, but why would you ever want to use it? Oracle’s documentation scoffs:

This function is a generalization of the NVL function.
You can also use COALESCE as a variety of the CASE expression.

Later that day, I was sent another query with a construct that offended my Oracle-centric sensibilities: DATEADD(‘DD’, -100, SYSDATE). Oracle doesn’t even have a DATEADD function! A little digging revealed that DATEADD was a stored function that had been created in the main schema for the application. Instead of using a simple “SYSDATE – 100”, which would be executed entirely within the SQL engine, this application has “DATEADD”s in SQL statements sprinkled liberally throughout the code, with each execution requiring a context switch to PL/SQL just to do some simple date arithmetic using T-SQL syntax. While I’m on a roll, don’t even get me started about ANSI-SQL join syntax. To my eye, ANSI SQL joins are almost impossible to decipher. Oracle’s join syntax is much easier to read. All of the tables accessed in a query are grouped together (in the FROM clause, where they belong) as are the join conditions (in the WHERE clause). Oracle’s outer join syntax “(+)” is also more elegant than the clumsy “LEFT OUTER JOIN” or “RIGHT OUTER JOIN.” The only concession I’m willing to make to the ANSI syntax is for FULL OUTER JOINs. Whenever I use them, I make sure that all of my other joins and WHERE conditions are encapsulated in either factored sub queries or inline views. My FULL OUTER JOINs are crisp and clean, without any other ANSI nastiness cluttering up the query.

I see situations like this all of the time. Products that are originally developed on non-Oracle platforms, usually SQL Server, are ported to run on Oracle without any consideration of how the Oracle RDBMS is fundamentally different from SQL Server. This invariably leads to poor performance of the product when installed with an Oracle back-end and often includes the introduction of security vulnerabilities because of an assumption that the application is the only thing running in the database (typical for SQL Server where the concept of a “database” is analogous to a schema in Oracle, and not a database instance). Would an Oracle developer ever try to build a system where an application schema required the “DROP ANY TABLE” system privilege? I understand that maintaining multiple code bases for different platforms is a challenge, but somehow Oracle has been able to do that with its core RDBMS for decades. Back in the 1980s, I doubt that Oracle was writing a library for the IBM PC to emulate VAX/VMS system calls so that they only needed to maintain one version of the database kernel source code. If you want to sell a product to run against an Oracle database, wouldn’t it make sense to ensure that your code took advantage of the things that Oracle does best?

Different Ways of Viewing the World

There is much more at stake here than the readability and efficiency of varying syntax of SQL statements. Far too much of the database code that I see today reflects a fundamental misunderstanding of the strengths of the underlying database platform and how to best use them to the developer’s advantage.

When Oracle was a relatively new product, it did not support a built-in procedural programming language. Programs were developed as either scripts of SQL statements (and we were very creative in writing SQL that generated more SQL which was written to a file that was executed later on in the script), or as programs written in a conventional 3GL, such as Fortran, COBOL or C, with SQL statements that were converted to library calls in the native language by a pre-compiler and then compiled into executable code. Programmers had to get as much done in SQL as possible, since walking iteratively through records in the database was impossible in a SQL script and time consuming and expensive in a 3GL. For years, I explained to new Oracle users that they needed to stop thinking like COBOL or Fortran programmers. SQL was a much too powerful tool to waste on retrieving one record from the database at a time, examining the data field-by -field, or performing calculations in local variables and then writing the results back to the database. Instead, they needed to envision their database programs as performing operations on Venn diagrams. Figure out how to define the set of data that you needed to manipulate and, with a single SQL statement, perform your calculations and update the data. Of course, this approach would not work for everything. Until Oracle developed PL/SQL, clunky, precompiled 3GL code was one of the only alternatives. The original versions of PL/SQL helped solve many of those problems, but as PL/SQL has matured into a full featured development platform with stored procedures, file system and network access, support for objects and more, I see more and more code that reverts to the pre-1980, row-by-row, iterative view of the world. PL/SQL programs are much more efficient and infinitely more powerful now than the old precompiled 3GL code could ever hope to be. However, Moore’s Law has allowed us to hide programming practices that are inherently inefficient.

This move away from efficient SQL queries (which is what a database platform like Oracle does best) towards procedural, iterative code is being encouraged not only by programming languages such as T-SQL (which makes it a bit too easy to extract data into temporary datasets to be iterated through), but also by the entire object-oriented paradigm, which treats the database as nothing more than a persistent datastore used to store and retrieve data, but one which cannot perform complex manipulations. At NYOUG, for years, we have had expert speakers urging us to let the database do the heavy lifting, but the message is lost on those who never had to go through the code / precompile/compile/execute cycle in order to simply iterate through a set of records. However, there is hope as big data makes it impossible to iterate because of the sheer volume of data to be processed. The map-reduce model is actually forcing developers to think about sets of data again instead of looking at individual records.

Trying to Keep an Open Mind

As a student, I worked with all sorts of fledgling languages for accessing relational databases. I still have textbooks and manuals that contain the syntax for Quel, Square, QBE (Query-by-Example) and what was originally called SEQUEL. I have written code in COBOL and Fortran to access flat-file databases. I even developed a small prototype of an RDBMS that was constructed entirely in LISP. I did not start out hating them all. Working with Oracle’s SQL implementation changed the way that I looked at data, and to be honest, I tend to see sets of data where others see row after row. Now, I’m stuck. Iterating through data when it isn’t necessary really bothers me. Programming paradigms which encourage that view get me agitated. Bringing thousands of rows up to the middle tier, just to examine and discard almost all of them makes me crazy. Just seeing the syntax associated with any of these things is enough to make me dismissive. I know that not everyone sees things the same way that I do. I’m trying to keep an open mind, but decades of database bigotry make it something that I continue to struggle with. I have to assume that someday I’ll encounter a new paradigm that changes my view of the world. However, for now, if I’m going to be working with a database, it’s Oracle and SQL above all others.