Reiisi -- 零石: Oracle's patent #6061520

I posted about this on Groklaw, and I want to keep a record of what I wrote.

At the current point in the proxy skirmish that is the Oracle v. Google patent face-off, several of Oracle's patents have been overturned and one, as I recall, has been allowed with narrowed claims. Several more remain in process or are waiting in the queue.

The one I understand to have been allowed, for all that the patent office has allowed it, is quite typical of a patent that should never have been issued.

US Patent 6061520 describes, rather ambiguously, a series of steps for applying a code-optimization technique, known among other names as constant elimination by "pre-playing the code" (or "play executing"), to the initialization of Java classes.

When we write software, we generally include a large number of constants (numbers, text, and other things that don't change while the program is running). Many of these constants are used to set up the starting values of things the program refers to (that is, to "initialize variables").

In order to save the computer some calculation time, we often pre-calculate these constants. Say we want to use π. We can approximate it roughly with

22 / 7

but that is still a division the computer has to perform. 3.1416 is much closer, but it is not as accurate as the computer can represent internally. We could also write it as

arctangent( 1 ) * 4

which will be very close to as accurate as the computer can get (without special help).

(Before I go further, I should note that Java's standard library provides π as the constant java.lang.Math.PI, so this is a bit of a contrived example when talking about Java, unless we want to talk about the arbitrary-precision classes in the java.math package, which would introduce a whole bunch of arcane programming that this conversation would quickly get lost in. Not entirely irrelevant, but not useful for the conversation at this point.)
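A quick Java sketch comparing how close each of the three forms above gets to java.lang.Math.PI (the class and method names are made up for the illustration). One Java-specific catch: written literally as 22 / 7, Java does integer division and gets 3, so the sketch writes 22.0 / 7.0.

```java
// Compares the three ways of writing pi from the text against Math.PI.
class PiApproximations {
    // 22 / 7: written with ".0" because a literal 22 / 7 in Java is integer division (= 3)
    static double ratio() { return 22.0 / 7.0; }

    // 3.1416: a short decimal literal
    static double decimal() { return 3.1416; }

    // arctangent(1) * 4: atan(1) is pi/4, so this lands within a rounding step of Math.PI
    static double arctan() { return Math.atan(1.0) * 4.0; }

    public static void main(String[] args) {
        System.out.println("22/7      off by " + Math.abs(ratio() - Math.PI));
        System.out.println("3.1416    off by " + Math.abs(decimal() - Math.PI));
        System.out.println("atan(1)*4 off by " + Math.abs(arctan() - Math.PI));
    }
}
```

The first is off in the third decimal place, the second around the sixth, and the arctangent form matches Math.PI to within about one unit in the last place of a double.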

But taking the arctangent of 1 does not come completely free.

Wouldn't it be great if we could get the computer to make that calculation once, then use the pre-calculated result everywhere the constant shows up in the program?
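A sketch of "calculate it once" in Java, using a made-up CircleMath class: the arctangent runs a single time, while the JVM initializes the class, and every later use just reads a static final field. Note that javac itself won't pre-calculate this one: a method call is not a compile-time constant expression in Java, so the work is moved to class-initialization time rather than eliminated.

```java
// "Calculate once, reuse everywhere": the expensive value is computed during
// class initialization and every later use is just a field read.
class CircleMath {
    // Counts how often the expensive calculation actually runs (illustration only).
    // Declared before PI on purpose: static initializers run in textual order,
    // so declaring it after PI would reset the count back to 0.
    static int evaluations = 0;

    private static double computePi() {
        evaluations++;
        return Math.atan(1.0) * 4.0;
    }

    // Evaluated exactly once, when the class is initialized.
    static final double PI = computePi();

    static double area(double radius) {
        return PI * radius * radius;   // no arctangent here, just a field read
    }

    public static void main(String[] args) {
        double total = 0;
        for (int r = 1; r <= 1000; r++) {
            total += area(r);
        }
        System.out.println("total = " + total + ", arctangent evaluated " + evaluations + " time(s)");
    }
}
```

A thousand calls to area, and the arctangent is still evaluated only once.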

Yes, we can. Usually. Sometimes. A lot, with modern languages.

The patent references an example of an array initialization:

static int setup[ ]={1, 2, 3, 4};

and then says some stuff that lets you know why Java has a reputation for slow startup.

In C, that array declaration would be nothing more than the values 1, 2, 3, and 4 stored in sequence in the table of initial values that the C startup code reads directly into memory. No calculation whatsoever. No loading constants onto the stack just to store them. No dealing with individual values. It's just part of the program image and gets loaded en masse into memory along with the program's machine code.

Pre-calculating 22 / 7 is easy. Most mature compilers of any language, whether byte-code interpreted or compiled down to machine code, have been able to recognize constant expressions like this since time immemorial, or at least since the early '90s. In fact, I have used compilers developed in the mid-'80s that could do it. I have two old AT&T Unix machines from that period back in Utah whose compilers do that much, and I have with me now a Metrowerks C compiler from the early '90s that does at least as well.

The process is generally called constant folding, a form of scanning for invariants, and it often proceeds in two passes. One pass sees the 22 and the 7 as constants, and the next pass sees an ordinary arithmetic expression involving nothing but constants, calculates it, and leaves only the result for the main pass of the compiler to use later.
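Here is a toy constant folder in Java over a hypothetical little expression tree (numbers, variable names, and the four arithmetic operators). In this sketch the two passes collapse into one recursive, bottom-up walk: fold the operands first, and if both come back as constants, evaluate the operator right away. For what it's worth, javac does this for Java's own compile-time constant expressions, such as static final double RATIO = 22.0 / 7;.

```java
// A toy constant folder. Leaves are numbers or variable names; any operator
// whose operands both fold to constants is replaced by its computed value.
class ConstantFolder {
    interface Expr {}

    static final class Num implements Expr {
        final double value;
        Num(double value) { this.value = value; }
        public String toString() { return Double.toString(value); }
    }

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
        public String toString() { return name; }
    }

    static final class BinOp implements Expr {
        final char op;
        final Expr left, right;
        BinOp(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
        public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    static double apply(char op, double a, double b) {
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
            default: throw new IllegalArgumentException("unknown operator " + op);
        }
    }

    // Fold the children first; if both are now constants, evaluate this node
    // at "compile time" instead of leaving the arithmetic for run time.
    static Expr fold(Expr e) {
        if (!(e instanceof BinOp)) return e;   // numbers and variables are already minimal
        BinOp b = (BinOp) e;
        Expr l = fold(b.left);
        Expr r = fold(b.right);
        if (l instanceof Num && r instanceof Num) {
            return new Num(apply(b.op, ((Num) l).value, ((Num) r).value));
        }
        return new BinOp(b.op, l, r);
    }

    public static void main(String[] args) {
        // radius * (22 / 7): the division folds, the multiplication by a variable cannot
        Expr e = new BinOp('*', new Var("radius"), new BinOp('/', new Num(22), new Num(7)));
        System.out.println(e + "  =>  " + fold(e));
    }
}
```

Folding radius * (22 / 7) leaves radius times a single pre-computed number; the multiplication itself has to wait for run time because radius isn't a constant.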

Calls to functions are a little more difficult, because functions can have side effects, but some compilers can at least evaluate calls to standard library functions that are known to have no side effects when they are passed constant arguments, and some could do so before Gosling and his friends started working on Java in the early '90s. (Microsoft C claimed it could do proper initialization, but its implementation was incomplete and not really correct during the '80s.)
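Extending the same idea to calls: a sketch of folding a call only when the function is on a list known to be free of side effects and its argument is already a constant. The whitelist and the tryFold name are hypothetical, not taken from any particular compiler.

```java
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Fold a call only when (a) the function is on a whitelist of library functions
// assumed to have no side effects and (b) its argument is already a constant.
class PureCallFolder {
    // Hypothetical whitelist: pure math functions only.
    static final Map<String, DoubleUnaryOperator> PURE = Map.of(
        "atan", Math::atan,
        "sqrt", Math::sqrt,
        "cos",  Math::cos);

    // Returns the folded value, or null if the call must be left for run time.
    static Double tryFold(String function, Double constantArg) {
        DoubleUnaryOperator f = PURE.get(function);
        if (f == null || constantArg == null) {
            return null;   // unknown function (might have side effects) or non-constant argument
        }
        return f.applyAsDouble(constantArg);
    }

    public static void main(String[] args) {
        System.out.println(tryFold("atan", 1.0));        // on the list, constant argument: folded
        System.out.println(tryFold("readSensor", 1.0));  // not on the list: left alone
        System.out.println(tryFold("atan", null));       // argument not known: left alone
    }
}
```

The conservative default matters: anything the compiler can't prove harmless is simply left to run as written.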

Java is compiled, class by class, to byte-code for a virtual machine. Byte-code instructions are a bit like calls to standard functions, and their side effects are sometimes hard to predict. Then there was the question of where the constants should be stored, and, for some reason, "with the byte code for the class" apparently wasn't an interesting answer. So Java couldn't just generate the constants and store them with the compiled byte code; it had to embed them in instructions that explicitly initialize the array, one element at a time.
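For static int setup[] = {1, 2, 3, 4};, javac emits code in the class's static initializer that creates the array and then stores each element with its own dup / push index / push value / iastore sequence. Written back out as Java, that is roughly the following (the class name is illustrative; javap -c on a compiled class shows the real bytecode).

```java
import java.util.Arrays;

class UnrolledInit {
    // What the source says:
    static int setup[] = {1, 2, 3, 4};

    // Roughly what the compiled static initializer does for that declaration,
    // written back out as Java. Comments give the bytecode javac emits for it.
    static int[] setupByHand;
    static {
        int[] tmp = new int[4];   // iconst_4; newarray int
        tmp[0] = 1;               // dup; iconst_0; iconst_1; iastore
        tmp[1] = 2;               // dup; iconst_1; iconst_2; iastore
        tmp[2] = 3;               // dup; iconst_2; iconst_3; iastore
        tmp[3] = 4;               // dup; iconst_3; iconst_4; iastore
        setupByHand = tmp;        // putstatic
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(setup) + " same as by hand: " + Arrays.equals(setup, setupByHand));
    }
}
```

Four instructions per element, so a large constant table turns into a large method that has to run every time the class is loaded.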

If I read it right, this patent is a specification for performing initializer invariance analysis in Java by the supposedly cheap method variously known as pre-flighting the constant space, simulated initialization, play execution (Sun's chosen term), and so on, and for additional instructions that initialize the various data constructs of Java.

Except that it doesn't solve certain difficult problems; it just waves its hands about allocating a piece of memory for the initialization code to simulate its work on, and building the initialization tables from the effects on that memory.
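To make the "play execution" idea concrete, here is a toy Java sketch that runs array-initialization code against a throw-away scratch array and keeps the result as a table if every step was understood. The three-instruction set is hypothetical, loosely modelled on the newarray/iastore sequence javac emits; it illustrates the general technique, not the method claimed in the patent.

```java
import java.util.Arrays;
import java.util.List;

// Toy "play execution": simulate initialization code against scratch memory and,
// if every instruction is understood, keep the resulting values as a ready-made table.
class PlayExecution {
    enum Op { NEW_ARRAY, STORE, CALL }   // hypothetical instruction set

    static final class Insn {
        final Op op; final int a; final int b;
        Insn(Op op, int a, int b) { this.op = op; this.a = a; this.b = b; }
    }

    // Returns the precomputed table, or null if the code did something the simulator
    // can't safely reproduce; then the real initialization code must run at startup.
    static int[] simulate(List<Insn> code) {
        int[] scratch = null;                       // the throw-away memory region
        for (Insn i : code) {
            switch (i.op) {
                case NEW_ARRAY:
                    if (i.a < 0) return null;
                    scratch = new int[i.a];         // allocate an array of length a
                    break;
                case STORE:
                    if (scratch == null || i.a < 0 || i.a >= scratch.length) return null;
                    scratch[i.a] = i.b;             // element a gets constant b
                    break;
                default:
                    return null;                    // a CALL: effects unknown, give up
            }
        }
        return scratch;
    }

    public static void main(String[] args) {
        List<Insn> setup = List.of(
            new Insn(Op.NEW_ARRAY, 4, 0),
            new Insn(Op.STORE, 0, 1), new Insn(Op.STORE, 1, 2),
            new Insn(Op.STORE, 2, 3), new Insn(Op.STORE, 3, 4));
        System.out.println(Arrays.toString(simulate(setup)));

        List<Insn> withCall = List.of(new Insn(Op.NEW_ARRAY, 1, 0), new Insn(Op.CALL, 0, 0));
        System.out.println(Arrays.toString(simulate(withCall)));   // null: fall back
    }
}
```

The easy part is the happy path; the hard part, which a sketch like this dodges by giving up, is deciding what to do with calls and anything else whose effects can't be simulated.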

I'm not sure why Java needs more than a block copy for the initialization, but they seem to think it helps overcome unnamed issues. Perhaps side-effects, volatile storage, and such?
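For comparison, block-copy initialization in Java looks something like this: if the initial values already sit in a packed table, setting up the array is one bulk copy. The template field is a hypothetical stand-in for such a table; in Java the template itself is still built by the element-by-element code, so this only shows the shape of the idea, whereas in C the table would simply be part of the loaded image.

```java
import java.util.Arrays;

// Initialize from a pre-built table with a single bulk copy instead of one store per element.
class BlockCopyInit {
    // Hypothetical stand-in for a table of initial values shipped with the class.
    private static final int[] SETUP_TEMPLATE = {1, 2, 3, 4};

    static int[] freshSetup() {
        int[] setup = new int[SETUP_TEMPLATE.length];
        System.arraycopy(SETUP_TEMPLATE, 0, setup, 0, SETUP_TEMPLATE.length);   // one block copy
        return setup;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(freshSetup()));
    }
}
```

Each call hands back an independent copy, so code that modifies its array can't corrupt the template.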

This may be an improvement on one-at-a-time initialization for Java, but it is not particularly new to the world of software in general.

Pre-processing the source code for invariants is also conceptually at odds with the idea of interpreting byte-codes. (One of the design goals of Java was to avoid programming devices like the C macro pre-processor.) In the end, I think it's a culture thing, but anyway, they didn't want to call it that.

At the time of the patent, many of the popular interpreted languages (certain BASICs, Perl, etc.) did more or less what this patent talks about, and more. Some would set up a limited mock-up of the run-time, compile the source into the mock-up, turn the initialization code loose, and look for the constants that fell out. Then those constants would go into a table.

(Perl has a byte-code compiler, but it has not been very successful at saving the byte-code and running from that instead of from source. I think they got stuck in the same kinds of places Java gets stuck in and, instead of claiming a -- partial -- solution, generally just recommended not using those features unless you knew what you were doing. Still, Perl does compile to byte-code as it starts up, and, as I understand it, it evaluates initialization constants as part of that compile phase.)

I re-read their method, and there is nothing either new or original about what they are doing. Maybe it was new to Java, but it was not particularly original. (I suppose I should go dig up the old Perl newsgroup posts on the subject to prove it?) Nor does the patent give any clue as to what, specifically, is innovative, even if it is meant to apply only to Java as a language (a bit of a stretch) or only to Sun's implementation of Java.

I suppose they could claim that a bytecode instruction that initializes an array is an innovation, but that would be just playing with words. A bytecode instruction is a function call, and many languages have function calls that do the equivalent. Class and object oriented languages build such functions as necessary for class instance initialization, as this spec indicates.

Constant pre-evaluation is obvious and part of the state of the art, whether by play (simulated) execution or by the more explicit pre-processing methods of compiled languages. In fact, it is (and was at the time) among the implicit goals pursued as a language matures.

After several read-throughs, I think the innovation claimed is that they pick out the initialization code and execute only that against a pre-cleared, throw-away region of memory. But you still have to track which expressions are known to be invariant, and make sure all the rest get executed after the first constant pass of the initialization. Or you have to limit the expressions allowed in initialization code. Java does both.
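A sketch of that bookkeeping, assuming a deliberately tiny, hypothetical model of initializers: each one is a literal, a sum of other fields, or an opaque call. Walking them in declaration order (the order in which Java runs static initializers), a field counts as invariant only if it is a literal or depends solely on fields already known to be invariant; everything else is deferred to run, in order, at startup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Split initializers into a precomputed table and a list that must still run at startup.
class InitPartition {
    enum Kind { LITERAL, SUM_OF_FIELDS, CALL }   // hypothetical initializer kinds

    static final class Init {
        final String field; final Kind kind; final int literal; final List<String> refs;
        Init(String field, Kind kind, int literal, List<String> refs) {
            this.field = field; this.kind = kind; this.literal = literal; this.refs = refs;
        }
        static Init literal(String f, int v) { return new Init(f, Kind.LITERAL, v, List.of()); }
        static Init sum(String f, String... refs) { return new Init(f, Kind.SUM_OF_FIELDS, 0, List.of(refs)); }
        static Init call(String f) { return new Init(f, Kind.CALL, 0, List.of()); }
    }

    static final class Plan {
        final Map<String, Integer> table = new LinkedHashMap<>();   // known-invariant values
        final List<String> deferred = new ArrayList<>();             // run these at startup, in order
    }

    static Plan partition(List<Init> inits) {
        Plan plan = new Plan();
        for (Init init : inits) {
            switch (init.kind) {
                case LITERAL:
                    plan.table.put(init.field, init.literal);
                    break;
                case SUM_OF_FIELDS:
                    // Invariant only if every referenced field is already known invariant.
                    if (plan.table.keySet().containsAll(init.refs)) {
                        int total = 0;
                        for (String r : init.refs) total += plan.table.get(r);
                        plan.table.put(init.field, total);
                    } else {
                        plan.deferred.add(init.field);
                    }
                    break;
                default:
                    plan.deferred.add(init.field);   // a call: effects unknown, so run it for real
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Plan p = partition(List.of(
            Init.literal("a", 2), Init.literal("b", 3),
            Init.sum("c", "a", "b"), Init.call("d"), Init.sum("e", "c", "d")));
        System.out.println("table: " + p.table + "  deferred: " + p.deferred);
    }
}
```

Note how e is deferred even though c is a constant: one non-invariant input (d) is enough to push it out of the table.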

As to whether running the initialization code against a throw-away memory allocation is itself the innovation: coming back from a break to bring in the laundry, I realize --

Any attempt to calculate the results of initialization code has to use a throw-away region of memory. Where else does one store the results?

And, as I said before, it's not really an innovation, when compared to what was available in other languages.

Come to think of it, I remember fighting with the Java compilers and wondering why the code I was writing looked okay by the spec as I read it, but kept being rejected by the compiler as containing non-final code. If I'd read this patent spec first, I'd have had a better idea how far one could go in initializations, and why things I expected from my work in other languages didn't work in Java.

I wonder if the problem is the confusion caused by incompatible use of jargon, as in "pre-flight invariant analysis" versus "play execution". (Or as in byte-code vs. p-code vs. pseudo-code [sic] vs. i-code vs. intermediate code [sic].)

Anyway, the fact that this patent was not overturned is one more practical result making it obvious that software patents, in their current form, are wrong at best.

Too many software patents are just feature lists lifted from marketing materials and dressed up with just enough technical detail to look interesting and unusual, without including enough information to implement them, or enough to determine the prerequisites of patentability: originality and innovation. (Whether software patents can be otherwise is a rant for another day.)


Monday, July 25, 2011
