Tuesday, April 29, 2008

Clouds and Meshes.

I mentioned Google's App Engine the other day. I finally got to play with it a little bit. Let me start off by saying that this is the next big thing in Internet-related technologies. A few years from now most of our new applications will be built using some variation of this paradigm. I am sure some small company that may not even exist today will become a big player in the field and offer some good investment opportunities during the hype and bubble that follow.

The basic idea of Cloud computing is that your code and data live within the cloud. You do not worry about servers, database engines, scalability, performance, downtime, etc. You just submit your stuff to the cloud and the cloud takes care of everything for you.

I have been involved in similar themes for a couple of decades now. A big focus of my Ph.D. thesis was on distributed computing using networks of workstations. I wrote hundreds of thousands of lines of code using Parallel C and the PVM and MPI libraries: code that executed asynchronously on distributed networks of workstations and communicated over the network in a nondeterministic fashion.

Also, in the good ol' days I was a developer on the Condor project, by far the most successful platform for harnessing the power of distributed networks of workstations, and I published a couple of research papers on the issue. Cloud computing is not exactly the same, but it is a natural marriage of distributed computing and the Internet.

 

The Players

There are several players in the cloud computing universe, not to be confused with the Grid computing being pushed by Sun, IBM and Oracle. I am not familiar with all the players, but of the ones I know the iCloud project stands out as the most admirable platform, bar none. The user interface is astonishing; it looks like a next generation desktop running right inside your browser. They currently have a development competition, with $5000 prizes for the best applications. If I had the time and needed the money I would definitely join, as this technology is simply a game changer. Sadly, I do not have the time. As much as I admire what these people have done, I do not believe that they can monetize it on their own as they are a smallish company. Plus I am not fully ready to have my whole desktop on a remote server. I am ok with having my data hosted remotely, but I am not sure I am comfortable with everything living inside the browser on some remote server in Europe. My guess is that they will be bought out by somebody else. If they get bought out by Google or Microsoft I will revisit them. If they get bought out by somebody like Oracle or Sun then I wouldn't bother.

The most seasoned player in the field is Amazon, with its various services including the Simple Storage Service (S3). Amazon's offering is cool because it uses REST over plain HTTP for communication, and thus you do not need to be married to any specific language or development platform. Their offerings are also loosely coupled: you can use one or many of them and integrate them without much hassle. They just reduced the price for disk and network usage. It is by far the quickest Cloud platform for prototyping an application. They also have an active community, and the development team is very active in support. Some of the applications that sprang up within days of the S3 release were pretty impressive.
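To make the "not married to any language" point concrete, here is a minimal sketch in Python of what a REST-style store and fetch look like: plain HTTP verbs against a URL that names the object. The host name, bucket and key are made up for illustration, and real S3 requests also need signed authentication headers, which I left out. Nothing is sent over the network; the code only builds the requests.

```python
from urllib.request import Request

# Hypothetical endpoint, for illustration only (not a real S3 host).
BASE_URL = "https://storage.example.com"


def object_url(bucket, key):
    """Address an object by URL: the bucket and key form the resource path."""
    return f"{BASE_URL}/{bucket}/{key}"


def build_put(bucket, key, body):
    """PUT stores the bytes at that URL (authentication headers omitted)."""
    return Request(object_url(bucket, key), data=body, method="PUT",
                   headers={"Content-Type": "text/plain"})


def build_get(bucket, key):
    """GET on the same URL fetches the object back."""
    return Request(object_url(bucket, key), method="GET")


put_req = build_put("my-bucket", "notes/hello.txt", b"hello cloud")
get_req = build_get("my-bucket", "notes/hello.txt")
print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

Any language that can issue an HTTP request can do the same thing, which is the whole appeal.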

Google introduced its offering recently and it is the impetus for this post. I finally had a couple of hours to explore it. The way it works is that you get access to a data store and a hosted web application. You develop your application, including your data model and data access code, and then you send it all to the cloud. It lives and runs there; you do not have to worry about setting up a server or hosting, nor do you need to worry about the actual database. The cloud takes care of all of that and scales as your usage scales. Storage, bandwidth, backups, replication and scalability are all provided for you. More importantly, you get tight access to Google's other services such as user accounts and search functionality.

It all sounds great. The only problem I have with it is their choice of platform. The current offering is only available for Python. I know what Python is, but I have never used it before, nor have I ever looked at a piece of Python code. Python is what Google uses internally, so it must be a good language. Nonetheless, I am not familiar with it. I figured I would just pick it up quickly and start using it. This was not the case. While the language offers superb constructs, I could not find a decent development environment for it. It uses indentation to group statements, which is really odd. The error messages are really cryptic. I had misspelled one folder's name when I started, and it took me over two hours to figure out what the error was.
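For those who, like me, have never looked at Python before, here is a small sketch of what grouping by indentation means. There are no braces or begin/end keywords; the indentation level alone decides which statements belong to the loop and which to the if. The function name is my own, purely for illustration.

```python
def classify(numbers):
    """Split numbers into evens and odds; indentation marks each block."""
    evens, odds = [], []
    for n in numbers:          # the loop body is everything indented below
        if n % 2 == 0:         # the 'if' body is indented one level further
            evens.append(n)
        else:
            odds.append(n)
    return evens, odds         # dedented: runs once, after the loop ends


evens, odds = classify([1, 2, 3, 4, 5])
print(evens, odds)
```

Move the return statement one level to the right and it becomes part of the loop, which silently changes what the function does.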

Google App Engine itself uses an MVC (Model View Controller) model to architect the web app. I am not so crazy about this model. I associate it with Java (though the pattern itself dates back to Smalltalk), and my guess is that it caught on for large projects staffed with lame programmers, so that each one gets a small independent task and cannot screw up the rest of the application.
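For readers who have not run into MVC, here is a minimal sketch of the separation it enforces, in plain Python. This is not App Engine's own framework; the class and function names are hypothetical and only illustrate who is allowed to touch what.

```python
class GuestbookModel:
    """Model: owns the data and nothing else."""

    def __init__(self):
        self._entries = []

    def add(self, author, text):
        self._entries.append((author, text))

    def entries(self):
        return list(self._entries)


def render_view(entries):
    """View: turns data into output; knows nothing about storage."""
    return "\n".join(f"{author}: {text}" for author, text in entries)


class GuestbookController:
    """Controller: takes a request, updates the model, picks the view."""

    def __init__(self, model):
        self.model = model

    def post(self, author, text):
        self.model.add(author, text)
        return render_view(self.model.entries())


controller = GuestbookController(GuestbookModel())
controller.post("alice", "hello")
page = controller.post("bob", "hi there")
print(page)
```

Each piece can be handed to a different programmer, which is exactly the property I am grumbling about above.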

Finally there is the data model. Google App Engine introduces a data model based on a popular Python framework called Django. The framework models the data for you as objects that you can access programmatically without having to worry about writing SQL code. This is very similar to what Microsoft has been pushing lately as LINQ (Language Integrated Query). While both models offer rapid development, I am not sure I want to give up control over my database. I have no problem using the data model to retrieve one record or a class made up of related database records. I have no problem with using the data model to perform generic searches.
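To show the general idea of data as objects, here is a rough sketch using Python's built-in sqlite3 module. It is neither the App Engine datastore API nor Django; the Greeting and GreetingStore names are hypothetical. The point is only that the calling code deals with objects while the SQL stays hidden inside the layer.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Greeting:
    author: str
    content: str


class GreetingStore:
    """Hypothetical object layer: callers see Greeting objects, never SQL."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE greeting (author TEXT, content TEXT)")

    def put(self, greeting):
        self._db.execute("INSERT INTO greeting VALUES (?, ?)",
                         (greeting.author, greeting.content))

    def by_author(self, author):
        rows = self._db.execute(
            "SELECT author, content FROM greeting "
            "WHERE author = ? ORDER BY rowid", (author,))
        return [Greeting(*row) for row in rows]


store = GreetingStore()
store.put(Greeting("alice", "hello"))
store.put(Greeting("bob", "hi"))
store.put(Greeting("alice", "back again"))
print(store.by_author("alice"))
```

This is convenient for single records and simple searches. The trade-off is that whoever wrote the layer, not you, decides what SQL actually runs.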

However, most complex applications break down in terms of scalability when complex database access and reports get involved. For those there is no substitute for writing fine-tuned stored procedures to retrieve the data. Last year I had to help out with a project where an operation was taking 18 hours to build a large report. The programmers had normalized the database to death, did not have good indexes, and then looped with cursors to process one record at a time. After about four hours of my development time I had denormalized the database, put together good indexes and rewritten the code to use bulk updates with clever joins instead of the cursors. The exact same operation that used to take 18 hours now ran in under 30 seconds.
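The difference between the two styles can be sketched with a toy example in Python and SQLite. This is not the code from that project; the table names are made up, and the set-based version uses a correlated subquery where the real fix used joins. On three rows both finish instantly, so the sketch only shows the shape of the change: many round trips versus one statement.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with customers and their orders."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, total REAL DEFAULT 0);
        CREATE TABLE orders (customer_id INTEGER, amount REAL);
        CREATE INDEX idx_orders_customer ON orders (customer_id);
    """)
    db.executemany("INSERT INTO customer (id) VALUES (?)", [(1,), (2,), (3,)])
    db.executemany("INSERT INTO orders VALUES (?, ?)",
                   [(1, 10.0), (1, 5.0), (2, 7.5)])
    return db


def totals_row_by_row(db):
    """Cursor style: one query plus one update per customer."""
    for (cid,) in db.execute("SELECT id FROM customer").fetchall():
        (total,) = db.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders "
            "WHERE customer_id = ?", (cid,)).fetchone()
        db.execute("UPDATE customer SET total = ? WHERE id = ?", (total, cid))


def totals_set_based(db):
    """Set-based style: a single statement updates every row."""
    db.execute("""
        UPDATE customer
        SET total = (SELECT COALESCE(SUM(amount), 0)
                     FROM orders WHERE orders.customer_id = customer.id)
    """)


def read_totals(db):
    return db.execute("SELECT id, total FROM customer ORDER BY id").fetchall()


slow, fast = make_db(), make_db()
totals_row_by_row(slow)
totals_set_based(fast)
print(read_totals(slow) == read_totals(fast))
```

Both functions leave the same totals in the table. With millions of rows, the per-row version pays the statement overhead millions of times, while the single statement lets the database engine use the index and plan the whole job at once.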

Without such control, the data store and data models of Google App Engine will no doubt auto-generate a database and an access layer that will break down under pressure, especially since platform programmers base their work on textbooks where normalization is king.

Finally, the Django framework seems to be pretty neat, but I am having lots of problems integrating it with Google App Engine. They released a helper, but the documentation is lacking. I am sure a Python expert would have no problem figuring it all out, but I am struggling. I can get the framework to work within Google App Engine, but I cannot use it in a stand-alone fashion where development would be easier.

Rumor has it that Google is working hard to offer the engine for other languages and frameworks such as Java, Ruby on Rails and PHP. I am not sure when they will be able to offer this. If I do not make quick progress on the Python front I will wait to experiment with their PHP offering.

 

The last player I am going to mention is Microsoft's Mesh, which was announced very recently. It seems to be a hybrid of Google's and Amazon's offerings with less focus on the database. I have not had a chance to play with it, but as I write this I received an email from them telling me that my invitation is now active and I can start developing for the Mesh, so chances are I will be doing that over the next week.

 

In summary, we are at the very early stages of Cloud computing. However, my sense is that this will be the next big thing, and I would like to be involved in it early on, just like I was involved with the Web right after the initial release of the NCSA httpd in 1993, with blogging before the term was coined and with Ajax from the onset. It is not clear who is going to win out in this arena. Since my time is limited I am going to stick to Google's and Microsoft's offerings for now and try to learn as much as I can about both.
