Vladimir Vukićević — Words
 



HTML5 Web Storage and SQL

There’s been a lot of interest around the Web Storage spec (formerly part of whatwg HTML5), which exposes a SQL database to web applications to use for data storage, both for online and offline use.  It presents a simple API designed for executing SQL statements and reading result rows.  But there’s an interesting problem with this; unlike the rest of the HtML5, this section defines a core piece of functionality in terms of an undefined chunk referenced as “SQL”.

The initial implementations of Web Storage are both based on SQLite, and expose the dialect of SQL understood by SQLite to web content.  I’m actually a big fan of SQLite, and was one of the advocates for pulling it into the Gecko platform.  However, SQLite implements a variant of SQL, with a number of deviations from other SQL engines, especially in terms of the types of data that can be placed in columns.

Web content that is created to use database storage with SQLite as the backing is unlikely to work with any other backend database.  Similarly, if another database was chosen as a browser’s backing implementation, web content that works with it is unlikely to work with anything else.  This is a serious interop problem, the root of which is that there really isn’t a useful core SQL standard.  SQL92 is generally taken as a base, but is often extended or altered by implementations.  Even beyond the parser issues (which could be resolved by defining a strict syntax to be used by Web Storage), the underlying implementation details will affect results.

So, the only option is for the Web Storage portion of the spec to state “do what SQLite does”.  This isn’t specified in sufficient detail anywhere to be able to reimplement it from the documents, so it would be even worse — “do what this exact version of SQLite does”, because there are no guarantees that SQLite won’t make any incompatible changes.  For example, a future SQLite 4 may introduce some changes or some new syntax which wouldn’t be supported by earlier versions.  Thus, it requires every single browser developer to accept SQLite as part of their platform.  This may not be possible for any number of reasons, not the least of which is it essentially means that every web browser is on the hook for potential security issues within SQLite.

Instead of all of this, I think it’s worth stepping back and consider exactly what functionality web developers actually want.  It’s certainly much easier to say “well, server developers are used to working with SQL, so let’s just put SQL into the client”, but it’s certainly not ideal — most people working with SQL tend to end up writing wrappers to map their database into a saner object API.

So, I would propose stepping back from Web Storage as written and looking at the core pieces that we need to bring to web developers.  I believe that the solution needs to have a few characteristics.  First, it should be able to handle large data sets efficiently; in particular, it should not require that the entire data set fit into memory at one time.  Second, it should be able to execute queries over the entire dataset.  Finally, it should integrate well with the web, and in particular with JavaScript.

With these needs in mind, I think there are other options that should be considered, even beyond a subset of SQL; for example, an object-oriented database approach might serve those needs better.  A good prototype example of what such a system could look like is jLINQ, which implements client-side querying on JavaScript objects and arrays.  As such, a basic implementation is simple; more complex ones can have browser support for efficient indexing, triggers, rapid serialization to and deserialization from disk, etc.  An implementation could even map all of this on top of an underlying SQL engine.  Another option is something like CouchDB.  I was also just pointed at Persevere, which looks quite cool; much in the same way as CouchDB, the same API could be implemented both client-side and server-side, for efficient online/offline switching.  An approach such as one of these could well serve the web better than just throwing a SQL dialect over the web content fence.

This is a conversation that’s worth having, both to figure out what could be done about the issues with directly exposing SQL/SQLite, and also to step back and explore alternate approaches to getting the same functionality in web developers’ hands.  The mozilla.dev.platform group is as good a place as any for this conversation, so please post any thoughts you may have there. Hixie rightfully points out that public-webapps is probably a better place for this discussion (which I wasn’t even subscribed to for some reason).


23 Comments to “HTML5 Web Storage and SQL”  

  1. 1 Ian Hickson

    I encourage you to post this to the public-webapps list (or to the WHATWG list).

    I expect I’ll be reverse-engineering SQLite and speccing that, if nothing better is picked first. As it is, people are starting to use the database feature in actual Web apps (e.g. mobile GMail, iirc).

  2. 2 Simon

    As a Java developer, I’d hate to see extensive use of raw SQL creeping into javascript – it’s sometimes useful, but there’s good reason why many applications use frameworks like Hibernate to hide it from a developer who just wants to persist an object. So yeah, I’d much rather see a local-storage API built that way from the start, rather than having everyone start writing libraries to hide an SQL layer few people ever want to worry about.

  3. 3 David Dahl

    As a web developer who has crossed over into Mozilla front-end work, I can tell you that the web has become a wonderful place to do database development mainly through the ORMs that are ubiquitous in web frameworks. Libraries like django.db (python) and Rails’ ActiveRecord are great places to research how this problem has been solved . For those who would never use an ORM, most are designed so you are never more than a line of code away from a statement or cursor, giving you the best of both worlds. You can solve the easy to medium complex operations with an ORM, but you always need to option of raw SQL queries.

    Cheers,

    David

  4. 4 Rob Campbell

    I’m loathe to think of implementing a canonical ORM solution into the browser or defining one as a spec. They are varied beasts that often dictate application design and frequently don’t, if ever, live up to their promises of low-overhead and reliability. Plus, they can impose additional restrictions in the form of object caches and extra layers of code, separating you from the database. I’m no fan of SQL, but providing simple means to access the database through JavaScript and letting application developers wrestle with their own ORM solution if they think they need one is the best way to go, in my opinion.

    Or we could just implement an Object Oriented (or other model of) database instead as you suggest. It’s funny to me that we keep stringing SQL along, even though it’s probably the least compatible data model for dealing with highly-connected, object-based systems. It’s encouraging to see a lot of interest in document-based systems like CouchDB for server-side web storage. Having experienced the power of using an object oriented database before (GemStone/S), I can say that it’s really hard to go back to something as crude and complex as SQL. Though there are trade-offs to any solution, I think we could balance them in favor of ease of development for webdevs with acceptable performance.

  5. 5 J Chris A

    Plus one for CouchDB. Much closer fit than SQL for the problem space. Replication is also perfect for client-server sync. ps I’m extremely biased. ;)

  6. 6 Mark Constable

    CouchDB requires the rarely otherwise used Erlang language and Persevere requires Java OTOH SQLite is implemneted in straight forward public domain C code which can be readily, and easily, absorbed into any client side browser. There are some very good reasons why SQLite has so far made the cut so please do not throw out the baby with the bathwater. The SQL interface may be less than optimal but the client browser side implementation is dead simple and I’d rather have ANY offline storage NOW and wait for HTML6 to define a better standard based on real world hindsight than any delay because the interface is not acceptable to some developers and may introduce some tight interface bindings that will need to change in the future. I really don’t think SQLite will change it’s API if it’s widely adopted by HTML5 browsers, most likely the opposite, it will probably become even more HTML5 savvy in future versions. It would be much easier to implement a C/C code library wrapper around SQLite, if it does have shortcomings, than try to mess with anything to do with Erlang or Java based solutions… ie; define a HTML5 savvy ORB, write a spec and implement said C/C wrapper around SQLite. Replace “ORB” with any other OOanything and do the same.

  7. 7 ant

    What about a DOM-like API? Pretend it’s all XML and select rows using getElementByWhatever or XPath, maybe borrow a few things from the HTMLTableElement interface… just as long as the SQL is kept as an option.

  8. 8 vladimir

    Mark, did you read my post? There is no issue with sqlite itself, but the issue is mandating sqlite. There are many reasons why someone might be unable or unwilling to ship sqlite.

    ant, the whole point of this that SQL can’t be an option… there’s no reason to add additional standardized abstractions if you expose SQL; the problem is that there is no single “SQL” to expose. Someone suggested to me that server developers have been used to dealing with different SQL engines on the server (mysql, postgres, ms sql server, oracle, db2, …) and that they should just do the same in web apps. That terrifies me.

  9. 9 Dmitrii 'Mamut' Dimandt

    There’s absolutely no need for SQL on the client side in a web app. Key-value store at the most

  10. 10 Nicholas Orr

    The other thing to look at – is as soon as something is done and we the developers start using it, changing to something else is going to take a while. So better to sort it out now and take longer. Just look at IE6, we’ve been stuck with that for far too long….

  11. 11 Nate

    As a Javascript developer who has mostly done server side development in the past, I personally prefer to use some type of ORM in dealing with a backend data store, even for very simple applications. However, not all ORM solutions are equal – different ORM have different strengths and weaknesses. In the past year I have used 5 different ORMs. It would be terrible if I did not have the choice.

    My input into the matter is this: Make the Local Storage standard in HTML5 as low-level and flexible as possible. Framework developers will very quickly create many choices of javascript ORM frameworks that abstract away the use of SQL. The best will survive.

  12. 12 weather

    There should be something more than just sql, I think.

  13. 13 Matthew Raymond

    @Mamut:

    Key-pairs just don’t cut it for synchronizing complex sets of data between the client browser and a server-based database, especially when the application has an offline mode and updates, insertions and deletions must be submitted to the server DB when the client returns to an online state. This is not a theoretical situation I’m giving you. It’s an actual use case at the company I work for.

    @Nicholas Orr:

    The ability to store DB information on the browser is vital for business-oriented offline Web applications, so nobody’s going to wait 10 years for a standard before implementing such a feature. While a better standards would be preferred, I strongly suspect that the longer it takes to release a standard, the more likely it will be that the solutions provided by one or more vendors will either become the standard, make the standard irrelevant or delay the adoption of the standard significantly. Better a half-a**ed standard that everyone supports than a thoughtful, well-designed standard that no one supports.

  14. 14 Nikunj Mehta

    There are two camps of developers interested in WebStorage – one that develop local applications and another that develop offline-ready applications. The first camp probably doesn’t deal with HTTP or REST abstractions for their data but the second one has to deal with the HTTP interface to data as well as situations in which offline access to that data is required.

    For the second category, IMHO, it may be better to focus on getting the right HTTP abstractions emulated locally for offline purposes. If people are going to need to upgrade their browsers for off-line use, then might as well provide them one that will allow them to run an “embedded REST server” within the browser. I mean let people write their own server emulation logic in JavaScript using whatever storage that’s available to them in the browser, whether it is key-value pair or SQLite or another, and serve offline requests over the same interface that the server does – HTTP and URL.

    I have a proposal before the Webapps WG at W3C to consider adding JavaScript interceptors to enable various kinds of offline operations to take place without necessitating any standards about the SQL storage and access portion.

    Here’s the URL for my blog post about the proposal and its comparisons to HTML5, Gears, and Dojo Offline: http://o-micron.blogspot.com/2009/04/bitsy-050-develop-seamlessly-on-lineoff.html

  1. 1 Browser storage: Do we need SQL? Or would a JSON approach be better? on Dion Almaer's Blog
  2. 2 Ajaxian » Browser storage: What is the correct API? SQL? JSON?
  3. 3 Browser storage: What is the correct API? SQL? JSON? | Guilda Blog
  4. 4 Couches in Browsers at Toolness
  5. 5 Pinderkent: Browser-based databases will further encourage the undesirable Web application monoculture.
  6. 6 Mozilla Labs » Blog Archive » Towards Better Browser Storage
  7. 7 Mozilla Labs: Die Zukunft der Datenspeicherung im Browser » t3n Magazin
  8. 8 Mark Finkle’s Weblog » Day Dreaming about Web Storage
  9. 9 saving data with localStorage at hacks.mozilla.org