January 28, 2019

Google Datastore with a relational data model

I’ve been working on a medium-sized European software-as-a-service (SaaS) product for the last two years. Almost everything we use comes from Google, including Cloud Datastore as our primary source of truth. I’m not advocating against it - depending on your use case it might be a great fit. Instead, I’m presenting the issues we have with it (and with other NoSQL/document databases) because our domain models are very relational.

My current company is a five-year-old startup improving the automotive aftersales process in the Benelux area. We provide a ‘Cloud-based’ software as a service for car dealerships, cutting their communication time and increasing sales.

As mentioned earlier, except for developers’ laptops, we use everything from Google. The main reason: the company’s co-founder and the ex-CTO are huge Google fans. Some of the Google products we use daily:

  • G Suite (Google Accounts, Calendar)

  • Google Docs and Sheets for documentation, paperwork

  • Google Cloud Platform, AppEngine (GAE) and a few ComputeEngine (GCE) instances

  • Google Chat (communication), Meet (video calls)

  • Go and Polymer (only later rewritten in React)

  • Datastore (primary database)

  • Chromebooks and Chromeboxes for non-developers

And many similar examples. This is not bad per se, but making major decisions based on the company behind a product ‘not being evil’ has its drawbacks. One of the biggest issues we currently have is Datastore as our primary database. It was ‘sold’ to our founders by the ex-CTO as being ultra-fast and Google-scale (web scale, anyone?). Even if both of these were true, it’s not what we or similarly sized companies need.

On the other hand, there are many downsides that cripple our development and operations processes, which is why we plan to migrate to a relational database this year. For reference, we use Go with both the AppEngine Datastore (for our first monolith) and Cloud Datastore libraries to work with Datastore.

Primary reasons for switching to a relational database:

  • Lack of data consistency:

Since there are no relations (except for parent/child), managing data becomes hard. You need to either hold IDs (FKs) of other entities or embed data directly.

On some occasions we choose to embed data in a single entity/table to avoid multiple queries. The problem with this approach is that data gets outdated: if the referenced data is updated, the embedded copy is not.

Alternatively, keys (IDs) pointing to other entities are not ‘validated’ like foreign keys. If the referenced entity is deleted, you only notice when querying, which happens quite often.

For example, say we store the history of tires that were attached to a car as an array of keys pointing to Tyre entities. If a tire gets deleted, fetching the previous tires for a car by keys fails with an error saying that the entity does not exist. To prevent this, instead of fetching all tire keys at once, we iterate through them and check the error type for each.
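The workaround described above can be sketched roughly like this. It is a minimal, standard-library-only illustration, not our production code: `store`, `errNoSuchEntity`, and `existingTyres` are hypothetical names, and the in-memory map stands in for Datastore. In real code the lookup would go through the Datastore client and the comparison would be against `datastore.ErrNoSuchEntity`.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoSuchEntity is a stand-in for datastore.ErrNoSuchEntity (assumption:
// the real code compares against the library's sentinel error instead).
var errNoSuchEntity = errors.New("no such entity")

// Tyre is a simplified version of the entity from the example.
type Tyre struct {
	ID    int64
	Brand string
}

// store is a hypothetical in-memory replacement for Datastore.
type store map[int64]Tyre

// get mimics a single-key lookup: it fails when the entity was deleted.
func (s store) get(id int64) (Tyre, error) {
	t, ok := s[id]
	if !ok {
		return Tyre{}, errNoSuchEntity
	}
	return t, nil
}

// existingTyres fetches tyres one by one and skips dangling keys
// instead of failing the whole request.
func existingTyres(s store, ids []int64) ([]Tyre, error) {
	var out []Tyre
	for _, id := range ids {
		t, err := s.get(id)
		if errors.Is(err, errNoSuchEntity) {
			continue // the referenced tyre was deleted; ignore the stale key
		}
		if err != nil {
			return nil, err // any other error is still a real failure
		}
		out = append(out, t)
	}
	return out, nil
}

func main() {
	s := store{1: {ID: 1, Brand: "A"}, 3: {ID: 3, Brand: "C"}}
	// Key 2 points to a tyre that no longer exists.
	tyres, err := existingTyres(s, []int64{1, 2, 3})
	fmt.Println(len(tyres), err)
}
```

The cost is one lookup per key plus error handling that a foreign key constraint would make unnecessary.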

  • Querying is difficult

Query options are very limited. For filtering data, only the =, <, <=, >, and >= operators are allowed. Operators you might be used to, such as OR, IN, and NOT EQUAL, are missing. In the end, you work with what is available and do part of the filtering and transformation on the query result.
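To give an idea of what "doing it on the query result" means, here is a small sketch. Everything in it is hypothetical: `Car`, `queryByBrand`, `brandIn`, and `brandNot` are illustrative names, and a slice stands in for Datastore. The idea is that IN becomes one equality query per value, merged in application code, and NOT EQUAL becomes a filter over a result that was already fetched.

```go
package main

import "fmt"

// Car is a hypothetical entity used only for this illustration.
type Car struct {
	Plate string
	Brand string
}

// queryByBrand stands in for a single equality-filter query ("Brand =").
func queryByBrand(cars []Car, brand string) []Car {
	var out []Car
	for _, c := range cars {
		if c.Brand == brand {
			out = append(out, c)
		}
	}
	return out
}

// brandIn emulates "Brand IN (...)": one equality query per distinct
// value, with the results merged on the client.
func brandIn(cars []Car, brands ...string) []Car {
	var out []Car
	seen := map[string]bool{}
	for _, b := range brands {
		if seen[b] {
			continue // avoid running the same query twice
		}
		seen[b] = true
		out = append(out, queryByBrand(cars, b)...)
	}
	return out
}

// brandNot emulates "Brand != x" by filtering an already fetched result.
func brandNot(cars []Car, brand string) []Car {
	var out []Car
	for _, c := range cars {
		if c.Brand != brand {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	cars := []Car{
		{Plate: "1-ABC-123", Brand: "Audi"},
		{Plate: "1-DEF-456", Brand: "BMW"},
		{Plate: "1-GHI-789", Brand: "Volvo"},
	}
	fmt.Println(brandIn(cars, "Audi", "Volvo"))
	fmt.Println(brandNot(cars, "BMW"))
}
```

Both helpers read more data or issue more queries than a single SQL `WHERE` clause would.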

  • ORM

For some, ORMs are a blessing, while others despise them. I’m not going to take sides on this; do what you like.

Datastore forces you to use its official library, which is an ORM. You can’t fetch or update only the desired fields, which leads to more queries and slower responses.

A related problem we encounter almost daily is the ErrFieldMismatch error. Datastore requires your ‘models’ to contain every field stored in the entity. As we connect multiple (micro)services to Datastore, we have to redeploy all of them every time a new field is added.
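One way to soften this, sketched below, is to treat a field mismatch as non-fatal when loading, so an older service keeps working with the fields it knows. This is only an illustration under assumptions: `fieldMismatchError` is a stand-in for the library's `*datastore.ErrFieldMismatch`, and `load`, `loadLenient`, and `Customer` are hypothetical. Whether ignoring the error is safe depends on the service; a service that writes the entity back could drop the unknown field.

```go
package main

import (
	"errors"
	"fmt"
)

// fieldMismatchError is a stand-in for *datastore.ErrFieldMismatch.
type fieldMismatchError struct {
	FieldName string
	Reason    string
}

func (e *fieldMismatchError) Error() string {
	return "cannot load field " + e.FieldName + ": " + e.Reason
}

// Customer is the model as an older, not yet redeployed service knows it.
type Customer struct {
	Name  string
	Email string
}

// load is a hypothetical loader: it fills the fields it knows and reports
// a stored property that has no matching struct field.
func load(props map[string]string, dst *Customer) error {
	var mismatch error
	for name, value := range props {
		switch name {
		case "Name":
			dst.Name = value
		case "Email":
			dst.Email = value
		default:
			if mismatch == nil {
				mismatch = &fieldMismatchError{FieldName: name, Reason: "no such struct field"}
			}
		}
	}
	return mismatch
}

// loadLenient swallows field mismatches but still returns other errors.
func loadLenient(props map[string]string, dst *Customer) error {
	err := load(props, dst)
	var fm *fieldMismatchError
	if errors.As(err, &fm) {
		return nil // unknown field added by a newer service; keep going
	}
	return err
}

func main() {
	// "Phone" was added by another service after this one was deployed.
	props := map[string]string{"Name": "Ann", "Email": "ann@example.com", "Phone": "123"}
	var c Customer
	err := loadLenient(props, &c)
	fmt.Println(c.Name, c.Email, err)
}
```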

Other problems we’re facing are hard-to-manage backups and somewhat confusing pricing. If your domain model has few or no relations, you could use Datastore. Working on a relational model with a NoSQL database turns out to be quite hard and inefficient.

One of our technology plans for this year is to migrate to Cloud SQL for PostgreSQL. With this, we plan to increase developer productivity, simplify the codebase, and decrease cost. I’ll write a new blog post covering the migration process once it’s completed.

2018 © Emir Ribic - Some rights reserved; please attribute properly and link back. Code snippets are MIT Licensed
