October 28, 2018

Refactoring Gorsk - Why and how

Refactoring source code should be a constant process in software’s lifecycle. I advocate for 20-25% of time spent on developing software to be used on refactoring exclusively. After working with Gorsk in two projects running now in production (one of them being a large SaaS), I’ve found many things I don’t like about it. During the development of those projects I refactored some things, but I decided the base needs to be updated too.

According to Wikipedia, Refactoring is the process of restructuring existing computer code without changing its external behavior. Refactoring improves nonfunctional attributes of the software. Advantages include improved code readability and reduced complexity; these can improve source-code maintainability and create a more expressive internal architecture or object model to improve extensibility.

Even though Gorsk itself is pretty simple and could go without any Refactoring, once I started using it in a larger project some of its drawbacks became obvious to me. Shared platform code (e.g. Database queries) using a shared interface, common code such as middleware, configuration, server being located in cmd, security-related methods (comparing password with hash, hashing) being found under authorization handler, fetching session data code was found under authorization handler and much more.

I’ll go step-by-step and mention things I changed with reasons behind it.

Reusable code moved from cmd/api to pkg/utl

Having code that is re-used among many services in a package that is supposed to contain only the executable part is a no-no.

Now, as it should, the cmd/serviceName contains only few lines of codes, initializing the service.

The pkg directory

At the beginning, all application services in gorsk were found inside root of the project - as advocated by Ben Johnson’s Standard Package layout. I didn’t like it, mainly as project’s root tends to contain many unrelated files such as gitignore, dep/gomod files, ci-cd configurations, docker related files, makefile and others. Keeping everything in the root, especially if working with many domains, tends to quickly get the repository cluttered

Afterward, I moved most of the code under internal. It’s a special folder naming - Go doesn’t allow imports from other projects under the directory named internal.

Now, I further refactored that. I’ve opted for using pkg folder, although I’m not too happy with it. Cmd folder now contains only a single Go file, launching the service, and configuration files (Config files, Dockerfile, PaaS configurations etc.) and pkg contains rest of the source code.

Inside pkg, I created a utl (utility) directory that holds all reusable packages - not actual business logic. Another option is to pull this directory on the root (I wanted to name it either pkg or utl) and keep service directories inside internal or pkg (depends on how you name it).

Next to utl are service directories. If you have cmd/api and cmd/deployment services (binaries), you will have to create api and deployment directories inside pkg. I think this improves code readability and helps with code organization once your services start working with dozens of domains/entities/tables.

The utl directory

Short for util/utility, utl directory holds re-usable code for all services.

The models/entities were moved here, and now the package is named [gorsk](https://github.com/ribice/gorsk). I’ve found model not to be good, especially if I have many projects with same package name for models - model, I waste time importing the correct one.

Configuration, server and middleware related code was moved here. Middleware is now further refactored into separate packages, e.g. jwt and secure are two packages instead of one, under the middleware directory. Mock and RBAC packages were moved from internal to here.

Platform was restructured completely (more about that later), but packages from platform were moved into utl as well.

SwaggerUI was not go-related, so I moved it into separate directory named assets.

Platform packages

Previously the platform directory contained three packages - structs (provides the function to merge two different structs with same field names), query (might get removed, used for preparing database queries depending on user role) and postgres.

The first two are not ‘platform/infrastructure’ packages (working with other platforms, such as database, redis, pubsub etc.), and as such were moved to pkg/utl.

The postgresql package is a platform one, but having reusable platform code is bad. Some services may need the platform methods to be different compared to other services, they need only a few of them etc. Defining the interface inside the package where you use it is far better and more readable than having shared interfaces. Now it’s much easier finding exactly what you need instead of lurking in the shared platform directory.

The initialization part (getting the db connection) is a reusable code and was moved into utl directory.

The downside is that it adds some code duplication (in a way going against DRY principle).

Now, the api/user service has its own platform directory, that contains postgresql related code (only queries) and should contain similar packages such as redis, email, rabbitmq etc. For all these platforms, initialization code should be put inside pkg/utl directory, into their own packages.

I’ve thought of renaming this package from platform to infra (from infrastructure), making imports shorter, but for now, I’ll keep it named platform as it’s better at self-describing.

The transport packages

As code from cmd/api was refactored to contain only initialization part, the service and request packages were moved and now every service contains own transport package that contains the functionality of these two.

  • service package (named badly previously) and request were merged into transport, but instead of being packages that contain code for all services, now they are separated per service. This lead to simpler names (Create/CreateReq instead of CreateUser/CreateUserReq), improved readability (not having to look at separate packages for one functionality), and better package structure (a service has everything it needs under a single directory - request, transport, platform, business code). This lead to removing the swagger package as well, now the needed annotations are located in transport package.

Gorsk has only HTTP related methods inside transport package. You may need other transport methods that will be put here, like AMQP (message broker), gRPC, pub/sub (Redis, RabbitMQ, NATS ….)

Service.go

Each service now has a new file named service.go. It contains the service interface (not present before) that will be used for one of the further feature additions (logging & metrics), interfaces for service/platform integrations and initializations.

Even though these are not needed in Gorsk, in larger projects where a single service works with dozens of platforms/database tables, the main file was cluttered with interfaces that decreased readability.

Next to New (that is used now used solely by tests), I’ve added Initialize that is used by transport package. Since each service has its own platform packages now, there is no need to initialize platform services independently in the main.go/serviceName.go, thus decreasing the clutter.

Initialiasing config

I’ve realized that passing *config.Something to a service initializer is a bad practice I need to avoid. As an example, prior to the refactor, the postgres database initializer expected single parameter, *config.Database. I replaced with actual values, string psn, int timeout and bool debug..

The utl/postgres package shouldn’t know and depend on config package. Later on you may change database initialization from config file to environment variables. By following this approach nothing needs to be changed in utl/postgres package.

Refactoring auth service

Previously, the auth service had methods for getting the user from session/jwt (User), hashing (Hash) and comparing text to hash (HashMatchesPassword). None of this belongs in the authorization/iam service.

Hashing methods have been moved to utl/secure package, and two new methods were added there - checking whether password is secure enough using Dropbox’s zxcvbn library, and creating random Token’s using sha1 (or any other) hashes instead of xid.

The User method has been moved to utl/rbac package, although it still needs a better place.

Removing account service and adding password one

The account service had endpoints for creating Account/User and changing passwords. The routing was ‘misaligned’ due to this, and it made no sense.

The user create endpoint was moved to user service, and change password to new service - password. This should (and maybe will) contain ResetPassword and ConfirmResetPassword methods.

Database tests

The database tests were changed - previously one docker container was instantiated and all tests were executed on it. This lead to a problem - each test was dependent on previous ones which is suboptimal (this worked for Gorsk but would be impossible to maintain for larger codebase).

Now, every table test starts its own container. This leads to more accurate tests but they run longer due to start-up time. Alternatively, one container can be shared between many tests, but the database should be wiped between the tests.

Further work

There are plenty of things that I’d still like to implement in Gorsk.

1 - Improve logging:

This will be done after releasing refactored Gorsk. Currently, only request info (endpoint, method, time-took, etc.) and database queries are logged, which is not enough. Request/response can be easily logged in middleware, but then sensitive data would be logged as well.

Instead, the application services will be wrapped (that’s why there is an implement) with a logging service and passed to transport. In the logging service, you can log things exactly what and how you want.

2 - Improve error handling:

Go’s standard library is great, but both logging and error handling are subpar (hopefully this changes with Go 2). Since there is no stack trace, error context gets easily lost and hard to debug. You either shadow it by new error or return it and have no idea where it came from.

And if you want to return exact errors, the client will get error messages from the database, 3rd party services and other errors you don’t want to return to the client.

Go 2 should provide the ability for wrapping errors, but until then pkg/errors can be used.

3 - Improve database transactions:

Transactions are already implemented using go-pg’s RunInTransaction method, but I’d like to avoid it for two reasons: simpler and testable functions. Currently, tests are unable to run if there is a transaction unless the *pg.DB parameter holds a connection to a real database (can’t be nil).

Transactions can be moved to context and pulled from there when needed. For this change, the echo.Context (or context.Context) needs to be passed to database implementations as well.

4 - Removing go-pg and echo:

In an ideal scenario, a Golang starter kit should depend on as few as possible libraries, including Echo (routing) and go-pg (database). Those should be replaced with the standard library equivalents, or at most something like gorilla/mux (routing) and sqlx (database). I might consider replacing go-pg with sqlx in the future. Replacing echo with stdlib + gorilla/mux would require quite an effort, and I have no plans to do that currently.

I feel there is still lots of room for improving Gorsk. I’m very open to critiques and suggestions. You can send them via email, GitHub issues and Twitter.

2018 © Emir Ribic - Some rights reserved; please attribute properly and link back. Code snippets are MIT Licensed

Powered by Hugo & Kiss.