Kickstart my code, or how to become a wannabe one-developer army (Part 1)

As a developer, I’ve spent most of my time working in a comfortable environment. Except for writing code, every part of the process of creating a product had already been taken care of. I didn’t need to configure CI/CD, think about the tech stack, or bother with the deployment process, monitoring, or anything else. Just write code, click “merge” once it’s been approved, and my job was done.

Some time ago, however, I decided to create a complete solution from scratch. Today, I can finally invite you to join me in retracing the path I took while attempting to become a more conscious developer. I’m going to show you what I’ve achieved, what my decision-making process looked like, where I struggled the most, and where I needed to retreat. And that happened often because, as usual, I was relying heavily on trial and error.

The general idea and the matching tech stack

I wanted to build a very simple application containing minimum “business logic” to let me focus on all other aspects of development. Bearing that in mind, I’ve created an app consisting of two endpoints:

  1. listing visitors,
  2. adding visitors and incrementing their visits count.

This was simple enough not to spend too much time on the coding itself, yet complex enough to raise the challenges I wanted to take on. Like choosing a tech stack.

I’ve decided to go with the following technologies:

  • Flask – I needed a simple framework and Django, which I was more familiar with, was simply too complex for the purpose of my project,
  • Flake8 and Unittest – simple tools for linting and unit testing,
  • SQLAlchemy – a mature ORM that integrates nicely with the framework of my choice,
  • PostgreSQL – the project was so simple that for development purposes even SQLite would be enough. Nonetheless, I wanted to actually deploy it, and to do so, I needed something I could manage. I went for PostgreSQL because I am familiar with this particular database and it is widely supported among cloud providers.
  • Docker – I needed a tool to manage dependencies and this one is so standard I can’t imagine working with anything else. I worked with Vagrant before and in this case, it would be overkill – after all, I don’t need to virtualize the entire machine.
  • Docker Swarm – here I faced a hard choice between this orchestration tool and Kubernetes. However, taking into account the price and simplicity of the tool, as well as the fact that I didn’t need anything really big, Docker Swarm won.
  • Gitlab for hosting code – what convinced me here was the built-in CI/CD mechanism which took some work off my back. I didn’t need to set up Jenkins or integrate any 3rd party tools.
  • AWS – to be honest, I have mixed feelings when it comes to Amazon Web Services. On one hand, it offers a wide range of products and decent free tier limits; on the other, the learning curve seems to be quite steep.
  • Nginx + Gunicorn – I used these two as a load balancer and WSGI application server respectively, since they integrate smoothly.
  • Ansible – I chose this one for automating the deployments out of curiosity. I wanted to see how it works and I’ve heard it was OK.
  • New Relic – for monitoring as its free plan was enough and integration was simple.

That’s it when it comes to the tech stack. As you can see, some of the decisions were well thought out while others were made pretty quickly – either because I wanted to stick with something I was already familiar with, or because I wanted to get to know a new tool. Also, I didn’t add any frontend since I wanted to focus on the DevOps/administrative part. Still, learning a JavaScript framework may become one of my goals for the future.

Getting down to coding

Let’s go to GitLab and take a look at what the code looked like at the beginning of the project (I’ll be pointing to important commits since my flow changed during the development cycle and some parts were very messy). Here’s what I had back then:

  • Function for creating an application from settings,
  • Function for registering urls,
  • Command to create a database,
  • Model of Visitor,
  • 2 views handling incrementing the visits count and listing visitors,
  • Unit tests.

Some decisions made at this point were all right, e.g. extracting the creation process into separate functions or managing the database state in unit tests. Unfortunately, as far as I know, PostgreSQL does not support in-memory databases the way SQLite or MySQL do, but I liked my workaround: context managers that empty the tables after each test.
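
The exact test helpers live in the repository, but a minimal sketch of the idea could look like this (assuming db is the Flask-SQLAlchemy instance; the clean_tables name is hypothetical):

from contextlib import contextmanager

@contextmanager
def clean_tables():
    # Run the test body, then empty every table (children first, to
    # respect foreign keys) so the next test starts from a clean state.
    try:
        yield
    finally:
        for table in reversed(db.metadata.sorted_tables):
            db.session.execute(table.delete())
        db.session.commit()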

Other decisions, however, turned out to be worse.

First of all, the model which stored the number of visits and the way those visits were counted:

class Visitor(db.Model):
    ID = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(80), unique=True)
    visits = db.Column(db.Integer(), default=1)

And the way it was actually done:

def increment_visits(username: str) -> dict:
    visitor = Visitor.query.filter_by(username=username).first()
    if visitor is None:
        visitor = Visitor(username=username, visits=1)
        db.session.add(visitor)
    else:
        visitor.visits += 1
    db.session.commit()
    return {"id": visitor.ID, "username": visitor.username, "visits": visitor.visits}

You can immediately spot the race condition here, and it actually happened: when I ran benchmarks with, say, 100 requests, I ended up with the application state saying there had been only about 70 visits. What’s more, both this and the other view had the entire logic wrapped inside, which ultimately made it impossible to unit test that logic in isolation. As a result, I needed to test entire views.
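
For the record, had I kept the counter column, the textbook fix for this read-modify-write race would have been to let the database do the increment atomically with a single UPDATE. A sketch of that approach (not the route I ultimately took, as you’ll see below):

def increment_visits_atomically(username: str) -> None:
    # A single `UPDATE ... SET visits = visits + 1` executed by the
    # database, so concurrent requests cannot overwrite each other.
    # Creating a missing visitor would still need separate handling.
    Visitor.query.filter_by(username=username).update(
        {"visits": Visitor.visits + 1},
        synchronize_session=False,
    )
    db.session.commit()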

The next problem was improper handling of database connections. I did not close them and I had a long pool recycle time, which resulted in connection errors occurring after some period of idleness. In fact, I didn’t discover this issue until I actually deployed the project.

Another – fortunately, smaller – thing was the poor handling of the arguments passed to the script. This remains to be improved.

The last big issue was the lack of migrations. When I finally decided to address the first problem, I had a hard time going back and forth between database states. Luckily, I could just drop the entire database, although that wouldn’t be possible in a real-world scenario.

Introducing code improvements

Now that you know where I started from, it’s time to move on to the improved version of the code, which can be found here.

The most important part of what’s been changed is located in the models.py file. The race condition issue meant that I needed either to set a lock on the table or find another solution. My idea was to introduce a quasi-event-sourcing architecture:

class Visitor(db.Model):
    ID = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(80), unique=True)


class Visit(db.Model):
    ID = db.Column(db.Integer, primary_key=True)
    visitor = db.Column(db.Integer, db.ForeignKey('visitor.ID'), index=True)

Now, visits are counted as separate events. Each hit on the endpoint for incrementing them adds a new event (Visit). Visit has an index on the relation column to make queries faster. I am only talking about quasi-event sourcing because what you see here is an extremely simplistic case: no additional databases or complicated data aggregation are involved.

Going for such an architecture meant that I needed to change the way I was providing data to the user and saving it. Since I already had quasi-event sourcing, I needed to properly separate queries from commands (CQRS), ideally kicking them out of the views:

def index():
    results = get_all_visits()
    return {"results": results}


def increment_visits(username: str) -> dict:
    if username == "favicon.ico":
        return {}
    visitor = get_visitor_by_username(username)
    increment_visits_for_visitor(visitor)
    return {
        "id": visitor.ID,
        "username": visitor.username,
        "visits": get_visits_for_visitor(visitor)
    }

This looks much cleaner and the logic is just where it should be. For example, delivering the count of visits per user for all users is now:

def get_all_visits() -> list:
    visitors = db.session.query(
        Visitor.ID,
        Visitor.username,
        func.count(Visit.ID).label('visits')
    ).outerjoin(
        Visit
    ).group_by(
        Visitor.ID
    )
    db.session.close()
    response = [
        {
            'id': visitor.ID,
            'username': visitor.username,
            'visits': visitor.visits
        }
        for visitor in visitors
    ]
    return response

The benefit is that the view does not know what database is used underneath. It just asks for some data and receives a simple structure in return.
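
The views above also call two helpers that don’t appear in this post: get_visitor_by_username and get_visits_for_visitor. Minimal sketches of what they might look like (the real implementations are in the repository and may differ):

def get_visitor_by_username(username: str) -> Visitor:
    # Fetch the visitor, creating one on the first visit.
    visitor = Visitor.query.filter_by(username=username).first()
    if visitor is None:
        visitor = Visitor(username=username)
        db.session.add(visitor)
        db.session.commit()
    return visitor


def get_visits_for_visitor(visitor: Visitor) -> int:
    # Count the Visit events recorded for this visitor.
    return Visit.query.filter_by(visitor=visitor.ID).count()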

On the command side, this is how adding the Visit event is handled:

def increment_visits_for_visitor(visitor: Visitor):
    db.session.add(Visit(visitor=visitor.ID))
    db.session.commit()

I also solved the problem of expiring connections I talked about earlier by adjusting the database settings:

DB_POOL_SIZE = int(os.environ.get('DB_POOL_SIZE', '10'))
DB_POOL_RECYCLE = int(os.environ.get('DB_POOL_RECYCLE', '60'))
SQLALCHEMY_DATABASE_URI = f'postgresql://{DB_USERNAME}:{DB_PASSWORD}@{DB_HOST}/{DB_NAME}'
SQLALCHEMY_ENGINE_OPTIONS = {
    'pool_size': DB_POOL_SIZE,
    'pool_recycle': DB_POOL_RECYCLE,
}

Unfortunately, the expiring connection error is the one issue I don’t fully understand. I must admit that my code now works and I don’t know exactly why. Well, I guess I’ll have to put it somewhere on my to-learn list.

Of course, the above-mentioned changes required me to think of a way to provide an interface to create migrations and apply them:

if __name__ == '__main__':
    if 'createdb' in sys.argv:
        app.app_context().push()
        db.create_all()
    elif 'init' in sys.argv:
        app.app_context().push()
        init()
    elif 'migrate' in sys.argv:
        try:
            message = sys.argv[2]
            app.app_context().push()
            migrate(message=message)
        except IndexError:
            print('Usage: python main.py migrate <message>', flush=True)
    elif 'upgrade' in sys.argv:
        app.app_context().push()
        upgrade()
    elif 'downgrade' in sys.argv:
        app.app_context().push()
        downgrade()
    else:
        app.run(host=HOST, port=PORT, debug=DEBUG)

As I’ve said, the part of the application responsible for handling command-line arguments still needs improvement, but it’s not that important for now.
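
If you’re curious what that improvement might look like, one obvious candidate would be replacing the sys.argv checks with argparse subcommands. A hypothetical sketch, not something you’ll find in the repository:

import argparse

parser = argparse.ArgumentParser(description='Manage the application.')
subparsers = parser.add_subparsers(dest='command', required=True)

for name in ('createdb', 'init', 'upgrade', 'downgrade'):
    subparsers.add_parser(name)
migrate_parser = subparsers.add_parser('migrate')
migrate_parser.add_argument('message', help='migration message')

args = parser.parse_args()
# argparse now validates input and generates usage messages for free,
# e.g. `python main.py migrate` fails with a clear error instead of
# relying on a manual IndexError check.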


That’s it for part 1. Next on the agenda: Docker, Swarm, testing, and monitoring performance. So if you’re curious how the project turned out, stay tuned for Part 2!

Care to hone your coding skills? Join our team and let’s do it together!
