Skip to main content

Things to know about development

OOP

Programming paradigm based on the concept of objects, which can contain data and code: data in the form of fields (often known as attributes or properties), and code in the form of procedures (often known as methods)

Polymorphism

One interface, multiple implementations

Parametric

In Java, it is implemented using inheritance. The child class inherits the method signatures of the parent class, but the implementation of these methods can be different to suit the specifics of the child class. This is called method overriding. Other functions can operate on the parent class object, but one of the child classes will be substituted for it at runtime - late binding

Ad hock

When methods with the same signature take different parameters as input. This is called method overloading

Inheritance

An abstract data type can inherit the data and functionality of some existing type, facilitating the reuse of software components

Encapsulation

Hiding the internal implementation of the class and separating it from the external user interface

Abstraction

Highlighting significant information and excluding insignificant information from consideration. OOP considers only data abstraction, implying a set of the most significant characteristics of an object that are available to the rest of the program

SOLID

S — Single Responsibility

A class should have a single responsibility

If a Class has many responsibilities, it increases the possibility of bugs because making changes to one of its responsibilities, could affect the other ones without you knowing

O — Open-Closed

Classes should be open for extension, but closed for modification

Changing the current behaviour of a Class will affect all the systems using that Class. If you want the Class to perform more functions, the ideal approach is to add to the functions that already exist NOT change them

L — Liskov Substitution

If S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program

The child Class should be able to process the same requests and deliver the same result as the parent Class, or it could deliver a result that is of the same type

I — Interface Segregation

Clients should not be forced to depend on methods that they do not use

This principle aims at splitting a set of actions into smaller sets so that a Class executes ONLY the set of actions it requires

D — Dependency Inversion

High-level modules should not depend on low-level modules. Both should depend on the abstraction

Abstractions should not depend on details. Details should depend on abstractions

This principle says a Class should not be fused with the tool it uses to execute an action. Rather, it should be fused to the interface that will allow the tool to connect to the Class

It also says that both the Class and the interface should not know how the tool works. However, the tool needs to meet the specification of the interface

ACID

ACID Database Properties

A — Atomicity

All operations in a transaction succeed or every operation is rolled back.

C — Consistency

On the completion of a transaction, the database is structurally sound.

I — Isolation

Transactions do not contend with one another. Contentious access to data is moderated by the database so that transactions appear to run sequentially.

D — Durability

The results of applying a transaction are permanent, even in the presence of failures.

KISS

KISS and DRY Principles in Software Engineering

Keep It Simple, Stupid: This principle advocates for simplicity in design. It suggests that systems should be kept as simple as possible, avoiding unnecessary complexity

DRY

KISS and DRY Principles in Software Engineering

Don't Repeat Yourself: DRY principle states that every piece of knowledge or logic should have a single, unambiguous representation within a system. It encourages code reuse and helps in maintaining consistency

YAGNI

The Principles of Clean Code: DRY, KISS, and YAGNI

You Ain't Gonna Need It: YAGNI advises against adding functionality until it's actually needed. It discourages developers from implementing features based on speculative future requirements

OLTP vs OLAP

FeatureOLTP (Online Transaction Processing)OLAP (Online Analytical Processing)
PurposeHandle real-time transactionsAnalyze large volumes of historical data
FocusFast reads/writes for day-to-day operationsComplex queries for business insights
SpeedOptimized for transactional speed and consistencyOptimized for query performance and flexibility
Data freshnessReal-timeNear-real-time or batch updated

Gradle Lifecycle

Gradle goes through three main phases during build

Initialization ➝ Configuration ➝ Execution

  1. Initialization
  • Gradle determines which projects are involved
  • Gradle reads settings.gradle.kts
  1. Configuration Phase
  • Gradle evaluates all build.gradle(.kts) scripts
  • All tasks are created and configured
  • The entire task graph is built, even if you only run one task
  • This phase sets up inputs/outputs, dependencies, and logic
  1. Execution
  • Only the tasks explicitly requested, and their dependencies, are executed
  • Each task's doFirst, doLast, or actions are called here
  • Gradle checks if a task is up-to-date via inputs/outputs - if so, it can be skipped (incremental build)

Normal Forms in DBMS

Normalization is a systematic approach to organize data within a database to reduce redundancy and eliminate undesirable characteristics such as insertion, update, and deletion anomalies. Normal Forms are different stages of normalization, and each stage imposes certain rules to improve the structure and performance of a database.

Why?

  • Reduces Data Redundancy: duplicate data is stored efficiently, saving disk space and reducing inconsistency
  • Improves Data Integrity: ensures the accuracy and consistency of data by organizing it in a structured manner
  • Simplifies Database Design: by following a clear structure, database designs become easier to maintain and update
  • Optimizes Performance: Reduces the chance of anomalies and increases the efficiency of database operations

Stages

Each stage must satisfy the previous stage's requirements.

  1. First Normal Form (1NF): Eliminating Duplicate Records
  • All columns contain atomic values (i.e., indivisible values)
  • Each row is unique (i.e., no duplicate rows)
  • Each column has a unique name
  • The order in which data is stored does not matter
  1. Second Normal Form (2NF): Eliminating Partial Dependency

Every non-prime attribute (non-key attribute) must depend on the entire primary key, not just a part of it.

  1. Third Normal Form (3NF): Eliminating Transitive Dependency

Non-prime attributes should not depend on other non-prime attributes.

  1. Boyce-Codd Normal Form (BCNF): A stronger form of 3NF

A stricter version of 3NF where for every non-trivial functional dependency (XY), X must be a superkey (a unique identifier for a record in the table).

  1. Fourth Normal Form (4NF): Removing Multi-Valued Dependencies

A table is in 4NF if it is in BCNF and has no multivalued dependencies. A multivalued dependency occurs when one attribute determines another, and both attributes are independent of all other attributes in the table.

  1. Fifth Normal Form (5NF): Eliminating Join Dependency

When a table is in 4NF and all join dependencies are removed. This form ensures that every table is fully decomposed into smaller tables that are logically connected without losing information.

Over-Normalization

Excessive normalization can lead to

  • Complex Queries: Too many tables may result in multiple joins, making queries slow and difficult to manage
  • Performance Overhead: Additional processing required for joins in overly normalized databases may hurt performance, especially in large-scale systems

In many cases, denormalization (combining tables to reduce the need for complex joins) is used for performance optimization in specific applications, such as reporting systems.

Metric Structure

Every metric is uniquely identified by its metric name and optional key-value pairs called labels.

Names

  • Metric names SHOULD specify the general feature of a system that is measured (e.g. http_requests_total - the total number of HTTP requests received)
  • Metric names MAY use any UTF-8 characters
  • Metric names SHOULD match the regex [a-zA-Z_:][a-zA-Z0-9_:]* for the best experience. Metric names outside of that set will require quoting e.g. when used in PromQL

Labels

  • Label names MAY use any UTF-8 characters
  • Label names beginning with __ (two underscores) MUST be reserved for internal Prometheus use
  • Label names SHOULD match the regex [a-zA-Z_][a-zA-Z0-9_]* for the best experience. Label names outside of that regex will require quoting e.g. when used in PromQL (see the UTF-8 guide)
  • Label values MAY contain any UTF-8 characters
  • Labels with an empty label value are considered equivalent to labels that do not exist

Metric Types

Counter

A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.

Gauge

A gauge is a metric that represents a single numerical value that can arbitrarily go up and down. Gauges are typically used for measured values like temperatures or current memory usage, but also "counts" that can go up and down, like the number of concurrent requests.

Histogram

A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values. A histogram with a base metric name of <basename> exposes multiple time series during a scrape

  • cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}
  • the total sum of all observed values, exposed as <basename>_sum
  • the count of events that have been observed, exposed as <basename>_count (identical to <basename>_bucket{le="+Inf"} above)

Summary

Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window. A summary with a base metric name of <basename> exposes multiple time series during a scrape:

  • streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
  • the total sum of all observed values, exposed as <basename>_sum
  • the count of events that have been observed, exposed as <basename>_count

Summary vs Histogram

If we use a summary, we control the error in the dimension of φ. If we use a histogram, we control the error in the dimension of the observed value (via choosing the appropriate bucket layout). With a broad distribution, small changes in φ result in large deviations in the observed value. With a sharp distribution, a small interval of observed values covers a large interval of φ.

Two rules of thumb

  1. If you need to aggregate, choose histograms
  2. Otherwise, choose a histogram if you have an idea of the range and distribution of values that will be observed. Choose a summary if you need an accurate quantile, no matter what the range and distribution of the values is

Database Partitioning & Sharding

AspectPartitioningSharding
DefinitionDividing a single database/table into parts (partitions) based on a key or strategySplitting data across multiple databases or servers (shards)
ScopeWithin a single database instanceAcross multiple database instances
Managed ByUsually the database engine itselfOften requires application-level logic or external middleware
GoalImprove query performance and data organizationEnable horizontal scaling, handle very large datasets
Failure DomainAll partitions are on the same DB — one server fails, all partitions are downEach shard is independent — one shard down ≠ total failure
ExamplesTable partitioning in PostgreSQL by date rangeUser data sharded by user ID hash across 5 database servers

OSI

#Layer NameProtocol Data UnitFunction
7ApplicationDataHigh-level protocols such as for resource sharing or remote file access, e.g. HTTP
6PresentationDataTranslation of data between a networking service and an application; including character encoding, data compression and encryption/decryption
5SessionDataManaging communication sessions, i.e., continuous exchange of information in the form of multiple back-and-forth transmissions between two nodes
4TransportSegmentReliable transmission of data segments between points on a network, including segmentation, acknowledgement and multiplexing
3NetworkPacket, DatagramStructuring and managing a multi-node network, including addressing, routing and traffic control
2Data linkFrameTransmission of data frames between two nodes connected by a physical layer
1PhysicalBit, SymbolTransmission and reception of raw bit streams over a physical medium