Thinking in Tables
What deserves a table, what deserves a column — and the relationships between them
Most database problems are not SQL problems — they are modelling problems. Before you write a single CREATE TABLE, you need to answer three questions: what are the things in my system? What do I know about each thing? And how do those things relate to each other? Get this right and the SQL writes itself. Get it wrong and you spend months fighting your own schema.
Entities and attributes
- Entity → table — User, Order, Product, Invoice, Organisation — each one is a separate table.
- Attribute → column — user.email, order.amount, product.sku — each attribute becomes a typed column on the table.
- Instance → row — One specific user, one specific order — a row is a single real-world instance of the entity.
An entity is a thing. An attribute is something you know about that thing.
Think of an invoice. The invoice itself is the entity — it is the thing that exists. The invoice number, the amount, the due date, the customer name — these are attributes. In database terms: the entity becomes a table, and the attributes become columns. Every row in the table is one instance of that entity — one specific invoice.
The "thing or property" test
When you are not sure whether something deserves a table of its own or just a column, apply this test: Can this thing exist independently? Does it have its own identity? Can it be referenced by more than one other thing? An address on an order: if an address only ever belongs to one order and you never reference it elsewhere, a column (or a JSONB blob) is fine. But if you want to store multiple addresses per user, reuse billing addresses across orders, or query "all orders delivered to Dubai" — now the address is a thing that needs its own table.
When in doubt, normalise first
It is much easier to denormalise later (copy data into a summary column) than to normalise later (split a column out into a new table and migrate all existing data). Start with separate tables and consolidate only when you have a measured performance reason.
The three relationship types
- One-to-many — One user has many orders. The "many" side stores the foreign key: orders.user_id points to users.id. This is the most common relationship in any schema.
- Many-to-many — A user can belong to many organisations, and an organisation can have many users. This requires a join table — often called a membership table — with two foreign keys: user_id and organisation_id.
- One-to-one — One user has one profile. Useful for splitting a wide table or keeping sensitive columns separate. The secondary table stores the primary key of the first: profiles.user_id is both a foreign key and the primary key.
Drawing an entity relationship diagram (ERD)
Before writing any SQL, sketch a diagram. Each entity is a box. Lines connect boxes where relationships exist. A single line on one end and a "crow's foot" on the other means one-to-many. Two crow's feet mean many-to-many. Even a napkin sketch helps. It forces you to name your entities, think through the relationships, and spot missing tables before you are knee-deep in migrations.
What breaks in production: the "everything in one table" trap
The most common first-version mistake is putting everything into a single table. A tasks table with columns like project_name (text), assignee_email (text), and tags (text, comma-separated) looks workable with twenty rows. At twenty thousand rows you discover: - Renaming a project requires an UPDATE across thousands of task rows, and two rows can disagree on the project name after a partial update. - Querying tasks for a specific assignee requires a case-insensitive text match because emails were entered inconsistently. - Filtering by a single tag requires LIKE '%urgent%', which cannot use a B-tree index and scans every row. These are not performance problems. They are data integrity problems that no amount of indexing will fix. The schema is the architecture. Fix it early.
Design the initial TaskFlow schema (naive version)
Write the naive single-table approach — what most beginners start with
CREATE TABLE tasks_naive (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
title text NOT NULL,
project_name text NOT NULL,
assignee_email text,
tags text,
done boolean NOT NULL DEFAULT false,
created_at timestamptz NOT NULL DEFAULT now()
);This is intentionally wrong. We will identify every problem with it before fixing it in later lessons.
Insert two tasks and expose the duplication problem
INSERT INTO tasks_naive (title, project_name, assignee_email, tags)
VALUES
('Design login page', 'TaskFlow MVP', 'priya@example.com', 'design,frontend'),
('Write API docs', 'Taskflow MVP', 'priya@example.com', 'docs,backend');→ Notice "TaskFlow MVP" and "Taskflow MVP" — one uppercase F, one lowercase. Two rows, two versions of the same project name. There is no way for the database to know these are the same project.
Attempt to rename the project — the update problem
UPDATE tasks_naive
SET project_name = 'TaskFlow v1'
WHERE project_name = 'TaskFlow MVP';→ 1 row affected. The second row still says "Taskflow MVP" because the WHERE clause was case-sensitive. The project is now split across two names. This is a data integrity failure — not a bug in the UPDATE, but a consequence of storing facts redundantly.
Attempt to find tasks with the "design" tag — the unsearchable column problem
SELECT id, title FROM tasks_naive
WHERE tags LIKE '%design%';→ 1 row returned. But LIKE '%design%' cannot use any index — it scans every row. At 100k tasks this takes seconds. More importantly, a tag called "redesign" would also match.
Identify what a better schema needs
Problems found: (1) project_name is duplicated — projects need their own table. (2) assignee_email is repeated — users need their own table. (3) tags in a comma-separated column cannot be indexed or queried reliably. (4) There is no concept of a workspace or team — who does this task belong to at the organisation level? We will fix all of this in Lesson 2.
Try this
Design the initial naive schema for TaskFlow — a multi-tenant SaaS project management tool. Write two CREATE TABLE statements: one for users (id, email, created_at) and one for tasks (id, title, project_name as text, assignee_email as text, done, created_at). Then write down at least three problems with this schema: where does the workspace or team live? What happens when a project is renamed? What if a task has multiple assignees? You will fix these problems in each subsequent lesson.