Foreign Keys in Relational Databases: Concepts, Benefits, and Best Practices
The real question isn't whether foreign keys are slow or whether they make sharding harder. It's who you want responsible for your data being correct: the database, or every developer who will ever touch your codebase. I've watched teams have the wrong version of this argument for twenty years.
Examples in this piece are in PostgreSQL, MySQL, and SQL Server. Those are the three I've shipped against most.
What a foreign key actually does
A foreign key is a column (or set of columns) in one table that references the primary key of another. It enforces a simple rule: no value can appear in the child column unless that value already exists in the parent. If you try to insert an order with a customer_id that doesn't exist in customers, the database rejects the insert. If you try to delete a customer who still has orders, the database rejects the delete (unless you've told it to cascade).
That's the whole guarantee. Cascades, indexing, performance trade-offs, all of it follows from that one rule.
It's worth separating foreign keys from primary keys, because people conflate them. A primary key uniquely identifies a row inside its own table. It cannot be null. A foreign key points at a primary key from another table. It can be null and it can repeat. A table has exactly one primary key. It can have as many foreign keys as it needs.
Why I default to using them
The case for foreign keys is unglamorous. They prevent a class of bugs that are easy to introduce and miserable to clean up.
Orphan rows. Without a foreign key, nothing stops you from accumulating order rows that point to customer IDs that no longer exist. The application code is fine. The query that builds the dashboard is fine. Then one day a report joins orders to customers and the numbers are off, and you spend a week figuring out that an import script three years ago deleted a customer without deleting their orders.
Race conditions. Two concurrent operations, one inserting a child row, one deleting the parent, will produce inconsistent state if you're enforcing the rule in application code. With a foreign key, the database serializes the check. One of the two transactions rolls back. You get to write code as if the race doesn't exist.
One rule, one place. The integrity rule lives in the schema. Every client of the database, your app, your reporting tool, the SQL someone runs in production at 11pm, is held to the same constraint. Application-level rules live in code, which means they live in whichever code paths the original author remembered to add them to.
Cascading behavior is part of the contract too. ON DELETE CASCADE deletes child rows automatically when their parent goes. ON DELETE SET NULL nullifies the link instead. ON DELETE RESTRICT blocks the parent delete while children exist. Pick deliberately. Cascade is convenient until the day someone deletes a top-level row and removes ten million dependent rows in a single transaction.
Where they cost you
Foreign keys are not free. The cost shows up in three places.
Write overhead. Every insert, update, and delete on a constrained table involves a check. Insert a child row, the database verifies the parent exists. Delete a parent, the database checks for surviving children. At small scale this is invisible. At hundreds of thousands of writes per second, it adds up. Bulk loads are slower with constraints enabled, which is why most ETL pipelines disable them during import and re-enable them after.
Schema change friction. You can't change the type of a column that's involved in a foreign key without dropping the constraint first. Renaming a primary key requires dropping every foreign key that references it, renaming, then re-adding them. This is manageable but it means schema migrations involve more steps.
Distributed systems. Foreign keys assume the parent and child live in the same database. They don't work across shards. They don't work across microservice boundaries. If your architecture splits data across nodes, you either keep foreign keys within each shard for local relationships, or you accept that referential integrity is now your application's problem.
None of these are deal-breakers. They just mean foreign keys aren't a free lunch. If you're running a typical web application, the cost is negligible compared to the bugs they prevent. If you're operating at the scale where engineers measure write latency in microseconds, you're already past the point where blog posts can advise you.
Implementations
The syntax is mostly portable. The differences that bite are around indexing behavior and the drop-constraint syntax.
PostgreSQL
-- Parent table
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name TEXT NOT NULL
);
-- Child table with foreign key constraint
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
order_date DATE,
amount NUMERIC,
customer_id INT,
CONSTRAINT fk_order_customer
FOREIGN KEY (customer_id)
REFERENCES customers(customer_id)
ON DELETE CASCADE
ON UPDATE NO ACTION
);
Name your constraints. The default name Postgres generates is fine until you need to drop the constraint and can't remember what it is. fk_order_customer is self-documenting; orders_customer_id_fkey is not.
Postgres does not automatically index the foreign key column on the child side. The parent side is already indexed because it's the primary key, but orders.customer_id will not get an index unless you create one. Every join from orders to customers and every delete of a customer will benefit from that index. Create it.
To add or drop later:
ALTER TABLE orders
ADD CONSTRAINT fk_order_customer
FOREIGN KEY (customer_id) REFERENCES customers(customer_id);
ALTER TABLE orders DROP CONSTRAINT fk_order_customer;
MySQL
CREATE TABLE customers (
customer_id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) NOT NULL
) ENGINE=InnoDB;
CREATE TABLE orders (
order_id INT PRIMARY KEY AUTO_INCREMENT,
order_date DATE,
amount DECIMAL(10,2),
customer_id INT,
CONSTRAINT fk_order_customer
FOREIGN KEY (customer_id)
REFERENCES customers(customer_id)
ON DELETE RESTRICT
ON UPDATE CASCADE
) ENGINE=InnoDB;
Two things to know about MySQL.
First, foreign keys only work on InnoDB. MyISAM will parse the syntax and silently ignore the constraint. On any modern MySQL install InnoDB is the default, but it's worth checking if you've inherited a legacy schema.
Second, MySQL auto-indexes the child column. If you don't index orders.customer_id yourself, InnoDB creates one for you, named after the constraint. This is different from Postgres and SQL Server.
The drop syntax is also different:
ALTER TABLE orders DROP FOREIGN KEY fk_order_customer;
DROP CONSTRAINT is not valid in MySQL for foreign keys. If you didn't name yours, MySQL generated a name like orders_ibfk_1, which you can find with SHOW CREATE TABLE orders.
SQL Server
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY IDENTITY(1,1),
Name NVARCHAR(255) NOT NULL
);
CREATE TABLE Orders (
OrderID INT PRIMARY KEY IDENTITY(1,1),
OrderDate DATE,
Amount DECIMAL(10,2),
CustomerID INT FOREIGN KEY REFERENCES Customers(CustomerID)
ON DELETE NO ACTION
ON UPDATE NO ACTION
);
SQL Server follows the Postgres pattern: it does not auto-index the foreign key column. Add a nonclustered index on the child column if you're going to join on it, which you almost always will.
Drop syntax matches Postgres:
ALTER TABLE Orders DROP CONSTRAINT fk_order_customer;
SQL Server also lets you temporarily disable a constraint with NOCHECK CONSTRAINT and re-enable with CHECK CONSTRAINT. Useful for bulk loads where you're confident the data is clean. Dangerous if you're not.
What happens when you skip them
Early in my career at a Louisville dev shop, I inherited a system that had run for years without a single foreign key anywhere in the schema. Some of that absence was deliberate. Most of it wasn't. The pattern when it isn't deliberate is consistent.
Year one, the schema looks clean. Application code does the right checks. Everyone is careful.
Year two, someone writes a script to bulk-import data. The script doesn't run through the application layer. A few rows go in with invalid references. Nobody notices.
Year three, a feature ships that assumes every child row has a parent. It crashes on the bad rows from year two. The fix is a query to find and clean up orphans, which finds many more than expected.
Year four, someone proposes adding foreign keys to lock down the schema. The migration won't run because there are now hundreds of thousands of orphan rows scattered across a dozen tables, each requiring its own data cleanup before the constraint can be applied. The project is shelved.
This is the actual cost of skipping foreign keys: not a single bug, but an accumulating debt that becomes impossible to pay off. Adding integrity to a database that has lived without it is much harder than starting with it.
If you are going to skip foreign keys, do it knowingly, for a specific reason, and put alternative safeguards in place. Background jobs that scan for orphans. Application-layer transactions that lock the relevant rows. A clear convention about which service owns which data and how cross-service deletes propagate. None of these are as good as a database constraint. All of them are better than nothing.
Database-enforced vs application-enforced
This is where most teams have the actual argument. The trade-off looks like this.
Database-enforced integrity is declarative. You write the rule once and it runs against every transaction, forever, regardless of how many clients connect to the database or who wrote them. The database is designed for this. It will not forget.
Application-enforced integrity is imperative. You write checks at every point where data could become invalid. Insert handlers check that parents exist. Delete handlers check for and handle children. New endpoints, new background jobs, new admin scripts all have to remember the rules. Some of them won't.
The case for application-enforced integrity is real but narrow:
- Your data is distributed across services or shards and a single database constraint isn't possible.
- You're loading data from external sources where rejecting rows is more expensive than accepting them and reconciling later.
- You need soft-delete or conditional-relationship semantics that foreign keys can't express.
- You're operating at write volumes where the constraint check is a measurable bottleneck.
Outside of those cases, the database is better at this than your application code. Use it.
What I'd tell you to do
Foreign keys are a way of telling the database about your data model. The database then enforces the model on your behalf, against every client, every transaction, every script anyone will ever write against the schema. That's a lot of leverage for a one-line constraint declaration.
The cost is real but bounded. Slower writes by a small amount. More steps in some schema migrations. No cross-database enforcement. For the vast majority of applications, those costs are worth paying. For the small minority where they aren't, you should know exactly why you're opting out and what you're putting in place instead.
Build the integrity in from day one. Retrofitting it later is the part that's hard.
Tyler Queen is Principal Software Engineer leading AI integration at a private organization and founder of Pixel Guild. He's been shipping against Postgres, MySQL, and SQL Server since the dot-com era.