Composition in SQL; Primary keys

Summary

This postmortem analyzes a common architectural flaw in relational database design: mistaking object-oriented composition for a database primary key strategy. The core issue arises when developers attempt to enforce domain constraints (like a receipt having exactly one specific item) using database keys in a way that violates First Normal Form (1NF) and creates brittle, unscalable schemas. The prompt describes a ReceiptItem with a composite key that likely attempts to encompass a Ticket identifier, leading to “bad constraints.”

Root Cause

The root cause is a direct translation of UML aggregation semantics into SQL primary keys without understanding the underlying relational algebra.

  • Overloaded Primary Keys: The developer attempts to use idReceipt as a primary key for ReceiptItem to enforce the “composition” relationship (i.e., an item cannot exist without a receipt).
  • Ticket ID Ambiguity: The prompt mentions idTicket in the item but implies the Receipt contains identifiers (JMBGSeller, JMBGFan) that might be intended to constrain the ticket ownership.
  • Violation of 1NF: If the developer attempts to include multiple columns in the Primary Key to satisfy the “0..*” relationship with Ticket while maintaining Receipt composition, they are creating a composite key that is likely unstable.

Why This Happens in Real Systems

Developers often carry over Entity-Relationship (ER) diagram habits or OOP identity patterns into SQL.

  • The “Strong Entity” Fallacy: In UML, ReceiptItem is a “Strong Entity” dependent on Receipt. Developers often map this by making idReceipt part of the ReceiptItem Primary Key.
  • Primary Key vs. Foreign Key Confusion: The developer sees idTicket (the foreign key) and thinks it needs to be unique across the whole system. They might try to combine idReceipt and idTicket into a composite key. If idTicket is already a surrogate key (e.g., UUID or Auto-Increment), adding idReceipt is redundant and bloats indexes on every related table (e.g., AuditLogs, Payments).
  • Missing Foreign Key Definitions: A “bad constraint” usually means the database does not physically enforce the relationship. If ReceiptItem is created with idReceipt but no FOREIGN KEY constraint exists, orphan records will appear when a Receipt is deleted.

Real-World Impact

  • Performance Degradation: Using composite keys (e.g., (idReceipt, idTicket)) for ReceiptItem forces the database to carry these values into every child table (e.g., ReceiptItem_Tax, ReceiptItem_Discount). This significantly increases storage and memory usage for indexes.
  • Update Anomalies: If idTicket is part of the Primary Key, you cannot change the ticket assigned to a receipt line item without breaking the key. In real systems, tickets get swapped or refunded, requiring key updates, which is a heavy operation and risks data corruption.
  • Query Complexity: Fetching a specific Ticket across all receipts requires scanning a composite index rather than a direct lookup on idTicket.

Example or Code

Below is the comparison between the “Bad” approach (trying to force composition with keys) and the “Good” approach (standard relational design).

The Bad Approach (Composite Keys & Redundancy)

Here, idReceipt is forced into the Primary Key of Ticket, which is terrible practice if Ticket is a reusable or unique entity.

-- BAD: Trying to force composition by making Ticket dependent on Receipt
CREATE TABLE Receipt (
    idReceipt INT PRIMARY KEY,
    totalAmount DECIMAL(10,2),
    purchaseDate DATETIME
);

CREATE TABLE Ticket (
    idTicket INT,
    idReceipt INT NOT NULL,
    price DECIMAL(10,2),
    -- BAD: Composite PK makes updates hard and bloats children
    PRIMARY KEY (idTicket, idReceipt), 
    FOREIGN KEY (idReceipt) REFERENCES Receipt(idReceipt)
);

The Good Approach (Surrogate Keys & Standard FKs)

Here, Ticket is an independent entity. ReceiptItem handles the relationship (the “lines” on the bill). This supports the 0..* relationship correctly.

-- GOOD: Independent Entities
CREATE TABLE Receipt (
    idReceipt INT PRIMARY KEY,
    totalAmount DECIMAL(10,2),
    purchaseDate DATETIME
);

-- Ticket exists independently
CREATE TABLE Ticket (
    idTicket INT PRIMARY KEY,
    price DECIMAL(10,2),
    status VARCHAR(20)
);

-- The "Linking" table (often called ReceiptLine in ERP systems)
CREATE TABLE ReceiptItem (
    idReceiptItem INT PRIMARY KEY, -- Surrogate key is best here
    idReceipt INT NOT NULL,
    idTicket INT NOT NULL,
    quantity INT DEFAULT 1,
    lineTotal DECIMAL(10,2),

    FOREIGN KEY (idReceipt) REFERENCES Receipt(idReceipt),
    FOREIGN KEY (idTicket) REFERENCES Ticket(idTicket),

    -- Constraint to enforce composition: 
    -- A specific Ticket can only appear once per Receipt
    CONSTRAINT UQ_TicketPerReceipt UNIQUE (idReceipt, idTicket)
);

How Senior Engineers Fix It

Senior engineers prioritize flexibility and referential integrity over mimicking UML shapes.

  1. Use Surrogate Primary Keys: Always prefer a single, independent integer/UUID column (e.g., idReceiptItem) as the Primary Key. Do not use Foreign Keys as Primary Keys.
  2. Enforce Composition via Unique Constraints: If the business rule is “A receipt cannot contain the same ticket twice,” use a UNIQUE constraint on (idReceipt, idTicket). This enforces the business rule without tying the physical data structure to it.
  3. Explicit Foreign Keys: Always define ON DELETE behaviors. For composition (strong dependency), ON DELETE CASCADE is usually appropriate for ReceiptItem if the Receipt is deleted.
  4. Normalize Identifiers: Do not store JMBGSeller or JMBGFan inside ReceiptItem or Ticket if they are already in Receipt. Use joins.

Why Juniors Miss It

  • UML Rigidity: Juniors often believe the database schema must look exactly like the UML class diagram.
  • Misunderstanding “Identity”: They confuse the identity of an object (which should be immutable and simple) with its relationship to other objects.
  • Fear of Nulls: Juniors sometimes try to avoid nullable columns by nesting everything into one table (denormalization), not realizing that proper joins handle optional relationships better.
  • Lack of Experience with Scale: They don’t realize that composite keys (A, B, C) break down performance when joined with tables that need to reference that key.