Schema Migration SQL Server → PostgreSQL — Identity, Constraints, Defaults, Sequences

SQL Server to PostgreSQL schema migration looks finished the moment the CREATE TABLE script runs without an error. That is exactly when the real trouble begins: the table is there, the data is loaded — and the first INSERT that should hand out a new ID collides with an existing key. The reason is not a typo but a change of mechanism. IDENTITY becomes a sequence in PostgreSQL, and after a data load with explicit key values that sequence still counts from 1.

The schema is the layer where SQL Server and PostgreSQL look alike enough to inspire trust, and differ enough to betray it. This article shows which DDL objects move across cleanly and which ones demand a deliberate decision.

The key points up front:

  • IDENTITY → GENERATED/sequence: the translation is easy — the sequence reset after the data load is the trap almost everyone hits once.
  • Constraints and defaults: CHECKUNIQUE and FOREIGN KEY move across practically 1:1; SQL Server’s auto-generated names (DF__…) should disappear in the process.
  • The missing clustered index: PostgreSQL tables are heaps — the biggest mental break for SQL Server converts.
  • The rest: computed columns → GENERATED … STOREDdbo → public, and identifier case.

Prerequisite: SQL Server 2017+ as the source, PostgreSQL 14+ as the target. PostgreSQL constructs are introduced briefly on first mention — Postgres background is not required, T-SQL basics are. The pure column-type mapping (which data type becomes what) is a topic of its own and covered in the sister article on data type mapping; here we look at the schema objects around it.

Contents

SQL Server to PostgreSQL Schema Migration: What Changes in the Schema

Before we get into the objects, the overview. Most schema components have a direct counterpart in PostgreSQL — a SQL Server to PostgreSQL schema migration does not fail on the majority, but on a handful of mechanisms that work differently than their name suggests.

SQL Server objectPostgreSQL counterpartAttention
PRIMARY KEY · UNIQUE · CHECK · FOREIGN KEYsame namelow — almost 1:1 portable
IDENTITY(1, 1)GENERATED … AS IDENTITY (sequence)high — sequence reset after load
named DEFAULT constraint (DF__…)column default (unnamed)medium — the name is dropped
GETDATE() · NEWID() defaultsnow() · gen_random_uuid()medium — translate the expression
clustered index (PK physically sorted)heap + B-tree indexhigh — conceptual break
computed columnGENERATED ALWAYS AS (…) STOREDmedium — STORED (virtual since PG 18)
filegrouptablespace (usually drop)low
schema dbopublic or a named schemalow — map the topology

The dividing line does not run between “works” and “does not work”, but between “copies itself” and “needs a deliberate decision”. The following sections walk exactly the rows marked high and medium.

IDENTITY → GENERATED and the Sequence Reset

In SQL Server, IDENTITY(1, 1) hands out consecutive keys automatically. PostgreSQL knows two ways to get there, and the choice matters.

The historical way is serial (or bigserial): a shorthand that creates a sequence behind the scenes and wires it up as the column default. The modern way, available since PostgreSQL 10, is the SQL standard GENERATED … AS IDENTITY — and that is the one to prefer today. It binds the sequence firmly to the column (it disappears together with the table), tracks it cleanly in the catalog, and avoids the permission and ownership pitfalls that serial brings along with its loosely coupled sequence.

That leaves the question of ALWAYS versus BY DEFAULT:

  • GENERATED ALWAYS AS IDENTITY forbids explicit values — an INSERT that fills the column itself fails (unless you add OVERRIDING SYSTEM VALUE). Clean for day-to-day operation, but impractical for the migration load, which carries exactly the old key values.
  • GENERATED BY DEFAULT AS IDENTITY allows explicit values and leaves the automatic behaviour only to the INSERTs that omit the column. That is the right choice for the migration: the load writes the existing IDs unchanged, later operation gets them automatically.

And that brings us to the actual trap. An INSERT with an explicit key value leaves the sequence untouched — it does not count along. After a load that writes keys 1 through 5000 directly, the sequence still sits at its start value 1. The next automatic INSERT therefore draws 1 — and runs into a primary-key collision. That is why every load with explicit IDs has to be followed by a sequence reset to the highest assigned value:

  1: -- After the load with explicit customer_id values, lift the identity counter
  2: -- to the highest assigned value:
  3: SELECT setval(
  4:     pg_get_serial_sequence('customer', 'customer_id')
  5:    ,(SELECT max(customer_id) FROM customer)
  6: );

pg_get_serial_sequence resolves the internal name of the sequence that belongs to the column — that saves you from guessing the automatically assigned sequence name. The step is needed once per table with an identity column; forget it, and the error is not a silent one but a loud one on the first production INSERT. (If the target table is empty after the load, max(...) returns NULL — then use setval(..., 1, false) so that allocation starts cleanly at 1.)

The SCOPE_IDENTITY() reflex from T-SQL — reading back the key just assigned — is replaced in PostgreSQL by the RETURNING clause directly on the INSERT. How that looks when porting procedures is covered in the article on code porting.

Constraints and Defaults

Here the news is mostly good. PRIMARY KEYUNIQUECHECK and FOREIGN KEY have the same syntax and the same semantics in PostgreSQL as in SQL Server, including the referential actions (ON DELETE CASCADEON UPDATE …). A condition named with CONSTRAINT … moves across verbatim.

Three points deserve attention:

  • Get rid of the auto-generated names. SQL Server assigns generated names like DF__customer__credi__1A2B3C or PK__customer__… for unnamed constraints. Porting is the moment to switch these to speaking names (df_customer_credit_limitpk_customer). If you carry the generated names over, the first later ALTER TABLE … DROP CONSTRAINT is a guessing game.
  • Defaults have no name of their own in PostgreSQL. In SQL Server a DEFAULT is a named constraint object; in PostgreSQL the default is simply a property of the column (ALTER TABLE … ALTER COLUMN … SET DEFAULT …). The default value moves across, the named default constraint as a separate object does not exist in the target — that is no loss, just different bookkeeping.
  • Translate the default expressions. The common SQL Server defaults have direct counterparts: GETDATE() → now() (or current_timestamp), GETUTCDATE() → now() AT TIME ZONE 'utc'NEWID() → gen_random_uuid(). Constant defaults (0''1) move across unchanged. The detailed reasoning on the type-related defaults lives in the data type article.

One detail in the other direction: PostgreSQL can mark constraints DEFERRABLE INITIALLY DEFERRED — their check is then postponed to the end of the transaction. SQL Server has no equivalent. For cyclic foreign keys or bulk reordering it is a useful tool worth keeping in mind along the way.

The Missing Clustered Index

This is where PostgreSQL departs most from what you are used to in SQL Server. In SQL Server the clustered index determines the physical ordering of the table: the data sits in key order on disk, and the primary key is the clustered index by default. Each table has at most one.

PostgreSQL does not have this concept. A table is a heap — an unsorted collection of rows in insertion order. The primary key is a perfectly ordinary B-tree index that points into the heap; it enforces no physical order. There is the command CLUSTER table USING index, which physically reorders the table once by an index — but it is not a self-maintaining property: new rows still land at the end of the heap, and the order decays until you run CLUSTER again.

Two things follow for index planning:

  • Do not dismiss it as “not needed”. The clustered index has real performance effects in SQL Server (range scans along the physical order). When it goes away, you plan indexes in PostgreSQL deliberately anew — by the actual query patterns, not by the old physical sorting.
  • What PostgreSQL gives you in return. Expression indexes (… (lower(email))) exist directly — in SQL Server that only works via the detour of a computed column. Partial indexes (… WHERE status = 'active') correspond to SQL Server’s filtered indexes, and covering indexes with INCLUDE exist in both engines (in PostgreSQL since version 11 — so no longer a new feature). The loss of the clustered index is therefore not a pure loss, but a different toolbox.

Computed Columns, Schemas and Other DDL Quirks

The remaining DDL quirks are individually small, but together they are exactly the spots where a naive converter stumbles.

  • computed column → GENERATED … STORED (virtual since PostgreSQL 18). A computed column like AS (price * quantity) becomes GENERATED ALWAYS AS (price * quantity) STORED. Up to PostgreSQL 17 — which is what most production systems run today — STORED (persisted) is the only variant; the non-persisted “virtual” from SQL Server (computed only at query time) could only be reproduced via a VIEW. The young PostgreSQL 18 has added virtual generated columns and even makes them the default — if you are already on 18, you get both SQL Server variants directly.
  • filegroup → tablespace. SQL Server filegroups mostly vanish into thin air during a migration — the default tablespace is almost always enough. A dedicated TABLESPACE only pays off when there is genuinely a separate disk or storage class behind it.
  • dbo → public or a named schema. SQL Server’s default schema dbo functionally corresponds to PostgreSQL’s public schema. If you like order, you instead create named schemas (appstaging) and set the search_path — the target schema idiom is laid out in the article on Postgres table conventions.
  • Identifier case. PostgreSQL folds unquoted identifiers to lowercase; names placed in double quotes ("CustomerID") keep their spelling and must then always be quoted that way. The clean consequence is snake_case without quoting — why that is not a matter of taste is covered in the sister article on case sensitivity of identifiers.

A Concrete CREATE TABLE — Before and After

To make the translation tangible, a table that bundles IDENTITY, a money default, a CHECK, a UNIQUE and a FOREIGN KEY all at once. First the source in T-SQL:

  1: CREATE TABLE dbo.customer
  2: (
  3:     customer_id    int            IDENTITY(1, 1)  NOT NULL
  4:    ,region_id      int            NOT NULL
  5:    ,email          nvarchar(256)  NOT NULL
  6:    ,credit_limit   money          NOT NULL  CONSTRAINT df_customer_limit   DEFAULT 0
  7:    ,created_at     datetime       NOT NULL  CONSTRAINT df_customer_created DEFAULT GETDATE()
  8:    ,CONSTRAINT pk_customer        PRIMARY KEY (customer_id)
  9:    ,CONSTRAINT uq_customer_email  UNIQUE (email)
 10:    ,CONSTRAINT ck_customer_limit  CHECK (credit_limit >= 0)
 11:    ,CONSTRAINT fk_customer_region FOREIGN KEY (region_id) REFERENCES dbo.region (region_id)
 12: );

And the same as a PostgreSQL target:

  1: CREATE TABLE customer
  2: (
  3:     customer_id    integer         GENERATED BY DEFAULT AS IDENTITY
  4:    ,region_id      integer         NOT NULL
  5:    ,email          text            NOT NULL
  6:    ,credit_limit   numeric(19, 4)  NOT NULL  DEFAULT 0
  7:    ,created_at     timestamptz     NOT NULL  DEFAULT now()
  8:    ,CONSTRAINT pk_customer        PRIMARY KEY (customer_id)
  9:    ,CONSTRAINT uq_customer_email  UNIQUE (email)
 10:    ,CONSTRAINT ck_customer_limit  CHECK (credit_limit >= 0)
 11:    ,CONSTRAINT fk_customer_region FOREIGN KEY (region_id) REFERENCES region (region_id)
 12: );

What changed, line by line:

  • Line 3: int IDENTITY(1, 1) → integer GENERATED BY DEFAULT AS IDENTITY — the auto key onto the SQL standard mechanism, BY DEFAULT for the sake of the migration load.
  • Line 6: money → numeric(19, 4) (type detail in the data type article); the named default constraint df_customer_limit becomes the plain column default DEFAULT 0.
  • Line 7: datetime → timestamptzGETDATE() → now(); here too the default loses its constraint name.
  • Lines 8–11: PRIMARY KEYUNIQUECHECK and FOREIGN KEY move across verbatim — the dbo. qualification of the referenced table drops away (schema public or search_path).

No converter that only swaps column types makes the decision between ALWAYS and BY DEFAULT or remembers the sequence reset — that stays manual work.

FAQ

What does IDENTITY become in PostgreSQL?

An identity column backed by a sequence: GENERATED BY DEFAULT AS IDENTITY (for the migration load) or GENERATED ALWAYS AS IDENTITY (for strict operation). The older serial way still works, but GENERATED … AS IDENTITY is the SQL standard and, since PostgreSQL 10, the recommended form, because the sequence is bound firmly to the column.

Why does my ID allocation start over at 1 after the import?

Because an INSERT with an explicit key value does not advance the sequence. After a load with the old IDs the sequence still sits at its start value. The next automatic INSERT therefore collides with an existing key. Fix: SELECT setval(pg_get_serial_sequence('table', 'column'), (SELECT max(column) FROM table)) — once per table with an identity column.

Do I get gap-free IDs after the migration?

No — and that was already the case in SQL Server. Both IDENTITY/sequences there and PostgreSQL sequences reserve values in advance (cache) and do not return them on a crash; rolled-back transactions consume their value too. The result is gaps, never duplicate keys — the counter jumps forward when in doubt. In practice PostgreSQL is even tamer: the default CACHE 1 reserves only one value, whereas SQL Server’s IDENTITY can jump by up to 1000 after an unclean restart. If you need consecutive, gap-free numbers (invoice numbers, say), you must not rely on the identity column either there or here.

Does PostgreSQL have clustered indexes?

No. PostgreSQL tables are heaps; the primary key is an ordinary B-tree index, not a physical sorting. The CLUSTER command reorders a table once by an index but does not keep that order — new rows land at the end of the heap. Indexes in PostgreSQL are planned by the query patterns, not by physical order.

What happens to my filegroups?

Mostly nothing — they go away. PostgreSQL’s default tablespace is almost always enough. Only when there is genuinely a separate disk or storage class behind a filegroup do you reproduce it with a named TABLESPACE.

Does the dbo schema survive?

Functionally dbo corresponds to PostgreSQL’s public schema — so you can put everything into public. It is often cleaner to create named schemas (appstaging) and set the search_path. Keeping the name dbo 1:1 is possible but uncommon.

This article is part of a series on migrating from SQL Server to PostgreSQL. The remaining parts follow as their own articles:

On the target schema idiom in PostgreSQL: