Moving data sounds like the easiest phase of a migration: the table is there, pour the data in, done. Until the first larger table takes a whole afternoon — because it runs row by row via INSERT instead of in one sweep via COPY. Anyone who needs to load data into PostgreSQL has four routes to choose from, and the choice of method decides between hours and minutes.
Unlike data types and schema, there’s no translation table here but a decision — which tool fits your data volume, downtime tolerance, and transformation needs.
The essentials up front:
- Four methods:
bcp+COPY(the flatfile bridge),pgloader(the direct automaton from MSSQL), ETL/SSIS (transformation in transit), and the manual CSV bridge for small tables. - The choice comes down to data volume × downtime tolerance × heterogeneity — not taste.
COPYbeatsINSERTby roughly an order of magnitude: drop indexes before the load, rebuild them after, don’t forgetANALYZE.- After the load come the easily forgotten steps: sequence reset, constraint revalidation, statistics.
Prerequisite: SQL Server as the source, PostgreSQL 14+ as the target. bcp, BULK INSERT, and SSIS may be assumed as SQL Server knowledge; the Postgres and tool counterparts (COPY, \copy, pgloader) are introduced on first mention. That the target schema already exists is assumed here — the DDL side is covered by the sister article on schema migration, the pure column-type mapping by the article on data type mapping.
Contents
- Load Data into PostgreSQL: the Four Routes
- Decision Matrix: Volume, Downtime, Heterogeneity
- COPY Beats INSERT — the Load Performance Levers
- What pgloader Handles — and Where It Ends
- After the Load: Don’t Forget
- FAQ
- Related Articles
Load Data into PostgreSQL: the Four Routes
The transfer methods boil down to four patterns. They don’t differ in the result, but in effort, in control, and in how much they do automatically.
bcp+COPY— the flatfile bridge. SQL Server exports withbcp … queryoutinto a text file, PostgreSQL reads it back in withCOPY … FROM. Maximum control over format, delimiter, and encoding, at the cost of two steps and an intermediate file. The robust classic for homogeneous bulk transfers, when you want to know exactly what happens.pgloader— the direct automaton. A single command reads straight from the MSSQL database, creates the target tables, casts the types, and loads the data — no intermediate file. The 80-percent route for standard cases; where it hits limits is covered below.- ETL (SSIS or custom) — transformation in transit. When the data has to change shape during the move (splitting columns, normalizing values, merging several sources), an ETL process is the right tool. That’s no longer a plain transfer — it’s processing that happens to switch target systems.
- Manual CSV bridge — for small tables. Export via the SSMS GUI or a
SELECTquery into a CSV, import via\copyin psql. For master data, lookup tables, and ad-hoc moves under a few thousand rows it’s perfectly sufficient — and done in five minutes.
The first two methods cover the bulk of real migrations. ETL enters once “copy” turns into “rebuild”; the CSV bridge is the pragmatic shortcut for anything small.
Decision Matrix: Volume, Downtime, Heterogeneity
The choice of method hangs on three axes: how much data, how much standstill you can tolerate, and whether the data has to be reshaped in transit. Almost always, a clear recommendation falls out.
| Situation | Data volume | Heterogeneity | Recommendation |
|---|---|---|---|
| Master data, lookups, ad-hoc | small (< ~10,000 rows) | 1:1 | manual CSV bridge (\copy) |
| Standard bulk, same structure | medium to large | 1:1 | pgloader (schema + cast + load in one run) |
| Full control over format/encoding | any | 1:1 | bcp + COPY (flatfile bridge) |
| Data must be reshaped | any | transformation needed | ETL (SSIS / custom) |
| Database must keep running | any | any | incremental / low-downtime (separate article coming) |
Two reading aids for the table. First: the downtime axis is only roughly captured here. All four methods assume a downtime window in which the source is no longer written to — otherwise you migrate an inconsistent state. Real migration on a running system (logical replication, CDC, dual-write) is its own, larger topic and gets a separate article. Second: the rows aren’t mutually exclusive. A real migration often uses pgloader for the bulk and the CSV bridge for three special tables the automaton doesn’t get right.
COPY Beats INSERT — the Load Performance Levers
Whatever the method: when loading, a single mechanism decides minutes versus hours. A single INSERT is its own small transaction with parsing, planning, and a WAL entry. A million of them is that overhead a million times over. COPY, by contrast, streams the rows into the table in one sweep — the same data often lands in the target roughly an order of magnitude faster. So anyone looking to load data into PostgreSQL uses COPY, not generated INSERT statements.
But COPY alone isn’t enough. Three more levers speed up the load noticeably:
- Drop indexes and foreign keys before the load, rebuild them after. Every index has to be maintained per inserted row, every foreign-key check run per row — at millions of rows, both are costlier than a single build at the end over the finished data set.
NOT NULLandCHECKconstraints, by contrast, you usually leave in place; they cost little during the load and catch broken rows early. Caution with the incremental load: there, drop-and-recreate doesn’t pay off, because the recreate time runs over the whole data set, not just the new rows. UNLOGGEDstaging table. AnUNLOGGEDtable writes no WAL and therefore loads markedly faster — the price is that its contents don’t survive a crash. As a staging table for the import, that’s ideal: fill it viaCOPYfirst, then, after checking, move it into the real (logged) target table. That this second step writes WAL again is priced in — the expensive raw load ran WAL-free, and the copy-over happens once and in bulk.- Raise
maintenance_work_mem, don’t forgetANALYZE. A generousmaintenance_work_mem(around512MB, depending on available RAM) speeds up the subsequent index build considerably. And after the load the planner needs fresh statistics — withoutANALYZE, PostgreSQL plans the first queries based on an empty table and picks catastrophic plans.
Assembled as a pattern, the fast load looks like this:
1: -- 1. Create a staging table without WAL overhead (UNLOGGED).
2: CREATE UNLOGGED TABLE staging_sales (LIKE sales INCLUDING DEFAULTS);
3:
4: -- 2. Bulk-load via COPY instead of many INSERTs.
5: COPY staging_sales FROM '/var/lib/postgresql/import/sales.csv'
6: WITH (FORMAT csv, HEADER);
7:
8: -- 3. Build indexes only after the load — once, not per row.
9: CREATE INDEX ix_sales_customer ON staging_sales (customer_id);
10:
11: -- 4. Refresh planner statistics, or bad plans loom.
12: ANALYZE staging_sales;
A word on the file path that otherwise costs an hour of troubleshooting: COPY … FROM 'file' runs server-side — the file must reside on the database server and be readable by the Postgres process. If you load from your own workstation, you use \copy in psql instead; it reads the file client-side and sends it over the connection. Same syntax, but \copy needs no server access to the file system. The mix-up is the most common stumbling block on a first flatfile load.
The flatfile bridge itself consists of two commands. On the SQL Server side, bcp exports:
bcp "SELECT customer_id, region_id, email, credit_limit, created_at FROM dbo.customer" ^
queryout customer.csv -c -t"|" -C 65001 -S localhost -d SourceDb -T
On the PostgreSQL side, COPY reads it back in:
1: -- Server-side: the file must reside on the DB server.
2: COPY customer (customer_id, region_id, email, credit_limit, created_at)
3: FROM '/var/lib/postgresql/import/customer.csv'
4: WITH (FORMAT csv, NULL '', ENCODING 'UTF8', DELIMITER '|');
There are three spots where the flatfile bridge keeps tipping over:
- Encoding —
bcp -wwrites UTF-16, whichCOPYwithENCODING 'UTF8'can’t read; hence-c -C 65001for UTF-8 above. - NULL marker —
bcpwrites an empty string forNULL, whichCOPYhas to interpret back asNULLviaNULL ''. - Delimiter inside text fields — a comma as the separator tips over the moment an address itself contains a comma; hence
|instead of,above.
What pgloader Handles — and Where It Ends
pgloader is the tool that takes the most manual work off your hands. A single command — or a small configuration file — connects to the MSSQL source and the PostgreSQL target database and does, in one run, depending on the enabled WITH options: create the target tables, cast the data types, load the data via COPY, build indexes and foreign keys, and — the bonus over manual work — reset the sequences.
A typical configuration file looks like this:
LOAD DATABASE
FROM mssql://migrator@mssql-host/SourceDb
INTO postgresql://migrator@pg-host/targetdb
WITH include drop,
create tables,
create indexes,
reset sequences,
foreign keys,
downcase identifiers,
workers = 8, concurrency = 1
SET maintenance_work_mem to '512MB',
work_mem to '64MB'
CAST type datetime to timestamptz,
type money to numeric;
The WITH options are the strength: reset sequences handles the sequence reset that’s so easy to forget by hand; downcase identifiers lowercases the identifiers (CustomerOrders becomes customerorders — not automatic snake_case, but exactly the quoting-free lowercase Postgres idiomatically expects); include drop makes the run repeatable. The CAST rules step in where the default conversion doesn’t fit.
So much for the happy path. There are, however, four spots where pgloader hits its limits — things it simply isn’t made for:
- Procedural logic is left behind. Stored procedures, functions, and triggers
pgloaderdoes not migrate. That’s a discipline of its own and the topic of the article on code porting. - Type edge cases need their own
CASTrules. The tricky conversions —money,datetime2,uniqueidentifier,bit— the automation doesn’t always get the way you want. Which types convert cleanly and which break is covered by the article on data type mapping. - Performance tuning stays manual.
pgloaderloads serviceably, but the fine-tuning of very large tables (partitioning, parallel workers, storage parameters) it won’t do for you. - Complex schema decisions it ignores. The
IDENTITYmechanism, named constraints, or the loss of the clustered indexpgloaderdecides by rote — the deliberate decisions there belong to schema migration.
In short: pgloader is an excellent starting point, not a complete solution. For the bulk of homogeneous tables it saves hours; the demanding 20 percent stay manual.
After the Load: Don’t Forget
The load is through, the rows are in — and here is where many migrations stop one step too early. Four follow-ups belong to the job:
- Sequence reset. If you loaded the old key values along, you have to lift the identity counter to the highest assigned value, or the first production
INSERTcollides.pgloaderdoes this automatically withreset sequences; withbcp+COPYand the CSV bridge it’s manual. The mechanism (setval(pg_get_serial_sequence(...))) and the trap behind it are described in detail in the article on schema migration. - Revalidate constraints. If foreign keys were disabled or created
NOT VALIDfor the fast load, they then have to be checked withALTER TABLE … VALIDATE CONSTRAINT …— otherwise the database carries unverified references. ANALYZEat the database level. RunANALYZE(orvacuumdb --analyze) once over the whole loaded data set, so the planner has fresh statistics for all tables.- A first plausibility check. A quick row reconciliation (
count(*)source against target) catches gross load errors right away. The real, dependable verification — checksums, sampling, data quality — is its own step and gets its own article.
FAQ
pgloader or bcp + COPY — which should I take? For the bulk of homogeneous tables, pgloader: it creates the schema, casts the types, loads, and resets the sequences — in one run. bcp + COPY pays off when you need full control over format, delimiter, and encoding, or when the source isn’t directly reachable and a flatfile arises anyway. Often you combine the two: pgloader for 80 percent, the flatfile bridge for the special cases.
COPY than INSERT? Typically by an order of magnitude — on large tables often more. The reason: each individual INSERT carries the overhead of parsing, planning, and writing WAL, while COPY streams the rows in one go. Replaying a dump of millions of individual INSERT statements means waiting needlessly; the same data as COPY is loaded in a fraction of the time.
For the simple route, yes: a downtime window in which the source is no longer written to guarantees a consistent state. How long the window has to be depends on data volume and method. Migrating on a running system (logical replication, CDC, dual-write) cuts downtime to near zero, but is considerably more involved — that’s its own topic and gets a separate article.
COPY or \copy — what’s the difference? COPY is a SQL command and runs server-side: the file path refers to the database server’s file system, and the Postgres process must be allowed to read the file. \copy is a psql meta-command and runs client-side: it reads the file on your machine and sends it over the connection. If you see “Permission denied” or “No such file” on COPY even though the file is right there locally, you almost always wanted \copy.
Pull the load into an UNLOGGED staging table without indexes, then build indexes and constraints once over the finished data set and run ANALYZE. Set maintenance_work_mem generously for the index build. If the source can be split into key ranges, you can run several COPY streams in parallel. And plan realistically: at this size, the index recreate time is its own significant block in the overall budget.
Related Articles
This article is part of a series on migrating SQL Server to PostgreSQL. The remaining parts:
- Overview: Data Migration SQL Server to PostgreSQL — the complete guide (coming)
- Data types: Data Type Mapping SQL Server → PostgreSQL — What Converts Cleanly and What Breaks
- Schema: Schema Migration SQL Server → PostgreSQL — Identity, Constraints, Defaults, Sequences
- Code porting: Porting T-SQL to PL/pgSQL (coming)
- Verification: Verifying the migration — data quality and row reconciliation (coming)
- Low-downtime: Cutover strategies for a running database (coming)
On data transfer as an ETL step: