Load Data into PostgreSQL: bcp, COPY, pgloader, ETL

Moving data sounds like the easiest phase of a migration: the table is there, pour the data in, done. Until the first larger table takes a whole afternoon — because it runs row by row via INSERT instead of in one sweep via COPY. Anyone who needs to load data into PostgreSQL has four routes to choose from, and the choice of method decides between hours and minutes.

Unlike data types and schema, there’s no translation table here but a decision — which tool fits your data volume, downtime tolerance, and transformation needs.

The essentials up front:

Four methods: bcp+COPY (the flatfile bridge), pgloader (the direct automaton from MSSQL), ETL/SSIS (transformation in transit), and the manual CSV bridge for small tables.
The choice comes down to data volume × downtime tolerance × heterogeneity — not taste.
COPY beats INSERT by roughly an order of magnitude: drop indexes before the load, rebuild them after, don’t forget ANALYZE.
After the load come the easily forgotten steps: sequence reset, constraint revalidation, statistics.

Prerequisite: SQL Server as the source, PostgreSQL 14+ as the target. bcp, BULK INSERT, and SSIS may be assumed as SQL Server knowledge; the Postgres and tool counterparts (COPY, \copy, pgloader) are introduced on first mention. That the target schema already exists is assumed here — the DDL side is covered by the sister article on schema migration, the pure column-type mapping by the article on data type mapping.

Load Data into PostgreSQL: the Four Routes
Decision Matrix: Volume, Downtime, Heterogeneity
COPY Beats INSERT — the Load Performance Levers
What pgloader Handles — and Where It Ends
After the Load: Don’t Forget
FAQ
Related Articles

Load Data into PostgreSQL: the Four Routes

The transfer methods boil down to four patterns. They don’t differ in the result, but in effort, in control, and in how much they do automatically.

bcp + COPY — the flatfile bridge. SQL Server exports with bcp … queryout into a text file, PostgreSQL reads it back in with COPY … FROM. Maximum control over format, delimiter, and encoding, at the cost of two steps and an intermediate file. The robust classic for homogeneous bulk transfers, when you want to know exactly what happens.
pgloader — the direct automaton. A single command reads straight from the MSSQL database, creates the target tables, casts the types, and loads the data — no intermediate file. The 80-percent route for standard cases; where it hits limits is covered below.
ETL (SSIS or custom) — transformation in transit. When the data has to change shape during the move (splitting columns, normalizing values, merging several sources), an ETL process is the right tool. That’s no longer a plain transfer — it’s processing that happens to switch target systems.
Manual CSV bridge — for small tables. Export via the SSMS GUI or a SELECT query into a CSV, import via \copy in psql. For master data, lookup tables, and ad-hoc moves under a few thousand rows it’s perfectly sufficient — and done in five minutes.

The first two methods cover the bulk of real migrations. ETL enters once “copy” turns into “rebuild”; the CSV bridge is the pragmatic shortcut for anything small.

Decision Matrix: Volume, Downtime, Heterogeneity

The choice of method hangs on three axes: how much data, how much standstill you can tolerate, and whether the data has to be reshaped in transit. Almost always, a clear recommendation falls out.

Situation	Data volume	Heterogeneity	Recommendation
Master data, lookups, ad-hoc	small (< ~10,000 rows)	1:1	manual CSV bridge (`\copy`)
Standard bulk, same structure	medium to large	1:1	`pgloader` (schema + cast + load in one run)
Full control over format/encoding	any	1:1	`bcp` + `COPY` (flatfile bridge)
Data must be reshaped	any	transformation needed	ETL (SSIS / custom)
Database must keep running	any	any	incremental / low-downtime (separate article coming)

Two reading aids for the table. First: the downtime axis is only roughly captured here. All four methods assume a downtime window in which the source is no longer written to — otherwise you migrate an inconsistent state. Real migration on a running system (logical replication, CDC, dual-write) is its own, larger topic and gets a separate article. Second: the rows aren’t mutually exclusive. A real migration often uses pgloader for the bulk and the CSV bridge for three special tables the automaton doesn’t get right.

COPY Beats INSERT — the Load Performance Levers

Whatever the method: when loading, a single mechanism decides minutes versus hours. A single INSERT is its own small transaction with parsing, planning, and a WAL entry. A million of them is that overhead a million times over. COPY, by contrast, streams the rows into the table in one sweep — the same data often lands in the target roughly an order of magnitude faster. So anyone looking to load data into PostgreSQL uses COPY, not generated INSERT statements.

But COPY alone isn’t enough. Three more levers speed up the load noticeably:

Drop indexes and foreign keys before the load, rebuild them after. Every index has to be maintained per inserted row, every foreign-key check run per row — at millions of rows, both are costlier than a single build at the end over the finished data set. NOT NULL and CHECK constraints, by contrast, you usually leave in place; they cost little during the load and catch broken rows early. Caution with the incremental load: there, drop-and-recreate doesn’t pay off, because the recreate time runs over the whole data set, not just the new rows.
UNLOGGED staging table. An UNLOGGED table writes no WAL and therefore loads markedly faster — the price is that its contents don’t survive a crash. As a staging table for the import, that’s ideal: fill it via COPY first, then, after checking, move it into the real (logged) target table. That this second step writes WAL again is priced in — the expensive raw load ran WAL-free, and the copy-over happens once and in bulk.
Raise maintenance_work_mem, don’t forget ANALYZE. A generous maintenance_work_mem (around 512MB, depending on available RAM) speeds up the subsequent index build considerably. And after the load the planner needs fresh statistics — without ANALYZE, PostgreSQL plans the first queries based on an empty table and picks catastrophic plans.

Assembled as a pattern, the fast load looks like this:

  1: -- 1. Create a staging table without WAL overhead (UNLOGGED).
  2: CREATE UNLOGGED TABLE staging_sales (LIKE sales INCLUDING DEFAULTS);
  3:
  4: -- 2. Bulk-load via COPY instead of many INSERTs.
  5: COPY staging_sales FROM '/var/lib/postgresql/import/sales.csv'
  6:    WITH (FORMAT csv, HEADER);
  7:
  8: -- 3. Build indexes only after the load — once, not per row.
  9: CREATE INDEX ix_sales_customer ON staging_sales (customer_id);
 10:
 11: -- 4. Refresh planner statistics, or bad plans loom.
 12: ANALYZE staging_sales;

  1: -- 1. Create a staging table without WAL overhead (UNLOGGED).
  2: CREATE UNLOGGED TABLE staging_sales (LIKE sales INCLUDING DEFAULTS);
  3:
  4: -- 2. Bulk-load via COPY instead of many INSERTs.
  5: COPY staging_sales FROM '/var/lib/postgresql/import/sales.csv'
  6:    WITH (FORMAT csv, HEADER);
  7:
  8: -- 3. Build indexes only after the load — once, not per row.
  9: CREATE INDEX ix_sales_customer ON staging_sales (customer_id);
 10:
 11: -- 4. Refresh planner statistics, or bad plans loom.
 12: ANALYZE staging_sales;

A word on the file path that otherwise costs an hour of troubleshooting: COPY … FROM 'file' runs server-side — the file must reside on the database server and be readable by the Postgres process. If you load from your own workstation, you use \copy in psql instead; it reads the file client-side and sends it over the connection. Same syntax, but \copy needs no server access to the file system. The mix-up is the most common stumbling block on a first flatfile load.

The flatfile bridge itself consists of two commands. On the SQL Server side, bcp exports:

powershell

bcp "SELECT customer_id, region_id, email, credit_limit, created_at FROM dbo.customer" `
    queryout customer.csv -c -t"|" -C 65001 -S localhost -d SourceDb -T

powershell

bcp "SELECT customer_id, region_id, email, credit_limit, created_at FROM dbo.customer" `
    queryout customer.csv -c -t"|" -C 65001 -S localhost -d SourceDb -T

On the PostgreSQL side, COPY reads it back in:

  1: -- Server-side: the file must reside on the DB server.
  2: COPY customer (customer_id, region_id, email, credit_limit, created_at)
  3: FROM '/var/lib/postgresql/import/customer.csv'
  4:    WITH (FORMAT csv, NULL '', ENCODING 'UTF8', DELIMITER '|');

  1: -- Server-side: the file must reside on the DB server.
  2: COPY customer (customer_id, region_id, email, credit_limit, created_at)
  3: FROM '/var/lib/postgresql/import/customer.csv'
  4:    WITH (FORMAT csv, NULL '', ENCODING 'UTF8', DELIMITER '|');

There are three spots where the flatfile bridge keeps tipping over:

Encoding — bcp -w writes UTF-16, which COPY with ENCODING 'UTF8' can’t read; hence -c -C 65001 for UTF-8 above.
NULL marker — bcp writes an empty string for NULL, which COPY has to interpret back as NULL via NULL ''.
Delimiter inside text fields — a comma as the separator tips over the moment an address itself contains a comma; hence | instead of , above.

What pgloader Handles — and Where It Ends

pgloader is the tool that takes the most manual work off your hands. A single command — or a small configuration file — connects to the MSSQL source and the PostgreSQL target database and does, in one run, depending on the enabled WITH options: create the target tables, cast the data types, load the data via COPY, build indexes and foreign keys, and — the bonus over manual work — reset the sequences.

A typical configuration file looks like this:

LOAD DATABASE
     FROM     mssql://migrator@mssql-host/SourceDb
     INTO postgresql://migrator@pg-host/targetdb

 WITH include drop,
      create tables,
      create indexes,
      reset sequences,
      foreign keys,
      downcase identifiers,
      workers = 8, concurrency = 1

  SET maintenance_work_mem to '512MB',
      work_mem to '64MB'

 CAST type datetime to timestamptz,
      type money    to numeric;

LOAD DATABASE
     FROM     mssql://migrator@mssql-host/SourceDb
     INTO postgresql://migrator@pg-host/targetdb

 WITH include drop,
      create tables,
      create indexes,
      reset sequences,
      foreign keys,
      downcase identifiers,
      workers = 8, concurrency = 1

  SET maintenance_work_mem to '512MB',
      work_mem to '64MB'

 CAST type datetime to timestamptz,
      type money    to numeric;

The WITH options are the strength: reset sequences handles the sequence reset that’s so easy to forget by hand; downcase identifiers lowercases the identifiers (CustomerOrders becomes customerorders — not automatic snake_case, but exactly the quoting-free lowercase Postgres idiomatically expects); include drop makes the run repeatable. The CAST rules step in where the default conversion doesn’t fit.

So much for the happy path. There are, however, four spots where pgloader hits its limits — things it simply isn’t made for:

Procedural logic is left behind. Stored procedures, functions, and triggers pgloader does not migrate. That’s a discipline of its own and the topic of the article on code porting.
Type edge cases need their own CAST rules. The tricky conversions — money, datetime2, uniqueidentifier, bit — the automation doesn’t always get the way you want. Which types convert cleanly and which break is covered by the article on data type mapping.
Performance tuning stays manual. pgloader loads serviceably, but the fine-tuning of very large tables (partitioning, parallel workers, storage parameters) it won’t do for you.
Complex schema decisions it ignores. The IDENTITY mechanism, named constraints, or the loss of the clustered index pgloader decides by rote — the deliberate decisions there belong to schema migration.

In short: pgloader is an excellent starting point, not a complete solution. For the bulk of homogeneous tables it saves hours; the demanding 20 percent stay manual.

After the Load: Don’t Forget

The load is through, the rows are in — and here is where many migrations stop one step too early. Four follow-ups belong to the job:

Sequence reset. If you loaded the old key values along, you have to lift the identity counter to the highest assigned value, or the first production INSERT collides. pgloader does this automatically with reset sequences; with bcp+COPY and the CSV bridge it’s manual. The mechanism (setval(pg_get_serial_sequence(...))) and the trap behind it are described in detail in the article on schema migration.
Revalidate constraints. If foreign keys were disabled or created NOT VALID for the fast load, they then have to be checked with ALTER TABLE … VALIDATE CONSTRAINT … — otherwise the database carries unverified references.
ANALYZE at the database level. Run ANALYZE (or vacuumdb --analyze) once over the whole loaded data set, so the planner has fresh statistics for all tables.
A first plausibility check. A quick row reconciliation (count(*) source against target) catches gross load errors right away. The real, dependable verification — checksums, sampling, data quality — is its own step and gets its own article.

FAQ

pgloader or bcp + COPY — which should I take?

For the bulk of homogeneous tables, pgloader: it creates the schema, casts the types, loads, and resets the sequences — in one run. bcp + COPY pays off when you need full control over format, delimiter, and encoding, or when the source isn’t directly reachable and a flatfile arises anyway. Often you combine the two: pgloader for 80 percent, the flatfile bridge for the special cases.

How much faster is COPY than INSERT?

Typically by an order of magnitude — on large tables often more. The reason: each individual INSERT carries the overhead of parsing, planning, and writing WAL, while COPY streams the rows in one go. Replaying a dump of millions of individual INSERT statements means waiting needlessly; the same data as COPY is loaded in a fraction of the time.

Do I need downtime for the migration?

For the simple route, yes: a downtime window in which the source is no longer written to guarantees a consistent state. How long the window has to be depends on data volume and method. Migrating on a running system (logical replication, CDC, dual-write) cuts downtime to near zero, but is considerably more involved — that’s its own topic and gets a separate article.

COPY or \copy — what’s the difference?

COPY is a SQL command and runs server-side: the file path refers to the database server’s file system, and the Postgres process must be allowed to read the file. \copy is a psql meta-command and runs client-side: it reads the file on your machine and sends it over the connection. If you see “Permission denied” or “No such file” on COPY even though the file is right there locally, you almost always wanted \copy.

What do I do with very large tables (hundreds of millions of rows)?

Pull the load into an UNLOGGED staging table without indexes, then build indexes and constraints once over the finished data set and run ANALYZE. Set maintenance_work_mem generously for the index build. If the source can be split into key ranges, you can run several COPY streams in parallel. And plan realistically: at this size, the index recreate time is its own significant block in the overall budget.

This article is part of a series on migrating SQL Server to PostgreSQL. The remaining parts:

Overview: Data Migration SQL Server to PostgreSQL — the Complete Guide
Data types: Data Type Mapping SQL Server → PostgreSQL — What Converts Cleanly and What Breaks
Schema: Schema Migration SQL Server → PostgreSQL — Identity, Constraints, Defaults, Sequences
Code porting: Porting T-SQL to PL/pgSQL — Migrating Procedures and Functions
Verification: Verifying the Migration — Data Quality and Row Reconciliation
Low-downtime: Cutover strategies for a running database (coming)

On data transfer as an ETL step:

Transferring Data: bcp, COPY, pgloader, ETL — Which Method When

Contents

Load Data into PostgreSQL: the Four Routes

Decision Matrix: Volume, Downtime, Heterogeneity

COPY Beats INSERT — the Load Performance Levers

What pgloader Handles — and Where It Ends

After the Load: Don’t Forget

FAQ

Contents

Load Data into PostgreSQL: the Four Routes

Decision Matrix: Volume, Downtime, Heterogeneity

COPY Beats INSERT — the Load Performance Levers

What pgloader Handles — and Where It Ends

After the Load: Don’t Forget

FAQ

Related Articles