PDM Vault Data Migration Tool

A collection of Python scripts for migrating and managing SOLIDWORKS PDM Professional vault data. The project began as a pair of SQL-to-SQL migration scripts (for folder and file variable values) and has grown to include a suite of PDM API helpers for batch workflow transitions, Copy Tree exports, and interactive SQL tasks.

Overview

The project is split into two layers:

  1. Top-level migration scripts — one-shot SQL migrations between source and target PDM databases (migrate_folderdata.py, migrate_filedata.py, rollback_filedata.py), plus verification utilities (check_var_clashing.py, check_paths.py).
  2. helpers/ toolkit — a newer set of scripts for live vault operations and ad-hoc SQL work:
    • Batch workflow state transitions via the PDM COM API
    • Batch "Copy Tree" exports
    • Interactive SQL helper with named query files and preview-and-confirm on every write

Prerequisites

  • Python 3.x (3.12 tested)
  • SQL Server access to both source and target databases (ODBC Driver 17+)
  • SOLIDWORKS PDM Professional client (for helpers/batch_* scripts only — these use the local ConisioLib.EdmVault COM component)
  • Required packages (see requirements.txt):
    • pyodbc — SQL Server connections
    • pywin32 — PDM COM API via win32com.client / pythoncom
    • comtypes — vtable-level COM calls (used for ChangeState3)

Install dependencies:

pip install -r requirements.txt

Project Structure

data_migration_project/
├── config.json                 # Your real config (DB credentials, mappings)
├── config.json.template        # Template for new installs — copy to config.json
├── db_utils.py                 # Shared SQL Server connection wrapper
├── migrate_folderdata.py       # Folder/project variable value migration
├── migrate_filedata.py         # File variable value migration (latest revision)
├── rollback_filedata.py        # Rolls back a filedata migration
├── check_var_clashing.py       # Finds variable name conflicts before migration
├── check_paths.py              # Verifies folder path mapping
├── requirements.txt            # Python dependencies
├── BATCH_NOTES.md              # Deep-dive on PDM COM ChangeState3 internals
├── README.md                   # This file
│
├── helpers/
│   ├── batch_workflows_paths.py   # Batch workflow state transitions
│   ├── batch_copy_tree.py         # Batch Copy Tree export
│   ├── db_helper.py               # Interactive SQL helper + tasks
│   ├── test_batch_api.py          # Dev-time prototype for PDM COM bridging
│   └── queries/                   # Reusable named SQL queries (.sql files)
│
├── documentation/              # PDM API reference (.chm files)
└── logs/                       # Migration log files

Configuration Setup

1. Create config.json

Copy the template and fill in your real values:

cp config.json.template config.json

The project supports both SQL Authentication and Windows Authentication:

{
  "source_db": {
    "driver": "{ODBC Driver 17 for SQL Server}",
    "server": "your-server",
    "database": "source-database",
    "username": "sql-user",
    "password": "sql-password",
    "trusted_connection": false
  },
  "target_db": {
    "driver": "{ODBC Driver 17 for SQL Server}",
    "server": "your-server",
    "database": "target-database",
    "trusted_connection": true
  }
}
  • Set trusted_connection: true to use Windows Authentication (ignores username/password).
  • Set trusted_connection: false and provide username/password for SQL Server Authentication.

2. Path Mapping (for folder data)

{
  "path_mapping": {
    "target_root_folder": "DWS",
    "case_sensitive": false
  }
}
  • target_root_folder: Root folder name in your target vault. Source folder paths are prepended with this folder name.

3. Migration Settings

{
  "migration": {
    "duplicate_handling": "ignore",
    "batch_size": 500,
    "commit_interval": 10,
    "document_status_batch_size": 5000
  }
}

4. Configuration Mapping Overrides (file data migration only)

{
  "configuration_mapping_overrides": {
    "165": 11250,
    "167": 11359
  }
}

Required when duplicate configuration names exist in the target database. Format: "source_id": target_id.

To find duplicates:

SELECT ConfigurationID, ConfigurationName
FROM DocumentConfiguration
WHERE ConfigurationName IN (
    SELECT ConfigurationName
    FROM DocumentConfiguration
    GROUP BY ConfigurationName
    HAVING COUNT(*) > 1
)
ORDER BY ConfigurationName, ConfigurationID;

Top-Level Migration Scripts

Folder Data Migration

Migrates variable values for folders/projects (DocumentID = 1).

python migrate_folderdata.py

What it does:

  • Maps ProjectIDs based on folder paths
  • Maps VariableIDs based on variable names
  • Migrates all revisions of folder-level variable values
  • Validates the migration
  • Creates mapping CSV files for review

Output files:

  • mapping_projects_{timestamp}.csv
  • mapping_variables_{timestamp}.csv
  • folderdata_migration_{timestamp}.log
  • validation_missing_folderdata_{timestamp}.csv (if issues)

File Data Migration

Migrates variable values for files (ProjectID = 2, DocumentID != 1).

IMPORTANT: Only migrates the latest revision of each variable for each file configuration.

python migrate_filedata.py

What it does:

  • Maps VariableIDs by name, DocumentIDs by full file path, ConfigurationIDs (with manual overrides from config.json if specified)
  • Fetches only the latest revision per VariableID+DocumentID+ConfigurationID
  • Inserts all records with RevisionNo = 1
  • Validates the migration and emits mapping CSVs

You will be prompted to confirm configuration mapping overrides before the migration runs — review carefully.

Output files:

  • mapping_variables_filedata_{timestamp}.csv
  • mapping_documents_filedata_{timestamp}.csv
  • filedata_migration_{timestamp}.log
  • validation_missing_filedata_{timestamp}.csv (if issues)
  • Progress files (auto-deleted on success)

Rollback File Data Migration

python rollback_filedata.py mapping_documents_filedata_YYYYMMDD_HHMMSS.csv

Reads the document mapping CSV from a previous migration and deletes all VariableValue records for those documents from the target database. Shows preview and prompts for confirmation before deleting.

WARNING: This permanently deletes data from the target database. Always back up first.


Helper Scripts (helpers/)

The helpers are live-vault tools that talk to PDM directly via the COM API, plus an interactive SQL runner. They are independent of the top-level migration scripts and can be used any time.

batch_workflows_paths.py — Batch Workflow Transitions

Drives IEdmFile13::ChangeState3 against hundreds or thousands of files at once, transitioning each through a named workflow transition. Implements escalating-backoff retries and vault reconnect to handle PDM's in-process DLL state corruption on large batches.

python helpers/batch_workflows_paths.py -v "Drilling_Test" -c files.csv -t "AA"

Options:

  • -v, --vault — PDM vault name
  • -c, --csv — Path to a text/CSV file with one full vault path per line (e.g. C:\PDM\Drilling_Test\DWS\Parts\widget.sldprt)
  • -t, --transition — Name of the workflow transition (e.g. "AA")
  • --comment — Optional transition comment
  • -u, --username — PDM username (prompts if omitted)

Output files:

  • batch_workflow_paths_{timestamp}.log — detailed log
  • failed_transitions_{timestamp}.txt — real failures worth retrying
  • not_available_{timestamp}.txt — files whose transition wasn't valid (typically already in the target state from a prior run)

For implementation details on the restricted ChangeState3 COM method and why it requires ctypes/comtypes vtable access, see BATCH_NOTES.md.


batch_copy_tree.py — Batch Copy Tree Export

Reads part numbers from a CSV, runs PDM's Copy Tree function for each, and exports each part's file tree to its own subfolder.

python helpers/batch_copy_tree.py -c parts.csv -o "C:\Temp\Output" --vault "Drilling_Test"

db_helper.py — Interactive SQL Helper

Runs SELECT queries, multi-step tasks, and confirmed INSERTs against either database from config.json. Queries are stored as .sql files in helpers/queries/ and referenced by name.

List saved queries:

python helpers/db_helper.py --list-queries

Run a saved query by name:

python helpers/db_helper.py --db target_db --query get_var47

Run raw SQL (anything with a space in it is treated as a literal query):

python helpers/db_helper.py --db target_db --query "SELECT TOP 10 * FROM Documents"

Run a predefined task:

python helpers/db_helper.py --db target_db --task copy_57_to_50 --dry-run
python helpers/db_helper.py --db target_db --task copy_57_to_50

Safety features

  • Every INSERT or UPDATE goes through preview_and_confirm — you see the SQL, the row count, and a sample of the data and must type y before it executes.
  • --dry-run shows the preview but skips execution entirely.
  • All writes run inside a transaction. On any per-row error you're asked whether to commit or rollback.
  • Every query, parameter set, and decision is logged to db_helper_{timestamp}.log.

Saved SQL Queries

Drop a .sql file into helpers/queries/ and it becomes callable by its filename (without extension). Leave a comment on the first line for an inline description — it shows up in --list-queries.

Current queries:

  • DWS_GET_VV-57.sql — Documents in DWS paths that have VariableID=57
  • DWS_VV-57_FullList.sql — Full VariableValue rows for VV-57 in DWS paths
  • Get_All_VV_Per_DocID.sql — All distinct VariableIDs for a given DocumentID (parameterized with ?)
  • INSERT_VV50_Copy.sql — Inserts a VV-50 copy of a VV-57 row

Tasks

Tasks are Python functions in db_helper.py that chain multiple queries and transforms together — e.g. run a SELECT, loop the results, run a second parameterized SELECT per row, validate, then INSERT filtered rows with confirmation.

Each task is registered in TASK_REGISTRY near the bottom of the file. Current tasks:

Task Purpose
check_vv50 For every doc with VV-57, check whether it also has VV-50. Writes has_vv50_{timestamp}.txt.
copy_57_to_50 Insert VV-50 rows mirroring existing VV-57 rows, skipping any DocumentIDs already in a has_vv50_*.txt file.
copy_with_new_id Example/template task — copy rows with a transformed ID.

Adding a new task: write a function def task_foo(db, args): ... and add it to TASK_REGISTRY. The building blocks run_select, load_query, preview_and_confirm, and run_insert are all at the top of the file.


Understanding the Logs

Migration Progress

Processing batch 10/100 (500 records)...
Batch 10 complete: inserted=450, updated=50, errors=0
[COMMIT] Transaction committed at batch 10

Validation Results

==================================================
$  Migration Validation Completed!
==================================================
Gross Success rate: 95.39%
Success rate w/o Ignored Files: 100.00%
371630 of 397043 Rows were found
--------------------------------------------------
MISSING ROW COUNT: 0 - See CSV output for details
We ignored a total of 25413 rows. We couldn't map these to the TargetDB
  • Gross Success rate — % of all source records found in target
  • Success rate w/o Ignored Files — % of mappable records found (should be 100%)
  • MISSING ROW COUNT — Records that should exist but don't (should be 0)
  • Ignored — Records that couldn't be mapped (unmapped variables, documents, or configurations)

Important Notes

File Data Migration Behavior

  1. Only Latest Revisions — File data migration only migrates the most recent revision of each variable for each file configuration. Historical revisions are not migrated.
  2. RevisionNo Reset — All migrated file data is inserted with RevisionNo = 1 in the target database.
  3. Configuration Mapping — You MUST verify manual overrides in config.json before running.

Progress Tracking and Resume

Both migration scripts support automatic resume:

  • Progress is saved every 10 batches.
  • If a migration fails, re-run the script and it will offer to resume.
  • Progress files are automatically cleaned up on success.

Validation

All migrations include automatic validation:

  • Compares source records (after mapping) to target records using set-based comparison.
  • Reports any missing records to CSV.
  • Should show 100% success rate for mappable records.

Troubleshooting

"Migration failed at batch X"

Check the log file, then re-run and choose y to resume from the last checkpoint.

"We ignored a total of X rows"

Expected for unmapped variables, documents, or configurations. Check the mapping CSV files to see what was skipped.

"MISSING ROW COUNT: X" (where X > 0)

Indicates a real problem:

  1. Check validation_missing_*.csv for details.
  2. Verify ID mappings in the mapping CSV files.
  3. Check the migration log for insert errors.

Configuration Mapping Issues

If you see warnings about duplicate ConfigurationNames:

  1. Run the SQL query above to find duplicates.
  2. Determine the correct target ID for each source configuration.
  3. Add manual overrides to config.json.
  4. Re-run the migration.

Database Connection Timeouts

  • Progress is saved automatically — re-run to resume.
  • Consider reducing batch_size in config.json.

Batch Workflow Transition Failures

If you see [CS3] Phase-2 access violation ... warnings in batch_workflow_paths_*.log:

  • The script automatically retries with escalating backoff (3s → 10s → 30s).
  • After 3 consecutive persistent failures it automatically reconnects the vault to reset PDM's in-process DLL state.
  • Genuine failures end up in failed_transitions_{timestamp}.txt — feed that file straight back in to retry just the failures.
  • Files that appear in not_available_{timestamp}.txt aren't really failures; they were already in the target state (e.g. from a previous successful run).

See BATCH_NOTES.md for full background on why ChangeState3 is difficult to call and how the COM bridging works.

Best Practices

  1. Always back up the target database before running migrations.
  2. Test on a dev/test environment first.
  3. Review mapping CSV files to verify ID mappings are correct.
  4. Check validation results — 100% success for mappable records.
  5. Keep config.json with any manual overrides for future reference.
  6. Use --dry-run with db_helper.py tasks before real runs.
  7. Save the has_vv50_*.txt / failed_transitions_*.txt output files — they let you incrementally mop up residual work without re-processing everything.

Support

For issues or questions:

  1. Check the log files for detailed error messages.
  2. Review the mapping CSV files to verify ID mappings.
  3. Ensure config.json is properly configured.
  4. Verify database connectivity and permissions.
  5. For PDM COM API internals, see BATCH_NOTES.md.
Description
No description provided
Readme 8.1 MiB
Languages
Python 98.8%
TSQL 1.2%