PDM Vault Data Migration Tool
A collection of Python scripts for migrating and managing SOLIDWORKS PDM Professional vault data. The project began as a pair of SQL-to-SQL migration scripts (for folder and file variable values) and has grown to include a suite of PDM API helpers for batch workflow transitions, Copy Tree exports, and interactive SQL tasks.
Overview
The project is split into two layers:
- Top-level migration scripts — one-shot SQL migrations between source
and target PDM databases (
migrate_folderdata.py,migrate_filedata.py,rollback_filedata.py), plus verification utilities (check_var_clashing.py,check_paths.py). helpers/toolkit — a newer set of scripts for live vault operations and ad-hoc SQL work:- Batch workflow state transitions via the PDM COM API
- Batch "Copy Tree" exports
- Interactive SQL helper with named query files and preview-and-confirm on every write
Prerequisites
- Python 3.x (3.12 tested)
- SQL Server access to both source and target databases (ODBC Driver 17+)
- SOLIDWORKS PDM Professional client (for
helpers/batch_*scripts only — these use the localConisioLib.EdmVaultCOM component) - Required packages (see
requirements.txt):pyodbc— SQL Server connectionspywin32— PDM COM API viawin32com.client/pythoncomcomtypes— vtable-level COM calls (used forChangeState3)
Install dependencies:
pip install -r requirements.txt
Project Structure
data_migration_project/
├── config.json # Your real config (DB credentials, mappings)
├── config.json.template # Template for new installs — copy to config.json
├── db_utils.py # Shared SQL Server connection wrapper
├── migrate_folderdata.py # Folder/project variable value migration
├── migrate_filedata.py # File variable value migration (latest revision)
├── rollback_filedata.py # Rolls back a filedata migration
├── check_var_clashing.py # Finds variable name conflicts before migration
├── check_paths.py # Verifies folder path mapping
├── requirements.txt # Python dependencies
├── BATCH_NOTES.md # Deep-dive on PDM COM ChangeState3 internals
├── README.md # This file
│
├── helpers/
│ ├── batch_workflows_paths.py # Batch workflow state transitions
│ ├── batch_copy_tree.py # Batch Copy Tree export
│ ├── db_helper.py # Interactive SQL helper + tasks
│ ├── test_batch_api.py # Dev-time prototype for PDM COM bridging
│ └── queries/ # Reusable named SQL queries (.sql files)
│
├── documentation/ # PDM API reference (.chm files)
└── logs/ # Migration log files
Configuration Setup
1. Create config.json
Copy the template and fill in your real values:
cp config.json.template config.json
The project supports both SQL Authentication and Windows Authentication:
{
"source_db": {
"driver": "{ODBC Driver 17 for SQL Server}",
"server": "your-server",
"database": "source-database",
"username": "sql-user",
"password": "sql-password",
"trusted_connection": false
},
"target_db": {
"driver": "{ODBC Driver 17 for SQL Server}",
"server": "your-server",
"database": "target-database",
"trusted_connection": true
}
}
- Set
trusted_connection: trueto use Windows Authentication (ignores username/password). - Set
trusted_connection: falseand provideusername/passwordfor SQL Server Authentication.
2. Path Mapping (for folder data)
{
"path_mapping": {
"target_root_folder": "DWS",
"case_sensitive": false
}
}
target_root_folder: Root folder name in your target vault. Source folder paths are prepended with this folder name.
3. Migration Settings
{
"migration": {
"duplicate_handling": "ignore",
"batch_size": 500,
"commit_interval": 10,
"document_status_batch_size": 5000
}
}
4. Configuration Mapping Overrides (file data migration only)
{
"configuration_mapping_overrides": {
"165": 11250,
"167": 11359
}
}
Required when duplicate configuration names exist in the target database.
Format: "source_id": target_id.
To find duplicates:
SELECT ConfigurationID, ConfigurationName
FROM DocumentConfiguration
WHERE ConfigurationName IN (
SELECT ConfigurationName
FROM DocumentConfiguration
GROUP BY ConfigurationName
HAVING COUNT(*) > 1
)
ORDER BY ConfigurationName, ConfigurationID;
Top-Level Migration Scripts
Folder Data Migration
Migrates variable values for folders/projects (DocumentID = 1).
python migrate_folderdata.py
What it does:
- Maps ProjectIDs based on folder paths
- Maps VariableIDs based on variable names
- Migrates all revisions of folder-level variable values
- Validates the migration
- Creates mapping CSV files for review
Output files:
mapping_projects_{timestamp}.csvmapping_variables_{timestamp}.csvfolderdata_migration_{timestamp}.logvalidation_missing_folderdata_{timestamp}.csv(if issues)
File Data Migration
Migrates variable values for files (ProjectID = 2, DocumentID != 1).
IMPORTANT: Only migrates the latest revision of each variable for each file configuration.
python migrate_filedata.py
What it does:
- Maps VariableIDs by name, DocumentIDs by full file path, ConfigurationIDs
(with manual overrides from
config.jsonif specified) - Fetches only the latest revision per VariableID+DocumentID+ConfigurationID
- Inserts all records with
RevisionNo = 1 - Validates the migration and emits mapping CSVs
You will be prompted to confirm configuration mapping overrides before the migration runs — review carefully.
Output files:
mapping_variables_filedata_{timestamp}.csvmapping_documents_filedata_{timestamp}.csvfiledata_migration_{timestamp}.logvalidation_missing_filedata_{timestamp}.csv(if issues)- Progress files (auto-deleted on success)
Rollback File Data Migration
python rollback_filedata.py mapping_documents_filedata_YYYYMMDD_HHMMSS.csv
Reads the document mapping CSV from a previous migration and deletes all VariableValue records for those documents from the target database. Shows preview and prompts for confirmation before deleting.
WARNING: This permanently deletes data from the target database. Always back up first.
Helper Scripts (helpers/)
The helpers are live-vault tools that talk to PDM directly via the COM API, plus an interactive SQL runner. They are independent of the top-level migration scripts and can be used any time.
batch_workflows_paths.py — Batch Workflow Transitions
Drives IEdmFile13::ChangeState3 against hundreds or thousands of files at
once, transitioning each through a named workflow transition. Implements
escalating-backoff retries and vault reconnect to handle PDM's in-process
DLL state corruption on large batches.
python helpers/batch_workflows_paths.py -v "Drilling_Test" -c files.csv -t "AA"
Options:
-v, --vault— PDM vault name-c, --csv— Path to a text/CSV file with one full vault path per line (e.g.C:\PDM\Drilling_Test\DWS\Parts\widget.sldprt)-t, --transition— Name of the workflow transition (e.g."AA")--comment— Optional transition comment-u, --username— PDM username (prompts if omitted)
Output files:
batch_workflow_paths_{timestamp}.log— detailed logfailed_transitions_{timestamp}.txt— real failures worth retryingnot_available_{timestamp}.txt— files whose transition wasn't valid (typically already in the target state from a prior run)
For implementation details on the restricted ChangeState3 COM method and
why it requires ctypes/comtypes vtable access, see
BATCH_NOTES.md.
batch_copy_tree.py — Batch Copy Tree Export
Reads part numbers from a CSV, runs PDM's Copy Tree function for each, and exports each part's file tree to its own subfolder.
python helpers/batch_copy_tree.py -c parts.csv -o "C:\Temp\Output" --vault "Drilling_Test"
db_helper.py — Interactive SQL Helper
Runs SELECT queries, multi-step tasks, and confirmed INSERTs against either
database from config.json. Queries are stored as .sql files in
helpers/queries/ and referenced by name.
List saved queries:
python helpers/db_helper.py --list-queries
Run a saved query by name:
python helpers/db_helper.py --db target_db --query get_var47
Run raw SQL (anything with a space in it is treated as a literal query):
python helpers/db_helper.py --db target_db --query "SELECT TOP 10 * FROM Documents"
Run a predefined task:
python helpers/db_helper.py --db target_db --task copy_57_to_50 --dry-run
python helpers/db_helper.py --db target_db --task copy_57_to_50
Safety features
- Every INSERT or UPDATE goes through
preview_and_confirm— you see the SQL, the row count, and a sample of the data and must typeybefore it executes. --dry-runshows the preview but skips execution entirely.- All writes run inside a transaction. On any per-row error you're asked whether to commit or rollback.
- Every query, parameter set, and decision is logged to
db_helper_{timestamp}.log.
Saved SQL Queries
Drop a .sql file into helpers/queries/ and it becomes callable by its
filename (without extension). Leave a comment on the first line for an
inline description — it shows up in --list-queries.
Current queries:
DWS_GET_VV-57.sql— Documents in DWS paths that have VariableID=57DWS_VV-57_FullList.sql— Full VariableValue rows for VV-57 in DWS pathsGet_All_VV_Per_DocID.sql— All distinct VariableIDs for a given DocumentID (parameterized with?)INSERT_VV50_Copy.sql— Inserts a VV-50 copy of a VV-57 row
Tasks
Tasks are Python functions in db_helper.py that chain multiple queries
and transforms together — e.g. run a SELECT, loop the results, run a
second parameterized SELECT per row, validate, then INSERT filtered rows
with confirmation.
Each task is registered in TASK_REGISTRY near the bottom of the file.
Current tasks:
| Task | Purpose |
|---|---|
check_vv50 |
For every doc with VV-57, check whether it also has VV-50. Writes has_vv50_{timestamp}.txt. |
copy_57_to_50 |
Insert VV-50 rows mirroring existing VV-57 rows, skipping any DocumentIDs already in a has_vv50_*.txt file. |
copy_with_new_id |
Example/template task — copy rows with a transformed ID. |
Adding a new task: write a function def task_foo(db, args): ... and
add it to TASK_REGISTRY. The building blocks run_select, load_query,
preview_and_confirm, and run_insert are all at the top of the file.
Understanding the Logs
Migration Progress
Processing batch 10/100 (500 records)...
Batch 10 complete: inserted=450, updated=50, errors=0
[COMMIT] Transaction committed at batch 10
Validation Results
==================================================
$ Migration Validation Completed!
==================================================
Gross Success rate: 95.39%
Success rate w/o Ignored Files: 100.00%
371630 of 397043 Rows were found
--------------------------------------------------
MISSING ROW COUNT: 0 - See CSV output for details
We ignored a total of 25413 rows. We couldn't map these to the TargetDB
- Gross Success rate — % of all source records found in target
- Success rate w/o Ignored Files — % of mappable records found (should be 100%)
- MISSING ROW COUNT — Records that should exist but don't (should be 0)
- Ignored — Records that couldn't be mapped (unmapped variables, documents, or configurations)
Important Notes
File Data Migration Behavior
- Only Latest Revisions — File data migration only migrates the most recent revision of each variable for each file configuration. Historical revisions are not migrated.
- RevisionNo Reset — All migrated file data is inserted with
RevisionNo = 1in the target database. - Configuration Mapping — You MUST verify manual overrides in
config.jsonbefore running.
Progress Tracking and Resume
Both migration scripts support automatic resume:
- Progress is saved every 10 batches.
- If a migration fails, re-run the script and it will offer to resume.
- Progress files are automatically cleaned up on success.
Validation
All migrations include automatic validation:
- Compares source records (after mapping) to target records using set-based comparison.
- Reports any missing records to CSV.
- Should show 100% success rate for mappable records.
Troubleshooting
"Migration failed at batch X"
Check the log file, then re-run and choose y to resume from the last
checkpoint.
"We ignored a total of X rows"
Expected for unmapped variables, documents, or configurations. Check the mapping CSV files to see what was skipped.
"MISSING ROW COUNT: X" (where X > 0)
Indicates a real problem:
- Check
validation_missing_*.csvfor details. - Verify ID mappings in the mapping CSV files.
- Check the migration log for insert errors.
Configuration Mapping Issues
If you see warnings about duplicate ConfigurationNames:
- Run the SQL query above to find duplicates.
- Determine the correct target ID for each source configuration.
- Add manual overrides to
config.json. - Re-run the migration.
Database Connection Timeouts
- Progress is saved automatically — re-run to resume.
- Consider reducing
batch_sizeinconfig.json.
Batch Workflow Transition Failures
If you see [CS3] Phase-2 access violation ... warnings in
batch_workflow_paths_*.log:
- The script automatically retries with escalating backoff (3s → 10s → 30s).
- After 3 consecutive persistent failures it automatically reconnects the vault to reset PDM's in-process DLL state.
- Genuine failures end up in
failed_transitions_{timestamp}.txt— feed that file straight back in to retry just the failures. - Files that appear in
not_available_{timestamp}.txtaren't really failures; they were already in the target state (e.g. from a previous successful run).
See BATCH_NOTES.md for full background on why
ChangeState3 is difficult to call and how the COM bridging works.
Best Practices
- Always back up the target database before running migrations.
- Test on a dev/test environment first.
- Review mapping CSV files to verify ID mappings are correct.
- Check validation results — 100% success for mappable records.
- Keep
config.jsonwith any manual overrides for future reference. - Use
--dry-runwithdb_helper.pytasks before real runs. - Save the
has_vv50_*.txt/failed_transitions_*.txtoutput files — they let you incrementally mop up residual work without re-processing everything.
Support
For issues or questions:
- Check the log files for detailed error messages.
- Review the mapping CSV files to verify ID mappings.
- Ensure
config.jsonis properly configured. - Verify database connectivity and permissions.
- For PDM COM API internals, see BATCH_NOTES.md.