# PDM Vault Data Migration Tool

A collection of Python scripts for migrating and managing SOLIDWORKS PDM Professional vault data. The project began as a pair of SQL-to-SQL migration scripts (for folder and file variable values) and has grown to include a suite of PDM API helpers for batch workflow transitions, Copy Tree exports, and interactive SQL tasks.

## Overview

The project is split into two layers:

1. **Top-level migration scripts**: one-shot SQL migrations between source and target PDM databases (`migrate_folderdata.py`, `migrate_filedata.py`, `rollback_filedata.py`), plus verification utilities (`check_var_clashing.py`, `check_paths.py`).
2. **`helpers/` toolkit**: a newer set of scripts for live vault operations and ad-hoc SQL work:
   - Batch workflow state transitions via the PDM COM API
   - Batch "Copy Tree" exports
   - Interactive SQL helper with named query files and preview-and-confirm on every write

## Prerequisites

- Python 3.x (3.12 tested)
- SQL Server access to both source and target databases (ODBC Driver 17+)
- SOLIDWORKS PDM Professional client (for `helpers/batch_*` scripts only; these use the local `ConisioLib.EdmVault` COM component)
- Required packages (see `requirements.txt`):
  - `pyodbc`: SQL Server connections
  - `pywin32`: PDM COM API via `win32com.client` / `pythoncom`
  - `comtypes`: vtable-level COM calls (used for `ChangeState3`)

Install dependencies:

```bash
pip install -r requirements.txt
```

## Project Structure

```
data_migration_project/
├── config.json                  # Your real config (DB credentials, mappings)
├── config.json.template         # Template for new installs; copy to config.json
├── db_utils.py                  # Shared SQL Server connection wrapper
├── migrate_folderdata.py        # Folder/project variable value migration
├── migrate_filedata.py          # File variable value migration (latest revision)
├── rollback_filedata.py         # Rolls back a filedata migration
├── check_var_clashing.py        # Finds variable name conflicts before migration
├── check_paths.py               # Verifies folder path mapping
├── requirements.txt             # Python dependencies
├── BATCH_NOTES.md               # Deep-dive on PDM COM ChangeState3 internals
├── README.md                    # This file
│
├── helpers/
│   ├── batch_workflows_paths.py # Batch workflow state transitions
│   ├── batch_copy_tree.py       # Batch Copy Tree export
│   ├── db_helper.py             # Interactive SQL helper + tasks
│   ├── test_batch_api.py        # Dev-time prototype for PDM COM bridging
│   └── queries/                 # Reusable named SQL queries (.sql files)
│
├── documentation/               # PDM API reference (.chm files)
└── logs/                        # Migration log files
```

## Configuration Setup

### 1. Create `config.json`

Copy the template and fill in your real values:

```bash
cp config.json.template config.json
```

The project supports both **SQL Authentication** and **Windows Authentication**:

```json
{
  "source_db": {
    "driver": "{ODBC Driver 17 for SQL Server}",
    "server": "your-server",
    "database": "source-database",
    "username": "sql-user",
    "password": "sql-password",
    "trusted_connection": false
  },
  "target_db": {
    "driver": "{ODBC Driver 17 for SQL Server}",
    "server": "your-server",
    "database": "target-database",
    "trusted_connection": true
  }
}
```

- Set `trusted_connection: true` to use Windows Authentication (ignores username/password).
- Set `trusted_connection: false` and provide `username`/`password` for SQL Server Authentication.

### 2. Path Mapping (for folder data)

```json
{
  "path_mapping": {
    "target_root_folder": "DWS",
    "case_sensitive": false
  }
}
```

- `target_root_folder`: Root folder name in your target vault. Source folder paths are prepended with this folder name.

### 3. Migration Settings

```json
{
  "migration": {
    "duplicate_handling": "ignore",
    "batch_size": 500,
    "commit_interval": 10,
    "document_status_batch_size": 5000
  }
}
```

### 4. Configuration Mapping Overrides (file data migration only)

```json
{
  "configuration_mapping_overrides": {
    "165": 11250,
    "167": 11359
  }
}
```

Required when duplicate configuration names exist in the target database. Format: `"source_id": target_id`.

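
Because JSON object keys are always strings, the overrides arrive as `{"165": 11250}` and their keys need converting before they can be compared with integer ConfigurationIDs. A minimal sketch of that normalization follows; `load_configuration_overrides` is a hypothetical helper name used for illustration, and the project's actual loader may differ.

```python
import json


def load_configuration_overrides(config: dict) -> dict[int, int]:
    """Return {source ConfigurationID: target ConfigurationID} with integer keys.

    Hypothetical helper for illustration; not taken from the project's code.
    """
    raw = config.get("configuration_mapping_overrides", {})
    overrides = {}
    for source_id, target_id in raw.items():
        # JSON keys are strings; int() raises ValueError on a malformed ID,
        # which surfaces typos before any rows are written.
        overrides[int(source_id)] = int(target_id)
    return overrides


# Same shape as the config.json fragment shown above.
sample = json.loads(
    '{"configuration_mapping_overrides": {"165": 11250, "167": 11359}}'
)
print(load_configuration_overrides(sample))
```

A config with no `configuration_mapping_overrides` section yields an empty mapping, so the section can stay optional.
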
**To find duplicates:**

```sql
SELECT ConfigurationID, ConfigurationName
FROM DocumentConfiguration
WHERE ConfigurationName IN (
    SELECT ConfigurationName
    FROM DocumentConfiguration
    GROUP BY ConfigurationName
    HAVING COUNT(*) > 1
)
ORDER BY ConfigurationName, ConfigurationID;
```

---

## Top-Level Migration Scripts

### Folder Data Migration

Migrates variable values for folders/projects (DocumentID = 1).

```bash
python migrate_folderdata.py
```

**What it does:**

- Maps ProjectIDs based on folder paths
- Maps VariableIDs based on variable names
- Migrates all revisions of folder-level variable values
- Validates the migration
- Creates mapping CSV files for review

**Output files:**

- `mapping_projects_{timestamp}.csv`
- `mapping_variables_{timestamp}.csv`
- `folderdata_migration_{timestamp}.log`
- `validation_missing_folderdata_{timestamp}.csv` (if issues)

---

### File Data Migration

Migrates variable values for files (ProjectID = 2, DocumentID != 1).

**IMPORTANT:** Only migrates the **latest revision** of each variable for each file configuration.

```bash
python migrate_filedata.py
```

**What it does:**

- Maps VariableIDs by name, DocumentIDs by full file path, ConfigurationIDs (with manual overrides from `config.json` if specified)
- Fetches only the latest revision per VariableID+DocumentID+ConfigurationID
- Inserts all records with `RevisionNo = 1`
- Validates the migration and emits mapping CSVs

**You will be prompted** to confirm configuration mapping overrides before the migration runs. Review them carefully.

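
The latest-revision rule amounts to keeping, for each (VariableID, DocumentID, ConfigurationID) key, the row with the highest RevisionNo, then writing it with `RevisionNo = 1`. The sketch below shows that selection in memory, assuming rows are plain tuples; the real script may well do this in SQL, and `VariableValueRow` and `latest_revisions` are hypothetical names, not the project's own.

```python
from typing import Iterable, NamedTuple


class VariableValueRow(NamedTuple):
    # Illustrative row shape; the real VariableValue table has more columns.
    variable_id: int
    document_id: int
    configuration_id: int
    revision_no: int
    value: str


def latest_revisions(rows: Iterable[VariableValueRow]) -> list[VariableValueRow]:
    """Keep the highest-revision row per key and reset its RevisionNo to 1."""
    best: dict[tuple[int, int, int], VariableValueRow] = {}
    for row in rows:
        key = (row.variable_id, row.document_id, row.configuration_id)
        if key not in best or row.revision_no > best[key].revision_no:
            best[key] = row
    # Historical revisions are dropped; survivors are renumbered to 1.
    return [row._replace(revision_no=1) for row in best.values()]


rows = [
    VariableValueRow(47, 1001, 3, 1, "draft"),
    VariableValueRow(47, 1001, 3, 4, "released"),
    VariableValueRow(50, 1001, 3, 2, "REV-B"),
]
for row in latest_revisions(rows):
    print(row)
```

Of the two revisions of variable 47 in the sample data, only the revision 4 value survives, which is why historical values cannot be recovered from the target after a file data migration.
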
**Output files:**

- `mapping_variables_filedata_{timestamp}.csv`
- `mapping_documents_filedata_{timestamp}.csv`
- `filedata_migration_{timestamp}.log`
- `validation_missing_filedata_{timestamp}.csv` (if issues)
- Progress files (auto-deleted on success)

---

### Rollback File Data Migration

```bash
python rollback_filedata.py mapping_documents_filedata_YYYYMMDD_HHMMSS.csv
```

Reads the document mapping CSV from a previous migration and deletes all VariableValue records for those documents from the target database. **Shows a preview and prompts for confirmation before deleting.**

> **WARNING:** This permanently deletes data from the target database.
> Always back up first.

---

## Helper Scripts (`helpers/`)

The helpers are live-vault tools that talk to PDM directly via the COM API, plus an interactive SQL runner. They are independent of the top-level migration scripts and can be used at any time.

### `batch_workflows_paths.py`: Batch Workflow Transitions

Drives `IEdmFile13::ChangeState3` against hundreds or thousands of files at once, moving each through a named workflow transition. Implements escalating-backoff retries and vault reconnect to handle PDM's in-process DLL state corruption on large batches.

```bash
python helpers/batch_workflows_paths.py -v "Drilling_Test" -c files.csv -t "AA"
```

**Options:**

- `-v, --vault`: PDM vault name
- `-c, --csv`: Path to a text/CSV file with one full vault path per line (e.g. `C:\PDM\Drilling_Test\DWS\Parts\widget.sldprt`)
- `-t, --transition`: Name of the workflow transition (e.g. `"AA"`)
- `--comment`: Optional transition comment
- `-u, --username`: PDM username (prompts if omitted)

**Output files:**

- `batch_workflow_paths_{timestamp}.log`: detailed log
- `failed_transitions_{timestamp}.txt`: real failures worth retrying
- `not_available_{timestamp}.txt`: files whose transition wasn't valid (typically already in the target state from a prior run)

For implementation details on the restricted `ChangeState3` COM method and why it requires ctypes/comtypes vtable access, see [BATCH_NOTES.md](BATCH_NOTES.md).

---

### `batch_copy_tree.py`: Batch Copy Tree Export

Reads part numbers from a CSV, runs PDM's Copy Tree function for each, and exports each part's file tree to its own subfolder.

```bash
python helpers/batch_copy_tree.py -c parts.csv -o "C:\Temp\Output" --vault "Drilling_Test"
```

---

### `db_helper.py`: Interactive SQL Helper

Runs SELECT queries, multi-step tasks, and confirmed INSERTs against either database from `config.json`. Queries are stored as `.sql` files in `helpers/queries/` and referenced by name.

**List saved queries:**

```bash
python helpers/db_helper.py --list-queries
```

**Run a saved query by name:**

```bash
python helpers/db_helper.py --db target_db --query get_var47
```

**Run raw SQL (anything with a space in it is treated as a literal query):**

```bash
python helpers/db_helper.py --db target_db --query "SELECT TOP 10 * FROM Documents"
```

**Run a predefined task:**

```bash
python helpers/db_helper.py --db target_db --task copy_57_to_50 --dry-run
python helpers/db_helper.py --db target_db --task copy_57_to_50
```

#### Safety features

- Every INSERT or UPDATE goes through `preview_and_confirm`: you see the SQL, the row count, and a sample of the data, and must type `y` before it executes.
- `--dry-run` shows the preview but skips execution entirely.
- All writes run inside a transaction. On any per-row error you're asked whether to commit or roll back.
- Every query, parameter set, and decision is logged to `db_helper_{timestamp}.log`.

#### Saved SQL Queries

Drop a `.sql` file into `helpers/queries/` and it becomes callable by its filename (without extension). Leave a comment on the first line for an inline description; it shows up in `--list-queries`.

Current queries:

- `DWS_GET_VV-57.sql`: Documents in DWS paths that have VariableID=57
- `DWS_VV-57_FullList.sql`: Full VariableValue rows for VV-57 in DWS paths
- `Get_All_VV_Per_DocID.sql`: All distinct VariableIDs for a given DocumentID (parameterized with `?`)
- `INSERT_VV50_Copy.sql`: Inserts a VV-50 copy of a VV-57 row

#### Tasks

Tasks are Python functions in `db_helper.py` that chain multiple queries and transforms together, e.g. run a SELECT, loop the results, run a second parameterized SELECT per row, validate, then INSERT filtered rows with confirmation. Each task is registered in `TASK_REGISTRY` near the bottom of the file.

Current tasks:

| Task | Purpose |
|------|---------|
| `check_vv50` | For every doc with VV-57, check whether it also has VV-50. Writes `has_vv50_{timestamp}.txt`. |
| `copy_57_to_50` | Insert VV-50 rows mirroring existing VV-57 rows, skipping any DocumentIDs already in a `has_vv50_*.txt` file. |
| `copy_with_new_id` | Example/template task: copy rows with a transformed ID. |

**Adding a new task:** write a function `def task_foo(db, args): ...` and add it to `TASK_REGISTRY`. The building blocks `run_select`, `load_query`, `preview_and_confirm`, and `run_insert` are all at the top of the file.

---

## Understanding the Logs

### Migration Progress

```
Processing batch 10/100 (500 records)...
Batch 10 complete: inserted=450, updated=50, errors=0
[COMMIT] Transaction committed at batch 10
```

### Validation Results

```
==================================================
$ Migration Validation Completed!
==================================================
Gross Success rate: 95.39%
Success rate w/o Ignored Files: 100.00%
371630 of 397043 Rows were found
--------------------------------------------------
MISSING ROW COUNT: 0 - See CSV output for details
We ignored a total of 25413 rows.
We couldn't map these to the TargetDB
```

- **Gross Success rate**: % of all source records found in target
- **Success rate w/o Ignored Files**: % of mappable records found (should be 100%)
- **MISSING ROW COUNT**: Records that should exist but don't (should be 0)
- **Ignored**: Records that couldn't be mapped (unmapped variables, documents, or configurations)

## Important Notes

### File Data Migration Behavior

1. **Only Latest Revisions**: File data migration only migrates the most recent revision of each variable for each file configuration. Historical revisions are not migrated.
2. **RevisionNo Reset**: All migrated file data is inserted with `RevisionNo = 1` in the target database.
3. **Configuration Mapping**: You MUST verify manual overrides in `config.json` before running.

### Progress Tracking and Resume

Both migration scripts support automatic resume:

- Progress is saved every 10 batches.
- If a migration fails, re-run the script and it will offer to resume.
- Progress files are automatically cleaned up on success.

### Validation

All migrations include automatic validation:

- Compares source records (after mapping) to target records using set-based comparison.
- Reports any missing records to CSV.
- Should show 100% success rate for mappable records.

## Troubleshooting

### "Migration failed at batch X"

Check the log file, then re-run and choose `y` to resume from the last checkpoint.

### "We ignored a total of X rows"

Expected for unmapped variables, documents, or configurations. Check the mapping CSV files to see what was skipped.

### "MISSING ROW COUNT: X" (where X > 0)

Indicates a real problem:

1. Check `validation_missing_*.csv` for details.
2. Verify ID mappings in the mapping CSV files.
3. Check the migration log for insert errors.

### Configuration Mapping Issues

If you see warnings about duplicate ConfigurationNames:

1. Run the SQL query above to find duplicates.
2. Determine the correct target ID for each source configuration.
3. Add manual overrides to `config.json`.
4. Re-run the migration.

### Database Connection Timeouts

- Progress is saved automatically; re-run to resume.
- Consider reducing `batch_size` in `config.json`.

### Batch Workflow Transition Failures

If you see `[CS3] Phase-2 access violation ...` warnings in `batch_workflow_paths_*.log`:

- The script automatically retries with escalating backoff (3s → 10s → 30s).
- After 3 consecutive persistent failures it automatically reconnects the vault to reset PDM's in-process DLL state.
- Genuine failures end up in `failed_transitions_{timestamp}.txt`; feed that file straight back in to retry just the failures.
- Files that appear in `not_available_{timestamp}.txt` aren't really failures; they were already in the target state (e.g. from a previous successful run).

See [BATCH_NOTES.md](BATCH_NOTES.md) for full background on why `ChangeState3` is difficult to call and how the COM bridging works.

## Best Practices

1. **Always back up** the target database before running migrations.
2. **Test on a dev/test environment first**.
3. **Review mapping CSV files** to verify ID mappings are correct.
4. **Check validation results**: 100% success for mappable records.
5. **Keep `config.json`** with any manual overrides for future reference.
6. **Use `--dry-run`** with `db_helper.py` tasks before real runs.
7. **Save the `has_vv50_*.txt` / `failed_transitions_*.txt` output files**: they let you incrementally mop up residual work without re-processing everything.

## Support

For issues or questions:

1. Check the log files for detailed error messages.
2. Review the mapping CSV files to verify ID mappings.
3. Ensure `config.json` is properly configured.
4. Verify database connectivity and permissions.
5. For PDM COM API internals, see [BATCH_NOTES.md](BATCH_NOTES.md).