Initial Commit of the PDM project (ready for DWS migration)

will
2026-04-20 08:42:38 -05:00
commit dda7b664e7
2721 changed files with 442772 additions and 0 deletions


@@ -0,0 +1,469 @@
# PDM Vault Data Migration Tool
A collection of Python scripts for migrating and managing SOLIDWORKS PDM
Professional vault data. The project began as a pair of SQL-to-SQL migration
scripts (for folder and file variable values) and has grown to include a
suite of PDM API helpers for batch workflow transitions, Copy Tree exports,
and interactive SQL tasks.
## Overview
The project is split into two layers:
1. **Top-level migration scripts** — one-shot SQL migrations between source
and target PDM databases (`migrate_folderdata.py`, `migrate_filedata.py`,
`rollback_filedata.py`), plus verification utilities
(`check_var_clashing.py`, `check_paths.py`).
2. **`helpers/` toolkit** — a newer set of scripts for live vault operations
and ad-hoc SQL work:
- Batch workflow state transitions via the PDM COM API
- Batch "Copy Tree" exports
- Interactive SQL helper with named query files and preview-and-confirm
on every write
## Prerequisites
- Python 3.x (3.12 tested)
- SQL Server access to both source and target databases (ODBC Driver 17+)
- SOLIDWORKS PDM Professional client (for `helpers/batch_*` scripts only —
these use the local `ConisioLib.EdmVault` COM component)
- Required packages (see `requirements.txt`):
- `pyodbc` — SQL Server connections
- `pywin32` — PDM COM API via `win32com.client` / `pythoncom`
- `comtypes` — vtable-level COM calls (used for `ChangeState3`)
Install dependencies:
```bash
pip install -r requirements.txt
```
## Project Structure
```
data_migration_project/
├── config.json # Your real config (DB credentials, mappings)
├── config.json.template # Template for new installs — copy to config.json
├── db_utils.py # Shared SQL Server connection wrapper
├── migrate_folderdata.py # Folder/project variable value migration
├── migrate_filedata.py # File variable value migration (latest revision)
├── rollback_filedata.py # Rolls back a filedata migration
├── check_var_clashing.py # Finds variable name conflicts before migration
├── check_paths.py # Verifies folder path mapping
├── requirements.txt # Python dependencies
├── BATCH_NOTES.md # Deep-dive on PDM COM ChangeState3 internals
├── README.md # This file
├── helpers/
│ ├── batch_workflows_paths.py # Batch workflow state transitions
│ ├── batch_copy_tree.py # Batch Copy Tree export
│ ├── db_helper.py # Interactive SQL helper + tasks
│ ├── test_batch_api.py # Dev-time prototype for PDM COM bridging
│ └── queries/ # Reusable named SQL queries (.sql files)
├── documentation/ # PDM API reference (.chm files)
└── logs/ # Migration log files
```
## Configuration Setup
### 1. Create `config.json`
Copy the template and fill in your real values:
```bash
cp config.json.template config.json
```
The project supports both **SQL Authentication** and **Windows Authentication**:
```json
{
"source_db": {
"driver": "{ODBC Driver 17 for SQL Server}",
"server": "your-server",
"database": "source-database",
"username": "sql-user",
"password": "sql-password",
"trusted_connection": false
},
"target_db": {
"driver": "{ODBC Driver 17 for SQL Server}",
"server": "your-server",
"database": "target-database",
"trusted_connection": true
}
}
```
- Set `trusted_connection: true` to use Windows Authentication (ignores
username/password).
- Set `trusted_connection: false` and provide `username`/`password` for SQL
Server Authentication.
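
As an illustration of how the two authentication modes could translate into an ODBC connection string, here is a minimal sketch. The function name `build_connection_string` is hypothetical and is not taken from `db_utils.py`; only the config keys shown above are assumed.

```python
def build_connection_string(db_cfg: dict) -> str:
    """Build an ODBC connection string from one config.json database entry."""
    parts = [
        f"DRIVER={db_cfg['driver']}",
        f"SERVER={db_cfg['server']}",
        f"DATABASE={db_cfg['database']}",
    ]
    if db_cfg.get("trusted_connection", False):
        # Windows Authentication: username/password are ignored.
        parts.append("Trusted_Connection=yes")
    else:
        # SQL Server Authentication: both credentials are required.
        parts.append(f"UID={db_cfg['username']}")
        parts.append(f"PWD={db_cfg['password']}")
    return ";".join(parts)


target = {
    "driver": "{ODBC Driver 17 for SQL Server}",
    "server": "your-server",
    "database": "target-database",
    "trusted_connection": True,
}
print(build_connection_string(target))
```

The resulting string can be passed to `pyodbc.connect(...)`.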
### 2. Path Mapping (for folder data)
```json
{
"path_mapping": {
"target_root_folder": "DWS",
"case_sensitive": false
}
}
```
- `target_root_folder`: Root folder name in your target vault. Source folder
paths are prepended with this folder name.
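
The prepending rule can be sketched as follows. This is an assumption-laden illustration, not the project's code: `map_source_path` and `path_key` are hypothetical names, and the sketch assumes vault-relative paths with leading and trailing backslashes.

```python
def map_source_path(source_path: str, target_root_folder: str) -> str:
    """Prepend the target root folder to a vault-relative source path."""
    # Normalise separators and drop empty segments (assumed path format).
    segments = [s for s in source_path.replace("/", "\\").split("\\") if s]
    return "\\" + "\\".join([target_root_folder, *segments]) + "\\"


def path_key(path: str, case_sensitive: bool) -> str:
    """Key used when comparing a mapped path against target paths."""
    return path if case_sensitive else path.lower()


mapped = map_source_path("\\Parts\\Widgets\\", "DWS")
print(mapped)                                   # \DWS\Parts\Widgets\
print(path_key(mapped, case_sensitive=False))   # \dws\parts\widgets\
```

With `case_sensitive` set to `false`, comparing lowercased keys lets `\DWS\Parts\` match `\dws\parts\`.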
### 3. Migration Settings
```json
{
"migration": {
"duplicate_handling": "ignore",
"batch_size": 500,
"commit_interval": 10,
"document_status_batch_size": 5000
}
}
```
### 4. Configuration Mapping Overrides (file data migration only)
```json
{
"configuration_mapping_overrides": {
"165": 11250,
"167": 11359
}
}
```
Required when duplicate configuration names exist in the target database.
Format: `"source_id": target_id`.
**To find duplicates:**
```sql
SELECT ConfigurationID, ConfigurationName
FROM DocumentConfiguration
WHERE ConfigurationName IN (
SELECT ConfigurationName
FROM DocumentConfiguration
GROUP BY ConfigurationName
HAVING COUNT(*) > 1
)
ORDER BY ConfigurationName, ConfigurationID;
```
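
Once the correct target IDs are known, the override format can be applied roughly as in the sketch below. The helper names are hypothetical, and `name_based_map` stands in for whatever name-based mapping the migration builds. The one detail worth noting is that JSON object keys are always strings, so they need converting before lookup.

```python
import json


def load_overrides(config_text: str) -> dict[int, int]:
    """Read configuration_mapping_overrides and convert keys to integers."""
    raw = json.loads(config_text).get("configuration_mapping_overrides", {})
    return {int(src): int(dst) for src, dst in raw.items()}


def resolve_configuration_id(source_id, name_based_map, overrides):
    """Return the target ConfigurationID, preferring a manual override."""
    if source_id in overrides:
        return overrides[source_id]
    return name_based_map.get(source_id)  # None means "unmapped"


config_text = '{"configuration_mapping_overrides": {"165": 11250, "167": 11359}}'
overrides = load_overrides(config_text)
name_based_map = {165: 999, 200: 4242}  # illustrative values only

print(resolve_configuration_id(165, name_based_map, overrides))  # 11250
print(resolve_configuration_id(200, name_based_map, overrides))  # 4242
print(resolve_configuration_id(300, name_based_map, overrides))  # None
```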
---
## Top-Level Migration Scripts
### Folder Data Migration
Migrates variable values for folders/projects (DocumentID = 1).
```bash
python migrate_folderdata.py
```
**What it does:**
- Maps ProjectIDs based on folder paths
- Maps VariableIDs based on variable names
- Migrates all revisions of folder-level variable values
- Validates the migration
- Creates mapping CSV files for review
**Output files:**
- `mapping_projects_{timestamp}.csv`
- `mapping_variables_{timestamp}.csv`
- `folderdata_migration_{timestamp}.log`
- `validation_missing_folderdata_{timestamp}.csv` (if issues)
---
### File Data Migration
Migrates variable values for files (ProjectID = 2, DocumentID != 1).
**IMPORTANT:** Only migrates the **latest revision** of each variable for
each file configuration.
```bash
python migrate_filedata.py
```
**What it does:**
- Maps VariableIDs by name, DocumentIDs by full file path, ConfigurationIDs
(with manual overrides from `config.json` if specified)
- Fetches only the latest revision per VariableID+DocumentID+ConfigurationID
- Inserts all records with `RevisionNo = 1`
- Validates the migration and emits mapping CSVs
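
The "latest revision only" rule can be illustrated in plain Python. This is a sketch of the selection logic under assumed row fields, not the query the script actually runs; `VariableValueRow` and `latest_revisions` are hypothetical names.

```python
from typing import NamedTuple


class VariableValueRow(NamedTuple):
    variable_id: int
    document_id: int
    configuration_id: int
    revision_no: int
    value: str


def latest_revisions(rows):
    """Keep the highest revision per (variable, document, configuration)."""
    latest = {}
    for row in rows:
        key = (row.variable_id, row.document_id, row.configuration_id)
        if key not in latest or row.revision_no > latest[key].revision_no:
            latest[key] = row
    # Every migrated row is written with RevisionNo = 1.
    return [row._replace(revision_no=1) for row in latest.values()]


source_rows = [
    VariableValueRow(57, 1001, 2, 1, "draft"),
    VariableValueRow(57, 1001, 2, 3, "released"),
    VariableValueRow(57, 1001, 2, 2, "review"),
    VariableValueRow(50, 1001, 2, 1, "A"),
]
for row in latest_revisions(source_rows):
    print(row)
```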
**You will be prompted** to confirm configuration mapping overrides before
the migration runs — review carefully.
**Output files:**
- `mapping_variables_filedata_{timestamp}.csv`
- `mapping_documents_filedata_{timestamp}.csv`
- `filedata_migration_{timestamp}.log`
- `validation_missing_filedata_{timestamp}.csv` (if issues)
- Progress files (auto-deleted on success)
---
### Rollback File Data Migration
```bash
python rollback_filedata.py mapping_documents_filedata_YYYYMMDD_HHMMSS.csv
```
Reads the document mapping CSV from a previous migration and deletes all
VariableValue records for those documents from the target database.
**The script shows a preview and prompts for confirmation before deleting.**
> **WARNING:** This permanently deletes data from the target database.
> Always back up first.
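
For orientation, the rollback steps described above could look roughly like this. It is a sketch only: the CSV column name `target_document_id`, the helper names, and the chunk size are assumptions, and the real script may filter more narrowly.

```python
import csv
import io


def read_target_document_ids(csv_file, column="target_document_id"):
    """Collect distinct target DocumentIDs from a mapping CSV."""
    reader = csv.DictReader(csv_file)
    return sorted({int(row[column]) for row in reader if row.get(column)})


def build_delete_statements(document_ids, chunk_size=1000):
    """Yield (sql, params) pairs for parameterized, chunked deletes."""
    for start in range(0, len(document_ids), chunk_size):
        chunk = document_ids[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        sql = f"DELETE FROM VariableValue WHERE DocumentID IN ({placeholders})"
        yield sql, chunk


# Illustrative CSV content; the column names are assumed.
sample = io.StringIO(
    "source_document_id,target_document_id\n10,501\n11,502\n12,501\n"
)
ids = read_target_document_ids(sample)
for sql, params in build_delete_statements(ids, chunk_size=2):
    print(sql, params)  # preview only; nothing is executed here
```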
---
## Helper Scripts (`helpers/`)
The helpers are live-vault tools that talk to PDM directly via the COM API,
plus an interactive SQL runner. They are independent of the top-level
migration scripts and can be used at any time.
### `batch_workflows_paths.py` — Batch Workflow Transitions
Drives `IEdmFile13::ChangeState3` against hundreds or thousands of files at
once, transitioning each through a named workflow transition. Implements
escalating-backoff retries and vault reconnect to handle PDM's in-process
DLL state corruption on large batches.
```bash
python helpers/batch_workflows_paths.py -v "Drilling_Test" -c files.csv -t "AA"
```
**Options:**
- `-v, --vault` — PDM vault name
- `-c, --csv` — Path to a text/CSV file with one full vault path per line
(e.g. `C:\PDM\Drilling_Test\DWS\Parts\widget.sldprt`)
- `-t, --transition` — Name of the workflow transition (e.g. `"AA"`)
- `--comment` — Optional transition comment
- `-u, --username` — PDM username (prompts if omitted)
**Output files:**
- `batch_workflow_paths_{timestamp}.log` — detailed log
- `failed_transitions_{timestamp}.txt` — real failures worth retrying
- `not_available_{timestamp}.txt` — files whose transition wasn't valid
(typically already in the target state from a prior run)
For implementation details on the restricted `ChangeState3` COM method and
why it requires ctypes/comtypes vtable access, see
[BATCH_NOTES.md](BATCH_NOTES.md).
---
### `batch_copy_tree.py` — Batch Copy Tree Export
Reads part numbers from a CSV, runs PDM's Copy Tree function for each, and
exports each part's file tree to its own subfolder.
```bash
python helpers/batch_copy_tree.py -c parts.csv -o "C:\Temp\Output" --vault "Drilling_Test"
```
---
### `db_helper.py` — Interactive SQL Helper
Runs SELECT queries, multi-step tasks, and confirmed INSERTs against either
database from `config.json`. Queries are stored as `.sql` files in
`helpers/queries/` and referenced by name.
**List saved queries:**
```bash
python helpers/db_helper.py --list-queries
```
**Run a saved query by name:**
```bash
python helpers/db_helper.py --db target_db --query get_var47
```
**Run raw SQL (anything with a space in it is treated as a literal query):**
```bash
python helpers/db_helper.py --db target_db --query "SELECT TOP 10 * FROM Documents"
```
**Run a predefined task:**
```bash
python helpers/db_helper.py --db target_db --task copy_57_to_50 --dry-run
python helpers/db_helper.py --db target_db --task copy_57_to_50
```
#### Safety features
- Every INSERT or UPDATE goes through `preview_and_confirm` — you see the
SQL, the row count, and a sample of the data and must type `y` before it
executes.
- `--dry-run` shows the preview but skips execution entirely.
- All writes run inside a transaction. On any per-row error you're asked
whether to commit or roll back.
- Every query, parameter set, and decision is logged to
`db_helper_{timestamp}.log`.
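
The preview-then-confirm pattern can be sketched as below. This is not the `preview_and_confirm` function from `db_helper.py`; `confirm_write` and its parameters are hypothetical, and the `ask` argument exists only so the prompt can be replaced in tests.

```python
def confirm_write(sql, rows, dry_run=False, ask=input, sample_size=3):
    """Show a preview of a pending write; return True only if approved."""
    print(f"SQL: {sql}")
    print(f"Rows affected: {len(rows)}")
    for row in rows[:sample_size]:
        print(f"  sample: {row}")
    if dry_run:
        # Dry run: show the preview but never execute.
        print("[DRY RUN] Skipping execution.")
        return False
    return ask("Execute? [y/N] ").strip().lower() == "y"


# Non-interactive demonstration with a stubbed answer.
approved = confirm_write(
    "INSERT INTO VariableValue (...) VALUES (...)",
    [(50, 1001), (50, 1002)],
    ask=lambda prompt: "y",
)
print(approved)
```

Defaulting to "no" on anything other than `y` keeps an accidental Enter key from triggering a write.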
#### Saved SQL Queries
Drop a `.sql` file into `helpers/queries/` and it becomes callable by its
filename (without extension). Leave a comment on the first line for an
inline description — it shows up in `--list-queries`.
Current queries:
- `DWS_GET_VV-57.sql` — Documents in DWS paths that have VariableID=57
- `DWS_VV-57_FullList.sql` — Full VariableValue rows for VV-57 in DWS paths
- `Get_All_VV_Per_DocID.sql` — All distinct VariableIDs for a given
DocumentID (parameterized with `?`)
- `INSERT_VV50_Copy.sql` — Inserts a VV-50 copy of a VV-57 row
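
The name-versus-literal rule and the first-line description can be sketched as follows. The function names and the `get_docs` query file are hypothetical; the actual resolution logic in `db_helper.py` may differ.

```python
import tempfile
from pathlib import Path


def resolve_query(arg: str, queries_dir: Path) -> str:
    """Return SQL text for a --query argument."""
    if " " in arg:
        return arg  # contains a space: treat as literal SQL
    return (queries_dir / f"{arg}.sql").read_text(encoding="utf-8")


def describe_query(sql_path: Path) -> str:
    """Use a leading SQL comment as the query's one-line description."""
    lines = sql_path.read_text(encoding="utf-8").splitlines()
    first = lines[0].strip() if lines else ""
    return first[2:].strip() if first.startswith("--") else ""


with tempfile.TemporaryDirectory() as tmp:
    queries_dir = Path(tmp)
    (queries_dir / "get_docs.sql").write_text(
        "-- Lists ten documents\nSELECT TOP 10 * FROM Documents\n",
        encoding="utf-8",
    )
    literal_sql = resolve_query("SELECT 1", queries_dir)
    saved_sql = resolve_query("get_docs", queries_dir)
    description = describe_query(queries_dir / "get_docs.sql")

print(description)
```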
#### Tasks
Tasks are Python functions in `db_helper.py` that chain multiple queries
and transforms together — e.g. run a SELECT, loop the results, run a
second parameterized SELECT per row, validate, then INSERT filtered rows
with confirmation.
Each task is registered in `TASK_REGISTRY` near the bottom of the file.
Current tasks:
| Task | Purpose |
|------|---------|
| `check_vv50` | For every doc with VV-57, check whether it also has VV-50. Writes `has_vv50_{timestamp}.txt`. |
| `copy_57_to_50` | Insert VV-50 rows mirroring existing VV-57 rows, skipping any DocumentIDs already in a `has_vv50_*.txt` file. |
| `copy_with_new_id` | Example/template task — copy rows with a transformed ID. |
**Adding a new task:** write a function `def task_foo(db, args): ...` and
add it to `TASK_REGISTRY`. The building blocks `run_select`, `load_query`,
`preview_and_confirm`, and `run_insert` are all at the top of the file.
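
A registry of this kind is commonly a plain dictionary from task name to function. The sketch below assumes that shape; `task_count_rows`, `run_task`, and `FakeDb` are hypothetical stand-ins, and the real `TASK_REGISTRY` and `db` object may be structured differently.

```python
def task_count_rows(db, args):
    """Hypothetical example task: report how many rows a query returns."""
    rows = db.run_select(args["sql"])
    return len(rows)


# Assumed shape: task name -> task function.
TASK_REGISTRY = {"count_rows": task_count_rows}


def run_task(name, db, args):
    """Look up a task by name and run it."""
    try:
        task = TASK_REGISTRY[name]
    except KeyError:
        raise SystemExit(
            f"Unknown task {name!r}. Available: {sorted(TASK_REGISTRY)}"
        )
    return task(db, args)


class FakeDb:
    """Stand-in database object so the sketch runs without SQL Server."""

    def run_select(self, sql):
        return [(1,), (2,)]


print(run_task("count_rows", FakeDb(), {"sql": "SELECT 1"}))
```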
---
## Understanding the Logs
### Migration Progress
```
Processing batch 10/100 (500 records)...
Batch 10 complete: inserted=450, updated=50, errors=0
[COMMIT] Transaction committed at batch 10
```
### Validation Results
```
==================================================
$ Migration Validation Completed!
==================================================
Gross Success rate: 93.60%
Success rate w/o Ignored Files: 100.00%
371630 of 397043 Rows were found
--------------------------------------------------
MISSING ROW COUNT: 0 - See CSV output for details
We ignored a total of 25413 rows. We couldn't map these to the TargetDB
```
- **Gross Success rate** — % of all source records found in target
- **Success rate w/o Ignored Files** — % of mappable records found (should
be 100%)
- **MISSING ROW COUNT** — Records that should exist but don't (should be 0)
- **Ignored** — Records that couldn't be mapped (unmapped variables,
documents, or configurations)
## Important Notes
### File Data Migration Behavior
1. **Only Latest Revisions** — File data migration only migrates the most
recent revision of each variable for each file configuration. Historical
revisions are not migrated.
2. **RevisionNo Reset** — All migrated file data is inserted with
`RevisionNo = 1` in the target database.
3. **Configuration Mapping** — You MUST verify manual overrides in
`config.json` before running.
### Progress Tracking and Resume
Both migration scripts support automatic resume:
- Progress is saved every 10 batches.
- If a migration fails, re-run the script and it will offer to resume.
- Progress files are automatically cleaned up on success.
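
A checkpoint-and-resume loop along these lines could look like the sketch below. The progress-file format, the function names, and the assumption that uncommitted batches are rolled back on failure are all illustrative and not taken from the migration scripts.

```python
import json
from pathlib import Path


def batches(records, batch_size):
    """Split records into consecutive lists of at most batch_size items."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def run_with_checkpoints(records, insert_batch, progress_path,
                         batch_size=500, commit_interval=10,
                         commit=lambda: None):
    """Insert in batches, checkpointing every commit_interval batches."""
    done = 0
    if progress_path.exists():
        # Resume from the last committed batch of a previous run.
        done = json.loads(progress_path.read_text())["batches_done"]
    for number, batch in enumerate(batches(records, batch_size), start=1):
        if number <= done:
            continue  # already committed earlier
        insert_batch(batch)
        if number % commit_interval == 0:
            commit()
            progress_path.write_text(json.dumps({"batches_done": number}))
    commit()
    progress_path.unlink(missing_ok=True)  # clean up on success
```

Writing the checkpoint only after a commit means a resumed run never skips work that was rolled back.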
### Validation
All migrations include automatic validation:
- Compares source records (after mapping) to target records using set-based
comparison.
- Reports any missing records to CSV.
- Should show 100% success rate for mappable records.
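
The set-based comparison and the two success rates can be sketched as follows. The key shape `(VariableID, DocumentID, ConfigurationID)`, the function name, and the numbers are illustrative assumptions, not the scripts' actual implementation.

```python
def validate(expected_keys, ignored_count, target_keys):
    """Compare mapped source keys against target keys.

    expected_keys: source rows after ID mapping (mappable rows only)
    ignored_count: source rows that could not be mapped at all
    """
    expected = set(expected_keys)
    missing = expected - set(target_keys)
    found = len(expected) - len(missing)
    total = len(expected) + ignored_count
    return {
        "missing": sorted(missing),
        "gross_success_pct": 100.0 * found / total if total else 100.0,
        "mappable_success_pct": (
            100.0 * found / len(expected) if expected else 100.0
        ),
    }


expected = [(57, 1001, 2), (57, 1002, 2), (50, 1001, 2)]
target = [(57, 1001, 2), (57, 1002, 2), (50, 1001, 2), (99, 7, 1)]
report = validate(expected, ignored_count=1, target_keys=target)
print(report)
```

Here one of four source rows was unmappable, so the gross rate is 75% while the mappable rate is 100% with no missing rows.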
## Troubleshooting
### "Migration failed at batch X"
Check the log file, then re-run and choose `y` to resume from the last
checkpoint.
### "We ignored a total of X rows"
Expected for unmapped variables, documents, or configurations. Check the
mapping CSV files to see what was skipped.
### "MISSING ROW COUNT: X" (where X > 0)
Indicates a real problem:
1. Check `validation_missing_*.csv` for details.
2. Verify ID mappings in the mapping CSV files.
3. Check the migration log for insert errors.
### Configuration Mapping Issues
If you see warnings about duplicate ConfigurationNames:
1. Run the duplicate-finding SQL query from the Configuration Mapping
Overrides section to list them.
2. Determine the correct target ID for each source configuration.
3. Add manual overrides to `config.json`.
4. Re-run the migration.
### Database Connection Timeouts
- Progress is saved automatically — re-run to resume.
- Consider reducing `batch_size` in `config.json`.
### Batch Workflow Transition Failures
If you see `[CS3] Phase-2 access violation ...` warnings in
`batch_workflow_paths_*.log`:
- The script automatically retries with escalating backoff (3s → 10s → 30s).
- After 3 consecutive persistent failures it automatically reconnects the
vault to reset PDM's in-process DLL state.
- Genuine failures end up in `failed_transitions_{timestamp}.txt` — feed
that file straight back in to retry just the failures.
- Files that appear in `not_available_{timestamp}.txt` aren't really
failures; they were already in the target state (e.g. from a previous
successful run).
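
The retry behaviour described in the list above can be sketched independently of PDM. In this illustration `attempt` and `reconnect` are hypothetical callables standing in for the COM call and the vault reconnect; the delays and threshold are the ones stated above.

```python
import time

BACKOFF_SECONDS = (3, 10, 30)   # escalating delays between retries
RECONNECT_AFTER = 3             # consecutive persistent failures


def transition_all(paths, attempt, reconnect, sleep=time.sleep):
    """attempt(path) -> bool; reconnect() resets the vault connection."""
    failed = []
    consecutive_failures = 0
    for path in paths:
        ok = attempt(path)
        for delay in BACKOFF_SECONDS:
            if ok:
                break
            sleep(delay)          # back off, then retry the same file
            ok = attempt(path)
        if ok:
            consecutive_failures = 0
            continue
        failed.append(path)       # persistent failure after all retries
        consecutive_failures += 1
        if consecutive_failures >= RECONNECT_AFTER:
            reconnect()
            consecutive_failures = 0
    return failed


# Demonstration with stubbed callables and no real sleeping.
sleeps = []
print(transition_all(["a", "b"], lambda p: p == "a",
                     reconnect=lambda: None, sleep=sleeps.append))
```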
See [BATCH_NOTES.md](BATCH_NOTES.md) for full background on why
`ChangeState3` is difficult to call and how the COM bridging works.
## Best Practices
1. **Always back up** the target database before running migrations.
2. **Test in a dev/test environment** first.
3. **Review mapping CSV files** to verify ID mappings are correct.
4. **Check validation results** — 100% success for mappable records.
5. **Keep a copy of `config.json`**, including any manual overrides, for future reference.
6. **Use `--dry-run`** with `db_helper.py` tasks before real runs.
7. **Save the `has_vv50_*.txt` / `failed_transitions_*.txt` output files**
— they let you incrementally mop up residual work without re-processing
everything.
## Support
For issues or questions:
1. Check the log files for detailed error messages.
2. Review the mapping CSV files to verify ID mappings.
3. Ensure `config.json` is properly configured.
4. Verify database connectivity and permissions.
5. For PDM COM API internals, see [BATCH_NOTES.md](BATCH_NOTES.md).