Initial Commit of the PDM project (ready for DWS migration)

# PDM Vault Data Migration Tool

A collection of Python scripts for migrating and managing SOLIDWORKS PDM
Professional vault data. The project began as a pair of SQL-to-SQL migration
scripts (for folder and file variable values) and has grown to include a
suite of PDM API helpers for batch workflow transitions, Copy Tree exports,
and interactive SQL tasks.

## Overview

The project is split into two layers:

1. **Top-level migration scripts** — one-shot SQL migrations between source
   and target PDM databases (`migrate_folderdata.py`, `migrate_filedata.py`,
   `rollback_filedata.py`), plus verification utilities
   (`check_var_clashing.py`, `check_paths.py`).
2. **`helpers/` toolkit** — a newer set of scripts for live vault operations
   and ad-hoc SQL work:
   - Batch workflow state transitions via the PDM COM API
   - Batch "Copy Tree" exports
   - Interactive SQL helper with named query files and preview-and-confirm
     on every write

## Prerequisites

- Python 3.x (3.12 tested)
- SQL Server access to both source and target databases (ODBC Driver 17+)
- SOLIDWORKS PDM Professional client (for `helpers/batch_*` scripts only —
  these use the local `ConisioLib.EdmVault` COM component)
- Required packages (see `requirements.txt`):
  - `pyodbc` — SQL Server connections
  - `pywin32` — PDM COM API via `win32com.client` / `pythoncom`
  - `comtypes` — vtable-level COM calls (used for `ChangeState3`)

Install dependencies:

```bash
pip install -r requirements.txt
```

## Project Structure

```
data_migration_project/
├── config.json                # Your real config (DB credentials, mappings)
├── config.json.template       # Template for new installs — copy to config.json
├── db_utils.py                # Shared SQL Server connection wrapper
├── migrate_folderdata.py      # Folder/project variable value migration
├── migrate_filedata.py        # File variable value migration (latest revision)
├── rollback_filedata.py       # Rolls back a filedata migration
├── check_var_clashing.py      # Finds variable name conflicts before migration
├── check_paths.py             # Verifies folder path mapping
├── requirements.txt           # Python dependencies
├── BATCH_NOTES.md             # Deep-dive on PDM COM ChangeState3 internals
├── README.md                  # This file
│
├── helpers/
│   ├── batch_workflows_paths.py   # Batch workflow state transitions
│   ├── batch_copy_tree.py         # Batch Copy Tree export
│   ├── db_helper.py               # Interactive SQL helper + tasks
│   ├── test_batch_api.py          # Dev-time prototype for PDM COM bridging
│   └── queries/                   # Reusable named SQL queries (.sql files)
│
├── documentation/             # PDM API reference (.chm files)
└── logs/                      # Migration log files
```

## Configuration Setup

### 1. Create `config.json`

Copy the template and fill in your real values:

```bash
cp config.json.template config.json
```

The project supports both **SQL Authentication** and **Windows Authentication**:

```json
{
  "source_db": {
    "driver": "{ODBC Driver 17 for SQL Server}",
    "server": "your-server",
    "database": "source-database",
    "username": "sql-user",
    "password": "sql-password",
    "trusted_connection": false
  },
  "target_db": {
    "driver": "{ODBC Driver 17 for SQL Server}",
    "server": "your-server",
    "database": "target-database",
    "trusted_connection": true
  }
}
```

- Set `trusted_connection: true` to use Windows Authentication (ignores
  username/password).
- Set `trusted_connection: false` and provide `username`/`password` for SQL
  Server Authentication.

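
As an illustration of how the two authentication modes might translate into an ODBC connection string, here is a minimal sketch. The function name `build_connection_string` is hypothetical and this is not necessarily how `db_utils.py` does it; only the config keys come from the example above.

```python
def build_connection_string(db_cfg: dict) -> str:
    """Build an ODBC connection string from one database block of config.json.

    Hypothetical helper for illustration; the real db_utils.py may differ.
    """
    parts = [
        f"DRIVER={db_cfg['driver']}",
        f"SERVER={db_cfg['server']}",
        f"DATABASE={db_cfg['database']}",
    ]
    if db_cfg.get("trusted_connection", False):
        # Windows Authentication: username/password are not used.
        parts.append("Trusted_Connection=yes")
    else:
        # SQL Server Authentication: both credentials are required.
        parts.append(f"UID={db_cfg['username']}")
        parts.append(f"PWD={db_cfg['password']}")
    return ";".join(parts) + ";"


target_db = {
    "driver": "{ODBC Driver 17 for SQL Server}",
    "server": "your-server",
    "database": "target-database",
    "trusted_connection": True,
}
print(build_connection_string(target_db))
```

The resulting string can be passed to `pyodbc.connect(...)`.
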
### 2. Path Mapping (for folder data)

```json
{
  "path_mapping": {
    "target_root_folder": "DWS",
    "case_sensitive": false
  }
}
```

- `target_root_folder`: Root folder name in your target vault. Source folder
  paths are prepended with this folder name.

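
The prepending rule can be sketched as follows. This is only an illustration: the function names are hypothetical, the backslash-delimited path format with leading and trailing separators is an assumption, and so is the reading of `case_sensitive` as "compare paths case-insensitively when false".

```python
def map_source_path(source_path: str, target_root: str) -> str:
    """Prepend the target root folder to a source folder path.

    Assumes paths look like '\\Parts\\Bolts\\' (hypothetical format).
    """
    trimmed = source_path.strip("\\")
    mapped = "\\" + target_root + "\\"
    if trimmed:
        mapped += trimmed + "\\"
    return mapped


def find_target_project(source_path, target_paths, target_root, case_sensitive=False):
    """Look up the target ProjectID for a source path, or None if unmapped.

    target_paths is a {path: project_id} dict built from the target database.
    """
    def normalize(path):
        return path if case_sensitive else path.lower()

    lookup = {normalize(path): pid for path, pid in target_paths.items()}
    return lookup.get(normalize(map_source_path(source_path, target_root)))


print(map_source_path("\\Parts\\Bolts\\", "DWS"))
```
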
### 3. Migration Settings

```json
{
  "migration": {
    "duplicate_handling": "ignore",
    "batch_size": 500,
    "commit_interval": 10,
    "document_status_batch_size": 5000
  }
}
```

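
The sketch below shows one plausible reading of `batch_size` (rows per batch) and `commit_interval` (batches per commit). That reading is an assumption inferred from the sample log output in this README, and `run_batches`, `insert_batch`, and `commit` are hypothetical names, not functions from the migration scripts.

```python
from typing import Callable, Iterator, Sequence


def iter_batches(rows: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def run_batches(rows: Sequence, batch_size: int, commit_interval: int,
                insert_batch: Callable, commit: Callable) -> int:
    """Insert rows batch by batch, committing every commit_interval batches."""
    batch_no = 0
    for batch_no, batch in enumerate(iter_batches(rows, batch_size), start=1):
        insert_batch(batch)
        if batch_no % commit_interval == 0:
            commit(batch_no)
    if batch_no % commit_interval != 0:
        commit(batch_no)  # commit the final, partial interval
    return batch_no


total = run_batches(
    rows=list(range(1200)),
    batch_size=500,
    commit_interval=2,
    insert_batch=lambda batch: print(f"inserting {len(batch)} rows"),
    commit=lambda n: print(f"[COMMIT] at batch {n}"),
)
```
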
### 4. Configuration Mapping Overrides (file data migration only)

```json
{
  "configuration_mapping_overrides": {
    "165": 11250,
    "167": 11359
  }
}
```

Required when duplicate configuration names exist in the target database.
Format: `"source_id": target_id`.

**To find duplicates:**
```sql
SELECT ConfigurationID, ConfigurationName
FROM DocumentConfiguration
WHERE ConfigurationName IN (
    SELECT ConfigurationName
    FROM DocumentConfiguration
    GROUP BY ConfigurationName
    HAVING COUNT(*) > 1
)
ORDER BY ConfigurationName, ConfigurationID;
```

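
To show why overrides are needed, here is a minimal sketch of name-based configuration mapping in which a manual override always wins and an ambiguous name is reported instead of guessed. The function and its inputs are hypothetical; `migrate_filedata.py` may resolve IDs differently.

```python
def build_configuration_map(source_cfgs, target_cfgs, overrides):
    """Map source ConfigurationIDs to target ConfigurationIDs by name.

    source_cfgs: {source_id: name}
    target_cfgs: iterable of (target_id, name)
    overrides:   {"source_id": target_id} exactly as read from config.json
    Returns (mapping, ambiguous_source_ids).
    """
    by_name = {}
    for target_id, name in target_cfgs:
        by_name.setdefault(name, []).append(target_id)

    # JSON object keys are always strings, so convert them to ints.
    manual = {int(source_id): target_id for source_id, target_id in overrides.items()}

    mapping, ambiguous = {}, []
    for source_id, name in source_cfgs.items():
        if source_id in manual:
            mapping[source_id] = manual[source_id]
            continue
        candidates = by_name.get(name, [])
        if len(candidates) == 1:
            mapping[source_id] = candidates[0]
        elif len(candidates) > 1:
            ambiguous.append(source_id)  # duplicate name and no override
        # no candidates: the configuration stays unmapped
    return mapping, ambiguous


mapping, ambiguous = build_configuration_map(
    source_cfgs={165: "Default", 10: "Flat", 11: "Dup"},
    target_cfgs=[(11250, "Default"), (11251, "Default"), (7, "Flat"), (8, "Dup"), (9, "Dup")],
    overrides={"165": 11250},
)
print(mapping, ambiguous)
```
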

---

## Top-Level Migration Scripts

### Folder Data Migration

Migrates variable values for folders/projects (DocumentID = 1).

```bash
python migrate_folderdata.py
```

**What it does:**
- Maps ProjectIDs based on folder paths
- Maps VariableIDs based on variable names
- Migrates all revisions of folder-level variable values
- Validates the migration
- Creates mapping CSV files for review

**Output files:**
- `mapping_projects_{timestamp}.csv`
- `mapping_variables_{timestamp}.csv`
- `folderdata_migration_{timestamp}.log`
- `validation_missing_folderdata_{timestamp}.csv` (if issues)


---

### File Data Migration

Migrates variable values for files (ProjectID = 2, DocumentID != 1).

**IMPORTANT:** Only migrates the **latest revision** of each variable for
each file configuration.

```bash
python migrate_filedata.py
```

**What it does:**
- Maps VariableIDs by name, DocumentIDs by full file path, ConfigurationIDs
  (with manual overrides from `config.json` if specified)
- Fetches only the latest revision per VariableID+DocumentID+ConfigurationID
- Inserts all records with `RevisionNo = 1`
- Validates the migration and emits mapping CSVs

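
The "latest revision only, reset to `RevisionNo = 1`" rule can be illustrated in plain Python. This is a sketch of the selection logic only: the real script most likely does this in SQL, and the row layout and the `latest_revisions` name are assumptions.

```python
def latest_revisions(rows):
    """Keep the highest-revision row per (VariableID, DocumentID, ConfigurationID).

    rows: iterable of (variable_id, document_id, configuration_id, revision_no, value)
    Returns rows ready for insert, each with revision_no reset to 1.
    """
    latest = {}
    for variable_id, document_id, configuration_id, revision_no, value in rows:
        key = (variable_id, document_id, configuration_id)
        if key not in latest or revision_no > latest[key][0]:
            latest[key] = (revision_no, value)
    # Historical revisions are dropped; survivors are written as revision 1.
    return [(v, d, c, 1, value) for (v, d, c), (_, value) in latest.items()]


source_rows = [
    (57, 100, 3, 1, "A"),
    (57, 100, 3, 4, "C"),
    (57, 100, 3, 2, "B"),
    (50, 100, 3, 1, "X"),
]
print(latest_revisions(source_rows))
```
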
**You will be prompted** to confirm configuration mapping overrides before
the migration runs — review carefully.

**Output files:**
- `mapping_variables_filedata_{timestamp}.csv`
- `mapping_documents_filedata_{timestamp}.csv`
- `filedata_migration_{timestamp}.log`
- `validation_missing_filedata_{timestamp}.csv` (if issues)
- Progress files (auto-deleted on success)

---

### Rollback File Data Migration

```bash
python rollback_filedata.py mapping_documents_filedata_YYYYMMDD_HHMMSS.csv
```

Reads the document mapping CSV from a previous migration and deletes all
VariableValue records for those documents from the target database.
**Shows preview and prompts for confirmation before deleting.**

> **WARNING:** This permanently deletes data from the target database.
> Always back up first.

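
A rough sketch of the rollback flow, assuming the mapping CSV has a column holding the target DocumentID. The column name `target_document_id` and both function names are hypothetical; check the actual CSV header before relying on this. The deletes are chunked because SQL Server limits a single statement to 2100 parameters.

```python
import csv
import io


def read_target_document_ids(csv_file, column="target_document_id"):
    """Collect the distinct target DocumentIDs from a mapping CSV."""
    reader = csv.DictReader(csv_file)
    return sorted({int(row[column]) for row in reader if row.get(column)})


def build_delete_statements(document_ids, chunk_size=1000):
    """Return (sql, params) pairs that delete VariableValue rows in chunks."""
    statements = []
    for start in range(0, len(document_ids), chunk_size):
        chunk = document_ids[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        sql = f"DELETE FROM VariableValue WHERE DocumentID IN ({placeholders})"
        statements.append((sql, chunk))
    return statements


# Inline sample data standing in for a real mapping file.
sample = io.StringIO("source_document_id,target_document_id\n1,501\n2,502\n3,501\n")
ids = read_target_document_ids(sample)
for sql, params in build_delete_statements(ids):
    print(sql, params)  # preview only; nothing is executed here
```

Each `(sql, params)` pair could then be run with a `pyodbc` cursor's `execute(sql, params)` inside a transaction, after the user confirms the preview.
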

---

## Helper Scripts (`helpers/`)

The helpers are live-vault tools that talk to PDM directly via the COM API,
plus an interactive SQL runner. They are independent of the top-level
migration scripts and can be used any time.

### `batch_workflows_paths.py` — Batch Workflow Transitions

Drives `IEdmFile13::ChangeState3` against hundreds or thousands of files at
once, transitioning each through a named workflow transition. Implements
escalating-backoff retries and vault reconnect to handle PDM's in-process
DLL state corruption on large batches.

```bash
python helpers/batch_workflows_paths.py -v "Drilling_Test" -c files.csv -t "AA"
```

**Options:**
- `-v, --vault` — PDM vault name
- `-c, --csv` — Path to a text/CSV file with one full vault path per line
  (e.g. `C:\PDM\Drilling_Test\DWS\Parts\widget.sldprt`)
- `-t, --transition` — Name of the workflow transition (e.g. `"AA"`)
- `--comment` — Optional transition comment
- `-u, --username` — PDM username (prompts if omitted)

**Output files:**
- `batch_workflow_paths_{timestamp}.log` — detailed log
- `failed_transitions_{timestamp}.txt` — real failures worth retrying
- `not_available_{timestamp}.txt` — files whose transition wasn't valid
  (typically already in the target state from a prior run)

For implementation details on the restricted `ChangeState3` COM method and
why it requires ctypes/comtypes vtable access, see
[BATCH_NOTES.md](BATCH_NOTES.md).

---

### `batch_copy_tree.py` — Batch Copy Tree Export

Reads part numbers from a CSV, runs PDM's Copy Tree function for each, and
exports each part's file tree to its own subfolder.

```bash
python helpers/batch_copy_tree.py -c parts.csv -o "C:\Temp\Output" --vault "Drilling_Test"
```

---

### `db_helper.py` — Interactive SQL Helper

Runs SELECT queries, multi-step tasks, and confirmed INSERTs against either
database from `config.json`. Queries are stored as `.sql` files in
`helpers/queries/` and referenced by name.

**List saved queries:**
```bash
python helpers/db_helper.py --list-queries
```

**Run a saved query by name:**
```bash
python helpers/db_helper.py --db target_db --query get_var47
```

**Run raw SQL (anything with a space in it is treated as a literal query):**
```bash
python helpers/db_helper.py --db target_db --query "SELECT TOP 10 * FROM Documents"
```

**Run a predefined task:**
```bash
python helpers/db_helper.py --db target_db --task copy_57_to_50 --dry-run
python helpers/db_helper.py --db target_db --task copy_57_to_50
```

#### Safety features

- Every INSERT or UPDATE goes through `preview_and_confirm` — you see the
  SQL, the row count, and a sample of the data and must type `y` before it
  executes.
- `--dry-run` shows the preview but skips execution entirely.
- All writes run inside a transaction. On any per-row error you're asked
  whether to commit or rollback.
- Every query, parameter set, and decision is logged to
  `db_helper_{timestamp}.log`.

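
The preview-and-confirm gate described above might look roughly like this. `preview_and_confirm` is a real name from `db_helper.py`, but the signature and body below are assumptions written for illustration; the `ask` parameter exists only so the prompt can be replaced in tests.

```python
def preview_and_confirm(sql, rows, dry_run=False, ask=input, sample_size=5):
    """Show what a write would do and return True only if it should execute.

    Hypothetical signature; the function in db_helper.py may differ.
    """
    print(f"SQL: {sql}")
    print(f"Rows affected: {len(rows)}")
    for row in rows[:sample_size]:
        print(f"  sample: {row}")
    if dry_run:
        # --dry-run: show the preview but never execute.
        print("[DRY RUN] Skipping execution.")
        return False
    # Anything other than an explicit 'y' is treated as a refusal.
    return ask("Execute? [y/N] ").strip().lower() == "y"
```

A caller would wrap the actual write in `if preview_and_confirm(sql, rows, dry_run=args.dry_run): ...`, so a declined prompt and a dry run both fall through without touching the database.
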
#### Saved SQL Queries

Drop a `.sql` file into `helpers/queries/` and it becomes callable by its
filename (without extension). Leave a comment on the first line for an
inline description — it shows up in `--list-queries`.

Current queries:
- `DWS_GET_VV-57.sql` — Documents in DWS paths that have VariableID=57
- `DWS_VV-57_FullList.sql` — Full VariableValue rows for VV-57 in DWS paths
- `Get_All_VV_Per_DocID.sql` — All distinct VariableIDs for a given
  DocumentID (parameterized with `?`)
- `INSERT_VV50_Copy.sql` — Inserts a VV-50 copy of a VV-57 row

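
As a sketch of how name-based lookup and the first-line description could work, assuming descriptions use the SQL `--` comment style: `list_queries` and `resolve_query` are hypothetical names, not necessarily those used in `db_helper.py`.

```python
from pathlib import Path


def list_queries(queries_dir):
    """Return {query_name: description} for every .sql file in the folder."""
    found = {}
    for path in sorted(Path(queries_dir).glob("*.sql")):
        lines = path.read_text(encoding="utf-8").splitlines()
        first = lines[0].strip() if lines else ""
        # Only a leading '--' comment counts as a description.
        found[path.stem] = first[2:].strip() if first.startswith("--") else ""
    return found


def resolve_query(arg, queries_dir):
    """Treat anything containing a space as literal SQL, else load it by name."""
    if " " in arg:
        return arg
    return (Path(queries_dir) / f"{arg}.sql").read_text(encoding="utf-8")
```

With this rule, `--query get_var47` reads `helpers/queries/get_var47.sql`, while `--query "SELECT TOP 10 * FROM Documents"` is passed through unchanged.
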
#### Tasks

Tasks are Python functions in `db_helper.py` that chain multiple queries
and transforms together — e.g. run a SELECT, loop the results, run a
second parameterized SELECT per row, validate, then INSERT filtered rows
with confirmation.

Each task is registered in `TASK_REGISTRY` near the bottom of the file.
Current tasks:

| Task | Purpose |
|------|---------|
| `check_vv50` | For every doc with VV-57, check whether it also has VV-50. Writes `has_vv50_{timestamp}.txt`. |
| `copy_57_to_50` | Insert VV-50 rows mirroring existing VV-57 rows, skipping any DocumentIDs already in a `has_vv50_*.txt` file. |
| `copy_with_new_id` | Example/template task — copy rows with a transformed ID. |

**Adding a new task:** write a function `def task_foo(db, args): ...` and
add it to `TASK_REGISTRY`. The building blocks `run_select`, `load_query`,
`preview_and_confirm`, and `run_insert` are all at the top of the file.

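
A minimal sketch of the registry pattern, assuming `TASK_REGISTRY` is a plain dict from task name to function. The `task_count_rows` task, the `run_task` dispatcher, and the `db.run_select(...)` method call are all hypothetical stand-ins; in `db_helper.py` the building blocks may be module-level functions instead.

```python
import argparse


def task_count_rows(db, args):
    """Hypothetical example task: count documents that carry one variable."""
    rows = db.run_select(
        "SELECT DocumentID FROM VariableValue WHERE VariableID = ?",
        [args.variable_id],
    )
    return len(rows)


# Assumed shape of the registry: task name -> task function.
TASK_REGISTRY = {
    "count_rows": task_count_rows,
}


def run_task(name, db, args):
    """Look up a task by its --task name and run it."""
    try:
        task = TASK_REGISTRY[name]
    except KeyError:
        available = ", ".join(sorted(TASK_REGISTRY))
        raise SystemExit(f"Unknown task {name!r}. Available: {available}") from None
    return task(db, args)


class FakeDb:
    """Stand-in for the real database wrapper, used only for this demo."""

    def run_select(self, sql, params):
        return [(101,), (102,)]


print(run_task("count_rows", FakeDb(), argparse.Namespace(variable_id=57)))
```
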

---

## Understanding the Logs

### Migration Progress
```
Processing batch 10/100 (500 records)...
Batch 10 complete: inserted=450, updated=50, errors=0
[COMMIT] Transaction committed at batch 10
```

### Validation Results
```
==================================================
$ Migration Validation Completed!
==================================================
Gross Success rate: 95.39%
Success rate w/o Ignored Files: 100.00%
371630 of 397043 Rows were found
--------------------------------------------------
MISSING ROW COUNT: 0 - See CSV output for details
We ignored a total of 25413 rows. We couldn't map these to the TargetDB
```

- **Gross Success rate** — % of all source records found in target
- **Success rate w/o Ignored Files** — % of mappable records found (should
  be 100%)
- **MISSING ROW COUNT** — Records that should exist but don't (should be 0)
- **Ignored** — Records that couldn't be mapped (unmapped variables,
  documents, or configurations)

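
The four figures relate to each other roughly as in the sketch below, which applies the definitions above literally. The formulas and the `validation_summary` name are assumptions, not taken from the scripts. With the counts from the sample output, this literal reading gives a gross rate of about 93.60% rather than the 95.39% printed, so the sample lines may come from different runs or the script may use a different denominator.

```python
def validation_summary(total_source_rows, ignored_rows, found_rows):
    """Derive the validation figures from three raw counts (assumed formulas)."""
    mappable = total_source_rows - ignored_rows
    missing = mappable - found_rows
    gross = 100.0 * found_rows / total_source_rows
    net = 100.0 * found_rows / mappable if mappable else 100.0
    return {
        "gross_pct": round(gross, 2),  # share of all source rows found
        "net_pct": round(net, 2),      # share of mappable rows found
        "missing": missing,            # mappable rows absent from the target
    }


# Counts taken from the sample validation output above.
summary = validation_summary(total_source_rows=397043, ignored_rows=25413, found_rows=371630)
print(summary)
```
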
## Important Notes

### File Data Migration Behavior

1. **Only Latest Revisions** — File data migration only migrates the most
   recent revision of each variable for each file configuration. Historical
   revisions are not migrated.
2. **RevisionNo Reset** — All migrated file data is inserted with
   `RevisionNo = 1` in the target database.
3. **Configuration Mapping** — You MUST verify manual overrides in
   `config.json` before running.

### Progress Tracking and Resume

Both migration scripts support automatic resume:
- Progress is saved every 10 batches.
- If a migration fails, re-run the script and it will offer to resume.
- Progress files are automatically cleaned up on success.

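
A checkpoint file for this kind of resume can be very small. The sketch below assumes a JSON file holding the last committed batch number; the file format and the function names are hypothetical and may not match what the migration scripts actually write.

```python
import json
from pathlib import Path


def save_progress(path, last_batch):
    """Record the last batch that was committed successfully."""
    Path(path).write_text(json.dumps({"last_committed_batch": last_batch}), encoding="utf-8")


def load_progress(path):
    """Return the last committed batch, or 0 if there is nothing to resume."""
    progress_file = Path(path)
    if not progress_file.exists():
        return 0
    return json.loads(progress_file.read_text(encoding="utf-8"))["last_committed_batch"]


def clear_progress(path):
    """Remove the checkpoint after a fully successful run."""
    Path(path).unlink(missing_ok=True)
```

On restart, a script following this scheme would resume at batch `load_progress(path) + 1` and call `clear_progress(path)` once the final batch has been committed.
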
### Validation

All migrations include automatic validation:
- Compares source records (after mapping) to target records using set-based
  comparison.
- Reports any missing records to CSV.
- Should show 100% success rate for mappable records.

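
The set-based comparison can be sketched with Python sets: translate each source key into target IDs, then subtract the keys actually present in the target. The two-column key and the `find_missing` name are simplifications for illustration; the real validation presumably also includes ConfigurationID and runs against the databases.

```python
def find_missing(source_rows, target_rows, variable_map, document_map):
    """Return (missing_keys, ignored_count) for a finished migration.

    source_rows / target_rows: iterables of (variable_id, document_id)
    variable_map / document_map: {source_id: target_id}
    """
    expected = set()
    ignored = 0
    for variable_id, document_id in source_rows:
        if variable_id in variable_map and document_id in document_map:
            expected.add((variable_map[variable_id], document_map[document_id]))
        else:
            ignored += 1  # unmappable rows are ignored, not counted as missing
    missing = expected - set(target_rows)
    return sorted(missing), ignored


missing, ignored = find_missing(
    source_rows=[(1, 10), (2, 10), (3, 11)],
    target_rows=[(101, 500)],
    variable_map={1: 101, 2: 102},
    document_map={10: 500, 11: 501},
)
print(missing, ignored)
```
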
## Troubleshooting

### "Migration failed at batch X"
Check the log file, then re-run and choose `y` to resume from the last
checkpoint.

### "We ignored a total of X rows"
Expected for unmapped variables, documents, or configurations. Check the
mapping CSV files to see what was skipped.

### "MISSING ROW COUNT: X" (where X > 0)
Indicates a real problem:
1. Check `validation_missing_*.csv` for details.
2. Verify ID mappings in the mapping CSV files.
3. Check the migration log for insert errors.

### Configuration Mapping Issues
If you see warnings about duplicate ConfigurationNames:
1. Run the duplicate-finding SQL query from the Configuration Mapping
   Overrides section.
2. Determine the correct target ID for each source configuration.
3. Add manual overrides to `config.json` under
   `configuration_mapping_overrides`.
4. Re-run the migration.

### Database Connection Timeouts
- Progress is saved automatically — re-run to resume.
- Consider reducing `batch_size` in `config.json`.

### Batch Workflow Transition Failures

If you see `[CS3] Phase-2 access violation ...` warnings in
`batch_workflow_paths_*.log`:

- The script automatically retries with escalating backoff (3s → 10s → 30s).
- After 3 consecutive persistent failures it automatically reconnects the
  vault to reset PDM's in-process DLL state.
- Genuine failures end up in `failed_transitions_{timestamp}.txt` — feed
  that file straight back in to retry just the failures.
- Files that appear in `not_available_{timestamp}.txt` aren't really
  failures; they were already in the target state (e.g. from a previous
  successful run).

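
The retry policy above can be sketched independently of PDM. In this illustration `change_state` and `reconnect` are hypothetical callables standing in for the COM call and the vault login, `OSError` stands in for whatever exception the real call raises, and "persistent failure" is read as "a file that still fails after all three backoff retries". None of this is the script's actual code.

```python
import time

BACKOFF_SECONDS = (3, 10, 30)  # escalating waits between retries
RECONNECT_AFTER = 3            # consecutive persistent failures before reconnecting


def transition_all(paths, change_state, reconnect, sleep=time.sleep):
    """Try to transition every path; return the paths that never succeeded."""
    failed = []
    consecutive_failures = 0
    for path in paths:
        succeeded = False
        # One initial attempt plus one retry per backoff step.
        for attempt in range(len(BACKOFF_SECONDS) + 1):
            try:
                change_state(path)
                succeeded = True
                break
            except OSError:
                if attempt < len(BACKOFF_SECONDS):
                    sleep(BACKOFF_SECONDS[attempt])
        if succeeded:
            consecutive_failures = 0
            continue
        failed.append(path)
        consecutive_failures += 1
        if consecutive_failures >= RECONNECT_AFTER:
            reconnect()  # fresh vault connection to reset in-process state
            consecutive_failures = 0
    return failed
```

Passing `sleep` as a parameter keeps the policy testable without waiting; the returned list corresponds to what would be written to `failed_transitions_{timestamp}.txt`.
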
See [BATCH_NOTES.md](BATCH_NOTES.md) for full background on why
`ChangeState3` is difficult to call and how the COM bridging works.

## Best Practices

1. **Always back up** the target database before running migrations.
2. **Test on a dev/test environment first**.
3. **Review mapping CSV files** to verify ID mappings are correct.
4. **Check validation results** — expect a 100% success rate for mappable
   records and a missing row count of 0.
5. **Keep `config.json`** with any manual overrides for future reference.
6. **Use `--dry-run`** with `db_helper.py` tasks before real runs.
7. **Save the `has_vv50_*.txt` / `failed_transitions_*.txt` output files**
   — they let you incrementally mop up residual work without re-processing
   everything.

## Support

For issues or questions:
1. Check the log files for detailed error messages.
2. Review the mapping CSV files to verify ID mappings.
3. Ensure `config.json` is properly configured.
4. Verify database connectivity and permissions.
5. For PDM COM API internals, see [BATCH_NOTES.md](BATCH_NOTES.md).