🇳🇵 Nepal Entity Service

Open Source, open data, and open API for Nepali public entities

Migration Maintainer Guide

This guide is for maintainers who review migration pull requests and execute migrations to update the Nepal Entity Service database. You'll learn how to review migrations, execute them safely, and troubleshoot common issues.

Table of Contents

  1. Introduction
  2. Maintainer Responsibilities
  3. Reviewing Migration Pull Requests
  4. Executing Migrations
  5. CI/CD Workflows
  6. Troubleshooting
  7. Database Repository Management
  8. Best Practices

Introduction

What is a Maintainer?

Maintainers are trusted individuals who:
- Review migration pull requests for quality and correctness
- Execute approved migrations to update the database
- Manage the Database Repository
- Ensure data integrity and quality
- Monitor CI/CD workflows

Two-Repository Architecture

The system operates across two repositories:

  1. Service API Repository (NepalEntityService)
    - Contains application code and migration scripts
    - Contributors submit PRs here
    - Lightweight (~10MB)

  2. Database Repository (nes-db)
    - Contains entity/relationship JSON files (100k-1M files)
    - Managed as Git submodule at nes-db/
    - Large repository (~1GB+)
    - Modified by migration execution

Key Concept: Contributors submit migration code to the Service API Repository. Maintainers review and merge the code, then execute migrations which modify files in the Database Repository.


Maintainer Responsibilities

Code Review

Data Quality

Execution

Documentation


Reviewing Migration Pull Requests

Step 1: Initial Review

When a migration PR is submitted, check:

PR Structure:
- [ ] Migration folder follows naming convention (NNN-descriptive-name)
- [ ] Contains migrate.py with required metadata
- [ ] Contains README.md with documentation
- [ ] Data files are included if needed

Code Quality:
- [ ] Migration script has AUTHOR, DATE, DESCRIPTION metadata
- [ ] Implements async migrate(context) function
- [ ] Includes error handling
- [ ] Uses appropriate author ID
- [ ] Logs progress messages

Documentation:
- [ ] README.md explains purpose clearly
- [ ] Data sources are cited with URLs
- [ ] Changes are documented
- [ ] Dependencies are noted

Step 2: Data Source Verification

Verify the data sources:

## Checklist for Data Sources

- [ ] Sources are credible (government, official records, reputable organizations)
- [ ] URLs are accessible and valid
- [ ] Data matches the source (spot check)
- [ ] Date of data collection is noted
- [ ] Licensing allows use (public domain, open data, etc.)

Step 3: Code Review

Review the migration script:

Safety Checks:

# ✓ Good: Checks if entity exists before creating
entity = await context.search.find_entity_by_name(name, type)
if not entity:
    await context.publication.create_entity(...)

# ✗ Bad: Creates without checking (may cause duplicates)
await context.publication.create_entity(...)

Error Handling:

# ✓ Good: Handles errors gracefully
try:
    await context.publication.create_entity(...)
except Exception as e:
    context.log(f"Failed: {e}")
    # Continue processing

# ✗ Bad: No error handling (migration crashes on first error)
await context.publication.create_entity(...)

Data Quality:

# ✓ Good: Validates data before processing
if not row.get("name_en") or not row.get("name_ne"):
    context.log(f"Skipping row with missing names: {row}")
    continue

# ✗ Bad: No validation (may create incomplete entities)
entity_data = {"names": [{"en": {"full": row["name_en"]}}]}

Step 4: Local Testing

Test the migration locally before merging:

# Pull the PR branch
git fetch origin pull/123/head:pr-123
git checkout pr-123

# Update database submodule
git submodule update

# Run migration
nes migration run NNN-migration-name

# Review output for errors or warnings

What to Check:
- Migration executes without errors
- Entities are created correctly
- Relationships are valid
- No duplicate entities
- Logs are informative

Step 5: Request Changes or Approve

If changes are needed:

Thanks for the contribution! A few items to address:

1. **Data Source**: Please add a URL to the official government source
2. **Error Handling**: Add try/except around entity creation (line 45)
3. **Validation**: Check for empty names before creating entities
4. **Documentation**: Add the date when data was collected

Let me know if you have questions!

If approved:

Looks great! The migration is well-structured and data sources are verified.

I'll merge this and execute the migration.

Step 6: Merge the PR

Once approved:

# Merge via GitHub UI or command line
git checkout main
git merge pr-123
git push origin main

Executing Migrations

Step 1: Check Pending Migrations

# See all pending migrations
nes migration list --pending

# Output:
# Pending migrations:
#   005-add-cabinet-ministers
#   006-update-party-leadership

Step 2: Execute Single Migration

# Run specific migration
nes migration run 005-add-cabinet-ministers

# Output:
# Running migration: 005-add-cabinet-ministers
# [Migration] Processing 25 entities...
# [Migration] Created 25 entities
# [Migration] Migration completed
# ✓ Migration completed in 12.3s
# ✓ Committed to database repository
# ✓ Pushed to remote

What Happens:
1. Migration script executes
2. Entities/relationships are created in nes-db/v2/
3. Changes are committed to Database Repository with metadata
4. Commit is pushed to remote
5. Submodule reference is updated in Service API Repository

Step 3: Execute All Pending Migrations

# Run all pending migrations in order
nes migration run --all

# Output:
# Running 2 pending migrations...
# 
# [1/2] Running: 005-add-cabinet-ministers
# ✓ Completed in 12.3s
# 
# [2/2] Running: 006-update-party-leadership
# ✓ Completed in 8.7s
# 
# All migrations completed successfully

Step 4: Verify Results

# Check database repository
cd nes-db
git log -3

# Output:
# commit abc123...
# Author: Migration System
# Date: 2024-03-15
# 
#     Migration: 005-add-cabinet-ministers
#     
#     Import current cabinet ministers from official records
#     
#     Author: contributor@example.com
#     Date: 2024-03-15
#     Entities created: 25
#     Relationships created: 25
#     Duration: 12.3s

# Check created files
ls v2/entity/person/ | tail -5

Step 5: Update Service API Repository

# Return to main repository
cd ..

# Commit submodule update
git add nes-db
git commit -m "Update database submodule after migration 005"
git push origin main

CI/CD Workflows

The migration system includes two GitHub Actions workflows:

Migration Preview Workflow

Trigger: Pull request to migrations/ directory

Purpose: Preview migration results before merging

What it does:
1. Checks out PR branch
2. Runs migrations in isolated environment
3. Generates statistics (entities/relationships created)
4. Posts comment on PR with results

Example Comment:

## Migration Preview: 005-add-cabinet-ministers

✓ Migration executed successfully

### Statistics
- Entities created: 25
- Relationships created: 25
- Duration: 12.3s

### Logs
[View full logs](https://github.com/.../actions/runs/123)

Migration Persistence Workflow

Triggers:
- PR merge to main branch (if migrations/ changed)
- Daily schedule (2 AM UTC)

Purpose: Automatically execute pending migrations

What it does:
1. Checks for pending migrations
2. Executes pending migrations
3. Commits to Database Repository
4. Pushes to remote
5. Updates submodule reference

Monitoring:

# Check workflow status on GitHub
# https://github.com/NewNepal-org/NepalEntityService/actions

# View workflow logs for details

Manual Trigger:

# Trigger workflow manually via GitHub UI
# Actions → Migration Persistence → Run workflow


Troubleshooting

Issue 1: Migration Fails with Validation Error

Symptom:

ValueError: Entity must have at least one name with kind='PRIMARY'

Solution:
1. Review migration script for data validation
2. Check input data files for missing fields
3. Add validation before entity creation:

if not row.get("name_en"):
    context.log(f"Skipping row with missing name: {row}")
    continue

Issue 2: Duplicate Entity Error

Symptom:

ValueError: Entity with slug 'ram-sharma' already exists

Solution:
1. Check if entity exists before creating:

from nes.core.identifiers import build_entity_id

entity_id = build_entity_id("person", None, "ram-sharma")
existing = await context.db.get_entity(entity_id)

if existing:
    # Update instead of create
    await context.publication.update_entity(...)
else:
    await context.publication.create_entity(...)

Issue 3: Migration Already Applied

Symptom:

Migration already applied, skipping

Explanation: The migration was already executed and committed to the Database Repository. The system detects the persisted snapshot and skips re-execution.

Solution:
- This is expected behavior (ensures determinism)
- To re-run, delete the migration log first: rm -rf nes-db/v2/migration-logs/NNN-name/
- For production, create a new migration instead

Issue 4: Large Commit Performance

Symptom: Migration creates 10,000+ files and Git commit is slow

Solution: The system automatically batches commits:

# Automatic batching (no action needed)
# Commits are split into batches of 1000 files each
# Each batch gets its own commit with metadata

Manual Optimization:

# Configure Git for large repositories
cd nes-db
git config core.preloadindex true
git config core.fscache true
git config gc.auto 256

Issue 5: Relationship Target Not Found

Symptom:

ValueError: Target entity 'entity:organization/...' does not exist

Solution:
1. Verify target entity exists:

target = await context.db.get_entity(target_id)
if not target:
    context.log(f"Target entity not found: {target_id}")
    continue
  1. Check migration dependencies in README.md
  2. Ensure prerequisite migrations are executed first

Issue 6: Git Push Fails

Symptom:

error: failed to push some refs to 'https://github.com/...'

Solution:
1. Pull latest changes:

cd nes-db
git pull origin main
  1. Resolve conflicts if any
  2. Push again:
git push origin main

Issue 7: Submodule Out of Sync

Symptom: Service API repo shows outdated submodule reference

Solution:

# Update submodule to latest
git submodule update --remote nes-db

# Commit the update
git add nes-db
git commit -m "Update database submodule"
git push origin main


Database Repository Management

Cloning for Maintainers

Full Clone (for migration execution):

# Clone with submodules
git clone --recurse-submodules https://github.com/NewNepal-org/NepalEntityService.git
cd NepalEntityService

# This downloads the full database (~1GB+)

Shallow Clone (for development):

# Clone with limited history
git clone --depth 1 --recurse-submodules https://github.com/NewNepal-org/NepalEntityService.git

Database Repository Optimization

For Large Repositories:

cd nes-db

# Configure Git for performance
git config core.preloadindex true
git config core.fscache true
git config gc.auto 256
git config pack.threads 4

# Periodic maintenance
git gc --aggressive

Backup Strategy

Automated Backups:
- GitHub provides automatic backups
- Git history serves as complete backup
- All changes are versioned and recoverable

Manual Backup:

# Clone database repository
git clone https://github.com/NewNepal-org/NepalEntityService-database.git backup-$(date +%Y%m%d)

# Or create archive
cd nes-db
git archive --format=tar.gz --output=../backup-$(date +%Y%m%d).tar.gz HEAD

Rollback Procedure

Rollback Single Migration:

cd nes-db

# Find migration commit
git log --grep="Migration: 005-add-cabinet-ministers"

# Revert the commit
git revert <commit-sha>

# Push revert
git push origin main

Rollback Multiple Migrations:

# Revert last 3 migrations
git revert HEAD~2..HEAD

# Or reset to specific commit (destructive)
git reset --hard <commit-sha>
git push --force origin main  # Use with caution!


Best Practices

1. Review Before Merging

Never merge a migration PR without:
- Local testing
- Data source verification
- Code review
- Documentation check

2. Execute Migrations Promptly

3. Monitor Database Size

# Check database size
cd nes-db
du -sh .

# Check entity counts
find v2/entity -type f | wc -l
find v2/relationship -type f | wc -l

4. Communicate with Contributors

5. Document Issues

When encountering issues:
- Document the problem and solution
- Update this guide if needed
- Share knowledge with other maintainers

6. Regular Maintenance

Weekly:
- Review pending migrations
- Check CI/CD workflow status
- Monitor database repository size

Monthly:
- Review migration history
- Audit data quality
- Update documentation
- Optimize database repository

7. Security Considerations

8. Coordinate with Other Maintainers


Migration Execution Checklist

Use this checklist when executing migrations:

## Pre-Execution

- [ ] Migration PR is approved and merged
- [ ] Local repository is up to date
- [ ] Database submodule is up to date
- [ ] No pending changes in database repository

## Execution

- [ ] Run `nes migration list --pending` to see pending migrations
- [ ] Run `nes migration run <name>` or `--all`
- [ ] Monitor output for errors
- [ ] Verify entities/relationships/versions created
- [ ] Review migration log in `nes-db/v2/migration-logs/`

## Post-Execution

- [ ] Check database repository commit
- [ ] Verify commit message includes metadata
- [ ] Confirm push to remote succeeded
- [ ] Update submodule reference in Service API repo
- [ ] Verify CI/CD workflow completed (if applicable)
- [ ] Spot-check created entities for quality

## Documentation

- [ ] Update migration status in tracking system (if any)
- [ ] Document any issues encountered
- [ ] Communicate completion to contributor

Additional Resources


Support

For maintainer questions or issues:

  1. Check this guide and other documentation
  2. Review existing migrations for examples
  3. Consult with other maintainers
  4. Open an issue for bugs or unclear documentation

Last Updated: 2024
Version: 2.0