What is Data Ingestion?

gu1’s Data Ingestion system allows you to seamlessly import data from any source (CSV files, APIs, databases, or custom formats) by defining custom schemas and field mappings. This intelligent mapping system ensures your data is properly structured for risk analysis.

How It Works

1. Define Your Schema

Create a custom schema that describes your data structure with field definitions, types, and validation rules.

2. Map Fields

Create field mappings that translate your data fields to gu1’s unified entity model.

3. Transform Data

Apply transformations (formatting, calculations, conditionals) as data flows through the mapping.

4. Import Entities

Use the mapped schema to create entities via API or bulk upload (see the request sketch below).
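
For step 4, a bulk import payload might bundle schema and mapping references together with the raw records. The shape below is an illustrative assumption, not the documented API; see the Entities API reference for the actual request format:
{
  // Illustrative payload only — field names and IDs are assumptions
  "schemaId": "sch_banking_customer",
  "mappingId": "map_customer_to_entity",
  "records": [
    { "customer_id": "CUST00012345", "full_name": "Acme Corp" }
  ]
}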

Key Features

Schema Types

gu1 supports multiple schema types for different data sources:
Type     | Description                   | Use Case
---------|-------------------------------|-----------------------------
database | Relational database schemas   | Direct database integration
api      | API response structures       | Third-party API integration
file     | File formats (CSV, JSON, XML) | File-based imports
custom   | Custom data structures        | Proprietary formats

Schema Categories

Organize schemas by business domain:
  • Financial - Bank accounts, transactions, financial statements, payment data
  • Identity - Personal information, identity documents, KYC data
  • Compliance - Sanctions lists, PEPs, adverse media, regulatory data
  • Transactions - Payment transactions, wire transfers, transaction history
  • Custom - Any other type of structured data
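
The type and category are declared at the top level of a schema definition. As a brief sketch (the name here is invented; the financial category mirrors the full example later on this page):
{
  "name": "Customer CSV Export",
  "version": "1.0.0",
  "type": "file",
  "category": "financial"
}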

Field Types

Supported field types for schema definition:
Type    | Description      | Example
--------|------------------|----------------------------------
string  | Text data        | "Acme Corp", "[email protected]"
number  | Numeric values   | 1000, 99.99, -50
boolean | True/false       | true, false
date    | Date/timestamp   | "2025-10-03T12:00:00Z"
array   | List of values   | ["tag1", "tag2"]
object  | Nested structure | {"city": "NYC", "country": "US"}
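
Put together, a single record using each of these types could look like the following (the field names are illustrative):
{
  "name": "Acme Corp",                          // string
  "balance": 99.99,                             // number
  "kyc_verified": true,                         // boolean
  "created_at": "2025-10-03T12:00:00Z",         // date
  "tags": ["tag1", "tag2"],                     // array
  "address": { "city": "NYC", "country": "US" } // object
}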

Transformation Types

Apply transformations during field mapping:

  • Direct - Copy the field as-is with no changes
  • Calculate - Perform mathematical calculations
  • Format - Format strings, dates, and numbers
  • Conditional - Apply if/then logic based on conditions
  • Lookup - Look up values from reference tables
  • Custom - Custom JavaScript expressions
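
The exact mapping syntax is covered in the Field Mappings guide. As a rough, hypothetical sketch, a set of mappings using several of these transformation types might look like this (the source/target/transform keys and all field names are assumptions):
{
  // Hypothetical mapping shape — see the Field Mappings guide for the real syntax
  "mappings": [
    {
      // Direct: copy the source field unchanged
      "source": "customer_id",
      "target": "entity.externalId",
      "transform": "direct"
    },
    {
      // Format: normalize a source date string
      "source": "open_date",
      "target": "entity.onboardingDate",
      "transform": "format",
      "options": { "dateFormat": "YYYY-MM-DD" }
    },
    {
      // Custom: a JavaScript expression evaluated per record
      "source": "balance_cents",
      "target": "entity.balance",
      "transform": "custom",
      "expression": "value / 100"
    }
  ]
}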

Validation Rules

Ensure data quality with built-in validations:
{
  "constraints": {
    "minLength": 5,
    "maxLength": 100,
    "pattern": "^[A-Z0-9]+$",
    "enum": ["active", "inactive", "pending"]
  }
}
Available Constraints:
  • minLength / maxLength - String length limits
  • min / max - Numeric value ranges
  • pattern - Regular expression validation
  • enum - Allowed values list
  • required - Field is mandatory
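
The block above shows string constraints; numeric fields take min / max ranges instead. For example, a balance field could be limited to non-negative values (the upper bound here is arbitrary):
{
  "constraints": {
    "min": 0,
    "max": 1000000
  }
}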

Example: Banking Data Schema

Here’s a complete example of defining a schema for banking customer data:
{
  "name": "Banking Customer Data",
  "version": "1.0.0",
  "type": "database",
  "category": "financial",
  "schemaData": {
    "fields": [
      {
        "name": "customer_id",
        "type": "string",
        "required": true,
        "description": "Unique customer identifier",
        "constraints": {
          "pattern": "^CUST[0-9]{8}$"
        }
      },
      {
        "name": "full_name",
        "type": "string",
        "required": true,
        "description": "Customer full legal name",
        "constraints": {
          "minLength": 2,
          "maxLength": 200
        }
      },
      {
        "name": "account_balance",
        "type": "number",
        "required": false,
        "description": "Current account balance in USD",
        "constraints": {
          "min": 0
        }
      },
      {
        "name": "risk_level",
        "type": "string",
        "required": true,
        "description": "Risk classification",
        "constraints": {
          "enum": ["low", "medium", "high", "critical"]
        }
      },
      {
        "name": "onboarding_date",
        "type": "date",
        "required": true,
        "description": "Date customer was onboarded"
      },
      {
        "name": "kyc_verified",
        "type": "boolean",
        "required": true,
        "description": "Whether KYC verification is complete"
      }
    ],
    "metadata": {
      "sourceFormat": "database",
      "encoding": "UTF-8"
    }
  }
}
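
A record that satisfies this schema would look like the following; note that customer_id matches the ^CUST[0-9]{8}$ pattern and risk_level is one of the enumerated values:
{
  "customer_id": "CUST00012345",
  "full_name": "Jane Q. Customer",
  "account_balance": 15000.50,
  "risk_level": "low",
  "onboarding_date": "2025-10-03T12:00:00Z",
  "kyc_verified": true
}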

Best Practices

Schema design:
  • Use descriptive field names that match your source data
  • Include detailed descriptions for complex fields
  • Set appropriate validation constraints
  • Version your schemas (1.0.0, 1.1.0, etc.)

Mapping:
  • Start with direct mappings, add transformations as needed
  • Test mappings with sample data before bulk import
  • Document custom transformation logic
  • Handle null/missing values gracefully

Data quality:
  • Validate data at the source before importing
  • Use strict mode for production environments
  • Monitor failed imports and validation errors
  • Implement data cleansing for known issues

Performance:
  • Use bulk processing for large datasets (>1000 records)
  • Set appropriate batch sizes (100-1000 records)
  • Schedule imports during off-peak hours
  • Monitor processing times and adjust batch sizes

Next Steps

1. Create Your First Schema

Follow the Custom Schemas guide to define your data structure.

2. Map Your Fields

Learn how to map fields to gu1’s model in the Field Mappings guide.

3. Import Data

Start importing entities using the Entities API.