Elementary’s Google Cloud Storage (GCS) integration enables streaming audit logs and system logs directly to your GCS bucket for long-term storage, analysis, and integration with other Google Cloud services.

Overview

When enabled, Elementary automatically streams your workspace’s user activity (audit) logs and system logs to your GCS bucket using the Google Cloud Storage API. This allows you to:
  • Store logs in your own GCS bucket for long-term retention
  • Integrate logs with BigQuery, Dataflow, or other Google Cloud analytics services
  • Maintain full control over log storage and access policies
  • Process logs using Google Cloud data processing tools
  • Archive logs for compliance and audit requirements

Prerequisites

Before configuring log streaming to GCS, you’ll need:
  1. GCS Bucket - A Google Cloud Storage bucket where logs will be stored
    • The bucket must exist and be accessible
    • You’ll need the bucket path (e.g., gs://my-logs-bucket)
  2. Google Cloud Service Account - A service account with permissions to write to the bucket
    • Required role: Storage Object User (roles/storage.objectUser)
    • You’ll need to generate a service account JSON key file
    • You’ll upload this key file to Elementary during configuration (a quick permission check is sketched after this list)
    • Workload Identity Federation: Support for Workload Identity Federation with BigQuery service accounts is coming soon
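
Before saving the configuration, it can help to confirm the service account can actually write to the bucket. Below is a minimal sketch using the google-cloud-storage Python client; key.json and my-logs-bucket are placeholders for your own key file and bucket.

# Sketch: confirm the service account key grants write access to the bucket.
# "key.json" and "my-logs-bucket" are placeholders.
from google.cloud import storage

client = storage.Client.from_service_account_json("key.json")
bucket = client.bucket("my-logs-bucket")

# Upload and delete a small test object; roles/storage.objectUser covers both.
blob = bucket.blob("elementary-access-check.txt")
blob.upload_from_string("access check")
blob.delete()
print("Service account can write to the bucket.")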

Configuring Log Streaming to GCS

  1. Navigate to the Logs page:
    • Click on your account name in the top-right corner of the UI
    • Open the dropdown menu
    • Select Logs
  2. In the External Integrations section, click the Connect button
  3. In the modal that opens, select Google Cloud Storage (GCS) as your log streaming destination
  4. Enter your GCS configuration:
    • Bucket Path: The full GCS bucket path (e.g., gs://my-logs-bucket)
    • Service Account Key File: Upload your Google Cloud service account JSON key file
      • To generate a service account key file:
        1. Go to Google Cloud Console > IAM & Admin > Service Accounts
        2. Select your service account (or create a new one)
        3. Click the three dots menu and select “Manage keys”
        4. Click “ADD KEY” and select “Create new key”
        5. Choose “JSON” format and click “CREATE”
        6. The JSON file will be downloaded automatically
  5. Click Save to enable log streaming
The log streaming configuration applies to your entire workspace. Both user activity logs and system logs will be streamed to your GCS bucket in batches.

Log Batching

Logs are automatically batched and written to GCS files based on the following criteria:
  • Time-based batching: A new file is created every 15 minutes
  • Size-based batching: A new file is created when the batch reaches 100MB
A new file is created as soon as either condition is met. This keeps storage efficient while holding file sizes to a range that’s practical for downstream processing.
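
The policy can be pictured as a simple either/or check. The sketch below only illustrates the documented thresholds; it is not Elementary’s actual implementation.

import time

# Illustrative only: mirrors the documented 15-minute / 100MB thresholds.
MAX_AGE_SECONDS = 15 * 60
MAX_BYTES = 100 * 1024 * 1024

def should_start_new_file(batch_started_at: float, batch_size_bytes: int) -> bool:
    """Return True once either threshold is reached, whichever comes first."""
    age_exceeded = time.time() - batch_started_at >= MAX_AGE_SECONDS
    size_exceeded = batch_size_bytes >= MAX_BYTES
    return age_exceeded or size_exceeded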

File Path Format

Logs are stored at the root of your bucket using a Hive-based partitioning structure for efficient querying and organization:
log_type={log_type}/date={YYYY-MM-DD}/hour={HH}/file_{timestamp}_{batch_id}.ndjson
Where:
  • {log_type}: Either audit (for user activity logs) or system (for system logs)
  • {YYYY-MM-DD}: Date in ISO format (e.g., 2024-01-15)
  • {HH}: Hour in 24-hour format (e.g., 14)
  • {timestamp}: Unix timestamp when the file was created
  • {batch_id}: Unique identifier for the batch
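
If you consume these files programmatically, the partition values can be recovered from the object name. A minimal parsing sketch follows; the helper name is ours, not part of the product.

import re

# Matches the documented layout, e.g.
# log_type=audit/date=2024-01-15/hour=14/file_1705320000_batch_abc123.ndjson
PATH_RE = re.compile(
    r"log_type=(?P<log_type>audit|system)"
    r"/date=(?P<date>\d{4}-\d{2}-\d{2})"
    r"/hour=(?P<hour>\d{2})"
    r"/file_(?P<timestamp>\d+)_(?P<batch_id>.+)\.ndjson$"
)

def parse_log_path(path: str) -> dict:
    """Extract partition values and file metadata from a GCS object name."""
    match = PATH_RE.match(path)
    if match is None:
        raise ValueError(f"unexpected log path: {path}")
    return match.groupdict()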

Example File Paths

log_type=audit/date=2024-01-15/hour=14/file_1705320000_batch_abc123.ndjson
log_type=system/date=2024-01-15/hour=14/file_1705320900_batch_def456.ndjson
This Hive-based structure allows you to:
  • Efficiently query logs by date and hour using BigQuery or other tools
  • Filter logs by type (audit or system)
  • Process logs in parallel by partition
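
For example, you can point a BigQuery external table at the bucket and let BigQuery detect the Hive partitions, so log_type, date, and hour become queryable columns. Below is a hedged sketch using the google-cloud-bigquery client; my-project, logs, and my-logs-bucket are placeholders, and the table covers audit logs only so the autodetected schema stays uniform.

from google.cloud import bigquery

# Sketch only, not part of the Elementary product: defines an external table
# over the audit-log NDJSON files with Hive partition detection.
client = bigquery.Client(project="my-project")

external_config = bigquery.ExternalConfig("NEWLINE_DELIMITED_JSON")
external_config.source_uris = ["gs://my-logs-bucket/log_type=audit/*"]
external_config.autodetect = True

hive_opts = bigquery.HivePartitioningOptions()
hive_opts.mode = "STRINGS"  # expose log_type/date/hour as STRING columns
hive_opts.source_uri_prefix = "gs://my-logs-bucket/"
external_config.hive_partitioning = hive_opts

table = bigquery.Table("my-project.logs.elementary_audit_logs")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)

# Filtering on partition columns prunes files, so only one hour is scanned.
query = """
    SELECT action, user.email AS email, timestamp
    FROM `my-project.logs.elementary_audit_logs`
    WHERE date = '2024-01-15' AND hour = '14'
"""
for row in client.query(query).result():
    print(row.action, row.email, row.timestamp)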

Log Format

Logs are stored as line-delimited JSON (NDJSON), where each line represents a single log entry as a JSON object.
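
Each line can therefore be parsed independently. Below is a minimal reading sketch with the google-cloud-storage client; the bucket and object names are placeholders taken from the path examples above.

import json
from google.cloud import storage

# Sketch: download one NDJSON log file from GCS and parse it line by line.
client = storage.Client()
bucket = client.bucket("my-logs-bucket")
blob = bucket.blob(
    "log_type=audit/date=2024-01-15/hour=14/file_1705320000_batch_abc123.ndjson"
)

for line in blob.download_as_text().splitlines():
    if not line.strip():
        continue  # tolerate trailing blank lines
    entry = json.loads(line)
    print(entry["timestamp"], entry["action"], entry["success"])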

User Activity Logs

Each user activity log entry includes:
{
  "timestamp": "2024-01-15T14:30:45.123456Z",
  "log_type": "audit",
  "action": "user_login",
  "success": true,
  "user": {
    "id": "usr_abcdef1234567890",
    "email": "[email protected]",
    "name": "John Doe"
  },
  "env_id": "env_7890123456abcdef",
  "env_name": "Production",
  "data": {
    "additional": "context"
  }
}

System Logs

Each system log entry includes:
{
  "timestamp": "2024-01-15T14:30:45.123456Z",
  "log_type": "system",
  "action": "dbt_data_sync_completed",
  "success": true,
  "env_id": "env_7890123456abcdef",
  "env_name": "Production",
  "data": {
    "environment_id": "env_789",
    "environment_name": "Production"
  }
}

Field Descriptions

  • timestamp: ISO 8601 timestamp of the event (UTC)
  • log_type: Either "audit" for user activity logs or "system" for system logs
  • action: The specific action that was performed (e.g., user_login, create_test, dbt_data_sync_completed)
  • success: Boolean indicating whether the action completed successfully
  • user: User information (only present in audit logs)
    • id: User ID
    • email: User email address
    • name: User display name
  • env_id: Environment identifier (empty string for account-level actions)
  • env_name: Environment name (empty string for account-level actions)
  • data: Additional context-specific information as a JSON object
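
For typed consumers, these fields can be modeled directly. Below is a sketch using Python TypedDicts; the type names are ours, and NotRequired needs Python 3.11+ (or typing_extensions).

from typing import Any, NotRequired, TypedDict

# Type names are ours, not Elementary's; fields follow the descriptions above.
class LogUser(TypedDict):
    id: str
    email: str
    name: str

class LogEntry(TypedDict):
    timestamp: str              # ISO 8601 timestamp, UTC
    log_type: str               # "audit" or "system"
    action: str                 # e.g. "user_login", "dbt_data_sync_completed"
    success: bool
    user: NotRequired[LogUser]  # present only in audit logs
    env_id: str                 # empty string for account-level actions
    env_name: str               # empty string for account-level actions
    data: dict[str, Any]        # context-specific payload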

Disabling Log Streaming

To disable log streaming to GCS:
  1. Navigate to the Logs page
  2. In the External Integrations section, find your GCS integration
  3. Click Disable or remove the GCS configuration
  4. Confirm the action
Disabling log streaming will stop sending new logs to GCS immediately. Historical logs already written to GCS will remain in your bucket.