Google BigQuery Setup

Connect Google BigQuery to Klairr with a read-only service account.

This guide walks through connecting Google BigQuery to Klairr. The connection uses a dedicated read-only service account — Klairr never has write access to your data, and every query is cost-estimated before execution.

Prerequisites

  • A Google Cloud project with BigQuery enabled.
  • At least one dataset with tables you want to query.
  • Permission to create service accounts and manage IAM roles in the project (typically roles/iam.serviceAccountAdmin and roles/iam.securityAdmin, or Project Owner).

Choose a connection method

| Method | When to use |
| --- | --- |
| Public endpoint (default) | The BigQuery API is reachable from Klairr’s egress with no extra configuration. This is the standard path. |
| VPC Service Controls | Your project enforces a perimeter on BigQuery. Add Klairr’s egress range to the perimeter’s allowed sources. See Network Access. |

Unlike self-hosted databases, BigQuery has no private-network setup — the API is accessed over Google’s public endpoints.

Step 1: Create a service account

  1. In the Google Cloud Console, select your project.
  2. Navigate to IAM & Admin → Service Accounts.
  3. Click Create Service Account.
  4. Name it (e.g. klairr-reader) and click Create and Continue.
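If you prefer the CLI, the same account can be created with gcloud. The account name and project ID below are placeholders; substitute your own:

```shell
# Create the read-only service account (name and project are examples)
gcloud iam service-accounts create klairr-reader \
  --display-name="Klairr read-only reader" \
  --project=acme-analytics-prod
```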

Step 2: Grant minimum permissions

Assign these roles to the service account:

| Role | Why it’s needed |
| --- | --- |
| BigQuery Data Viewer (roles/bigquery.dataViewer) | Read access to tables and views. Can be scoped to a single dataset for tighter access. |
| BigQuery Job User (roles/bigquery.jobUser) | Permission to run query jobs (issued at the project level). |

Klairr explicitly does not request BigQuery Admin, Data Editor, Data Owner, Job Admin, or any roles/owner-level access.

For tighter scoping, grant Data Viewer at the dataset level (not the project level) — open the dataset in BigQuery Studio, click Sharing → Permissions, and add the service account there. Job User still needs to be project-level because BigQuery jobs are project-scoped.
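The project-level bindings can also be granted with gcloud. The project ID and service-account email below are placeholders; for the tighter dataset-level Data Viewer grant, use the console steps described above instead of the second command:

```shell
# Project-level: permission to run query jobs
gcloud projects add-iam-policy-binding acme-analytics-prod \
  --member="serviceAccount:klairr-reader@acme-analytics-prod.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Project-level Data Viewer (skip this and grant at the dataset level
# in BigQuery Studio if you want the tighter scope)
gcloud projects add-iam-policy-binding acme-analytics-prod \
  --member="serviceAccount:klairr-reader@acme-analytics-prod.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
```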

Step 3: Create and download a JSON key

  1. Click the new service account.
  2. Go to the Keys tab → Add Key → Create new key.
  3. Select JSON format and download the file.

Keep this file secure — it grants access to your BigQuery data. You’ll paste it into Klairr in Step 4.
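The key can also be generated from the CLI; the file name and account email are placeholders:

```shell
# Generate and download a JSON key for the service account
gcloud iam service-accounts keys create klairr-reader-key.json \
  --iam-account=klairr-reader@acme-analytics-prod.iam.gserviceaccount.com
```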

Step 4: Add the connector in Klairr

Open Settings → Connectors → Add connector, pick Google BigQuery, and enter:

| Field | Example | Notes |
| --- | --- | --- |
| GCP Project ID | acme-analytics-prod | The project that owns the dataset. Auto-filled when you upload the JSON key. |
| Dataset ID | warehouse | The dataset Klairr should query. |
| Service Account JSON | (uploaded file) | Stored encrypted; never logged. |
| Value-catalog probing | On-demand (recommended) | See below. |

Value-catalog probing

Klairr can sample low-cardinality STRING columns at setup time so the LLM has known values from day one (faster fuzzy matching), or it can probe each column on first use (no upfront cost).

  • On-demand (default) — INFORMATION_SCHEMA only at setup; columns are probed the first time a question references them. Effectively zero upfront cost.
  • Pre-warm at setup — every enum-candidate STRING column is sampled at connect time, at a cost of roughly 256 MB billed per probed column. Choose this if you want fuzzy matches working immediately.
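The exact probe Klairr issues is not documented here, but a sample of distinct values from a low-cardinality STRING column might look like the following sketch (project, dataset, table, and column names are illustrative):

```python
def build_probe_query(project: str, dataset: str, table: str,
                      column: str, max_values: int = 100) -> str:
    """Build a query that samples distinct values of a STRING column.

    Illustrative only: the probe Klairr actually runs may differ.
    """
    return (
        f"SELECT DISTINCT `{column}` "
        f"FROM `{project}.{dataset}.{table}` "
        f"WHERE `{column}` IS NOT NULL "
        f"LIMIT {max_values}"
    )

# Example: sample the status column of an orders table
query = build_probe_query("acme-analytics-prod", "warehouse", "orders", "status")
```

A LIMIT on its own does not reduce bytes scanned in BigQuery, which is why each probed column still bills roughly the full column scan.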

Setup tests

When you click Connect, Klairr runs the following checks. Each failure maps to the troubleshooting entry below.

  1. JSON key shape — the uploaded file is a valid service-account JSON.
  2. Authentication — the service account exists and the key is current.
  3. Project access — the account can list datasets in the project.
  4. Dataset access — the configured dataset is visible.
  5. Read access — the account has Data Viewer on at least one table in the dataset.
  6. Job submission — the account has Job User and can submit a dry-run query.
  7. Schema introspection — Klairr enumerates tables, columns, and types from INFORMATION_SCHEMA.

What Klairr queries

  • Schema introspection runs against INFORMATION_SCHEMA.TABLES and .COLUMNS — no row data leaves your warehouse during introspection.
  • Question answering issues plain SELECT statements with a server-enforced LIMIT. Every emitted query is validated against an allow-list before execution: INSERT, UPDATE, DELETE, MERGE, CREATE, DROP, ALTER, TRUNCATE, EXPORT, LOAD, CALL, GRANT, REVOKE, ASSERT, and SELECT … INTO are all rejected at the application layer.
  • Every query runs as a BigQuery dry-run first to estimate bytes-scanned + cost; the dry-run gates the real execution against your spend limits.
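Klairr's actual validator is internal; a minimal sketch of keyword-based allow-listing, assuming the statement list above, could look like this:

```python
import re

# Statement types rejected at the application layer (per the list above)
BLOCKED = {"INSERT", "UPDATE", "DELETE", "MERGE", "CREATE", "DROP", "ALTER",
           "TRUNCATE", "EXPORT", "LOAD", "CALL", "GRANT", "REVOKE", "ASSERT"}

def is_allowed(sql: str) -> bool:
    """Allow plain SELECT (or WITH ... SELECT) statements only.

    Illustrative sketch, not Klairr's actual validator: a naive keyword
    screen rejects any statement containing a blocked word, even in an
    identifier, trading false positives for safety.
    """
    tokens = set(re.findall(r"[A-Za-z_]+", sql.upper()))
    if tokens & BLOCKED:
        return False
    starts_ok = re.match(r"\s*(WITH|SELECT)\b", sql, re.IGNORECASE)
    return starts_ok is not None and "INTO" not in tokens  # bars SELECT ... INTO
```

A production validator would parse the statement rather than scan keywords, but the fail-closed shape is the same.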

Cost & spend controls

BigQuery bills on bytes scanned. Klairr ships with cost controls layered on top:

  • Per-query estimate — bytes-to-be-processed + estimated USD shown before every execution (uses BigQuery’s on-demand price of $5/TB).
  • Per-query limit — maximum bytes scanned for any single query (admin-configurable).
  • Daily limit — total bytes scanned per user per day (admin-configurable).
  • Automatic LIMIT injection — every query Klairr emits ends with a LIMIT clause to prevent full-table scans by accident.
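The two mechanical pieces above, converting a dry-run bytes estimate into a dollar figure and appending a LIMIT, reduce to a few lines. This is a sketch under the $5/TB rate quoted above, not Klairr's server-side implementation:

```python
ON_DEMAND_USD_PER_TB = 5.0  # rate quoted above; verify against current BigQuery pricing

def estimate_cost_usd(bytes_processed: int) -> float:
    """Convert a dry-run bytes-processed estimate into USD on-demand cost."""
    return bytes_processed / 1e12 * ON_DEMAND_USD_PER_TB

def inject_limit(sql: str, max_rows: int = 1000) -> str:
    """Append a LIMIT clause when the statement lacks one.

    Naive sketch: a substring check would also match an identifier named
    "limit"; a real implementation would inspect the parsed statement.
    """
    if "LIMIT" in sql.upper():
        return sql
    return f"{sql.rstrip().rstrip(';')} LIMIT {max_rows}"
```

The bytes-processed figure itself comes from BigQuery's dry-run mode (a query job submitted with dryRun set), which returns the estimate without executing or billing anything.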

Configure these in Settings → Data sources for each connector.

Troubleshooting

  • “Permission denied” on connect — the service account is missing Data Viewer or Job User. Verify both roles are present and Job User is at the project level.
  • “Dataset not found” — Data Viewer is at the wrong scope, the dataset is in a different region than the project default, or the project ID has a typo. Confirm in BigQuery Studio that the service account can run SELECT * FROM <dataset>.INFORMATION_SCHEMA.TABLES LIMIT 1.
  • Schema not loading — large datasets (100+ tables) take longer to introspect. Wait, then refresh. Persistent failures usually mean Data Viewer was granted at the table level; INFORMATION_SCHEMA access requires a project- or dataset-level grant.
  • Query timed out — query plan is exceeding BigQuery’s slot allocation. Create views or materialised views for commonly queried tables. If the underlying tables are partitioned, ensure your question (or the WHERE clause Klairr generates) constrains the partition column.
  • Cost estimate exceeds spend limit — Klairr blocks execution when the dry-run estimate exceeds the per-query or daily cap. Either narrow the question, or ask an admin to raise the cap in Settings → Data sources.
