This guide connects Google BigQuery to Klairr. The connection uses a dedicated read-only service account — Klairr never has write access to your data, and every query is cost-estimated before execution.
Prerequisites
- A Google Cloud project with BigQuery enabled.
- At least one dataset with tables you want to query.
- Permission to create service accounts and manage IAM roles in the project (typically `roles/iam.serviceAccountAdmin` and `roles/iam.securityAdmin`, or Project Owner).
Choose a connection method
| Method | When to use |
|---|---|
| Public endpoint (default) | The BigQuery API is reachable from Klairr’s egress with no extra configuration. This is the standard path. |
| VPC Service Controls | Your project enforces a perimeter on BigQuery. Add Klairr’s egress range to the perimeter’s allowed sources. See Network Access. |
Unlike self-hosted databases, BigQuery has no private-network setup — the API is accessed over Google’s public endpoints.
Step 1: Create a service account
- In the Google Cloud Console, select your project.
- Navigate to IAM & Admin → Service Accounts.
- Click Create Service Account.
- Name it (e.g. `klairr-reader`) and click Create and Continue.
Step 2: Grant minimum permissions
Assign these roles to the service account:
| Role | Why it’s needed |
|---|---|
| BigQuery Data Viewer (`roles/bigquery.dataViewer`) | Read access to tables and views. Can be scoped to a single dataset for tighter access. |
| BigQuery Job User (`roles/bigquery.jobUser`) | Permission to run query jobs (granted at the project level). |
Klairr explicitly does not request BigQuery Admin, Data Editor, Data Owner, Job Admin, or any roles/owner-level access.
For tighter scoping, grant Data Viewer at the dataset level (not the project level) — open the dataset in BigQuery Studio, click Sharing → Permissions, and add the service account there. Job User still needs to be project-level because BigQuery jobs are project-scoped.
Step 3: Create and download a JSON key
- Click the new service account.
- Go to the Keys tab → Add Key → Create new key.
- Select JSON format and download the file.
Keep this file secure — it grants access to your BigQuery data. You’ll paste it into Klairr in Step 4.
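Before uploading, you can sanity-check that the file really is a service-account key. The sketch below is illustrative, not part of Klairr; the field names are the standard ones Google puts in every service-account JSON key.

```python
import json

# Fields present in every Google service-account JSON key.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def looks_like_service_account_key(raw: str) -> bool:
    """Return True if `raw` parses as a service-account key JSON."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(key, dict)
        and key.get("type") == "service_account"
        and REQUIRED_FIELDS <= key.keys()
    )
```

This mirrors the "JSON key shape" connect-time check described under Setup tests below; a failure here means you exported the wrong file (for example an OAuth client secret instead of a service-account key).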
Step 4: Add the connector in Klairr
Open Settings → Connectors → Add connector, pick Google BigQuery, and enter:
| Field | Example | Notes |
|---|---|---|
| GCP Project ID | acme-analytics-prod | The project that owns the dataset. Auto-filled when you upload the JSON key. |
| Dataset ID | warehouse | The dataset Klairr should query. |
| Service Account JSON | (uploaded file) | Stored encrypted; never logged. |
| Value-catalog probing | On-demand (recommended) | See below. |
Value-catalog probing
Klairr can sample low-cardinality STRING columns at setup time so the LLM has known values from day one (faster fuzzy matching), or it can probe each column on first use (no upfront cost).
- On-demand (default) — `INFORMATION_SCHEMA` only at setup; columns are probed the first time a question references them. Effectively zero upfront cost.
- Pre-warm at setup — every enum-candidate `STRING` column is sampled at connect time. Costs ~256 MB billed per probed column. Choose this if you want fuzzy matches working immediately.
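The "enum-candidate" decision can be sketched as a cardinality heuristic over a sampled column. This is an illustration only, assuming a list of sampled values; the thresholds below are made up for the example, not Klairr's actual settings.

```python
def is_enum_candidate(sampled_values, max_distinct=50, min_rows=10):
    """Heuristic: a STRING column is an enum candidate when a sample
    shows few distinct values relative to its size (e.g. status codes,
    country names), making it worth cataloguing for fuzzy matching."""
    non_null = [v for v in sampled_values if v is not None]
    if len(non_null) < min_rows:
        # Too little data to judge cardinality.
        return False
    return len(set(non_null)) <= max_distinct
```

A `status` column sampling to a handful of repeated values qualifies; a free-text `notes` column, where nearly every value is distinct, does not.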
Setup tests
When you click Connect, Klairr runs the following checks. Each failure maps to the troubleshooting entry below.
- JSON key shape — the uploaded file is a valid service-account JSON.
- Authentication — the service account exists and the key is current.
- Project access — the account can list datasets in the project.
- Dataset access — the configured dataset is visible.
- Read access — the account has `Data Viewer` on at least one table in the dataset.
- Job submission — the account has `Job User` and can submit a dry-run query.
- Schema introspection — Klairr enumerates tables, columns, and types from `INFORMATION_SCHEMA`.
What Klairr queries
- Schema introspection runs against `INFORMATION_SCHEMA.TABLES` and `.COLUMNS` — no row data leaves your warehouse during introspection.
- Question answering issues plain `SELECT` statements with a server-enforced `LIMIT`. Every emitted query is validated against an allow-list before execution: `INSERT`, `UPDATE`, `DELETE`, `MERGE`, `CREATE`, `DROP`, `ALTER`, `TRUNCATE`, `EXPORT`, `LOAD`, `CALL`, `GRANT`, `REVOKE`, `ASSERT`, and `SELECT … INTO` are all rejected at the application layer.
- Every query runs as a BigQuery dry run first to estimate bytes scanned and cost; the dry run gates the real execution against your spend limits.
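The allow-list idea above can be sketched with a naive keyword scan. This is a simplified illustration, not Klairr's implementation, which would need a real SQL parser to avoid false positives (for example, a rejected keyword appearing inside a string literal).

```python
import re

# Statement types rejected at the application layer (from the list above).
REJECTED = {
    "INSERT", "UPDATE", "DELETE", "MERGE", "CREATE", "DROP", "ALTER",
    "TRUNCATE", "EXPORT", "LOAD", "CALL", "GRANT", "REVOKE", "ASSERT",
}

def is_allowed(sql: str) -> bool:
    """Allow only statements that start with SELECT or WITH and contain
    no rejected keyword (naive token scan, not a full parse)."""
    tokens = re.findall(r"[A-Za-z_]+", sql.upper())
    if not tokens or tokens[0] not in {"SELECT", "WITH"}:
        return False
    if any(t in REJECTED for t in tokens):
        return False
    # SELECT … INTO is also rejected.
    return "INTO" not in tokens
```

The `WITH` prefix is allowed because common table expressions still resolve to a read-only `SELECT`.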
Cost & spend controls
BigQuery bills on bytes scanned. Klairr ships with cost controls layered on top:
- Per-query estimate — the bytes to be processed and the estimated USD cost, shown before every execution (uses BigQuery’s on-demand price of $5/TB).
- Per-query limit — maximum bytes scanned for any single query (admin-configurable).
- Daily limit — total bytes scanned per user per day (admin-configurable).
- Automatic LIMIT injection — every query Klairr emits ends with a `LIMIT` clause to prevent accidental full-table scans.
Configure these in Settings → Data sources for each connector.
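The arithmetic behind these controls is straightforward. The sketch below is illustrative, assuming the guide's stated $5/TB on-demand price and treating 1 TB as 2^40 bytes (BigQuery's TiB convention); the helper names and default limit are hypothetical, not Klairr's API.

```python
import re

USD_PER_TB = 5.00      # the on-demand price assumed by this guide
TB = 1024 ** 4         # BigQuery bills per 2^40 bytes

def estimate_usd(bytes_processed: int) -> float:
    """Convert a dry-run bytes estimate into an on-demand USD estimate."""
    return bytes_processed / TB * USD_PER_TB

def may_execute(bytes_processed, per_query_cap, daily_used, daily_cap):
    """Gate real execution on both the per-query and daily byte caps."""
    return (
        bytes_processed <= per_query_cap
        and daily_used + bytes_processed <= daily_cap
    )

def ensure_limit(sql: str, default_limit: int = 1000) -> str:
    """Append a LIMIT clause when the query lacks a trailing one."""
    if re.search(r"\bLIMIT\s+\d+\s*;?\s*$", sql, re.IGNORECASE):
        return sql
    return f"{sql.rstrip().rstrip(';')} LIMIT {default_limit}"
```

So a dry run estimating one full TB scanned surfaces as a $5.00 estimate, and is blocked if either cap would be exceeded.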
Troubleshooting
- “Permission denied” on connect — the service account is missing `Data Viewer` or `Job User`. Verify both roles are present and `Job User` is at the project level.
- “Dataset not found” — `Data Viewer` is at the wrong scope, the dataset is in a different region than the project default, or the project ID has a typo. Confirm in BigQuery Studio that the service account can `SELECT * FROM <dataset>.INFORMATION_SCHEMA.TABLES LIMIT 1`.
- Schema not loading — large datasets (100+ tables) take longer to introspect. Wait, then refresh. Persistent failures usually mean `Data Viewer` is dataset-scoped but `INFORMATION_SCHEMA` access requires project- or dataset-level grants.
- Query timed out — the query plan is exceeding BigQuery’s slot allocation. Create views or materialised views for commonly queried tables. If the underlying tables are partitioned, ensure your question (or the `WHERE` clause Klairr generates) constrains the partition column.
- Cost estimate exceeds spend limit — Klairr blocks execution when the dry-run estimate exceeds the per-query or daily cap. Either narrow the question, or ask an admin to raise the cap in Settings → Data sources.
Related
- Network Access — egress IPs and VPC Service Controls posture.
- Roles & Permissions — workspace-level access controls.
- Supported Data Sources — every database and SaaS Klairr connects to.
- Security & Data Handling — encryption, logging, what we do and don’t store.
What’s next
- Ask your first question — get a real answer from your data.
- AI Memory — the organisational knowledge layer Klairr builds and you refine.