Dataset

Account-scoped tables for capturing node outputs across many workflows and scoring them over time.

The Dataset node is where Solaris AI Flow remembers outcomes. Where the Storage node is a small per-workflow dedup cache, a Dataset is an append-heavy table scoped to your whole account, so many workflows can write into one dataset and a separate workflow can read across all of them.

This is the primitive behind deterministic scoring. A trading workflow appends each closed trade's outcome; a nightly scoring workflow aggregates the last 30 days to rank strategies. Because rows persist for the dataset's retention period and every workflow you own can write to the same dataset, you can score across your entire account, not just one flow.

Storage vs Dataset

Pick the right tool:

Storage is a key/value + list cache scoped to a single workflow, with mandatory short TTLs. Use it for dedup ("have I seen this mint?") and small per-workflow state.
Dataset is a structured, queryable table scoped to your account, with long retention. Use it to capture outputs over time and compute scores, counts, and averages across workflows.

If you find yourself trying to log a growing history into Storage, or trying to read one workflow's Storage from another, you want a Dataset.

Prerequisites

No API key required
Auto-capture (passive logging) requires a Pro or Ultra plan

Operations

Operation	Purpose	Required fields
Append Row	Add a new row	dataset, row
Upsert Row	Update the row matching a field, else append	dataset, row, match field, match value
Query	Return rows matching filters	dataset
Aggregate	Compute count / sum / avg / min / max, optionally grouped	dataset, aggregate op, field (all ops except count)
Delete Rows	Remove rows matching filters	dataset

Append Row

Adds a row to the dataset, creating the dataset on first write. The row is a JSON object of the small, queryable fields you want to keep, for example {"mint":"{token.mint}","pnl":1.2,"strategy":"sniper"}. Each row is automatically stamped with a creation time and the ID of the workflow that wrote it, so a single dataset can attribute rows back to multiple source workflows.

Keep rows to scalars (strings, numbers, booleans) you actually filter or aggregate on. The row is what lives in the queryable store, so smaller rows mean more history under your plan's size cap.
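
As an illustration of what a stamped, flat row could look like, here is a minimal Python sketch. It models a dataset as an in-memory list and is not the product's API: the function append_row and the workflowId key are hypothetical names chosen for this example, and rejecting nested values is a design choice of the sketch (the guidance above only recommends flat rows).

```python
import time
from typing import Optional

SCALARS = (str, int, float, bool)

def append_row(dataset: list, row: dict, workflow_id: str,
               now: Optional[float] = None) -> dict:
    """Append a flat row to an in-memory list standing in for a dataset."""
    nested = [key for key, value in row.items() if not isinstance(value, SCALARS)]
    if nested:
        # Sketch-only strictness: the docs advise flat rows, this model enforces it.
        raise ValueError(f"non-scalar fields: {nested}")
    stored = {
        "data": dict(row),
        "createdAt": time.time() if now is None else now,  # creation stamp
        "workflowId": workflow_id,  # hypothetical key for the writing workflow's ID
    }
    dataset.append(stored)
    return stored

# Two different workflows writing into the same dataset.
trade_outcomes: list = []
append_row(trade_outcomes, {"mint": "MINT_A", "pnl": 1.2, "strategy": "sniper"}, "wf-sniper")
append_row(trade_outcomes, {"mint": "MINT_B", "pnl": -0.4, "strategy": "dca"}, "wf-dca")
```

Because every stored row carries the writer's ID, a reader can later split one shared dataset by source workflow.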

Upsert Row

Like Append, but first looks for an existing row whose match field equals the resolved match value. If found, that row is overwritten; otherwise a new row is appended. Use this for "one row per token" style datasets where re-processing the same token should update rather than duplicate.

To guarantee a unique match, upsert scans every row, so it is only available while the dataset stays within the Query / aggregate scan window listed under Plan limits (3,000 rows on Pro, 5,000 on Ultra). That window is far below the per-dataset row cap: a Pro dataset holds up to 100,000 rows, but upsert stops working once it passes 3,000. Beyond the scan window, upsert returns an error rather than risk silently inserting a duplicate it couldn't detect. At that point, switch to Append, lower the dataset's retention, or split the dataset. (Indexed keys for unbounded upsert are planned.)
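
The match-or-append behavior and the scan-window guard can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the service's implementation: upsert_row and ScanWindowExceeded are hypothetical names, rows are plain dicts in a list, and the window sizes are copied from the Plan limits table.

```python
SCAN_WINDOW = {"free": 2000, "pro": 3000, "ultra": 5000}  # from the Plan limits table

class ScanWindowExceeded(Exception):
    """Hypothetical error type: the dataset is too large to scan for a unique match."""

def upsert_row(rows: list, row: dict, match_field: str, match_value, plan: str) -> dict:
    """Overwrite the row whose match_field equals match_value, else append."""
    if len(rows) > SCAN_WINDOW[plan]:
        # Refuse instead of risking an undetected duplicate.
        raise ScanWindowExceeded(f"{len(rows)} rows exceeds the {plan} scan window")
    for index, existing in enumerate(rows):
        if existing.get(match_field) == match_value:
            rows[index] = dict(row)  # overwrite in place
            return {"inserted": False, "rowCount": len(rows)}
    rows.append(dict(row))
    return {"inserted": True, "rowCount": len(rows)}

# "One row per token": re-processing MINT_A updates instead of duplicating.
tokens: list = []
first = upsert_row(tokens, {"mint": "MINT_A", "pnl": 0.5}, "mint", "MINT_A", "pro")
second = upsert_row(tokens, {"mint": "MINT_A", "pnl": 1.1}, "mint", "MINT_A", "pro")
```

After both calls the list holds a single MINT_A row carrying the latest pnl, and the inserted flag distinguishes the first write from the update.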

Query

Returns rows matching an optional filter array, newest first by default. Filters are a JSON array like [{"field":"pnl","op":"gt","value":0}]. Supported ops are eq, ne, gt, gte, lt, lte, and contains. The ordering comparisons (gt, gte, lt, lte) operate on numeric fields. Results are capped by the Limit field (default 200, max 1000) and by the per-plan scan window.

Field names are top-level only. Filter field, group by, and the aggregate field all read a row's top-level keys. A dot path like trade.pnl is treated as the literal key "trade.pnl", not a nested lookup, so a nested value silently won't match. Keep the fields you score on flat (the way auto-capture writes them), e.g. store pnl rather than {"trade":{"pnl":1.2}}.
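
To make the top-level-only rule concrete, the following Python sketch evaluates filters against plain dict rows. It is a local model, not the node's code. Several details are assumptions made for the example: all filters in the array must match (AND), a missing field never matches, contains is a substring test on strings, and the helper names matches and query are hypothetical.

```python
import operator

_COMPARE = {"gt": operator.gt, "gte": operator.ge,
            "lt": operator.lt, "lte": operator.le}

def _is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def matches(row: dict, flt: dict) -> bool:
    """Evaluate one {field, op, value} filter against a row's top-level keys."""
    field, op, expected = flt["field"], flt["op"], flt["value"]
    if field not in row:  # "trade.pnl" is looked up as a literal key, never a path
        return False
    actual = row[field]
    if op in _COMPARE:
        return _is_number(actual) and _is_number(expected) and _COMPARE[op](actual, expected)
    if op == "eq":
        return actual == expected
    if op == "ne":
        return actual != expected
    if op == "contains":  # assumption: substring match on string fields
        return isinstance(actual, str) and str(expected) in actual
    raise ValueError(f"unsupported op: {op}")

def query(rows: list, filters=(), limit: int = 200) -> list:
    """Return rows passing every filter, capped at limit (max 1000)."""
    limit = max(1, min(limit, 1000))
    return [row for row in rows if all(matches(row, f) for f in filters)][:limit]

flat = {"pnl": 1.2}
nested = {"trade": {"pnl": 1.2}}
profitable = [{"field": "pnl", "op": "gt", "value": 0}]
dotted = [{"field": "trade.pnl", "op": "gt", "value": 0}]
```

Here query([flat, nested], profitable) keeps only the flat row, and query([nested], dotted) returns nothing, which is the silent mismatch described above.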

Aggregate

The scoring engine. Computes count, sum, avg, min, or max over the rows matching your filters. With a group by field, it returns one bucket per distinct value, for example average PnL grouped by strategy. count needs no field; the other ops require a numeric field. This is how you turn a long log of outcomes into a single deterministic score.
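
The reducers are simple enough to model directly. The Python sketch below mirrors the documented output fields (value, count, buckets, key) but is only an approximation: the function names are hypothetical, and the handling of edge cases (rows without a numeric field are skipped, rows without the group-by key are dropped, the grouped top-level count is the sum of bucket counts) is assumed rather than confirmed.

```python
def _is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def _reduce(op: str, rows: list, field):
    """Return (value, contributing_row_count) for one reducer."""
    if op == "count":
        return len(rows), len(rows)
    values = [row[field] for row in rows if _is_number(row.get(field))]
    if not values:
        return None, 0
    total = sum(values)
    value = {"sum": total, "avg": total / len(values),
             "min": min(values), "max": max(values)}[op]
    return value, len(values)

def aggregate(rows: list, op: str, field=None, group_by=None) -> dict:
    if op not in ("count", "sum", "avg", "min", "max"):
        raise ValueError(f"unsupported op: {op}")
    if op != "count" and field is None:
        raise ValueError(f"{op} requires a numeric field")
    if group_by is None:
        value, count = _reduce(op, rows, field)
        return {"value": value, "count": count}
    groups: dict = {}
    for row in rows:
        if group_by in row:  # assumption: rows lacking the key are skipped
            groups.setdefault(row[group_by], []).append(row)
    buckets = []
    for key, members in groups.items():
        value, count = _reduce(op, members, field)
        buckets.append({"key": key, "value": value, "count": count})
    # Assumption: the top-level count is the sum of the bucket counts.
    return {"buckets": buckets, "count": sum(b["count"] for b in buckets)}

trades = [
    {"strategy": "sniper", "pnl": 1.0},
    {"strategy": "sniper", "pnl": 3.0},
    {"strategy": "dca", "pnl": -1.0},
]
by_strategy = aggregate(trades, "avg", field="pnl", group_by="strategy")
```

With these three rows, by_strategy holds one bucket per strategy: sniper averages 2.0 over 2 rows and dca averages -1.0 over 1 row.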

Delete Rows

Removes rows matching the filters. It operates over the same recent window as Query and Aggregate (the most recent rows, newest first) and deletes at most one batch per run. The hasMore flag in the response indicates that more matches remain; re-run the node to keep draining. For routine expiry use retention, and to wipe an entire dataset use the Delete button in the Datasets manager rather than this node.
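
The "re-run until hasMore is false" pattern can be sketched as a loop. In the product the re-run is another execution of the node; the Python below only models the control flow. Both drain and make_fake_delete are hypothetical helpers, and the batch size of 2 is invented for the demonstration (the real batch size is not stated here).

```python
def drain(run_delete, max_runs: int = 100) -> int:
    """Call run_delete() until its response reports hasMore == False."""
    deleted_total = 0
    for _ in range(max_runs):
        response = run_delete()  # expected shape: {"deleted": int, "hasMore": bool}
        deleted_total += response["deleted"]
        if not response["hasMore"]:
            return deleted_total
    raise RuntimeError("matches remain after max_runs; check the filters")

def make_fake_delete(rows: list, predicate, batch_size: int = 2):
    """In-memory stand-in for one Delete Rows run (batch size is made up)."""
    def run_delete() -> dict:
        doomed = [row for row in rows if predicate(row)]
        batch = doomed[:batch_size]
        for row in batch:
            rows.remove(row)
        return {"deleted": len(batch), "hasMore": len(doomed) > len(batch)}
    return run_delete

rows = [{"pnl": v} for v in (-1, -2, 3, -4, -5, 6)]
removed = drain(make_fake_delete(rows, lambda row: row["pnl"] < 0))
```

The max_runs bound is a safety net so a filter that keeps matching new rows cannot loop forever.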

Configuration

Field	Type	Required	Description
Node Label	string	No	Display name shown on the canvas
Operation	enum	Yes	One of the operations above
Dataset	string	Yes	Dataset name (max 128 chars). Allowed characters: letters, digits, and `_ - : .`. Created on first write
Row	JSON object	For Append / Upsert	The queryable fields to store. Templates resolved before parsing
Match Field / Match Value	string	For Upsert	The field and value used to find an existing row
Filters	JSON array	No	`[{field, op, value}]` for Query / Aggregate / Delete
Aggregate / Field / Group By	enum + string	For Aggregate	The reducer, the numeric field, and an optional group-by field
Limit / Order	number / enum	No (Query)	Result cap and sort direction over creation time
Retention Days	number	No	Applied when the dataset is first created. Clamped to your plan max
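
If you generate dataset names from templates, it can help to check the resolved name before it reaches the node. A minimal Python sketch of the naming rule in the table follows; it assumes "letters" means ASCII letters and that an empty name is invalid, neither of which is confirmed above, and is_valid_dataset_name is a hypothetical helper.

```python
import re

# Letters, digits, and _ - : . with a length of 1 to 128 characters.
_DATASET_NAME = re.compile(r"[A-Za-z0-9_\-:.]{1,128}")

def is_valid_dataset_name(name: str) -> bool:
    """Check a resolved dataset name against the documented character set and length."""
    return _DATASET_NAME.fullmatch(name) is not None
```

For example, tradeOutcomes and signals:v2.prod-1 pass, while a name containing a space or exceeding 128 characters does not.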

Dataset, Row, Match Value, and Filters all accept template expressions, so you can build rows directly from upstream node output:

Dataset:  tradeOutcomes
Row:      {"mint":"{token.mint}","pnl":{trade.pnl},"strategy":"sniper"}
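
"Templates resolved before parsing" means the placeholders are substituted into the text first, and only then is the result parsed as JSON. The Python sketch below shows that order of operations. It is an assumption-laden model, not the engine's resolver: the resolve function, the placeholder regex, and the context dictionary are all hypothetical.

```python
import json
import re

# Matches {name} or {dotted.path}; a JSON object's own braces do not match
# because they are followed by a quote or preceded by a value, not an identifier.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\}")

def resolve(template: str, context: dict) -> str:
    """Substitute each placeholder with the value found at its path in context."""
    def lookup(match: re.Match) -> str:
        value = context
        for part in match.group(1).split("."):
            value = value[part]
        # Strings go in raw (the template supplies the quotes); others as JSON.
        return value if isinstance(value, str) else json.dumps(value)
    return _PLACEHOLDER.sub(lookup, template)

template = '{"mint":"{token.mint}","pnl":{trade.pnl},"strategy":"sniper"}'
upstream = {"token": {"mint": "MINT_A"}, "trade": {"pnl": 1.2}}  # made-up upstream output
row = json.loads(resolve(template, upstream))
```

Note the quoting in the template: {token.mint} sits inside quotes because it resolves to a string, while {trade.pnl} is unquoted so that pnl is stored as a number you can aggregate.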

Auto-capture: passive logging on any node

Manually wiring an Append node into every workflow is tedious. Instead, most nodes can passively capture their own output. Open a node's config and turn on Save output to Dataset (an optional, off-by-default panel in node settings), pick a dataset, and map fields:

Dataset:  signalOutcomes
score   ← {output.score}
mint    ← {token.mint}

After every successful run, that node's output is appended to the dataset automatically, with no extra node on the canvas. This turns every workflow into a passive data generator for scoring. In capture field templates, the node's own output is available as {output...} and under its response name.

The toggle isn't offered on triggers or on the Dataset node itself (a trigger has no scored output, and the Dataset node already writes rows), so they never capture.

Auto-capture is a Pro / Ultra feature. On the free plan the toggle is ignored at runtime (the run still succeeds; nothing is captured). Captured values are coerced to numbers and booleans where possible so they aggregate cleanly.
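
The exact coercion rules are not spelled out here, so the following Python sketch is only one plausible reading of "coerced to numbers and booleans where possible". The coerce function is hypothetical, and the choices it makes (case-insensitive true/false, integers before floats, non-finite numbers left as strings) are assumptions.

```python
import math

def coerce(value):
    """Turn numeric or boolean-looking strings into numbers or booleans."""
    if not isinstance(value, str):
        return value  # already typed; leave as-is
    text = value.strip()
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(text)
    except ValueError:
        pass
    try:
        number = float(text)
    except ValueError:
        return value  # not numeric: keep the original string
    # Assumption: NaN and infinity would not aggregate cleanly, so keep the string.
    return number if math.isfinite(number) else value

captured = {key: coerce(raw) for key, raw in
            {"score": "0.82", "filled": "true", "mint": "MINT_A"}.items()}
```

The point of coercion is that a captured score of "0.82" becomes the number 0.82, so avg and sum work on it without extra conversion.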

Templating against the response

Each Dataset operation writes its result to datasetResponse (override via the Response Name field):

Operation	Output shape
Append	`{ rowId, datasetId, rowCount }`
Upsert	`{ rowId, datasetId, inserted, rowCount }`
Query	`{ rows: [{ _id, data, createdAt, hasBlob, sourceNodeId }], count, scanned, truncated, scanTruncated, rowCount, order }`
Aggregate (ungrouped)	`{ value, count, scanned, truncated, scanTruncated, rowCount }`
Aggregate (grouped)	`{ buckets: [{ key, value, count }], count, scanned, truncated, scanTruncated, rowCount }`
Delete	`{ deleted, hasMore }`

All outputs include success: true, operation, and dataset alongside the operation-specific fields.

Query and aggregate return two completeness flags, and they mean different things:

scanTruncated: true means the dataset is larger than your plan's scan window, so the result was computed over only a partial slice: the most-recent rows (or, for an oldest-first query, the oldest rows). This is a correctness signal: a score over part of the dataset, not all of it.
truncated: true is a superset of scanTruncated: it is true whenever scanTruncated is true, and it is also true for benign output caps even when the whole dataset was scanned. Those caps are a Query that returned fewer rows than matched (because of the Limit or the response-size budget) and a grouped Aggregate that hit the distinct-group cap. When truncated is true but scanTruncated is false, the computed value is complete; only the returned list was capped.

Use scanned (rows actually read) against rowCount (the dataset's live total) to gauge coverage. Treat scanTruncated as the flag for "is this score over the full dataset?"; treat the bare truncated as "is there more output I'm not seeing?".
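
The two flags and the coverage ratio can be combined into a single check. The Python sketch below reads only fields that appear in the documented Query and Aggregate output shapes; the assess function and its status labels are hypothetical.

```python
def assess(response: dict) -> dict:
    """Classify a Query/Aggregate response and compute scan coverage."""
    row_count = response["rowCount"]
    coverage = response["scanned"] / row_count if row_count else 1.0
    if response["scanTruncated"]:
        status = "partial-scan"   # score covers only part of the dataset
    elif response["truncated"]:
        status = "output-capped"  # value is complete; returned list was capped
    else:
        status = "complete"
    return {"status": status, "coverage": coverage}

# A dataset of 12,000 rows read through a 3,000-row scan window.
partial = assess({"scanned": 3000, "rowCount": 12000,
                  "truncated": True, "scanTruncated": True})
```

Here partial reports a partial-scan status with coverage 0.25, a signal to narrow the window or shorten retention before trusting the score.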

Plan limits

Caps are enforced server-side and differ by tier:

Cap	Free	Pro	Ultra
Datasets per account	3	25	100
Rows per dataset	2,000	100,000	1,000,000
Row size (queryable fields)	5 KB	20 KB	50 KB
Total stored bytes per account	10 MB	500 MB	5 GB
Query / aggregate scan window	2,000	3,000	5,000
Auto-capture	Off	On	On
Default retention	30 days	90 days	180 days
Max retention	30 days	365 days	Unbounded

The scan window is the maximum number of rows a single Query or Aggregate reads: the most recent rows by default, or the oldest rows for an oldest-first query. For large datasets, narrow the result with filters; whole-dataset scoring at scale is on the roadmap (precomputed rollups). An hourly background sweep deletes rows past their dataset's retention.
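
The retention rows of the table imply a simple clamp when a dataset is created. A minimal Python sketch, using the defaults and maximums from the table above; effective_retention is a hypothetical helper, and treating values below 1 day as an error is an assumption.

```python
from typing import Optional

DEFAULT_RETENTION_DAYS = {"free": 30, "pro": 90, "ultra": 180}
MAX_RETENTION_DAYS = {"free": 30, "pro": 365, "ultra": None}  # None = unbounded

def effective_retention(plan: str, requested_days: Optional[int] = None) -> int:
    """Return the retention a new dataset would get under the given plan."""
    if requested_days is None:
        return DEFAULT_RETENTION_DAYS[plan]
    if requested_days < 1:
        raise ValueError("retention must be at least 1 day")  # assumption
    cap = MAX_RETENTION_DAYS[plan]
    return requested_days if cap is None else min(requested_days, cap)
```

For example, a request for 1,000 days is clamped to 365 on Pro but kept as 1,000 on Ultra, and omitting the value on Free yields the 30-day default.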

Marketplace and ownership

Datasets are scoped to your account, and nodes reference them by name. A marketplace clone resolves that name against the buyer's own datasets (created on first write), so an imported template never reads or writes the author's data. Rows are never shared between accounts.

Common use cases

Strategy scoring: append each trade's PnL and strategy, then aggregate avg PnL grouped by strategy over a rolling window
Signal quality tracking: capture every AI signal's score and later outcome, then score precision over time
Cross-workflow counters and outcome logs that survive long enough to analyze

Example: score strategies from captured trades

Two workflows share one dataset:

Trading workflow closes a trade, then a Dataset (Append) node writes {"strategy":"{cfg.strategy}","pnl":{trade.pnl}} to tradeOutcomes. (Or turn on auto-capture on the trade node and skip the explicit node.)
Scoring workflow (Cron Trigger) runs Dataset (Aggregate): op: avg, field: pnl, group by: strategy. Output: {buckets: [{key:"sniper", value:1.8, count:120}, ...]}.
Condition / AI ranks the buckets and adjusts which strategies stay active.
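
The ranking step can be as simple as a sort over the grouped buckets. The Python sketch below assumes the bucket shape shown in the example output; rank_strategies and its thresholds (a minimum of 30 trades and a positive average) are hypothetical values chosen for illustration, not recommendations.

```python
def rank_strategies(buckets: list, min_trades: int = 30,
                    min_avg_pnl: float = 0.0) -> list:
    """Return strategy keys worth keeping active, best average PnL first."""
    eligible = [b for b in buckets
                if b["value"] is not None and b["count"] >= min_trades]
    ranked = sorted(eligible, key=lambda b: b["value"], reverse=True)
    return [b["key"] for b in ranked if b["value"] > min_avg_pnl]

buckets = [
    {"key": "sniper", "value": 1.8, "count": 120},
    {"key": "dca", "value": 0.4, "count": 45},
    {"key": "scalper", "value": -0.6, "count": 200},
    {"key": "momentum", "value": 5.0, "count": 3},  # too few trades to trust
]
active = rank_strategies(buckets)
```

With these made-up buckets, active keeps sniper and dca in that order: scalper is dropped for a negative average and momentum for having too few trades behind its score.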

Next steps

Storage: per-workflow dedup and small state
Condition: branch on {datasetResponse.value} or a bucket
Code: post-process query rows or compute a custom score
