Dataset

Account-scoped tables for capturing node outputs across many workflows and scoring them over time.

The Dataset node is where Solaris AI Flow remembers outcomes. Where the Storage node is a small per-workflow dedup cache, a Dataset is an append-heavy table scoped to your whole account, so many workflows can write into one dataset and a separate workflow can read across all of them.

This is the primitive behind deterministic scoring. A trading workflow appends each closed trade's outcome; a nightly scoring workflow aggregates the last 30 days to rank strategies. Because rows persist for the dataset's retention period (30 to 180 days by default, depending on plan) and every workflow you own can write to the same dataset, you can score across your entire account, not just one flow.

Storage vs Dataset

Pick the right tool:

  • Storage is a key/value + list cache scoped to a single workflow, with mandatory short TTLs. Use it for dedup ("have I seen this mint?") and small per-workflow state.
  • Dataset is a structured, queryable table scoped to your account, with long retention. Use it to capture outputs over time and compute scores, counts, and averages across workflows.

If you find yourself trying to log a growing history into Storage, or trying to read one workflow's Storage from another, you want a Dataset.

Prerequisites

  • No API key required
  • Auto-capture (passive logging) requires a Pro or Ultra plan

Operations

Operation | Purpose | Required fields
Append Row | Add a new row | dataset, row
Upsert Row | Update the row matching a field, else append | dataset, row, match field, match value
Query | Return rows matching filters | dataset
Aggregate | Compute count / sum / avg / min / max, optionally grouped | dataset, aggregate op (+ field)
Delete Rows | Remove rows matching filters | dataset

Append Row

Adds a row to the dataset, creating the dataset on first write. The row is a JSON object of the small, queryable fields you want to keep, for example {"mint":"{token.mint}","pnl":1.2,"strategy":"sniper"}. Each row is automatically stamped with a creation time and the ID of the workflow that wrote it, so a single dataset can attribute rows back to multiple source workflows.

Keep rows to scalars (strings, numbers, booleans) you actually filter or aggregate on. The row is what lives in the queryable store, so smaller rows mean more history under your plan's size cap.

Upsert Row

Like Append, but first looks for an existing row whose match field equals the resolved match value. If found, that row is overwritten; otherwise a new row is appended. Use this for "one row per token" style datasets where re-processing the same token should update rather than duplicate.

To guarantee a unique match, upsert scans every row, so it is only available while a dataset stays within the Query / aggregate scan window listed under Plan limits (2,000 rows on Free, 3,000 on Pro, 5,000 on Ultra). On paid plans this is far below the per-dataset row cap: a Pro dataset holds up to 100,000 rows, but upsert stops working past ~3,000. Once a dataset grows past the scan window, upsert returns an error rather than risk silently inserting a duplicate it couldn't detect; switch to Append, lower the dataset's retention, or split it. (Indexed keys for unbounded upsert are planned.)
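The upsert rule above can be sketched in a few lines. This is a minimal in-memory model, not the node's implementation: the function name upsert_row, the list-of-dicts storage, and the RuntimeError are assumptions made for illustration; only the scan-window numbers come from Plan limits.

```python
from typing import Any

# Scan windows as listed under Plan limits; everything else here is illustrative.
SCAN_WINDOW = {"free": 2000, "pro": 3000, "ultra": 5000}


def upsert_row(rows: list[dict[str, Any]], row: dict[str, Any],
               match_field: str, match_value: Any, plan: str = "pro") -> dict[str, Any]:
    """Overwrite the row whose match_field equals match_value, else append."""
    if len(rows) > SCAN_WINDOW[plan]:
        # Past the window a duplicate could go undetected, so fail loudly instead.
        raise RuntimeError("dataset exceeds the scan window; upsert unavailable")
    for i, existing in enumerate(rows):
        if existing.get(match_field) == match_value:
            rows[i] = row  # overwrite the matching row in place
            return {"inserted": False, "rowCount": len(rows)}
    rows.append(row)
    return {"inserted": True, "rowCount": len(rows)}
```

Calling it twice with the same mint leaves a single row holding the latest values, which is the "one row per token" behavior described above.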

Query

Returns rows matching an optional filter array, newest first by default. Filters are a JSON array like [{"field":"pnl","op":"gt","value":0}]. Supported ops are eq, ne, gt, gte, lt, lte, and contains. Ordering and comparison filters operate on numeric fields. Results are capped by the limit field (default 200, max 1000) and the per-plan scan window.

Field names are top-level only. Filter field, group by, and the aggregate field all read a row's top-level keys. A dot path like trade.pnl is treated as the literal key "trade.pnl", not a nested lookup, so a nested value silently won't match. Keep the fields you score on flat (the way auto-capture writes them), e.g. store pnl rather than {"trade":{"pnl":1.2}}.
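To make the top-level-only rule concrete, here is a rough sketch of how a filter array might be evaluated against one row. The matches function is hypothetical, and two behaviors are assumptions rather than documented facts: a row missing the field fails every op, and gt / gte / lt / lte fail on non-numeric values.

```python
import operator
from typing import Any

OPS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def matches(row: dict[str, Any], filters: list[dict[str, Any]]) -> bool:
    """Return True when the row satisfies every filter in the array."""
    for f in filters:
        field, op, expected = f["field"], f["op"], f["value"]
        if field not in row:  # plain key lookup: "trade.pnl" is a literal key
            return False
        actual = row[field]
        if op == "contains":
            ok = str(expected) in str(actual)
        elif op in ("eq", "ne"):
            ok = OPS[op](actual, expected)
        else:  # ordering comparisons only make sense on numbers
            ok = _is_number(actual) and _is_number(expected) and OPS[op](actual, expected)
        if not ok:
            return False
    return True
```

With this model, a filter on trade.pnl matches a row stored as {"trade.pnl": 1.2} but not one stored as {"trade": {"pnl": 1.2}}, which is why flat rows are the safer shape.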

Aggregate

The scoring engine. Computes count, sum, avg, min, or max over the rows matching your filters. With a group by field, it returns one bucket per distinct value, for example average PnL grouped by strategy. count needs no field; the other ops require a numeric field. This is how you turn a long log of outcomes into a single deterministic score.
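As a rough model of what Aggregate computes, the sketch below reduces a list of already-filtered rows, optionally bucketed by a group-by key. The aggregate function is hypothetical; skipping rows whose field is missing or non-numeric, and returning None for an empty group, are assumptions, since the page does not say how the node treats them.

```python
from collections import defaultdict
from typing import Any, Optional


def _num(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def aggregate(rows: list[dict[str, Any]], op: str,
              field: Optional[str] = None, group_by: Optional[str] = None) -> dict[str, Any]:
    """Compute count / sum / avg / min / max, ungrouped or one bucket per group."""
    if op not in ("count", "sum", "avg", "min", "max"):
        raise ValueError(f"unknown aggregate op: {op}")
    if op != "count" and field is None:
        raise ValueError("sum / avg / min / max need a numeric field")

    def reduce(group: list[dict[str, Any]]) -> Any:
        if op == "count":
            return len(group)
        values = [r[field] for r in group if _num(r.get(field))]
        if not values:
            return None  # assumption: nothing numeric to reduce
        if op == "avg":
            return sum(values) / len(values)
        return {"sum": sum, "min": min, "max": max}[op](values)

    if group_by is None:
        return {"value": reduce(rows), "count": len(rows)}
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for r in rows:
        groups[r.get(group_by)].append(r)  # top-level key only
    buckets = [{"key": k, "value": reduce(g), "count": len(g)} for k, g in groups.items()]
    return {"buckets": buckets, "count": len(rows)}
```

The return shapes mirror the ungrouped and grouped output shapes documented under "Templating against the response", minus the scan and truncation fields.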

Delete Rows

Removes rows matching the filters, operating over the same recent window as Query and Aggregate (the most recent rows, newest first) and deleting up to a batch per run. The hasMore flag in the response indicates more matches remain; re-run to keep draining. For routine expiry use retention, and to wipe an entire dataset use the Delete button in the Datasets manager rather than this node.
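The "re-run to keep draining" pattern looks roughly like the loop below. This is only a sketch: delete_once is a hypothetical stand-in for one run of the Delete Rows operation, make_fake_delete is an in-memory fake for trying the loop out, and the batch size of 2 is arbitrary (the real batch size is not stated here). Only the deleted and hasMore fields come from the documented response.

```python
from typing import Any, Callable


def drain(delete_once: Callable[[], dict[str, Any]], max_runs: int = 100) -> int:
    """Repeat a delete run until hasMore is false; return total rows deleted."""
    total = 0
    for _ in range(max_runs):  # hard stop so a stuck delete cannot loop forever
        response = delete_once()
        total += response["deleted"]
        if not response["hasMore"]:
            break
    return total


def make_fake_delete(rows: list[dict[str, Any]],
                     predicate: Callable[[dict[str, Any]], bool],
                     batch: int = 2) -> Callable[[], dict[str, Any]]:
    """In-memory stand-in: removes up to `batch` matching rows per call."""
    def delete_once() -> dict[str, Any]:
        victims = [r for r in rows if predicate(r)][:batch]
        for victim in victims:
            rows.remove(victim)
        return {"deleted": len(victims), "hasMore": any(predicate(r) for r in rows)}
    return delete_once
```

In a real workflow the loop would more likely be a scheduled re-run of the node than a tight loop, and routine expiry is still better handled by retention.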

Configuration

Field | Type | Required | Description
Node Label | string | No | Display name shown on the canvas
Operation | enum | Yes | One of the operations above
Dataset | string | Yes | Dataset name (max 128 chars). Allowed characters: letters, digits, _ - : . Created on first write
Row | JSON object | For Append / Upsert | The queryable fields to store. Templates resolved before parsing
Match Field / Match Value | string | For Upsert | The field and value used to find an existing row
Filters | JSON array | No | [{field, op, value}] for Query / Aggregate / Delete
Aggregate / Field / Group By | enum + string | For Aggregate | The reducer, the numeric field, and an optional group-by field
Limit / Order | number / enum | No (Query) | Result cap and sort direction over creation time
Retention Days | number | No | Applied when the dataset is first created. Clamped to your plan max

Dataset, Row, Match Value, and Filters all accept template expressions, so you can build rows directly from upstream node output:

Dataset:  tradeOutcomes
Row:      {"mint":"{token.mint}","pnl":{trade.pnl},"strategy":"sniper"}
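Because templates are resolved before the Row is parsed as JSON, an unquoted {trade.pnl} lands in the row as a number while a quoted "{token.mint}" stays a string. The sketch below illustrates that order of operations under stated assumptions: the {dotted.path} placeholder pattern, plain string substitution, and the resolve_row and lookup helpers are all hypothetical, and the real resolver may behave differently (for example around escaping).

```python
import json
import re
from typing import Any

# Assumed placeholder syntax: {name} or {name.sub.path}; JSON braces never match
# because an object's opening brace is followed by a quote, not an identifier.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*)\}")


def lookup(context: dict[str, Any], path: str) -> Any:
    """Walk a dotted path through nested dicts of upstream node output."""
    value: Any = context
    for part in path.split("."):
        value = value[part]
    return value


def resolve_row(template: str, context: dict[str, Any]) -> dict[str, Any]:
    """Substitute placeholders as text first, then parse the result as JSON."""
    text = PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)
    return json.loads(text)
```

For example, resolving the row above against {"token": {"mint": "So111"}, "trade": {"pnl": 1.2}} yields a row whose pnl is the number 1.2 and whose mint is the string "So111".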

Auto-capture: passive logging on any node

Manually wiring an Append node into every workflow is tedious. Instead, most nodes can passively capture their own output. Open a node's config and turn on Save output to Dataset (an optional, off-by-default panel in node settings), pick a dataset, and map fields:

Dataset:  signalOutcomes
score   ← {output.score}
mint    ← {token.mint}

After every successful run, that node's output is appended to the dataset automatically, with no extra node on the canvas. This turns every workflow into a passive data generator for scoring. In capture field templates, the node's own output is available as {output...} and under its response name.

The toggle isn't offered on triggers or on the Dataset node itself (a trigger has no scored output, and the Dataset node already writes rows), so they never capture.

Auto-capture is a Pro / Ultra feature. On the free plan the toggle is ignored at runtime (the run still succeeds; nothing is captured). Captured values are coerced to numbers and booleans where possible so they aggregate cleanly.
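The coercion step might look something like the following. The exact rules are not documented on this page, so treat every detail as an assumption: the coerce name, the accepted number formats, and recognizing only lowercase true / false.

```python
import re
from typing import Any

INT = re.compile(r"[+-]?\d+")
FLOAT = re.compile(r"[+-]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?")


def coerce(value: Any) -> Any:
    """Turn numeric-looking and boolean-looking strings into real scalars."""
    if not isinstance(value, str):
        return value  # already a number, boolean, or something else
    text = value.strip()
    if text in ("true", "false"):
        return text == "true"
    if INT.fullmatch(text):
        return int(text)
    if FLOAT.fullmatch(text):
        return float(text)
    return value  # leave ordinary strings such as mint addresses untouched
```

The point of a step like this is that a template such as {output.score} usually resolves to text, and sum / avg / min / max need real numbers to work on.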

Templating against the response

Each Dataset operation writes its result to datasetResponse (override via the Response Name field):

Operation | Output shape
Append | { rowId, datasetId, rowCount }
Upsert | { rowId, datasetId, inserted, rowCount }
Query | { rows: [{ _id, data, createdAt, hasBlob, sourceNodeId }], count, scanned, truncated, scanTruncated, rowCount, order }
Aggregate (ungrouped) | { value, count, scanned, truncated, scanTruncated, rowCount }
Aggregate (grouped) | { buckets: [{ key, value, count }], count, scanned, truncated, scanTruncated, rowCount }
Delete | { deleted, hasMore }

All outputs include success: true, operation, and dataset alongside the operation-specific fields.

Query and aggregate return two completeness flags, and they mean different things:

  • scanTruncated: true means the dataset is larger than your plan's scan window, so the result was computed over only a partial slice: the most-recent rows (or, for an oldest-first query, the oldest rows). This is a correctness signal: a score over part of the dataset, not all of it.
  • truncated: true is set whenever scanTruncated is, and also for benign output caps even when the whole dataset was scanned: a Query returned fewer rows than matched (the Limit or the response-size budget), or a grouped Aggregate hit the distinct-group cap. When truncated is true but scanTruncated is false, the computed value is complete; only the returned list was capped.

Use scanned (rows actually read) against rowCount (the dataset's live total) to gauge coverage. Treat scanTruncated as the flag for "is this score over the full dataset?"; treat the bare truncated as "is there more output I'm not seeing?".
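A downstream Code step could turn those flags into an explicit verdict along these lines. This is a sketch: the assess function and the keys of its result are made up for illustration, while scanned, rowCount, truncated, and scanTruncated are the response fields documented above.

```python
from typing import Any


def assess(response: dict[str, Any]) -> dict[str, Any]:
    """Summarize how complete a Query or Aggregate response is."""
    scanned, row_count = response["scanned"], response["rowCount"]
    return {
        # False means the value covers only part of the dataset.
        "score_is_complete": not response["scanTruncated"],
        # True means everything was scanned and only the returned list was capped.
        "only_output_capped": response["truncated"] and not response["scanTruncated"],
        # Fraction of live rows actually read; an empty dataset counts as fully covered.
        "coverage": scanned / row_count if row_count else 1.0,
    }
```

A scoring workflow might refuse to act, or at least log a warning, whenever score_is_complete is false.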

Plan limits

Caps are enforced server-side and differ by tier:

Cap | Free | Pro | Ultra
Datasets per account | 3 | 25 | 100
Rows per dataset | 2,000 | 100,000 | 1,000,000
Row size (queryable fields) | 5 KB | 20 KB | 50 KB
Total stored bytes per account | 10 MB | 500 MB | 5 GB
Query / aggregate scan window | 2,000 | 3,000 | 5,000
Auto-capture | Off | On | On
Default retention | 30 days | 90 days | 180 days
Max retention | 30 days | 365 days | Unbounded

The scan window is the number of most-recent rows a single Query or Aggregate folds in. For large datasets, narrow the result with filters; whole-dataset scoring at scale is on the roadmap (precomputed rollups). An hourly background sweep deletes rows past their dataset's retention.

Marketplace and ownership

Datasets are scoped to your account, and nodes reference them by name. A marketplace clone resolves that name against the buyer's own datasets (created on first write), so an imported template never reads or writes the author's data. Rows are never shared between accounts.

Common use cases

  • Strategy scoring: append each trade's PnL and strategy, then aggregate avg PnL grouped by strategy over a rolling window
  • Signal quality tracking: capture every AI signal's score and later outcome, then score precision over time
  • Cross-workflow counters and outcome logs that survive long enough to analyze

Example: score strategies from captured trades

Two workflows share one dataset:

  1. Trading workflow closes a trade, then a Dataset (Append) node writes {"strategy":"{cfg.strategy}","pnl":{trade.pnl}} to tradeOutcomes. (Or turn on auto-capture on the trade node and skip the explicit node.)
  2. Scoring workflow (Cron Trigger) runs Dataset (Aggregate): op: avg, field: pnl, group by: strategy. Output: {buckets: [{key:"sniper", value:1.8, count:120}, ...]}.
  3. Condition / AI ranks the buckets and adjusts which strategies stay active.
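If step 3 is done in a Code step rather than by a Condition or AI node, the ranking could be as simple as the sketch below. The rank_strategies function and the min_count threshold are assumptions added for illustration; the input is the grouped Aggregate output shape from step 2.

```python
from typing import Any


def rank_strategies(response: dict[str, Any], min_count: int = 30) -> list[str]:
    """Order strategy keys by average PnL, ignoring thinly sampled buckets."""
    eligible = [b for b in response["buckets"] if b["count"] >= min_count]
    ranked = sorted(eligible, key=lambda b: b["value"], reverse=True)
    return [b["key"] for b in ranked]
```

Dropping buckets with few trades is a design choice, not something the node does: it keeps a strategy with three lucky trades from outranking one with a long track record.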

Next steps

  • Storage: per-workflow dedup and small state
  • Condition: branch on {datasetResponse.value} or a bucket
  • Code: post-process query rows or compute a custom score
