Data Flow
This section explains the data model and how data flows during execution.
Data Model
The basic unit of data in an AQL job is a row. A row is a keyed array of primitive values.
The keys are called column names and are case-insensitive. The values can be of the following types:
- String
- Number (MS SQL NUMERIC types are returned as strings due to a Microsoft issue)
- Null
At this time, other types are not explicitly supported, although a particular driver may have wider type support than documented above if the same driver is used for both data input and output.
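As a small illustration of the row model, the sketch below declares two inline rows with columns letter and number in a DATA block. The block name, column names, the JSON-array literal, the COLUMNS option, and the SQL-style -- comments are assumptions made for this illustration, not a normative reference.

```
-- Two rows, each a keyed collection of primitives:
--   {letter: 'a', number: 1} and {letter: 'b', number: 2}
-- (The literal format and the COLUMNS option are assumed for illustration.)
DATA 'example_rows' (
    [
        ["a", 1],
        ["b", 2]
    ]
) WITH (COLUMNS = 'letter,number')
```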
Component Model
Every AQL DATA, EXEC, QUERY, and TRANSFORM block represents one or more sources, transforms, or destinations:
- A source is read-only
- A transform reads from a source and outputs to another transform or a destination
- A destination reads from a source or transform and has no output
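As a minimal sketch of these roles, the script below reads from a source query and writes into a SQL destination table. The connection names, table name, and the exact block and WITH syntax are illustrative assumptions rather than a verbatim reference.

```
-- Source: rows pulled from the (hypothetical) 'salesdb' connection.
-- Destination: the (hypothetical) 'warehouse' connection, table 'daily_totals'.
QUERY 'daily_totals' FROM CONNECTION salesdb (
    SELECT order_date, SUM(amount) AS total
    FROM orders
    GROUP BY order_date
) INTO CONNECTION warehouse
  WITH (TABLE = 'daily_totals')
```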
Under normal (failure-free) operation, the component lifecycle is as follows:
- Compilation: the component options are parsed and the component is instantiated
- Scheduling: the component is included in a DAG and waits for any dependencies to complete
- Running: the component begins to pull data from its upstream source
- End-of-life: the component closes its own output channel, if it has one
Note that data is pulled from upstream components by downstream components, rather than pushed.
If any component encounters an error condition, the whole flow is immediately stopped and transactions in progress are rolled back.
Transaction Management
A transaction manager oversees all SQL destination components. In the majority of cases, it ensures atomicity across SQL destinations (including EXEC blocks), so either no statement is committed to any destination or all statements are committed.
If a connection to a database drops after a statement has been executed but before the transaction is committed, and another destination has already committed, the transaction manager will proceed with the commit on the remaining destinations. In that case the transaction may end up committed in every database except the one with the dropped connection. The window in which atomicity is not guaranteed is very small (normally under one second), and this is consistent with typical two-phase commit (2PC) behavior (see, for example, the Wikipedia article on two-phase commit).
Should that case present itself in practice, the uncommitted transaction may block other queries from proceeding, and a system administrator will need to commit it manually once network connectivity has been restored. For this reason, it is recommended to create alerts based on errors that appear in the AQL logs.
Disabling Automatic Transaction Management
Warning: Please make sure you know what you're doing!
To disable the transaction manager for a single destination, set the option MANAGED_TRANSACTION to 'False'.
To disable it for all destinations, set it using a CLI flag or SET MANAGED_TRANSACTION = 'False'.
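As a sketch of the per-destination form, the destination below opts out of the transaction manager via its WITH clause. The connection names, table name, and the exact block and WITH syntax are illustrative assumptions; only the MANAGED_TRANSACTION option name and the 'False' value are taken from this page.

```
-- This destination (hypothetical 'warehouse' connection) manages its own commits.
QUERY 'load_unmanaged' FROM CONNECTION salesdb (
    SELECT id, order_date, amount FROM orders
) INTO CONNECTION warehouse
  WITH (TABLE = 'orders_copy', MANAGED_TRANSACTION = 'False')
```

For the global form, the SET MANAGED_TRANSACTION = 'False' statement quoted above applies to every destination in the script.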
Batching Inserts
For certain SQL databases (e.g. Microsoft Azure SQL Database), a large number of inserts per transaction is not supported. To set the batch size for SQL destinations, set MANAGED_TRANSACTION to 'False' and ROWS_PER_BATCH to the number of rows per transaction.
Please note that disabling automatic transaction management is necessary for the batch size to take effect.
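As a minimal sketch combining the two options on a single SQL destination, a load batched at 1,000 rows per transaction might look like the following. The connection names, table name, batch size, and exact WITH syntax are illustrative assumptions; the option names come from this page.

```
-- Commits are issued every 1,000 rows rather than once for the whole flow
-- (names and batch size here are hypothetical).
QUERY 'load_orders_batched' FROM CONNECTION salesdb (
    SELECT id, order_date, amount FROM orders
) INTO CONNECTION azure_sql
  WITH (
      TABLE = 'orders_copy',
      MANAGED_TRANSACTION = 'False',
      ROWS_PER_BATCH = '1000'
  )
```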