Streaming

Most AI/ML training and inferencing systems use a combination of batch jobs and APIs.

Pickle Piper is designed from the ground up to embrace streaming at every stage of the pipeline, allowing model inferencing at very low latency and high throughput.

By default, Pickle Piper uses Apache Kafka (an open-source data streaming platform) topics as the source and destination.
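
A minimal sketch of this default wiring, using the kafka-python client; the broker address and topic names are illustrative assumptions, not part of Pickle Piper:

    from kafka import KafkaConsumer, KafkaProducer

    # Hypothetical broker address and topic names; substitute your own.
    consumer = KafkaConsumer(
        "inference-requests",                # source topic for queries
        bootstrap_servers="localhost:9092",
        group_id="pickle-piper",
    )
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    # A processor (described below) reads queries from the consumer and
    # sends completed results to the destination topic via the producer.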

Pickle Piper also interfaces with NVIDIA SMs (Streaming Multiprocessors) using a GPU latching algorithm to optimise speed and throughput.
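
The latching algorithm itself is not detailed here, but the SM count it has to schedule against can be inspected per device, for example via PyTorch:

    import torch

    # Report the number of streaming multiprocessors (SMs) on each GPU.
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i} ({props.name}): {props.multi_processor_count} SMs")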

Input

Inputs are inference queries for the model.

An input in Pickle Piper is a streaming source. By default, Pickle Piper supports Kafka as the input source.

Inputs must follow the Pickle Piper format for structure and metadata.
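
The format itself is not reproduced in this section; purely as an illustration, an input message might carry fields like the following (all field names are hypothetical):

    # Hypothetical input message; the real Pickle Piper schema is not
    # shown in this document, so every field name here is illustrative.
    input_message = {
        "id": "req-0001",                       # unique query identifier
        "model": "sentiment-v2",                # model to query
        "payload": "The new release is great!", # inference input
        "metadata": {"timestamp": 1700000000, "content_type": "text/plain"},
    }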

Output

Outputs are the results of inference queries.

An output in Pickle Piper is a streaming destination. By default, Pickle Piper supports Kafka as the destination for completed results.

Outputs follow the Pickle Piper format for structure and metadata.
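
Again as a purely illustrative sketch, an output message mirroring the hypothetical input above might look like:

    # Hypothetical output message; field names mirror the illustrative
    # input schema above and are not the documented Pickle Piper format.
    output_message = {
        "id": "req-0001",                       # echoes the query identifier
        "model": "sentiment-v2",
        "result": {"label": "positive", "score": 0.97},
        "metadata": {"timestamp": 1700000042, "latency_ms": 42},
    }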

Processor

A processor consumes inputs, runs model inference on the GPU, and produces results to the output.
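
A rough sketch of such a loop, under the same Kafka and message-schema assumptions as the examples above (the infer stub stands in for a real GPU forward pass):

    import json
    from kafka import KafkaConsumer, KafkaProducer

    def infer(payload):
        # Stand-in for real GPU inference (e.g. a PyTorch forward pass).
        return {"label": "positive", "score": 0.97}

    consumer = KafkaConsumer(
        "inference-requests",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Consume queries, run inference, and produce results downstream.
    for msg in consumer:
        query = msg.value
        producer.send("inference-results",
                      {"id": query["id"], "result": infer(query["payload"])})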

Pipeline

A pipeline consists of the following components (composed as in the sketch after this list):

  • Input (Source)
  • Processor (Model)
  • Output (Destination)
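
Pickle Piper's actual API is not shown in this section; the sketch below uses hypothetical types purely to illustrate how the three components compose:

    from dataclasses import dataclass

    # Hypothetical types; Pickle Piper's real interface is not documented
    # here. This only illustrates the input -> processor -> output shape.
    @dataclass
    class KafkaSource:
        topic: str

    @dataclass
    class ModelProcessor:
        model: str
        device: str = "cuda:0"

    @dataclass
    class KafkaSink:
        topic: str

    @dataclass
    class Pipeline:
        input: KafkaSource
        processor: ModelProcessor
        output: KafkaSink

    pipeline = Pipeline(
        input=KafkaSource(topic="inference-requests"),
        processor=ModelProcessor(model="sentiment-v2"),
        output=KafkaSink(topic="inference-results"),
    )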