A Post-Mortem on Integrating a Zig-based WASM Runtime with a Haskell DSL for Mobile Hudi Data Visualization


The initial problem wasn’t novel: our mobile application’s analytics dashboards were unusably slow. They were built with React Native, rendering data fetched directly from a series of REST endpoints. These endpoints, in turn, performed complex aggregations over an Apache Hudi data lake. Every filter change triggered a cascade of expensive queries, leading to seconds of loading spinners and a significant drain on device battery. The configurations for these dashboards were a sprawling mess of JSON files, brittle and nearly impossible to maintain or extend without introducing regressions. A complete rethink was necessary, not just an optimization.

Our initial concept was to shift from an imperative to a declarative model. Instead of the mobile client painstakingly building queries and rendering logic, it would simply request a dashboard by name, defined in a high-level, domain-specific language (DSL). This DSL would describe the data sources, transformations, visualizations, and styling. A backend service would interpret this DSL, perform the heavy lifting of data retrieval from Hudi, and a hyper-optimized client-side engine would handle the final data shaping and layout computation. This approach promised to centralize business logic, simplify the mobile client, and create a single source of truth for our analytics.

The technology selection process was rigorous and driven by the unique constraints of each architectural layer.

  1. Data Source - Apache Hudi: This was a given. Our entire multi-petabyte analytics platform is built on a Hudi-based data lakehouse. Its support for incremental queries (hoodie.datasource.query.type=incremental) and time-travel was non-negotiable for efficient data fetching.
  2. DSL & Backend - Haskell: For building a robust DSL, Haskell was the obvious choice. Its powerful type system allows for creating deeply expressive and safe embedded DSLs. We could guarantee at compile-time that a dashboard definition was logically sound, something impossible with JSON schemas. Libraries like Megaparsec for parsing and Aeson for JSON manipulation are mature and production-ready.
  3. Client-Side Performance Core - Zig + WebAssembly (WASM): The pure JavaScript environment in React Native was the primary performance bottleneck for data manipulation. We needed native-level speed. While C++ or Rust were contenders, we chose Zig. Its simplicity, focus on explicit memory management, first-class cross-compilation support, and trivial C ABI interoperability made it ideal for creating a small, fast WASM module. The goal was a portable binary that could be embedded in any mobile environment (iOS, Android) and perform complex aggregations or layout calculations in microseconds.
  4. Styling - CSS Modules Philosophy: The DSL also needed to control component styling. To avoid the global namespace pollution and non-composability of traditional CSS-in-JS solutions, we adopted the principle of CSS Modules: locally scoped class names. In React Native, this translates to generating typed stylesheet objects, ensuring that styles defined for one component in the DSL wouldn’t accidentally bleed over and affect another.

The implementation journey was a multi-stage process, connecting these disparate technologies. It started with defining the language for our analytics.

Defining the Analytics DSL in Haskell

The core of the system is a Haskell data type that represents a complete dashboard. It’s a tree of layouts, components, data sources, and style rules.

-- file: src/Dashboard/Types.hs
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}

module Dashboard.Types where

import Data.Aeson (ToJSON)
import Data.Text (Text)
import Data.Map (Map)
import GHC.Generics (Generic)

-- A unique identifier for any element
type ElementId = Text

-- Data source points to a Hudi table and specifies the query type
data HudiSource = HudiSource
  { hudiTable     :: Text
  , queryType     :: Text -- "snapshot" or "incremental"
  , beginInstant  :: Maybe Text
  , filterPreds   :: [Text]
  } deriving (Show, Eq, Generic)

-- Transformations to be applied on the client by the Zig/WASM module
data Transformation
  = GroupBy Text
  | Sum Text
  | Average Text
  deriving (Show, Eq, Generic)

-- Describes a single visual component
data Component
  = BarChart
      { componentId :: ElementId
      , source      :: HudiSource
      , transforms  :: [Transformation]
      , xAxis       :: Text
      , yAxis       :: Text
      }
  | DataTable
      { componentId :: ElementId
      , source      :: HudiSource
      , columns     :: [Text]
      }
  deriving (Show, Eq, Generic)

-- Layout primitives
data Layout
  = Row { layoutId :: ElementId, children :: [Layout] }
  | Column { layoutId :: ElementId, children :: [Layout] }
  | Leaf { layoutId :: ElementId, component :: Component }
  deriving (Show, Eq, Generic)

-- Style properties. Simplified for brevity.
data StyleProps = StyleProps
  { backgroundColor :: Maybe Text
  , color           :: Maybe Text
  , padding         :: Maybe Int
  } deriving (Show, Eq, Generic)

-- A complete dashboard definition
data Dashboard = Dashboard
  { dashboardId :: ElementId
  , rootLayout  :: Layout
  , styles      :: Map ElementId StyleProps
  } deriving (Show, Eq, Generic)

-- We derive Aeson instances for JSON serialization
instance ToJSON HudiSource
instance ToJSON Transformation
instance ToJSON Component
instance ToJSON Layout
instance ToJSON StyleProps
instance ToJSON Dashboard

With the types defined, we used Megaparsec to parse a human-readable text format into our Haskell types. The real work, however, was in the interpreter that processed a Dashboard request. It would traverse the Layout tree, identify all unique HudiSource definitions, and generate optimized Spark SQL queries to fetch the data.
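Conceptually, that source-collection pass is a simple fold over the layout tree. The real interpreter is Haskell; the sketch below restates the idea in TypeScript with illustrative type and field names, not the actual backend code.

```typescript
// Minimal mirrors of the Haskell Layout / HudiSource types (illustrative).
interface HudiSource {
  hudiTable: string;
  queryType: 'snapshot' | 'incremental';
}

type Layout =
  | { kind: 'row' | 'column'; children: Layout[] }
  | { kind: 'leaf'; source: HudiSource };

// Walk the layout tree and collect every data source referenced by a leaf,
// in document order.
function collectSources(layout: Layout): HudiSource[] {
  if (layout.kind === 'leaf') {
    return [layout.source];
  }
  return layout.children.flatMap(collectSources);
}
```

The collected list then feeds the query planner, which decides how many physical queries are actually needed.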

A critical mistake we made initially was fetching data for each component separately. This generated excessive load on our data platform. The fix was to implement a query coalescing step. The interpreter would analyze all HudiSource objects, find sources querying the same table, and merge them into a single, wider query with a UNION ALL or more complex predicate logic. This dramatically reduced query counts.
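The coalescing pass itself boils down to grouping sources by (table, query type) and merging their predicates. Here is that grouping sketched in TypeScript with illustrative names, rather than the production Haskell:

```typescript
// Shape mirroring the Haskell HudiSource record (names are illustrative).
interface HudiSource {
  hudiTable: string;
  queryType: 'snapshot' | 'incremental';
  filterPreds: string[];
}

// Merge sources that target the same table with the same query type into a
// single wider query that carries the union of their predicates.
function coalesceSources(sources: HudiSource[]): HudiSource[] {
  const byKey = new Map<string, HudiSource>();
  for (const s of sources) {
    const key = `${s.hudiTable}::${s.queryType}`;
    const existing = byKey.get(key);
    if (existing) {
      existing.filterPreds.push(...s.filterPreds);
    } else {
      // Copy so we never mutate the caller's objects.
      byKey.set(key, { ...s, filterPreds: [...s.filterPreds] });
    }
  }
  return [...byKey.values()];
}
```

In the real system the merged predicates are combined into a single WHERE clause (or a UNION ALL when the column sets differ), but the query count reduction comes entirely from this grouping step.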

The backend endpoint would then serialize the raw Hudi data and the processed Dashboard structure (now with query results embedded) into a single JSON payload for the mobile client.
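On the client, that payload deserializes into something like the following shape. This is an assumed sketch mirroring the Haskell types, not the exact wire format:

```typescript
// Assumed client-side mirror of the backend response (illustrative names).
interface StyleProps {
  backgroundColor?: string;
  color?: string;
  padding?: number;
}

interface DashboardPayload {
  dashboardId: string;
  rootLayout: object;                 // the Layout tree
  styles: Record<string, StyleProps>; // keyed by ElementId
  rawData: Record<string, unknown[]>; // rows keyed by coalesced query id
}

// A payload must survive a JSON round trip unchanged; the client relies on
// this property when caching responses on disk.
function roundTrip(p: DashboardPayload): DashboardPayload {
  return JSON.parse(JSON.stringify(p)) as DashboardPayload;
}
```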

The Performance Core: A Zig WASM Module

This is where we addressed the client-side performance problem directly. The Zig module’s responsibility is singular: take a large chunk of raw row-oriented data from Hudi and a list of transformations (like GroupBy, Sum), and rapidly transform it into the aggregated, view-ready format required by the charting library.

A key design decision in a real-world project is defining a clean boundary between the untrusted, dynamically-typed world (JavaScript) and the statically-typed, manually memory-managed world (Zig/WASM). We defined a strict C ABI for this.

// file: src/lib.zig
const std = @import("std");

// Functions declared with `export` use the C calling convention, which gives
// us a stable ABI callable from the outside world (JS).

// These functions must be provided by the host environment.
extern fn wasm_alloc(size: u32) ?*anyopaque;
extern fn wasm_free(ptr: *anyopaque, size: u32) void;

// Allocation is delegated to the host (JS). This is critical: the WASM
// module doesn't own memory allocation; the host does, which prevents leaks
// if the JS garbage collector tears down the WASM instance. We adapt the
// host's alloc/free pair to Zig's std.mem.Allocator interface via a vtable.
fn hostAlloc(_: *anyopaque, len: usize, ptr_align: u8, ret_addr: usize) ?[*]u8 {
    // We assume the host returns sufficiently aligned blocks (a JS-side
    // malloc typically aligns to 8 or 16 bytes), so ptr_align is ignored.
    _ = ptr_align;
    _ = ret_addr;
    const ptr = wasm_alloc(@intCast(len)) orelse return null;
    const bytes: [*]u8 = @ptrCast(ptr);
    return bytes;
}

fn hostResize(_: *anyopaque, buf: []u8, buf_align: u8, new_len: usize, ret_addr: usize) bool {
    _ = buf_align;
    _ = ret_addr;
    // The host interface has no realloc, so only shrinking in place
    // succeeds; growing forces the caller to alloc/copy/free.
    return new_len <= buf.len;
}

fn hostFree(_: *anyopaque, buf: []u8, buf_align: u8, ret_addr: usize) void {
    _ = buf_align;
    _ = ret_addr;
    wasm_free(@ptrCast(buf.ptr), @intCast(buf.len));
}

const host_vtable = std.mem.Allocator.VTable{
    .alloc = hostAlloc,
    .resize = hostResize,
    .free = hostFree,
};

var host_allocator: std.mem.Allocator = undefined;

// Entry point to initialize the allocator from the JS side.
// This must be the first function called.
export fn init_allocator() void {
    host_allocator = .{ .ptr = undefined, .vtable = &host_vtable };
}

// A simplified example of a data processing function. It takes a JSON string
// of row data and a JSON string of transformation rules, and returns a
// pointer to a null-terminated JSON result string (or null on failure).
// Using JSON is inefficient but demonstrates the principle. In production, we
// switched to a binary format written to a shared memory buffer.
export fn process_data(data_ptr: [*]const u8, data_len: u32, transform_ptr: [*]const u8, transform_len: u32) ?[*:0]const u8 {
    const data_json = data_ptr[0..data_len];
    const transform_json = transform_ptr[0..transform_len];

    // Scratch allocations (parsed JSON, intermediate tables) live in an
    // arena that is freed wholesale when this function returns.
    var arena = std.heap.ArenaAllocator.init(host_allocator);
    defer arena.deinit();

    // -- Begin actual data processing logic --
    // For brevity, pretend we parsed `data_json`, found all rows where
    // `category == "A"`, and summed their `value` field. This is where
    // Zig's performance shines.
    // const parsed = std.json.parseFromSlice(..., arena.allocator(), data_json, .{});
    _ = data_json;
    _ = transform_json;
    const total: f64 = 12345.67;
    // -- End actual data processing logic --

    // The result must outlive this call, so it is allocated directly from
    // the host allocator, NOT the arena (the arena is freed on return).
    // allocPrintZ null-terminates the string so the JS side can find its end.
    const result_str = std.fmt.allocPrintZ(host_allocator,
        \\{{ "result": {{ "total": {d} }} }}
    , .{total}) catch {
        // A null pointer signals failure to the caller.
        return null;
    };

    // The pitfall here is memory ownership: the caller (JavaScript) is now
    // responsible for releasing this buffer via `free_result`.
    return result_str.ptr;
}

export fn free_result(ptr: [*]const u8, len: u32) void {
    // The buffer was produced by allocPrintZ, so its real size is len + 1
    // (the null terminator). It must be freed with the same allocator.
    const slice: []u8 = @constCast(ptr[0 .. len + 1]);
    host_allocator.free(slice);
}

Building this module for WASM is straightforward with the Zig build system.

// file: build.zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const optimize = b.standardOptimizeOption(.{});

    const lib = b.addSharedLibrary(.{
        .name = "analytics_core",
        .root_source_file = .{ .path = "src/lib.zig" },
        // Freestanding wasm32: no OS layer is assumed, which keeps the
        // standard library from pulling in OS-dependent code paths.
        .target = .{
            .cpu_arch = .wasm32,
            .os_tag = .freestanding,
        },
        .optimize = optimize,
    });

    // Keep exported functions visible in the final binary, and strip debug
    // info to keep the artifact small.
    lib.rdynamic = true;
    lib.strip = true;

    b.installArtifact(lib);
}

The zig build command produces an analytics_core.wasm file ready for integration.

The Bridge: Integrating WASM into React Native

This was the most challenging part of the project. Running WASM in a browser is well-documented; running it reliably in a React Native environment on both iOS (JSC or Hermes) and Android (V8 or Hermes) is not. We used the react-native-wasm library as a starting point, but it required significant modification for our use case.

The main issue was data passing. Naively passing large JSON strings between the JS thread and the WASM module incurs a huge serialization and copy cost, defeating the purpose of using WASM. The solution was SharedArrayBuffer (on Hermes) or a custom native module that managed a shared memory region.

Here’s a conceptual overview of the bridge code:

// file: WasmBridge.ts
import { Wasm, WasmInstance } from 'react-native-wasm';
import { Buffer } from 'buffer'; // Using a polyfill for Node's Buffer

// Provided by a custom native module that manages a heap region inside the
// WASM linear memory (simplified placeholder).
declare function _wasm_malloc(size: number): number;
declare function _wasm_free(ptr: number, size: number): void;

class AnalyticsEngine {
  private instance: WasmInstance | null = null;
  private memory: WebAssembly.Memory | null = null;

  async initialize(wasmModule: Uint8Array): Promise<void> {
    try {
      this.instance = await Wasm.instantiate(wasmModule, {
        // The `env` object provides the `extern` functions our Zig code expects.
        env: {
          wasm_alloc: (size: number): number => this.wasmAlloc(size),
          wasm_free: (ptr: number, size: number): void => this.wasmFree(ptr, size),
          // We also need to provide things like `abort` for panic handling.
          abort: () => { console.error('WASM module aborted.'); }
        }
      });
      // The WASM module exports its memory, so JS can read/write to it.
      this.memory = this.instance.exports.memory as WebAssembly.Memory;

      // Crucially, initialize the allocator inside the WASM module.
      (this.instance.exports.init_allocator as () => void)();
    } catch (error) {
      console.error('Failed to initialize WASM module:', error);
      throw error;
    }
  }

  private wasmAlloc(size: number): number {
    // A real implementation would manage a memory heap in JS or the native side.
    // This is a simplified placeholder.
    // The returned value is a pointer (an offset in the WASM memory buffer).
    // Let's assume a native module provides `_wasm_malloc`.
    return _wasm_malloc(size);
  }

  private wasmFree(ptr: number, size: number): void {
    // Corresponding free function.
    _wasm_free(ptr, size);
  }

  process(data: object, transforms: object): object {
    if (!this.instance || !this.memory) {
      throw new Error('WASM engine not initialized.');
    }

    const dataStr = JSON.stringify(data);
    const transformStr = JSON.stringify(transforms);

    const dataBuffer = Buffer.from(dataStr, 'utf-8');
    const transformBuffer = Buffer.from(transformStr, 'utf-8');

    // Allocate memory inside the WASM linear memory for the input data.
    // Note: `wasm_alloc` is an import we *provide* to the module, not one of
    // its exports, so we call our own implementation directly.
    const dataPtr = this.wasmAlloc(dataBuffer.length);
    const transformPtr = this.wasmAlloc(transformBuffer.length);

    // Write the data into the WASM memory.
    const wasmMemoryView = new Uint8Array(this.memory.buffer);
    wasmMemoryView.set(dataBuffer, dataPtr);
    wasmMemoryView.set(transformBuffer, transformPtr);

    // Call the core processing function.
    const resultPtr = (this.instance.exports.process_data as (dp: number, dl: number, tp: number, tl: number) => number)(
      dataPtr,
      dataBuffer.length,
      transformPtr,
      transformBuffer.length,
    );

    if (resultPtr === 0) {
      // The Zig code returned a null pointer, indicating an error.
      // Clean up the input allocations before bailing out.
      this.wasmFree(dataPtr, dataBuffer.length);
      this.wasmFree(transformPtr, transformBuffer.length);
      throw new Error('WASM processing failed.');
    }
    
    // Re-create the view after the call: if process_data grew the WASM
    // memory, any previously created view over the old buffer is detached.
    const resultView = new Uint8Array(this.memory.buffer);

    // The result is a null-terminated string; scan for the terminator.
    let resultLen = 0;
    while (resultView[resultPtr + resultLen] !== 0) {
      resultLen++;
    }

    const resultSlice = resultView.slice(resultPtr, resultPtr + resultLen);
    const resultStr = new TextDecoder('utf-8').decode(resultSlice);
    const resultJson = JSON.parse(resultStr);

    // IMPORTANT: Free all memory that was allocated, inputs and result alike.
    // The result must go back through the module's own `free_result` export,
    // since the module knows the buffer's true size.
    this.wasmFree(dataPtr, dataBuffer.length);
    this.wasmFree(transformPtr, transformBuffer.length);
    (this.instance.exports.free_result as (p: number, l: number) => void)(resultPtr, resultLen);

    return resultJson;
  }
}

This bridge code is complex and fraught with peril. A common mistake is mishandling memory, leading to leaks or, worse, memory corruption. Unit testing this boundary became a top priority. We created a suite of tests that would pass data of various sizes and formats to the WASM module and verify both the result and the memory state afterward.
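One useful pattern from that suite is an allocation-balance check run against a fake heap that stands in for the WASM instance. Everything below is an illustrative sketch of the idea, not our actual harness:

```typescript
// A minimal leak detector for the JS/WASM boundary: every pointer handed out
// must be returned exactly once. The fake heap stands in for the real
// WASM-side allocator (all names are illustrative).
class FakeWasmHeap {
  private live = new Set<number>();
  private next = 16; // first fake pointer; 0 stays reserved as "null"

  alloc(size: number): number {
    const ptr = this.next;
    this.next += size;
    this.live.add(ptr);
    return ptr;
  }

  free(ptr: number): void {
    // Freeing an unknown or already-freed pointer is the classic bug this
    // harness exists to catch.
    if (!this.live.delete(ptr)) throw new Error(`double free at ${ptr}`);
  }

  leakCount(): number {
    return this.live.size;
  }
}

// Simulates one process_data round trip: allocate inputs, "process",
// then free both inputs and the result buffer.
function simulateRoundTrip(heap: FakeWasmHeap, payloadSize: number): void {
  const dataPtr = heap.alloc(payloadSize);
  const resultPtr = heap.alloc(64); // result buffer "returned" by the module
  heap.free(dataPtr);
  heap.free(resultPtr);
}
```

Running every bridge test through a heap like this, and asserting a zero leak count afterward, caught several early bugs where error paths skipped the input frees.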

Tying It All Together with Scoped Styling

The final piece was rendering. The React Native components receive a processed layout tree and the aggregated data from the Zig module. The styles map from our Haskell Dashboard definition is used to generate stylesheets. We used react-native-typed-stylesheet to achieve the CSS Modules-like scoping.

// file: ComponentRenderer.tsx
import { createStyleSheet } from 'react-native-typed-stylesheet';

// The renderer receives the 'styles' map from the backend.
// e.g., styles = { "element-123": { backgroundColor: "#fafafa", padding: 10 } }
const generateStyles = (stylesMap: Record<string, object>) => {
  // Each element id becomes a locally scoped stylesheet key, so styles
  // defined for one component can never leak into another.
  return createStyleSheet({ ...stylesMap });
};

// Inside the component:
const dashboardStyles = generateStyles(dashboard.styles);

//... render function
<View style={dashboardStyles[leaf.layoutId]}>
  <BarChart data={processedData} />
</View>

This ensured that styles for element-123 were encapsulated and could never conflict with styles for other elements.

The final architecture can be visualized as a clear data flow.

sequenceDiagram
    participant MobileApp as React Native Client
    participant WasmModule as Zig WASM Runtime
    participant Backend as Haskell DSL Service
    participant DataLake as Apache Hudi

    MobileApp->>Backend: Request Dashboard ("sales_overview_v2")
    Backend->>Backend: Parse DSL definition for "sales_overview_v2"
    Backend->>DataLake: Execute coalesced Spark SQL query on Hudi
    DataLake-->>Backend: Return raw row data
    Backend-->>MobileApp: Respond with { rawData, layout, styles }
    
    MobileApp->>WasmModule: Call process_data(rawData, transforms)
    Note right of WasmModule: Heavy aggregation &<br/>computation in Zig
    WasmModule-->>MobileApp: Return aggregated view model
    MobileApp->>MobileApp: Render React components with view model and scoped styles

The resulting system was a success. Dashboard load times went from 5-10 seconds to under 500 milliseconds. The dashboard definition files were now small, readable, and type-checked by the Haskell compiler. Adding a new visualization type meant adding a constructor to our Haskell Component type, a new function in the Zig module, and a corresponding React component, a clean and predictable process.

However, the solution is not without its limitations and trade-offs. The complexity of the toolchain is immense. Onboarding a new developer requires knowledge of Haskell, Zig, and the intricacies of the React Native-WASM bridge. The initial load and JIT compilation of the WASM module still introduces a one-time ~100ms penalty on app start. Furthermore, debugging the Zig code once it’s running inside the mobile JS VM is exceptionally difficult, relying mostly on structured logging passed back to the JS console. Future iterations will focus on replacing the JSON data transfer with a zero-copy approach using FlatBuffers over a shared memory region and exploring AOT compilation of the WASM module with tools like Wasmer to eliminate the initial startup cost.
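At its simplest, the planned shared-memory transfer reduces to length-prefixed frames written into a buffer both sides can see; FlatBuffers adds schemas and zero-copy accessors on top of the same idea. A minimal sketch of that framing, with illustrative names:

```typescript
// Write a length-prefixed frame into a buffer shared with the WASM module:
// a little-endian u32 byte count followed by the payload bytes. Returns the
// offset just past the frame, so frames can be appended back to back.
function writeFrame(view: DataView, offset: number, payload: Uint8Array): number {
  view.setUint32(offset, payload.byteLength, true);
  new Uint8Array(view.buffer).set(payload, offset + 4);
  return offset + 4 + payload.byteLength;
}

// Read a frame back without copying: the returned Uint8Array is a window
// into the shared buffer, not a duplicate of the bytes.
function readFrame(view: DataView, offset: number): Uint8Array {
  const len = view.getUint32(offset, true);
  return new Uint8Array(view.buffer, offset + 4, len);
}
```

Because `readFrame` returns a view rather than a copy, the consumer pays no serialization cost, which is exactly the property the JSON transfer lacks.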

