The core technical pain point originated from a seemingly simple user request for one of our internal financial analytics platforms: “I want to change the assumptions in this forecast model and see the impact on the portfolio’s 30-year projection instantly.” The underlying dataset for this portfolio, residing in Snowflake, consisted of several million records of historical asset performance. Each “what-if” scenario involved running a Monte Carlo simulation—a computationally intensive, iterative process. The existing architecture, which sent simulation parameters to a backend service, triggered a Snowflake query, ran the simulation, and returned the result, had a latency of 15-30 seconds. This was unacceptable for interactive analysis.
A pure JavaScript implementation in the browser was attempted first. It buckled immediately. Processing millions of data points through thousands of iterations pegged a single CPU core and often crashed the browser tab. The performance was orders of magnitude too slow. This led to the initial concept: offload the simulation logic to a high-performance, sandboxed environment within the browser itself. The goal was to bring the computation to the data, which, in this unconventional architecture, meant bringing a native-speed compute engine to the client-side data subset.
The technology selection process was rigorous, driven by the need for performance, safety, and seamless integration with our existing React frontend.
- Compute Engine: Rust compiled to WebAssembly (WASM) was the only viable candidate. C/C++ could also target WASM but lacked Rust’s memory safety guarantees, a critical feature when running complex calculations in a browser sandbox. Rust’s ecosystem, particularly wasm-bindgen for JavaScript interoperability and serde for data serialization, is exceptionally mature. It promised near-native execution speed without sacrificing safety.
- Data Source: The data had to come from Snowflake. The challenge wasn’t querying Snowflake itself, but rather how to securely and efficiently get a relevant slice of data to the browser. Direct browser-to-Snowflake connections are a security anti-pattern. Therefore, a thin backend facade (a Next.js API route in our case) was chosen to proxy authenticated requests.
- Frontend State Management: The simulation inputs were numerous and interdependent—market volatility, inflation rates, contribution schedules, etc. A change in one input could invalidate multiple downstream calculations. Our existing Redux setup felt cumbersome for this highly granular and interconnected state. Recoil, with its concept of atoms for minimal state units and selectors for derived state, offered a more natural model. Its ability to create a dependency graph where components subscribe only to the state they need was a perfect fit to prevent unnecessary re-renders. Asynchronous selectors, in particular, provided a clean pattern for triggering the WASM computation when input parameters changed.
The final architecture took shape: A React UI allows users to adjust simulation parameters stored in Recoil atoms. An asynchronous Recoil selector subscribes to these atoms, fetches a baseline dataset from Snowflake via a secure API route, and then invokes the Rust/WASM module to run the simulation. The result is then stored back in the selector’s state, causing the UI to update reactively.
Step 1: The Core Rust Simulation Engine
The foundation of this architecture is the Rust library that performs the heavy lifting. This is not a simple “add two numbers” example; it must be structured to handle complex data and return a structured result, with robust error handling.
First, the project is set up with wasm-pack. The Cargo.toml file needs dependencies for WASM bindings, serialization, and random number generation for the simulation.
# Cargo.toml
[package]
name = "portfolio-simulator"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["cdylib", "rlib"]
[dependencies]
wasm-bindgen = "0.2"
serde = { version = "1.0", features = ["derive"] }
serde-wasm-bindgen = "0.4"
rand = "0.8"
rand::prelude::*
getrandom = { version = "0.2", features = ["js"] }
console_error_panic_hook = "0.1.7"
[profile.release]
lto = true
opt-level = "s"
A key detail here is getrandom with the js feature, which is necessary for rand to work correctly in a WASM environment by hooking into browser APIs. The Normal distribution lives in the separate rand_distr crate as of rand 0.7, which is why it appears as its own dependency. console_error_panic_hook is also critical for development; it ensures that Rust panics are logged to the browser console instead of surfacing only as a cryptic WASM trap.
Next is the Rust code itself. We define input and output structures that will be serialized and deserialized across the JS/WASM boundary. Using serde and wasm-bindgen makes this almost seamless.
// src/lib.rs
use wasm_bindgen::prelude::*;
use serde::{Serialize, Deserialize};
use rand::prelude::*;
use rand_distr::{Distribution, Normal}; // Normal moved from rand to rand_distr in rand 0.7+
use std::panic;
// It's good practice to set a panic hook for better error messages in the browser console.
#[wasm_bindgen(start)]
pub fn main_js() -> Result<(), JsValue> {
panic::set_hook(Box::new(console_error_panic_hook::hook));
Ok(())
}
// Input parameters for the simulation, received from JavaScript.
#[derive(Serialize, Deserialize)]
pub struct SimulationParams {
pub years: u32,
pub num_simulations: u32,
pub initial_portfolio_value: f64,
pub annual_contribution: f64,
pub mean_annual_return: f64,
pub annual_volatility: f64,
}
// A single data point in a simulation path.
#[derive(Serialize, Deserialize, Clone)]
pub struct YearEndValue {
pub year: u32,
pub value: f64,
}
// The final output structure sent back to JavaScript.
#[derive(Serialize, Deserialize)]
pub struct SimulationOutput {
pub median_path: Vec<YearEndValue>,
pub percentile_5th_path: Vec<YearEndValue>,
pub percentile_95th_path: Vec<YearEndValue>,
pub success_rate: f64, // Example metric: % of sims above initial value
}
// The core function exposed to WebAssembly.
// It takes a JsValue (a plain JS object matching SimulationParams)
// and returns a JsValue (a JS object matching SimulationOutput).
// Returning Result<JsValue, JsValue> allows us to propagate errors back to JS.
#[wasm_bindgen]
pub fn run_monte_carlo_simulation(params_js: JsValue) -> Result<JsValue, JsValue> {
// Deserialize input parameters from JavaScript.
let params: SimulationParams = serde_wasm_bindgen::from_value(params_js)
.map_err(|e| JsValue::from_str(&format!("Parameter deserialization error: {}", e)))?;
// Basic validation of parameters.
if params.years == 0 || params.num_simulations == 0 || params.annual_volatility <= 0.0 {
return Err(JsValue::from_str("Invalid simulation parameters provided."));
}
let mut rng = rand::thread_rng();
let normal_dist = Normal::new(params.mean_annual_return, params.annual_volatility)
.map_err(|_| JsValue::from_str("Failed to create normal distribution."))?;
let mut all_sim_paths: Vec<Vec<YearEndValue>> = Vec::with_capacity(params.num_simulations as usize);
let mut successful_sims = 0;
for _ in 0..params.num_simulations {
let mut path: Vec<YearEndValue> = Vec::with_capacity(params.years as usize);
let mut current_value = params.initial_portfolio_value;
for year in 1..=params.years {
// Generate a random return for the year based on the normal distribution.
let annual_return = normal_dist.sample(&mut rng);
current_value = current_value * (1.0 + annual_return) + params.annual_contribution;
// Prevent negative portfolio values. In a real model, this would be more complex.
if current_value < 0.0 {
current_value = 0.0;
}
path.push(YearEndValue { year, value: current_value });
}
if path.last().map_or(0.0, |y| y.value) >= params.initial_portfolio_value {
successful_sims += 1;
}
all_sim_paths.push(path);
}
// Post-processing: Calculate median and percentiles for each year.
// This is another computationally intensive step that justifies Rust/WASM.
let mut median_path: Vec<YearEndValue> = Vec::with_capacity(params.years as usize);
let mut percentile_5th_path: Vec<YearEndValue> = Vec::with_capacity(params.years as usize);
let mut percentile_95th_path: Vec<YearEndValue> = Vec::with_capacity(params.years as usize);
for year_idx in 0..params.years as usize {
let mut year_values: Vec<f64> = all_sim_paths.iter().map(|path| path[year_idx].value).collect();
// A common mistake is to not sort before calculating percentiles.
year_values.sort_by(|a, b| a.partial_cmp(b).unwrap());
let median_val = year_values[year_values.len() / 2];
let p5_idx = (year_values.len() as f64 * 0.05).floor() as usize;
let p95_idx = (year_values.len() as f64 * 0.95).floor() as usize;
median_path.push(YearEndValue { year: (year_idx + 1) as u32, value: median_val });
percentile_5th_path.push(YearEndValue { year: (year_idx + 1) as u32, value: year_values[p5_idx] });
percentile_95th_path.push(YearEndValue { year: (year_idx + 1) as u32, value: year_values[p95_idx] });
}
let success_rate = successful_sims as f64 / params.num_simulations as f64;
let output = SimulationOutput {
median_path,
percentile_5th_path,
percentile_95th_path,
success_rate,
};
// Serialize the output structure to be sent back to JavaScript.
serde_wasm_bindgen::to_value(&output)
.map_err(|e| JsValue::from_str(&format!("Result serialization error: {}", e)))
}
This Rust code is compiled into a WASM package using wasm-pack build --target web. This produces a pkg directory containing the .wasm file and the necessary JavaScript glue code to load and interact with it.
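One subtlety with the web target: the generated module’s default export is an init function that fetches and instantiates the .wasm binary, and it must be awaited before any exported function is called. A minimal loading sketch, assuming the pkg output is resolvable as portfolio-simulator (e.g., via a bundler alias or npm link):
// loadSimulator.js: minimal sketch; the import path assumes the pkg
// output is importable as 'portfolio-simulator'.
import init, { run_monte_carlo_simulation } from 'portfolio-simulator/portfolio_simulator.js';

let initialized = false;

export async function runSimulation(params) {
  if (!initialized) {
    await init(); // fetches and instantiates the .wasm binary once
    initialized = true;
  }
  return run_monte_carlo_simulation(params);
}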
Step 2: The Snowflake Data Fetching Facade
Directly connecting to Snowflake from the browser is not an option. We implemented a simple Next.js API route to act as a secure proxy. This route handles authentication using service account credentials stored securely as environment variables and executes a pre-defined query. A production system would have more robust security, such as user-session-based authorization.
// pages/api/fetch-asset-data.js
import snowflake from 'snowflake-sdk';
// A common mistake is to hardcode credentials. They MUST be environment variables.
const SNOWFLAKE_ACCOUNT = process.env.SNOWFLAKE_ACCOUNT;
const SNOWFLAKE_USERNAME = process.env.SNOWFLAKE_USERNAME;
const SNOWFLAKE_PASSWORD = process.env.SNOWFLAKE_PASSWORD;
const SNOWFLAKE_WAREHOUSE = process.env.SNOWFLAKE_WAREHOUSE;
const SNOWFLAKE_DATABASE = process.env.SNOWFLAKE_DATABASE;
const SNOWFLAKE_SCHEMA = process.env.SNOWFLAKE_SCHEMA;
export default async function handler(req, res) {
if (req.method !== 'GET') {
return res.status(405).json({ message: 'Method Not Allowed' });
}
// In a real app, portfolioId would be dynamic and validated.
const { portfolioId } = req.query;
if (!portfolioId) {
return res.status(400).json({ message: 'Portfolio ID is required.' });
}
const connection = snowflake.createConnection({
account: SNOWFLAKE_ACCOUNT,
username: SNOWFLAKE_USERNAME,
password: SNOWFLAKE_PASSWORD,
warehouse: SNOWFLAKE_WAREHOUSE,
database: SNOWFLAKE_DATABASE,
schema: SNOWFLAKE_SCHEMA,
});
try {
await new Promise((resolve, reject) => {
connection.connect((err) => {
if (err) {
console.error('Unable to connect to Snowflake: ' + err.message);
reject(err);
} else {
console.log('Successfully connected to Snowflake.');
resolve();
}
});
});
// The SQL should be parameterized and only fetch the minimum data required for the client.
// The query here is simplified. In production, this would be a more complex aggregation.
const sqlText = `
SELECT
ASSET_CLASS,
AVG(ANNUAL_RETURN) as MEAN_RETURN,
STDDEV(ANNUAL_RETURN) as VOLATILITY
FROM
HISTORICAL_PERFORMANCE
WHERE
PORTFOLIO_ID = ?
GROUP BY
ASSET_CLASS;
`;
const rows = await new Promise((resolve, reject) => {
connection.execute({
sqlText,
binds: [portfolioId],
complete: (err, stmt, rows) => {
if (err) {
console.error('Failed to execute statement due to the following error: ' + err.message);
reject(err);
} else {
resolve(rows);
}
}
});
});
res.status(200).json(rows);
} catch (error) {
console.error('Snowflake API error:', error);
res.status(500).json({ message: 'Failed to fetch data from Snowflake.' });
} finally {
if (connection.isUp()) {
connection.destroy((err) => {
if (err) {
console.error('Failed to destroy Snowflake connection: ' + err.message);
}
});
}
}
}
This API route provides a clean, secure interface for the frontend to get the necessary statistical inputs (mean return, volatility) for the simulation engine without ever exposing database credentials.
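For reference, the payload this route returns for the query above looks like the sketch below. The values are illustrative; the uppercase keys reflect Snowflake’s convention of uppercasing unquoted identifiers, and they match what the Recoil selector in Step 3 reads.
// Illustrative response from GET /api/fetch-asset-data?portfolioId=default_portfolio.
// Values are made up; the shape follows the SQL projection above
// (data[0].MEAN_RETURN, data[0].VOLATILITY in Step 3).
const exampleRows = [
  { ASSET_CLASS: 'EQUITY', MEAN_RETURN: 0.072, VOLATILITY: 0.148 },
  { ASSET_CLASS: 'FIXED_INCOME', MEAN_RETURN: 0.031, VOLATILITY: 0.052 },
];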
Step 3: Integrating with Recoil and React
This is where all the pieces come together. We need to load the WASM module, manage the simulation parameters with Recoil, and trigger the computation when those parameters change.
First, let’s define the Recoil state.
// state/simulationState.js
import { atom, selector } from 'recoil';
// Atoms hold the raw, user-editable input values.
export const initialPortfolioValueAtom = atom({
key: 'initialPortfolioValueAtom',
default: 1000000,
});
export const annualContributionAtom = atom({
key: 'annualContributionAtom',
default: 50000,
});
export const simulationYearsAtom = atom({
key: 'simulationYearsAtom',
default: 30,
});
// This selector fetches the initial model parameters from our Snowflake API.
// It is asynchronous and will be handled by React Suspense.
const portfolioStatsSelector = selector({
key: 'portfolioStatsSelector',
get: async ({ get }) => {
// A real app would get the portfolio ID from somewhere, e.g., URL or user state.
const portfolioId = 'default_portfolio';
const response = await fetch(`/api/fetch-asset-data?portfolioId=${portfolioId}`);
if (!response.ok) {
throw new Error("Network response was not ok");
}
const data = await response.json();
// For this example, we assume a single-asset portfolio from the fetched data.
// A more complex implementation would aggregate these stats.
if (!data || data.length === 0) {
throw new Error("No data returned from Snowflake for this portfolio.");
}
return {
mean_annual_return: data[0].MEAN_RETURN,
annual_volatility: data[0].VOLATILITY
};
},
});
// This is the core selector that runs the simulation.
// It's asynchronous and depends on both user inputs (atoms) and fetched data (another selector).
export const simulationResultSelector = selector({
key: 'simulationResultSelector',
get: async ({ get }) => {
// Lazily import the WASM module. With wasm-pack's web target, the module's
// default export is an init function that must be awaited before any
// exported function is called; production code would memoize this initialization.
const wasm = await import('portfolio-simulator/portfolio_simulator.js');
await wasm.default();
const { run_monte_carlo_simulation } = wasm;
// Depend on other atoms and selectors. Recoil tracks these dependencies automatically.
const stats = get(portfolioStatsSelector);
const params = {
years: get(simulationYearsAtom),
num_simulations: 5000, // Hardcoded for this example
initial_portfolio_value: get(initialPortfolioValueAtom),
annual_contribution: get(annualContributionAtom),
mean_annual_return: stats.mean_annual_return,
annual_volatility: stats.annual_volatility,
};
try {
// The actual call to the WebAssembly function. serde-wasm-bindgen
// converts the input object across the JS/WASM boundary.
const result = run_monte_carlo_simulation(params);
return result;
} catch (error) {
// The Rust code propagates errors as JS exceptions.
console.error("WASM simulation failed:", error);
// Propagate the error to be caught by a React Error Boundary.
throw error;
}
},
});
The beauty of this Recoil setup is its declarative nature. simulationResultSelector will only re-execute if one of its dependencies (initialPortfolioValueAtom, annualContributionAtom, portfolioStatsSelector, etc.) changes. Recoil memoizes the output, so if the user changes an input and then changes it back, the expensive WASM computation is not run a second time.
Finally, the React component uses these Recoil hooks to display inputs and results. It leverages React Suspense to handle the loading states of the asynchronous selectors.
// components/SimulationDashboard.jsx
import React, { Suspense } from 'react';
import { useRecoilState, useRecoilValue } from 'recoil';
import {
initialPortfolioValueAtom,
annualContributionAtom,
simulationYearsAtom,
simulationResultSelector,
} from '../state/simulationState';
// Component to display the results from the WASM simulation.
const SimulationResults = () => {
// This hook will suspend while the async selector is pending.
const results = useRecoilValue(simulationResultSelector);
if (!results) {
return <div>No results yet.</div>;
}
// A real implementation would use a charting library here.
return (
<div>
<h3>Simulation Results</h3>
<p>Success Rate: {(results.success_rate * 100).toFixed(2)}%</p>
<h4>Median Path (Final Value):</h4>
<p>${results.median_path[results.median_path.length - 1]?.value.toLocaleString()}</p>
<h4>5th Percentile Path (Final Value):</h4>
<p>${results.percentile_5th_path[results.percentile_5th_path.length - 1]?.value.toLocaleString()}</p>
<h4>95th Percentile Path (Final Value):</h4>
<p>${results.percentile_95th_path[results.percentile_95th_path.length - 1]?.value.toLocaleString()}</p>
</div>
);
};
// Main dashboard component with controls.
const SimulationDashboard = () => {
const [initialValue, setInitialValue] = useRecoilState(initialPortfolioValueAtom);
const [contribution, setContribution] = useRecoilState(annualContributionAtom);
const [years, setYears] = useRecoilState(simulationYearsAtom);
return (
<div>
<h2>Portfolio Forecast Simulator</h2>
<div>
<label>Initial Value: </label>
<input
type="number"
value={initialValue}
onChange={(e) => setInitialValue(Number(e.target.value))}
/>
</div>
<div>
<label>Annual Contribution: </label>
<input
type="number"
value={contribution}
onChange={(e) => setContribution(Number(e.target.value))}
/>
</div>
<div>
<label>Simulation Years: </label>
<input
type="number"
value={years}
onChange={(e) => setYears(Number(e.target.value))}
/>
</div>
<hr />
{/*
Suspense is critical for handling async Recoil selectors. It shows a fallback
UI while data is being fetched or the WASM module is computing.
*/}
<Suspense fallback={<div>Loading portfolio data and running simulation...</div>}>
<SimulationResults />
</Suspense>
</div>
);
};
export default SimulationDashboard;
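For screens where a Suspense boundary is awkward, the same selector can also be consumed imperatively with Recoil’s useRecoilValueLoadable hook. A minimal sketch of the results component rewritten that way (not part of the dashboard above):
// components/SimulationResultsLoadable.jsx: sketch of a Suspense-free
// alternative using Recoil's loadable API.
import React from 'react';
import { useRecoilValueLoadable } from 'recoil';
import { simulationResultSelector } from '../state/simulationState';

const SimulationResultsLoadable = () => {
  const loadable = useRecoilValueLoadable(simulationResultSelector);
  switch (loadable.state) {
    case 'loading':
      return <div>Running simulation...</div>;
    case 'hasError':
      return <div>Simulation failed: {String(loadable.contents)}</div>;
    case 'hasValue':
      return <p>Success Rate: {(loadable.contents.success_rate * 100).toFixed(2)}%</p>;
    default:
      return null;
  }
};

export default SimulationResultsLoadable;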
The final result is an interface where moving a slider causes the complex Monte Carlo simulation to re-run in under 200 milliseconds, a dramatic improvement over the initial 30-second server roundtrip. The UI feels instantaneous, fulfilling the original user request.
graph TD
  subgraph Browser
    A[React UI: Input Sliders] -- user changes value --> B(Recoil Atoms);
    B -- provides input --> C{simulationResultSelector};
    C -- fetches initial data --> F[API Route Proxy];
    F -- returns data subset --> C;
    C -- triggers WASM call --> D[Rust/WASM Module];
    D -- performs 5000x30 iterations --> D;
    D -- returns structured result --> C;
    C -- resolves with final data --> E[React UI: Charts/Results];
  end
  subgraph "Backend (Serverless Function)"
    F -- executes parameterized query --> G[(Snowflake Data Warehouse)];
  end
  E -- Suspense fallback shown during --> C
This architecture is not without its limitations and trade-offs. The primary bottleneck shifts from server compute time to the initial data transfer from the Snowflake proxy. While we only fetched aggregated stats, a model requiring a larger raw dataset would face latency here. Future iterations could explore streaming data directly into the WASM module’s memory via WebSockets to mitigate this. Furthermore, while serde-wasm-bindgen is convenient, the conversion of structured values across the JS/WASM boundary introduces overhead. For extreme performance needs, one might investigate more efficient binary formats like Protocol Buffers or direct manipulation of the WASM linear memory, though this significantly increases complexity. The current solution, however, strikes a pragmatic balance between performance, security, and development effort.
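As one concrete direction for that last point, wasm-bindgen accepts JavaScript typed arrays directly for slice arguments, copying them into linear memory as a single block rather than converting field by field. A sketch of the JavaScript side, assuming a hypothetical Rust export simulate_from_returns that is not part of the module built in Step 1:
// Sketch only: the Rust export referenced here is hypothetical.
//   #[wasm_bindgen]
//   pub fn simulate_from_returns(returns: &[f64]) -> f64 { ... }
import init, { simulate_from_returns } from 'portfolio-simulator/portfolio_simulator.js';

async function runFromRawReturns(rawReturns) {
  await init();
  // wasm-bindgen copies a Float64Array into WASM memory as one contiguous
  // block, avoiding per-field serialization of a large object graph.
  return simulate_from_returns(Float64Array.from(rawReturns));
}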