The initial request was seemingly straightforward: enable a new iOS application for our portfolio managers to perform real-time Value at Risk (VaR) calculations on selected asset portfolios. The technical reality was a minefield. Our core portfolio data resides in an on-premise Oracle database, managed by a monolithic Java 8 application with a data access layer built entirely on MyBatis. The VaR calculation itself required complex Monte Carlo simulations, a task for which our Java stack was ill-suited and our in-house expertise was non-existent. The latency requirement from the Swift client to the final result was under 2 seconds for a moderately sized portfolio.
A rewrite of the core Java service was immediately ruled out due to risk and cost. The MyBatis XML mappers represented years of accumulated business logic and performance tuning that nobody fully understood anymore. The pragmatic path forward was to build a new, specialized service for the computation and orchestrate a call chain from the mobile client through our existing infrastructure. This is the log of how we stitched together Swift, Java/MyBatis, and Python/SciPy on Azure, and how a surprising tool like Turbopack became critical for our operational visibility.
The Architectural Compromise: A Heterogeneous Pipeline
Our final design was a multi-hop pipeline. A direct call from the Swift client to a new computational service was not possible due to security policies and the need to enrich the request with data from the legacy monolith.
The flow was established as follows:
- Swift iOS App: The client initiates a request with a portfolio identifier and simulation parameters.
- Azure API Management: Acts as the security gateway, authenticating the client and routing the request.
- Java/MyBatis Service (on Azure App Service): The existing monolith, now containerized and moved to App Service. It receives the request, uses MyBatis to fetch the portfolio’s historical performance data from the Oracle DB, and then calls the computational service.
- Python/SciPy Service (on Azure Functions): A new, stateless function that receives the historical data, performs the VaR calculation using SciPy, and returns the result.
- SRE Monitoring Dashboard: An internal Next.js web app, built with Turbopack, that consumes logs and metrics from Azure Monitor to provide real-time visibility into the pipeline’s health.
sequenceDiagram
    participant SwiftClient as Swift iOS Client
    participant APIM as Azure API Management
    participant JavaService as Java/MyBatis Service (App Service)
    participant PythonFunction as Python/SciPy (Azure Function)
    participant OracleDB as Oracle Database
    SwiftClient->>+APIM: POST /calculateVaR (PortfolioID, Params)
    APIM->>+JavaService: Forward Request
    JavaService->>+OracleDB: Fetch historical data via MyBatis
    OracleDB-->>-JavaService: Return Portfolio Data
    JavaService->>+PythonFunction: POST /runSimulation (HistoricalData)
    PythonFunction-->>-JavaService: Return VaR Result
    JavaService-->>-APIM: Return VaR Result
    APIM-->>-SwiftClient: 200 OK (VaR Result)
This architecture was a compromise. It introduced multiple network hops and serialization steps, putting our sub-2-second latency target at risk. Every component had to be optimized.
The Computational Core: SciPy on Azure Functions
The first component built was the Python service. Azure Functions were chosen for their auto-scaling and pay-per-use model, which suited the sporadic nature of these calculations. The core logic relied on scipy.stats and numpy.
A common pitfall in such projects is underestimating the configuration and packaging complexity. The function required a specific Python version and external libraries, which had to be declared in requirements.txt and managed carefully in the deployment pipeline.
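For reference, a minimal requirements.txt for this function looks like the following; the version pins are illustrative, not the exact ones we shipped:

requirements.txt:

azure-functions
numpy==1.24.4
scipy==1.10.1

Pinning exact versions is worth the friction: an unpinned transitive upgrade of NumPy or SciPy silently changes both the deployed package size and the import cost that dominates cold starts.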
Here is the core of the function app.
VaRFunction/function.json:
{
"scriptFile": "__init__.py",
"bindings": [
{
"authLevel": "function",
"type": "httpTrigger",
"direction": "in",
"name": "req",
"methods": [
"post"
]
},
{
"type": "http",
"direction": "out",
"name": "$return"
}
]
}
VaRFunction/__init__.py:
import logging
import json
import time
import numpy as np
from scipy.stats import norm
import azure.functions as func
# --- Constants for Simulation ---
# In a production system, these would be configurable.
SIMULATION_COUNT = 10000
CONFIDENCE_LEVEL = 0.99
TIME_HORIZON_DAYS = 5
def main(req: func.HttpRequest) -> func.HttpResponse:
logging.info('Python HTTP trigger function processed a VaR calculation request.')
try:
req_body = req.get_json()
# Input validation is critical. Never trust the caller.
historical_returns = req_body.get('returns')
if not isinstance(historical_returns, list) or len(historical_returns) < 2:
return func.HttpResponse(
"Invalid input: 'returns' must be a list of at least two numbers.",
status_code=400
)
except ValueError:
return func.HttpResponse(
"Request body must be valid JSON.",
status_code=400
)
start_time = time.perf_counter()
# --- Core Scientific Computation ---
# This is where SciPy and NumPy are indispensable.
try:
returns_array = np.array(historical_returns)
# Calculate daily drift and volatility
mean_return = np.mean(returns_array)
std_dev = np.std(returns_array)
drift = mean_return - (0.5 * std_dev ** 2)
# Generate random variables for Monte Carlo simulation
random_vars = norm.ppf(np.random.rand(TIME_HORIZON_DAYS, SIMULATION_COUNT))
# Calculate daily returns using Geometric Brownian Motion
daily_returns = np.exp(drift + std_dev * random_vars)
        # Create price paths: start at a nominal initial price and compound
        # exactly one simulated return per day of the horizon. (Starting the
        # loop at the initial-price row avoids dropping the first day's return.)
        price_paths = np.zeros((TIME_HORIZON_DAYS + 1, SIMULATION_COUNT))
        price_paths[0] = 100  # nominal initial price
        for t in range(1, TIME_HORIZON_DAYS + 1):
            price_paths[t] = price_paths[t - 1] * daily_returns[t - 1]
# Get final prices and calculate portfolio returns
final_prices = price_paths[-1]
portfolio_returns = final_prices - 100
        # Calculate Value at Risk (VaR): at the 99% confidence level, VaR is
        # the 1st percentile of the simulated P&L distribution.
        var_result = np.percentile(portfolio_returns, 100 * (1 - CONFIDENCE_LEVEL))
except Exception as e:
# Log the specific scientific computation error for debugging.
logging.error(f"SciPy/NumPy calculation failed: {e}", exc_info=True)
return func.HttpResponse(
"An error occurred during the simulation.",
status_code=500
)
end_time = time.perf_counter()
duration_ms = (end_time - start_time) * 1000
# Structured logging is key for monitoring in Azure Monitor.
logging.info(json.dumps({
"message": "VaR calculation successful",
"durationMs": duration_ms,
"simulations": SIMULATION_COUNT,
"confidence": CONFIDENCE_LEVEL,
"result": var_result
}))
return func.HttpResponse(
json.dumps({
'valueAtRisk': var_result,
'calculationDurationMs': duration_ms,
'confidenceLevel': CONFIDENCE_LEVEL
}),
mimetype="application/json",
status_code=200
)
The initial deployment revealed a critical performance bottleneck: cold starts. The first request after a period of inactivity took over 8 seconds, as the Azure Functions host had to provision a container and load the Python interpreter and all the SciPy/NumPy dependencies. This was unacceptable. The solution was to switch from the Consumption Plan to a Premium Plan with a configured number of pre-warmed instances. This was a direct trade-off: we accepted higher baseline costs for guaranteed low-latency responses.
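For reference, the plan switch looked roughly like the following Azure CLI sketch; the resource names are placeholders and the flags should be verified against current az documentation:

az functionapp plan create \
  --resource-group rg-var-pipeline \
  --name plan-var-premium \
  --location eastus \
  --sku EP1 \
  --min-instances 1 \
  --max-burst 4

The --min-instances value is the always-ready floor we pay for around the clock; --max-burst caps elastic scale-out so a misbehaving client cannot inflate the bill.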
Orchestration: The Java/MyBatis Monolith
The Java service’s role was to act as an orchestrator. It had to fetch data using its existing MyBatis setup and then call the new Python function. A common mistake here is to use blocking I/O, which would tie up a thread in the application server’s pool for the entire duration of the downstream call. In a system under load, this leads to thread pool exhaustion. We opted for Java 11’s non-blocking HttpClient.
First, the MyBatis layer. We couldn’t change the tables, but we could add a new mapper for this specific query.
PortfolioMapper.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper
PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN"
"http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="com.legacy.data.PortfolioMapper">
<select id="getHistoricalDailyReturns" resultType="double">
SELECT
(p.close_price - p.open_price) / p.open_price as daily_return
FROM
portfolio_daily_history p
WHERE
p.portfolio_id = #{portfolioId}
AND p.record_date >= #{startDate}
ORDER BY
p.record_date ASC
</select>
</mapper>
PortfolioMapper.java:
package com.legacy.data;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Param;
import java.time.LocalDate;
import java.util.List;
@Mapper
public interface PortfolioMapper {
List<Double> getHistoricalDailyReturns(
@Param("portfolioId") String portfolioId,
@Param("startDate") LocalDate startDate
);
}
Next, the service layer orchestrating the call. The most critical part here is the communication with the Python function. Initial tests with JSON serialization via Jackson proved to be a performance bottleneck for large datasets. We switched to Protocol Buffers (Protobuf) for a more compact and faster binary format.
var_request.proto:
syntax = "proto3";
package computation;
option java_package = "com.legacy.computation.dto";
option java_outer_classname = "VarComputationProtos";
message VarInput {
repeated double returns = 1;
}
message VarOutput {
double valueAtRisk = 1;
double calculationDurationMs = 2;
double confidenceLevel = 3;
}
The Java service now used this definition.
VarCalculationService.java:
package com.legacy.service;
import com.legacy.computation.dto.VarComputationProtos;
import com.legacy.data.PortfolioMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.LocalDate;
import java.util.List;
import java.util.concurrent.CompletableFuture;
@Service
public class VarCalculationService {
private static final Logger logger = LoggerFactory.getLogger(VarCalculationService.class);
private final PortfolioMapper portfolioMapper;
private final HttpClient httpClient;
// These would be injected from config server or application.properties
@Value("${var.function.url}")
private String functionUrl;
@Value("${var.function.key}")
private String functionKey;
public VarCalculationService(PortfolioMapper portfolioMapper) {
this.portfolioMapper = portfolioMapper;
// In a real project, this HttpClient should be a shared bean.
this.httpClient = HttpClient.newBuilder()
.version(HttpClient.Version.HTTP_2)
.connectTimeout(Duration.ofSeconds(5))
.build();
}
public CompletableFuture<VarComputationProtos.VarOutput> calculateVaR(String portfolioId) {
// Step 1: Fetch data from Oracle via MyBatis
logger.info("Fetching historical data for portfolio: {}", portfolioId);
List<Double> returns = portfolioMapper.getHistoricalDailyReturns(portfolioId, LocalDate.now().minusYears(1));
if (returns == null || returns.isEmpty()) {
logger.warn("No historical data found for portfolio: {}", portfolioId);
return CompletableFuture.failedFuture(new IllegalArgumentException("Portfolio data not found."));
}
// Step 2: Prepare the Protobuf payload
VarComputationProtos.VarInput requestPayload = VarComputationProtos.VarInput.newBuilder()
.addAllReturns(returns)
.build();
// Step 3: Build the async HTTP request to the Azure Function
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create(functionUrl + "?code=" + functionKey))
.timeout(Duration.ofSeconds(10))
.header("Content-Type", "application/x-protobuf") // Switched from application/json
.POST(HttpRequest.BodyPublishers.ofByteArray(requestPayload.toByteArray()))
.build();
logger.info("Invoking SciPy Azure Function for portfolio: {}", portfolioId);
// Step 4: Execute the call asynchronously and process the response
return httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofByteArray())
.thenApply(response -> {
if (response.statusCode() != 200) {
String errorBody = new String(response.body(), StandardCharsets.UTF_8);
logger.error("Azure Function failed with status {} and body: {}", response.statusCode(), errorBody);
throw new RuntimeException("VaR calculation service failed.");
}
try {
// Deserialize the Protobuf response
return VarComputationProtos.VarOutput.parseFrom(response.body());
} catch (Exception e) {
logger.error("Failed to parse Protobuf response from Azure Function", e);
throw new RuntimeException("Invalid response from calculation service.", e);
}
});
}
}
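For completeness, here is a sketch of how the service is exposed over HTTP; a hypothetical Spring MVC controller, not the production class. Returning the CompletableFuture directly lets the container complete the response asynchronously on the HttpClient's callback thread:

VarController.java (illustrative):

package com.legacy.api;
import com.legacy.service.VarCalculationService;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;
import java.util.concurrent.CompletableFuture;

@RestController
public class VarController {
    private final VarCalculationService varCalculationService;

    public VarController(VarCalculationService varCalculationService) {
        this.varCalculationService = varCalculationService;
    }

    // Spring MVC completes the request when the future completes, instead of
    // parking a servlet thread for the duration of the downstream call.
    @PostMapping("/calculateVaR/{portfolioId}")
    public CompletableFuture<ResponseEntity<byte[]>> calculateVaR(@PathVariable String portfolioId) {
        return varCalculationService.calculateVaR(portfolioId)
            .thenApply(output -> ResponseEntity.ok()
                .header("Content-Type", "application/x-protobuf")
                .body(output.toByteArray()));
    }
}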
This asynchronous approach was vital. The calling thread in the Java service was now free to handle other requests while waiting for the Python function to complete its work.
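One loose end in the listings: the Python function shown earlier still parses JSON, while the Java service now posts Protobuf. The matching change on the Python side looked roughly like this sketch, assuming the protoc-generated var_request_pb2 module is packaged with the function and protobuf is added to requirements.txt:

# Sketch of the change in VaRFunction/__init__.py after the Protobuf switch;
# these helpers replace the req.get_json() path shown earlier.
import azure.functions as func
import var_request_pb2  # generated by protoc from var_request.proto

def parse_var_input(req: func.HttpRequest) -> list:
    var_input = var_request_pb2.VarInput()
    var_input.ParseFromString(req.get_body())  # raw Protobuf bytes, not JSON
    return list(var_input.returns)

def build_var_output(value_at_risk: float, duration_ms: float) -> func.HttpResponse:
    var_output = var_request_pb2.VarOutput()
    var_output.valueAtRisk = value_at_risk
    var_output.calculationDurationMs = duration_ms
    var_output.confidenceLevel = CONFIDENCE_LEVEL  # module-level constant from earlier
    return func.HttpResponse(
        body=var_output.SerializeToString(),
        mimetype="application/x-protobuf",
        status_code=200
    )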
The Client: Swift and async/await
The Swift client’s job was to present a responsive UI while this complex backend operation was underway. Using modern Swift concurrency with async/await made the code clean and prevented the classic “callback hell.” The client also had to handle Protobuf decoding.
VaRService.swift:
import Foundation
// Assuming Protobuf-generated Swift struct `Computation_VarOutput` exists.
// Code generation would be handled by SwiftProtobuf plugin.
enum VaRServiceError: Error {
case invalidURL
case networkError(Error)
case serverError(statusCode: Int)
case decodingError(Error)
case noData
}
class VaRService {
// In a production app, this would be configured based on environment.
private let baseURL = "https://your-api-management.azure-api.net/v1"
private let apiKey: String // Loaded from a secure store like Keychain.
init(apiKey: String) {
self.apiKey = apiKey
}
func calculateVaR(for portfolioId: String) async -> Result<Computation_VarOutput, VaRServiceError> {
guard let url = URL(string: "\(baseURL)/calculateVaR/\(portfolioId)") else {
return .failure(.invalidURL)
}
var request = URLRequest(url: url)
request.httpMethod = "POST"
request.setValue("application/json", forHTTPHeaderField: "Content-Type") // The Java service takes JSON input
request.setValue(apiKey, forHTTPHeaderField: "Ocp-Apim-Subscription-Key") // APIM Key
// Minimal request body as the Java service enriches it.
let requestBody = ["simulationParameters": "default"]
request.httpBody = try? JSONEncoder().encode(requestBody)
do {
// Modern async/await networking call
let (data, response) = try await URLSession.shared.data(for: request)
guard let httpResponse = response as? HTTPURLResponse else {
// This case is unlikely but good to handle.
return .failure(.noData)
}
// A common mistake is to only check for 200. We must handle the full 2xx range.
guard (200...299).contains(httpResponse.statusCode) else {
return .failure(.serverError(statusCode: httpResponse.statusCode))
}
            // The final response from the Java service is also Protobuf.
            // We assume the Content-Type is "application/x-protobuf".
            do {
                let varOutput = try Computation_VarOutput(serializedData: data)
                return .success(varOutput)
            } catch {
                // Protobuf parsing failures are decoding errors, not network failures.
                return .failure(.decodingError(error))
            }
        } catch {
            // Wrap URLSession transport errors.
            return .failure(.networkError(error))
        }
}
}
The view model would then call this service and update the UI upon completion, ensuring the user experience remained fluid even during the 1-2 second wait.
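A minimal sketch of that view model, assuming SwiftUI and the VaRService above; names are illustrative:

import SwiftUI

@MainActor
final class VaRViewModel: ObservableObject {
    @Published var valueAtRisk: Double?
    @Published var errorMessage: String?
    @Published var isLoading = false

    private let service: VaRService

    init(service: VaRService) {
        self.service = service
    }

    func refresh(portfolioId: String) async {
        isLoading = true
        defer { isLoading = false }
        // The await suspends without blocking the main thread, so the UI
        // stays responsive during the 1-2 second pipeline round trip.
        switch await service.calculateVaR(for: portfolioId) {
        case .success(let output):
            valueAtRisk = output.valueAtRisk
        case .failure(let error):
            errorMessage = "VaR calculation failed: \(error)"
        }
    }
}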
Operational Visibility with Turbopack
This distributed system was a black box during initial testing. When latency spiked, it was impossible to know if the bottleneck was the MyBatis query, the Java-to-Python network hop, the SciPy computation, or serialization overhead. We needed a real-time SRE dashboard.
Our platform team was building internal tools with a Next.js stack. They were early adopters of Turbopack, the Rust-based successor to Webpack. For them, the choice was pragmatic: their monitoring dashboards were complex, with many components and real-time data feeds from Azure Monitor Log Analytics. The slow feedback loop with Webpack’s dev server was hindering their ability to iterate. Turbopack provided near-instantaneous Hot Module Replacement (HMR), which meant they could add new charts, tweak Kusto queries, and deploy changes to the dashboard significantly faster.
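Adopting it required no architectural change on their side: at the time, Turbopack was opt-in via a flag on the Next.js dev script (the exact flag name depends on the Next.js version; this snippet is illustrative):

package.json (dashboard project):

{
  "scripts": {
    "dev": "next dev --turbo",
    "build": "next build"
  }
}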
While we cannot show the Turbopack build system itself, here is a conceptual React component for the dashboard that illustrates its purpose.
PipelineLatencyMonitor.tsx:
import { useState, useEffect } from 'react';
import { Chart } from 'react-chartjs-2'; // Example charting library
import { AzureMonitorLogAnalyticsClient } from '@azure/monitor-query'; // Conceptual stand-in; the package's real export is LogsQueryClient
import { DefaultAzureCredential } from '@azure/identity';
// This is a simplified representation of fetching data from Azure Monitor
async function fetchPipelineMetrics(client: AzureMonitorLogAnalyticsClient) {
const workspaceId = 'YOUR_LOG_ANALYTICS_WORKSPACE_ID';
  // The Python function logs a JSON document as the trace message, so we
  // must parse it before filtering on its embedded message field.
  const query = `
    traces
    | extend details = parse_json(message)
    | where details.message == "VaR calculation successful"
    | project timestamp, duration = toreal(details.durationMs)
    | order by timestamp desc
    | limit 100
  `;
const response = await client.queryWorkspace(workspaceId, query, {
duration: 'PT1H' // Last 1 hour
});
// Process response into a format for the chart
return response.tables[0].rows.map(row => ({ x: row[0], y: row[1] }));
}
export function PipelineLatencyMonitor() {
const [latencyData, setLatencyData] = useState([]);
const [isLoading, setIsLoading] = useState(true);
useEffect(() => {
// In a real app, you would use a more robust auth provider.
const credential = new DefaultAzureCredential();
const client = new AzureMonitorLogAnalyticsClient(credential);
    // Fetch once immediately, then refresh every 5 seconds.
    const refresh = () =>
      fetchPipelineMetrics(client)
        .then(data => setLatencyData(data))
        .catch(console.error)
        .finally(() => setIsLoading(false));
    refresh();
    const interval = setInterval(refresh, 5000);
return () => clearInterval(interval);
}, []);
// The fast iteration on this component—adjusting the query, changing chart types,
// adding new data sources—is what Turbopack significantly accelerated for the SRE team.
if (isLoading) return <div>Loading latency metrics...</div>;
const chartData = {
datasets: [{
label: 'SciPy Function Duration (ms)',
data: latencyData,
borderColor: 'rgb(75, 192, 192)',
}]
};
return <Chart type='line' data={chartData} />;
}
The choice of Turbopack had no direct impact on the production pipeline’s performance. Its value was in developer velocity for the tooling that provided critical operational insight.
Lingering Issues and Future Optimizations
The final system meets the sub-2-second performance target, but it is not without its architectural scars. The multi-language environment increases the cognitive load for developers and the operational burden for the SRE team. A single transaction now spans three different technology stacks, making end-to-end tracing and debugging complex, even with good observability.
The primary lingering issue is the synchronous HTTP call from the Java service to the Python function. While non-blocking on the Java side, it’s still a request/response pattern that is susceptible to transient network failures. A more resilient future iteration would decouple these services using Azure Service Bus. The Java service would publish a “calculation requested” event with the data, and the Python function would be triggered by this event. Upon completion, the Python function would publish a “calculation complete” event, which a SignalR service could then use to push the result directly back to the Swift client, eliminating polling and further improving perceived performance. This, however, represents a significant increase in complexity—a trade-off to be considered as the system’s criticality grows.
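As a rough sketch of that direction, the Python side would swap its HTTP trigger for a Service Bus trigger; the queue bindings and the run_simulation helper (a hypothetical refactor of the SciPy core shown earlier) are illustrative, not a committed design:

import json
import azure.functions as func

# Triggered by a "calculation requested" message; publishes the result to an
# output queue binding (declared in function.json) instead of returning HTTP.
def main(msg: func.ServiceBusMessage, resultMessage: func.Out[str]) -> None:
    payload = json.loads(msg.get_body().decode('utf-8'))
    var_result = run_simulation(payload['returns'])  # hypothetical refactor of the core logic
    resultMessage.set(json.dumps({
        'portfolioId': payload['portfolioId'],
        'valueAtRisk': var_result
    }))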