XycLOPs III

April 27, 2026 · 7 min · Sarvesh Karunakaran

A desktop optimization tool that automates analog circuit tuning by coupling the open-source Sandia Xyce simulator with parallel numerical optimization.

Built for Sandia National Laboratories as part of CSCE 483.

Overview

Analog circuit tuning is one of those engineering tasks that sounds straightforward until you are actually doing it. An engineer needs to find the right values for a handful of resistors, capacitors, and inductors so that the circuit behaves the way it should. The traditional workflow is manual: adjust some component values, run a simulation, compare the output to the target, and repeat. On a simple circuit this is tedious. On a complex one it can take hours or even days.

XycLOPs III is the third generation of an internal optimization tool developed for Sandia National Laboratories that automates this process. Engineers upload a SPICE-compatible netlist, select the components they want to tune, define a target output curve, and let the optimizer do the rest. The tool handles the simulation loop automatically, adjusting component values each iteration until the simulated output matches the target as closely as possible.

The goal was not to replace the engineer's judgment. It was to eliminate the part of the job that was just waiting.

Our team of five inherited XycLOPs II from a prior capstone group and spent the semester transforming it into a faster, more scalable, and more usable platform. The core Xyce simulator was not touched. Everything around it was rebuilt.

The Problem

The previous version (XycLOPs II) worked, but it had a fundamental performance bottleneck. At the heart of any gradient-based optimizer is the Jacobian matrix $J(\mathbf{x})$, where each entry describes how a small change in component parameter $x_j$ shifts the $i$-th point of the simulated output. Formally:

$$J(\mathbf{x}) \in \mathbb{R}^{m \times n},\quad J_{ij} = \frac{\partial r_i}{\partial x_j}$$

where $\mathbf{r}(\mathbf{x}) \in \mathbb{R}^m$ is the residual vector, the pointwise difference between the simulated output and the target curve across $m$ sample points. Computing each column of $J$ requires running a separate perturbed simulation, and in XycLOPs II those simulations ran one at a time.
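To make the residual concrete, here is a minimal Python sketch. It does not call Xyce: a closed-form RC step response stands in for the simulator, and the names `simulate_rc`, `residual`, and `sum_of_squares` are hypothetical, chosen only for illustration.

```python
import math

def simulate_rc(x, times):
    """Stand-in for a simulator run: unit-step response of an RC low-pass.

    x = [R, C]. This closed form is an assumption that replaces a real
    Xyce transient analysis so the example stays self-contained.
    """
    r_ohms, c_farads = x
    tau = r_ohms * c_farads
    return [1.0 - math.exp(-t / tau) for t in times]

def residual(x, times, target):
    """r_i = simulated_i - target_i at each of the m sample points."""
    simulated = simulate_rc(x, times)
    return [s - y for s, y in zip(simulated, target)]

def sum_of_squares(r):
    """Scalar misfit: the squared Euclidean norm of the residual."""
    return sum(v * v for v in r)

times = [i * 1e-4 for i in range(1, 11)]      # m = 10 sample points
target = simulate_rc([1000.0, 1e-6], times)   # curve produced by R = 1 kOhm, C = 1 uF

print(sum_of_squares(residual([1000.0, 1e-6], times, target)))        # 0.0 at the true values
print(sum_of_squares(residual([2000.0, 1e-6], times, target)) > 0.0)  # True for a wrong R
```

Each column of $J$ then measures how this residual list changes when one entry of `x` is nudged, which is why every column costs at least one extra simulation.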

On a circuit with ten tunable components, that meant ten sequential simulation runs just to compute a single Jacobian before the optimizer could take one step. Our sponsor, Matthew McDonough, had circuits that were taking hours to optimize. The hardware was sitting mostly idle while the software waited for each simulation to finish before starting the next one.

Building It

Trust Region Filter (TRF)

We parallelized the Jacobian computation. Instead of running each perturbed simulation sequentially, we built a parallel finite-difference Jacobian module that submits all columns to a thread pool simultaneously. The central difference approximation for each column is:

$$\frac{\partial \mathbf{r}}{\partial x_j} \approx \frac{\mathbf{r}(\mathbf{x} + h\mathbf{e}_j) - \mathbf{r}(\mathbf{x} - h\mathbf{e}_j)}{2h}$$

where $h$ is a step size scaled relative to the magnitude of $x_j$:

$$h = \varepsilon^{1/3} \cdot \max(1, |x_j|)$$

with $\varepsilon$ being machine epsilon. Near parameter bounds where a central stencil would step outside the feasible region, the module falls back to forward or backward three-point formulas to stay within the box constraints. Each simulation writes its output to its own isolated directory so concurrent runs never interfere with each other.
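The scheme described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual module: `fd_column` and `parallel_jacobian` are hypothetical names, `residual` is any Python callable returning a list, and the per-run output directories are left out. In the real tool each call would launch a Xyce run, which is the situation where a thread pool pays off because the threads mostly wait on an external process.

```python
import sys
from concurrent.futures import ThreadPoolExecutor

EPS = sys.float_info.epsilon  # machine epsilon for double precision

def fd_column(residual, x, j, lb, ub):
    """Approximate dr/dx_j with a central stencil, or a one-sided
    three-point stencil when the central one would leave [lb_j, ub_j]."""
    h = EPS ** (1.0 / 3.0) * max(1.0, abs(x[j]))

    def shifted(k):
        y = list(x)
        y[j] = x[j] + k * h
        return residual(y)

    if x[j] - h >= lb[j] and x[j] + h <= ub[j]:
        plus, minus = shifted(1), shifted(-1)
        return [(p - m) / (2 * h) for p, m in zip(plus, minus)]
    if x[j] + 2 * h <= ub[j]:
        # Forward three-point formula: uses x, x + h, x + 2h.
        r0, r1, r2 = shifted(0), shifted(1), shifted(2)
        return [(-3 * a + 4 * b - c) / (2 * h) for a, b, c in zip(r0, r1, r2)]
    # Backward three-point formula: uses x, x - h, x - 2h.
    # Assumes the box is at least 2h wide in this coordinate.
    r0, r1, r2 = shifted(0), shifted(-1), shifted(-2)
    return [(3 * a - 4 * b + c) / (2 * h) for a, b, c in zip(r0, r1, r2)]

def parallel_jacobian(residual, x, lb, ub, workers=4):
    """Compute every column concurrently and return J as m rows of n entries."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        columns = list(pool.map(lambda j: fd_column(residual, x, j, lb, ub),
                                range(len(x))))
    return [list(row) for row in zip(*columns)]  # columns[j][i] -> J[i][j]

def toy_residual(x):
    # Exact Jacobian is [[2*x0, 1], [0, 3]].
    return [x[0] ** 2 + x[1], 3.0 * x[1]]

# x1 sits on its lower bound, so its column uses the forward stencil.
J = parallel_jacobian(toy_residual, [2.0, 0.0], [0.0, 0.0], [10.0, 10.0])
print(J)
```

In this sketch each pool task runs its own two or three evaluations one after another; splitting those into separate tasks would expose more parallelism at the cost of extra bookkeeping.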

On a buck converter test circuit, this alone reduced runtime from about 12.75 seconds to 2.78 seconds, a speedup of roughly 4.6×. The formal optimization problem being solved is nonlinear least squares:

$$\min_{\mathbf{x}} \|\mathbf{r}(\mathbf{x})\|^2 = \sum_i r_i(\mathbf{x})^2$$

subject to box constraints $\mathbf{lb} \le \mathbf{x} \le \mathbf{ub}$ on each component value. A trust region filter method handles this by building a local linear model of the residual at each iterate:

$$\mathbf{r}(\mathbf{x} + \boldsymbol{\delta}) \approx \mathbf{r}(\mathbf{x}) + J(\mathbf{x})\,\boldsymbol{\delta}$$

and choosing a step $\boldsymbol{\delta}$ that reduces $\|\mathbf{r}\|$ while staying within a trust region radius and satisfying the filter's acceptance criteria for feasibility.
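For reference, one common way to write the step subproblem implied by this linear model is shown below. The radius symbol $\Delta$ and the choice of norm are assumptions made for illustration; the exact subproblem and filter rules used by the tool are not spelled out here.

```latex
\min_{\boldsymbol{\delta}} \;
\|\mathbf{r}(\mathbf{x}) + J(\mathbf{x})\,\boldsymbol{\delta}\|^2
\quad \text{subject to} \quad
\|\boldsymbol{\delta}\| \le \Delta, \qquad
\mathbf{lb} \le \mathbf{x} + \boldsymbol{\delta} \le \mathbf{ub}

% Expanding the model objective shows where the Jacobian enters:
\|\mathbf{r} + J\boldsymbol{\delta}\|^2
  = \|\mathbf{r}\|^2
  + 2\,\boldsymbol{\delta}^{\mathsf{T}} J^{\mathsf{T}} \mathbf{r}
  + \boldsymbol{\delta}^{\mathsf{T}} J^{\mathsf{T}} J\,\boldsymbol{\delta}
```

The linear term carries $2 J^{\mathsf{T}}\mathbf{r}$, the gradient of $\|\mathbf{r}\|^2$, so an inaccurate Jacobian directly degrades the step direction.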

CMA-ES

We added CMA-ES, short for Covariance Matrix Adaptation Evolution Strategy. Rather than computing a Jacobian at all, CMA-ES evaluates a whole population of candidate solutions each generation, all in parallel. It minimizes the same scalar objective:

$$f(\mathbf{x}) = \|\mathbf{r}(\mathbf{x})\|^2 = \mathbf{r}(\mathbf{x})^{\mathsf{T}}\mathbf{r}(\mathbf{x})$$

but explores the space by maintaining a multivariate Gaussian distribution over candidate solutions and adapting its covariance matrix based on which directions produced improvements. The population size each generation follows:

$$\lambda = \max\left(4 + \lfloor 3 \ln n \rfloor,\, W\right)$$

where $n$ is the number of tunable parameters and $W$ is the worker cap, ensuring every generation can fully saturate the available thread pool. Per-parameter initial standard deviations are set to:

$$\sigma_j = 0.25 \cdot (\mathrm{ub}_j - \mathrm{lb}_j)$$

so components on very different scales, ohms versus farads for instance, get appropriately sized search ranges rather than a single global step width.
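Both formulas are simple enough to check by hand. The sketch below evaluates them directly; the function names `population_size` and `initial_sigmas` and the example bounds are hypothetical, not taken from the project's code.

```python
import math

def population_size(n_params, worker_cap):
    """lambda = max(4 + floor(3 ln n), W)."""
    default = 4 + math.floor(3 * math.log(n_params))
    return max(default, worker_cap)

def initial_sigmas(lb, ub):
    """sigma_j = 0.25 * (ub_j - lb_j), one value per tunable component."""
    return [0.25 * (hi - lo) for lo, hi in zip(lb, ub)]

print(population_size(10, 4))    # 10: the default size already exceeds the worker cap
print(population_size(10, 16))   # 16: raised so all 16 workers have a candidate to evaluate

# Illustrative bounds: a resistor in ohms and a capacitor in farads.
sigmas = initial_sigmas([100.0, 1e-9], [10000.0, 1e-6])
print(sigmas[0])                 # 2475.0 for the resistor
```

The capacitor's standard deviation comes out roughly ten orders of magnitude smaller than the resistor's, which is the point of scaling each one by its own bound width.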

Figure: XycLOPs III dashboard showing CMA-ES population-based optimization progress and convergence.

The codebase was also refactored significantly. What had been a loosely connected set of scripts became a modular architecture with a clean separation between the GUI, the optimization engine, the simulation layer, and output processing. We added a project save and load system, better result visualization, enhanced logging, and packaged the whole thing as a Windows executable so Sandia could use it without any Python setup.

The Two Algorithms

Choosing between TRF and CMA-ES depends on the circuit and the machine. Neither method guarantees a global optimum on arbitrary netlists.

Trust Region Filter (TRF)

TRF is a local optimizer that works well when the starting netlist is close to a good solution and the landscape is smooth. Each iteration builds on a parallel finite-difference Jacobian and takes constrained trust-region steps toward lower residual norm.

CMA-ES

CMA-ES has no gradient requirement and fully utilizes available cores regardless of parameter count, making it better suited for noisy landscapes or when a broader global search is needed. Each generation evaluates an entire population of candidates in parallel rather than perturbing one parameter at a time.
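The parallel structure of a generation can be sketched as follows. This covers only the sample-and-evaluate half of one generation with independent per-parameter Gaussians; the covariance and step-size updates that make CMA-ES what it is are deliberately omitted. The names `sample_population` and `evaluate_generation` are hypothetical, and clipping candidates to the bounds is an assumption made to keep the example short.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def sample_population(mean, sigmas, lb, ub, size, rng):
    """Draw `size` candidates around `mean`, clipped into the box [lb, ub]."""
    population = []
    for _ in range(size):
        candidate = [min(max(rng.gauss(m, s), lo), hi)
                     for m, s, lo, hi in zip(mean, sigmas, lb, ub)]
        population.append(candidate)
    return population

def evaluate_generation(objective, population, workers=4):
    """Score every candidate concurrently; return (score, candidate) pairs, best first."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(objective, population))
    return sorted(zip(scores, population), key=lambda pair: pair[0])

def toy_objective(x):
    # Stand-in for ||r(x)||^2; the real objective would run one simulation.
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2

rng = random.Random(0)  # fixed seed so the run is repeatable
population = sample_population([0.0, 0.0], [2.5, 2.5], [-5.0, -5.0], [5.0, 5.0], 8, rng)
ranked = evaluate_generation(toy_objective, population)
best_score, best_candidate = ranked[0]
print(best_score, best_candidate)
```

Unlike the Jacobian case, the number of concurrent evaluations here is set by the population size rather than by the number of tunable parameters.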

Convergence in the dashboard is tracked using an RMS-based similarity metric:

$$\mathrm{Convergence} = \max\left(0,\; 100 \cdot \left(1 - \frac{\mathrm{RMS}_{\mathrm{error}}}{\mathrm{RMS}_{\mathrm{target}}}\right)\right)$$

where $\mathrm{RMS}_{\mathrm{error}} = \sqrt{\mathrm{mean}(e_i^2)}$ over the pointwise residuals and $\mathrm{RMS}_{\mathrm{target}} = \sqrt{\mathrm{mean}(y_{\mathrm{target}}^2)}$. A score of 100 means perfect agreement with the target curve.
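A direct transcription of this metric into Python might look like the sketch below. The function names are hypothetical, and raising an error for an all-zero target is an assumption; how the dashboard handles that case is not stated.

```python
import math

def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def convergence_score(simulated, target):
    """100 * (1 - RMS_error / RMS_target), clamped below at 0."""
    rms_target = rms(target)
    if rms_target == 0.0:
        # Assumption: the score is treated as undefined for a zero target.
        raise ValueError("target curve is identically zero")
    errors = [s - y for s, y in zip(simulated, target)]
    return max(0.0, 100.0 * (1.0 - rms(errors) / rms_target))

target = [2.0, 4.0]
print(convergence_score(target, target))        # 100.0: perfect agreement
print(convergence_score([6.0, 12.0], target))   # 0.0: error exceeds the target's RMS, so the score clamps
```

Because of the clamp, any output whose RMS error is at least as large as the target's own RMS scores 0, so the metric cannot distinguish between bad fits and very bad fits.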

The tradeoffs between the two approaches are summarized below:

Aspect	TRF (Trust Region Filter)	CMA-ES
Gradient / Jacobian	Required (parallel finite-diff)	Not required
Search character	Local, trust-region steps	Adaptive covariance
Parallelization	Jacobian columns per iteration	Evaluations per generation
Best when	Starting point is close	Noisy landscape
Global optimum	Not guaranteed	Not guaranteed

What We Learned

Working on software that is already in active use by real engineers changes how you think about every decision. Performance improvements that would be nice to have on a personal project become critical when someone's workflow takes a day and you can cut it to an hour. Correctness matters more too, because the engineer on the other end is trusting the output to make real design decisions.

The project also taught us a lot about the gap between academic software and production software. Making it something a Sandia engineer could actually sit down and use without a Python environment, without reading the source code, and without hitting confusing edge cases, turned out to be as much work as the performance improvements themselves.

What's Next

The noise analysis plot currently displays voltage instead of the actual noise spectrum, a known bug that did not get resolved before the deadline. Validating CMA-ES performance on Sandia's Linux high-performance computing environment is the most important remaining task, since that is where the scaling advantages of the algorithm would actually be realized. From there the natural next step would be expanding the tool to support more circuit types and analysis modes beyond the transient, AC, and noise workflows it handles today.

If you want to learn more or view the code for this project, feel free to reach out via email.

Figure: XycLOPs III capstone project poster for Sandia National Laboratories.
