Hardware in the Loop Aerospace Simulation Development

Friday, July 11, 2025

Double-to-Int Conversion with Bit Shifting

We often need to pack a large numeric range into 32 bits. For instance, timestamps in microseconds over a 36-minute period exceed Integer.MAX_VALUE (2,147,483,647 µs is about 35.8 minutes). By discarding the least significant bits with a right shift, we can make the value fit, and later we can approximately recover it by shifting left by the same amount. However, each additional bit of shift doubles the quantization step, so accuracy drops. The goal is to find the smallest shift that still covers the range, since that minimizes the error. The following Java code investigates this:

Monday, June 16, 2025

C/C++ header mismatch bug

I encountered a problem where a field in a global struct (myStruct) held a valid value before entering a function foo, but turned into garbage after entering it. When I consulted AI tools, they suggested that foo might be allocating very large local arrays, causing a stack overflow that could corrupt the global structure. Another possibility was an out-of-bounds write elsewhere in the code.

After a week of debugging and trying various fixes, such as increasing the thread's stack size, I discovered the root cause: the function foo was defined in a C library that had multiple versions. Each version resided in a different folder, but the file names were the same, and a #define selected which folder was used. I was including the header from one version of the library but linking against the implementation from another. If the struct definitions had matched, this wouldn’t have caused a problem, but they differed, as the differing sizeof(myStruct) values showed. As a result, foo interpreted myStruct with the wrong layout and read fields from the wrong offsets, producing garbage values.

Sunday, June 15, 2025

C++ pointer bug

This C++ code has a significant bug that will cause undefined behavior:

#include <iostream>
class A {
  public:
    int val;
};
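void reset(A *p_a) {       // BUG: p_a is a copy of the caller's pointer
  if (p_a != nullptr) {
      delete p_a;          // frees the caller's object
  }
  p_a = new A();           // changes only the local copy; this object is leaked
}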
int main() {
  A *p_a = new A();
  p_a->val = 5;
  std::cout << "Before reset, p_a->val:" << p_a->val << "\n";
  reset(p_a);
  std::cout << "After reset, p_a->val:" << p_a->val << "\n";
  return 0;
}

The reset function receives a copy of the pointer p_a, not a reference to it. When you modify p_a inside the function (with p_a = new A()), you're only changing the local copy; the original pointer in main() remains unchanged. What actually happens:

p_a in main() points to an A object with val = 5
reset() receives a copy of this pointer
reset() deletes the original object (memory is freed)
reset() creates a new object but assigns it only to the local copy, so that object is leaked when reset() returns
The original p_a in main() still points to the deleted memory
Accessing p_a->val after reset() is undefined behavior (accessing freed memory)

The fix: pass the pointer by reference, using a reference-to-pointer or a pointer-to-pointer:

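// Reference to pointer; call with: reset(p_a);
void reset(A *&p_a) {
  delete p_a;     // deleting a null pointer is a no-op, so no check is needed
  p_a = new A();  // updates the caller's pointer
}

// Pointer to pointer; call with: reset(&p_a);
void reset(A **pp_a) {
  delete *pp_a;
  *pp_a = new A();
}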

An even better fix is to use smart pointers, which removes the need for a custom reset function:

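auto p_a = std::make_unique<A>();  // needs #include <memory>
p_a = std::make_unique<A>();       // "reset": the old object is freed automatically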

You can detect such problems by enabling AddressSanitizer (ASAN) in Visual Studio:

Right-click your project → Properties
Go to Configuration Properties → C/C++ → General
Set Enable Address Sanitizer to Yes (/fsanitize=address)
Set Debug Information Format to Program Database (/Zi); Program Database for Edit & Continue (/ZI) is not compatible with AddressSanitizer
Go to Configuration Properties → C/C++ → Optimization
Set Optimization to Disabled (/Od) for better debugging
Set Whole Program Optimization to No

In Eclipse CDT:

Open your C/C++ project in Eclipse CDT
Right-click project → Properties
Navigate to C/C++ Build → Settings → Tool Settings
Under GCC C++ Compiler → Miscellaneous (and GCC C Compiler → Miscellaneous for C files), add to "Other flags": -fsanitize=address -g -O1
Under GCC C++ Linker → Miscellaneous, add to "Linker flags": -fsanitize=address

Tuesday, May 27, 2025

Fuzzy Logic and Quake III Bots

Fuzzy logic is often used in decision-making systems where a detailed mathematical model of the system is unavailable or impractical. Instead of relying on equations, fuzzy logic encodes expert intuition into human-readable rules. These rules allow systems to make decisions based on approximate or linguistic input values, such as “low health” or “enemy nearby.”

For simple systems — say, with just one input and one output — fuzzy logic may be overkill. In those cases, a 1D interpolation (similar to proportional navigation) is often enough to generate smooth behavior transitions. But as systems grow more complex, fuzzy logic scales better than maintaining large interpolation grids or rigid condition trees.

While neural networks have become dominant in many domains, fuzzy logic still offers distinct advantages, especially in embedded or control-focused systems. Fuzzy logic requires structured human insight, while neural networks thrive on raw data and pattern discovery, so for complex or poorly understood systems, where writing fuzzy rules is impractical, a neural network is usually the better fit. Where expert knowledge is available, fuzzy logic has these advantages over neural networks:

Interpretability: Fuzzy rules are readable and understandable by developers and domain experts.
Minimal training: Rules encode prior knowledge, reducing or eliminating the need for extensive data-driven training.
Lightweight tuning: At most, fuzzy systems may require optimizing rule weights — a much simpler process than full network training.

One of the most interesting uses of fuzzy logic in gaming came from Quake III Arena. The bots in the game used fuzzy logic to evaluate possible behaviors — such as attack, search for health, search for a better weapon, retreat. Each action was assigned a desirability score based on fuzzy evaluations of current game state (e.g., health, distance to enemy, ammo). At each tick, the bot would choose the highest-scoring action.

To tune the bot parameters, the developers had bots play against each other and applied genetic algorithms to evolve the best-performing rule sets. Of course, they could not make the bots perfect because then a human player would never be able to win.

Monday, May 12, 2025

CPU, Analog, FPGA, or ASIC?

Algorithms can be implemented across a wide spectrum of hardware, each with its own trade-offs in speed, power, flexibility, cost, and scalability. Let’s compare the four main approaches:

1. Software on General-Purpose CPUs

Pros:

Easy to develop and debug: Rich toolchains, IDEs, and profiling tools.
Highly flexible: Reprogram anytime; modify algorithms at will.
Low development cost: No custom hardware needed; ready to run on PCs, servers, or microcontrollers.
Ecosystem and libraries: Access to optimized math libraries (e.g., FFTW, NumPy, BLAS).

Cons:

Latency and real-time constraints: OS overhead and unpredictable timing make hard real-time difficult. Soft real-time is achievable.
Performance limitations: Limited parallelism compared to hardware solutions.
High power consumption per operation: Especially inefficient for repetitive, simple tasks.

Ideal for general-purpose applications.

2. Analog Circuits

Pros:

Ultra-low latency: Signal is processed in real-time with no sampling delay.
Potentially high throughput: Continuous operation with no clock constraints.
Minimal power: No digital switching, especially useful in low-power sensors or RF front-ends.
No need for ADC/DAC: Processes raw analog signals directly.

Cons:

Limited precision: Susceptible to noise, drift, and component tolerances.
Hard to scale: Each additional function requires more physical components.
Difficult to tune or reconfigure: Redesign often requires physical changes.
No programmability: Once built, behavior is fixed or only marginally tunable.

Ideal for real-time sensing, analog filters, RF circuits, ultra-low power embedded front-ends.

3. Field-Programmable Gate Arrays (FPGAs)

Pros:

High parallelism: True concurrent execution of multiple operations.
Low deterministic latency: Ideal for real-time pipelines.
Reconfigurable hardware: Algorithms can be updated post-deployment.
Power-efficient: Much better performance-per-watt than CPUs for many tasks.

Cons:

Steep learning curve: Requires HDL knowledge (VHDL/Verilog) or high-level synthesis.
Toolchain complexity: Longer compile/synthesis times, debugging can be difficult.
Moderate development cost: More expensive than CPUs in small volumes.
Not optimal for floating-point math: Often better with fixed-point arithmetic.

Ideal for real-time video/audio processing, signal processing, robotics, hardware prototyping.

4. Custom Chips (ASICs)

Pros:

Maximum performance: Custom datapaths, memory layouts, and logic yield unmatched throughput.
Lowest power consumption: Fully optimized for the task at hand.
Smallest footprint: No unnecessary hardware or software overhead.
Production cost scales well: Extremely cheap per unit at high volumes.

Cons:

Astronomically high NRE (non-recurring engineering) cost: Millions of dollars just to reach first silicon.
Long time-to-market: Can take 6–24 months from design to tapeout.
Zero flexibility: Bugs in logic mean hardware re-spins.
High risk: A single design flaw can cost months of work and millions in losses.

Ideal for high-volume commercial products (e.g., smartphones, wireless chips), aerospace, medical devices, and deep learning accelerators. Example: u-blox GNSS receiver chips.

Monday, April 14, 2025

First Step in Safety-Critical Software Development

The first action when developing safety-critical software is to add automatic commit checks that build the code and reject the commit if any compiler warnings are present. Enable the highest practical warning level and treat all warnings as errors (in Visual C++, for example, /W4 together with /WX).

Common Visual C++ warnings relevant to safety-critical systems:

C26451: Arithmetic overflow: Using operator 'op' on a 4-byte value and then casting the result to an 8-byte value (reported by Code Analysis with /analyze). Example: uint64_t c = a + b where a and b are of type uint32_t; the addition is done in 32 bits and can wrap before the result is widened.
C4244: Conversion from 'type1' to 'type2', possible loss of data. Example: int → char or double → float
C4018: Signed/unsigned mismatch in a comparison. Can cause logic bugs: with int i = -1 and unsigned n = 1, i < n is false because i is converted to unsigned.
C4701: Potentially uninitialized variable
C4715: Not all control paths return a value
C4013: 'function' undefined; assuming extern returning int

Of course, there is much more that needs to be done; in this blog post, I just wanted to focus on the first step from the perspective of a tech lead.

Saturday, April 12, 2025

UDP vs TCP

When working with real-time systems, it's important to understand how data is sent and received over UDP and TCP. The main reason to use UDP is that it can be much faster than TCP (up to about 10x in some cases), but you have to be aware of its limitations.

UDP (User Datagram Protocol) sends data in discrete packets called datagrams, and message boundaries are preserved: each datagram sent with one sendto() call is received, if it arrives at all, by exactly one recvfrom() call on the receiver side.

No connection setup or teardown.
No built-in guarantees about delivery, order, or duplication: a datagram may be lost, arrive out of order, or occasionally arrive more than once.
If you call sock.recvfrom(4) and the incoming datagram is 9 bytes, you get the first 4 bytes and the remaining 5 are discarded; another receive call returns the next datagram, not the rest of this one.

Rough Performance Advantage:

For short, bursty communication or real-time streaming, UDP can be 2x to 10x faster than TCP because it has no connection handshake, retransmission, or flow control.
In my test case, UDP returned a DNS response in 42 ms, while TCP took 382 ms; UDP was about 9x faster.

TCP (Transmission Control Protocol) provides a continuous stream of bytes. It breaks your data into segments under the hood, but applications don’t see those packet boundaries.

Reliable: Guarantees delivery, order, and no duplication (if a duplicate packet is received, TCP automatically discards it).
Stream-oriented: You send bytes, not messages.
If you send b"Message 1" and call sock.recv(4), you might receive b"Mess" and then get the rest (b"age 1") in later calls. Conversely, the data from several send calls can arrive together in a single recv.

If the message might get corrupted while it is being created rather than during transmission over the network, and you want to add a CRC to detect corruption at the application layer, UDP might be better because it delivers the entire message, including the CRC, in a single recvfrom call. With TCP, the receiver first needs its own framing (for example, a length prefix) to know where each message and its CRC end.