Monday, June 16, 2025

C/C++ header mismatch bug

I encountered a problem where a field in a global struct (myStruct) held a valid value before entering a function foo, but turned into garbage after entering it. When I consulted AI tools, they suggested that foo might be allocating very large local arrays, causing a stack overflow that could corrupt the global structure. Another possibility was an out-of-bounds write elsewhere in the code.

After a week of debugging and trying various solutions—such as increasing the thread's stack size—I discovered the root cause: The function foo was defined in a C library with multiple versions. Each version resided in a different folder but had the same file names. Which folder was used depended on a #define. I was including the header from one version of the library, but linking against the implementation from another. If the struct definitions had matched, this wouldn’t have caused an issue, but they differed—evident from the differing sizeof(myStruct). As a result, myStruct was interpreted using the wrong layout, leading to corrupted values from an incorrect memory region.
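A cheap guard against this failure mode is to have the library export the struct size it was compiled with and compare it against what the application's header says. The sketch below is hypothetical (the field layout and the helper name mylib_struct_size are invented), but the idea is to fail fast instead of silently reading garbage:

// In the library's own source file, compiled against ITS copy of the header:
//   extern "C" size_t mylib_struct_size(void) { return sizeof(myStruct); }

// In the application, compiled against the header it actually includes:
#include <cassert>
#include <cstddef>

struct myStruct { int id; double value; };  // the layout this translation unit sees

extern "C" size_t mylib_struct_size(void);  // implemented inside the library

void check_header_matches_library() {
  // Fires at startup if the linked library was built with a different layout
  assert(mylib_struct_size() == sizeof(myStruct));
}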

Sunday, June 15, 2025

C++ pointer bug

This C++ code has a significant bug that will cause undefined behavior:
#include <iostream>
class A {
  public:
    int val;
};
void reset(A *p_a) {
  if (p_a != NULL) {
      delete p_a;
  }
  p_a = new A();
}
int main() {
  A *p_a = new A();
  p_a->val = 5;
  std::cout << "Before reset, p_a->val:" << p_a->val << "\n";
  reset(p_a);
  std::cout << "After reset, p_a->val:" << p_a->val << "\n";
  return 0;
}

The reset function receives a copy of the pointer p_a, not a reference to it. When you modify p_a inside the function (with p_a = new A()), you're only changing the local copy - the original pointer in main() remains unchanged. What actually happens:

  1. p_a in main() points to an A object with val = 5 
  2. reset() receives a copy of this pointer 
  3. reset() deletes the original object (memory is freed) 
  4. reset() creates a new object, but assigns it only to the local copy 
  5. The original p_a in main() still points to the deleted memory 
  6. Accessing p_a->val after reset() is undefined behavior (accessing freed memory) 
The Fix: Pass the pointer by reference using a pointer-to-pointer or reference-to-pointer:
// Reference to pointer
void reset(A *&p_a) {
  if (p_a != nullptr) {
    delete p_a;
  }
  p_a = new A();
}
// Call with: reset(p_a);
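
The pointer-to-pointer variant mentioned above works the same way; the caller passes the address of its pointer:

// Pointer to pointer
void reset(A **pp_a) {
  if (*pp_a != nullptr) {
    delete *pp_a;
  }
  *pp_a = new A();
}
// Call with: reset(&p_a);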

An even better fix is to use a smart pointer, which removes the need for a reset function entirely:

#include <memory>
auto p_a = std::make_unique<A>();
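
Resetting then becomes a plain reassignment; the old object is destroyed automatically, so there is no raw delete to get wrong:

p_a = std::make_unique<A>();  // frees the old A and takes ownership of the new one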

You can detect such problems by enabling AddressSanitizer (ASAN) in Visual Studio:

  1. Right-click your project → Properties
  2. Go to Configuration Properties → C/C++ → General
  3. Set Enable Address Sanitizer to Yes (/fsanitize=address)
  4. Go to Configuration Properties → C/C++ → Optimization
  5. Set Optimization to Disabled (/Od) for better debugging
  6. Set Whole Program Optimization to No
  7. Go to Configuration Properties → C/C++ → General
  8. Set Debug Information Format to Program Database (/Zi); note that Edit & Continue (/ZI) is not compatible with ASan
In Eclipse CDT:

  1. Open your C/C++ project in Eclipse CDT
  2. Right-click project → Properties
  3. Navigate to C/C++ Build → Settings
  4. Under Tool Settings:
    1. GCC C++ Compiler → Miscellaneous
    2. GCC C Compiler → Miscellaneous
    3. Add to "Other flags": -fsanitize=address -g -O1
  5. Project Properties → C/C++ Build → Settings
  6. GCC C++ Linker → Miscellaneous
  7. Add to "Linker flags": -fsanitize=address
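
If you build from the command line instead of an IDE, the same flags apply directly (main.cpp here is a placeholder for your source file):

g++ -fsanitize=address -g -O1 main.cpp -o main

Run the resulting binary normally; when the freed object is accessed, ASan prints a use-after-free report with the access, free, and allocation stack traces, then aborts.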

Tuesday, May 27, 2025

Fuzzy Logic and Quake III Bots

Fuzzy logic is often used in decision-making systems where a detailed mathematical model of the system is unavailable or impractical. Instead of relying on equations, fuzzy logic encodes expert intuition into human-readable rules. These rules allow systems to make decisions based on approximate or linguistic input values, such as “low health” or “enemy nearby.”

For simple systems — say, with just one input and one output — fuzzy logic may be overkill. In those cases, a 1D interpolation (similar to proportional navigation) is often enough to generate smooth behavior transitions. But as systems grow more complex, fuzzy logic scales better than maintaining large interpolation grids or rigid condition trees.
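
As an illustration, a single-input mapping needs nothing more than a small lookup table with linear interpolation between breakpoints (the numbers below are invented):

// Maps health in [0, 100] to an aggression level in [0, 1]
// via piecewise-linear interpolation over a tiny table.
double aggression(double health) {
  const double x[] = {0.0, 25.0, 50.0, 100.0};  // health breakpoints
  const double y[] = {0.05, 0.20, 0.60, 1.00};  // aggression at each breakpoint
  if (health <= x[0]) return y[0];
  for (int i = 1; i < 4; ++i) {
    if (health <= x[i]) {
      double t = (health - x[i - 1]) / (x[i] - x[i - 1]);
      return y[i - 1] + t * (y[i] - y[i - 1]);
    }
  }
  return y[3];
}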

While neural networks have become dominant in many domains, fuzzy logic still offers distinct advantages, especially in embedded or control-focused systems. Fuzzy logic requires structured human insight, while neural networks thrive on raw data and pattern discovery, so for complex or poorly understood systems, writing fuzzy rules by hand becomes impractical and neural networks are the better fit. Advantages of fuzzy logic over neural networks:

  1. Interpretability: Fuzzy rules are readable and understandable by developers and domain experts.
  2. Minimal training: Rules encode prior knowledge, reducing or eliminating the need for extensive data-driven training.
  3. Lightweight tuning: At most, fuzzy systems may require optimizing rule weights — a much simpler process than full network training.

One of the most interesting uses of fuzzy logic in gaming came from Quake III Arena. The game's bots used fuzzy logic to evaluate possible behaviors, such as attacking, searching for health, searching for a better weapon, or retreating. Each action was assigned a desirability score based on fuzzy evaluations of the current game state (e.g., health, distance to enemy, ammo). At each tick, the bot chose the highest-scoring action.
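
The toy sketch below mimics that structure; the membership shapes, thresholds, and rules are invented for illustration and are not taken from the actual Quake III bot code:

#include <algorithm>
#include <cstdio>

// Simple falling membership function: 1 when x <= lo, 0 when x >= hi.
double low(double x, double lo, double hi) {
  return std::clamp((hi - x) / (hi - lo), 0.0, 1.0);
}

int main() {
  double health = 30.0, ammo = 5.0, enemy_dist = 12.0;

  double low_health = low(health, 20.0, 60.0);     // "health is low"
  double low_ammo   = low(ammo, 3.0, 15.0);        // "ammo is low"
  double enemy_near = low(enemy_dist, 5.0, 30.0);  // "enemy is near"

  // Each action gets a desirability score from fuzzy rule evaluations.
  double scores[] = {
    enemy_near * (1.0 - low_health) * (1.0 - low_ammo),  // attack
    enemy_near * std::max(low_health, low_ammo),         // retreat
    low_health * (1.0 - enemy_near)                      // search for health
  };
  const char *names[] = {"attack", "retreat", "search for health"};

  // At each tick, pick the highest-scoring action.
  int best = (int)(std::max_element(scores, scores + 3) - scores);
  printf("chosen action: %s (score %.2f)\n", names[best], scores[best]);
  return 0;
}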

To tune the bot parameters, the developers had bots play against each other and applied genetic algorithms to evolve the best-performing rule sets. Of course, they could not make the bots perfect because then a human player would never be able to win.

Monday, May 12, 2025

CPU, Analog, FPGA, or ASIC?

Algorithms can be implemented across a wide spectrum of hardware, each with its own trade-offs in speed, power, flexibility, cost, and scalability. Let’s compare the four main approaches:

1. Software on General-Purpose CPUs

Pros:

  • Easy to develop and debug: Rich toolchains, IDEs, and profiling tools.
  • Highly flexible: Reprogram anytime; modify algorithms at will.
  • Low development cost: No custom hardware needed; ready to run on PCs, servers, or microcontrollers.
  • Ecosystem and libraries: Access to optimized math libraries (e.g., FFTW, NumPy, BLAS).

Cons:

  • Latency and real-time constraints: OS overhead and unpredictable timing make hard real-time difficult. Soft real-time is achievable.
  • Performance limitations: Limited parallelism compared to hardware solutions.
  • High power consumption per operation: Especially inefficient for repetitive, simple tasks.

Ideal for general-purpose applications.

2. Analog Circuits

Pros:

  • Ultra-low latency: Signal is processed in real-time with no sampling delay.
  • Potentially high throughput: Continuous operation with no clock constraints.
  • Minimal power: No digital switching, especially useful in low-power sensors or RF front-ends.
  • No need for ADC/DAC: Processes raw analog signals directly.

Cons:

  • Limited precision: Susceptible to noise, drift, and component tolerances.
  • Hard to scale: Each additional function requires more physical components.
  • Difficult to tune or reconfigure: Redesign often requires physical changes.
  • No programmability: Once built, behavior is fixed or only marginally tunable.

Ideal for real-time sensing, analog filters, RF circuits, ultra-low power embedded front-ends.

3. Field-Programmable Gate Arrays (FPGAs)

Pros:

  • High parallelism: True concurrent execution of multiple operations.
  • Low deterministic latency: Ideal for real-time pipelines.
  • Reconfigurable hardware: Algorithms can be updated post-deployment.
  • Power-efficient: Much better performance-per-watt than CPUs for many tasks.

Cons:

  • Steep learning curve: Requires HDL knowledge (VHDL/Verilog) or high-level synthesis.
  • Toolchain complexity: Longer compile/synthesis times, debugging can be difficult.
  • Moderate development cost: More expensive than CPUs in small volumes.
  • Not optimal for floating-point math: Often better with fixed-point arithmetic (see the sketch after this list).

Ideal for real-time video/audio processing, signal processing, robotics, hardware prototyping.
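
The kind of fixed-point arithmetic that maps well onto FPGA fabric is easy to prototype in software first; here is a minimal Q16.16 sketch (the format choice is illustrative):

#include <cstdint>
#include <cstdio>

using q16_16 = int32_t;  // 16 integer bits, 16 fractional bits

constexpr q16_16 to_fixed(double x) { return (q16_16)(x * 65536.0); }
constexpr double to_double(q16_16 x) { return x / 65536.0; }

// Widen to 64 bits, multiply, then shift back to renormalize.
q16_16 mul(q16_16 a, q16_16 b) {
  return (q16_16)(((int64_t)a * b) >> 16);
}

int main() {
  q16_16 a = to_fixed(1.5), b = to_fixed(2.25);
  printf("%f\n", to_double(mul(a, b)));  // prints 3.375000
  return 0;
}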

4. Custom Chips (ASICs)

Pros:

  • Maximum performance: Custom datapaths, memory layouts, and logic yield unmatched throughput.
  • Lowest power consumption: Fully optimized for the task at hand.
  • Smallest footprint: No unnecessary hardware or software overhead.
  • Production cost scales well: Extremely cheap per unit at high volumes.

Cons:

  • Astronomically high NRE (non-recurring engineering) cost: Millions of dollars just to reach first silicon.
  • Long time-to-market: Can take 6–24 months from design to tapeout.
  • Zero flexibility: Bugs in logic mean hardware re-spins.
  • High risk: A single design flaw can cost months of work and millions in losses.

Ideal for high-volume commercial products (e.g., smartphones, wireless chips), aerospace, medical devices, deep learning accelerators. Example: u-blox GNSS receiver chips.

Monday, April 14, 2025

First Step in Safety-Critical Software Development

The first action when developing safety-critical software is to add automatic commit checks that reject any commit whose build produces compiler warnings. Enable the highest warning level and treat all warnings as errors.

Common Visual C++ warnings relevant to safety-critical systems:

  1. C26451: Arithmetic overflow: Using operator 'op' on a value that may overflow the result (comes from Code Analysis with /analyze). Example: uint64_t c = a + b where a and b are of type uint32_t
  2. C4244: Conversion from ‘type1’ to ‘type2’, possible loss of data. Example: int → char or double → float
  3. C4018: Signed/unsigned mismatch. Can cause logic bugs and unsafe comparisons.
  4. C4701: Potentially uninitialized variable
  5. C4715: Not all control paths return a value
  6. C4013: 'function' undefined; assuming extern returning int
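To see what these catch in practice, here is a small snippet (values are hypothetical) that trips two of the warnings above when compiled with /W4 /WX, or with /analyze for C26451:

#include <cstdint>

void example(uint32_t a, uint32_t b) {
  uint64_t c = a + b;  // C26451: the addition happens in 32 bits and can overflow
                       // before being widened; fix: static_cast<uint64_t>(a) + b
  char ch = a;         // C4244: conversion from 'uint32_t' to 'char', possible loss of data
  (void)c; (void)ch;   // silence unused-variable warnings in this demo
}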
Of course, there is much more that needs to be done; in this blog post, I just wanted to focus on the first step from the perspective of a tech lead.

Saturday, April 12, 2025

UDP vs TCP

When working with real-time systems, it's important to understand how data is sent and received over UDP and TCP. The main reason to use UDP is that it can be up to 10x faster than TCP, but you have to be aware of its limitations.

UDP (User Datagram Protocol) sends data in discrete packets (called datagrams). Each call to sendto() on the sender side corresponds to exactly one recvfrom() on the receiver side.

  • No connection setup or teardown.
  • No built-in guarantees about delivery, order, or duplication (duplicates can appear when some layer of the network retransmits a packet it wrongly believed was lost).
  • If you call sock.recvfrom(4) and the incoming packet is 9 bytes, you get the first 4 bytes—and the rest are discarded, i.e. you cannot get them with another receive call.
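
A minimal POSIX demonstration of that last point, assuming Linux or macOS, loopback, and a hypothetical port 9999 (error handling omitted for brevity):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

int main() {
  int rx = socket(AF_INET, SOCK_DGRAM, 0);
  int tx = socket(AF_INET, SOCK_DGRAM, 0);

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(9999);
  inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
  bind(rx, (sockaddr *)&addr, sizeof(addr));

  sendto(tx, "Message 1", 9, 0, (sockaddr *)&addr, sizeof(addr));  // one 9-byte datagram

  char buf[4];
  // Only 4 bytes are delivered; the remaining 5 bytes of the datagram
  // are discarded, not queued for the next recvfrom call.
  ssize_t n = recvfrom(rx, buf, sizeof(buf), 0, nullptr, nullptr);
  printf("got %zd bytes: %.4s\n", n, buf);

  close(tx);
  close(rx);
  return 0;
}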

Rough Performance Advantage:

  • In short, bursty communications or real-time streaming, UDP can be 2x to 10x faster than TCP due to its lack of handshake, retransmission, and flow control mechanisms.
  • In my test case, UDP returned a DNS response in 42 ms, while TCP took 382 ms — about 9x faster.

TCP (Transmission Control Protocol) provides a continuous stream of bytes. It breaks your data into segments under the hood, but applications don’t see those packet boundaries.

  • Reliable: Guarantees delivery, order, and no duplication (if a duplicate packet is received, TCP automatically discards it).
  • Stream-oriented: You send bytes, not messages.
  • If you send b"Message 1", and call sock.recv(4), you might receive b"Mess", and then get the rest (b"age 1") in another call.
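This is why TCP readers loop until a complete message has arrived. A minimal sketch, assuming sock is an already-connected TCP socket and a fixed 9-byte message:

char buf[9];
size_t got = 0;
while (got < sizeof(buf)) {
  ssize_t n = recv(sock, buf + got, sizeof(buf) - got, 0);
  if (n <= 0) break;  // connection closed or error
  got += (size_t)n;
}
// buf now holds the full "Message 1" payload (if no error occurred)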
If the message might get corrupted during its creation rather than during transmission over the network, and you want to add a CRC to detect that corruption at the application layer, UDP can be the better choice: it delivers the entire message, including the CRC, in a single recvfrom call.

Wednesday, February 12, 2025

Longitude convention and octal values

Recently, I encountered a bug in code that handled longitude values, where 33 degrees longitude is typically written as 033. Upon investigation, I discovered that Java was calculating 033 + 1 as 28 instead of 34. This happened because, when we write longitude values like "033" in geographic notation, we mean decimal 33 degrees. However, if we write it directly in Java, C++, or similar languages, it is interpreted as octal 33 (which equals decimal 27) due to the leading zero!
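
The trap is easy to reproduce directly:

#include <iostream>

int main() {
  int longitude = 033;                 // leading zero: octal literal, value 27
  std::cout << longitude + 1 << "\n";  // prints 28, not 34
  int correct = 33;                    // decimal 33, what the geographic "033" means
  std::cout << correct + 1 << "\n";    // prints 34
  return 0;
}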

This is a classic example of why it's important to be careful when working with domain-specific number formats - what makes sense in geographic notation can have unexpected behavior when directly translated to programming language syntax!