Background and Motivation
I am interested in learning how to become more proficient at architecting FPGA systems with a Hard Processor System (HPS) and Programmable Logic (PL). I also want to become more proficient at writing testbenches and using assertions to verify that the design is working as intended.
In the late ’80s, I did some FPGA work writing in Verilog, and more recently, I worked on a large project using VHDL. I thought I would try to become proficient in SystemVerilog since I am exploring new things.
I acquired an Agilex 5 development board from Terasic, the DE25-NANO. I installed Quartus and Questa, using the free versions, and am learning the ins and outs of both.
I purchased some books so I can have reference material to learn from:
- Spear, Chris, and Greg Tumbush. SystemVerilog for Verification. 3rd ed., Springer, 2014.
- Vijayaraghavan, Srikanth, and Meyyappan Ramanathan. A Practical Guide for SystemVerilog Assertions. 2nd ed., Springer, 2014.
- Sutherland, Stuart, et al. SystemVerilog for Design. 2nd ed., Springer, 2006.
- Bergeron, Janick, et al. Verification Methodology Manual for SystemVerilog. Springer, 2005.
- Salemi, Ray. The UVM Primer. Boston Light Press, 2013.
I think I have what I need to begin this learning exercise to support my career growth.
The First Project: Building a Foundation
To begin, I started with a tutorial project. I soon began modifying the example code to match my style:
- Used Platform Designer to build the clock system and reset system
- Created an async-to-sync block to synchronize the push-button inputs
- Added timing constraints to prevent timing errors or warnings
- Added a testbench for simulating the elementary logic
- Added an assertion file to help check a few requirements
The assertion file proved its value immediately - it caught that I was releasing the reset in my testbench before the minimum number of clocks I had specified. This early validation win reinforced the importance of using assertions even for simple logic.
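A minimal sketch of what an async-to-sync block for a push button might look like is shown here. The module name, port names, and the synchronous active-low reset are assumptions for illustration, not the code from this project. A mechanical button would normally also need debouncing, which this sketch does not attempt.

```systemverilog
// Two-flop synchronizer for a slow asynchronous input such as a push button.
// Module and port names are hypothetical.
module sync_2ff #(
    parameter int STAGES = 2           // flip-flops in the chain, must be >= 2
) (
    input  logic clk,
    input  logic reset_n,              // synchronous, active-low reset (assumed)
    input  logic async_in,             // asynchronous input, e.g. a button pin
    output logic sync_out              // copy of async_in that is safe in the clk domain
);
    logic [STAGES-1:0] chain;

    always_ff @(posedge clk) begin
        if (!reset_n) chain <= '0;
        else          chain <= {chain[STAGES-2:0], async_in};  // shift toward the MSB
    end

    assign sync_out = chain[STAGES-1];
endmodule
```

The first flop may go metastable when the button changes near a clock edge; the second gives it a full clock period to settle before the rest of the logic sees the value.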
I then added a tick generator to create the timing controls I needed.
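A tick generator of this kind is usually just a counter that emits a one-cycle pulse when it wraps. The sketch below assumes a 5 MHz clock and a 10 Hz tick (the values that appear later in this post); the module name, ports, and parameters are hypothetical.

```systemverilog
// Divide-by-N tick generator: tick is high for one clk cycle per period.
module tick_gen #(
    parameter int CLK_HZ  = 5_000_000,  // assumed system clock frequency
    parameter int TICK_HZ = 10          // assumed tick rate
) (
    input  logic clk,
    input  logic reset_n,               // synchronous, active-low reset (assumed)
    output logic tick
);
    localparam int DIVISOR = CLK_HZ / TICK_HZ;   // 500_000 with the defaults
    localparam int WIDTH   = $clog2(DIVISOR);    // 19 bits with the defaults

    logic [WIDTH-1:0] count;

    always_ff @(posedge clk) begin
        if (!reset_n) begin
            count <= '0;
            tick  <= 1'b0;
        end else if (count == DIVISOR - 1) begin
            count <= '0;                // wrap and pulse
            tick  <= 1'b1;
        end else begin
            count <= count + 1'b1;
            tick  <= 1'b0;
        end
    end
endmodule
```

Using a clock-enable pulse like this, rather than a derived slow clock, keeps everything in one clock domain.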
The Core Challenge: PL vs HPS Partitioning
As I planned my next steps, I faced the fundamental question that every FPGA SoC developer must answer: What goes in the PL (FPGA fabric) and what goes in the HPS (processor)?
This isn’t just an academic question. Making the wrong choice can mean:
- Wasting FPGA resources on tasks better suited for software
- Creating timing issues by trying to do real-time processing in software
- Over-complicating the design
The Decision Framework
Through research and experimentation, I developed a simple decision framework:
Ask: “Does this need guaranteed timing or massive parallelism?”
- Yes → Use PL (FPGA fabric)
- No → Use HPS (ARM processors)
More specifically:
Use the PL when you need:
- Deterministic timing (sub-microsecond response)
- High bandwidth or low latency (direct pin access)
- Massive parallelism (thousands of concurrent operations)
- Custom interfaces or unusual bit widths
- Real-time signal processing
Use the HPS when you need:
- Control-flow heavy algorithms (branching, loops, conditionals)
- OS services (networking, file systems, USB)
- Rapid development or flexibility (software vs HDL)
- Complex protocols (TCP/IP, HTTP)
- Configuration and management tasks
A Real Example: ADC Data Acquisition
To put this framework into practice, I planned my next project: reading data from the on-board TLA2518 ADC (8-channel, 12-bit, 1 MSPS, SPI interface) and sending it over serial communication.
This breaks down into two parts:
- ADC Interface: Requires precise SPI timing, direct pin control → PL
- Data Communication: Could be either FPGA_UART (PL) or HPS USB
For learning purposes, I decided to start with a pure PL implementation using the FPGA_UART. This keeps the initial design simpler and lets me understand the FPGA side thoroughly before adding HPS complexity.
The ADC System Architecture
I designed a modular system with clear separation of concerns:
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│   ADC SPI       │────▶│  ADC to ASCII    │────▶│  UART TX    │
│   Controller    │     │  Converter       │     │  (115200)   │
└─────────────────┘     └──────────────────┘     └─────────────┘
         ▲                                              │
         │ sample_trigger (10 Hz)                       │
         │                                              ▼
┌─────────────────┐                               Serial Output
│ Tick Generator  │                                "0000\r\n"
│ (5 MHz clock)   │                                    to
└─────────────────┘                                "4095\r\n"
Key Design Decisions
Clock Management: Used Platform Designer to create an IOPLL converting 50 MHz input to 5 MHz system clock. This simplified timing throughout the design.
Module Breakdown:
- adc_spi_ctrl.sv - TLA2518 SPI controller (manual mode, single channel)
- uart_tx.sv - 8N1 UART transmitter (115200 baud)
- adc_to_ascii.sv - Converts 12-bit ADC data to ASCII format
- adc_uart_top.sv - Top-level integration
Data Format: ASCII text (“0000” to “4095”) rather than binary. This makes debugging much easier with a simple serial terminal and trades bandwidth for debuggability.
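The conversion from a 12-bit sample to four ASCII digits can be sketched as a small function. This is one possible approach, not necessarily how adc_to_ascii.sv does it; the package and function names are hypothetical. Division and modulo by the constant 10 are synthesizable but cost logic, so a real design might prefer a sequential or double-dabble conversion.

```systemverilog
package adc_ascii_pkg;
    // Convert a 12-bit sample (0..4095) into four ASCII decimal digits.
    // The thousands digit ends up in bits [31:24], the ones digit in [7:0].
    function automatic logic [31:0] to_ascii4(input logic [11:0] sample);
        int unsigned v = sample;
        logic [31:0] digits;
        for (int i = 0; i < 4; i++) begin
            digits[8*i +: 8] = 8'h30 + 8'(v % 10);  // 8'h30 is ASCII '0'
            v = v / 10;
        end
        return digits;
    endfunction
endpackage
```

For example, to_ascii4(12'd4095) yields 32'h34303935, which is the byte sequence '4', '0', '9', '5' when sent most significant byte first.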
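Sample Rate: 10 Hz for initial testing. The UART is the real bottleneck: at 115200 baud with 8N1 framing (10 bits per byte) and 6 bytes per sample (four digits plus CR and LF), it tops out at about 1,900 samples/sec. 10 Hz is far below that and fine for validating the design.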
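Working the UART numbers through as constants makes the budget explicit. The sketch below assumes the 5 MHz system clock, 115200 baud 8N1, and the 6-byte ASCII frame described above; the package and parameter names are hypothetical. As an upper bound for this framing, the link carries 1,920 samples/sec.

```systemverilog
package uart_budget_pkg;
    localparam int CLK_HZ           = 5_000_000; // system clock from the IOPLL
    localparam int BAUD             = 115_200;
    localparam int BITS_PER_BYTE    = 10;        // 8N1: start + 8 data + stop
    localparam int BYTES_PER_SAMPLE = 6;         // "dddd" + CR + LF

    // Clock cycles per UART bit, rounded to nearest: 43 (the exact ratio is 43.4)
    localparam int CLKS_PER_BIT = (CLK_HZ + BAUD / 2) / BAUD;

    // Upper bound on the sample rate: 115200 / (10 * 6) = 1920 samples/sec
    localparam int MAX_SAMPLES_PER_SEC = BAUD / (BITS_PER_BYTE * BYTES_PER_SAMPLE);
endpackage
```

A divider of 43 gives an actual rate of about 116,279 baud, roughly 0.9% fast, which is normally well within what a UART receiver tolerates.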
Lessons Learned
1. Start with Infrastructure
Platform Designer was invaluable for getting clocks and resets right from the beginning. Don’t try to build everything from scratch - use the tools that handle the complex parts (PLLs, reset synchronization) and focus your effort on the application logic.
2. Modular Design Pays Off
Breaking the ADC system into separate modules, each with a clear interface, means:
- Each module can be tested independently
- Changes are localized (modify UART without touching ADC)
- The design is easier to understand and maintain
3. Assertions Catch Problems Early
Even simple assertions like “reset must be held for N clock cycles” caught issues during testbench development. These checks take minutes to write but can save hours of debugging.
4. ASCII is Your Friend for Debugging
Binary protocols are efficient, but ASCII makes it trivial to see what’s happening with a serial terminal. Start with ASCII for development, optimize to binary later if needed.
5. Read Vendor Examples, Then Write Your Own
The Terasic demo code was helpful for understanding the ADC protocol, but writing my own clean implementation helped me truly understand how it works.
Testbenches and Assertions: Verification from the Start
One of my primary goals was to become proficient at writing testbenches and using assertions. Coming from a background where verification was often an afterthought, I wanted to build verification into my workflow from the beginning.
Why Verification Matters
The cost of fixing a bug rises steeply the later in the development cycle it is found:
Bug found in simulation: Minutes to fix
Bug found on hardware: Hours to debug (limited visibility)
Bug found by customer: Days + reputation damage
Bug in shipped product: Recalls, redesigns, catastrophe
For FPGA SoCs, the situation is even more challenging because you’re dealing with hardware-software interaction, multiple clock domains, and complex timing relationships.
The Early Win: Reset Timing Assertion
My first assertion was simple but proved its value immediately. I wanted to ensure that reset was held for a minimum number of clock cycles to properly initialize the system.
The Assertion (in a separate assertion file):
// Reset must be held low for at least 10 clock cycles
property reset_duration;
    @(posedge clk)
    $fell(reset_n) |-> (!reset_n)[*10]; // Reset stays low for 10 cycles
endproperty
assert_reset_duration: assert property (reset_duration)
    else $error("Reset released too early - must be held for 10 cycles");
What it caught: In my testbench, I was releasing reset after only 5 clock cycles. The assertion immediately flagged this:
# ** Error: Reset released too early - must be held for 10 cycles
# Time: 25 ns Iteration: 1 Instance: /testbench
This was a trivial bug, but it validated the approach. If a simple assertion caught this, what else could assertions find?
Testbench Structure: Building for Reusability
I structured my testbenches to be modular and reusable. Here’s the pattern I developed:
1. Clock and Reset Generation
module testbench;
    // Clock generation
    logic clk = 0;
    always #10ns clk = ~clk; // 50 MHz clock

    // Reset generation with proper timing
    logic reset_n;
    initial begin
        reset_n = 0;
        repeat(15) @(posedge clk); // Hold reset for 15 cycles
        reset_n = 1;
        $display("Reset released at time %0t", $time);
    end

    // DUT signals
    logic sample_trigger;
    logic busy;
    // ... other signals

    // Device Under Test
    adc_uart_top dut (.*);
2. Stimulus Generation
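// Test stimulus
initial begin
    sample_trigger = 0;

    // Wait for reset release
    @(posedge reset_n);
    repeat(5) @(posedge clk);

    // Test case 1: Single sample
    $display("Test 1: Single sample trigger");
    @(posedge clk);
    sample_trigger <= 1;  // nonblocking avoids a race with the DUT's clocked logic
    @(posedge clk);
    sample_trigger <= 0;

    // Wait for completion. busy must be seen high first; otherwise the
    // second wait passes immediately while the DUT has not yet started.
    wait(busy == 1);
    wait(busy == 0);
    $display("Sample complete at time %0t", $time);

    // Test case 2: Back-to-back samples
    // ... more test cases

    #10us;
    // Failures are reported by the assertions via $error as they occur
    $display("Test sequence finished");
    $finish;
end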
3. Response Checking
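// Monitor UART output
logic [7:0] uart_byte;
int byte_count = 0;  // bytes seen so far; each sample is 6 bytes: "dddd" CR LF

always @(posedge clk) begin
    if (uart_tx_valid) begin
        uart_byte = uart_tx_data;
        $display("UART TX: 0x%02h ('%c')", uart_byte, uart_byte);
        byte_count++;
    end
end

// Check expected sequence: the 5th byte of every 6-byte frame is CR.
// Assertions use values sampled before the clock edge, so byte_count is
// still 4 (mod 6) when the 5th byte is presented. "\r" is not a standard
// SystemVerilog string escape, so the CR code 8'h0D is written explicitly.
assert property (
    @(posedge clk) uart_tx_valid && (byte_count % 6 == 4)
    |-> (uart_tx_data == 8'h0D)
) else $error("Expected CR at position 5");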
Practical Assertions for FPGA Designs
Here are assertions I found valuable during development:
1. Handshake Protocol Assertions
// Once asserted, valid must stay high until ready accepts the transfer
property no_valid_gaps;
    @(posedge clk) disable iff (!reset_n)
    valid && !ready |=> valid; // Valid stays high until ready
endproperty
assert_no_gaps: assert property (no_valid_gaps)
    else $error("Valid signal dropped before ready");
2. State Machine Assertions
// State machine must not enter invalid states
property valid_states;
    @(posedge clk) disable iff (!reset_n)
    state inside {IDLE, ADDR, DATA, WAIT, DONE};
endproperty
assert_valid_states: assert property (valid_states)
    else $error("State machine in invalid state: %0d", state);

// Timeout: Don't stay in WAIT state forever
property wait_timeout;
    @(posedge clk) disable iff (!reset_n)
    (state == WAIT) |-> ##[1:1000] (state != WAIT);
endproperty
assert_wait_timeout: assert property (wait_timeout)
    else $error("Stuck in WAIT state > 1000 cycles");
3. Clock Domain Crossing Assertions
// Single-bit CDC: every new value must be seen on at least two consecutive
// clk_dest edges, i.e. after a change the signal holds for one more cycle
property cdc_stable;
    @(posedge clk_dest) disable iff (!reset_n)
    !$stable(signal_from_source) |=> $stable(signal_from_source);
endproperty
assert_cdc: assert property (cdc_stable)
    else $warning("CDC signal changed too quickly");
4. Data Integrity Assertions
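// ADC data must be within valid range (12-bit) and fully driven.
// The range check only has teeth if adc_sample_data is wider than 12 bits
// (for example a 16-bit SPI word); the X/Z check applies at any width.
property adc_range;
    @(posedge clk) disable iff (!reset_n)
    adc_sample_valid |-> (!$isunknown(adc_sample_data) && adc_sample_data <= 12'hFFF);
endproperty
assert_adc_range: assert property (adc_range)
    else $error("ADC data out of range: %h", adc_sample_data);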
5. Timing Relationship Assertions
// SCK must be idle (low) on the cycle where chip select asserts (falls),
// so the SPI clock cannot have started before CS
property cs_before_clk;
    @(posedge clk) disable iff (!reset_n)
    $fell(adc_cs_n) |-> !adc_sck;
endproperty
assert_cs_clk: assert property (cs_before_clk)
    else $error("SPI clock started before CS asserted");
Functional Coverage: What Did We Test?
Beyond assertions, I used functional coverage to track what scenarios were actually tested:
// Sample only when a new ADC result is valid, not on every clock
covergroup adc_coverage @(posedge clk iff adc_sample_valid);
    // Cover all ADC channels
    channel: coverpoint adc_channel {
        bins channels[] = {[0:7]};
    }
    // Cover ADC value ranges
    value: coverpoint adc_sample_data {
        bins low    = {[0:1023]};
        bins mid    = {[1024:3071]};
        bins high   = {[3072:4095]};
        bins corner = {0, 4095};
    }
    // Cross coverage: all channels at all ranges
    chan_x_val: cross channel, value;
endgroup
adc_coverage cov_inst = new();
After running tests, I could check coverage:
# Coverage Report:
# channel: 100% (8/8 bins)
# value: 100% (4/4 bins)
# chan_x_val: 62.5% (20/32 bins) <- Need more test cases!
Verification Workflow
My workflow became:
- Write the module interface (ports, parameters)
- Write basic assertions (reset behavior, state validity, range checks)
- Write the testbench framework (clock, reset, stimulus)
- Implement the RTL
- Run simulation, fix assertion failures
- Add more assertions as bugs are found
- Check coverage, add tests for gaps
- Iterate until confident
This “verification-first” approach meant I caught bugs in simulation rather than on hardware.
Practical Tips for Testbench Development
1. Use $display liberally during development
$display("[%0t] State: %s, Count: %0d", $time, state.name(), count);
2. Create reusable tasks for common operations
task send_sample_trigger();
    @(posedge clk);
    sample_trigger <= 1;
    @(posedge clk);
    sample_trigger <= 0;
endtask
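Another task that tends to be reused is one that decodes a byte from the serial line itself, so the testbench checks what actually leaves the pin. The sketch below is a testbench-only model that assumes 115200 baud 8N1; the module, task, and signal names are hypothetical.

```systemverilog
// Testbench-only helper: decode one 8N1 byte from a serial line.
module uart_rx_model (input logic uart_txd);
    timeunit 1ns;
    timeprecision 1ps;

    localparam realtime BIT_NS = 1_000_000_000.0 / 115_200;  // about 8680.6 ns per bit

    task automatic recv_byte(output logic [7:0] data);
        @(negedge uart_txd);            // falling edge marks the start bit
        #(1.5 * BIT_NS);                // move to the middle of data bit 0
        for (int i = 0; i < 8; i++) begin
            data[i] = uart_txd;         // data arrives LSB first
            #(BIT_NS);
        end
        // The loop leaves us in the middle of the stop bit
        if (uart_txd !== 1'b1)
            $error("UART framing error: stop bit not high");
    endtask
endmodule
```

In a testbench this could be instantiated next to the DUT and called as rx.recv_byte(b), where rx is the instance name, then b compared against the expected character.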
3. Use waveforms for complex timing
Assertions are great for automated checking, but waveforms are essential for understanding complex timing issues. I always saved waveforms during simulation:
initial begin
    $dumpfile("waveform.vcd");
    $dumpvars(0, testbench);
end
4. Test corner cases explicitly
- Minimum and maximum values
- Back-to-back operations
- Simultaneous events
- Timeout conditions
5. Separate assertions into their own file
Keep assertions in a separate file (e.g., module_assertions.sv) that can be bound to the module:
bind adc_spi_ctrl adc_spi_ctrl_assertions check_inst (.*);
This keeps the RTL clean and makes assertions easy to enable/disable.
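For the bind above to work, the assertion module needs ports whose names match signals inside the target module, because (.*) connects by name in the scope of adc_spi_ctrl. A minimal sketch of such a checker module follows; the port list is an assumption about what adc_spi_ctrl contains, and SCK is assumed to idle low.

```systemverilog
// Hypothetical checker module intended to be bound into adc_spi_ctrl.
// It has only inputs, so it observes the design without driving anything.
module adc_spi_ctrl_assertions (
    input logic clk,
    input logic reset_n,
    input logic adc_cs_n,
    input logic adc_sck
);
    // SCK must be idle (low) on the cycle where chip select asserts
    property sck_idle_at_cs;
        @(posedge clk) disable iff (!reset_n)
        $fell(adc_cs_n) |-> !adc_sck;
    endproperty

    assert_sck_idle: assert property (sck_idle_at_cs)
        else $error("SCK not idle when CS asserted");
endmodule
```

Leaving the assertion file and the bind statement out of the synthesis file list is then enough to keep the checks simulation-only.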
The Payoff
The time invested in testbenches and assertions paid off immediately:
- Caught the reset timing bug before hardware
- Validated SPI timing without a logic analyzer
- Verified UART framing in simulation
- Built confidence before committing to synthesis
More importantly, it established a workflow that scales. As designs get more complex, the verification infrastructure is already in place.
For anyone starting with FPGA development: write the testbench first. It forces you to think about the module’s behavior before implementation, and it gives you a safety net as you develop.
Next Steps
With the PL-only ADC system designed, my next steps are:
- Hardware Testing: Copy the modules to the fpga_1 project, compile, and test on the actual board
- Debug and Iterate: Use the serial terminal to verify ADC readings
- Add HPS Integration: Move communication to HPS USB, learn about HPS-to-PL bridges
- Expand Verification: Add comprehensive testbenches and assertions
- Multi-channel Sampling: Expand from single-channel to multi-channel ADC reads
Once I have both PL and HPS working together, I’ll have a solid foundation for developing more complex FPGA SoC products. The key is building up systematically - understand each piece thoroughly before adding the next layer of complexity.
Advice for Others Getting Started
If you’re new to FPGA SoC development, here’s what I’d recommend:
- Start Simple: Get LEDs blinking with properly managed clocks and resets before tackling complex designs
- Use Platform Designer: Don’t hand-code clock management - let the tools handle it
- Think About Partitioning Early: Before you write any RTL, decide what goes in PL vs HPS
- Write Testbenches: Even for simple modules, testbenches help you understand behavior
- Use Assertions: They’re not just for complex designs - simple assertions catch simple bugs
- ASCII for Debug: Binary is efficient, ASCII is debuggable
- Read the Vendor Docs: The reference manuals are dense but essential
- Build Incrementally: Working designs that do less are better than ambitious designs that don’t work
Understanding the Bridges: HPS-to-PL Communication
One of the most important concepts for FPGA SoC development is understanding how the HPS and PL communicate. The Agilex 5 provides several bridges:
┌──────────────────────────────────────────────────────────────────┐
│                               HPS                                │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐               │
│  │ Cortex-A76  │  │ Cortex-A76  │  │ Cortex-A55  │  ...          │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘               │
│         └────────────────┼────────────────┘                      │
│                          │                                       │
│                    ┌─────┴─────┐                                 │
│                    │    CCU    │  (Cache Coherency Unit)         │
│                    └─────┬─────┘                                 │
│                          │                                       │
├──────────────────────────┼───────────────────────────────────────┤
│                          │                BRIDGES                │
│       ┌──────────────────┼──────────────────┐                    │
│       │                  │                  │                    │
│       ▼                  ▼                  ▼                    │
│  ┌──────────┐    ┌──────────────┐      ┌──────────┐              │
│  │   F2H    │    │     H2F      │      │ F2SDRAM  │              │
│  │ (FPGA to │    │ (HPS to FPGA)│      │ (FPGA to │              │
│  │   HPS)   │    │              │      │   DDR)   │              │
│  └──────────┘    └──────────────┘      └──────────┘              │
├──────────────────────────────────────────────────────────────────┤
│                         FPGA FABRIC (PL)                         │
└──────────────────────────────────────────────────────────────────┘
Bridge Selection Guide:
- H2F (HPS-to-FPGA): CPU controls PL peripherals, register access
- F2H (FPGA-to-HPS): PL masters accessing HPS memory/peripherals
- F2SDRAM: High-bandwidth DMA, frame buffers, bulk data transfers
Choosing the right bridge is crucial for performance. Don’t try to push gigabytes through register interfaces - use DMA and the SDRAM bridge for bulk data.
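On the register-access side, the PL end of an H2F connection is often just a small memory-mapped register block. The sketch below assumes Platform Designer adapts the bridge to a simple Avalon memory-mapped agent with 32-bit data, word addressing, no waitrequest, and a read latency of one clock declared in the component's interface settings. Those settings, the register map, and all names are assumptions for illustration.

```systemverilog
// Minimal Avalon-MM style register block (sketch, hypothetical names).
//   word 0: read/write, bits [2:0] select the ADC channel
//   word 1: read-only, bits [11:0] hold the latest ADC sample
module pl_regs (
    input  logic        clk,
    input  logic        reset_n,         // synchronous, active-low reset (assumed)
    input  logic [1:0]  avs_address,
    input  logic        avs_write,
    input  logic [31:0] avs_writedata,
    input  logic        avs_read,
    output logic [31:0] avs_readdata,
    input  logic [11:0] adc_sample,      // status coming from the PL logic
    output logic [2:0]  adc_channel      // control going to the PL logic
);
    always_ff @(posedge clk) begin
        if (!reset_n) begin
            adc_channel  <= '0;
            avs_readdata <= '0;
        end else begin
            if (avs_write && avs_address == 2'd0)
                adc_channel <= avs_writedata[2:0];
            if (avs_read) begin
                // Registered read data, valid one clock after avs_read
                case (avs_address)
                    2'd0:    avs_readdata <= {29'b0, adc_channel};
                    2'd1:    avs_readdata <= {20'b0, adc_sample};
                    default: avs_readdata <= '0;
                endcase
            end
        end
    end
endmodule
```

An interface like this suits a few control and status words per access; bulk sample data belongs on a DMA path through the SDRAM bridge.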
Common Design Patterns
Pattern 1: Real-Time I/O with Supervisory Control
┌─────────────────┐     ┌─────────────────────────────────┐
│      HPS        │     │               PL                │
│                 │     │                                 │
│ - UI/Display    │     │  Real-Time Engine:              │
│ - Logging       │◀───▶│  - Motor control loops          │
│ - Config        │     │  - ADC sampling                 │
│ - Network       │     │  - PWM generation               │
│ - Safety        │     │  - Sensor interfaces            │
│   monitoring    │     │                                 │
└─────────────────┘     └────────────────┬────────────────┘
                                         │
                                         ▼
                                    Physical I/O
This is the pattern I’m implementing with the ADC system. The PL handles microsecond-level timing for the ADC and UART, while the HPS (in future iterations) will handle configuration, data logging, and network communication.
Pattern 2: Offload Accelerator
When you have compute-intensive operations (image processing, encryption, FFTs), the PL can process data in parallel while the HPS manages control flow. The HPS sets up data buffers in DDR, triggers the PL accelerator, and gets notified via interrupt when processing completes.
The Journey from Idea to Working Product
The journey from idea to working FPGA SoC product is challenging, but breaking it into manageable steps and building a solid foundation makes it achievable:
- Foundation Phase (where I am now): Basic infrastructure, simple peripherals, understanding tools
- Integration Phase (next): Connecting HPS and PL, using bridges, driver development
- Optimization Phase: Performance tuning, resource optimization, advanced features
- Product Phase: Reliability, testing, documentation, deployment
Each phase builds on the previous one. Rushing ahead without a solid foundation leads to confusion and frustration.
Good luck with your own FPGA adventures!