Initial upload starting from doc-pkg, authored Jul 28, 2022 by Mitch Burnett.
[<< Home](/home#6-firmware-and-software-design-panel-charge-5)
[<< Section 5.4](/5-dbe/5.4)
## 6.1 Beamformer F-Engine Firmware
An overview of the beamformer digital back end and its operating modes was given
in [Section 5.1](../5-dbe/5.1). The capability to provide different modes with
either coarse or narrow-band spectral data products is realized by a two-stage
channelizer architecture. First-stage digital processing will be done in the
[RFSoC](../5-dbe/5.2). The following figure shows a high-level block diagram of
the signal path through the RFSoC for one antenna element. The input signal is
received over the RFoF link, and the output is routed to the second-stage
processor through the 100 GbE network and switch.
<div align="center">
<img src="../img/dbe/f-engine-blk-diagram.png" width="800"/>
Figure 1: RFSoC single antenna signal flow diagram
</div>
The first-stage digital processing (F-engine) includes sampling antenna voltages,
frequency channelization, and data "packetizing" for network transport.
The following describes the functionality and implementation of the IP used in
the F-engine after the ADC. The RFSoC's sampling capabilities, configuration, and
operation in the context of ALPACA were addressed in [Section 5.2](../5-dbe/5.2).
### 6.1.1 Oversampled Polyphase Filter Bank
A channelizer is a filter bank that decomposes an input signal into frequency
bins. High-performance real-time systems use a polyphase filter bank (PFB)
rather than a plain fast Fourier transform (FFT) because the PFB reduces
spectral leakage and the signal attenuation near frequency bin edges (called
scalloping loss), while remaining computationally efficient: it adds only a
polyphase filtering stage in front of the FFT.
Single-stage PFB implementations follow a conventional design approach where the
frequency response of the prototype low-pass filter (LPF) has low sidelobes,
narrow transition bands, and the attenuation specification at the crossover
point between adjacent channels is -3 dB. This results in a uniform power spread
for spectra across the full bandwidth of the instrument. A PFB designed this way
is called a critically sampled (or maximally decimated) PFB because the
channelizer output sample rate per channel, in samples per second, equals the
effective channel spacing in hertz [^harris].
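The critically sampled structure can be sketched in a few lines of NumPy. This minimal sketch uses the window-and-fold (weighted overlap-add) form: window $`MP`$ input samples with the prototype LPF, sum the $`P`$ taps of each of the $`M`$ branches, and take an $`M`$-point FFT, hopping $`M`$ samples per output so that each channel is sampled at the channel spacing. The function names and the Hann-windowed-sinc prototype are illustrative assumptions, a simple stand-in rather than a filter designed to the crossover and sidelobe specifications above.

```python
import numpy as np


def prototype_lpf(M: int, P: int) -> np.ndarray:
    """Hann-windowed sinc low-pass prototype with M*P taps (illustrative).

    The sinc's first null is at M samples, which puts the cutoff at half the
    channel spacing, fs / (2M). Taps are normalized to unit DC gain.
    """
    N = M * P
    n = np.arange(N) - (N - 1) / 2
    h = np.sinc(n / M) * np.hanning(N)
    return h / h.sum()


def cs_pfb(x: np.ndarray, h: np.ndarray, M: int) -> np.ndarray:
    """Critically sampled PFB: hop M input samples per M-channel output.

    Returns shape (frames, M); each channel is sampled at fs/M, i.e. at the
    channel spacing.
    """
    if len(h) % M:
        raise ValueError("filter length must be a multiple of M")
    P = len(h) // M
    nframes = (len(x) - P * M) // M + 1
    out = np.empty((nframes, M), dtype=complex)
    for n in range(nframes):
        # Window the P*M-sample block starting at n*M, then fold (sum) the
        # P taps of each of the M polyphase branches.
        seg = x[n * M : n * M + P * M] * h
        out[n] = np.fft.fft(seg.reshape(P, M).sum(axis=0))
    return out
```

For example, a complex tone at the center of bin 3 of an 8-channel, 4-tap bank, `np.exp(2j * np.pi * 3 * np.arange(512) / 8)`, comes out with unit power in channel 3 because the prototype is normalized to unit DC gain.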
In two-stage channelizer architectures, when the first stage follows this same
approach and its outputs are then processed by a second-stage "zoom" PFB, two
significant processing artifacts appear in the fine channelized spectrum in the
regions corresponding to crossovers between adjacent coarse channels: scalloping
across adjacent fine channels, and spectral aliasing between fine channels. An
example of this behavior is shown in the following figure:
<div align="center">
<img src="../img/dbe/second_stage_alias.png" width="600"/>
Figure 2: System-degrading processing artifacts are present when a critically
sampled PFB is followed by a second-stage channelizer. Note the aliased
frequency tone (red curve) and the scalloping of the white noise floor, which
should be flat (black curve).
</div>
Although the first-stage LPF is correctly designed for a single-stage
channelizer, the scalloping is expected: the second channelizer samples the
first-stage filter's transition-band response at a finer frequency resolution,
so the roll-off at each coarse channel edge becomes visible. The spectral images
produced by signals near a coarse channel boundary are a more severe artifact.
In a critically sampled PFB the transition band extends beyond the channel edge,
and decimation by the channel count folds that region back into the channel,
because the filter was not designed to attenuate aliases to the same level as a
conventional anti-aliasing filter.
To avoid these spectral corruptions in the fine "zoom" spectrometer mode, the
channelizer in the ALPACA F-engine is not a conventional critically sampled PFB
but an oversampled PFB (OSPFB). The decimation rate of the first-stage
channelizer is reduced, and the channel passband is designed to overlap slightly
with adjacent channels in the crossover region. After the second-stage
critically sampled PFB, the fine channels in the overlapped region are
discarded, eliminating these processing artifacts. With proper prototype filter
design, only a few channels of overlap are required. The OSPFB does increase the
channelizer output sample rate by the oversampling ratio relative to the
critically sampled case, and this must be accounted for in the allocated I/O
budget.
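The discard bookkeeping follows from the rates. A coarse OSPFB channel is sampled at the oversampling ratio times the coarse channel spacing, so an $`N`$-point second stage spans that wider band and only $`N`$ divided by the ratio of its fine channels fall inside one coarse channel spacing; the rest overlap the neighbors. A minimal sketch, assuming the 4/3 ratio and 32-point second stage of the hardware simulation described later in this section (the function name is hypothetical):

```python
from fractions import Fraction


def fine_channel_split(n_fine: int, os_ratio: Fraction) -> tuple[int, int]:
    """Return (kept, discarded_per_edge) fine channels for one coarse channel.

    The second stage covers os_ratio coarse channel spacings, so only
    n_fine / os_ratio fine channels (the central ones, after fftshift) lie
    within the coarse channel; the remainder is split between the two edges.
    """
    kept = Fraction(n_fine) / os_ratio
    extra = n_fine - kept
    if kept.denominator != 1 or extra % 2:
        raise ValueError("overlap does not split into whole channels per edge")
    return int(kept), int(extra) // 2


kept, per_edge = fine_channel_split(32, Fraction(4, 3))
print(kept, per_edge)  # 24 4
```

Under these assumptions, 24 of every 32 fine channels are kept and 4 are dropped at each edge, while the coarse output sample rate, and hence the first-stage I/O, grows by the same factor of 4/3.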
The following figure shows a software simulation comparing the output of a
second-stage PFB in fine spectrometer mode when the first-stage PFB is either
critically sampled or oversampled. A signal of interest is placed between two
adjacent channels within the passband. With the critically sampled first stage,
the scalloping and the aliased image of the signal of interest appear again. The
OSPFB removes these artifacts and produces a uniform power spectrum.
<div align="center">
<img src="../img/dbe/os_pfb_mat.png" width="600"/>
Figure 3: Improved second-stage spectrum with an OSPFB first stage.
</div>
The architecture for the implementation of an OSPFB can be derived by starting
with that of a critically sampled PFB. As shown in the following figure, a PFB
channelizer producing $`M`$ frequency bin outputs can be considered an
$`M`$-port device where samples are delivered to the $`M`$ branches of a
polyphase LPF with filter outputs subsequently processed by an $`M`$-point FFT.
<div align="center">
<img src="../img/dbe/cspfb-blk-diagram.png" width="600"/>
Figure 4: Critically sampled PFB block diagram.
</div>
In the critically sampled case, $`M`$ samples are delivered to the core for each
computation of the $`M`$ branch filter outputs and the $`M`$-point FFT. The
OSPFB reduces the decimation rate to some $`D < M`$, which increases the sample
rate at each output port by the ratio $`M/D`$. In practice this is done by
shifting $`D`$ samples into the core per computation of the branch filter and
FFT outputs.
Shifting in $`D`$ samples rather than $`M`$ introduces a frequency-dependent
phase offset that the $`M`$-point FFT kernel does not account for. This offset
is compensated by adding a barrel sample rotator, which circularly shifts the
$`M`$ branch filter outputs to re-align them with their respective transform
inputs. The following figure shows the modified block diagram for the OSPFB
implementation with the added phase compensation buffer.
<div align="center">
<img src="../img/dbe/ospfb-concept-blk-diagram.png" width="600"/>
Figure 5: Oversampled PFB block diagram.
</div>
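The hop by $`D`$ and its phase correction can be sketched in NumPy. This is a minimal window-and-fold formulation under our own assumptions; the `ospfb` name, the indexing, and applying the rotation with `np.roll` are illustrative and not the ALPACA IP. Each output vector uses an $`MP`$-sample block advanced by $`D`$, and circularly shifting the $`M`$ folded branch outputs by $`nD \bmod M`$ before the FFT removes the $`e^{j2\pi knD/M}`$ phase that the hop leaves on channel $`k`$, so each channel output is the baseband signal sampled at $`f_s/D`$.

```python
import numpy as np


def ospfb(x: np.ndarray, h: np.ndarray, M: int, D: int) -> np.ndarray:
    """Oversampled PFB sketch: hop D <= M input samples per M-channel output.

    Returns shape (frames, M); each channel is sampled at fs/D, which is M/D
    times the channel spacing.
    """
    N = len(h)
    if N % M or not 0 < D <= M:
        raise ValueError("need len(h) to be a multiple of M and 0 < D <= M")
    P = N // M
    nframes = (len(x) - N) // D + 1
    out = np.empty((nframes, M), dtype=complex)
    for n in range(nframes):
        # Window the N-sample block starting at n*D and fold P taps per branch.
        y = (x[n * D : n * D + N] * h).reshape(P, M).sum(axis=0)
        # Barrel rotation: circularly shifting by (n*D) mod M multiplies FFT
        # bin k by exp(-j*2*pi*k*n*D/M), cancelling the phase the D-sample hop
        # leaves on channel k. For D == M the shift is always zero.
        y = np.roll(y, (n * D) % M)
        out[n] = np.fft.fft(y)
    return out
```

With $`D = M`$ the rotation vanishes and the sketch reduces to the critically sampled case. For the ALPACA parameters given below (2048 channels, 4/3 oversampling, 8 taps), $`M = 2048`$, $`D = 2048 \cdot 3/4 = 1536`$, and the prototype has 16384 taps; a real OSPFB prototype would be designed for the overlapping passband described above rather than reused from a critically sampled design.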
The ALPACA F-engine OSPFB is a custom-developed IP that accounts for the
trade-offs between the number of parallel antenna signals and the available FPGA
resources, resulting in a flexible and efficient implementation. Design and
implementation of this custom ALPACA OSPFB hardware IP for the RFSoC have been
completed for a single antenna input.
The following figure shows a complete post-synthesis hardware simulation
(bit-accurate and cycle-accurate) of the first-stage OSPFB as specified for
ALPACA (2048 channels, oversampling ratio 4/3, 8 polyphase taps), followed by a
second-stage 32-point critically sampled PFB in software. The core is
functional and works as expected.
<div align="center">
<img src="../img/dbe/ospfb-hw-sim-output.png" width="600"/>
Figure 6: Fine spectrum plot of the ALPACA OSPFB output with a single tone input. Note the lack of scalloping or aliasing.
</div>
### 6.1.2 Packetizer
The document linked below specifies the detailed Ethernet jumbo packet format
for data transfer from the RFSoC F-engine digitizer and frequency channelizer to
the GPU XB-engine digital beamformer. The data transfer is handled by a 60-port
100 GbE Ethernet switch, which performs a large "corner turn" operation to
reorder the data from sequencing by antenna index to sequencing by frequency
channel index. Each F-engine RFSoC handles 12 PAF antennas across all frequency
channels. After the corner turn, the jumbo packets are routed so that each GPU
processes 25 (out of 1300) frequency channels for all 138 (+6 spare) antenna
signal streams.
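The corner turn can be pictured as an index transpose. A minimal NumPy sketch, assuming a hypothetical (FID, local antenna, channel) array layout and a global antenna index of FID × 12 + local index; in the real system the switch achieves this by routing packets, not by an in-memory transpose.

```python
import numpy as np

N_FID = 12          # F-engine RFSoCs, from the text
ANTS_PER_FID = 12   # antenna signals per RFSoC, from the text
N_ANT = N_FID * ANTS_PER_FID  # 144 = 138 antennas + 6 spares


def corner_turn(fengine_data: np.ndarray) -> np.ndarray:
    """Reorder (fid, local_antenna, channel) -> (channel, global_antenna)."""
    n_fid, n_ant, n_chan = fengine_data.shape
    # Move the channel axis first, then merge (fid, local_antenna) into a
    # single global antenna axis: global = fid * n_ant + local_antenna.
    return fengine_data.transpose(2, 0, 1).reshape(n_chan, n_fid * n_ant)
```

After the reorder, each row holds one frequency channel for all antenna signals, which is the slice a single XB-engine needs for beamforming.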
Another important aspect of the packetizer format design, shown in the linked
document below, is the way frequency channels from each F-engine (each with a
unique FID index number, as shown in the table) are distributed across the 50
GPU XB-engines (each with a unique XID index). The processing load for some
XB-engine modes, such as HI observations using a "zoom" fine-resolution
spectrometer, is so high that the digital back end cannot process the full
305.1 MHz bandwidth. Observers using these modes usually do not need the full
bandwidth, so a reduced-width subband is processed instead. However, if channels
were assigned to GPUs (XIDs) sequentially, filling one XID with channels before
moving on to the next, a contiguous reduced subband would land on only a few
XIDs, and the system would fail in these high-demand modes even with reduced
bandwidth. The packet format avoids this by "dealing out like playing cards":
one channel per XID until all 50 have one, then starting over with the next 50
channels, and so on. When the processing bandwidth is reduced, the load
therefore remains evenly distributed across all XIDs rather than concentrated on
a few.
[Ethernet Packet Specifications](../uploads/7666d16ef1f7fb6c19a746e2dbf23508/Packet_Format_2.0.pdf)
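The benefit of dealing channels out can be checked with a small load count. A minimal sketch comparing the two channel-to-XID assignments for a reduced subband, assuming channel 0 is dealt to XID 0, 25 channels per XID in the sequential case, and a hypothetical 100-channel subband; the function names are illustrative and are not taken from the packet specification.

```python
N_XID = 50  # GPU XB-engines, from the text


def xid_dealt(chan: int) -> int:
    """Round-robin ("dealt like playing cards"): channel c goes to XID c mod 50."""
    return chan % N_XID


def xid_sequential(chan: int, chans_per_xid: int = 25) -> int:
    """Fill one XID with chans_per_xid consecutive channels before the next."""
    return chan // chans_per_xid


def xid_load(channels, assign) -> list[int]:
    """Count how many of the given channels each XID receives."""
    load = [0] * N_XID
    for chan in channels:
        load[assign(chan)] += 1
    return load


subband = range(600, 700)  # hypothetical 100-channel reduced-width subband
print(max(xid_load(subband, xid_dealt)))       # 2  (every XID gets 2)
print(max(xid_load(subband, xid_sequential)))  # 25 (4 XIDs do all the work)
```

With the dealt assignment every XID carries 2 of the 100 channels, while the sequential assignment puts all 100 on four XIDs at full per-GPU load and leaves the other 46 idle.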
### 6.1.3 UDP Framer and 100 GbE
The UDP framer was developed by the Electronic Systems Design Group of
Rutherford Appleton Laboratory. This core converts AXI4-Stream data frames from
the F-engine packetizer into IEEE 802.3 Ethernet frames carrying IPv4/UDP
packets. The core is very flexible, with a receive path, an AXI4-Lite
memory-mapped control interface, and optional PING and other IPv4 protocol
functions. ALPACA will use only the core's transmit path and its ARP capability
for destination IP address lookup. The outputs of the UDP core are then sent to
our custom wrapper IP for the integrated 100G CMAC PHY of the RFSoC. This core
implements CAUI-4 100G using RS-FEC (Reed-Solomon forward error correction) for
use on a 100GBASE-SR4 link.
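To make the framing concrete, here is a minimal Python sketch of the headers a UDP framer prepends to each payload: an Ethernet II header, a 20-byte IPv4 header with its RFC 791 header checksum, and an 8-byte UDP header. It illustrates the standard header layouts only and is not the RAL core's implementation; the MAC and IP addresses, port, 8192-byte payload, TTL, and leaving the UDP checksum at zero (permitted for IPv4) are assumptions. The Ethernet FCS is appended by the MAC and is not shown.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """RFC 791 header checksum: ones' complement of the ones'-complement sum."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def udp_frame(dst_mac: bytes, src_mac: bytes, src_ip: bytes, dst_ip: bytes,
              port: int, payload: bytes) -> bytes:
    """Prepend Ethernet II, IPv4 (no options), and UDP headers to payload."""
    udp_len = 8 + len(payload)
    ip_hdr = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,          # version 4, header length 5 x 32-bit words
        0,             # DSCP/ECN
        20 + udp_len,  # IPv4 total length
        0,             # identification
        0x4000,        # flags: don't fragment
        64,            # TTL (assumed)
        17,            # protocol: UDP
        0,             # checksum placeholder
        src_ip, dst_ip,
    )
    csum = ipv4_checksum(ip_hdr)
    ip_hdr = ip_hdr[:10] + struct.pack("!H", csum) + ip_hdr[12:]
    udp_hdr = struct.pack("!HHHH", port, port, udp_len, 0)  # 0 = no UDP checksum
    eth_hdr = dst_mac + src_mac + struct.pack("!H", 0x0800)  # EtherType IPv4
    return eth_hdr + ip_hdr + udp_hdr + payload


frame = udp_frame(bytes.fromhex("020000000002"), bytes.fromhex("020000000001"),
                  bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]), 60000, bytes(8192))
print(len(frame))  # 8234 = 14 + 20 + 8 + 8192
```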
The output data rate of each of the 12 RFSoCs will be 81.8 Gbps. After the data
are distributed to the 25 HPCs (50 GPUs), the rate drops to 39.3 Gbps per HPC,
carried over two 100 Gigabit NICs in each HPC.
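These rates are mutually consistent. A quick check, assuming the aggregate F-engine output is shared evenly across the HPCs and, as a further assumption, evenly across the two NICs in each HPC:

```python
N_RFSOC = 12                # F-engine RFSoCs
RATE_PER_RFSOC_GBPS = 81.8  # output data rate per RFSoC (about 82% of a 100 GbE link)
N_HPC = 25                  # HPC nodes, 2 GPUs each
NICS_PER_HPC = 2            # 100 Gigabit NICs per HPC

aggregate_gbps = N_RFSOC * RATE_PER_RFSOC_GBPS  # total through the switch
per_hpc_gbps = aggregate_gbps / N_HPC
per_nic_gbps = per_hpc_gbps / NICS_PER_HPC      # assumes an even NIC split

print(f"{aggregate_gbps:.1f} {per_hpc_gbps:.1f} {per_nic_gbps:.1f}")
# 981.6 39.3 19.6
```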
[Section 6.2 >>](./6.2)
### Footnotes
[^harris]: F. J. Harris, Multirate Signal Processing for Communication Systems.
Upper Saddle River, NJ, USA: Prentice Hall PTR, 2004.