
E6.F

Alp Ortakci
Electrical and Electronics Engineer with an interest in multi-disciplinary projects.

E6.F (Environmental Screening, Intervention, Care System for Forestry) is a multi-agent robotic architecture for continuous forest health assessment, designed as a self-initiated bachelor’s thesis at the University of Southampton under the supervision of Professor Christopher Freeman. The project earned 89%, one of the top marks in the cohort. The core contribution is the system architecture itself: a framework for connecting heterogeneous robotic agents through an information pipeline where broad aerial survey progressively guides the commitment of more costly, qualitatively different ground-level measurements.

The project was motivated by the observation that existing forestry robotics solutions are overwhelmingly task-specific and single-agent. Monitoring is addressed by UAV systems, inventory by wheeled ground platforms, firefighting by tracked vehicles. While individually effective, these approaches do not address the problem as a connected system where data from one agent can guide the actions of another. E6.F proposes a multi-agent architecture where agents operating in fundamentally different workspaces (above canopy, below canopy, ground level) are linked by a structured information flow, with each layer providing context that the others cannot obtain independently.

System Architecture
#

Full Design Concept (FDC)
#

The Full Design Concept describes the complete proposed system with four agent types, each operating in a distinct workspace defined by the forest structure itself:

FDC Graphical System Representation and Workspaces
Full Design Concept: graphical system representation and agent workspaces
  • Above Canopy Unit (ACU) operates above the tree canopy, producing georeferenced orthomosaic maps and vegetation health assessments over large areas. It provides the global spatial reference for the entire system. In the FDC, this agent would also relay communication between ground units and detect wildfire risk areas.
  • Below Canopy Unit (BCU) is a micro-drone operating in the space between the canopy and the forest floor. This is an environment that the ACU cannot image (the canopy occludes it) and the ground agent cannot efficiently survey (it requires a vantage point above ground level). The BCU collects inventory data such as tree trunk diameter and spacing, and coordinates visually with the ground agent.
  • Legged Unit (LU) is a quadruped ground agent selected over wheeled alternatives to minimise soil compaction and reduce the chance of getting stuck on irregular terrain. It operates at the forest floor, conducting direct measurements and assessments that airborne platforms cannot perform. In the FDC, this agent would also carry IoT sensor payloads for distributed environmental monitoring.
  • Base Beacon Unit (BBU) is a stationary node that handles task allocation between agents, processes collected data, provides charging, and coordinates the overall mission. It acts as the computational and communication hub.

Information flows top-down through the system. The ACU produces segmented environmental maps and vegetation health indices that identify priority regions. These regions define where the BCU and LU should direct their assessment. The BCU provides intermediate-scale structural data (trunk geometry, canopy gap distribution). The LU provides ground-truth measurements at specific locations. Each layer informs the next, and data flows back up to the BBU for aggregation and decision-making.

A custom task distribution methodology was developed to systematically allocate forestry operations (monitoring, inventory, fire detection, fuel management, planting, pruning) between agents. The primary separator for airborne agents is the degree of physical interaction with the environment (passive sensing vs. active intervention). For ground agents, the primary separator is traversal capability (mobile vs. stationary). Within each branch, further separation considers whether distinct agents can be combined or should remain separate, based on hardware and software compatibility.
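The separators described above can be sketched as a small decision function. The task fields and the resulting agent assignments below are illustrative stand-ins, not the thesis taxonomy itself:

```python
from dataclasses import dataclass

@dataclass
class ForestryTask:
    """Hypothetical task record; the fields mirror the two separators above."""
    name: str
    airborne: bool             # performed from the air or from the ground
    active_intervention: bool  # physically interacts with the environment
    mobile: bool               # (ground only) requires traversal

def allocate(task: ForestryTask) -> str:
    """Allocate a task to an agent class using the separators described above:
    airborne tasks split on passive sensing vs. active intervention,
    ground tasks split on mobile vs. stationary."""
    if task.airborne:
        # Passive sensing stays above canopy (ACU); active intervention needs
        # the below-canopy vantage point (BCU). This mapping is an assumption.
        return "BCU" if task.active_intervention else "ACU"
    return "LU" if task.mobile else "BBU"

tasks = [
    ForestryTask("monitoring", airborne=True, active_intervention=False, mobile=False),
    ForestryTask("pruning", airborne=True, active_intervention=True, mobile=False),
    ForestryTask("planting", airborne=False, active_intervention=True, mobile=True),
    ForestryTask("data aggregation", airborne=False, active_intervention=False, mobile=False),
]
print({t.name: allocate(t) for t in tasks})
```

The second-level question in the methodology (whether agents sharing a branch can be merged) would sit on top of this, comparing hardware and software compatibility of the tasks assigned to each class.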

Implementable Design Concept (IDC)
#

The full system could not be built within the budget and time constraints of a single thesis. Three scaling strategies were evaluated: system miniaturisation, full simulation, and discrete test cases demonstrating individual functionality. Discrete test cases were selected because they allow real hardware to be demonstrated (validating the affordability and deployability claims) while accepting that full system integration would be a follow-up effort.

The IDC was structured as two test cases. Case 1 demonstrates the Maintenance System (LU + BCU + Test Rig acting as BBU), validating ground locomotion with sim-to-real RL transfer, visual coordination via AprilTags, and communication between agents. Case 2 demonstrates the ACU pipeline: aerial data acquisition, workspace definition through semantic segmentation, and region-of-interest determination through vegetation index computation.

IDC Process Chart
Implementable Design Concept: test case process chart

Above Canopy: Mapping and Vegetation Health Assessment
#

The ACU pipeline addresses two problems: generating a spatial reference for the system, and identifying regions where vegetation health may be degraded so that below-canopy agents can be directed there.

Georeferenced Mapping
#

For spatial reference, the DJI Mavic 3E collects overlapping aerial imagery to be processed into a georeferenced orthomosaic map defining the workspace for the other agents. The standard photogrammetric pipeline involves capturing overlapping images with GNSS-tagged positions, identifying common features across images, refining camera positions through bundle adjustment, generating a dense point cloud and Digital Surface Model (DSM), orthorectifying the original images onto the DSM, and mosaicking into a seamless map.

Automation of the image acquisition was attempted through the DJI MSDK 5, which requires development of a custom Android application for the Mavic 3E’s RC controller. Due to insufficient documentation of the recently released SDK, an alternative approach using Tasker (an Android automation framework capable of emulating user gestures and reading OS state data) was explored to automate the mapping workflow through the bundled DJI Pilot app.

Semantic Segmentation of Aerial Imagery
#

To classify the workspace into traversable and non-traversable regions, a VGG16-UNet semantic segmentation model was trained. The architecture uses a pre-trained VGG16 encoder (transfer learning from ImageNet) paired with a transposed convolutional decoder that enables pixel-level classification. The UNet architecture was chosen because it handles images of arbitrary size (no dense layers) and the skip connections between encoder and decoder preserve spatial detail.

The model was trained on the Graz Semantic Drone Dataset: 200 images at 6000x4000 pixels with manually created masks across 23 terrain categories (paved area, dirt, grass, gravel, water, rocks, vegetation, tree, building elements, vehicles, and others). The Albumentations library was used for data augmentation (horizontal/vertical flips, rotation, scaling, optical distortion), expanding the dataset to approximately 1600 images with 80 reserved for testing. Training ran for 20 epochs on an Intel Core i7-9750H CPU over approximately 2.5 days with batch size 8, input shape 512x512x3, initial learning rate 0.0001, steps per epoch 200, and validation steps 10. The Dice coefficient was used as the primary metric.

Segmentation prediction comparison
VGG16-UNet segmentation: original image, ground truth mask, predicted mask
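The Dice coefficient used as the training metric measures the overlap between predicted and ground-truth masks. A minimal single-class version in NumPy (for the 23-class model, the same quantity would be averaged over classes):

```python
import numpy as np

def dice_coefficient(y_true: np.ndarray, y_pred: np.ndarray, smooth: float = 1e-6) -> float:
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|).
    1.0 means perfect overlap, 0.0 none; `smooth` guards against division by zero."""
    y_true = y_true.astype(bool).ravel()
    y_pred = y_pred.astype(bool).ravel()
    intersection = np.logical_and(y_true, y_pred).sum()
    return (2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth)

# Toy masks: 2 overlapping pixels, 3 true positives, 2 predicted positives.
truth = np.array([[1, 1, 0], [0, 1, 0]])
pred  = np.array([[1, 0, 0], [0, 1, 0]])
print(round(dice_coefficient(truth, pred), 3))  # 2*2 / (3+2) = 0.8
```

Unlike pixel accuracy, Dice is insensitive to the large background regions that dominate aerial scenes, which is why it is preferred for segmentation.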

Custom Multispectral Camera
#

Custom TetraPi-based multispectral camera: front view, Arducam multiplexer board, 3D-printed housing exploded view

Off-the-shelf multispectral cameras were prohibitively expensive. Instead, a custom device was built based on the TetraPi open-source design: a Raspberry Pi 3A+ driving four cameras through an Arducam multiplexer in a 3D-printed housing. Two standard Raspberry Pi cameras capture RGB imagery, and two NoIR (no infrared filter) cameras are fitted with optical bandpass filters: a 532 nm laser line filter (FLH05532-10) and a 570 nm band-pass filter (FB570-10). This configuration enables spectral separation across the visible and near-infrared range from a single compact device.

From the four captured channels, two vegetation indices are computed:

NDVI (Normalised Difference Vegetation Index): Computed as (NIR - Red) / (NIR + Red), sensitive to chlorophyll content and ranging from -1 to 1, with higher values indicating healthier vegetation.

GRVI (Green-Red Vegetation Index): Computed as (Green - Red) / (Green + Red), useful for detecting subtle differences in greenness that may indicate early vegetation stress.

The implementation was adapted from the Raspberry Pi Foundation’s NDVI tutorial for NoIR cameras, extended to handle multiple image inputs and GRVI computation. When applied to aerial imagery, pixels with depressed index values indicate regions where vegetation health may be at risk, and their coordinates can be extracted to direct the below-canopy agents.
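Both indices share the same normalised-difference form, so the computation reduces to one function applied to different band pairs. A NumPy sketch with toy band values and an illustrative risk threshold (0.3 is not a thesis parameter):

```python
import numpy as np

def vegetation_index(band_a: np.ndarray, band_b: np.ndarray) -> np.ndarray:
    """Normalised difference (a - b) / (a + b), guarded against zero denominators."""
    a = band_a.astype(np.float64)
    b = band_b.astype(np.float64)
    denom = a + b
    return np.divide(a - b, denom, out=np.zeros_like(a), where=denom != 0)

# Toy 2x2 arrays standing in for the aligned channel images.
nir   = np.array([[0.8, 0.7], [0.2, 0.6]])
red   = np.array([[0.2, 0.1], [0.6, 0.2]])
green = np.array([[0.5, 0.5], [0.3, 0.4]])

ndvi = vegetation_index(nir, red)    # (NIR - Red) / (NIR + Red)
grvi = vegetation_index(green, red)  # (Green - Red) / (Green + Red)

# Pixels with depressed NDVI flag candidate at-risk regions; their coordinates
# are what gets handed to the below-canopy agents.
risk_coords = np.argwhere(ndvi < 0.3)
print(risk_coords)  # pixel (1, 0) has NDVI -0.5
```

On a georeferenced orthomosaic, these pixel coordinates map directly to world coordinates, closing the loop from aerial index to ground-agent waypoint.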

Integration of the multispectral camera onto the Mavic 3E was attempted via the DJI Skyport adapter and through the USB-C port (listening for new media files to trigger capture). Both approaches were blocked by hardware restrictions (Skyport requires a proprietary adapter board; the drone refuses to fly when USB-C is powered). The camera was therefore demonstrated as a standalone device, with the intended integration pathway documented.

Captured channels: RGB camera, NoIR camera, NoIR with the 570 nm bandpass filter (FB570-10), and NoIR with the 532 nm bandpass filter (FLH05532-10)

Ground Agent: PPO Locomotion and Sim-to-Real Transfer
#

The Legged Unit uses a Petoi Bittle micro-quadruped, selected after evaluating four platforms (Nova Spot Micro, Bittle, Nybble, Pi Crawler). Spot Micro was the initial preference for its larger build and existing OpenAI Gym support, but cost analysis made it non-viable. Bittle was selected for its ESP32 BiBoard (fast enough to receive and execute joint position commands from a trained controller), open serial interface, existing RL references, and the ability to mount an additional SBC and external sensors.

GPU-Accelerated Training in Isaac Gym
#

The locomotion controller was trained using Proximal Policy Optimization (PPO) in NVIDIA Isaac Gym, a GPU-accelerated RL platform that can train a quadruped to walk in approximately 15 minutes through massive parallelism (thousands of environment instances running simultaneously on GPU). The platform accepts URDF robot descriptions and provides PPO training through its Tensor API, which wraps PhysX physics data into PyTorch tensors.

The training builds on the ANYmal actor-critic PPO policy, adapted for Bittle through a fork of the IsaacGymEnvs repository and a URDF file originally created for Isaac Sim. Significant modification to both the initial joint configuration (specified in YAML) and the training hyperparameters was necessary due to Bittle’s smaller form factor and less precise servos.

The training uses a level-progression structure:

  1. Initialisation: each iteration initialises the agent in a cubic volume with one of six terrain types.
  2. Policy Execution: the agent follows the PPO policy to maximise reward; iterations terminate early if the agent goes out of bounds or performs poorly.
  3. Level Progression: progression to the next level requires traversing more than half the level length.
  4. Performance Monitoring: the agent is monitored to ensure good performance is not a random occurrence; demotion occurs on repeated poor performance.

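The promotion/demotion rules can be sketched as a single curriculum update. The two-failure demotion window here is an assumption for illustration, not the thesis value:

```python
def curriculum_step(level: int, distance: float, level_length: float,
                    history: list, max_level: int = 5) -> int:
    """One curriculum update following the rules above: promote when the agent
    traverses more than half the level, demote on repeated poor performance."""
    history.append(distance > 0.5 * level_length)
    if history[-1]:
        return min(level + 1, max_level)
    # Demote only if the previous attempt also failed, so a single bad run
    # (a possible random occurrence) does not cost the agent a level.
    if len(history) >= 2 and not history[-2]:
        return max(level - 1, 0)
    return level

history = []
print(curriculum_step(2, 6.0, 10.0, history))  # traversed >half of 10: promote to 3
```

In Isaac Gym this logic runs per environment instance, so thousands of agents climb or fall through the terrain levels independently within one training run.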
Several issues were encountered and resolved during training. A launch behaviour (the agent being catapulted from the ground on initialisation) was traced to mesh overlap between the agent and terrain when PhysX starts, creating unintended reactive forces. This was fixed by increasing the initial separation from the ground. Joint misalignment and reversed rotation were resolved through iterative adjustment of the URDF configuration and the training’s PD controller stiffness and damping parameters.

Correct gait in Isaac Gym
Corrected quadruped gait trained in NVIDIA Isaac Gym

Sim-to-Real Transfer
#

Two transfer approaches were evaluated. Direct deployment of the trained actor-critic model onto the ESP32 BiBoard required conversion to TensorFlow Lite or PyTorch Mobile via ONNX. This proved impractical: the policy needs the observation from the previous state to compute the next action, and Isaac Gym's CUDA/C++ backend does not expose the input/output tensors through the Python API in a way that supports straightforward conversion.

The successful approach streams inference results from the host machine to the Bittle. A local socket transfers the sliced DOF position tensor from the Isaac Gym inference engine to Bittle’s serial communication script. The Bittle communication script was modified to format joint positions with correct rotation directions for simultaneous servo control.
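The streaming step amounts to flipping the reversed joints and pushing the sliced DOF tensor over a local socket. A self-contained sketch using a socket pair in place of the host-to-Bittle link; the JSON line framing and the reversed-joint set are illustrative, not the thesis protocol:

```python
import json
import socket

def encode_dof_positions(dof_positions, reverse=()):
    """Format a slice of joint positions (radians) for the serial bridge:
    negate the joints whose rotation direction is reversed on the physical
    servos, then serialise as one JSON line per control tick."""
    formatted = [-p if i in reverse else p for i, p in enumerate(dof_positions)]
    return (json.dumps(formatted) + "\n").encode()

# Local socket pair standing in for the Isaac Gym host -> Bittle bridge link.
host_side, bridge_side = socket.socketpair()
host_side.sendall(encode_dof_positions([0.1, -0.4, 0.2, -0.3], reverse={1, 3}))
payload = json.loads(bridge_side.recv(1024).decode())
print(payload)  # joints 1 and 3 arrive with their directions flipped
host_side.close()
bridge_side.close()
```

On the receiving side, the modified Bittle communication script would parse each line and write the angles to the servos in one simultaneous command.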

Transfer was achieved with a 30-second delay between the gym environment and the physical robot. The locomotion demonstrated a systematic offset on the knee joints, caused by the difference between the URDF zero-rotation position (full leg extension) and Bittle’s physical calibration (orthogonal L-shape). When DOF positions are mapped onto the servos, additional rotation is added on top of the intended position.

Sim-to-Real transfer breakdown
Sim-to-real transfer pipeline: Isaac Gym inference to Bittle serial interface via local socket
URDF vs Bittle zero rotation comparison
URDF zero-rotation position (full extension) vs Bittle physical calibration (orthogonal L-shape)
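The knee offset described above is a fixed difference between the two zero conventions, so it can be removed with a constant correction. A sketch assuming a 90-degree offset and this sign convention (both inferred from the figure, not taken from the thesis code):

```python
import math

URDF_KNEE_ZERO = 0.0            # URDF zero: leg fully extended (assumption)
BITTLE_KNEE_ZERO = math.pi / 2  # physical calibration: orthogonal L-shape

def urdf_to_servo_angle(urdf_angle: float) -> float:
    """Map a knee DOF position in URDF convention onto the Bittle servo
    convention by subtracting the zero-position offset, so the extra
    rotation is no longer added on top of the intended position."""
    return urdf_angle - (BITTLE_KNEE_ZERO - URDF_KNEE_ZERO)

# A commanded 90-degree bend in URDF terms lands on the servo's zero.
print(round(math.degrees(urdf_to_servo_angle(math.pi / 2)), 1))  # 0.0
```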

RPLIDAR Mount
#

An RPLIDAR A1 was mounted on the Bittle platform with a Raspberry Pi 3B+ to provide spatial awareness. A 3D-printed mount design was sourced and modified to accommodate the RPi 3B+ form factor and jumper cable connections between the BiBoard and the Pi.

LIDAR-mounted Bittle
Petoi Bittle with RPLIDAR A1 and Raspberry Pi 3B+ mounted via custom 3D-printed bracket

Below Canopy: Visual Tracking and Coordination
#

The Below Canopy Unit uses a DJI Tello micro-drone (replacing an initial Parrot Mambo selection that proved unusable due to Bluetooth-only connectivity limiting data throughput). The Tello provides WiFi-based communication, stable flight characteristics, and native Python support.

An AprilTag-based proportional controller was implemented for visual coordination with the legged unit. The controller detects AprilTags in the Tello’s camera feed using the AprilTag library, extracts the tag’s corner points, centre position, and apparent size, then computes error signals: x and y deviation between the tag centre and the camera frame centre, and z error between the desired tag size (representing target distance) and the actual tag size. These errors feed directly into proportional gains that generate RC control commands for the drone.
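The error computation and proportional mapping above fit in a few lines. The frame dimensions, target tag size, gains, and clamping range below are illustrative values, not the tuned thesis parameters:

```python
def tag_errors(tag_center, tag_size, frame_center=(480, 360), target_size=120.0):
    """Error signals from a detected AprilTag: horizontal/vertical deviation of
    the tag centre from the frame centre, and a distance proxy from the gap
    between the desired and apparent tag size."""
    ex = tag_center[0] - frame_center[0]
    ey = tag_center[1] - frame_center[1]
    ez = target_size - tag_size
    return ex, ey, ez

def rc_command(errors, kx=0.2, ky=0.2, kz=0.3, limit=100):
    """Proportional controller: scale each error by its gain and clamp to the
    RC command range before sending it to the drone."""
    clamp = lambda v: max(-limit, min(limit, int(v)))
    ex, ey, ez = errors
    left_right = clamp(kx * ex)   # strafe toward the tag centre
    up_down    = clamp(-ky * ey)  # image y grows downward, so invert
    fwd_back   = clamp(kz * ez)   # advance until the tag reaches target size
    return left_right, fwd_back, up_down, 0  # (lr, fb, ud, yaw)

# Tag seen right of centre, above centre, and smaller (farther) than desired.
print(rc_command(tag_errors(tag_center=(600, 300), tag_size=80.0)))
```

With the corner points also available from the detector, a yaw error could be added from the tag's apparent skew, but centring plus distance is enough for the coordination behaviour described.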

An OAK-D depth camera was evaluated separately for RGB-D based forest mensuration. A pipeline was developed to capture colour and depth data simultaneously, process the colour image with a YOLOv7-tiny model to detect tree trunks and obtain bounding boxes, and extract the corresponding depth data. All three stages were implemented, but the pipeline was prone to freezing during execution despite attempts to reduce bandwidth (lowering the FPS and using the minimum resolution). Tree height and DBH (diameter at breast height) extraction from the bounding boxes and depth data was designed but could not be demonstrated because of the stability issue.
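The designed extraction step reduces to reading the depth pixels behind each bounding box. A NumPy sketch with a toy depth map; the median-over-valid-pixels choice is an assumption, not the thesis implementation:

```python
import numpy as np

def trunk_depth(depth_map: np.ndarray, bbox) -> float:
    """Estimate trunk distance from a detector bounding box and an aligned
    depth map by taking the median of valid (non-zero) depth pixels inside
    the box; zeros mark missing stereo depth."""
    x1, y1, x2, y2 = bbox
    patch = depth_map[y1:y2, x1:x2].astype(np.float64)
    valid = patch[patch > 0]
    return float(np.median(valid)) if valid.size else float("nan")

# Toy 4x4 depth map in millimetres with one missing pixel inside the box.
depth = np.array([[0, 1500, 1520, 0],
                  [0, 1510,    0, 0],
                  [0, 1490, 1505, 0],
                  [0,    0,    0, 0]])
print(trunk_depth(depth, (1, 0, 3, 3)))  # 1505.0, median of the valid pixels
```

Given this distance and the camera intrinsics, the bounding box's pixel width and height convert to metric DBH and height estimates, which is the mensuration output the pipeline was designed to produce.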

Trunk Detection with YOLOv7
#

A YOLOv7 trunk detection model was trained on the ForTrunkDetV2 dataset to enable below-canopy tree identification. The model was trained for 100 epochs on an RTX 2070 workstation. While the model achieved detection capability, validation revealed overfitting on the objectness loss parameter, a known issue where the model becomes overly confident in its predictions on training data while generalising poorly. This was identified through analysis of the training curves and comparison with documented cases of the same behaviour in the YOLOv5/v7 community.

Communication
#

LoRa (Long Range) communication via SX1278 (Ra-02) modules was selected for inter-agent data relay due to its long range, low power consumption, and signal penetration through dense vegetation. Two Raspberry Pi nodes were configured as sender and receiver using the SX127x Python library. SPI communication was verified, and the sender script successfully transmitted packets. However, the receiver script did not capture incoming messages. Systematic debugging (verifying hardware connections, testing SPI independently, analysing LoRa configuration parameters) did not resolve the issue. Node initialisation was achieved, but end-to-end communication could not be demonstrated.

Technical Specifications
#

| Component | Specification |
| --- | --- |
| Above Canopy Platform | DJI Mavic 3E (45 min flight time, OcuSync 3.0) |
| Below Canopy Platform | DJI Tello (13 min flight time, WiFi) |
| Ground Platform | Petoi Bittle (ESP32 BiBoard, P1S servos, 3 DOF per leg) |
| Multispectral Camera | Custom TetraPi-based: RPi 3A+, Arducam multiplexer, 2x RPi Camera, 2x NoIR Camera, 532nm and 570nm bandpass filters |
| Depth Camera | Luxonis OAK-D (Myriad X VPU, 4056x3040 colour, 1280x800 depth) |
| LIDAR | RPLIDAR A1M8-R6 (0.15-12m range, 4000-8000Hz sampling) |
| RL Framework | NVIDIA Isaac Gym, PPO (actor-critic, based on ANYmal policy) |
| Aerial Segmentation | VGG16-UNet (TensorFlow/Keras, 23 classes, Graz Semantic Drone Dataset) |
| Trunk Detection | YOLOv7 (PyTorch/Darknet, ForTrunkDetV2 dataset) |
| Communication | LoRa SX1278 (Ra-02), initialised but not demonstrated end-to-end |
| Supervisor | Professor Christopher Freeman |
| Second Examiner | Dr Mohammad Divband Soorati |
| Mark | 89% (First Class, one of the top marks in the cohort) |