SPARK Challenge 2026 — Technical Documentation

Technical details of the event-based spacecraft pose estimation pipeline developed for the SPARK Challenge 2026 (AI4Space @ CVPR 2026, Stream 2).

1. Event Camera Data & the SPADES Dataset

Unlike conventional frame-based cameras, event cameras (neuromorphic sensors) output asynchronous per-pixel brightness changes. Each event is a tuple (x, y, t, p) representing the pixel coordinates, the timestamp, and the polarity (positive or negative brightness change).
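
Concretely, a stream of such tuples can be held in a NumPy structured array. A minimal sketch (the field names and the microsecond timestamp unit are assumptions for illustration):

```python
import numpy as np

# Compact structured-array layout for an event stream.
event_dtype = np.dtype([
    ('x', np.uint16),  # pixel column
    ('y', np.uint16),  # pixel row
    ('t', np.int64),   # timestamp (e.g. microseconds; unit is an assumption)
    ('p', np.uint8),   # polarity: 1 = brightness increase, 0 = decrease
])

events = np.array([(12, 34, 1000, 1), (13, 34, 1010, 0)], dtype=event_dtype)
```

This layout keeps the four fields addressable by name (`events['x']`, `events['p']`, …), which is how the encoding code below accesses them.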

The SPADES dataset (SPAcecraft Pose Estimation Dataset using Event Sensing) provides event streams of a spacecraft model under varied lighting and motion profiles, focusing on the Proba-2 satellite. It is split into two categories:

  • Synthetic Dataset: Generated using Unreal Engine (UE) with dynamic backgrounds (animated Sun and Earth), and the ICNS event simulator using Blender to model the neuromorphic sensor. It contains 300 trajectories with 179,400 ground-truth pose labels.
  • Real Dataset: Collected at the Zero-G Laboratory at the SnT, University of Luxembourg, using a scaled mockup of the Proba-2 satellite and a Prophesee Metavision EVK4-HD event vision sensor. It contains 32 trajectories with 16,900 pose labels.

2. Event Frame Encoding

To improve training efficiency and robustness, raw event streams were converted into 2-channel event frames (accumulated polarity surfaces):

# Event frame parameters
# Output: tensor of shape [C=2, H, W]
# channel 0: count of negative (off) events
# channel 1: count of positive (on) events

import numpy as np

def events_to_event_frame(events, H, W):
    """Convert an event stream to a 2-channel event frame."""
    frame = np.zeros((2, H, W), dtype=np.float32)

    # Accumulate event counts per pixel, indexed by polarity
    for i in range(len(events)):
        x, y = events['x'][i], events['y'][i]
        p = events['p'][i]  # 0 (off) or 1 (on)
        frame[p, y, x] += 1.0

    # Log transformation for dynamic range compression
    frame = np.log1p(frame)
    return frame
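
For long event streams, the per-event Python loop is slow. An equivalent vectorized accumulation with `np.add.at`, a sketch assuming the same `(x, y, p)` field layout:

```python
import numpy as np

def events_to_event_frame_vectorized(events, H, W):
    """Vectorized equivalent of the per-event accumulation loop."""
    frame = np.zeros((2, H, W), dtype=np.float32)
    x = np.asarray(events['x'], dtype=np.intp)
    y = np.asarray(events['y'], dtype=np.intp)
    p = np.asarray(events['p'], dtype=np.intp)  # 0 (off) or 1 (on)
    # np.add.at performs unbuffered in-place accumulation, so pixels hit
    # by multiple events are counted correctly (plain fancy-index += is not)
    np.add.at(frame, (p, y, x), 1.0)
    return np.log1p(frame)
```

Note that a naive `frame[p, y, x] += 1.0` with index arrays would silently drop repeated hits to the same pixel, which is why `np.add.at` is needed here.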

This encoding significantly reduces the input dimensionality compared to voxel grids, allowing for deeper backbones while maintaining the critical spatial features of the spacecraft model.

3. Model Architecture

The final solution utilized a modular PoseNet architecture supporting high-performance backbones like EfficientNet-B3/B4:

import torch.nn as nn
# PretrainedEfficientNet and PoseHead are project-internal modules

class PoseNet(nn.Module):
    def __init__(self, backbone="efficientnet-b3", in_channels=2):
        super().__init__()
        # Using Pretrained EfficientNet with modified input layer
        self.backbone = PretrainedEfficientNet(
            backbone_type=backbone,
            in_channels=in_channels,
            feature_dim=512,
            pretrained=True
        )
        
        # Modular Pose Head for Translation and Rotation
        self.pose_head = PoseHead(
            feature_dim=512,
            hidden_dim=256,
            dropout=0.1
        )
    
    def forward(self, x):
        features = self.backbone(x)
        translation, quaternion = self.pose_head(features)
        return translation, quaternion
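
The `PoseHead` module itself is not shown above. A minimal two-branch head consistent with the interface (a hypothetical sketch, not the actual implementation) could look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseHead(nn.Module):
    """Minimal two-branch head: 3-DoF translation and a unit quaternion."""
    def __init__(self, feature_dim=512, hidden_dim=256, dropout=0.1):
        super().__init__()
        def branch(out_dim):
            return nn.Sequential(
                nn.Linear(feature_dim, hidden_dim),
                nn.ReLU(inplace=True),
                nn.Dropout(dropout),
                nn.Linear(hidden_dim, out_dim),
            )
        self.trans_branch = branch(3)  # (tx, ty, tz)
        self.rot_branch = branch(4)    # raw quaternion, normalized in forward()

    def forward(self, features):
        translation = self.trans_branch(features)
        # Normalize so the rotation output always lies on the unit sphere
        quaternion = F.normalize(self.rot_branch(features), dim=-1)
        return translation, quaternion
```

Normalizing the quaternion inside the head keeps downstream losses (e.g. the geodesic loss below) well defined without a separate constraint term.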

Integration with Feature Pyramid Networks (FPN) further enhanced the model's ability to detect the spacecraft across varying distances and scales.

4. Loss Functions

We transitioned to a more robust loss formulation that decouples the optimization of translation and rotation:

# Translation: SmoothL1 is more robust to outliers in spacecraft trajectories
translation_loss = F.smooth_l1_loss(t_pred, t_gt)

# Rotation: geodesic loss on the unit-quaternion manifold (double cover of SO(3))
def geodesic_loss(q_pred, q_gt):
    # |<q_pred, q_gt>| handles the q / -q ambiguity of unit quaternions
    dot = torch.abs(torch.sum(q_pred * q_gt, dim=-1))
    dot = torch.clamp(dot, -1.0 + 1e-7, 1.0 - 1e-7)  # keep acos differentiable
    # The true geodesic angle is 2 * acos(|dot|); the constant factor is
    # absorbed into lambda_rot
    return torch.mean(torch.acos(dot))

total_loss = translation_loss + (lambda_rot * geodesic_loss(q_pred, q_gt))

In some model versions, we also explored a continuous 6D rotation representation, which maps raw network outputs to orthonormal rotation matrices and provides smoother gradient flow during training.
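
The 6D representation (Zhou et al., CVPR 2019) takes two raw 3-vectors from the network and Gram-Schmidt-orthonormalizes them into a rotation matrix. A NumPy sketch of the mapping (in training it would run on batched torch tensors):

```python
import numpy as np

def rotation_6d_to_matrix(x6):
    """Map a raw 6D vector to a 3x3 rotation matrix via Gram-Schmidt."""
    a1, a2 = x6[:3], x6[3:]
    b1 = a1 / np.linalg.norm(a1)
    a2_orth = a2 - np.dot(b1, a2) * b1      # remove the component along b1
    b2 = a2_orth / np.linalg.norm(a2_orth)
    b3 = np.cross(b1, b2)                   # completes a right-handed frame
    return np.stack([b1, b2, b3], axis=-1)  # columns are b1, b2, b3
```

Because every (non-degenerate) 6D input maps to a valid rotation, the representation avoids the discontinuities of quaternions and Euler angles.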

5. Training Configuration

Parameter        Value
---------        -----
GPU              NVIDIA RTX 4090
Backbone         EfficientNet-B3 / B4 + FPN
Input channels   2 (polarity frames)
Augmentations    Density augmentation, scale/shift
Optimizer        AdamW (LR = 1e-4)

6. Post-Processing & Refinement

Hybrid Pose Construction

By ensembling multiple models, we found that certain training versions (v7) excelled at translation, while others (v15) were more stable in rotation. Our final submission combined the outputs from these specialized experts.
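
The combination itself is simple field-level selection. A sketch (the version names come from the source; the function is illustrative):

```python
def hybrid_pose(pose_trans_expert, pose_rot_expert):
    """Combine experts: translation from one model (e.g. v7),
    rotation from another (e.g. v15). Each pose is a
    (translation, quaternion) pair."""
    translation, _ = pose_trans_expert
    _, quaternion = pose_rot_expert
    return translation, quaternion
```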

SLERP Smoothing

To eliminate temporal jitter in rotation, we applied Spherical Linear Interpolation (SLERP) across the sequence of predicted quaternions. This ensured that the estimated spacecraft trajectory was physically consistent and smooth.
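
A SLERP primitive and one simple neighbor-midpoint smoothing pass (the exact smoothing scheme used in the pipeline is an assumption; this sketches the idea):

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:             # take the short path on the hypersphere
        q1, dot = -q1, -dot
    if dot > 0.9995:          # nearly parallel: fall back to normalized lerp
        q = (1.0 - t) * q0 + t * q1
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1.0 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def smooth_quaternions(quats, alpha=0.5):
    """Blend each quaternion toward the SLERP midpoint of its neighbors."""
    quats = np.asarray(quats, dtype=np.float64)
    out = quats.copy()
    for i in range(1, len(quats) - 1):
        midpoint = slerp(quats[i - 1], quats[i + 1], 0.5)
        out[i] = slerp(quats[i], midpoint, alpha)
    return out
```

Interpolating on the unit sphere (rather than averaging raw quaternion components) keeps every smoothed output a valid rotation.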

7. Results

  • Global Rank: 9th Place (Stream 2)
  • Total Error: 0.125 (Aggregated Metric)
  • Competition: SPARK 2026 @ CVPR 2026 AI4Space workshop