ETL Pipeline for Real Estate Listing Syndication (alias Ligneurs)

ETL pipeline from the Akeneo PIM to real estate portals, with multi-format delivery (XML, CSV, JSON), run over 4 years of continuous operation.

January 2019 - 2023
~4 years
Technical Lead then Project Manager
Groupe Pichet
PHP · Symfony · Akeneo PIM v2 · REST API · XML · CSV · JSON · FTP/SFTP · GitLab CI · Docker · Kubernetes (K8s) · MySQL

Partner Portals

Several dozen

Migrated, integrated and maintained

Export Formats

3

XML, CSV, JSON

Project Duration

~4 years

Continuous evolution

Availability

99.5%+

Over 4 years of continuous operation

Presentation

Project definition and scope

System Overview

The "Export Ligneurs" system is Groupe Pichet's automated real estate listing distribution engine. It extracts program and lot data from the Akeneo PIM, transforms it into the specific format required by each partner (XML, CSV, or JSON), and delivers it automatically to real estate distribution platforms.

The system serves as the critical link between the company's product data and its commercial visibility: every property listing published on major French real estate portals (SeLoger, LeBonCoin, BienIci, LogicImmo...) passes through this pipeline. Any interruption or data inconsistency directly translates into lost leads and missed sales opportunities.

As the sole technical owner of this system, I was responsible for all architecture decisions, development, deployment, monitoring, and incident response - with full accountability for a pipeline feeding an estimated ***K euros/month in lead acquisition.

Nature

Automated ETL pipeline (Extract-Transform-Load) for multi-channel real estate ad distribution

Domain

Real Estate / PropTech - B2B (internal teams, partner portals) and B2C (indirect, end buyers)

Functional Scope
  • Automated data extraction from the Akeneo PIM v2 REST API
  • Per-partner format transformation (XML, CSV, JSON)
  • FTP/SFTP automated delivery to several dozen partner platforms
  • Multi-format image adaptation (4/3, 16/9, panoramic, square)
  • Property typology mapping (apartment, house, duplex, triplex, studio, T1-T5+)
  • Execution monitoring with email alerts and a centralized monitoring system
  • Individual partner activation/deactivation capability
  • SKU matching algorithm for real vs. manually-created PIM programs
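The per-partner format transformation listed above can be sketched as follows. This is a minimal illustration in Python, not the production PHP/Symfony code; the lot fields (`sku`, `typology`, `price`, `surface_m2`), the `;` CSV delimiter and the XML element names are all assumptions made for the example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical lot record: field names are illustrative, not the real PIM schema.
LOT = {"sku": "PRG001-A12", "typology": "T3", "price": 245000, "surface_m2": 64.5}


def to_json(lots):
    """Serialize lots as a JSON array."""
    return json.dumps(lots, ensure_ascii=False)


def to_csv(lots, delimiter=";"):
    """Serialize lots as CSV with a header row taken from the first lot."""
    if not lots:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(lots[0].keys()),
        delimiter=delimiter, lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(lots)
    return buffer.getvalue()


def to_xml(lots):
    """Serialize lots as <lots><lot sku="..."><field>value</field></lot></lots>."""
    root = ET.Element("lots")
    for lot in lots:
        node = ET.SubElement(root, "lot", sku=str(lot["sku"]))
        for key, value in lot.items():
            if key != "sku":
                ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# One serializer per export format; a partner configuration only names the format.
SERIALIZERS = {"json": to_json, "csv": to_csv, "xml": to_xml}


def render(lots, fmt):
    """Render the same lots in the format a given partner expects."""
    return SERIALIZERS[fmt](lots)
```

Calling `render([LOT], "csv")` and `render([LOT], "xml")` produces two views of the same record. In practice each partner would also impose its own field names and ordering on top of the format.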
System Architecture
Export Ligneurs - System Architecture Overview
Technology Choices & Rationale

State of the art in 2019

Stack aligned with the B2B integration standard at the time: batch ETL and FTP/SFTP were the norm before webhooks and event-driven architectures became mainstream.

PHP / Symfony

Consistent with the existing backend ecosystem. Symfony Console provided a solid framework for scheduled batch command execution.

Akeneo PIM v2

Strategic company choice for product catalog management. Its REST API provided structured access to all program and lot data with versioned endpoints.
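As a rough sketch of the extraction step, the loop below walks a paginated listing until no next page remains. It is written in Python for illustration only and assumes a HAL-style response with `_embedded.items` and `_links.next.href`, which is how the Akeneo REST API is understood to paginate. The `fetch_json` callable, the page keys and the sample identifiers are hypothetical stand-ins for an authenticated HTTP client and real data.

```python
from typing import Callable, Iterator


def iter_products(fetch_json: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Yield every item of a paginated listing, following 'next' links.

    fetch_json is injected so the loop can be exercised without a network:
    it takes a URL and returns the decoded JSON page as a dict.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("_embedded", {}).get("items", [])
        # Stop when the page carries no 'next' link.
        url = page.get("_links", {}).get("next", {}).get("href")


# Fake two-page response used in place of real HTTP calls (illustrative data).
FAKE_PAGES = {
    "page-1": {
        "_links": {"next": {"href": "page-2"}},
        "_embedded": {"items": [{"identifier": "LOT-1"}, {"identifier": "LOT-2"}]},
    },
    "page-2": {
        "_links": {},
        "_embedded": {"items": [{"identifier": "LOT-3"}]},
    },
}

identifiers = [
    item["identifier"] for item in iter_products(FAKE_PAGES.__getitem__, "page-1")
]
```

Injecting the fetch function keeps the pagination logic separate from authentication and transport, which also makes it easy to wrap the real client with retry logic.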

Docker / Kubernetes

Each export job isolated in its own container, preventing resource conflicts between partner modules. K8s on AWS EKS handled scheduling and auto-recovery of failed jobs.

GitLab CI

Automated the build-test-deploy cycle for each partner module independently, allowing targeted deployments without impacting other active feeds.

Objectives, Context, Stakes & Risks

Strategic vision and constraints

Objectives
  1. Migrate all export feeds from the legacy PIM v1.4 to the new Akeneo PIM v2
  2. Execute the migration partner by partner, with business validation at each step
  3. Verify data consistency between the source PIM and the feeds sent to portals
  4. Handle each portal's specificities (image formats, typologies, required fields)
  5. Automate feed supervision (error alerts, execution reports)
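The consistency objective (source PIM versus feed actually sent) could be checked along these lines. This is a hedged Python sketch, not the original PHP implementation; it assumes both sides can be reduced to a `sku -> price` mapping, and the `ConsistencyReport` name and its fields are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ConsistencyReport:
    """Differences between what the PIM holds and what a feed contains."""
    missing_in_feed: list = field(default_factory=list)
    unexpected_in_feed: list = field(default_factory=list)
    price_mismatches: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not (
            self.missing_in_feed or self.unexpected_in_feed or self.price_mismatches
        )


def compare(source: dict, feed: dict) -> ConsistencyReport:
    """Compare two sku -> price mappings and list every discrepancy."""
    shared = source.keys() & feed.keys()
    return ConsistencyReport(
        missing_in_feed=sorted(source.keys() - feed.keys()),
        unexpected_in_feed=sorted(feed.keys() - source.keys()),
        price_mismatches=sorted(sku for sku in shared if source[sku] != feed[sku]),
    )


# Illustrative data only.
pim = {"LOT-1": 200000, "LOT-2": 310000, "LOT-3": 155000}
sent = {"LOT-1": 200000, "LOT-2": 301000, "LOT-4": 99000}
report = compare(pim, sent)
```

A report like this can serve as input to the business validation step: an empty report is a precondition for go-live, and a non-empty one lists exactly which lots to inspect.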
Context

The project started with the knowledge transfer from Andoni L. in January 2019. The existing system ran on the legacy PIM v1.4 and needed to be fully migrated to Akeneo PIM v2 while maintaining continuous service to all partner portals.

The migration had to be performed portal by portal - each with its own format specifications, required fields, image constraints, and property typology mappings - making it impossible to execute as a single "big bang" migration. Each partner required individual validation by the business teams before going live.

The system was embedded in a larger data ecosystem: upstream data came from the accounting software and in-house ERPs feeding the PIM, while downstream the feeds connected to around a hundred lead suppliers generating an estimated 1 lead every 2 seconds across all portals.

Stakes

The partner portals (SeLoger, LeBonCoin, BienIci...) are major lead acquisition channels in the real estate market. Any interruption or error in the feeds directly translates into lost leads and reduced commercial pipeline. With several dozen partners to migrate individually, the project required sustained attention over multiple years while maintaining zero downtime on active feeds.

Risks

Data Inconsistency

Risk of publishing incorrect prices, wrong images, or missing properties on partner portals - directly impacting buyer trust and commercial results.

Service Interruption

Any feed failure means properties disappear from partner portals, causing immediate lead loss for the commercial teams.

Format Divergence

Each portal has unique requirements (image ratios, typology codes, required fields) - a generic approach was impossible.
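To make the divergence concrete, here is a small Python sketch of per-portal typology mapping. The portal names and every code in the table are invented for illustration; real portals publish their own code lists.

```python
# Hypothetical per-portal code tables: none of these codes come from a real portal.
TYPOLOGY_CODES = {
    "portal_a": {"studio": "STU", "T1": "1P", "T2": "2P", "T3": "3P", "house": "MAI"},
    "portal_b": {"studio": 1, "T1": 1, "T2": 2, "T3": 3, "house": 10},
}


class UnmappedTypology(ValueError):
    """Raised when a portal has no code for an internal typology."""


def map_typology(portal: str, typology: str):
    """Translate an internal typology into the code a given portal expects."""
    try:
        return TYPOLOGY_CODES[portal][typology]
    except KeyError:
        raise UnmappedTypology(
            f"no mapping for portal={portal!r} typology={typology!r}"
        ) from None
```

Failing loudly on an unmapped value, rather than sending a default, keeps a wrong typology from reaching a portal unnoticed.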

API Instability

Akeneo PIM API connection issues could block all exports simultaneously, requiring solid error handling and retry logic.
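One common way to handle this kind of instability combines exponential backoff with a circuit breaker. The Python sketch below shows the general technique only; the thresholds, delays and class names are assumptions, and the production code was PHP/Symfony.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when calls are refused because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("PIM API circuit is open")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result


def retry_with_backoff(func, attempts=4, base_delay=1.0, factor=2.0, sleep=time.sleep):
    """Call func, waiting base_delay * factor**n between failed attempts."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # retrying is pointless while the circuit is open
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * factor ** attempt)
```

A typical combination is `retry_with_backoff(lambda: breaker.call(fetch))`: short glitches are absorbed by the retries, while a sustained outage opens the circuit so that every export job fails fast instead of piling requests onto the API.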

Key Architecture Decisions

Modular per-partner architecture

Decision: One isolated module per portal instead of a generic engine

Rationale: Fault isolation: a bug in one module cannot affect other partners. Independent deployment and testing per feed.
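The fault-isolation idea can be illustrated in a few lines of Python. In the real system the isolation was enforced at the container level, so this in-process version is only an analogy; `PartnerModule`, `run_all` and the portal names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PartnerModule:
    name: str
    export: Callable[[list], str]  # builds and delivers one partner's feed
    enabled: bool = True           # per-partner activation switch


def run_all(modules, lots):
    """Run every enabled module; a failure in one never stops the others."""
    results = {}
    for module in modules:
        if not module.enabled:
            results[module.name] = "skipped"
            continue
        try:
            module.export(lots)
            results[module.name] = "ok"
        except Exception as error:  # contain the failure to this partner
            results[module.name] = f"failed: {error}"
    return results
```

The returned dictionary doubles as an execution report: one status per partner, which is the shape an alerting step would need.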

Progressive migration over big-bang

Decision: Portal-by-portal migration with business validation at each step

Rationale: Blast radius limited to one partner at a time, with immediate rollback capability if issues arise.

ETL batch processing over real-time streaming

Decision: Scheduled batch exports via CRON jobs rather than event-driven publishing

Rationale: Partners consumed data via FTP/SFTP drops, not webhooks. Real-time would have added complexity without benefit.

Multi-format image pre-generation

Decision: Pre-generate all image variants centrally rather than on-demand per partner

Rationale: Avoids redundant processing of the same image across portals and ensures upstream compliance.
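The geometry behind pre-generating variants is simple enough to sketch: for each target ratio, compute the largest centered crop of the source image. The Python below is illustrative only. The 3:1 value for "panoramic" is an assumption, and the actual pipeline may have resized or padded rather than cropped.

```python
from fractions import Fraction

# Target ratios named in the functional scope; the panoramic value is assumed.
TARGET_RATIOS = {
    "4/3": Fraction(4, 3),
    "16/9": Fraction(16, 9),
    "square": Fraction(1, 1),
    "panoramic": Fraction(3, 1),
}


def center_crop_box(width: int, height: int, ratio: Fraction):
    """Return (left, top, right, bottom) of the largest centered crop with this ratio."""
    if Fraction(width, height) > ratio:
        # Source is too wide: keep the full height and trim the sides.
        crop_w, crop_h = int(height * ratio), height
    else:
        # Source is too tall (or already matches): keep the full width.
        crop_w, crop_h = width, int(width / ratio)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


def all_variants(width: int, height: int):
    """Compute every crop box once, so each portal can reuse the stored result."""
    return {name: center_crop_box(width, height, r) for name, r in TARGET_RATIOS.items()}
```

For a 1920x1080 source, the 4/3 variant is the box `(240, 0, 1680, 1080)`. The tuple order matches what image libraries such as Pillow accept for cropping.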

ETL Data Pipeline
Extract-Transform-Load pipeline for partner feed generation

The Steps - What I Did

Chronological progression of the project

Phase 1
Knowledge Transfer & Initial Migration
January 2019
  • I became the sole technical owner within 2 weeks of the handover from Andoni L.
  • I shipped the first migration batch: SeLoger Neuf, LogicImmo, TULN, Paru Vendu
  • As project lead, I framed the migration roadmap with the business teams and defined partner-by-partner validation milestones
  • To secure the subsequent migrations, I established an acceptance checklist that I reused throughout the project
Phase 2
Feature Development & New Integrations
June - September 2019
  • As Technical Lead, I prioritized the integration backlog, arbitrating between business requests, partner constraints and technical capacity
  • I integrated BienIci with a dedicated image adaptation layer
  • For ImmoNeuf, I adapted the feed with a 16/9 to 4/3 image conversion
  • I stabilized the SeLoger and Knock feeds to improve reliability
Phase 3
Stabilization & Critical Fixes
January 2020
  • I added pricing validation guardrails before publication
  • To absorb API error spikes, I introduced a circuit breaker and exponential backoff on PIM API calls
  • I added structured logging to reduce incident diagnosis time
  • I ran the incident post-mortems and reported the corrective actions and timelines to the steering committee
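A pricing guardrail of the kind mentioned in this phase might look like the Python sketch below. The rules and thresholds (`min_price`, `max_change`) are assumptions chosen for illustration; the checks actually used in production are not described in this write-up.

```python
def validate_prices(lots, previous_prices, min_price=10_000, max_change=0.30):
    """Split lots into publishable ones and rejected ones with a reason.

    previous_prices maps sku -> price from the last successful export.
    """
    publishable, rejected = [], []
    for lot in lots:
        price = lot.get("price")
        previous = previous_prices.get(lot["sku"])
        if not isinstance(price, (int, float)) or price < min_price:
            rejected.append((lot["sku"], "missing or implausibly low price"))
        elif previous and abs(price - previous) / previous > max_change:
            rejected.append((lot["sku"], "price changed too sharply since last export"))
        else:
            publishable.append(lot)
    return publishable, rejected


# Illustrative data only.
lots = [
    {"sku": "LOT-1", "price": 250000},
    {"sku": "LOT-2", "price": 0},
    {"sku": "LOT-3", "price": 90000},
    {"sku": "LOT-4"},
]
ok_lots, bad_lots = validate_prices(lots, {"LOT-1": 245000, "LOT-3": 300000})
```

Holding back a suspicious lot and alerting on it is usually cheaper than publishing a wrong price to several portals at once.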
Phase 4
New Partners & Continuous Evolution
June 2020 - 2023
  • I built the Investimeo and BienIci integrations from scratch
  • As Technical Lead / Project Manager, I framed the multi-year roadmap with management and negotiated the technical scope and integration SLA with each new partner
  • I drove the clean removal of the Marketshot partner without side effects on the other feeds
  • I resolved the NEEDOCS, BienIci and Green Valley anomalies

Actors & Interactions

Who I interacted with directly and how I collaborated

Coordination and Collaboration

As the sole technical owner, I coordinated directly with business stakeholders, external vendors and partner portals. For each migration, I defined the acceptance criteria, ran the validation cycles and made the go/no-go call for production deployment. I learned to translate technical constraints into business terms, and business needs into technical ones, to keep everyone aligned.

Andoni L. (Predecessor) · Gaetan B. (Business referent) · Leslie A. (Business referent) · Franck C. (Manager, N+1) · Sebastien B. (Vendor team)

Results

Impact for me and for the company

For Me
  • I took full technical ownership of a business-critical system directly impacting revenue
  • I made autonomous architecture decisions, with full accountability for reliability and data accuracy
  • Over 4 years, I led this project with business teams, external vendors and several dozen partner portals
  • I held the end-to-end lifecycle: architecture, development, deployment, monitoring and incident response
  • This project changed the way I work: it placed me in cross-functional leadership on validation processes and partner onboarding
For the Company
  • Several dozen partner portals migrated from PIM v1.4 to Akeneo PIM v2 with zero service interruption
  • 2 new partner integrations built from scratch (BienIci, Investimeo)
  • Several thousand listings processed daily across all partner portals
  • 99.5%+ availability over 4 years, average incident resolution under 4 hours
  • Standardized property typology across all feeds, reducing data inconsistency reports
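As a point of reference for the availability figure, the arithmetic below converts an availability ratio into a downtime budget. It is a back-of-the-envelope Python sketch that assumes 365-day years and round-the-clock measurement; how availability was actually measured is not stated here.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap days


def allowed_downtime_hours(availability: float, years: float) -> float:
    """Total downtime, in hours, compatible with a given availability ratio."""
    return (1 - availability) * years * HOURS_PER_YEAR


# 99.5% over 4 years leaves a budget of roughly 175 hours in total.
budget = allowed_downtime_hours(0.995, 4)
```

Spread evenly, that budget is a little under 44 hours per year, which gives a sense of scale next to an average incident resolution time of under 4 hours.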
Partner Type Distribution
Export Format Distribution

Project Aftermath

What happened after delivery

System Evolution

Immediate aftermath: After the 2019 migration wave, the system entered a continuous maintenance phase with new partner additions and anomaly resolution as needed.

Medium term: Resilience proven over 4 years, handling partner format changes and internal data model evolutions without disruption.

Long-term perspective: Became a foundational piece of infrastructure feeding the commercial pipeline. Modular architecture allowed scaling across several dozen portals without fundamental redesign, and any developer could add a new partner by following the established patterns.

Technical Effort Distribution

Critical Reflection

With hindsight, how I judge this project

What worked well
  • The portal-by-portal migration proved its worth: minimal risk, business validation at each step, immediate rollback
  • I stand behind the modular architecture I put in place: I could add, modify or deactivate feeds without side effects
  • Thanks to the onboarding I standardized, new-partner integration dropped from weeks to days
What could have been better
  • I would have built a centralized monitoring dashboard rather than watching individual email alerts
  • I would also have set up per-partner automated integration tests earlier to catch regressions upstream
If I had to redo it today
  • In 2026, I would pick an event-driven approach (Kafka/RabbitMQ) instead of CRON batch, with observability-first monitoring (OpenTelemetry, Grafana Tempo)
  • I would set up a partner specification registry from day one to halve onboarding time
  • I would add automated integration tests against each partner schema before deployment
  • I would build a centralized real-time monitoring dashboard instead of relying on email alerts
The lasting lessons this project brought me
  • My main takeaway is that in multi-partner systems there is no "one size fits all": each integration has its own constraints
  • I learned that long-running projects require a maintenance mindset from day one
  • For revenue-critical systems, I found that observability matters more than preventing every single failure

Need an ETL syndication pipeline designed?

I delivered a multi-portal ETL syndication pipeline: PIM extraction, multi-format transformation (XML/CSV/JSON), FTP/SFTP delivery and monitoring over 4 years of continuous operation. Let's talk about your context.
