Open source consulting · Paris · Remote · On-site

Data science consulting
that ships to production.

We work on the full stack — from problem framing and data strategy to model development and deployment. Open-source tools only. Your code, your models, your infrastructure.

our_process.sh

# how every engagement works

$ git clone your_problem
✔ framing done  # what's the actual business question?
✔ data audit    # what do you have, what's missing?
✔ model built   # python / R / open LLM, reproducible
✔ deployed      # fastapi · docker · gitlab CI · on-premise
✔ handed over   # docs + training, team is autonomous

$ echo "result: yours to own and extend"

our process

Four phases, one objective: production

Every engagement follows the same discipline — from the first conversation to the last line of documentation.

🎯

Strategy & framing

We start by identifying the real business problem — not the data problem someone thinks they have. What's the decision this model needs to support? What does success look like in production? What data exists, and what's missing?

What to build? What data exists? What's the ROI? What's the risk?

🔍

Analysis & modelling

Exploratory analysis, feature engineering, model selection and validation. We use Python (scikit-learn, PyTorch, statsmodels) or R (tidymodels, brms) depending on what fits your problem. Experiments tracked with MLflow from day one.

What's the insight? Which model? How to validate?
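As an illustration of the "how to validate?" question, here is a minimal sketch of k-fold cross-validation in plain Python. It is not taken from any client project: the toy data, the function names and the mean-predictor baseline are all assumptions made for the example. In practice scikit-learn's `KFold` and `cross_val_score` would do this job.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the splits reproducible
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate_baseline(ys, k=5):
    """Score a mean-predictor baseline with mean absolute error on each fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(ys), k):
        baseline = mean(ys[j] for j in train_idx)  # the "model" is the training mean
        scores.append(mean(abs(ys[j] - baseline) for j in test_idx))
    return scores


# Hypothetical toy target values; a real project would load the actual dataset.
ys = [2.0 * x + 1.0 for x in range(20)]
fold_scores = cross_validate_baseline(ys, k=5)
print("baseline MAE per fold:", [round(s, 2) for s in fold_scores])
```

Any candidate model should beat this kind of naive baseline on held-out folds before it is considered for production.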

⚙️

Development & deployment

Production-grade code with uv-managed environments, GitLab CI/CD pipelines, containerised with Docker, exposed via FastAPI or served via Streamlit/Dash. On-premise or your cloud — your call. No SaaS dependency.

FastAPI / Docker · GitLab CI/CD · On-premise

🎓

Transfer & autonomy

Documentation, code review sessions and hands-on training so your team can maintain, extend and re-train the system. The engagement ends when your team no longer needs us. You are welcome back for the next challenge.

Full docs · Team training · Handover

services

What we build

Every service uses open-source tooling, ships with reproducible environments and is handed over with documentation.

🤖

AI & LLM systems

RAG pipelines on your proprietary documents, fine-tuned open LLMs, AI agents with tool use. Deployed on-premise — your data never leaves your infrastructure. Fully auditable, AI Act compliant.

RAG · Mistral AI · Llama · LangChain · LlamaIndex · on-premise · AI Act
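To make the retrieval step of a RAG pipeline concrete, here is a minimal sketch in standard-library Python. It assumes a bag-of-words cosine similarity in place of a real embedding model and vector store, and the documents, function names and prompt wording are all hypothetical. The point is the shape of the flow: rank the documents against the question, then pass only the best passages to a locally hosted LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    q_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q_vec, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, passages):
    """Assemble the context-augmented prompt for the language model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical in-house documents; everything stays inside this process.
DOCS = [
    "Invoices must be approved by the finance team within ten days.",
    "The VPN certificate is renewed every twelve months.",
    "Travel expenses above 500 euros need manager approval.",
]
question = "Who approves invoices?"
prompt = build_prompt(question, retrieve(question, DOCS, top_k=1))
print(prompt)
```

A production pipeline would swap the word counts for dense embeddings and add chunking, but retrieval and prompt assembly keep this structure.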

⚙️

Python ML / MLOps

Custom machine learning models, from notebook to production API. uv for dependency management, MLflow for experiment tracking, FastAPI for serving, Docker for containerisation, GitLab CI for automated pipelines.

uv · GitLab CI/CD · FastAPI · MLflow · scikit-learn · PyTorch · Docker

📈

Data apps & dashboards

Interactive data applications built with Streamlit or Dash (Python) and Shiny (R). From rapid internal prototypes to production-grade apps served behind authentication, deployed on your infrastructure or via Docker.

Streamlit · Dash · Shiny · Plotly · ggplot2 · Quarto · Docker

📊

R & statistical modelling

Advanced statistical modelling, econometrics, survival analysis, time-series forecasting, NLP. SAS-to-R migration with validation at every step. Posit infrastructure setup (Workbench, Connect, Package Manager).

R · tidymodels · brms · forecast · renv · Posit · SAS migration

🔬

On-demand data analysis

A precise business question + your data = an interpreted, actionable answer in 3–5 working days. Flexible time-credit system. Segmentation, NLP, forecasting, scoring, survival analysis, experiment design.

NLP · forecasting · segmentation · scoring · DOE · survival

open source stack

Everything we use is open source

No black boxes. Every tool is inspectable, forkable, and replaceable. You're never locked in.

🐍 Python

ML & AI

scikit-learn · PyTorch · Hugging Face · LangChain · LlamaIndex · Ollama · vLLM

📦 Python

Packaging & quality

uv · ruff · mypy · pytest · pre-commit

📊 R

Stats & viz

tidyverse · tidymodels · ggplot2 · brms · forecast · renv

🔧 DevOps

CI/CD & deployment

GitLab CI/CD · Docker · MLflow · DVC · FastAPI · Prefect

🖥️ Apps

Dashboards & apps

Streamlit · Dash · Shiny · Plotly · Quarto

🏗️ Infra

R environment

RStudio Server · Posit Workbench · Posit Connect · Positron

philosophy

Why open source only

A constraint that protects you — not us.

No lock-in

You own the models and the code

Everything we build runs on open tools. Swap models, change infrastructure, extend the codebase — no proprietary runtime, no licence fee, no renegotiation.

Auditability

Explain every decision

Open models are inspectable, which helps you satisfy internal auditors, GDPR requirements and EU AI Act obligations. Proprietary black-box APIs do not offer that level of inspection.

Data sovereignty

Your data stays on your infrastructure

We deploy on-premise or in your private cloud. Sensitive data — financial records, health data, legal documents — never touches a third-party API.

Reproducibility

Environments that work everywhere

uv lockfiles for Python, renv for R, Docker for deployment. Your GitLab pipeline runs the same code in dev, staging and production — no drift.
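One simple way to picture "no drift" is to fingerprint the lockfile in each environment and compare the digests. The sketch below assumes that approach. The lockfile contents, environment names and helper functions are hypothetical and simplified; they are not the actual `uv.lock` format and not a description of any real pipeline.

```python
import hashlib


def fingerprint(lockfile_text):
    """Return a SHA-256 digest of the lockfile contents."""
    return hashlib.sha256(lockfile_text.encode("utf-8")).hexdigest()


def detect_drift(fingerprints, reference_env="dev"):
    """Return the environments whose digest differs from the reference one."""
    reference = fingerprints[reference_env]
    return sorted(env for env, digest in fingerprints.items() if digest != reference)


# Hypothetical, simplified lockfile contents. In practice these would be read
# from the uv.lock (or renv.lock) file checked out in each environment.
dev_lock = 'name = "fastapi"\nversion = "0.110.0"\n'
staging_lock = 'name = "fastapi"\nversion = "0.110.0"\n'
prod_lock = 'name = "fastapi"\nversion = "0.109.2"\n'

digests = {
    "dev": fingerprint(dev_lock),
    "staging": fingerprint(staging_lock),
    "production": fingerprint(prod_lock),
}
drifted = detect_drift(digests)
print("environments out of sync:", drifted)
```

Here production is flagged because its pinned version differs from dev. A CI job could run a check like this and fail the pipeline when the list is not empty.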

track record

200+ projects delivered since 2012

Energy, finance, healthcare, retail, research, telecoms — production systems running at scale.

EDF · Engie · AXA · Dior · L'Oréal · Orange · LCL · OCDE · Ubisoft · Nestlé · Thales · Scania · CNRS · Cerba Healthcare · Coface · Institut Curie · Société Générale · Ifremer

Tell us about your project.

A few lines about your data, your problem and your constraints — we respond within 48 hours with an honest assessment of what's feasible and what it takes.

Phone: +33 1 72 25 40 82 · Email: info@stat4decision.com · Response within 48h
