Tutorial Schedule and Location
The ECML PKDD 2025 conference will be held in Porto, Portugal, from 15 to 19 September 2025. This tutorial will take about 4 hours, including a 30-minute break. The first part will be a presentation of the library and its metrics. The remaining time will be dedicated to hands-on sessions, one for each data modality. A detailed schedule is provided below.
pyMDMA presentation (15 mins) – Luís Rosado
- Introduction
- Target modalities
- Metric taxonomy description
- Available metrics, installation, and contribution
Image Tutorial (60 mins) – Ivo Façoco
- Dataset Presentation
- Public RGB dataset
- Input Validation
- Extraction of image quality metrics
- Distribution analysis of extracted metrics
- Synthetic Validation
- Synthetic dataset explanation (model used, number of instances and type of conditioning)
- Feature extraction with pre-trained models
- Evaluation of fidelity and diversity concepts
- Sample selection through quality-based ranking. Comparison of best/worst generated examples via metric outputs.
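As a rough illustration of the fidelity and diversity concepts listed above, the following minimal sketch computes a k-nearest-neighbour precision (fidelity) and recall (diversity) on toy 2D feature vectors. It uses only the Python standard library and is not the pyMDMA API; the function names `knn_radii` and `coverage` and the toy data are assumptions made for this example.

```python
import math


def knn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour in the same set."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def coverage(queries, references, k=1):
    """Fraction of queries that fall inside at least one reference k-NN ball."""
    radii = knn_radii(references, k)
    hits = sum(
        1
        for q in queries
        if any(math.dist(q, r) <= rad for r, rad in zip(references, radii))
    )
    return hits / len(queries)


# Toy feature vectors (hypothetical); in practice these would come from a
# pre-trained feature extractor applied to real and generated samples.
real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = [(0.5, 0.5), (0.9, 0.1), (5.0, 5.0), (0.1, 0.9)]

# Fidelity: how many synthetic samples lie on the real manifold.
precision = coverage(synthetic, real, k=1)
# Diversity: how many real samples are covered by the synthetic manifold.
recall = coverage(real, synthetic, k=1)

print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example the outlier at (5.0, 5.0) lowers precision, but its large neighbourhood radius also covers a real sample and so inflates recall, which is a known sensitivity of this family of estimators.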
Short Break (30 mins)
Time-Series Tutorial (60 mins) – Joana Rebelo
- Dataset Presentation
- Overview of the ECG dataset and its characteristics
- Input Validation
- Extraction of signal quality metrics
- Distribution analysis of extracted metrics
- Synthetic Validation
- Explanation of the synthetic dataset
- Feature extraction using the Time Series Feature Extraction Library (TSFEL)
- Evaluation of fidelity and diversity concepts
- Selection of synthetic samples using metric outputs
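The distribution analysis of extracted quality metrics mentioned in the input validation step could look roughly like the sketch below, which flags samples whose quality score falls below the lower interquartile-range fence. The scores are made-up values, and the 1.5 × IQR rule is one common convention assumed here, not necessarily the one used in the tutorial.

```python
import statistics

# Hypothetical per-sample quality scores (e.g. one score per ECG recording).
scores = [0.91, 0.88, 0.93, 0.90, 0.35, 0.89, 0.92, 0.87]

# Quartiles of the score distribution (inclusive method treats the data
# as the full population rather than a sample).
q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Samples below the lower fence are candidates for removal or inspection.
lower_fence = q1 - 1.5 * iqr
flagged = [i for i, s in enumerate(scores) if s < lower_fence]

print(f"lower fence={lower_fence:.3f}, flagged sample indices={flagged}")
```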
Tabular Tutorial (60 mins) – Pedro Matias
- Dataset Presentation
- Dataset Loading
- High-quality public tabular datasets
- Low-quality public tabular datasets
- Data Preparation
- Attribute type detection, encoding, and scaling
- Visualization through 2D embeddings
- Input Validation
- Extraction of tabular quality metrics
- Dataset selection through quality-based global ranking
- Synthetic Validation
- Synthetic Datasets
- Description of generative models (traditional vs. deep learning)
- Visualization of real vs. synthetic data using 2D embeddings
- Evaluation of fidelity and diversity concepts
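The data preparation step of the tabular session (attribute type detection, encoding, and scaling) can be sketched in plain Python as follows. This is only an illustrative assumption of what such a step may involve: the helper names `detect_types` and `prepare`, the choice of one-hot encoding and min-max scaling, and the toy rows are all hypothetical and not taken from the tutorial material.

```python
def detect_types(rows):
    """Label a column 'numeric' if all its values are int/float, else 'categorical'."""
    types = []
    for col in zip(*rows):
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in col
        )
        types.append("numeric" if is_numeric else "categorical")
    return types


def prepare(rows):
    """Min-max scale numeric columns and one-hot encode categorical ones."""
    encoded_columns = []
    for col, kind in zip(zip(*rows), detect_types(rows)):
        if kind == "numeric":
            lo, hi = min(col), max(col)
            span = (hi - lo) or 1.0  # avoid division by zero for constant columns
            encoded_columns.append([(v - lo) / span for v in col])
        else:
            # One output column per category, in sorted order for determinism.
            for category in sorted(set(col)):
                encoded_columns.append([1.0 if v == category else 0.0 for v in col])
    # Transpose back from columns to rows.
    return [list(r) for r in zip(*encoded_columns)]


# Toy table (hypothetical): a numeric, a categorical, and a numeric attribute.
rows = [(20, "red", 1.5), (30, "blue", 2.5), (40, "red", 3.5)]
types = detect_types(rows)
prepared = prepare(rows)

for row in prepared:
    print(row)
```

The resulting all-numeric rows are the kind of input that a 2D embedding method or a distance-based metric would typically expect.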