Feature Engineering and Selection | Engineering Practice

Practice: Feature Engineering and Selection

Purpose and Strategic Importance

Features are the vocabulary through which a model understands the world. In the majority of real-world AI applications, the quality, relevance, and integrity of features affect model performance more directly than the choice of algorithm. Teams that invest in disciplined feature engineering, creating meaningful representations of the underlying problem domain rather than simply feeding raw data to a model, tend to produce models that generalise better, require less training data, and are easier to interpret and maintain.

Feature selection also has safety implications. Proxy features, meaning variables that correlate with protected characteristics such as race or gender without directly representing them, can introduce discriminatory behaviour into models that ostensibly treat all users equally. Rigorous feature selection therefore examines what each feature actually represents and whether its inclusion could produce unfair outcomes, not just whether it improves predictive metrics.

Description of the Practice

Applies systematic analysis to create features that capture domain knowledge and meaningful signal, going beyond raw data fields to derived, aggregated, and transformed representations.
Evaluates features for predictive importance, redundancy, and correlation with protected characteristics, removing or transforming features that pose fairness risks.
Documents the rationale for each feature in the training set — what it represents, how it was derived, and why it was included — as part of model documentation.
Uses statistical and model-based selection techniques (e.g., mutual information, SHAP values, recursive feature elimination) to identify the most informative feature sets.
Versions and stores features in a feature store where appropriate, enabling reuse across models and ensuring consistency between training and inference.
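
As one illustration of the statistical selection techniques mentioned above, the sketch below ranks discrete features by their mutual information with the target. The toy rows, the column names (plan, region, churned), and the helper names are hypothetical and chosen only for illustration. A real pipeline would more likely rely on a library implementation and would also need to handle continuous features.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        p_indep = (px[x] / n) * (py[y] / n)
        mi += p_joint * log2(p_joint / p_indep)
    return mi


def rank_features(rows, target, feature_names):
    """Return (feature, score) pairs sorted from most to least informative."""
    ys = [row[target] for row in rows]
    scores = {
        name: mutual_information([row[name] for row in rows], ys)
        for name in feature_names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: "plan" determines the target, "region" is unrelated.
rows = [
    {"plan": "basic", "region": "north", "churned": 1},
    {"plan": "basic", "region": "south", "churned": 1},
    {"plan": "premium", "region": "north", "churned": 0},
    {"plan": "premium", "region": "south", "churned": 0},
]

ranked = rank_features(rows, target="churned", feature_names=["region", "plan"])
for name, score in ranked:
    print(f"{name}: {score:.3f} bits")
```

A high score only indicates statistical association with the target. It says nothing about whether the feature is a fair or leak-free input, so a ranking like this informs the selection decision rather than making it.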

How to Practise It (Playbook)

1. Getting Started

Begin with domain expertise — involve subject matter experts in feature ideation before turning to statistical techniques, because the best features encode genuine domain knowledge.
Conduct exploratory data analysis on candidate features to understand their distributions, missing value patterns, correlations, and relationships with the target variable.
Screen for proxy discrimination by examining correlations between candidate features and protected characteristics, flagging any that warrant deeper scrutiny before inclusion.
Build a feature catalogue that records the definition, derivation method, and data source for every feature used in training, starting with your most important current model.
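
The proxy screening step above can be sketched as a simple correlation check between each candidate feature and a protected attribute. Everything named here is an assumption made for illustration: the feature names, the 0/1 encoding of the protected attribute, and the 0.5 threshold. Pearson correlation only detects linear association, so a feature that passes this screen is not thereby shown to be safe.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def flag_proxy_candidates(features, protected, threshold=0.5):
    """Return {feature_name: correlation} for features at or above the threshold."""
    flagged = {}
    for name, values in features.items():
        r = pearson(values, protected)
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged


# Hypothetical data: protected attribute encoded as 0/1 for six individuals.
protected = [0, 0, 0, 1, 1, 1]
features = {
    "postcode_income_band": [1, 2, 1, 4, 5, 4],  # tracks the protected group closely
    "tenure_months": [3, 12, 7, 12, 3, 7],       # same distribution in both groups
}

flagged = flag_proxy_candidates(features, protected)
for name, r in flagged.items():
    print(f"review before inclusion: {name} (r = {r:.2f})")
```

Flagged features are candidates for deeper scrutiny by the review group, not automatic exclusions. The appropriate threshold and association measure depend on the data and the domain.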

2. Scaling and Maturing

Implement a feature store to manage feature definitions, enable reuse across models, and ensure that features computed at training time match those computed at inference time.
Automate feature importance analysis as part of the training pipeline, producing importance reports at every training run to inform ongoing feature selection decisions.
Establish a feature review process that includes domain experts and data ethicists, not just data scientists, ensuring features are sound from both technical and ethical perspectives.
Track feature drift in production — changes in feature distributions over time that can cause model performance to degrade without any change to the model code.
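
One way to track the feature drift described above is the population stability index (PSI), which compares the binned distribution of a feature in production against a training-time baseline. The sketch below is a minimal version under stated assumptions: the bin edges, the sample values, and the smoothing constant eps are hypothetical, and the alert level of 0.25 is a commonly quoted rule of thumb rather than a universal standard.

```python
from bisect import bisect_right
from math import log


def bin_proportions(values, edges, eps=1e-6):
    """Share of values in each bin defined by sorted edges, floored at eps."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return [max(c / total, eps) for c in counts]


def psi(baseline, current, edges, eps=1e-6):
    """Population stability index between a baseline sample and a current sample."""
    p = bin_proportions(baseline, edges, eps)
    q = bin_proportions(current, edges, eps)
    return sum((qi - pi) * log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical feature values at training time and in production.
baseline = [10, 20, 30, 40, 50, 60, 70, 80]
shifted = [v + 40 for v in baseline]
edges = [25, 50, 75]  # bin edges fixed from the training data

psi_same = psi(baseline, baseline, edges)
psi_shifted = psi(baseline, shifted, edges)

ALERT_LEVEL = 0.25  # assumed rule-of-thumb threshold
print(f"no change: {psi_same:.3f}, alert = {psi_same > ALERT_LEVEL}")
print(f"shifted:   {psi_shifted:.3f}, alert = {psi_shifted > ALERT_LEVEL}")
```

Fixing the bin edges from the training data keeps the comparison stable across monitoring runs. Recomputing the edges on each production sample would hide exactly the shift the check is meant to reveal.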

3. Team Behaviours to Encourage

Treat feature engineering as a collaborative discipline that requires domain knowledge, not just technical skill — prioritise pairing between data scientists and domain specialists.
Document feature decisions thoroughly, including features that were considered and rejected and the reasons why — this institutional knowledge is valuable for future model iterations.
Challenge features that are technically valid but ethically questionable — performance improvements that come at the cost of fairness are not unambiguous improvements.
Share feature definitions and derivations across teams to enable reuse and consistency, rather than each team independently reinventing features for similar problems.

4. Watch Out For…

Feature leakage — including features that encode information that would not be available at inference time, producing over-optimistic training metrics that do not generalise to production.
Over-engineering features without validating whether they actually improve model performance on held-out data, adding complexity without value.
Ignoring temporal dynamics in feature creation, producing features that implicitly depend on future information or fail to account for distributional shift over time.
Failing to test feature definitions for consistency between training and serving pipelines, a common source of production model failures that are hard to diagnose.
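
The leakage and temporal pitfalls above often come down to whether a feature is computed as of the prediction time. The sketch below shows a point-in-time version of a simple aggregate that counts only events strictly before the cutoff. The feature (purchases in the previous 30 days), the function name, and the dates are hypothetical.

```python
from datetime import datetime, timedelta


def purchases_in_window(event_times, as_of, window_days=30):
    """Count events in [as_of - window_days, as_of), excluding anything at or after as_of."""
    start = as_of - timedelta(days=window_days)
    return sum(1 for ts in event_times if start <= ts < as_of)


# Hypothetical purchase history for one customer.
events = [
    datetime(2024, 1, 5),
    datetime(2024, 1, 20),
    datetime(2024, 2, 2),
    datetime(2024, 2, 10),
]

prediction_time = datetime(2024, 2, 1)

# Point-in-time feature: only uses what was known at prediction time.
safe_count = purchases_in_window(events, as_of=prediction_time)

# Leaky variant for contrast: counts purchases made after the prediction time.
leaky_count = len(events)

print(f"point-in-time count: {safe_count}, leaky count: {leaky_count}")
```

Calling one shared function with an explicit as_of argument from both the training and serving pipelines is one way to reduce the risk of the two definitions drifting apart, although it does not replace testing them against each other.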

5. Signals of Success

Feature engineering decisions are documented with clear rationale and accessible to all team members, not held in the heads of individual data scientists.
Feature selection analysis is run and reviewed at every model training cycle, not just at initial model development.
No features in production models act as proxies for protected characteristics in ways that produce unfair outcomes, as verified by formal fairness analysis at each release.
Feature reuse across models is common and supported by a well-maintained feature catalogue or store, reducing duplicated work and improving consistency.
Teams can explain what each feature represents and why it was included in clear, non-technical language when asked by stakeholders.
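
The documentation signals above can be checked mechanically if the feature catalogue is kept in a structured form. The sketch below assumes a hypothetical catalogue entry with the fields this practice asks for (definition, derivation, source, rationale) and reports any model feature that lacks an entry or a written rationale. The field names and example features are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureEntry:
    """One catalogue record; the field names are assumptions for this sketch."""
    name: str
    definition: str
    derivation: str
    source: str
    rationale: str


def undocumented_features(model_features, catalogue):
    """Return model features with no catalogue entry or an empty rationale."""
    by_name = {entry.name: entry for entry in catalogue}
    missing = []
    for feature in model_features:
        entry = by_name.get(feature)
        if entry is None or not entry.rationale.strip():
            missing.append(feature)
    return missing


# Hypothetical catalogue and model feature list.
catalogue = [
    FeatureEntry(
        name="days_since_last_order",
        definition="Whole days between the latest order and the prediction date",
        derivation="prediction_date minus max(order_date) per customer",
        source="orders table",
        rationale="Recency is a known driver of churn according to domain experts",
    ),
]
model_features = ["days_since_last_order", "avg_basket_value"]

missing = undocumented_features(model_features, catalogue)
print("features missing documentation:", missing)
```

A check like this could run as part of each training cycle, so that gaps in the catalogue surface before release rather than when a stakeholder asks.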