‘HiClass’: A Python Package that Provides Implementations of Popular Machine Learning Models and Evaluation Metrics for Local Hierarchical Classification

Classification is the process of assigning items to categories. Many classification problems are naturally hierarchical, with the classes arranged in a tree or a directed acyclic graph (or some combination of the two). Examples range from music genre categorization to identifying viral sequences in metagenomic data sets and diagnosing COVID-19 from chest X-ray images.

The flat approach to classification completely ignores the hierarchy between classes and usually predicts only leaf nodes. It is quick and easy to apply, and it is adequate for problems with no hierarchical structure, but it discards information that matters more as the number of levels in the hierarchy grows. The importance of the hierarchy when training a model is often overlooked, yet including it has been shown to lead to consistently better predictive results, which is the motivation for this research.
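To make the flat view concrete, here is a minimal Python sketch that is not taken from the paper. The label paths and the helper names `to_flat_labels` and `flat_accuracy` are hypothetical, chosen only for illustration.

```python
# Hypothetical label paths from the top level down to the leaf
# (illustrative data only, not from the paper).
paths = [
    ["Animal", "Mammal", "Cat"],
    ["Animal", "Mammal", "Dog"],
    ["Animal", "Bird", "Eagle"],
]


def to_flat_labels(label_paths):
    """Keep only the leaf of each path, as a flat classifier would."""
    return [path[-1] for path in label_paths]


def flat_accuracy(y_true, y_pred):
    """Fraction of exact leaf matches; every mistake counts the same."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


y_true = to_flat_labels(paths)

# Two hypothetical sets of predictions, each with one wrong leaf.
near_miss = ["Dog", "Dog", "Eagle"]   # Cat mistaken for a sibling mammal
far_miss = ["Eagle", "Dog", "Eagle"]  # Cat mistaken for a bird

print(flat_accuracy(y_true, near_miss))
print(flat_accuracy(y_true, far_miss))
```

Both calls print the same score. The flat view cannot tell a mistake between siblings from a mistake across branches, because the hierarchy was thrown away when the paths were reduced to leaves.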

In this paper, researchers from the University of Potsdam introduce HiClass, a Python library that implements the most common design patterns for local hierarchical classifiers. It can be used in application domains where the data is hierarchically structured as a tree or a directed acyclic graph, including cases where values are missing at some levels of the hierarchy.
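One local design commonly described in the hierarchical classification literature is the local classifier per parent node: each parent gets its own classifier that chooses among its children, and prediction walks the hierarchy top-down. The sketch below illustrates that idea in pure Python and is not HiClass's implementation. The class names `NearestCentroid` and `PerParentNodeSketch`, the artificial `<root>` node and the toy data are all assumptions made for the example, and node names are assumed to be unique across the hierarchy.

```python
from collections import defaultdict


class NearestCentroid:
    """Tiny stand-in base classifier (an assumption, not part of HiClass)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        # Mean feature vector per label.
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, features):
        def sq_dist(label):
            centroid = self.centroids_[label]
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids_, key=sq_dist)


class PerParentNodeSketch:
    """One local classifier per parent node, applied top-down."""

    ROOT = "<root>"  # hypothetical artificial root above the first level

    def fit(self, X, paths):
        # For each parent, collect the samples that pass through it and
        # the child each of those samples moves to.
        data = defaultdict(lambda: ([], []))
        for features, path in zip(X, paths):
            parent = self.ROOT
            for child in path:
                data[parent][0].append(features)
                data[parent][1].append(child)
                parent = child
        self.local_ = {
            parent: NearestCentroid().fit(Xp, yp)
            for parent, (Xp, yp) in data.items()
        }
        return self

    def predict_one(self, features):
        path, node = [], self.ROOT
        # Descend until reaching a node with no local classifier (a leaf).
        while node in self.local_:
            node = self.local_[node].predict_one(features)
            path.append(node)
        return path


# Toy data: two features per sample, two-level label paths.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
paths = [["Mammal", "Cat"], ["Mammal", "Dog"],
         ["Bird", "Eagle"], ["Bird", "Owl"]]

model = PerParentNodeSketch().fit(X, paths)
print(model.predict_one([0.1, 0.9]))
print(model.predict_one([4.8, 5.1]))
```

Each prediction is a full path rather than a single leaf, so a sample is first routed to a branch and only then compared against the classes inside that branch.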

HiClass is an open-source Python package for local hierarchical classification that is fully compatible with scikit-learn. It provides implementations of the most popular local hierarchical models, along with metrics for evaluating model performance on hierarchically labeled data sets.
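A widely used family of hierarchy-aware metrics is hierarchical precision, recall and F1, which compare the sets of nodes on the predicted and true paths instead of the leaves alone. The following is a pure-Python sketch of those definitions under the assumption that every label is given as a path of node names. It is not HiClass's own code, and the function name `hierarchical_prf` and the toy labels are hypothetical.

```python
def hierarchical_prf(true_paths, pred_paths):
    """Micro-averaged hierarchical precision, recall and F1.

    Each path lists the nodes from the top level down to the class, so
    predictions that share ancestors with the truth earn partial credit.
    """
    overlap = n_pred = n_true = 0
    for true, pred in zip(true_paths, pred_paths):
        true_set, pred_set = set(true), set(pred)
        overlap += len(true_set & pred_set)
        n_pred += len(pred_set)
        n_true += len(true_set)
    precision = overlap / n_pred if n_pred else 0.0
    recall = overlap / n_true if n_true else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels (illustrative only).
y_true = [["Mammal", "Cat"], ["Bird", "Eagle"]]
near = [["Mammal", "Dog"], ["Bird", "Eagle"]]  # wrong leaf, right branch
far = [["Bird", "Owl"], ["Bird", "Eagle"]]     # wrong branch entirely

print(hierarchical_prf(y_true, near))
print(hierarchical_prf(y_true, far))
```

The prediction that stays in the correct branch scores higher than the one that leaves it, which is the behavior a hierarchy-aware metric is meant to capture.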

The paper and code links are given below.

Paper: https://arxiv.org/pdf/2112.06560v1.pdf

GitLab: https://gitlab.com/dacs-hpi/hiclass