Machine learning still needs data scientists to optimise results
"Machine learning is not a 'one-size-fits-all' technology, but a growing library of technologies that need to be understood and deployed correctly to achieve meaningful results."
Machine learning (ML) can help communications service providers (CSPs) to manage the growing expense of creating and maintaining algorithms built by data scientists. The ability to allow machines to learn insights and relationships through the application of ML techniques means analytics can be applied to more use cases. In addition, ML is a key component in the creation of artificial intelligence, which enables applications to learn from their environments. However, ML is not simple and CSPs and vendors need to carefully select which ML algorithms are used for each use case.
The answer to the question "what machine learning algorithm should I use?" will depend on the data size, data quality and data type. In addition, the time ML takes to 'learn' should be considered because some techniques take too long for a given use case.
Supervised or unsupervised machine learning
Supervised learning techniques create algorithms based on a set of examples where historical data and outcomes are known, for example the usage patterns of subscribers that have subsequently churned. Most ML uses supervised learning, which is broken into three areas: classification, regression and anomaly detection.
- Classification is when the data is used to predict a category. This can be two-class or multi-class classification. For example, it is used to predict a winner in a known list of outcomes.
- Regression is where a value is predicted – such as a Net Promoter Score (NPS).
- Anomaly detection is where an understanding of what normal data looks like is used to identify outliers and unusual patterns, which might, for example, indicate fraud.
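As an illustration of the anomaly detection idea in the last bullet, the sketch below learns what 'normal' looks like from historical values and flags anything too far from it. It is a minimal example only: the function name, the threshold of three standard deviations and the usage figures are all assumptions made for illustration, and production systems typically use more robust techniques.

```python
from statistics import mean, stdev

def find_anomalies(normal_history, new_values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the historical mean."""
    mu = mean(normal_history)
    sigma = stdev(normal_history)
    return [v for v in new_values if abs(v - mu) > threshold * sigma]

# Hypothetical daily call minutes for one subscriber (illustrative numbers only).
history = [42, 38, 45, 40, 41, 39, 44, 43, 37, 41]
today = [40, 46, 310]

print(find_anomalies(history, today))  # only 310 falls outside the normal range
```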
Unsupervised learning is where the data has no associated 'labels', so ML algorithms use grouping or clusters to organise data to understand potential structures before being able to predict outcomes.
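The grouping step can be sketched with k-means, one common clustering technique (the article does not name a specific one). The example below works on a single feature, and the starting centroids and usage figures are hypothetical values chosen for illustration.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group unlabelled values around the nearest centroid (plain k-means, one feature)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical monthly data usage (GB) for subscribers with no labels attached.
usage = [1.0, 1.5, 2.0, 20.0, 22.0, 24.0]
centroids, clusters = kmeans_1d(usage, centroids=[0.0, 10.0])

print(centroids)  # one centre for light users, one for heavy users
print(clusters)
```

No outcome was supplied here. The algorithm only reveals that the data falls into two groups, which an analyst can then interpret or use as an input to a later predictive step.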
Reinforcement learning is when the ML algorithm takes an action and then receives feedback on how good that decision was. Based on the feedback, the algorithm can adjust its strategy. This is common in robotic actions, such as robotic process automation, and is also used in many IoT applications where sensor data is used.
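The action-and-feedback loop can be sketched with an epsilon-greedy agent, one simple reinforcement learning strategy. Everything named here is an assumption made for illustration: the two actions, the fixed feedback scores and the exploration rate. Real feedback is usually noisy rather than constant.

```python
import random

def run_agent(feedback, actions, steps=200, epsilon=0.1, seed=0):
    """Learn the average reward of each action from feedback alone."""
    rng = random.Random(seed)
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}

    def estimate(action):
        return totals[action] / counts[action] if counts[action] else 0.0

    for step in range(steps):
        if step < len(actions):
            action = actions[step]               # try every action once first
        elif rng.random() < epsilon:
            action = rng.choice(actions)         # occasionally explore
        else:
            action = max(actions, key=estimate)  # otherwise use the best estimate
        reward = feedback(action)                # notification of how good it was
        totals[action] += reward
        counts[action] += 1
    return {a: estimate(a) for a in actions}

# Hypothetical feedback scores for two possible automated responses.
def feedback(action):
    return {"retry": 2, "reroute": 8}[action]

estimates = run_agent(feedback, ["retry", "reroute"])
best_action = max(estimates, key=estimates.get)
print(best_action, estimates)
```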
Selecting an algorithm
Vendors and CSPs must consider several criteria when selecting an ML algorithm: the accuracy needed, the time available to train the algorithm and whether the relationships in the data are linear or not. The accuracy required depends on the use case and has a potential impact on learning times. Larger data sets will generally take longer to train. The Internet browsing history of a subscriber, for example, is a data set with a large number of features, even though the data may come from a small population segment. Many ML algorithms use linear classification, which assumes that trends or clusters can be separated with a straight line; this can have a profound impact on accuracy in some instances.
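The impact of the straight-line assumption can be shown with a perceptron, a basic linear classifier. In the sketch below (a toy example with made-up data, not a benchmark), the same algorithm is trained on one data set that a straight line can separate and on one that it cannot.

```python
def train_perceptron(samples, epochs=20):
    """Learn a straight-line boundary w0*x0 + w1*x1 + bias > 0."""
    w0 = w1 = bias = 0
    for _ in range(epochs):
        for (x0, x1), label in samples:
            predicted = 1 if w0 * x0 + w1 * x1 + bias > 0 else 0
            error = label - predicted
            # Nudge the line towards any misclassified sample.
            w0 += error * x0
            w1 += error * x1
            bias += error
    return w0, w1, bias

def accuracy(samples, weights):
    w0, w1, bias = weights
    hits = sum((1 if w0 * x0 + w1 * x1 + bias > 0 else 0) == label
               for (x0, x1), label in samples)
    return hits / len(samples)

# Separable: the label is 1 only when both features are 1.
separable = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
# Not separable: the label is 1 when exactly one feature is 1.
not_separable = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(accuracy(separable, train_perceptron(separable)))          # perfect fit
print(accuracy(not_separable, train_perceptron(not_separable)))  # cannot exceed 0.75
```

No amount of extra training fixes the second case, because no single straight line separates the two classes. A non-linear algorithm is needed instead.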
Having selected an algorithm, data scientists are needed to tune the process and to decide a number of parameters such as the error tolerance, the number of iterations, training times or the number of features used (see Figure 1).
Figure 1: Common machine learning algorithms [Source: Analysys Mason]
Two-class/multi-classification (churn analysis, customer experience)
|Decision forest||G||E||Two-class/multi-classification, regression (churn)|
|Boosted decision tree||G||E||Two-class classification, regression|
|Neural networks||E||Two-class/multi-classification, regression (churn analysis, network optimisation)|
|Averaged perceptron||G||G||E||Two-class classification|
|Support vector machine||G||E||Two-class classification|
|Local deep support vector machine||G||Two-class classification|
|Bayes point machine||G||E||Two-class classification|
|Fast forest quantile||G||E||Regression|
|Support vector machine||G||G||Anomaly detection|
|PCA-based anomaly detection||G||E||Anomaly detection|
Key: G=good accuracy; E=excellent accuracy
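The tuning decisions mentioned before Figure 1, such as the error tolerance and the number of iterations, can be illustrated with a small gradient descent loop. This is a sketch under stated assumptions: the function, its parameter names and the data are hypothetical, and the model is deliberately reduced to a single slope.

```python
def fit_slope(xs, ys, learning_rate=0.01, tolerance=1e-6, max_iterations=1000):
    """Fit y = w * x by gradient descent; return (w, iterations_used)."""
    w = 0.0
    n = len(xs)
    for iteration in range(1, max_iterations + 1):
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        if abs(gradient) < tolerance:
            return w, iteration          # stopped by the error tolerance
        w -= learning_rate * gradient
    return w, max_iterations             # stopped by the iteration limit

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]                        # true slope is 2

loose_w, loose_steps = fit_slope(xs, ys, tolerance=1e-1)
tight_w, tight_steps = fit_slope(xs, ys, tolerance=1e-6)
print(loose_w, loose_steps)
print(tight_w, tight_steps)
```

A tighter tolerance gives a more accurate slope but needs more iterations, which is the accuracy versus training time trade-off in miniature.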
Linear regression fits a line to the data set. It is a simple and fast way to model the relationship between variables, but can give poor results when the relationship is not linear.
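A least-squares line fit takes only a few lines of code, which is part of its appeal. The sketch below uses made-up data to show both sides: a perfect fit when the relationship is linear, and a useless one when the relationship is curved. R-squared is used here as the measure of fit, which is an assumption, as the article does not name one.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys explained by the fitted line (1.0 is perfect)."""
    mean_y = sum(ys) / len(ys)
    residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - residual / total

xs = [-2, -1, 0, 1, 2]
linear_ys = [3 * x + 1 for x in xs]   # straight-line relationship
curved_ys = [x ** 2 for x in xs]      # curved relationship

slope, intercept = fit_line(xs, linear_ys)
linear_fit = r_squared(xs, linear_ys, slope, intercept)
curved_fit = r_squared(xs, curved_ys, *fit_line(xs, curved_ys))
print(slope, intercept, linear_fit, curved_fit)
```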
Logistic regression uses an 's'-shaped curve instead of a straight line to separate two or more classes of data.
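The 's'-shaped curve is usually the logistic (sigmoid) function, which turns any score into a probability between 0 and 1. In the sketch below the weight, bias and cut-off are hypothetical values picked for illustration. In practice they would be learned from training data.

```python
import math

def sigmoid(z):
    """The 's'-shaped logistic curve: maps any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(feature, weight, bias, cutoff=0.5):
    """Return (class, probability) for a single feature value."""
    probability = sigmoid(weight * feature + bias)
    return (1 if probability >= cutoff else 0), probability

# Hypothetical model: a larger drop in usage means a higher probability of churn.
print(predict_class(0.9, weight=8.0, bias=-4.0))  # class 1, high probability
print(predict_class(0.1, weight=8.0, bias=-4.0))  # class 0, low probability
```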
Trees, forests and jungles
These are all variations of the decision tree, which subdivides the feature space (the space of data attributes) into regions with mostly the same labels (values or categories). The decision tree methodology has several variations, with the regions created using classification or regression techniques. Data scientists need to decide how big each region is to avoid 'overfitting' and potentially triggering intensive use of memory resources.
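The effect of region size can be sketched with a toy decision tree on a single feature. The code below is a simplified illustration, not a production algorithm: `min_region_size` is a hypothetical parameter name, and the data contains one deliberately noisy label (the subscriber at 3).

```python
from collections import Counter

def misclassified(labels):
    """Number of labels that differ from the majority label."""
    return len(labels) - Counter(labels).most_common(1)[0][1]

def build_tree(points, min_region_size):
    """Recursively split (value, label) points into regions of similar labels."""
    points = sorted(points)
    labels = [label for _, label in points]
    if len(set(labels)) == 1 or len(points) < 2 * min_region_size:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority label
    best_errors, best_i = None, None
    for i in range(min_region_size, len(points) - min_region_size + 1):
        errors = misclassified(labels[:i]) + misclassified(labels[i:])
        if best_errors is None or errors < best_errors:
            best_errors, best_i = errors, i
    threshold = (points[best_i - 1][0] + points[best_i][0]) / 2
    return (threshold,
            build_tree(points[:best_i], min_region_size),
            build_tree(points[best_i:], min_region_size))

def count_regions(tree):
    return 1 if not isinstance(tree, tuple) else count_regions(tree[1]) + count_regions(tree[2])

# Hypothetical (months since last top-up, outcome) pairs.
data = [(1, "stay"), (2, "stay"), (3, "churn"), (4, "stay"),
        (5, "churn"), (6, "churn"), (7, "churn"), (8, "churn")]

fine = build_tree(data, min_region_size=1)     # free to isolate single points
coarse = build_tree(data, min_region_size=3)   # forced to keep regions larger
print(count_regions(fine), count_regions(coarse))
```

With very small regions the tree carves out a separate region for the noisy point, which is overfitting in miniature. Larger regions ignore it and produce a simpler, smaller tree.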
Neural networks use a technique inspired by the brain, and there are many variations. One variation, based on acyclic graphs, passes features through a series of layers, where each layer weights its inputs and summarises the calculations before passing the result to the next layer. The simple calculations can learn boundaries and classifications quickly. So-called 'deep learning' uses this approach. The downside is that learning can take a long time.
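The layer-by-layer calculation can be sketched as a simple forward pass. In this minimal example the weights are hand-picked rather than learned, and the ReLU activation is an assumption, as the article does not specify one. The tiny network reproduces the 'exactly one input is on' pattern, which no single straight line can separate.

```python
def relu(value):
    return max(0, value)

def dense_layer(inputs, weights, biases):
    """Each unit weights every input, sums the results, adds a bias and applies ReLU."""
    return [relu(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
            for unit_weights, bias in zip(weights, biases)]

def forward(features, layers):
    """Pass the features through each layer in turn."""
    values = features
    for weights, biases in layers:
        values = dense_layer(values, weights, biases)
    return values

# Hand-picked (not learned) weights, chosen only for illustration.
network = [
    ([[1, 1], [1, 1]], [0, -1]),   # hidden layer: two units
    ([[1, -2]], [0]),              # output layer: one unit
]

outputs = [forward([a, b], network)[0] for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)  # [0, 1, 1, 0]
```

Training consists of finding such weights automatically from examples, and that search is the part that can take a long time.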
Bayesian methods avoid 'overfitting' by making some assumptions about likely outcomes before they are trained. In addition, they have fewer parameters that need configuration than other methods.
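The role of prior assumptions can be illustrated with a Beta prior on a rate, which is one simple Bayesian technique and not the specific method behind the algorithms in Figure 1. The prior used here (roughly 'one churner in ten') and the observation counts are assumptions made for illustration.

```python
def posterior_mean(events, trials, prior_events=1.0, prior_non_events=9.0):
    """Posterior mean of a rate under a Beta(prior_events, prior_non_events) prior."""
    return (prior_events + events) / (prior_events + prior_non_events + trials)

# Two subscribers observed in a new segment, and both churned.
raw_rate = 2 / 2                                      # 1.0: overfits two data points
smoothed_rate = posterior_mean(events=2, trials=2)    # (1 + 2) / (10 + 2) = 0.25

# With plenty of evidence the prior barely matters.
large_sample_rate = posterior_mean(events=200, trials=200)

print(raw_rate, smoothed_rate, large_sample_rate)
```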
ML is a class of techniques that needs to be understood for each use case and will continue to require the skills of a data scientist to set up supervised learning. For unsupervised learning, ML technology providers can run multiple different algorithms in parallel to assess which is the most accurate at predicting outputs. Using this technique reduces the need for data scientists, allowing a software programme to call an API from a provider such as Amazon, Azure or IBM. This will not make data scientists redundant, but is likely to reduce the need for them.
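The idea of running several algorithms side by side and keeping the most accurate can be sketched as follows. This is not any provider's actual API. The candidate models are trivial stand-in rules, the names are hypothetical, and the sketch assumes a small labelled sample is available for scoring.

```python
from concurrent.futures import ThreadPoolExecutor

def accuracy(model, labelled_data):
    return sum(model(x) == y for x, y in labelled_data) / len(labelled_data)

def pick_best(candidates, labelled_data):
    """Score every candidate in parallel and return (best_name, all_scores)."""
    names = list(candidates)
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda name: accuracy(candidates[name], labelled_data), names))
    ranked = dict(zip(names, scores))
    return max(ranked, key=ranked.get), ranked

# Hypothetical candidates: each maps a drop in usage to 1 (churn) or 0 (stay).
candidates = {
    "always_stay": lambda drop: 0,
    "threshold_0.3": lambda drop: int(drop > 0.3),
    "threshold_0.5": lambda drop: int(drop > 0.5),
}

# Hypothetical (drop in usage, churned) pairs held back for scoring.
holdout = [(0.1, 0), (0.2, 0), (0.6, 1), (0.8, 1), (0.4, 0), (0.9, 1)]

best_name, scores = pick_best(candidates, holdout)
print(best_name, scores)
```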