Knowledge is power. Data mining means knowing more

Nowadays many business processes are based on data. Data mining helps you recognise trends and patterns so that you can improve your processes, develop your business and enjoy more success. It’s worth it for SMEs. Here’s why.

If you shop at Zalando, you leave behind data. Data is created when cars are tested, whenever you use a lift, and when somebody checks incoming goods at a warehouse. This data contains knowledge that can be valuable for your success.

You do not need luck to strike gold. You need data mining: discovering a pattern in large quantities of data can be worth its weight in gold. Such information can help small and medium-sized enterprises serve their customers better, make their production more efficient, streamline their supply chain, improve product quality and reduce downtimes.

Amazon, for instance, uses data mining to suggest products: customers who bought a certain book bought this one too. Suggestions like these boost the online retailer’s sales by around a third.

Lift manufacturer Otis analyses data in conjunction with machine learning to perform ‘predictive maintenance’. This new service improves lift life cycles and increases customer satisfaction.

Data mining definition

Data mining is a computer-aided method which utilises concepts from information technology, statistics and mathematics to analyse data. Data mining algorithms reveal logical links as patterns or trends in data. This helps you identify and work on correlations, regularities, problems and weak points.

Statistics helps to test hypotheses using small random samples, whereas data mining automatically generates new hypotheses from very large quantities of data. Artificial intelligence (AI) and machine learning are also used to analyse data.

‘Mining’, therefore, is not about accumulating data; it is about extracting knowledge from it. That goes well beyond processes like evaluating KPIs in controlling.

Text mining is a related method that extracts information from text documents. It works with unstructured data, whereas data mining usually uses structured data from databases.

The kind of text that might be analysed includes e-mails, memos of discussions, news feeds, Web forms, online discussions and open-ended responses in surveys.

These can be recorded and made useful by means of text mining, for things like research and development, marketing and customer services. Some data mining services include a text mining feature.

Discovering knowledge in databases

Computer-aided mining is part of a complex process. Database specialists defined it as a standard in 1989 and called it ‘Knowledge Discovery in Databases’, or KDD for short.

This model aims to avoid mining ‘primitive data sets’, that is, data containing no correlations. The phases of KDD constitute a ‘non-trivial process’, as specialists point out. The phases can be repeated to improve the quality of the analysis.

KDD produces valid, new, potentially useful and understandable patterns that are derived from the data.

[Infographic: the phases of the knowledge discovery in databases (KDD) process]

No data mining without Big Data

If you want to use data mining, you need ‘Big Data’, which means a large and relevant quantity of data sets. A simplified definition of Big Data is: an amount of data that no longer fits in an Excel table. Excel reaches its limit at 1,048,576 rows and 16,384 columns.
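The simplified definition above can be turned into a quick check. The following Python sketch is only an illustration: the function name `fits_in_excel` and the sensor example are made up for this purpose, and the two limits are the worksheet limits quoted above.

```python
# Worksheet limits quoted in the text above.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLUMNS = 16_384


def fits_in_excel(rows: int, columns: int) -> bool:
    """Return True if a table of this size fits on one Excel worksheet."""
    return rows <= EXCEL_MAX_ROWS and columns <= EXCEL_MAX_COLUMNS


# Hypothetical example: a sensor that logs 20 readings per second.
readings_per_day = 20 * 60 * 60 * 24  # 1,728,000 rows per day
print(fits_in_excel(readings_per_day, 5))  # False
```

In this made-up case, a single sensor outgrows one worksheet in less than a day.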

Data is created in so many places nowadays that Excel can be outgrown in mere minutes in certain businesses.

Data mining does not require any specific amount of data; it requires relevant data. But it can deal with plenty of bits and bytes. That is why we can safely say that Big Data is the right place for data mining.

The technical definition of Big Data is the systematic collection and storage of large, complex, fast-changing quantities of data.

These 6 Vs characterise Big Data:

  1. Velocity – the speed of collecting, processing and evaluating
  2. Volume – the quantity of data
  3. Variety – the diversity of complex data sets
  4. Veracity – truthfulness and credibility of data
  5. Value – how valuable data is to business
  6. Validity – securing data quality

A regular data server is not really big enough for storing and processing these quantities of data. It is worth using a data warehouse to process Big Data quickly and obtain real-time analyses.

CRM – a good source for data mining

If you document your customer relationships comprehensively and carefully in a Customer Relationship Management (CRM) system, you have the ideal starting point for data mining.

You can search for patterns in the data, and these can help you acquire new customers or reactivate customers who have not been active for a long time. You may even find ideas in the data about how to win back customers you have lost.

Data mining also helps you make better strategic decisions. The new knowledge influences campaigns and customer programmes as well as production processes and security concepts – not just once, but over and over again. If you analyse data in real time, you will respond much more quickly to warning signs and successes.

Directly or indirectly, new knowledge derived from the data will boost sales, and therefore profits. It will help create value. The insights gained will help you develop new products and services and even new business models.

That is why data mining software is so useful for small and medium-sized enterprises. It can even allow them to overtake large businesses and corporations.

Check first, then analyse

Before you can begin data mining, you have to inspect and check the data. Data often comes from a wide variety of sources such as databases, sensors and tracking.

This is the phase in which original data is gathered into data sets, making it more suitable for data mining. The key thing is to eliminate sources of error from the data collected.

That may include missing figures and wrong information. Data of that kind is called ‘noisy’. Inconsistent data also harms evaluations. It may include contradictory figures, such as an age that contradicts a date of birth.
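The checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete cleaning routine: the field names (`name`, `date_of_birth`, `age`), the helper functions and the sample records are all invented for this example.

```python
from datetime import date


def age_on(birth: date, today: date) -> int:
    """Age in full years on the given day."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def find_problems(record: dict, today: date) -> list[str]:
    """Return a list of data quality problems found in one record."""
    problems = []
    # Missing figures: any field that is empty or None.
    for field, value in record.items():
        if value is None or value == "":
            problems.append(f"missing {field}")
    # Inconsistent data: the stated age contradicts the date of birth.
    birth, age = record.get("date_of_birth"), record.get("age")
    if isinstance(birth, date) and isinstance(age, int):
        if age != age_on(birth, today):
            problems.append("age contradicts date_of_birth")
    return problems


# Made-up sample records, for illustration only.
today = date(2024, 6, 1)
customers = [
    {"name": "A", "date_of_birth": date(1980, 3, 15), "age": 44},
    {"name": "B", "date_of_birth": date(1990, 9, 1), "age": 40},
    {"name": "C", "date_of_birth": None, "age": 31},
]
for customer in customers:
    print(customer["name"], find_problems(customer, today))
```

Record A passes, record B is flagged as inconsistent and record C is flagged for a missing date of birth. In practice you would decide per question whether to correct, complete or exclude such records.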

Preparing data takes more time than the data mining itself. Experts often speak of a ratio of 80:20: 80 percent of the time is taken for preparation, 20 percent for analysis. The preparation of data depends very much on the question that is being investigated using data mining.

Data mining methods

Various processes are then used to search for patterns and correlations. The focus will either be on observation questions or forecast questions.

  1. Outlier recognition: which objects do not follow the rules of interdependency, and why?
  2. Cluster analysis: what similarities occur a lot and can be gathered into groups in that respect?
  3. Classification: which of the predefined categories does a data record belong to if it has not yet been assigned to one?
  4. Association analysis: which two or more independent items correlate – and occur frequently together?
  5. Regression analysis: what relationship exists between one dependent variable and one or more independent variables?
  6. Predictive analytics: what predictions can be made using a variable?
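Items 5 and 6 in the list above can be illustrated with a simple straight-line fit. The sketch below uses ordinary least squares in plain Python; the function name `fit_line` and the figures for advertising spend and sales are made up for this example, and real tools offer far more sophisticated models.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up figures: advertising spend (independent variable)
# and sales (dependent variable).
spend = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]

slope, intercept = fit_line(spend, sales)
forecast = slope * 6 + intercept  # prediction for a spend of 6

print(slope, intercept, forecast)  # 2.0 10.0 22.0
```

The fitted line describes the relationship between the two variables (regression analysis), and applying it to a new value gives a forecast (predictive analytics).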

Association analysis, for example, forms a foundation for online shopping recommendations. Banks use classification to check credit ratings. Clustering is the analysis process used to define groups for targeted advertising campaigns.
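How association analysis arrives at a recommendation can be shown with a small Python sketch. It counts how often items appear together and derives two common measures, support and confidence. The shopping baskets and function names are invented for illustration; this is not how any particular retailer implements it.

```python
from collections import Counter
from itertools import combinations

# Made-up shopping baskets, for illustration only.
baskets = [
    {"book", "bookmark", "lamp"},
    {"book", "bookmark"},
    {"book", "lamp"},
    {"bookmark", "pen"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always counted under the same key.
    pair_counts.update(combinations(sorted(basket), 2))


def support(a, b):
    """Share of all baskets that contain both items."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)


def confidence(a, b):
    """Share of the baskets containing item a that also contain item b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]


print(support("book", "bookmark"))     # 0.5
print(confidence("book", "bookmark"))  # two out of three book baskets
```

A pair with high support and high confidence is a candidate for a ‘customers who bought this also bought’ suggestion.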

Data mining software: in-house, or in the Cloud?

There are various tools, all of which have pros and cons. That is why it has proven useful to employ more than one tool for different tasks. Cloud-based products and Web services are good value and easy to scale up and down for additional users and analyses. That makes it easy to get going.

  • SAS: The leading supplier since 1976. This data mining software is used by many big clients. It is not cheap, but it is scalable. A graphical user interface makes it very easy to use.
  • KNIME: A team at Konstanz University has been developing this open source software since 2004. This process is now supported by a large global community of developers. There is also a commercial version.
  • Google Analytics: This free Web tool is easy to use for evaluating Web performance, social media campaigns and customer activity online.
  • Periscope Data: This Californian start-up successfully launched its Cloud-based service in the market. The company has since been bought by Sisense to augment its portfolio.
  • IBM Cognos Analytics: Not as well-known as IBM Watson, but just as clever. The tool offers self-service, it is scalable, and it can be used either in the Cloud or on your own system.

Data mining can start right now

If you think data mining will help you to digitalise processes and products, begin by inspecting all of your available data sources. You will then want to check the data quality: are the data complete, clear and correct?

You may want to include external data sources in your analysis. Some, like weather and traffic data, are public. Others may be licensable. Don’t begin the work alone: find a colleague to work with.

Data mining: one tool for all

Future employees will all be able to understand and work with data, so nobody will need to program computers, study data science or develop their own algorithms.

The most important thing you need as a user is curiosity. Your enquiring mind will pose the questions that data mining methods seek to answer.

All of today’s tools offer good ways of visualising results. Dashboards show users evaluations of issues relevant to them – personally configured and defined.

Soon your team will regularly be discussing hidden correlations and how to make use of them. Welcome to Data Driven Business!
