As mentioned previously, we explored the rudimentary beginnings, from a specific perspective twenty years ago, of the Internet of Things (IoT), data monetization and Data Science as a Service (DSaaS). That perspective was predicated on the technology, statistics and competitive advantage sought by the William Wrigley Jr. Company. Albeit a stretch for some, if one truly comprehends the nature of analytics and its evolution, the plausibility of this tangential use case reveals itself. To quote Deloitte, “Researchers and scientists were into analytics before it was cool.” I simply say, “…we were into analytics before it was cool”!
Fast forward to the present day, 2016. We have computing power in laptop devices that rivals many of the larger mainframe and departmental machines of the day. We have solid-state storage, memory seemingly in the googol (10^100) range (not really, but it has leaped in staggering proportions), smart handheld devices that are connected worldwide, and, of course, the internet.
So what of modern IoT? We now have smart devices that link our world to everything. Most of us carry one every day in the form of an iPhone or Android device. Most of our world today is smart-chip (IoT) compliant or is quickly being assimilated: power tools (Milwaukee Tool), heavy equipment (Caterpillar), building infrastructure (Johnson Controls), hospitality (energy companies) and, of course, healthcare with telemedicine. We even have residential IoT devices that we use every day: doorbells with cameras, smartphone-savvy door locks and heating/cooling system controls (Nest). Unbelievable. These devices are not only Bluetooth compliant, they speak internet lingo, and many are cellular capable as well (Electric Imp). With the explosion of IoT devices and, of course, their connectivity, we have a myriad of “streaming data” sources; that is, data from devices as they are being used, as well as some that stream constantly. What do we do with it? Collect it, of course! Enter big data, with technologies such as Hadoop and Spark. Streaming data from IoT devices is analyzed on the fly (streaming analytics) and/or stored in a data lake for future analysis. Why? For the purpose of machine learning and data monetization.
It is a little-known fact that one of the top three companies on the planet involved in IoT, big data, machine learning and data science (around IoT data) is Caterpillar (Smart Iron). Caterpillar, utilizing big data tools, streaming analytics and machine learning algorithms, can compute, derive and predict incredible things about its equipment on job sites, no matter how remote. As an example of IoT at work and its inherent complexity, Caterpillar operates in the tar sands of northern Canada, where oil embedded in sand deposits is harvested and extracted to produce crude oil. There are no internet connections (that I know of) in the tar sands. Thus, Caterpillar erects satellite stations that collect streaming data, perform analysis and relay streaming information to a central data lake far from the tar sands. Very clever, very complex and extremely innovative.
So why all the trouble, expense and elaborate technology? Data monetization. Caterpillar stores and analyzes all of this information to make predictions about job site expectations, equipment performance, climate differentiators and a myriad of other variables. Not only is this invaluable to companies utilizing Caterpillar equipment, it feeds a wealth of valuable information into the design and construction of future Caterpillar machines. This is one example of IoT and data monetization. There are many other stories, across multiple industries, that follow the same monetization pattern.
So we have explored IoT and touched on data monetization. What about data science? Data science, as we know, is an interdisciplinary field that extracts knowledge or insights from data in various forms. I like to broadly categorize analytics, as do many others, into three categories: descriptive, predictive and streaming. Many others will argue that the final category is “prescriptive” analytics, which we will get to.

Descriptive analytics are, for all intents and purposes, our basic statistical and business intelligence measures not based on probabilistic theory: mean, mode, median, histograms, etc. We see these on a daily basis, from simple dashboards to complex reports, all based on historical data. Predictive analytics encompasses a variety of statistical techniques, from predictive modeling to machine learning and data mining, that analyze current and historical facts to make predictions about future or otherwise unknown events, all based on probabilistic modeling. As an aside, machine learning is a subfield of computer science that evolved from artificial intelligence. Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction; in commercial use, this is known as predictive analytics. These analytical models allow researchers, data scientists, engineers, and analysts to "produce reliable, repeatable decisions and results" and uncover "hidden insights" through learning from historical relationships and trends in data.

Streaming analytics is the application of analytics to data while it’s in motion, before it’s stored, and includes data manipulation, normalization, cleansing and pattern-of-interest detection. Streaming analytics typically draws on both descriptive (stored) and predictive (model-based) analytics.
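To make the descriptive-versus-predictive distinction concrete, here is a minimal sketch in plain Python. The sensor readings are hypothetical values invented for illustration; descriptive analytics summarizes the history (mean, median), while the predictive step fits a simple least-squares trend line and extrapolates one step ahead — a toy stand-in for the far richer models data scientists actually use.

```python
import statistics

# Hypothetical historical sensor readings (illustrative only)
readings = [70, 72, 71, 75, 74, 78, 77, 80]

# Descriptive analytics: summarize what has already happened
mean_val = statistics.mean(readings)
median_val = statistics.median(readings)

# Predictive analytics (toy version): fit a least-squares trend line
# through the history and extrapolate one step into the future
n = len(readings)
xs = range(n)
x_mean = sum(xs) / n
slope = sum((x - x_mean) * (y - mean_val) for x, y in zip(xs, readings)) \
        / sum((x - x_mean) ** 2 for x in xs)
intercept = mean_val - slope * x_mean
next_reading = intercept + slope * n  # forecast for the next time step
```

The same split scales up directly: the descriptive half becomes dashboards and reports; the predictive half becomes regression, classification and machine learning models.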
With streaming analytics, the goal is to analyze data as close to the event location as possible, before its value is diminished due to information lag and before the volume of data overwhelms traditional analytic techniques.
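A bare-bones sketch of that idea: analyze each reading the moment it arrives, against only a small rolling window, rather than waiting for it to land in storage. The class name, window size and threshold below are all hypothetical choices for illustration, not any particular vendor's API.

```python
from collections import deque

class StreamingMonitor:
    """Flag readings that deviate sharply from the recent rolling window,
    at ingest time, before the data is ever stored."""

    def __init__(self, window_size=5, threshold=10.0):
        self.window = deque(maxlen=window_size)  # keeps only recent values
        self.threshold = threshold

    def ingest(self, value):
        """Process one incoming reading; return True if it looks anomalous."""
        alert = False
        if self.window:
            recent_avg = sum(self.window) / len(self.window)
            alert = abs(value - recent_avg) > self.threshold
        self.window.append(value)
        return alert

# Feed the stream value by value, as a real pipeline would
monitor = StreamingMonitor(window_size=3, threshold=5.0)
for reading in [70, 71, 72, 90]:
    if monitor.ingest(reading):
        print(f"alert: {reading} deviates from recent window")
```

Production systems (e.g. Spark Streaming) do this at far greater scale, but the principle is the same: the value of the data is highest at the moment of the event.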
Many would argue that there is yet another statistical Jedi: prescriptive analytics. It is a popular belief that this term was coined by IBM, much like the term “big data” was coined by the CTO of Pentaho. I am on the fence here. I can make the argument that prescriptive is another branch of analytics; however, upon close examination, prescriptive may simply be predictive on steroids. Call it as you may. I won’t argue.
As stated previously, “…we were into analytics before it was cool.” Now it is cool, in vogue and, for many, mystical in nature. Enter Data Science as a Service, or DSaaS. Data is being collected and analyzed in many different modalities and presented as a service for the purpose of monetization. There are specialty DSaaS companies based out of MIT, Stanford and Harvard now up and actively monetizing data. There are several models for DSaaS, as you may surmise. Three obvious models exist (and, of course, hybrids therein):
- pure on-site consulting
- remote connection to the client's data using a hybrid of tools and techniques
- extraction and transformation of the data into a private cloud environment the client may access, encompassing all the necessary tools, security and regulatory compliance as a service
The Data Sciences Group at NRC aspires to the latter model for several reasons, including:
- data scientists may reside anywhere
- no hardware or computing outlay for client
- all tools and software provided on-site, without client purchase
- lends itself well to smaller clients seeking end-to-end BI capabilities including DSaaS
- tool sets are known, so data scientists do not have to adapt to client tools
Please stay tuned for further updates as we continue to rollout this new approach at NRC!
Want to learn more about what the Data Sciences Group can do for your company?