Serverless Computing: The Next Step Towards Analytics in the Cloud?

Cloud computing is becoming increasingly popular and is gradually making its way into today’s business intelligence and analytics (BIA) architectures. Dave Wells and Stephen J. Smith recently discussed the rise of cloud data warehouses [1,2]; this article focuses on a newer concept that is gaining attention in the tech community: serverless computing. Since the term is still rather new, this article looks at the underlying ideas, the benefits and downsides, and then discusses how serverless computing can be used in analytics architectures.

What is Serverless Computing?

First, serverless does not mean that there are no servers. Technically, serverless computing is a novel cloud execution model that avoids long-running virtual resources by running code in “stateless compute containers that are event-triggered, ephemeral (may only last for one invocation), and fully managed by a 3rd party” [3]. For illustration, think of a machine that is turned on only to run a certain function and turned off again afterwards. The difference with serverless computing is that this happens within milliseconds. Because of this “just run a function” idea, serverless computing is also often referred to as Functions as a Service (FaaS).
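The “just run a function” idea can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's actual API: the handler name, the event layout and the order fields are assumptions made for this example, loosely modeled on the (event, context) entry points that FaaS platforms commonly use.

```python
import json


def handler(event, context=None):
    """Hypothetical FaaS entry point: invoked once per event, keeps no state."""
    # Everything the function needs arrives in the event payload.
    order = json.loads(event["body"])
    total = sum(item["price"] * item["quantity"] for item in order["items"])
    # The result is returned to the caller; nothing is kept in memory
    # for later invocations.
    return {
        "statusCode": 200,
        "body": json.dumps({"order_id": order["id"], "total": total}),
    }


if __name__ == "__main__":
    # Simulate the platform delivering a single event (made-up sample data).
    sample = {"body": json.dumps({"id": "A-1", "items": [{"price": 2.5, "quantity": 4}]})}
    print(handler(sample))
```

Because the function depends only on its input, the platform is free to start a fresh container for every invocation and discard it afterwards.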

Figure 1. Serverless Computing in comparison to IaaS and PaaS

Figure 1 depicts the key difference between serverless computing and the common Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) models. With IaaS, a cloud vendor merely provides “processing, storage, networks, and other fundamental computing resources” on demand [4]. With PaaS, the cloud vendor additionally takes responsibility for the operating system and the middleware necessary to run an application. Both models provide great flexibility for developing applications, but still require engineers to think about scaling and optimization. The serverless model introduces another abstraction layer that detaches scaling from the logic of an application.

This can lead to several benefits like:

  • Reducing costs: Serverless computing can reduce operating costs by enabling more accurate pricing. In PaaS models, virtual resources have to run nonstop to be able to answer requests at any time. Serverless computing, in contrast, is event-driven. Resources are created on the fly when necessary, so you only pay for the computing time you really need (down to milliseconds). This is especially useful in scenarios with occasional requests or traffic peaks.
  • Simplifying operations: With serverless approaches, a large part of operations is outsourced to the cloud vendor. Although the operations team still has to monitor and deploy applications, it spends less time tuning and scaling them (which can be very challenging).
  • Faster scaling and time to market: Due to the additional abstraction, deployment in serverless approaches is less complex. Often you just upload code to the vendor’s environment and are good to go. This also reduces time to market, as it becomes easy to rapidly roll out and try new applications. Moreover, if an application becomes popular, it can easily be scaled to meet user demand.
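The pricing argument in the first benefit can be made concrete with a small back-of-the-envelope calculation. The sketch below is only an illustration: all prices and workload figures are made-up placeholders, not any vendor's price list, and the two function names are hypothetical.

```python
def always_on_cost(hourly_rate, hours):
    """Cost of a resource that runs nonstop, regardless of traffic."""
    return hourly_rate * hours


def pay_per_use_cost(invocations, seconds_per_invocation, memory_gb,
                     price_per_gb_second, price_per_request):
    """Cost when billing follows actual execution time and request count."""
    compute = invocations * seconds_per_invocation * memory_gb * price_per_gb_second
    requests = invocations * price_per_request
    return compute + requests


if __name__ == "__main__":
    # Assumed scenario: one month (720 hours) with 100,000 short requests.
    nonstop = always_on_cost(hourly_rate=0.05, hours=720)
    on_demand = pay_per_use_cost(
        invocations=100_000,
        seconds_per_invocation=0.2,
        memory_gb=0.5,
        price_per_gb_second=0.00002,  # placeholder price
        price_per_request=0.0000002,  # placeholder price
    )
    print(f"always-on: {nonstop:.2f}, pay-per-use: {on_demand:.2f}")
```

With these assumed numbers the always-on resource is billed for every hour of the month, whereas the pay-per-use bill covers only the 20,000 seconds of actual execution.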

However, serverless computing is certainly not a solution for all problems, and certain downsides often make PaaS or on-premises solutions more suitable. Some key challenges are:

  • Loss of control: Moving more responsibilities to the cloud vendor also comes with a certain loss of control and risks like downtimes, cost changes and upgrades that may break your application or cost a lot of money [5].
  • Vendor lock-in: Current serverless environments are very specific, and there are no standards yet that ensure portability of applications between vendors. Moreover, many serverless vendors provide heavily interdependent ecosystems of products that are usually not compatible with those of other vendors. As a result, moving even a single application can be difficult and may entail a slew of changes.
  • Limitations in development, monitoring and debugging: In short, the vendor decides which programming languages and which monitoring and debugging options are available in its environment. Moreover, the additional abstraction layer limits monitoring and debugging, as it is usually not possible to dig deeper into the underlying infrastructure or to replicate problems in local environments.

How Serverless Computing fits into Analytics Architectures

Figure 2 shows how serverless approaches can be used in an analytics architecture. For this purpose, an analytics architecture is viewed as a data pipeline that i) extracts and transforms data from various sources, ii) persists the data in a data warehouse (DW), and iii) transforms the data again according to the needs of data targets such as data cubes or other analytical systems. From this perspective, an analytics architecture is a chain of storage and transformation steps, and the only difference in a serverless approach is that some or all of these transformations happen in stateless functions in the cloud.

Figure 2. Data Architectures viewed as Data Pipelines

Obvious serverless BIA use cases are ETL (extraction, transformation and loading) and stream processing, since both involve some kind of data pipeline with consecutive tasks that can be split into individual functions. Moreover, today’s stream processing most commonly happens in cloud environments anyway, which makes the adoption of FaaS even more likely. However, there are also more sophisticated approaches such as serverless machine learning [6] or even serverless data warehouses [7].
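The pipeline view of an analytics architecture can be sketched as a chain of small functions. This is a minimal sketch under stated assumptions: the function names, the record layout and the in-memory list standing in for the data warehouse are all hypothetical, and in a real serverless setup each step would be deployed separately and triggered by an event instead of being called directly.

```python
def extract(event):
    """Step i: parse raw records from a source (here: comma-separated lines)."""
    rows = [line.split(",") for line in event["lines"]]
    return {"rows": [{"region": r[0], "revenue": float(r[1])} for r in rows]}


def load(event, warehouse):
    """Step ii: persist the records; `warehouse` stands in for external storage."""
    warehouse.extend(event["rows"])
    return {"loaded": len(event["rows"])}


def aggregate(warehouse):
    """Step iii: transform stored data for a data target (revenue per region)."""
    totals = {}
    for row in warehouse:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["revenue"]
    return totals


if __name__ == "__main__":
    warehouse = []  # in-memory stand-in for a data warehouse table
    staged = extract({"lines": ["north,10.0", "south,5.5", "north,2.5"]})
    load(staged, warehouse)
    print(aggregate(warehouse))
```

All state lives in the event payloads and in the storage layer, so each step can run in its own short-lived container.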

Most of the benefits described above also hold true for serverless computing in analytics architectures. For instance, precise billing can save costs for ETL or data mining processes that run only rarely. Furthermore, simple scalability can help to handle peaks or fast growth. Last but not least, FaaS can increase the agility of an architecture, since it speeds up development and deployment, reduces complexity, and increases reusability.

However, the downsides discussed above also make clear that serverless computing is no silver bullet for BIA. For example, serverless environments are still tightly constrained by their vendors, which makes integration into heavily customized BIA architectures challenging or even impossible. Moreover, the serverless market is still immature, and the lack of standards and the high risk of lock-in make serverless approaches risky, especially for critical parts of the infrastructure. And naturally, all general concerns about the cloud remain valid for serverless computing, such as security and privacy issues or the challenge of transferring large data sets from on-premises environments.

Conclusion

This article describes the concept of serverless computing at a high level and shows that it is not entirely new, but rather an evolution that makes cloud computing cheaper, easier to use and thereby more accessible. However, this simplicity comes with substantial restrictions on control and customizability. Moreover, the serverless market is still in its infancy.

The second part of this article illustrates serverless use cases in BIA architectures, such as data transformation in ETL, stream processing or machine learning. Here, besides benefits like cost savings, the increase in agility is probably a major driver for serverless computing, since decoupling and reusing components can reduce complexity and make it possible to reconfigure architectures quickly. Once more standards and portable solutions exist, we may even see FaaS marketplaces that make it easy to use or mash up functions from various vendors or developers [8].

In conclusion, serverless computing is currently a great way to quickly spin up prototypes or to create temporary or very specific solutions. As the technology and market offerings mature, however, we will very likely see more serverless approaches in production enterprise environments.

Further reading

[1] (2017) Wells, D.: “On the Path to Modernization: Migrating Your Data Warehouse to the Cloud”
https://www.eckerson.com/articles/on-the-path-to-modernization-migrating-your-data-warehouse-to-the-cloud
[2] (2017) Smith, Stephen J.: “Cloud Data Warehousing: Producing the Infrastructureless Culture”
https://www.eckerson.com/articles/cloud-data-warehousing-producing-the-infrastructureless-culture

[3] (2016) Roberts, M.: “Serverless Architectures”
https://martinfowler.com/articles/serverless.html

[4] (2011) Mell, P., Grance, T.: “The NIST Definition of Cloud Computing”
https://csrc.nist.gov/publications/detail/sp/800-145/final

[5] (2016) Majors, C.: “OPERATIONAL BEST PRACTICES #SERVERLESS”
https://charity.wtf/2016/05/31/operational-best-practices-serverless/

[6] (2017) Casalboni, A.: “Serverless Computing & Machine Learning”
https://blog.alexcasalboni.com/serverless-computing-machine-learning-baf52b89e1b0

[7] “Why build a “Serverless” Data Warehouse?”
https://blog.slicingdice.com/why-a-serverless-data-warehouse-ae3b00146c26

[8] There are already similar concepts, but they are currently driven mainly by single vendors, e.g. Amazon’s Alexa Skills, which are often based on serverless AWS Lambda functions.

Julian Ereth

Julian Ereth is a researcher and practitioner in the field of business intelligence and data analytics.

