Software : Cloud Computing : Data & Analytics

Website | Blog | Video

San Francisco, California, United States


With origins in academia and the open source community, Databricks was founded in 2013 by the original creators of Apache Spark™, Delta Lake and MLflow. As the world’s first and only lakehouse platform in the cloud, Databricks combines the best of data warehouses and data lakes to offer an open and unified platform for data and AI.

Assembly Line

Part Level Demand Forecasting at Scale


Authors: Max Kohler, Pawarit Laosunthara, Bryan Smith, Bala Amavasai

Topics: Demand Planning, Production Planning, Forecasting

Organizations: Databricks

The challenges of demand forecasting include ensuring the right granularity, timeliness, and fidelity of forecasts. Due to limitations in computing capability and the lack of know-how, forecasting is often performed at an aggregated level, reducing fidelity.

In this blog, we demonstrate how our Solution Accelerator for Part Level Demand Forecasting helps your organization to forecast at the part level, rather than at the aggregate level using the Databricks Lakehouse Platform. Part-level demand forecasting is especially important in discrete manufacturing where manufacturers are at the mercy of their supply chain. This is due to the fact that constituent parts of a discrete manufactured product (e.g. cars) are dependent on components provided by third-party original equipment manufacturers (OEMs). The goal is to map the forecasted demand values for each SKU to quantities of the raw materials (the input of the production line) that are needed to produce the associated finished product (the output of the production line).

Read more at Databricks Blog

Using MLflow to deploy Graph Neural Networks for Monitoring Supply Chain Risk


Organizations: Databricks

We live in an ever interconnected world, and nowhere is this more evident than in modern supply chains. Due to the global macroeconomic environment and globalisation, modern supply chains have become intricately linked and weaved together. Companies worldwide rely on one another to keep their production lines flowing and to act ethically (e.g., complying with laws such as the Modern Slavery Act). From a modelling perspective, the procurement relationships between firms in this global network form an intricate, dynamic, and complex network spanning the globe.

Lastly, it was mentioned earlier that GNNs are a framework for defining deep learning algorithms over graph structured data. For this blog, we will utilise a specific architecture of GNNs called GraphSAGE. This algorithm does not require all nodes to be present during training, is able to generalise to new nodes efficiently, and can scale to billions of nodes. Earlier methods in the literature were transductive, meaning that the algorithms learned embeddings for nodes. This was useful for static graphs, but the algorithms had to be re-run after graph updates such as new nodes. Unlike those methods, GraphSAGE is an inductive framework which learns how to aggregate information from neighborhood nodes; i.e., it learns functions for generating embeddings, rather than learning embeddings directly. Therefore GraphSAGE ensures that we can seamlessly integrate new supply chain relationships retrieved from upstream processes without triggering costly retraining routines.

Read more at Ajmal Aziz on Medium

Optimizing Order Picking to Increase Omnichannel Profitability with Databricks


Authors: Peyman Mohajerian, Bryan Smith

Topics: BOPIS, Operations Research

Organizations: Databricks

The core challenge most retailers are facing today is not how to deliver goods to customers in a timely manner, but how to do so while retaining profitability. It is estimated that margins are reduced 3 to 8 percentage-points on each order placed online for rapid fulfillment. The cost of sending a worker to store shelves to pick the items for each order is the primary culprit, and with the cost of labor only rising (and customers expressing little interest in paying a premium for what are increasingly seen as baseline services), retailers are feeling squeezed.

But by parallelizing the work, the days or even weeks often spent evaluating an approach can be reduced to hours or even minutes. The key is to identify discrete, independent units of work within the larger evaluation set and then to leverage technology to distribute these across a large, computational infrastructure. In the picking optimization explored above, each order represents such a unit of work as the sequencing of the items in one order has no impact on the sequencing of any others. At the extreme end of things, we might execute optimizations on all 3.3-millions simultaneously to perform our work incredibly quickly.

Read more at Databricks Blog

Virtualitics’ integration with Databricks sorts out what’s under the surface of your data lake


Organizations: Virtualitics, Databricks

Databricks users can benefit from Virtualitics’ multi-user interface because it can enable hundreds more people across the business to get value from complex datasets, instead of a small team of expert data scientists. Analysts and citizen data scientists can do self-serve data exploration by querying large datasets with the ease of typing in question and AI-guided exploration instead of writing lines of code. Business decision makers get their hands on AI-generated insights that can help them take smart, predictive actions.

Read more at Virtualitics Blog

How to Build Scalable Data and AI Industrial IoT Solutions in Manufacturing


Authors: Bala Amavasai, Vamsi Krishna Bhupasamudram, Ashwin Voorakkara

Organizations: Databricks, Tredence

Unlike traditional data architectures, which are IT-based, in manufacturing there is an intersection between hardware and software that requires an OT (operational technology) architecture. OT has to contend with processes and physical machinery. Each component and aspect of this architecture is designed to address a specific need or challenge, when dealing with industrial operations.

The Databricks Lakehouse Platform is ideally suited to manage large amounts of streaming data. Built on the foundation of Delta Lake, you can work with the large quantities of data streams delivered in small chunks from these multiple sensors and devices, providing ACID compliances and eliminating job failures compared to traditional warehouse architectures. The Lakehouse platform is designed to scale with large data volumes. Manufacturing produces multiple data types consisting of semi-structured (JSON, XML, MQTT, etc.) or unstructured (video, audio, PDF, etc.), which the platform pattern fully supports. By merging all these data types onto one platform, only one version of the truth exists, leading to more accurate outcomes.

Read more at Databricks Blog