GPU + In-Memory Data Management for Big Data Analytics
In this project, we develop a toolchain based on In-Memory Data Management and parallel data processing on the GPU using CUDA for large-scale, data-intensive Smart Meter analytics. The global rollout of Smart Meters opens a new business paradigm for utilities, with data being collected and transacted at high volume and velocity. For instance, consider one million meters that each record a reading every 15 minutes. If each meter reading is 1,000 bytes (1 kB), each meter produces 96 readings per day, and the total transaction data collected from one million customer meters reaches about 35 TB per year.
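The estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming a 365-day year and no protocol or storage overhead; the function name `yearly_volume_bytes` is hypothetical and only serves this illustration.

```python
# Back-of-the-envelope estimate of the yearly Smart Meter data volume.
# Assumptions (for illustration only): 365-day year, no storage overhead.

METERS = 1_000_000        # number of customer meters
READING_BYTES = 1_000     # size of one meter reading (1 kB)
INTERVAL_MINUTES = 15     # one reading every 15 minutes


def yearly_volume_bytes(meters=METERS,
                        reading_bytes=READING_BYTES,
                        interval_minutes=INTERVAL_MINUTES,
                        days=365):
    """Return the total number of bytes collected per year."""
    readings_per_day = (24 * 60) // interval_minutes  # 96 for 15 minutes
    return meters * reading_bytes * readings_per_day * days


total = yearly_volume_bytes()
print(f"{total / 1e12:.2f} TB per year")  # prints "35.04 TB per year"
```

Under these assumptions the volume comes to roughly 35 TB per year in decimal units, which is the order of magnitude that motivates the platform.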
Therefore, our aim is to combine the processing power of the GPU with the high throughput and low latency of In-Memory databases to build a Big Data Analytics platform that can cope with this volume. The platform is intended to support instant, in-depth analysis of massive volumes of Smart Meter data, enabling advanced customer segmentation based on energy consumption patterns as well as energy efficiency benchmarking. To achieve this goal, we research online learning methods that use newly arriving information to extend the existing knowledge bases.
The goal of our toolchain is to create a cognitive engine on top of an application server with a web front-end interface.
The underlying GPU implementation focuses on a MapReduce schema with a vectorized interface. SAP HANA is our In-Memory computing platform, which offers not only In-Memory data persistence but also calculation logic (directly in the database) for extremely low-latency pre-processing.
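To illustrate what a MapReduce schema with a vectorized (batch-oriented) interface can look like, here is a minimal CPU-only sketch in plain Python. It is not the project's CUDA implementation: the function names `map_batch`, `reduce_by_key`, and `map_reduce`, the meter IDs, and the kWh values are all hypothetical. The map step receives whole columns of readings at once, which is the shape of input a GPU kernel would process in parallel, and the reduce step sums the consumption per meter.

```python
from collections import defaultdict


def map_batch(meter_ids, kwh_values):
    """Map step: take whole columns (one batch) and emit (key, value) pairs."""
    return list(zip(meter_ids, kwh_values))


def reduce_by_key(pairs):
    """Reduce step: sum all values that share the same key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def map_reduce(batches):
    """Run the map step on every batch, then reduce the combined output."""
    pairs = []
    for meter_ids, kwh_values in batches:
        pairs.extend(map_batch(meter_ids, kwh_values))
    return reduce_by_key(pairs)


# Hypothetical sample data: two batches of (meter ID column, kWh column).
batches = [
    (["m1", "m2", "m1"], [0.5, 1.0, 0.25]),
    (["m2", "m3"], [0.5, 2.0]),
]
result = map_reduce(batches)
print(result)  # prints {'m1': 0.75, 'm2': 1.5, 'm3': 2.0}
```

In a GPU setting, the per-element work in the map step would typically be assigned to individual threads and the per-key summation replaced by a parallel reduction; the sketch only shows the data flow.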
Although the project is primarily concerned with the Smart Meter use case, the toolchain is also applicable to Big Data analytics in other domains, such as Industry 4.0, Smart Cities, and Personalized Medicine.
Yong Ding, email@example.com
Oct. 2014 – Apr. 2015