Engineering Implementations in Deep Learning Recommender Systems

Zhe Wang; Chao Pu; Felice Wang

doi:10.1017/9781009447515.008

6 - Engineering Implementations in Deep Learning Recommender Systems

Published online by Cambridge University Press: 08 May 2025

Zhe Wang ,

Chao Pu and

Felice Wang

Book contents

Get access

Summary

While previous chapters discussed deep learning recommender systems from a theoretical and algorithmic perspective, this chapter shifts focus to the engineering platform that supports their implementation. Recommender systems are divided into two key components: data and model. The data aspect involves the engineering of the data pipeline, while the model aspect is split between offline training and online serving. This chapter is structured into three parts: (1) the data pipeline framework and big data platform technologies; (2) popular platforms for offline training of recommendation models like Spark MLlib, TensorFlow, and PyTorch; and (3) online deployment and serving of deep learning recommendation models. Additionally, the chapter covers the trade-offs between engineering execution and theoretical considerations, offering insights into how algorithm engineers can balance these aspects in practice.

Keywords

ML platform data pipeline big data ML training frameworks online deployment and serving engineering trade-offs

Information

Type: Chapter
Information: Deep Learning Recommender Systems , pp. 191 - 224

DOI: https://doi.org/10.1017/9781009447515.008 [Opens in a new window]

Publisher: Cambridge University Press

Print publication year: 2025

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Book purchase

Temporarily unavailable

References

Chang, Fay, et al. Bigtable: A distributed storage system for structured data. ACM transactions on Computer Systems (TOCS), 26.2 (2008): 4.Google Scholar

Ghemawat, Sanjay, Gobioff, Howard, Leung, Shun-Tak. The Google file system. 2003.Google Scholar

Dean, Jeffrey, Ghemawat, Sanjay. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51.1, 2008: 107–113.Google Scholar

Li, Mu, et al. Scaling distributed machine learning with the parameter server. 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), Broomfield, CO, USA, October 6–8, 2014.Google Scholar

Li, Mu, et al. Parameter server for distributed machine learning. Big Learning NIPS Workshop, 6(2), 2013.Google Scholar

Abadi, Martín, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems: arXiv preprint arXiv: 1603.04467 (2016).Google Scholar

Abadi, Martín, et al. TensorFlow: A system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, November 2–4, 2016.Google Scholar

Accessibility standard: Unknown

Accessibility compliance for the PDF of this book is currently unknown and may be updated in the future.