As the deep learning framework PaddlePaddle has become widely used within Baidu, a PaddlePaddle platform built on Baidu's private and public clouds has landed accordingly. In terms of technology selection, traditional systems such as Slurm cannot provide the fault tolerance and auto-scaling that PaddlePaddle requires, and they lack flexibility in supporting heterogeneous hardware such as GPUs, FPGAs, and RDMA. Therefore, we introduced Kubernetes into the next-generation platform design to turn the deep learning framework into a platform. In this session, we will share the architecture of Baidu's deep learning platform, PaddleCloud, and the series of work we have done at the IaaS and CaaS layers.
Smart devices have become more and more widely used today, and scenarios such as the Internet of Things (IoT) require a backend with complex computing capabilities. Traditionally, such a backend was deployed on a cloud server. With Baidu Function Computing (CFC), invoking a function is so easy and inexpensive that a dedicated cloud server is no longer necessary. In addition, Baidu's DuerOS ecosystem, IoT ecosystem, and edge computing have extended the capabilities of CFC. This session shares the architecture of CFC and the practice of combining it with Baidu's AI ecosystem.
Many deep learning frameworks use Python as the frontend language binding and C++ to build the backend execution engine, but Python has many limitations, such as its poor execution speed.
PaddlePaddle Fluid is designed as a new deep learning programming language. It uses ProgramDesc as its intermediate representation (IR) to describe the execution process of a neural network. A ProgramDesc is composed of blocks; a block is a sequence of operations, and each block has its own scope. Blocks resemble stack frames, but differ in that the backward pass needs to reuse the scopes created during the forward pass. Control-flow operations such as if/else and while are first-class citizens in Fluid.
PaddlePaddle Fluid aims to build a new deep learning programming language that makes it easy to describe and train neural networks.
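To make the block structure concrete, here is a minimal illustrative sketch (plain Python, not the real Fluid API): a ProgramDesc modeled as a list of blocks, where each block holds an ordered sequence of operations and its own scope, and a control-flow op such as `while` is just an op that points to a sub-block. All class and op names below are hypothetical and only mirror the concepts described above.

```python
# Hypothetical sketch of Fluid's ProgramDesc concept: nested blocks,
# per-block scopes, and control flow as first-class ops. Not the
# actual PaddlePaddle implementation or API.

class Block:
    def __init__(self, idx, parent=None):
        self.idx = idx          # index of this block in the program
        self.parent = parent    # enclosing block, used for scope lookup
        self.ops = []           # ordered sequence of operations
        self.vars = {}          # this block's own scope

class ProgramDesc:
    def __init__(self):
        self.blocks = [Block(0)]          # block 0 is the global block

    def append_block(self, parent):
        b = Block(len(self.blocks), parent)
        self.blocks.append(b)
        return b

prog = ProgramDesc()
main = prog.blocks[0]
main.ops.append({"type": "mul", "inputs": ["X", "W"], "outputs": ["Y"]})

# A while op is an ordinary op in the parent block whose body
# lives in a sub-block with its own scope.
body = prog.append_block(main)
body.ops.append({"type": "increment", "inputs": ["i"], "outputs": ["i"]})
main.ops.append({"type": "while", "sub_block": body.idx})
```

Unlike a call stack, a sub-block's scope would not be discarded when the block finishes: the backward pass still needs the forward variables, which is why the abstract notes that blocks are like a stack but not quite.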
As the big data era arrives, processing unbounded data in real time has become a necessity in many scenarios. Take Baidu as an example: trillions of records arrive at the real-time computing platform every day. Since 2011, DStream, a true streaming computation engine with its own scheduler, has been proposed, implemented, and put into practice. It supports a low-level but flexible API and configuration. Moreover, it supports logging, monitoring, paging, tracing, releasing, dictionaries, and more, all of which are crucial in production. Over time, because DStream targets developers and has a learning curve, Spark Streaming was introduced for data scientists. Our team follows the Spark community, and best practices learned from running DStream in production have been contributed back to Spark Streaming. We have adapted Baidu's home-brewed storage, messaging systems, PaaS, and so on to Spark Streaming. In this session, we'd like to share our experience with DStream and Spark Streaming at Baidu.