Virtual Infrastructure for CERN CMS data analysis

Providing a resilient and scalable computational and data management solution for massive-scale research environments, such as the CERN high energy physics (HEP) analyses, requires continuous exploration of new technologies and techniques. We have developed a hybrid solution that combines an open source cloud with a secure indirection plane and a network file system for CMS data analysis. Our aim has been to design a scalable and resilient infrastructure for CERN HEP data analysis.

The Architecture

The infrastructure is based on OpenStack components for structuring a private cloud, together with the Gluster file system. While virtualization management has progressed rapidly across most of the OpenStack modules, one important component that is currently not part of the OpenStack suite is a cloud file system. To overcome this limitation we have used GlusterFS, a network-based file system for high availability. GlusterFS uses FUSE and is promoted as the 'first true unified file and object data storage', with the ability to scale to petabytes. Once its OpenStack connector is installed, GlusterFS can serve as the underlying file system for OpenStack-deployed clouds. This connects GlusterFS to any hypervisor (KVM in our case) and also lets it act as the cloud NAS.

[Figure: architecure1.png]

Cloud based and Grid enabled infrastructure

We integrated the above state-of-the-art cloud technologies with the traditional Grid middleware infrastructure. This approach implies no changes for the end user, while the production infrastructure becomes programmable and is enriched by high-end, resilient and scalable components. To achieve this, we run the Advanced Resource Connector (ARC) as a meta-scheduler. Both the Computing Elements (CE) and the Worker Nodes (WN) run on VM instances inside the OpenStack cloud. Our solution has proven stable and has been running fully in production for a few months, handling jobs on a daily basis. Currently we consider our approach semi-static: instance management is manual, yet the setup provides scalability and performance. During the last year of the project, we investigated an elastic management solution by including the EMI authorization service (Argus) and the Execution Environment Service (Argus-EES) in the design.
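The difference between the semi-static setup and an elastic one can be illustrated with a small sizing policy. The following is only a minimal sketch under stated assumptions, not the project's implementation and not the Argus-EES interface: the function names, the slots-per-node figure and the node limits are all hypothetical, and the policy simply sizes the Worker Node pool from the number of queued and running jobs.

```python
import math


def desired_worker_nodes(queued_jobs, running_jobs, slots_per_node,
                         min_nodes=1, max_nodes=20):
    """Return how many Worker Node VMs should exist for the current load.

    All parameters are hypothetical illustration values; a real deployment
    would read the job counts from the batch system behind the CE.
    """
    if slots_per_node <= 0:
        raise ValueError("slots_per_node must be positive")
    # Every queued or running job needs one slot; round up to whole nodes.
    needed = math.ceil((queued_jobs + running_jobs) / slots_per_node)
    # Clamp to the allowed pool size.
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes, target_nodes):
    """Translate the gap between current and target pool size into an action."""
    delta = target_nodes - current_nodes
    if delta > 0:
        return ("boot", delta)
    if delta < 0:
        return ("terminate", -delta)
    return ("hold", 0)


if __name__ == "__main__":
    target = desired_worker_nodes(queued_jobs=10, running_jobs=6, slots_per_node=8)
    print(scaling_action(current_nodes=1, target_nodes=target))
```

In the semi-static case an operator performs the "boot" or "terminate" step by hand; an elastic setup would run such a decision periodically and pass the result to the cloud API.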

Structure

Secure connectivity

We introduce indirection in the form of cryptographic addresses, which allows us to decouple the virtual machines and servers in the datacenter in a secure way. The indirection architecture that builds on these cryptographic addresses is implemented with the Host Identity Protocol (HIP). Our prototype system supports secure tunnels between VMs, with support for VM migration. Indirection is also very useful for load balancing and resource management purposes.
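The idea of a cryptographic address that stays fixed while the network location changes can be sketched in a few lines. This is a simplified illustration, not the HIP implementation used in the prototype: a real HIP Host Identity Tag is a 128-bit value derived from the host's public key through a specific encoding, whereas here the identifier is simply a truncated SHA-256 digest. The class and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def host_identifier(public_key: bytes) -> str:
    """Derive a stable 128-bit identifier (32 hex chars) from a public key.

    Simplification: this is a plain truncated SHA-256 digest, not the
    encoding that HIP actually uses for Host Identity Tags.
    """
    return hashlib.sha256(public_key).hexdigest()[:32]


@dataclass
class IndirectionTable:
    """Maps stable cryptographic identifiers to current IP locators."""
    locators: dict = field(default_factory=dict)

    def register(self, public_key: bytes, ip: str) -> str:
        hid = host_identifier(public_key)
        self.locators[hid] = ip
        return hid

    def migrate(self, hid: str, new_ip: str) -> None:
        # Only the locator changes; peers keep addressing the same identifier.
        if hid not in self.locators:
            raise KeyError(hid)
        self.locators[hid] = new_ip

    def resolve(self, hid: str) -> str:
        return self.locators[hid]


if __name__ == "__main__":
    table = IndirectionTable()
    vm = table.register(b"example-public-key", "10.0.0.5")
    table.migrate(vm, "10.0.1.9")  # VM moved to another server
    print(vm, "->", table.resolve(vm))
```

Because peers address the identifier rather than the IP address, a VM can move to another server without its peers changing the address they use, and the same lookup step is a natural place to apply load balancing.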