NOVA Networked Object-Based EnVironment for Analysis |
| Motivation |
The new generation of HENP experiments such as those at RHIC and LHC must contend with processing and mining unprecedented amounts of data with highly complex analysis software developed and used by large worldwide communities of physicists. Object oriented (OO) programming has been identified and adopted by these communities as an efficient and powerful approach to developing capable, robust, maintainable software in this environment. Vital to fully realizing the benefits of object oriented technology is careful attention to how this technology is employed and delivered. In our view, object oriented frameworks -- integrated sets of classes providing solutions to problems of large-scale data analysis -- provide the necessary foundation to dramatically improve the software development process for future applications.
The BNL-based core computing group of RHIC's STAR experiment has developed an object oriented analysis framework that itself builds on prior STAR work on the STAF analysis framework and an ongoing program of collaboration with the CERN team developing the ROOT system. The ROOT system is the foundation of STAR's analysis framework. The NOVA (Networked Object-based Environment for Analysis) project seeks to leverage the expertise developed through this existing BNL effort to develop an application-neutral, object oriented analysis framework that incorporates the latest standards and technologies in component software, component middleware and distributed computing to support a new level of distributed object oriented physics analysis.
Several barriers confront the application developer when moving to OO software: the need for retraining and adaptation to a new approach to designing software; the learning curve that must be climbed in order to produce quality OO software designs; and the cost in time and effort of establishing a functional and productive infrastructure capable of supporting OO application development. NOVA will help to solve these problems by providing an object-oriented infrastructure, a consistent application programming model, and data analysis templates which can be used to start building applications. These components will ease the transition to object oriented technology by providing a well-tested baseline of functionality and services. Physicists can design their solutions using a proven programming model instead of developing (reinventing) a unique approach. Finally, the use of a shared architecture will make it easier to integrate solutions from different experiments.
The success of this effort will constitute a first step in establishing an important new computational science component in the research effort at BNL. It will lead to a BNL-supported software product that will provide new capabilities serving BNL and HENP community physicists participating both in BNL-hosted research such as the RHIC program and in worldwide collaborations such as the LHC. It will improve the depth and visibility of the Laboratory's contribution to HENP community software and better position the Laboratory for important roles in computing and software for near and long term projects.
| Goals |
To develop a set of software tools which can be applied in many varied global computing environments (RHIC, LHC, muon collider...) for:
- coordination of software development, use and maintenance in a widely distributed community
- distribution, control and monitoring of physics analysis in a distributed, heterogeneous computing environment
- distributed data access, large database bookkeeping and file cataloguing, data locality management
- promoting enhanced robustness and reusability of software through object oriented and component software techniques
These tools will be used via implementation-neutral interfaces, with select implementations provided for products of wide application or interest in the community (eg. ROOT, Objectivity, and the Grand Challenge HENP data access project), to the extent possible with the available manpower.
| Approach and Architecture |
Many well-developed experiments already have established object oriented frameworks in production or under development. However, the present generation of object oriented analysis frameworks is very limited in several respects:
- Support for distributed analysis in large, geographically dispersed collaborations
- Management of the complete analysis process from analysis software development through data set selection, data retrieval, bulk data filtering, analysis production, results analysis and iteration of the process
- Integrated support for tracking, validating, and debugging physics analysis in a collaborative environment in which effective communication and documentation of the analysis process is both important and difficult
The NOVA project will not reinvent or evolve existing analysis frameworks, but rather will provide new capabilities in these areas.
Analysis frameworks have typically been large monolithic systems requiring full 'buy-in' to the system in order to make use of them, with ROOT being an example of such a "vertically integrated" system. A more recent trend, which we will follow, is to develop modular components providing application-neutral interfaces which can be used in isolation to extend the capability of existing analysis systems. The NOVA framework will consist of small, interoperable components designed for flexibility and ease of reuse. We will provide select implementations of these components using HENP and software community standard tools and will integrate and test them with at least one large "vertical" analysis framework (ROOT).
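As an illustration of what an application-neutral component interface could look like in C++, the following is a minimal sketch and not NOVA's actual design. The names `FileEntry`, `FileCatalog`, `InMemoryCatalog` and `locate` are hypothetical and chosen only for this example. Analysis code depends on the abstract interface alone, so a backend built on a particular product could be substituted without changing client code.

```cpp
#include <map>
#include <string>

// Hypothetical record describing one catalogued file (illustration only).
struct FileEntry {
    std::string logicalName;  // experiment-independent name
    std::string location;     // where the file currently resides
};

// Application-neutral interface: client code depends only on this class.
class FileCatalog {
public:
    virtual ~FileCatalog() = default;
    virtual void add(const FileEntry& entry) = 0;
    // Returns true and fills 'out' if the logical name is known.
    virtual bool lookup(const std::string& logicalName, FileEntry& out) const = 0;
};

// One possible implementation, kept in memory for the sake of the example.
// A real backend (assumed, not shown) would sit behind the same interface.
class InMemoryCatalog : public FileCatalog {
public:
    void add(const FileEntry& entry) override {
        entries_[entry.logicalName] = entry;
    }
    bool lookup(const std::string& logicalName, FileEntry& out) const override {
        auto it = entries_.find(logicalName);
        if (it == entries_.end()) return false;
        out = it->second;
        return true;
    }
private:
    std::map<std::string, FileEntry> entries_;
};

// Client code written against the interface, unaware of the backend in use.
std::string locate(const FileCatalog& catalog, const std::string& name) {
    FileEntry entry;
    return catalog.lookup(name, entry) ? entry.location : std::string("unknown");
}
```

Because `locate` takes a `const FileCatalog&`, the same call works with any implementation that derives from the interface, which is the property that lets a component be used in isolation inside an existing analysis system.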
We will focus principally on supporting C++ based analysis. C++ is the analysis software language for all RHIC and LHC experiments and for most other large experiments. Other efforts (JAS) are underway to develop distributed OO analysis frameworks based on the Java language.
NOVA will be developed using an iterative process driven by user participation and closely coupled to prototyping in real-world experiments (STAR, ATLAS).
| Tools and technologies |
Requirements for tools and technologies employed in NOVA
Existing experience and the evolutionary path of HENP computing suggest requirements which should be met by tools and technologies employed within NOVA:
- Should be free or nearly so, such that the buy-in cost of using the system is very small
- Should be widely used, true or de facto standards, with good support and showing good growth
- Should be known within or on a growth path within the HENP community
Following these requirements we have identified the following tools for application in the development of NOVA.
ROOT
- An object oriented toolkit for HENP analysis, visualization, and I/O that supports hierarchical OO data models
- Baseline implementation layer for analysis server framework
- Client analysis tools and visualization
- OO persistency for event, analysis data
MySQL
- Open Software relational database for data catalogue, event store navigation, mobile analysis client state persistency, software signature management
XML
- Software source distribution between mobile analysis client and analysis server
- Analysis filter, query specification
Apache web server modules
- Distributed client/server communication between mobile analysis client and analysis server
- Web-based control, monitoring
CORBA
- Component middleware for low-volume control data
Developments since our original proposal have led us away from some tools under consideration at that time:
- NILE will not be pursued. Delivery delays have continued -- a working product has not been delivered -- and the principal developer has left the project.
- STAF has been retired in favor of ROOT in both the large experiments employing STAF (STAR and PHENIX). ROOT supports hierarchical OO data models unconstrained by CORBA IDL limitations that exist in STAF.
| Implementation Approach to Project Goals |
| Distributed Software Management |
- Generic distribution, management and version coordination tools layered over CVS and integrated with mobile analysis client, problem reporting system, code navigation system, discussion system. Doesn't presume anything beyond CVS (eg. no SRT)
| Distributed Analysis |
- Client/server model with mobile analysis client (JAS-like, but C++ rather than Java based) served by central analysis server, with communication and data exchange via standard protocols (ftp, http and sockets for data exchange; HTML/XML and CORBA for control and monitoring)
- Analysis monitoring tool based on maintenance of client and batch server state information in an Apache web server module
- Analysis control tool based on XML and web protocols (or CORBA)
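To make the idea of XML-based analysis control concrete, here is a minimal sketch of how a client might serialize a control request. The message layout (`control`, `job`, `action`, `filter`) and the names `ControlRequest`, `xmlEscape` and `toXml` are assumptions made for this example; the source does not specify a format.

```cpp
#include <sstream>
#include <string>

// Escape the five characters that have special meaning in XML.
std::string xmlEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Hypothetical control request sent from the mobile client to the server.
struct ControlRequest {
    std::string jobId;   // identifier of the analysis job
    std::string action;  // e.g. "start" or "stop" (assumed vocabulary)
    std::string filter;  // analysis filter expression, passed as text
};

// Serialize the request as a small XML document (assumed layout).
std::string toXml(const ControlRequest& r) {
    std::ostringstream xml;
    xml << "<control job=\"" << xmlEscape(r.jobId)
        << "\" action=\"" << xmlEscape(r.action) << "\">"
        << "<filter>" << xmlEscape(r.filter) << "</filter>"
        << "</control>";
    return xml.str();
}
```

Escaping matters here because filter expressions typically contain `<`, `>` and `&`, which would otherwise produce malformed XML. The resulting string could be carried over any of the transports listed above.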
| Event Data and Distributed Data Access |
- Central data and file catalogues implemented in MySQL
- Data locality monitoring and control; large database bookkeeping
- Monitoring and control integrated with Grand Challenge
- Data storage implementation based on ROOT I/O
- Data model specification portable to other implementations
- Import/adaptation tools for other implementations (eg. Objectivity)
| Software Robustness and Reusability |
- Dynamic customization of a stable core framework through shared libraries
- Software signatures (shareable library IDs) stored in mobile analysis client state DB for assured reproducibility and invariance of analysis environment
- Persistent coupling of methods and data
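The software signature idea can be sketched as a comparison between the library IDs recorded with an analysis and those present in the current environment. This is an illustrative sketch under stated assumptions: `SignatureSet`, `mismatches` and `reproducible` are hypothetical names, and the ID is treated as an opaque string (it could be a version tag or a checksum; the source does not say which).

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical: maps a shareable library name to its ID string.
using SignatureSet = std::map<std::string, std::string>;

// Return the names of recorded libraries that are missing from the current
// environment or whose ID differs from the recorded one.
std::vector<std::string> mismatches(const SignatureSet& recorded,
                                    const SignatureSet& current) {
    std::vector<std::string> out;
    for (const auto& kv : recorded) {
        auto it = current.find(kv.first);
        if (it == current.end() || it->second != kv.second) {
            out.push_back(kv.first);
        }
    }
    return out;
}

// The environment counts as reproducible when every recorded library
// is present with an identical ID.
bool reproducible(const SignatureSet& recorded, const SignatureSet& current) {
    return mismatches(recorded, current).empty();
}
```

In this sketch the recorded set would be read from the client state database and the current set gathered from the loaded libraries; both steps are left out because the source does not describe them.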
| Project Domains and Components |
Bold items are project component deliverables, either fully implemented in the project or third party tools customized or extended for NOVA. Non-bold items are third party tools used by (or with provision for use by) NOVA. Italic items are application components used with NOVA.
| Data Management Domain |
- Catalog interface
- Data Catalog
- Data Repository
- Grand Challenge Architecture (GCA)
| Analysis Server Domain |
- Analysis daemon
- Analysis catalog
- Offline control framework
- Dynamically loaded applications
- CVS code repository
| Mobile Analysis Domain |
- Mobile analysis client
- ROOT analysis
- nanoDST
- Grand Challenge Architecture query
- Web browser
| Web Middleware Domain |
- Analysis client state server
- Client state database
- Analysis monitoring module
- Bug system
- HyperNews discussion system
- Web server
| Schedule and Milestones |
Consistent with the provided funding level, the dedicated manpower devoted to this project consists of one full-time person for one half year, augmented by contributions from the project leader (Torre Wenaus) and from several existing BNL STAR Computing group members (Valery Fine, Victor Perevoztchikov, Jeff Porter). The dedicated developer (Sasha Vanyashin, a visitor to BNL from the Royal Institute of Technology, Stockholm) arrived April 22. Accordingly, dedicated effort and milestones are clustered in the second half of the year, with limited design work and prototyping taking place in the first half of the year.
| Milestone | Activity | Deliverable | Schedule |
| 1. | Design, prototyping | Status report | April |
| 2. | Implementation, testing | Status report and year two plan | August |
| 3. | Documentation, refinement | Delivery, final report, users manual | September |
| Torre Wenaus, Sasha Vanyashin |