Design Productivity Enhancement Through Nbs & Netbatch
By: Mike • Essay • May 22, 2010
Introduction:
Why NetBatch? At my workplace, our computing needs far exceed the number of machines we own. Hence, it would be economically infeasible to buy enough machines to satisfy our peak consumption, which is growing constantly. NetBatch is a tool that allows our organization to maximize utilization of the available computing resources. This paper discusses NetBatch and NBS, a job-management package built around NetBatch. Both use principles of queuing, job scheduling, and sequencing to achieve their goals.
How does it work? Each person has a computer on his or her desk that is a source of computing power. When that person isn’t using the computer for interactive work, it sits idle. With NetBatch, however, we can take advantage of those untapped hours of computing time. At night, whenever a person is away from work, or at any time a computer’s utilization falls below a predefined level, NetBatch can run jobs there. Users who need computing power submit “jobs” to run on such machines, subject to a few restrictions. NetBatch queues the jobs and runs each one when it reaches the front of the queue and an appropriate machine is available. This lets us accommodate peak loads by distributing the demand across a large number of machines at all times. Different projects are typically on different computing cycles, so one group may be in a slump while another is peaking; NetBatch therefore provides a good solution for the needs of the entire design community in our organization. An overview of the job submission process is provided in Appendix A, which describes the flow of a typical job from the time a user needs to perform a computing task to the time the job completes or crashes.
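The cycle-harvesting idea described above (queue the jobs, then start the job at the front of the queue whenever a suitable machine is idle) can be sketched in a few lines. This is a minimal illustration, not NetBatch's implementation: the `Machine` and `Job` types, the single FIFO queue, and the 0.2 idle threshold are all hypothetical stand-ins for the "predefined utilization" mentioned in the text.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical threshold: a desktop counts as idle below 20% interactive load.
IDLE_THRESHOLD = 0.2

@dataclass
class Machine:
    name: str
    utilization: float   # current interactive load, 0.0 to 1.0
    busy: bool = False   # True while the machine is running a batch job

@dataclass
class Job:
    job_id: int

def dispatch(queue: deque, machines: list) -> list:
    """Start jobs from the front of the FIFO queue on idle machines.

    Returns (job_id, machine_name) pairs for the jobs started in this round;
    jobs that find no idle machine stay in the queue for the next round.
    """
    started = []
    for m in machines:
        if not queue:
            break
        # Only harvest machines that are free and below the idle threshold.
        if not m.busy and m.utilization < IDLE_THRESHOLD:
            job = queue.popleft()   # the job at the front of the queue goes first
            m.busy = True
            started.append((job.job_id, m.name))
    return started

if __name__ == "__main__":
    q = deque(Job(i) for i in range(3))
    desks = [Machine("desk-a", 0.05), Machine("desk-b", 0.90), Machine("desk-c", 0.10)]
    print(dispatch(q, desks))   # desk-b is in interactive use, so job 2 keeps waiting
```

In a real deployment this round would repeat periodically as machine loads change; the sketch only shows a single dispatch pass.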
NetBatch: Structure
NetBatch terminology:
To submit a computing job, each user picks a NetBatch pool allocated to them, the class of machines to run the job on, and a queue slot (qslot) that sets the job’s priority.

A pool is a set of machines that can run NetBatch jobs. Each pool consists of one master machine and a number of servers. The master monitors the status of all machines in the pool (such as processor load and number of interactive users) along with the qslot weights, queues the submitted jobs, and schedules them on the servers.

Classes are a mechanism that allows users to match jobs with suitable machines: a job should run only on machines that meet its specific requirements. An individual machine may run many job classes, and the same class may exist across several pools.

Qslots are a scheduling mechanism (mostly first-come, first-served, or FCFS) that NetBatch uses to prioritize jobs and schedule them accordingly. They are similar to checkout lines in a grocery store. Each qslot has a “weight” that determines what percentage of the available resources it can use. In our environment, we assign qslot weights in proportion to the resources forecast for each project across the organizational teams. Within each qslot, hierarchical nodes can be defined so that different job types or subgroups within the project share the allocation and further divide it among themselves. A NetBatch master with several equally weighted qslots “fair-shares” the jobs between the qslots.
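The master's scheduling decision described above can be modeled, very roughly, as a weighted fair-share loop: pick the non-empty qslot that is furthest below its share (lowest running-jobs-to-weight ratio), take its oldest job (FCFS), and place it on a free server that offers the job's class. This is a sketch under stated assumptions, not NetBatch's actual algorithm: the `Qslot` and `Server` types, the running/weight ratio rule, and the `subnode_shares` helper for hierarchical nodes are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Qslot:
    name: str
    weight: float                                # share of pool resources (assumed > 0)
    jobs: deque = field(default_factory=deque)   # FCFS queue of (job_id, job_class)
    running: int = 0                             # jobs from this qslot currently running

@dataclass
class Server:
    name: str
    classes: set          # job classes this machine offers, e.g. {"2hours"}
    free: bool = True

def pick_qslot(qslots):
    """Return the non-empty qslot with the lowest running/weight ratio (hypothetical rule)."""
    candidates = [q for q in qslots if q.jobs]
    if not candidates:
        return None
    return min(candidates, key=lambda q: q.running / q.weight)

def schedule_one(qslots, servers):
    """Place the head job of the most under-served qslot on a matching free server."""
    q = pick_qslot(qslots)
    if q is None:
        return None
    job_id, job_class = q.jobs[0]
    for s in servers:
        # Class matching: the job may only run on a machine that offers its class.
        if s.free and job_class in s.classes:
            q.jobs.popleft()
            q.running += 1
            s.free = False
            return (q.name, job_id, s.name)
    return None   # no suitable machine is free; the head job keeps waiting

def subnode_shares(qslot_weight, subnode_weights):
    """Split a qslot's weight among hierarchical sub-nodes in proportion to their weights."""
    total = sum(subnode_weights.values())
    return {name: qslot_weight * w / total for name, w in subnode_weights.items()}
```

With two equally weighted qslots, successive calls to `schedule_one` alternate between them while both have waiting jobs, which is the "fair-share" behavior the text describes. A simplification worth noting: if the chosen qslot's head job has no matching free server, this sketch waits rather than trying another qslot.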
To summarize, NetBatch is basically a job-scheduling tool with the following structural aspects:
• Jobs arrive at NetBatch randomly, and more than one job is usually submitted at a time.
• Service times depend on several factors, such as machine size, memory and other machine characteristics, and the nature of the jobs users submit. Processing times are therefore highly uncertain.
• The service discipline is complicated and is handled by a job management/scheduling tool (NBS), which is described later in this paper. It is determined mainly by three settings: the pool, which can be, for example, express_pool or a regular batch pool; the class, which to an extent characterizes the computing time and resources a job requires (e.g., 2hours for a job that needs to finish in approximately 2 hours, 4hours, etc.); and the qslot, which determines priority among jobs competing for compute server resources.
• In practice, the queue size can be considered unlimited, since the queues are unlikely to fill up.
• NetBatch is a multi-server networked system with several machine pools, each pool containing several types of machines distinguished by class.
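The characteristics listed above (random arrivals, uncertain service times, several servers, and an effectively unlimited queue) are those of a multi-server queue. As a rough first-order model, and only under the added assumption of Poisson arrivals and exponentially distributed service times (an M/M/c queue, which the text does not claim NetBatch satisfies), the Erlang C formula estimates the probability that a job has to wait and its mean time in the queue. The function names below are hypothetical; the rates are in jobs per unit time.

```python
def erlang_c(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Probability that an arriving job must wait in an M/M/c queue (Erlang C)."""
    a = arrival_rate / service_rate   # offered load in Erlangs
    if a >= servers:
        raise ValueError("unstable: arrival rate must be below total service capacity")
    # Erlang B via the recursion B(k) = a*B(k-1) / (k + a*B(k-1)), with B(0) = 1.
    # Unlike a**c / c!, this stays numerically stable for large server counts.
    b = 1.0
    for k in range(1, servers + 1):
        b = a * b / (k + a * b)
    # Convert the blocking probability (Erlang B) into the waiting probability (Erlang C).
    return servers * b / (servers - a * (1 - b))

def mean_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean time a job spends queued before it starts, in the same time unit as the rates."""
    p_wait = erlang_c(arrival_rate, service_rate, servers)
    return p_wait / (servers * service_rate - arrival_rate)
```

For instance, with hypothetical figures of 40 job submissions per hour, a mean runtime of 2 hours (a service rate of 0.5 per hour), and 100 servers, `mean_wait(40, 0.5, 100)` gives the expected queueing delay in hours. Because real NetBatch service times are neither exponential nor identical across machine classes, such numbers are indicative at best.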
The use of multiple pools, segregated by batch/interactive usage, model, etc., provides several advantages, such as a scalable infrastructure, i.e., each pool can hold more waiting jobs,