Introduction to grid computing

A scientist studying proteins logs into a computer and uses an entire network of computers to analyze data. A businessman accesses his company’s network through a PDA in order to forecast the future of a particular stock. An Army official accesses and coordinates computer resources on three different military networks to formulate a battle strategy. All of these scenarios have one thing in common: they rely on a concept called grid computing.

Clusters and Grids

A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.

Grid computing is similar to cluster computing: it makes use of several computers, connected in some way, to solve a large problem.

The big difference, however, is that a cluster is homogeneous while a grid is heterogeneous. The computers that are part of a grid can run different operating systems and have different hardware, whereas the computers in a cluster all have the same hardware and OS. A grid can make use of spare computing power on a desktop computer, while the machines in a cluster are dedicated to working as a single unit and nothing else. Grids are inherently distributed, whether over a LAN, a metropolitan area network or a WAN. The computers in a cluster, on the other hand, are normally contained in a single location or complex.

Also, in a cluster all nodes are set to perform the same task, controlled and scheduled by the same application (or OS). In grid computing, nodes perform different tasks and may run different applications independently. A grid may therefore also consist of several clusters.

Let us take an example. Consider three colleges: GEC, PCC and RIT. We connect the GEC systems together to form a cluster so that resources and computing power are shared efficiently.

Similarly, we connect all the systems of PCC, and likewise those of RIT, so that they form the PCC and RIT clusters. Suppose the GEC systems run Windows, the PCC systems run Linux and the RIT systems run Mac OS. When we combine these clusters from different places or institutions, we form a grid! Of course, something has to have control over this grid as a whole, and that is what we call the ‘middleware’.

Now what is middleware? It is the software that organizes and integrates the disparate computational facilities belonging to a grid. The various tasks it carries out are seen in the figure. One of the main strategies of grid computing is using software to divide and apportion pieces of a program among several computers, sometimes up to many thousands. The middleware is like the “brain” of the grid.
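To make the "divide and apportion" idea concrete, here is a minimal sketch in Python. It is not how any particular middleware works: the function names (`split_into_chunks`, `node_work`, `run_on_grid`) and the sum-of-squares task are assumptions chosen purely for illustration, and the "nodes" are simulated by a local loop rather than real machines.

```python
from typing import List, Sequence


def split_into_chunks(data: Sequence[int], n_nodes: int) -> List[Sequence[int]]:
    """Divide data into n_nodes contiguous pieces of near-equal size."""
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    base, extra = divmod(len(data), n_nodes)
    chunks = []
    start = 0
    for i in range(n_nodes):
        # The first `extra` nodes each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def node_work(chunk: Sequence[int]) -> int:
    """Hypothetical per-node task: sum the squares of the items in the chunk."""
    return sum(x * x for x in chunk)


def run_on_grid(data: Sequence[int], n_nodes: int) -> int:
    """Apportion the data among the nodes, then combine the partial results."""
    # In a real grid each chunk would be shipped to a different machine;
    # here every "node" is simulated by one iteration of a local loop.
    partial_results = [node_work(chunk) for chunk in split_into_chunks(data, n_nodes)]
    return sum(partial_results)
```

Calling `run_on_grid(list(range(10)), 3)` splits the ten numbers into pieces of four, three and three items, and gives the same total as computing the sum of squares on a single machine.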

The “nervous system” that communicates between the different parts of the grid is the network.

So how do you run a job on a grid? Many jobs and subjobs enter the grid and are stored in a job queue. The scheduler allots each job to a CPU resource, and the job is then executed. As simple as that. So the middleware is like the manager and the CPU resources are the employees. The manager only takes care of distributing the jobs to the employees. Of course, information security and authentication are additional duties that our manager carries out here.

Grids address two distinct but related goals:

1. Providing remote access to IT assets
2. Aggregating processing power

If this is so, then how is the web different from a grid? The web is a service for sharing information over the Internet. The grid is a service for sharing computer power and data storage capacity over the Internet. Some primary focuses of the grid are high-performance parallel computing, geographically dispersed collaboration, virtual resources, virtual organizations and large datasets.
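Returning to the job flow described earlier (jobs enter a queue, and the scheduler allots each one to a CPU resource), the following Python sketch shows one way such a manager could hand out work. Everything here is an assumption for illustration: the `Job` and `CpuResource` classes, the `cost` field, and the "give the job to whichever CPU frees up first" policy are not taken from any real grid middleware.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class Job:
    name: str
    cost: int  # hypothetical units of CPU time the job needs


@dataclass
class CpuResource:
    name: str
    busy_until: int = 0  # time at which this CPU becomes free again
    assigned: List[str] = field(default_factory=list)


def schedule(jobs: List[Job], cpus: List[CpuResource]) -> Dict[str, str]:
    """Take jobs from the queue in arrival order and allot each to a CPU."""
    if not cpus:
        raise ValueError("at least one CPU resource is required")
    queue: Deque[Job] = deque(jobs)  # the job queue
    assignment: Dict[str, str] = {}
    while queue:
        job = queue.popleft()
        # Assumed policy: pick the CPU that becomes free earliest
        # (ties go to the CPU listed first).
        cpu = min(cpus, key=lambda c: c.busy_until)
        cpu.busy_until += job.cost
        cpu.assigned.append(job.name)
        assignment[job.name] = cpu.name
    return assignment
```

For example, scheduling jobs A (cost 3), B (cost 1), C (cost 2) and D (cost 1) on two CPUs sends A and D to the first CPU and B and C to the second. A real manager would also handle the security and authentication duties mentioned above, which this sketch leaves out.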

So why use a grid when we have supercomputers? The primary advantage of grid computing is that each node can be purchased as commodity hardware, which when combined can produce computing resources similar to those of a multiprocessor supercomputer, but at lower cost. This is due to the economies of scale of producing commodity hardware, compared to the lower efficiency of designing and constructing a small number of custom supercomputers. In other words, many desktop computers are cheaper than a small supercomputer! Secondly, a grid has no fixed upper or lower bound on its computing capacity.

A grid may vary in size from small (confined to a network of computer workstations within a corporation, for example) to large, public collaborations across many companies and networks. A small institution can use a smaller grid or simply a cluster, and as the demand and size of the institution grow, so will the size of the grid! With a supercomputer, by contrast, the user has no choice but to pay the same high price even when its full speed is not needed. However, a grid has its own shortcomings.

Firstly, the primary performance disadvantage is that the various processors and local storage areas do not have high-speed connections. Another disadvantage of this distributed design is that the computers which are actually performing the calculations might not be entirely trustworthy. The designers of the system must thus introduce measures to prevent malfunctions or malicious participants from producing false, misleading, or erroneous results, and from using the system as an attack vector.

At its most basic level, grid computing is a computer network in which each computer’s resources are shared with every other computer in the system.
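The text above does not say which safeguards a grid should use against untrustworthy nodes. One commonly used idea, shown here only as an assumed example, is redundancy: send the same task to several nodes and accept a result only when enough of them agree. The names `verified_result`, `honest_node`, `faulty_node` and the `quorum` parameter are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional


def honest_node(x: int) -> int:
    """Hypothetical well-behaved node: returns the correct square."""
    return x * x


def faulty_node(x: int) -> int:
    """Hypothetical malfunctioning or malicious node: returns a wrong answer."""
    return x * x + 1


def verified_result(task_input: int,
                    nodes: List[Callable[[int], int]],
                    quorum: int) -> Optional[int]:
    """Run the same task on every node; accept the most common answer
    only if at least `quorum` nodes returned it, otherwise return None."""
    if not nodes:
        return None
    results = [node(task_input) for node in nodes]
    value, votes = Counter(results).most_common(1)[0]
    return value if votes >= quorum else None
```

With two honest nodes and one faulty node and a quorum of 2, `verified_result(4, [honest_node, honest_node, faulty_node], 2)` accepts 16. With one honest and one faulty node, no answer reaches the quorum, so the function returns `None` and the task would have to be reissued. The trade-off is that every task is computed several times, which costs some of the grid's spare capacity.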

Processing power, memory and data storage are all community resources that authorized users can tap into and leverage for specific tasks. A grid computing system can be as simple as a collection of similar computers running the same operating system or as complex as inter-networked systems comprising every computer platform you can think of.

However, who will pay for the grid still remains a million-dollar question!