Tuesday 28 December 2010

Monitoring

So, let's say we opened our fruit shop and said, "I would like to sell fruit to all the people in this locality", and then went on to define "Volumes". How would we do it? We would make some assumptions about:

  • Growing end-users year on year - Let's assume that there are 100 households around the place where we want to sell fruit, with an average of around 4 people per household. Say we plan to expand to 150 households in year 2 and 300 households in year 3.
  • number of users using the system concurrently - Let's assume that the shop is in the middle of the locality, so roughly 50% of households will pass by it, each needing to refresh its fruit every third day. That means about 50 shoppers (one per household) every 3 days. Since each home has roughly 4 people and the fruit needs to last until the next visit, about 600 fruits will be used every third day (50*4*3 = 600, assuming roughly one fruit per person per day).
  • the pattern of peaks and troughs that is expected - People will come in either in the morning on their way to work or in the evening on their way back. Let's assume an equal split between morning and evening, so 25% of households come in the morning and 25% in the evening, giving a peak of roughly 25 people within perhaps a 30-minute window in the morning. On weekends, the number of people coming in will be higher: say 70% of the entire community turns up with their families at random times across the day, giving us about 70*4 = 280 people. If 30% of these come in at once at some point during the day, there would be around 84 people in the shop at the same time. That's only about 21 households, fewer than the weekday peak, but still more people in the shop! :-)
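These assumptions change often, so it helps to be able to rerun their arithmetic quickly. Here is a minimal Python sketch of the volumetric model. The household counts and percentages come from the assumptions above. The one-fruit-per-person-per-day figure is an added assumption (it is what makes the 600 work out), and all the names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShopAssumptions:
    """Volumetric assumptions for the fruit shop (illustrative names)."""
    households: int
    people_per_household: int = 4
    visit_pct: int = 50                # % of households shopping per restock cycle
    restock_days: int = 3              # fruit is refreshed every third day
    fruit_per_person_per_day: int = 1  # ASSUMPTION: not stated in the text
    weekend_turnout_pct: int = 70      # % of households turning up at the weekend
    weekend_burst_pct: int = 30        # % of weekend visitors in the shop at once


def shoppers_per_cycle(a: ShopAssumptions) -> int:
    # One shopper per visiting household; integer maths avoids float surprises.
    return a.households * a.visit_pct // 100


def fruits_per_cycle(a: ShopAssumptions) -> int:
    return (shoppers_per_cycle(a) * a.people_per_household
            * a.fruit_per_person_per_day * a.restock_days)


def weekday_morning_peak(a: ShopAssumptions) -> int:
    # Equal split between the morning and evening rush.
    return shoppers_per_cycle(a) // 2


def weekend_people_at_once(a: ShopAssumptions) -> int:
    # Whole families turn up at the weekend, so count people, not households.
    visitors = a.households * a.weekend_turnout_pct // 100 * a.people_per_household
    return visitors * a.weekend_burst_pct // 100


if __name__ == "__main__":
    for year, households in {1: 100, 2: 150, 3: 300}.items():
        a = ShopAssumptions(households)
        print(f"year {year}: {fruits_per_cycle(a)} fruits per cycle, "
              f"morning peak {weekday_morning_peak(a)}, "
              f"weekend {weekend_people_at_once(a)} people at once")
```

For year 1 this reproduces the figures worked out above: 600 fruits, a 25-person morning peak and 84 people at once. Years 2 and 3 show how the same assumptions scale as households grow.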

Well, we have made these assumptions, but only once we set up shop will we know the true pattern. And to capture the true pattern, we need to "monitor".

It's easy to monitor as a shopkeeper. Keep a pen and paper and note the times when people come in and go out and what they buy, so that the quantities and varieties of fruit can be adjusted over days, months and years as the customer base grows. Over time, the shopkeeper learns more and more and builds good judgement about the behaviour, likes and dislikes of customers. He can then predict quite precisely how much he will need to stock when he opens a new shop or begins operating in a new town. Well, if a simple shopkeeper "capacity plans" in this way, why couldn't an IT system be capacity planned and managed using the same principles?
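The shopkeeper's pen-and-paper log translates naturally into a few lines of code. The sketch below assumes a hypothetical log format, an arrival time plus a basket of fruits, and uses made-up entries. It tallies visits per hour and fruit sold per variety, which is the raw material for adjusting stock.

```python
from collections import Counter
from datetime import datetime

# Hypothetical entries from the shopkeeper's notebook: (arrival time, basket).
VISIT_LOG = [
    ("2010-12-27 08:05", {"apple": 6, "banana": 4}),
    ("2010-12-27 08:20", {"orange": 8}),
    ("2010-12-27 08:25", {"apple": 4}),
    ("2010-12-27 18:10", {"banana": 6, "orange": 4}),
    ("2010-12-27 18:40", {"apple": 2}),
]


def visits_by_hour(log):
    """Count arrivals per hour of the day."""
    return Counter(datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for ts, _ in log)


def fruit_totals(log):
    """Sum the quantity sold of each variety."""
    totals = Counter()
    for _, basket in log:
        totals.update(basket)  # Counter.update adds counts rather than replacing them
    return totals


if __name__ == "__main__":
    print("busiest hour:", visits_by_hour(VISIT_LOG).most_common(1))
    print("sold:", dict(fruit_totals(VISIT_LOG)))
```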

In its simplest form, this kind of monitoring of how users use an IT system is done with database fetches or queries that run overnight in quiet periods. These queries fetch the timings of various user actions, and a post-analysis then shows what users are actually doing. However, there is an extra step for an IT system. A shopkeeper keeps track of which customers bought (or asked for) which fruits to decide how many fruits to stock. An IT system, by contrast, needs to convert user transaction data into units of computational resource used. That is, we need to understand how users' requests consume computational resources such as CPU processing power, RAM, paging, network bandwidth and storage space. Only then can we estimate how many IT resources will be required to support that user demand.
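Both steps can be sketched in a few lines of Python against an in-memory SQLite table. The first step is the overnight query over user-action timings. The second step converts those counts into resource units. The `user_actions` table, its columns, the sample rows and the per-transaction costs in `COST_PER_TXN` are all hypothetical. In practice the costs would come from measuring each transaction type, for example in a load test.

```python
import sqlite3

# HYPOTHETICAL per-transaction resource costs, e.g. measured in a load test.
COST_PER_TXN = {
    "search":   {"cpu_ms": 40,  "net_kb": 120},
    "checkout": {"cpu_ms": 250, "net_kb": 60},
}

# The "overnight" query: how many of each action happened in each hour.
HOURLY_COUNTS_SQL = """
    SELECT strftime('%H', started_at) AS hour, action, COUNT(*) AS n
    FROM user_actions
    GROUP BY hour, action
    ORDER BY hour, action
"""


def load_sample(conn):
    """Create a hypothetical table of user actions with a few made-up rows."""
    conn.execute("CREATE TABLE user_actions (user_id INTEGER, action TEXT, started_at TEXT)")
    conn.executemany("INSERT INTO user_actions VALUES (?, ?, ?)", [
        (1, "search",   "2010-12-27 09:01:00"),
        (1, "checkout", "2010-12-27 09:03:00"),
        (2, "search",   "2010-12-27 09:15:00"),
        (3, "search",   "2010-12-27 18:30:00"),
        (3, "checkout", "2010-12-27 18:34:00"),
    ])


def hourly_resource_usage(conn):
    """Turn hourly transaction counts into CPU milliseconds and network kilobytes."""
    usage = {}
    for hour, action, n in conn.execute(HOURLY_COUNTS_SQL):
        cost = COST_PER_TXN[action]
        slot = usage.setdefault(hour, {"cpu_ms": 0, "net_kb": 0})
        slot["cpu_ms"] += n * cost["cpu_ms"]
        slot["net_kb"] += n * cost["net_kb"]
    return usage


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    for hour, totals in hourly_resource_usage(conn).items():
        print(hour, totals)
```

The busiest hour of user activity is not necessarily the busiest hour of resource use. Here, a single `checkout` costs more CPU than several `search` requests.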

So, monitoring is required at three levels:

  • the first level is to know what users are doing, and how this aligns with the assumptions we made at the start when we opened the shop (i.e. the website/application/service, in IT terms);
  • the second level is to know how what users do affects computational resource usage;
  • the third level is to know how our user base is growing, and to predict what that eventually means for our computational resource requirements.
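The third level can be sketched as a simple trend projection feeding the second level's conversion. This sketch assumes straight-line growth (an ordinary least-squares fit) and uses a made-up history of monthly active users. The `CORES_PER_1000_USERS` figure is hypothetical and stands in for what second-level monitoring would measure. Real growth is often not linear, so treat this as a starting point.

```python
import math

# HYPOTHETICAL level-2 output: CPU cores needed per 1000 active users at peak.
CORES_PER_1000_USERS = 5


def fit_linear_trend(values):
    """Least-squares fit of values against their index (needs at least two points).

    Returns (slope, intercept).
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_users(history, months_ahead):
    """Extrapolate the fitted trend months_ahead beyond the last observation."""
    slope, intercept = fit_linear_trend(history)
    return intercept + slope * (len(history) - 1 + months_ahead)


def cores_needed(users):
    # Round up: you cannot provision a fraction of a core.
    return math.ceil(users * CORES_PER_1000_USERS / 1000)


if __name__ == "__main__":
    monthly_active_users = [1000, 1100, 1200, 1300]  # made-up history
    users = projected_users(monthly_active_users, months_ahead=9)
    print(f"~{users:.0f} users in 9 months -> {cores_needed(users)} cores")
```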

In most real-life IT scenarios, some of these three levels may be extremely mature while others are missing altogether. Only when all three levels of monitoring are present is accurate, predictive capacity planning possible.

There is a key point not to be missed: monitoring and tracking alone provide a view of reality and of how it aligns with the assumptions made. The moment a difference from the assumptions or from past trends is observed, that is reason enough to trigger a whole series of questions about the causes behind it. Followed religiously, this approach leads to revealing (and sometimes amusing) discoveries about how systems work and where their bottlenecks are. The key, though, is to put the "appropriate" monitoring and tracking mechanisms in place upfront for all major IT systems, so that variances between reality and assumption can be measured and quantified. Take our fruit seller: suppose he did not note the number and types of fruits sold at various times of day, or the choices people made. If he instead relied on noting the time he ran out of apples or oranges each day, it would hardly help him stock enough and earn well (both the praise of his customers and money)!! It is quite important to first make the right assumptions and then to measure the right things, in line with the assumptions made.
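Measuring variances between reality and assumption can be as simple as pairing each assumption with the metric that tests it and flagging drift. In the minimal sketch below, the `Assumption` type, the 20% tolerance and the observed figures are all hypothetical. Any assumption with no matching measurement is reported as unmeasured. That is exactly the trap of noting only stock-out times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumption:
    name: str
    metric: str             # the measurement that tests this assumption
    expected: float
    tolerance_pct: float = 20.0  # HYPOTHETICAL acceptable drift


def review(assumptions, observed):
    """Return (unmeasured names, breaches) for a dict of observed metric values."""
    unmeasured, breaches = [], []
    for a in assumptions:
        if a.metric not in observed:
            unmeasured.append(a.name)  # we assumed it but never measured it
            continue
        actual = observed[a.metric]
        deviation_pct = (actual - a.expected) / a.expected * 100
        if abs(deviation_pct) > a.tolerance_pct:
            breaches.append((a.name, a.expected, actual, round(deviation_pct, 1)))
    return unmeasured, breaches


ASSUMPTIONS = [
    Assumption("weekday morning peak", "morning_shoppers", 25),
    Assumption("fruit sold per 3-day cycle", "fruits_sold_per_cycle", 600),
    Assumption("weekend people at once", "weekend_people_at_once", 84),
]

# Made-up observations; the weekend figure was never recorded.
OBSERVED = {"morning_shoppers": 38, "fruits_sold_per_cycle": 640}

if __name__ == "__main__":
    unmeasured, breaches = review(ASSUMPTIONS, OBSERVED)
    print("not measured:", unmeasured)
    print("outside tolerance:", breaches)
```

Each breach is a prompt for the "why?" questions described above. Each unmeasured assumption is a gap in the monitoring itself.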

Volumetric assumptions made at the design stage and Monitoring in-life go hand in hand. Thus it is a very good idea to design the monitoring along with the system, during the design phase of an IT system; however, this tends to happen only in a very mature performance engineering practice.
