Friday, November 5, 2010

... and ipset comes to the rescue

We require that hosts located on certain subnets register before they are allowed to use the full capabilities of our campus network. By default, all hosts on such a network are blocked until registration is performed. While blocked, however, they need to be redirected to the registration page. For years now, we have used iptables for dynamic DNS redirection when performing these host registrations.

To achieve this, when a host appears on a network that requires registration, a NAT mapping is generated that redirects its DNS traffic to a lying DNS server. This determination is made by a realtime processor script acting on DHCP transitions. Once a device has been registered, its iptables entry is updated to point to the true DNS server.

Unfortunately, iptables became a performance bottleneck. The time to insert or delete a rule in a chain grows as the number of rules in the chain increases. In addition, adding a rule takes two calls to iptables (one to list the chain and make sure the rule does not already exist, and a second to actually add it). This is necessary to prevent duplicate rules in the same chain.

As the demand for our service increased over the years, the number of rules in iptables increased along with it. Eventually the realtime processor script became "almost realtime" and then "almost never in time" during peak periods of use.

Analysis showed that the average time to add or delete an iptables rule was 0.17s (over 100k samples). At times when the DHCP transition rate could be as high as 1000/min, the iptables updates ran several minutes behind, making the registration process effectively unusable.

We researched an alternative method of storing the host IP address mappings using ipset (http://ipset.netfilter.org/). Designed to work with iptables, ipset is essentially a very efficient bucket for storing network address (IP and port) information. The time to add or remove an address is nearly constant and, at 0.0065s (over 100k samples), far smaller than with iptables. This gives us much more headroom for capacity!
tool                  | average time | stdev
----------------------+--------------+-------
iptables              | 0.17s        | 0.12
ipset (tested)        | 0.0065s      | 0.034
ipset (in production) | 0.008s       | 0.027
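
To make the mechanics concrete, here is a rough sketch of the two approaches. The set name, chain names, and addresses below are invented for illustration, and the DNAT redirect is just one way to implement the DNS redirection; our production rules differ in the details.

# Old pattern: one iptables rule per host, plus a list call to avoid duplicates
iptables -t nat -nL DNS_REDIRECT | grep -q '10.12.34.56' ||
    iptables -t nat -A DNS_REDIRECT -s 10.12.34.56 -p udp --dport 53 \
        -j DNAT --to-destination 192.0.2.53    # the "lying" DNS server

# New pattern: one static iptables rule that matches against an ipset ...
ipset create unregistered hash:ip    # older ipset releases use: ipset -N unregistered iphash
iptables -t nat -A PREROUTING -p udp --dport 53 \
    -m set --match-set unregistered src -j DNAT --to-destination 192.0.2.53

# ... so the realtime script only adds and removes set members, in near-constant time
ipset add unregistered 10.12.34.56
ipset del unregistered 10.12.34.56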

Thursday, September 23, 2010

Jolly times with the postgresql optimizer and explain analyze

Suddenly yesterday one of my rex functions (REX is our REST/SQL mapper tool) jumped off a cliff, performance-wise. With a hint from Nick (who will forevermore be known as "the database whisperer") I was able to narrow the performance problem down to this particular query in the function that REX is calling, specifically the family(network)=4 element of the where clause:

SELECT DISTINCT network, sector.sector::text
FROM nfa_k20_network
LEFT JOIN info_site ON info_site.siteid = nfa_k20_network.site_id
LEFT JOIN dev_location on dev_location.site_id = info_site.id
LEFT JOIN dev_classification on dev_classification.device_id =
dev_location.device_id
LEFT JOIN sector on sector.id = dev_classification.sector
WHERE sector.sector is not null and family(network) = 4;

But I was unable to get the optimizer to generate the same plan on netdbeval/allyourbase as I got on netdbdev. I'm assuming this is an indication that the data size in one of the tables tipped the query optimizer over into selecting a less ideal strategy, although I'm not clear
on how or why.

The good strategy uses hashes, the bad one a nested join and a filter.
Here's where things start to run amok:

Good (netdbdev) http://explain.depesz.com/s/ukE

Hash Join (cost=3185.37..4996.42 rows=1 width=28) (actual
time=953.056..1227.372 rows=7379 loops=1)
* Hash Cond: (public.dev_state.device_id = interface.device_id)

Bad (netdbeval) http://explain.depesz.com/s/TI6

Nested Loop (cost=1851.83..8488.08 rows=1 width=34) (actual
time=247.812..404814.965 rows=1465 loops=1)
* Join Filter: (interface.device_id = public.dev_state.device_id)

Wowsers! Check out how the actual time compares to the estimated cost!

Explicitly disabling nested loops in psql before trying this query brought it back down from taking several minutes to taking 3 seconds, so we clearly want to get postgres to use the first strategy.
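
For the record, the experiment was nothing fancier than the standard planner toggle, run in a psql session along these lines (diagnostic only; not something we'd leave set in production):

-- disable nested-loop joins for this session only, then re-run the query
SET enable_nestloop = off;

EXPLAIN ANALYZE
SELECT DISTINCT network, sector.sector::text
FROM nfa_k20_network
LEFT JOIN info_site ON info_site.siteid = nfa_k20_network.site_id
LEFT JOIN dev_location ON dev_location.site_id = info_site.id
LEFT JOIN dev_classification ON dev_classification.device_id = dev_location.device_id
LEFT JOIN sector ON sector.id = dev_classification.sector
WHERE sector.sector IS NOT NULL AND family(network) = 4;

-- put the planner back the way we found it
RESET enable_nestloop;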

The choice seems to be driven by the estimated row count that ifc_ip will return:

ripit_2=# explain analyze select count(*) from ifc_ip where family(ip)=4;
QUERY PLAN

------------------------------------------------------------------------------------------------------------
Aggregate (cost=268.85..268.87 rows=1 width=0) (actual
time=15.433..15.434 rows=1 loops=1)
-> Seq Scan on ifc_ip (cost=0.00..268.70 rows=61 width=0) (actual
time=0.048..9.489 rows=8498 loops=1)
Filter: (family(ip) = 4)
Total runtime: 15.481 ms
(4 rows)

Note that the estimated row count is 61, while the actual row count returned is 8498.

I tinkered with a number of settings (work_mem, effective_cache_size) to see if I could drive the query planner back to making better choices. I also tried analyzing and reindexing the table. The basic problem seems to be how much filtering the query planner expects the "family(ip)=4" clause to do, and I don't see how the pg_statistic table could capture that without adding an index (heck, I don't even see where that mysterious 61 number is coming from).

A simple and reasonable solution, however, seems to be to rewrite the query as "family(ip) != 6" (the goal of the query is to eliminate any IPv6 addresses from getting returned).

Doing this results in an estimate that captures something closer to the expected size of the ifc_ip result set (on the high end!) and has the desired effect on the overall query.

ripit_2=# explain analyze select count(*) from ifc_ip where family(ip)!=6;
QUERY PLAN

---------------------------------------------------------------------------------------------------------------
Aggregate (cost=299.00..299.01 rows=1 width=0) (actual
time=15.380..15.381 rows=1 loops=1)
-> Seq Scan on ifc_ip (cost=0.00..268.70 rows=12119 width=0) (actual
time=0.049..9.353 rows=8498 loops=1)
Filter: (family(ip) <> 6)
Total runtime: 15.432 ms
(4 rows)

The downside, of course, is that I've no idea what might cause the query planner to change its mind in the future about this table, or perhaps about another table lurking in the query. So my sense is that adding an index might be a more robust solution, but I'd like to understand better how the returned-rows estimate is generated before I go adding more random indexes to tables, which, after all, aren't free ...
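
For what it's worth, the index I have in mind is an expression index: ANALYZE collects statistics for indexed expressions, which should replace the default guess apparently behind that suspiciously constant 61 with a real number. A sketch, with an illustrative index name:

-- an expression index on family(ip); after ANALYZE, the planner has real
-- statistics for the expression instead of a default selectivity guess
CREATE INDEX ifc_ip_family_idx ON ifc_ip (family(ip));
ANALYZE ifc_ip;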

Friday, August 6, 2010

Presenting ... UW Service Tools

You might have noticed that we've changed the blog's name from UW Network Management Tools to UW Service Tools. The name change reflects reorganization within UW Information Technology. Our group has moved from Network Systems to the Technology Management division's Application Infrastructure Engineering unit--a move we're excited about.

For the most part, what we do on a daily basis remains the same: we'll keep making good software. As for what's new, our focus is widening. We'll be working on automating a wide range of technology services to support UW-IT and UW.

We'll be posting more news here about our new work soon.

Monday, June 14, 2010

Meerkat Framework: Presentation at SASAG ...

Meerkat Framework presentation to the Seattle Area System Administrators Guild (SASAG) on June 10, 2010

Friday, April 30, 2010

How Kittyfetch Was Named

In our regular team meeting on Friday April 30, we began discussing the etymology of "kittyfetch", one of our device backup tools. The tool's earlier incarnation was called RANCID (in our wiki it's almost always CAPITALIZED) but we weren't clear how kittyfetch became "kittyfetch."


I searched through my chat logs and found that the name dated from November 1, 2007. The transcript follows (emoticons removed):


vaughan: What do we call the new rancid thingy?

jtate: anything but rancid, please

hsherman: squirrel.

or maybe magpie

nick?

wells: ooo, squirrel is a great name!

hsherman: "animals who hoard"

vaughan: What are those birds that plunder campsites Clarence wonders?

Anyone?

benroy: camp robbers?

pack rats

hsherman: raccoons, except for the not being birds thing

jtate: type squirrel ten times in a row and then tell me it's a great name

vultures

amcharg: i would not like it if raccoons could fly

jtate: they do: pigeons

hsherman: it never needs typing

amcharg: true and i dislike pigeons

hsherman: just a cute picture.

this will actually have the side effect of limiting requests for enhancements to the system.

wells: squirrelsquirrelsquirrelsquirrelsquirrelsquirrelsquirrelsquirrelsquirrelsquirrel

are you kinding? It just *rolls* off the finger tips

nickchen

why do we need a new name?

config-backup is good for me

or coba

cobac

jtate: copacobana

vaughan: squirrels

yeah, that does type nice.

gincoba biloba

ok, config-backup it is.

wells: LOL

nickchen: maybe something like, a cat or something

kitty

kitty go get my backup

jtate: puma, meerkat, i see a trend

cat's don't follow commands well

cats

nickchen: kitty failed to backup

hsherman: meerkats aren't cats.

nickchen: http://upload.wikimedia.org/wikipedia/commons/3/3f/Toilet_Trained_Cat_22_Aug_2005.jpg

hsherman: maybe we could continue the non-cat cat theme with "catfish".

vaughan: LOL re: "kitty go get my backup"

[ vaughan is still laughing in his office box. ]

benroy: I think Mr. Bo Jangles is a cat.

wells: $ hai kitty does gets backup star-k12

- kitty can haz backup star-k12

- oh noez, kitty does haz not backup star-k12, sez: "port connection failed"

- kthnxbye

nickchen: lol

amcharg: lulz

nickchen: i shall change all the log messages to lolcats

wells: we should totally make the "hai kitty" command ...

amcharg: s/backup/cheezburger/

wells: haha


And so the transcript ends. Somehow, and the history books are unclear, from that day forward the tool was known as kittyfetch.


To an outsider, the above may not be funny. For us, it's hilarious--and here's why. At a basic level, you need to know about lolcats. Lolcats are an internet meme that may never die. They and their idiolect sneak into our team's everyday communication. How lolcats speak is, well, funny. Very funny. The fact that lolcat-speak (translators exist) enters our work at all is evidence of our team's cohesion: we have in-jokes. To put it simply, lolcats aside: we laugh together.


Laughing together and working together can seem to some like two mutually exclusive things, but they're not. Chris Roberts and Wan Yan at the University of Missouri-Columbia found that, "humor among colleagues ... enhances creativity, department cohesiveness and overall performance." Laughter, or humor, in the workplace is a good thing. Our team collaborates well and humor emerges from that collaboration and energizes it.


Yet, it's become an adage: you can't make fun mandatory. And our fun isn't. So, how did our team come together in such a way that we find kittyfetch a hilarious name? No one knows for sure. But we're glad it happened.


For more, you can read Roberts and Yan's study: http://www.sciencedaily.com/releases/2007/10/071031130917.htm

Monday, April 19, 2010

Avaya PBX admin web service

At the UW we've been heavily invested in Avaya telecommunications products for nearly two decades. In particular, we've been invested in the DEFINITY platform, now called Communications Manager (CM). We started using CM when it was an AT&T product; it later passed to Lucent when AT&T spun off its equipment business, and then to Avaya when Lucent spun off its enterprise unit.

CM is the Cadillac of enterprise phone systems (PBXs), with almost every feature imaginable. Like many enterprise vendors, Avaya provides GUI applications written for Windows, and they work fine--if they have the features that you want to use. Once you want to start integrating with other enterprise systems and automating tasks, you run into situations where the provided tools hinder work more than they help.

The primary interface to these PBX systems is a terminal session. Originally that meant a serial terminal, but now telnet and ssh are available. When you log in you select a "terminal type." Most people use the screen-based VT220 interface, where you type commands and use tab and return to move around the screen and make changes. Those screens are no fun to script against, but there's another, undocumented terminal type called "OSSI" that's designed for scripting and is used by Avaya's own GUI tools.

In this post I'll describe the web service system I've built around the OSSI terminal connection. The end result is contained within a Perl module that comprises the heart of this system.

The system breaks down into three parts:
1. DEFINITY_ossi.pm
2. A separate PBXD server instance for each PBX.
3. A single web CGI interface that passes XML requests between the client applications and the correct PBXD instance.

DEFINITY_ossi.pm is the Perl module that knows how to connect to all of our PBX systems, process commands, and return data either formatted like the output of the VT220 screen or as a set of OSSI fields.

PBXD is a server based on the Net::Server::PreFork CPAN module that manages a pool of connections to a single PBX and provides an XML interface to DEFINITY_ossi.pm. There is 1 instance of PBXD running for each of our 3 PBX systems. Each PBXD instance has its own config file.

Finally the web interface is a Perl CGI script that knows about each PBXD instance and proxies the XML requests from clients to the specific PBXD instance for each PBX.
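
To give a flavor of how a client talks to the system, here's a minimal Perl sketch. The URL, the XML elements, and the command are all made up for illustration; the real request format is defined by PBXD and isn't shown in this post.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical request document: element names and the command are illustrative only.
my $xml = <<'XML';
<request pbx="pbx1">
  <command>display station 61234</command>
</request>
XML

my $ua      = LWP::UserAgent->new;
my $request = HTTP::Request->new( POST => 'https://example.uw.edu/pbxd/cgi' );
$request->content_type('application/xml');
$request->content($xml);

# The CGI proxies the XML to the PBXD instance for the named PBX and returns its reply.
my $response = $ua->request($request);
print $response->content, "\n";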

Here's a visual description of the system:

[diagram: client applications -> web CGI proxy -> one PBXD instance per PBX -> PBX systems]

Once it's set up and deployed, you have a scalable platform for rapid development of web applications to manage all of your PBX systems. We've moved both batch and interactive applications to the new system and it's been working well. I hope you've found this post interesting and that it will give you some ideas and code to help you manage your own PBX systems.

Thursday, March 11, 2010

REX: Database Driven Web Services Service

REX presentation given to the University of Washington Web Application Developers meeting:

Rex: Database Driven Web Services Service

Monday, February 1, 2010

KPI part 2: materializing views

In my first KPI post, I mentioned how performance became an issue and how we solved it by pre-generating the results of the summarized data we wanted to display. That was dandy ... once. But it was obvious we'd want to do something similar for other data already stored in our database. As a huge fan of consistency and repeatability, I didn't care for the idea of a mess of tables and another mess of cron jobs and scripts to keep those different tables up to date, each in their own custom way.

A little research turned up the concept of 'Materialized Views'. The idea is simple: generate a view and use it to create and maintain the contents of a static table. A wide variety of "refresh" techniques can be applied (depending on the importance of having current data and on limiting the impact on either inserts to the source tables or on the end user viewing the summary). Some databases implement this natively. PostgreSQL isn't one of them.

Some great discussions of the concept are already available, so I'll just highlight how we chose to implement it.

First, we created a schema in our 'ironfist' database to contain the views as well as the tables where the materialized content is stored (an MV table), along with a master table to keep track of them:

CREATE SCHEMA matview;
GRANT ALL ON SCHEMA matview TO PUBLIC;

CREATE TABLE matview.views (
mv_name NAME NOT NULL PRIMARY KEY
, v_name NAME NOT NULL
, last_refresh TIMESTAMP WITH TIME ZONE
);

For our application, stale data wasn't a concern. All of our capacity KPIs either show monthly data or show current data. Regenerating the data weekly (and even the performance hit of regenerating all the data each time) doesn't cost us much. So the set of functions we used to create a materialized view infrastructure is fairly simple.

Before starting, we test and create a view in the 'matview' schema then use the various matview functions to create and maintain the MV table.

This first function creates an MV table based on an existing view. We tell it the name (which is used to find the view and to create the corresponding table) and the columns we wish to use as the primary key (this helps with future in-place updates). For example:

select matview.create('kpi_network_switch_ports','sector,month');

The create function looks like:

CREATE OR REPLACE FUNCTION matview.create(name text,primary_cols text) RETURNS VOID AS
$$
DECLARE
entry matview.views%ROWTYPE;
view_name text:= 'matview.' || name || '_v';
matview_name text:= 'matview.' || name || '_mv';
BEGIN
SELECT * INTO entry FROM matview.views WHERE mv_name = matview_name;
IF FOUND THEN
RAISE EXCEPTION 'Materialized view ''%'' already exists.',
matview_name;
END IF;
EXECUTE 'REVOKE ALL ON ' || view_name || ' FROM PUBLIC';
EXECUTE 'GRANT SELECT ON ' || view_name || ' TO PUBLIC';
EXECUTE 'CREATE TABLE ' || matview_name || ' AS SELECT * FROM ' || view_name;
EXECUTE 'ALTER TABLE ' || matview_name || ' ADD PRIMARY KEY ' || '(' || primary_cols || ')';
EXECUTE 'REVOKE ALL ON ' || matview_name || ' FROM PUBLIC';
EXECUTE 'GRANT SELECT ON ' || matview_name || ' TO PUBLIC';
EXECUTE 'CREATE TRIGGER ' || name || '_trigger' ||
' BEFORE INSERT ON ' || matview_name ||
' FOR EACH ROW EXECUTE PROCEDURE insert_or_replace_if_duplicate ()';

INSERT INTO matview.views (mv_name, v_name, last_refresh)
VALUES (matview_name, view_name, CURRENT_TIMESTAMP);

RETURN;
END
$$
SECURITY DEFINER
LANGUAGE plpgsql;

The CREATE TABLE ... AS SELECT populates the MV table with the current results of a SELECT * on the view, and the final INSERT records the new materialized view in matview.views. The table can subsequently be 'refreshed' with the matview.refresh function. This function refreshes either a single table by name or 'all' (all MVs listed in the matview.views table):

CREATE OR REPLACE FUNCTION matview.refresh(name text) RETURNS VOID AS
$$
DECLARE
query text;
mview matview.views%ROWTYPE;
matview_name text:= 'matview.' || name || '_mv';
BEGIN
IF (name = 'all')
THEN
query := 'SELECT * from matview.views';
ELSE
query := 'SELECT * FROM matview.views WHERE mv_name =' || quote_literal(matview_name);
END IF;

FOR mview in EXECUTE query LOOP
EXECUTE 'INSERT INTO ' || mview.mv_name || ' SELECT * FROM ' || mview.v_name;
UPDATE matview.views
SET last_refresh=CURRENT_TIMESTAMP WHERE mv_name=mview.mv_name;
END LOOP;

IF NOT FOUND THEN
RAISE EXCEPTION 'Materialized view % does not exist.', matview_name;
END IF;

RETURN;
END;
$$
SECURITY DEFINER
LANGUAGE plpgsql;
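
Refreshing is then a one-liner; we regenerate everything weekly (how we schedule the job isn't important here):

-- refresh a single materialized view ...
SELECT matview.refresh('kpi_network_switch_ports');
-- ... or every MV listed in matview.views
SELECT matview.refresh('all');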

In the initial MV table creation (in the matview.create function), you'll notice that we added a BEFORE INSERT trigger to the table. When we refresh the table, we update all rows in the table. Other refresh approaches involve placing triggers on data source tables (we wanted to avoid that, especially since some of our tables are stored on another database server).

We could just empty the table and completely regenerate it, but some of our views take several minutes to run. During that time our table would be empty of data. And if the view failed (if, say, the dblink connection to our network modeling database failed) the table would remain empty. Stale data is better than no data.

Finally, although we record a last refresh time, that time doesn't guarantee that running the view again won't change existing values for existing rows in the table. For "total number of devices active on the network in January", that number will update on January 1st, then again on January 2nd, and so on. The refresh time is useful for confirming how up to date our data is, but not useful for reducing the work needed to update it.

So instead, we regenerate all data in an MV, adding a row if no row with the same primary key values exists, but otherwise replacing (not updating!) the matching row with a new one using the following function as a trigger:

CREATE FUNCTION insert_or_replace_if_duplicate() RETURNS trigger
AS $$
/* Description: Trigger function to insert a new row, first deleting if it would violate the primary key

To use this function, create a trigger on the appropriate table as follows:

CREATE TRIGGER availability_summary_insert_trigger
BEFORE INSERT ON kpi.availability_summary
FOR EACH ROW EXECUTE PROCEDURE insert_or_replace_if_duplicate();

This before trigger checks to see if the insert would violate the primary key of the table.
If it does, it will delete the entire row before returning to the insert call. This is easier
than trying to figure out which fields to update.
*/
DECLARE
table_name text := TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME;
where_clause text := ' WHERE TRUE';
colname text;
BEGIN
FOR colname IN select column_name
from information_schema.key_column_usage
natural join information_schema.table_constraints itc
where itc.table_schema=quote_ident(TG_TABLE_SCHEMA)
and itc.table_name=quote_ident(TG_TABLE_NAME)
and itc.constraint_type='PRIMARY KEY'
LOOP
-- the syntax is hairy, but this casts the NEW row to the appropriate table and lets us pick
-- out the columns we want to compare.

where_clause := where_clause || ' AND ' || colname || '=' || '(' || quote_literal(new) || '::' || table_name || ').' || colname;
END LOOP;

-- now that we know what row we will conflict with, delete it before doing our insert.

EXECUTE 'DELETE FROM ' || table_name || where_clause;
RETURN NEW;
END
$$
LANGUAGE plpgsql;


This works because we are assured that updates will always generate data for all columns. The above function wouldn't be useful for cases where only some column data was provided for subsequent updates.

Finally, of course, we have a function that cleans up any MV tables if we choose to remove them:

CREATE OR REPLACE FUNCTION matview.drop(name text) RETURNS VOID
AS
$$
DECLARE
entry matview.views%ROWTYPE;
view_name text:= 'matview.' || name || '_v';
matview_name text:= 'matview.' || name || '_mv';
BEGIN
SELECT * INTO entry FROM matview.views WHERE mv_name = matview_name;

IF NOT FOUND THEN
RAISE EXCEPTION 'Materialized view % does not exist.', matview_name;
END IF;

EXECUTE 'DROP TABLE ' || matview_name;
DELETE FROM matview.views WHERE mv_name=matview_name;
RETURN;
END
$$
SECURITY DEFINER
LANGUAGE plpgsql;
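
Dropping a materialized view is symmetric with creating one:

-- removes the MV table and its entry in matview.views (the underlying view is left alone)
SELECT matview.drop('kpi_network_switch_ports');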


The final post in this series will discuss some of what we did with the Google Visualization API to use and display the data from our Materialized Views.

Friday, January 22, 2010

PostgreSQL fun: cross-database joins

Before I discuss how we implemented Materialized Views, I should give a little background about our database configuration. We have two production PostgreSQL databases running on two different servers. The first database, "ripit_2" is our primary database, containing network modeling information (our devices, their interfaces, network contact people, and all associated attributes) and other critical support information (e.g. meerkat network events).

Non-critical, highly active data is stored on our second database, ironfist.

We've tried to keep our PostgreSQL installation as vanilla as possible. We rely on a fabulous and responsive team of five DBAs who support not only our PostgreSQL servers, but also many campus MS SQL, IBM DB2, IBM INFORMIX, ORACLE, and SYBASE servers! To reduce our demands on their limited time and to avoid locking ourselves into a particular database, we historically shied away from installing add-ons or depending on PostgreSQL specific features.

When the KPI project started, we knew that much of the information it generated would be stored on ironfist, such as the daily device and subnet availability numbers we planned to produce. And KPI would summarize information already available on ironfist, including arp cache information and switch mac address table information. Some information would come from the ripit_2 database, but we anticipated storing all the final summary results on ironfist.

The lack of cross-database joins became a headache almost immediately. PostgreSQL doesn't natively support cross-database joins, so in the past we did a mixture of replicating static data from ripit_2 into ironfist tables and performing some types of joins in client code (in Java, Perl or Python). There are obvious downsides to both of these approaches, especially as we wanted to move to a model extensively based on web services using the REX SQL/XML mapper.

As an experiment we decided to install the dblink contrib module on ironfist. By doing this, we could have ironfist perform queries against ripit_2 and use the results to join against data in the ironfist database. We found a couple of helpful articles online. The Postgres OnLine Journal provided a great set of examples and highlighted some of the limitations. A typical query might look like:

SELECT *
FROM dblink('dbname=ripit_2 host=dbr port=5432 user=db_kpi password=testing',
'select make, model from system_id_types')
as p(make text, model text);

So we installed dblink and are now using it for the KPI project extensively.

The biggest issues we've found with dblink are:

performance: your query must take into account that the full data set from the remote server is going to be retrieved before any joins against local data can happen. Reduce that data set in your query on the remote server as much as possible.

authentication: either you need to be superuser or you need to have the remote db set up to require a password from the source db host/user and that password has to be included in the query. Yuck!

explicit structure declarations: a local ironfist query (or view, or function) that uses a dblink call must explicitly list the row output structure. Note the use of "(make text, model text)" in the above.

The only one of these that we addressed in a general way is the authorization issue. While the KPI project was, so far, the only consumer of dblink queries, we realized that other projects, with different ripit_2 authorization needs on the ironfist end, might make use of dblink in the future. So we installed dblink in its own schema (dblink) and added a couple of tables and helper functions. In addition, we set up some authentication infrastructure between our two databases (and between the development and eval systems as well):


-- create a 'readonly' database role, dblink_readonly
CREATE ROLE dblink_readonly nologin;
-- create a user dblink_kpi_ro as a member of that role
CREATE USER dblink_kpi_ro IN ROLE dblink_readonly password 'testing';

# pg_hba.conf on the ripit_2 server: for dblink access from the ironfist database hosts (prod, eval and dev), require md5 passwords
host all +dblink_readonly 192.168.1.1 255.255.255.255 md5 #dblink from dbi
host all +dblink_readonly 192.168.1.2 255.255.255.255 md5 #dblink from dbieval
host all +dblink_readonly 192.168.1.3 255.255.255.255 md5 #dblink from dbidev

We wanted to ensure that functions would not have to hardcode passwords. Passwords change, and the thought of someday having to update some unknown number of functions scattered throughout schemas in the ironfist database was a painful one.

CREATE schema dblink;
GRANT USAGE ON schema dblink to public;

CREATE TABLE dblink.users (
id integer NOT NULL,
"user" character varying,
password character varying
);

CREATE OR REPLACE FUNCTION dblink.connect_ripit_2(text, text) returns text
as $SQL$
SELECT 'dbname=ripit_2 host=' || $2 || ' port=5432 user=' || $1 ||' password=' || password
FROM dblink.users
WHERE dblink.users.user = $1
$SQL$
LANGUAGE SQL
SECURITY DEFINER
STABLE;

Note the "SECURITY DEFINER" attribute of that function. We restricted access to the dblink.users table so that only user postgres can select from it. "SECURITY DEFINER" means that the function dblink.connect_ripit_2 can read the table regardless of who is executing that function. Someone can run the function, but won't be able to directly view the table of users and passwords.

Below I'll show one of the shorter KPI views we created which uses dblink. This view generates a histogram of active devices by type per month per sector for the past 60 months. The entire query runs on the ripit_2 database server (no data from ironfist is joined for this one), using the dev_state and dev_state_history tables. The raw data looks like:

sector | month | device_type | make | model | count
-------------+------------+-----------------+--------+--------------+-----------
uwcampus | 2010-01-01 | layer2 | smc | swsmc8612t | 2
uwcampus | 2010-01-01 | layer2 | smc | swsmc8612xl3 | 299
uwcampus | 2010-01-01 | layer2 | smc | swsmc8624t | 72
uwcampus | 2010-01-01 | router | cisco | 1811 | 2

And so on.

Here's a taste of what we can present with the Google Visualization API from the above data: a stacked bar chart showing counts of the different device types by month:

[stacked bar chart of device type counts by month]

A "SELECT *" on the following query takes almost 60 seconds! "Materializing" this view (by pre-generating its results and storing them in a table on ironfist) were critical to providing a responsive user interface.

CREATE OR REPLACE VIEW matview.kpi_network_devices_v AS
SELECT * FROM dblink.dblink(dblink.connect_ripit_2('dblink_kpi_ro'::text, 'dbr'::text),
$$ SELECT sector, month, device_type, make, model, system_type, count(device_id)
FROM ( SELECT CASE WHEN sector.sector = 'mcis'
THEN 'uwmedcenter'
WHEN sector.owner = 'uw'
THEN 'uwcampus'
ELSE sector.owner
END as sector,
intervals.month,
device_types.type as device_type,
system_id_types.id as system_type,
make,
model,
dev_state.device_id
FROM ( SELECT (date_trunc('month', now())::date - s.a * '1 mon'::interval)::date as month
FROM generate_series(0,60) as s(a)
) as intervals
JOIN ( SELECT device_id,system_id,installdate, null as modified FROM dev_state
UNION
SELECT device_id, system_id, installdate,modified FROM dev_state_history
WHERE dev_state_history.deleted = true
) AS dev_state
ON (dev_state.modified > intervals.month or dev_state.modified is NULL)
AND dev_state.installdate < system_id="system_id_types.id" system_type="device_types.id" device_id =" dev_state.device_id" id="dc.sector">
See our dblink.connect_ripit_2 function, which specifies the user we want to connect as and which database server we want to use? Note that we can use $$ quoting here: my first query was crazy full of escaped quotes until I realized that $$ quoting works just fine in views as well as in functions.

Thursday, January 14, 2010

KPI part 1: database/XML mapping populates Key Performance Data charts

Recently we were asked to develop a page of Key Performance Indicator charts showing network availability and other network capacity statistics. We have a lot of historical data, scattered throughout flat files, RRD files and in our PostgreSQL database. In the past we've responded to similar requests by producing ad-hoc charts and tables, ending up with several web pages with disparate look and feel and levels of detail.

This time, we wanted to come up with a complete system that could be re-used: a preferred (preferably JavaScript-based) flexible charting system, a consistent look and feel, and a manageable way of storing the underlying data. Ideally, some of those ad-hoc charts would even be migrated to this new system.

Just identifying how to calculate the data for the initial KPI was challenging (when you measure average network availability for a campus network, where do you measure? Do you weight the 'service points' equally? And what's the smallest meaningful time range to show?)

We determined that UW campus network availability would be measured by 'average uptime of unweighted subnet prefix' and that we would show average availability over week, quarter and year.

So, where do we get this data? We merge it from several of our production network management tools:

Database: We use a PostgreSQL database to model our network. The Network Operations Center maintains tables of devices, interfaces, networks, contacts, events and all their related attributes, using a mixture of manual addition and automated device discovery systems.

Meerkat historical data: Meerkat is our locally developed open source network monitoring system. It tracks all network events (writing them to the database) and generates alerts for the Network Operations Center. We have a companion web service which can provide availability information for devices or network prefixes from the detected network events.

Meerkat availability service: The companion availability service reports raw availability for a given device or network prefix. Device availability is easy: if there's an event showing the device was unavailable in our database, its raw availability for the year is reduced by the length of the outage window. Determining network availability requires topology discovery and awareness of redundant paths. Only if outage events occur interrupting all paths from the monitoring system to the network will the raw availability of the network be reduced.

RT::Outages planned event tickets: RT::Outages is a locally written tool which interacts with the RT trouble ticket system. It provides a workbench our network operations staff can use to generate outage reports. The workbench allows selection of the devices and interfaces involved in an outage and provides a date picker for consistent entry of outage window start and end times. Contact lists are automatically generated from the database device and network contact lists. And, essential for the KPI project, RT::Outages is also used to schedule planned events.

For the KPI project, some changes had to be made to enhance the Meerkat availability service. Availability isn't conventionally calculated from all network downtime, just from unplanned outages. The RT::Outages service has that information available, and we use it to let the Meerkat availability service exclude those planned events. In addition, the service originally provided only an annual availability measurement; to get quarterly and weekly measurements we updated it to accept arbitrary time ranges.

Now that we knew where to get our raw information what would we do with it?

We started by writing a nightly tool to fetch the availability of each of the "service points" on our network (service points are network prefixes for the UW campus and UW Medical Centers networks, sites for the Washington K20 networks). To enable this, we made use of another locally developed open source initiative:

REX: REX is a SQL/XML mapper that provides a framework for turning PostgreSQL database queries into XML documents available through a REST service. It leverages the XML functions available in PostgreSQL. In addition to fetching information from the database this way, we can also update the database.

The nightly process, for example, is a Perl program that connects to a REX-based REST service for the list of service points, queries the Meerkat availability service for each service point's availability for the previous day, and then uses XML::Simple to generate an XML document that is posted back to a final service, which updates the database with all of the service point measurements. Given a hash of measurements:

$VAR1 = {
'measurement' => [
{
'owner' => 'uwcampus',
'availability' => '100.0',
'includes_planned_events' => 'false',
'service_point' => '128.95.138.0/24',
'datetime' => '2010-01-12'
}
]
};

We update the database with the following XML document:

<measurements>
<measurement availability="100.0" datetime="2010-01-12" includes_planned_events="false" owner="uwcampus" service_point="128.95.138.0/24" />
</measurements>

using the following Perl code (with XML::Simple and LWP::UserAgent::Determined):

use Carp qw(croak);
use XML::Simple;
use LWP::UserAgent::Determined;
use HTTP::Request;

post_measurements( { ua => $ua,
post_document => XMLout ($measurements, RootName=>'measurements')
} );

sub post_measurements {
my ($param_ref) = @_;
my $post_document = $param_ref->{post_document} || croak 'No document to post';

my $ua = $param_ref->{ua} || LWP::UserAgent::Determined->new;
my $url = "https://$options{ws}/rex/xml/kpi_availability/ADD_DATA";
my $request = HTTP::Request->new('POST', $url);

$request->content_type('application/xml');
$request->content($post_document);
my $response = $ua->request($request);
}

Typically we send multiple measurements at a time. This produces a table full of measurements, one per day per service point.

When we display them, however, we want to summarize the daily service_point measurements to weekly, quarterly and annual measurements for an entire collection of service points (e.g. the service points in the UW campus network or in the K20 statewide network). For our prototype, we intended for the front end to process XML, so we wrote a PL/pgSQL function (see end of post) to generate an XML document. The REX-based service supplying data to the front end was configured to run this on demand. For more information on using the xml functions in PostgreSQL, see http://www.postgresql.org/docs/8.3/interactive/functions-xml.html

We soon found that running this on demand was much too slow (seven seconds!) and that the pchart system we initially used didn't provide all the features we wanted and locked us in to PHP.

Stay tuned; in my next posts I'll outline how we implemented Materialized Views to rescue us from performance problems like those experienced above and how we implemented our final Javascript-based front-end using Google Charts via the Google Visualization API.

CREATE FUNCTION rex_fn.kpi_availability_read(owner_selected text, year_selected text,
include_planned_events boolean)
RETURNS xml
AS $$
/* FUNCTION: rex_fn.kpi_availability_read()
Description: Summarizes weekly, quarterly and yearly availability measurements
Affects: Makes no changes to the data
Arguments: name of a supported owner, a year and whether or not the measurements should include or
exclude planned event windows
Returns: xml document like:

<availability>
<yearlyavailability>
<plannedincluded>true</plannedincluded>
<owner>
<name>k20</name>
<year>
<number>2009</number>
<averageavailability>99.8159518537013916</averageavailability>
<quarter>
<number>1</number>
<averageavailability>99.8047910271571390</averageavailability>
<complete>true</complete>
<week>
<number>01</number>
<averageavailability>99.6165324837199837</averageavailability>
<start_date>1230796800</start_date>
<end_date>1231315200</end_date>
<complete>true</complete>
</week>
</quarter>
</year>
</owner>
</yearlyavailability>
</availability>
*/
BEGIN
RETURN xmlelement (name availability,
xmlagg(availability_frag)
)
FROM ( SELECT xmlelement (name yearlyAvailability,
xmlconcat (xmlelement (name plannedIncluded, include_planned_events),
xmlagg (xmlelement (name owner, owner_frag))
)
) AS availability_frag
FROM ( SELECT xmlconcat(xmlelement (name name, owner),
xmlagg(xmlelement (name year, year_frag))
) AS owner_frag
FROM ( SELECT owner, year, xmlconcat (xmlelement (name number, year),
xmlagg (quarter_frag)
) AS year_frag
-- Clause to produce the "yearly availability"
FROM ( SELECT owner,
to_char(datetime,'YYYY') AS year,
xmlelement ( name averageAvailability,avg(availability) ) AS quarter_frag
FROM kpi.availability
WHERE owner = owner_selected
AND includes_planned_events = include_planned_events
AND to_char(datetime,'YYYY') = year_selected
GROUP BY owner, year
UNION ALL
-- Call helper function to produce the xml for the "quarterly availability"
SELECT *
FROM rex_fn.kpi_availability_read_quarter (owner_selected,
year_selected,
include_planned_events)
) AS sub4 GROUP BY owner, year
) AS sub3 GROUP BY owner
) AS sub2
) AS sub1;
END;
$$
LANGUAGE plpgsql;

CREATE FUNCTION rex_fn.kpi_availability_read_quarter(owner_selected text, year_selected text,
include_planned_events boolean)
RETURNS SETOF kpi_availability_quarter_record_type
AS $$
/* FUNCTION: rex_fn.kpi_availability_read_quarter()
Description: helper function for rex_fn.kpi_availability_read -- generates quarterly xml subtree

Affects: Makes no changes to the data
Arguments: name of a supported owner, a year and whether or not the measurements should include or
exclude planned event windows
Returns: set of xml documents
*/
BEGIN
RETURN QUERY SELECT owner, year, xmlelement(name quarter, xmlconcat (xmlelement (name number, quarter),
xmlagg (week_frag)
)
) AS quarter_frag
-- Clause to produce the "quarterly availability"
FROM ( SELECT owner, to_char(datetime,'YYYY') AS year, to_char(datetime, 'Q') as quarter,
xmlconcat (xmlelement ( name averageAvailability,avg(availability) ),
xmlelement ( name complete,
CASE WHEN to_char(max(datetime),'Q') != to_char( max(datetime) + interval '1 day','Q')
THEN TRUE
ELSE FALSE END
)
) AS week_frag
FROM kpi.availability
WHERE owner = owner_selected
AND includes_planned_events = include_planned_events
AND to_char(datetime,'YYYY') = year_selected
GROUP BY owner, year, quarter
UNION ALL
-- Call helper function to produce the xml for the "weekly availability"
SELECT * FROM rex_fn.kpi_availability_read_week (owner_selected,
year_selected,
include_planned_events)
) AS sub GROUP BY owner, year, quarter ORDER BY year, quarter ASC;
RETURN;
END;
$$
LANGUAGE plpgsql;

CREATE FUNCTION rex_fn.kpi_availability_read_week(owner_selected text, year_selected text, include_planned_events boolean)
RETURNS SETOF kpi_availability_week_record_type
AS $$
/* FUNCTION: rex_fn.kpi_availability_read_week()
Description: helper function for rex_fn.kpi_availability_read -- generates weekly xml subtree
Affects: Makes no changes to the data
Arguments: name of a supported owner, a year and whether or not the measurements should include or
exclude planned event windows
Returns: set of xml documents
*/
BEGIN
RETURN QUERY SELECT owner,
year,
quarter,
xmlelement(name week, xmlconcat (xmlelement (name number, week),
xmlagg ( xmlelement (name averageAvailability, averageAvailability) ),
xmlagg ( xmlelement (name start_date, extract (epoch from start_date)) ),
xmlagg ( xmlelement (name end_date, extract (epoch from end_date)) ),
xmlagg ( xmlelement (name complete,
CASE WHEN start_date + interval '1 week' = end_date + interval '1 day'
THEN TRUE
ELSE FALSE END)
) -- xml element week
) -- xmlconcat
) AS week_frag
FROM ( SELECT owner AS owner,
to_char(week_start_date(datetime,'Mon'),'YYYY') AS year,
to_char(week_start_date(datetime,'Mon'), 'Q') AS quarter,
to_char(week_start_date(datetime,'Mon'),'WW') AS week,
min(datetime) AS start_date,
max(datetime) AS end_date,
avg(availability) AS averageAvailability
FROM kpi.availability
WHERE owner = owner_selected
AND includes_planned_events = include_planned_events
AND to_char(week_start_date(datetime,'Mon'),'YYYY') = year_selected
GROUP BY owner, year, quarter, week
) AS sub
GROUP BY owner, year, quarter, week
ORDER BY quarter, week ASC;
RETURN;
END;
$$
LANGUAGE plpgsql;