Friday, January 22, 2010

PostgreSQL fun: cross-database joins

Before I discuss how we implemented Materialized Views, I should give a little background about our database configuration. We have two production PostgreSQL databases running on two different servers. The first database, "ripit_2" is our primary database, containing network modeling information (our devices, their interfaces, network contact people, and all associated attributes) and other critical support information (e.g. meerkat network events).

Non-critical, highly active data is stored on our second database, ironfist.

We've tried to keep our PostgreSQL installation as vanilla as possible. We rely on a fabulous and responsive team of five DBAs who support not only our PostgreSQL servers, but also many campus MS SQL, IBM DB2, IBM INFORMIX, ORACLE, and SYBASE servers! To reduce our demands on their limited time and to avoid locking ourselves into a particular database, we historically shied away from installing add-ons or depending on PostgreSQL specific features.

When the KPI project started, we knew that much of the information it generated would be stored on ironfist, such as the daily device and subnet availability numbers we planned to produce. And KPI would summarize information already available on ironfist, including arp cache information and switch mac address table information. Some information would come from the ripit_2 database, but we anticipated storing all the final summary results on ironfist.

The lack of cross-database joins became a headache almost immediately. PostgreSQL doesn't natively support cross-database joins, so in the past we did a mixture of replicating static data from ripit_2 into ironfist tables or performing some types of joins into client code (in java, Perl or Python). There are obvious downsides to both of these approaches, as we wanted to move to a model extensively based on web services using the rex SQL/XML mapper.

As an experiment we decided to install the dblink contrib module on ironfist. By doing this, we could have ironfist perform queries against ripit_2 and use the results to join against data in the ironfist database. We found a couple of helpful articles online. The Postgres OnLine Journal provided a great set of examples and highlighted some of the limitations. A typical query might look like:

FROM dblink('dbname=ripit_2 host=dbr port=5432 user=db_kpi password=testing',
'select make, model from system_id_types')
as p(make text, model text);

So we installed dblink and are now using it for the KPI project extensively.

The biggest issues we've found with dblink are:

performance: your query must take into account that the full data set from the remote server is going to be retrieved before any joins against local data can happen. Reduce that data set in your query on the remote server as much as possible.

authentication: either you need to be superuser or you need to have the remote db set up to require a password from the source db host/user and that password has to be included in the query. Yuck!

explicit structure declarations: a local ironfist query (or view, or function) that uses a dblink call must explicitly list the row output structure. Note the use of "(make text, model text)" in the above.

The only one of these that we addressed in a general way is the authorization issue. While the KPI project was, so far, the only consumer of dblink queries, we realized that other projects, with different ripit_2 authorization needs on the ironfist end might make use of dblink in future. So we installed dblink in its own schema (dblink) and added a couple of tables and helper functions. In addition we set up some authentication infrastructure between our two databases (and between the development and eval systems as well):

-- created a 'readonly' db group 'db_ro_group'
CREATE ROLE dblink_readonly nologin;
-- created a user dblink_kpi_or in the group
CREATE USER dblink_kpi_ro IN ROLE dblink_readonly password 'testing';

# for dblink access from ironfist database hosts (prod, eval and dev), require md5 passwords
host all +dblink_readonly md5 #dblink from dbi
host all +dblink_readonly md5 #dblink from dbieval
host all +dblink_readonly md5 #dblink from dbidev

We wanted to ensure that functions would not have to hardcode passwords. Passwords change and having to replace some unknown number of functions, scattered throughout schemas in the ironfist database someday was a painful thought.

CREATE schema dblink;
GRANT USAGE ON schema dblink to public;

CREATE TABLE dblink.users (
id integer NOT NULL,
"user" character varying,
password character varying

CREATE OR REPLACE FUNCTION dblink.connect_ripit_2(text, text) returns text
as $SQL$
SELECT 'dbname=ripit_2 host=' || $2 || ' port=5432 user=' || $1 ||' password=' || password
FROM dblink.users
WHERE dblink.users.user = $1

Note the "SECURITY DEFINER" attribute of that function. We restricted access to the dblink.users table so that only user postgres can select from it. "SECURITY DEFINER" means that the function dblink.connect_ripit_2 can read the table regardless of who is executing that function. Someone can run the function, but won't be able to directly view the table of users and passwords.

Below I'll show one of the shorter KPI views we created which uses dblink. This view generates a histogram of active devices by type per month per sector for the past 60 months. The entire query ru ns on the ripit_2 database server (no data from ironfist is joined for this one), using the dev_state and dev_state_history tables. The raw data looks like:

sector | month | device_type | make | model | count
uwcampus | 2010-01-01 | layer2 | smc | swsmc8612t | 2
uwcampus | 2010-01-01 | layer2 | smc | swsmc8612xl3 | 299
uwcampus | 2010-01-01 | layer2 | smc | swsmc8624t | 72
uwcampus | 2010-01-01 | router | cisco | 1811 | 2

And so on.

Here's a taste of what we can prese nt with the Google Visualization API from the above data: a stacked bar chart showing co unts of the di fferent device types by month:

A "SELECT *" on the following query takes almost 60 seconds! "Materializing" this view (by pre-generating its results and storing them in a table on ironfist) were critical to providing a responsive user interface.

CREATE OR REPLACE VIEW matview.kpi_network_devices_v AS
SELECT * FROM dblink.dblink(dblink.connect_ripit_2('dblink_kpi_ro'::text, 'dbr'::text),
$$ SELECT sector, month, device_type, make, model, system_type, count(device_id)
FROM ( SELECT CASE WHEN sector.sector = 'mcis'
THEN 'uwmedcenter'
WHEN sector.owner = 'uw'
THEN 'uwcampus'
ELSE sector.owner
END as sector,
device_types.type as device_type, as system_type,
FROM ( SELECT (date_trunc('month', now())::date - s.a * '1 mon'::interval)::date as month
FROM generate_series(0,60) as s(a)
) as intervals
JOIN ( SELECT device_id,system_id,installdate, null as modified FROM dev_state
SELECT device_id, system_id, installdate,modified FROM dev_state_history
WHERE dev_state_history.deleted = true
) AS dev_state
ON (dev_state.modified > intervals.month or dev_state.modified is NULL)
AND dev_state.installdate < system_id="" system_type="" device_id =" dev_state.device_id" id="dc.sector">
See our "dblink_connect_ripit_2" function which specifies the user we want to connect as and which database server we want to use? Note that we can use $$ for quoting here. My first query was crazy full of escaped quotes until I realized that $$ quoting worked just fine in views as well as functions.

Thursday, January 14, 2010

KPI part 1: database/XML mapping populates Key Performance Data charts

Recently we were asked to develop a page of Key Performance Indicator charts showing network availability and other network capacity statistics. We have a lot of historical data, scattered throughout flat files, RRD files and in our PostgreSQL database. In the past we've responded to similar requests by producing ad-hoc charts and tables, ending up with several web pages with disparate look and feel and levels of detail.

This time, we wanted to come up with a complete system that could be re-used; settling on a preferred (preferably javascript-based) flexible charting system, a consistent look and feel and a manageable way of storing the underlying data. Ideally some of those ad-hoc charts would even be migrated to this new system.

Just identifying how to calculate the data for the initial KPI was challenging (when you measure average network availability for a campus network, where do you measure? Do you weight the 'service points' equally? And what's the smallest meaningful time range to show?)

We determined that UW campus network availability would be measured by 'average uptime of unweighted subnet prefix' and that we would show average availability over week, quarter and year.

So, where do we get this data? We merge it from several of our production network management tools:

Database We use a PostgreSQL database to model our network. The Network Operations Center maintains tables of devices, interfaces, networks, contacts, events and all their related attributes, using a mixture of manual addition and automated device discovery systems.

Meerkat historical data Meerkat is our locally developed open source network monitoring system. It tracks all network events (writing them to the database) and generates alerts for the Network Operations Center. We have a companion web service which can provide availability information for devices or network prefixes from the detected network events.

Meerkat availability service The companion availability service reports raw availability for a given device or network prefix. Device availability is easy: if there's an event showing the device was unavailable in our database, its raw availability for the year is reduced by the length of the outage window. Determining network availability requires topology discovery and awareness of redundant paths. Only if outage events occur interrupting all paths from the monitoring system to the network will the raw availability of the network be reduced.

RT::Outages planned event tickets RT::Outages is locally written tool which interacts with the RT trouble ticket system. It provides a workbench out network operations staff can use to generate outage reports. The workbench allows selection of devices and interfaces involved in an outage and provides a date picker for consistent entry of outage window start and end times. Contact lists are automatically generated from the database device and network contact lists. And, essential for the KPI project, RT::Outages is also used to schedule planned events.

For the KPI project, some changes had to be made to enhance the Meerkat availability service. Availability isn't conventionally calculated on all network downtime; just on unplanned outages. The RT::Outages service has that information available and we use that to permit the Meerkat availability service to exclude those events. In addition, the service only provided an annual availability measurement. To get quarterly and weekly measurements we updated it to accept arbitrary time ranges.

Now that we knew where to get our raw information what would we do with it?

We started by writing a nightly tool to fetch the availability of each of the "service points" on our network (service points are network prefixes for the UW campus and UW Medical Centers networks, sites for the Washington K20 networks). To enable this, we made use of another locally developed open source initiative:

REX REX is a SQL/XML mapper that provides a framework for turning PostgreSQL database queries into XML documents available through a REST service. It leverages the XML functions available in PostgreSQL. In addition to fetching information from the database this way, we can also update the database.

The nightly process, for example, is a Perl program that connects to a REX-based REST service for the list of service points, processes each service point by sending a request to the Meerkat availability service for the availability for the previous day, then uses XML::Simple to generate an XML document to post back to a final service which updates the database with all the service point measurements. Given a hash of measurements:

$VAR1 = {
'measurement' => [
'owner' => 'uwcampus',
'availability' => '100.0',
'includes_planned_events' => 'false',
'service_point' => '',
'datetime' => '2010-01-12'

We update the database with the following XML document:

<measurement availability="100.0" datetime="2010-01-12" includes_planned_events="false" owner="uwcampus" service_point="" />

using the following code, using Perl, XML::Simple and LWP::UserAgent::Determined:

post_measurements( { ua => $ua,
post_document => XMLout ($measurements, RootName=>'measurements')
} );

sub post_measurements {
my ($param_ref) = @_;
my $post_document = $param_ref->{post_document} || croak 'No document to post';

my $ua = $param_ref->{ua} || LWP::UserAgent::Determined->new;
my $url = "https://$options{ws}/rex/xml/kpi_availability/ADD_DATA";
my $request = HTTP::Request->new('POST', $url);

my $response = $ua->request($request);

Typically we send multiple measurements at a time. This produces a table full of measurements, one per day per service point.

When we display them, however, we want to summarize the daily service_point measurements to weekly, quarterly and annual measurements for an entire collection of service points (e.g. the service points in the UW campus network or in the K20 statewide network). For our prototype, we intended for the front end to process XML, so we wrote a PL/pgSQL function (see end of post) to generate an XML document. The REX-based service supplying data to the front end was configured to run this on demand. For more information on using the xml functions in PostgreSQL, see

We soon found that running this on demand was much too slow (seven seconds!) and that the pchart system we initially used didn't provide all the features we wanted and locked us in to PHP.

Stay tuned; in my next posts I'll outline how we implemented Materialized Views to rescue us from performance problems like those experienced above and how we implemented our final Javascript-based front-end using Google Charts via the Google Visualization API.

CREATE FUNCTION rex_fn.kpi_availability_read(owner_selected text, year_selected text,
include_planned_events boolean)
AS $$
/* FUNCTION: rex_fn.kpi_availability_read()
Description: Summarizes weekly, quarterly and yearly availability measurements
Affects: Makes no changes to the data
Arguments: name of a supported owner, a year and whether or not the measurements should include or
exclude planned event windows
Returns: xml document like:

RETURN xmlelement (name availability,
FROM ( SELECT xmlelement (name yearlyAvailability,
xmlconcat (xmlelement (name plannedIncluded, include_planned_events),
xmlagg (xmlelement (name owner, owner_frag))
) AS availability_frag
FROM ( SELECT xmlconcat(xmlelement (name name, owner),
xmlagg(xmlelement (name year, year_frag))
) AS owner_frag
FROM ( SELECT owner, year, xmlconcat (xmlelement (name number, year),
xmlagg (quarter_frag)
) AS year_frag
-- Clause to produce the "yearly availability"
FROM ( SELECT owner,
to_char(datetime,'YYYY') AS year,
xmlelement ( name averageAvailability,avg(availability) ) AS quarter_frag
FROM kpi.availability
WHERE owner = owner_selected
AND includes_planned_events = include_planned_events
AND to_char(datetime,'YYYY') = year_selected
GROUP BY owner, year
-- Call helper function to produce the xml for the "quarterly availability"
FROM rex_fn.kpi_availability_read_quarter (owner_selected,
) AS sub4 GROUP BY owner, year
) AS sub3 GROUP BY owner
) AS sub2
) AS sub1;
LANGUAGE plpgsql;

CREATE FUNCTION rex_fn.kpi_availability_read_quarter(owner_selected text, year_selected text,
include_planned_events boolean)
RETURNS SETOF kpi_availability_quarter_record_type
AS $$
/* FUNCTION: rex_fn.kpi_availability_read_quarter()
Description: helper function for rex_fn.kpi_availability_read -- generates quarterly xml subtree

Affects: Makes no changes to the data
Arguments: name of a supported owner, a year and whether or not the measurements should include or
exclude planned event windows
Returns: set of xml documents
RETURN QUERY SELECT owner, year, xmlelement(name quarter, xmlconcat (xmlelement (name number, quarter),
xmlagg (week_frag)
) AS quarter_frag
-- Clause to produce the "quarterly availability"
FROM ( SELECT owner, to_char(datetime,'YYYY') AS year, to_char(datetime, 'Q') as quarter,
xmlconcat (xmlelement ( name averageAvailability,avg(availability) ),
xmlelement ( name complete,
CASE WHEN to_char(max(datetime),'Q') != to_char( max(datetime) + interval '1 day','Q')
) AS week_frag
FROM kpi.availability
WHERE owner = owner_selected
AND includes_planned_events = include_planned_events
AND to_char(datetime,'YYYY') = year_selected
GROUP BY owner, year, quarter
-- Call helper function to produce the xml for the "weekly availability"
SELECT * FROM rex_fn.kpi_availability_read_week (owner_selected,
) AS sub GROUP BY owner, year, quarter ORDER BY year, quarter ASC;
LANGUAGE plpgsql;

CREATE FUNCTION kpi_availability_read_week(owner_selected text, year_selected text, include_planned_events boolean)
RETURNS SETOF kpi_availability_week_record_type
AS $$
/* FUNCTION: rex_fn.kpi_availability_read_week()
Description: helper function for rex_fn.kpi_availability_read -- generates weekly xml subtree
Affects: Makes no changes to the data
Arguments: name of a supported owner, a year and whether or not the measurements should include or
exclude planned event windows
Returns: set of xml documents
xmlelement(name week, xmlconcat (xmlelement (name number, week),
xmlagg ( xmlelement (name averageAvailability, averageAvailability) ),
xmlagg ( xmlelement (name start_date, extract (epoch from start_date)) ),
xmlagg ( xmlelement (name end_date, extract (epoch from end_date)) ),
xmlagg ( xmlelement (name complete,
CASE WHEN start_date + interval '1 week' = end_date + interval '1 day'
) -- xml element week
) -- xmlconcat
) AS week_frag
FROM ( SELECT owner AS owner,
to_char(week_start_date(datetime,'Mon'),'YYYY') AS year,
to_char(week_start_date(datetime,'Mon'), 'Q') AS quarter,
to_char(week_start_date(datetime,'Mon'),'WW') AS week,
min(datetime) AS start_date,
max(datetime) AS end_date,
avg(availability) AS averageAvailability
FROM kpi.availability
WHERE owner = owner_selected
AND includes_planned_events = include_planned_events
AND to_char(week_start_date(datetime,'Mon'),'YYYY') = year_selected
GROUP BY owner, year, quarter, week
) AS sub
GROUP BY owner, year, quarter, week
ORDER BY quarter, week ASC;
LANGUAGE plpgsql;