Is there a database nerd in the house?

xzzy · January 2008

Got this project I've been mulling over in my head since the summer and can't come to a conclusion whether it's a stupid idea or not. I've got a system working now but am not happy with the way it's scaling. There's got to be a better way.

Basically the problem is this: I have 3000+ machines I need to be able to reinstall by clicking a button. The reinstall part is pretty simple and is not the source of difficulty. The problem is that these 3000 machines are not all used in the same way and, by extension, need slightly different configuration input when being installed (for the curious, this is done using Red Hat kickstart installs).

The bits of information are pretty standard: an IP address, a hard disk partition layout, which OS version to use, some software to install, and maybe a few commands that get run to customize something on the node. Right now, every group of machines that installs identically is put into a "cluster" that has the required bits of information associated with it. If we get a new batch of machines, or another group of machines is used differently, a new cluster is defined.

The problems show up when machines with the same end use have different hardware specifications (for example, old nodes have 250GB drives and new ones have 500GB drives, so they need a different partition table). Or maybe we need a machine that is 99% identical to the nodes in a group but needs a super special command run on install. Under my current setup, this requires making a new "cluster" that has only one machine in it. Experience has shown that such a setup quickly grows confusing, leaving people unsure which cluster to place a new node in (which, more often than not, results in the creation of yet another cluster).

What I've been thinking of is some kind of inheritance system: keep the top-level cluster definition, but allow sub-clusters under it that inherit values from the parent and can override any of them. This seems like it could be a good idea. It could provide a decent tree-like system that allows maximum reuse of data, permitting consistency where needed and customization where needed.
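A minimal sketch of that inheritance idea in Python, assuming each cluster is just a dictionary of settings with an optional parent. The `Cluster` class, its `resolve()` method, and the setting names are hypothetical, not part of kickstart or any existing tool. A node's effective configuration is built by walking up to the root and letting the closest definition win:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Cluster:
    """One node in a hypothetical cluster tree."""
    name: str
    settings: Dict[str, Any] = field(default_factory=dict)
    parent: Optional["Cluster"] = None

    def resolve(self) -> Dict[str, Any]:
        """Effective settings: parent values first, then this cluster's own
        values on top, so the most specific definition wins."""
        inherited = self.parent.resolve() if self.parent else {}
        return {**inherited, **self.settings}


# Top-level cluster holds the shared defaults.
compute = Cluster("compute", {
    "os_version": "5",
    "partition_table": "250GB-layout",
    "packages": ["base", "x11"],
})

# Sub-cluster for newer hardware: only the partition table differs.
compute_500 = Cluster("compute-500gb",
                      {"partition_table": "500GB-layout"}, parent=compute)

# One-off node: identical to compute-500gb plus one extra install command.
special = Cluster("compute-500gb-special",
                  {"post_commands": ["run-special-setup"]}, parent=compute_500)

print(special.resolve())
```

This is a shallow merge: a child that sets `packages` replaces the parent's list rather than extending it. Whether lists should append or replace is exactly the kind of rule that needs to be decided up front, or the tree becomes hard to reason about.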

But short of trying it and seeing what happens, I can't decide if it will actually improve the situation. What I'm trying to avoid is the type of situation described here:

http://thedailywtf.com/Articles/Soft_Coding.aspx

The situation isn't identical to mine, but potentially suffers from the same pitfalls: by trying to make software more flexible, you end up making it more difficult to maintain.

Does anyone have any experience with this sort of problem, and perhaps have some suggestions?

Janin · January 2008

This is a problem similar to user permissions, and I think a good solution would be to copy the idea of "roles". In short, you allow a cluster to contain sub-clusters, and then merge all the sub-clusters plus any machine-specific settings to produce the final configuration.

For example, a configuration might look like:

Configuration "Sally the accountant"
=====================

Clusters
=====
* Base Red Hat (Might be common to all your configurations)
* 500 GB drive (contains all details for terminals with 500 GB drives)
* Accountant (might want a different set of programs here)

Configuration parameters
===============
* Theme: High Contrast (Sally has bad eyesight)
* Mouse cursor: Extra Large
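The role-merging approach for Sally's configuration can be sketched in Python roughly like this. The `ROLES` table, `build_config()` and the setting keys are hypothetical; the merge rules (package lists are combined, other keys are overwritten by later roles, then per-machine parameters are applied last) are one possible choice, not the only one:

```python
from typing import Any, Dict, List

# Hypothetical role definitions mirroring the "Clusters" list for Sally.
ROLES: Dict[str, Dict[str, Any]] = {
    "Base Red Hat": {"os": "Red Hat", "packages": ["base"]},
    "500 GB drive": {"partition_table": "500GB-layout"},
    "Accountant": {"packages": ["office-suite"]},
}


def build_config(roles: List[str], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge roles in the order listed, then apply per-machine parameters.

    Package lists are concatenated without duplicates; every other key is
    overwritten by later roles, so role order matters.
    """
    config: Dict[str, Any] = {"packages": []}
    for role in roles:
        for key, value in ROLES[role].items():
            if key == "packages":
                config["packages"] += [p for p in value if p not in config["packages"]]
            else:
                config[key] = value
    config.update(overrides)  # machine-specific parameters always win
    return config


sally = build_config(
    ["Base Red Hat", "500 GB drive", "Accountant"],
    {"theme": "High Contrast", "mouse_cursor": "Extra Large"},
)
print(sally)
```

The difference from a strict tree is that roles are flat and combined per machine, so a new hardware type means one new role rather than a new branch under every existing cluster.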

Also, you might be interested in Sabayon for managing GConf keys.

xzzy · January 2008

That's something I've considered, but the problem is resistant to defining "baselines".

If everything is subject to change at any time, allowing sub-clusters opens the door to clusters within clusters within clusters, and so on into infinity, until the system becomes a mess.

For example, say I have 100 machines, all bought at the same time, that initially get installed the same way: OS 5, 500GB disks, and an X window manager. At this point it's easy: the baseline is the Red Hat release, everyone gets the same partition table, and we define subgroups for KDE or GNOME. Fast-forward a year: a new release comes out (OS 6 or whatever) and only half the users want to upgrade. Or some of the users get new machines with larger hard drives but want the OS and window manager to stay identical to their old machines.

Basically the end result is that, eventually, every machine becomes its very own sub-cluster and shares nothing with any other machine. The only way to avoid this is to force the baseline on users, and I guarantee you that's not something I have the authority to do.

This seems like a fairly generic problem that could apply to a number of situations, not just installing systems. Someone brilliant out there has to have solved this, and I just can't find them.

Pheezer · January 2008

Table of computers

Table of configuration options

Table of computer-to-config-option links

Call these tables 1, 2 and 3, in that order.

Table 1 stores all of your PC data, such as where each PC is and its name, with a unique numeric PC ID for each one.
Table 2 stores all of your config options, like which OS, the partition sizes, software installed, whatever, with a unique numeric option ID for each one.
Table 3 can hold any number of unique table1-table2 links. Each row just lists a PC ID and a config option ID.

Then when you need to know info about a PC you do a query that joins table 3 to table 1 and table 2, and you pull over all of the fields you need. You can code the query into your app to auto-insert the PC ID from a list into the base query and then spit out the results in whatever format you use.
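A minimal version of the three-table layout and the join, using Python's built-in `sqlite3` module with an in-memory database. The table names (`pc`, `config_option`, `pc_option`), the `kind` column, `options_for()` and the sample rows are all illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pc (                       -- table 1: one row per machine
    pc_id    INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    location TEXT
);
CREATE TABLE config_option (            -- table 2: one row per option
    option_id INTEGER PRIMARY KEY,
    kind      TEXT NOT NULL,            -- e.g. 'os', 'partition', 'package'
    value     TEXT NOT NULL
);
CREATE TABLE pc_option (                -- table 3: PC <-> option links
    pc_id     INTEGER NOT NULL REFERENCES pc(pc_id),
    option_id INTEGER NOT NULL REFERENCES config_option(option_id),
    PRIMARY KEY (pc_id, option_id)      -- each link is unique
);
""")

conn.executemany("INSERT INTO pc VALUES (?, ?, ?)",
                 [(1, "node001", "rack A"), (2, "node002", "rack B")])
conn.executemany("INSERT INTO config_option VALUES (?, ?, ?)",
                 [(10, "os", "OS 5"), (11, "partition", "500GB-layout"),
                  (12, "package", "gnome")])
conn.executemany("INSERT INTO pc_option VALUES (?, ?)",
                 [(1, 10), (1, 11), (1, 12), (2, 10)])


def options_for(pc_id: int) -> list:
    """Join table 3 to tables 1 and 2 and return (name, kind, value) rows."""
    return conn.execute("""
        SELECT pc.name, o.kind, o.value
        FROM pc_option AS po
        JOIN pc ON pc.pc_id = po.pc_id
        JOIN config_option AS o ON o.option_id = po.option_id
        WHERE po.pc_id = ?
        ORDER BY o.option_id
    """, (pc_id,)).fetchall()


for row in options_for(1):
    print(row)
```

The `?` placeholder is how the PC ID from a list gets inserted into the base query; the rows that come back can then be formatted into whatever the installer consumes.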

EDIT:
You're basically just normalizing the database. It's not soft coding; it's just how you design this sort of thing when you have an endless list of properties that you may or may not want to link to each record in a different table. You can even put tech specs in the config options table, so they're easy to update on each record while mass adds, removals and changes stay simple queries.
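To illustrate the "mass adds and changes" point, here is a sketch of two bulk edits against the link table, again with `sqlite3` and hypothetical table names and IDs: moving only some machines from OS 5 to OS 6, and giving every machine with the 500GB partition option an extra package.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE config_option (option_id INTEGER PRIMARY KEY, kind TEXT, value TEXT);
CREATE TABLE pc_option (
    pc_id     INTEGER NOT NULL,
    option_id INTEGER NOT NULL REFERENCES config_option(option_id),
    PRIMARY KEY (pc_id, option_id)
);
INSERT INTO config_option VALUES
    (10, 'os', 'OS 5'), (13, 'os', 'OS 6'),
    (11, 'partition', '500GB-layout'), (12, 'package', 'gnome');
INSERT INTO pc_option VALUES (1, 10), (2, 10), (3, 10), (2, 11), (3, 11);
""")

# Mass change: PCs 2 and 3 upgrade from OS 5 to OS 6; PC 1 stays on OS 5.
with conn:  # commits on success, rolls back on error
    conn.execute("""
        UPDATE pc_option SET option_id = 13
        WHERE option_id = 10 AND pc_id IN (2, 3)
    """)

# Mass add: every PC linked to the 500GB partition option also gets gnome.
# INSERT OR IGNORE skips any PC that already has that link.
with conn:
    conn.execute("""
        INSERT OR IGNORE INTO pc_option (pc_id, option_id)
        SELECT pc_id, 12 FROM pc_option WHERE option_id = 11
    """)

rows = conn.execute(
    "SELECT pc_id, option_id FROM pc_option ORDER BY pc_id, option_id"
).fetchall()
print(rows)
```

Neither edit touches the machines or the option definitions themselves, which is what keeps the "half the users upgrade" case from turning into a new cluster.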

Penny Arcade
