If you already have a plan in place for disaster or emergency response of any kind (e.g., fire, earthquake, electrical problems), you're probably not going to have to change it significantly to meet your security needs. If you don't have such a plan already, you can probably use your security incident response plan with only minor modifications for most emergencies.
Your incident response plan need not be an elaborate document, but you need to have something, even if it's only an email message that records and confirms the details you've all worked out over lunch at the local sushi bar. You'll be better off than many sites even if you do nothing more than think about the issues and discuss them with the relevant people.
What's in your plan?
The response plan is primarily concerned with two issues: authority and communication. For each part of the incident response, the plan should say who's in charge and who they're supposed to talk to. Although you'll specify a few steps people will take, incidents vary so much that the response plan mostly specifies who's going to make decisions, and who they're going to contact after they've decided -- not what they're going to decide. This section summarizes the different parts of a response plan.
The two cases you really want to plan for are these:
In the second case, it's going to be embarrassing and expensive if you disconnect the network and get five people out of bed, all to prevent somebody from doing the work they're paid to do.
Either way, it's probably not a decision you want made by a night operator, or by a user acting alone because he or she can't figure out how to call somebody who knows how to tell a real incident from a false alarm.
At a small site, you might want to simply post a number that users can call to get help outside of office hours (for instance, a pager number). Users might be encouraged to shut down personal machines if they suspect an attack and know how to shut the machine down gracefully. You want to be very cautious about this, however, because an ungraceful shutdown, particularly of a multi-user machine, may be more damaging than an intruder.
At a larger site, one that has on-site support after hours, you should instruct the on-site support people to call a senior person if they see a possible security incident. They should be told explicitly not to do anything more than that unless circumstances are extreme, but to keep trying to contact senior personnel until they get somebody who can take a look at what's going on.
Teamwork is great, but emergencies call for leadership. You don't want to have everybody doing their own thing and nobody in charge, and you certainly can't afford to stand around arguing about it. If your senior technical person is absent, do you want someone less senior but more technical to do the evaluation, or do you want someone more senior but less technical? How much time are you going to spend searching for the senior technical person when you have an emergency to deal with, before proceeding to your next candidate for the hot seat?
At a small site, you may not have a lot of options; if only one person has the skills necessary to do something about an attack, your policy will simply list that person as the one in charge in case of a security incident. If that person is unavailable, authority should go to somebody levelheaded and calm who can take stopgap actions and arrange for assistance (for example, from a relevant response team). In this situation, technical skills would be nice, but resourcefulness and calm are more important.
At a larger site, probably more than one person could be in charge. Your plan may want to say that the most senior will be in charge by default or that whoever is specified as being on call will be in charge. Either way, the plan should state that if the default person in charge is unavailable, the first of the other possible people to respond is in charge. Specifying what order they're going to be contacted in is probably overkill; let whoever is trying to reach these people use his or her knowledge of the situation. If none of those people are available, you'll usually want to work up the organizational hierarchy rather than down. (A manager, particularly a technical one, is probably better equipped to cope than an operator.)
In a small organization, you will pick your fallback candidates by name. In a large one, you will usually specify fallbacks by job title. If job title is your criterion, it's important to base your decision on the characteristics of the job, not of the person currently in it. Don't write into your plan that the janitor should decide, on the theory that the current janitor also is the most sensible and technical of those who aren't system administrators. The next janitor might be an airhead with a mop.
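The fallback rule for a larger site can be written down as a short decision procedure. The sketch below is only an illustration: the function name, its parameters, and the idea of recording responders in the order they answered are assumptions made for the example, not part of any particular plan.

```python
def pick_incident_lead(default_lead, alternates, escalation, responders):
    """Return who takes charge, or None if nobody suitable has answered yet.

    All names here are hypothetical, chosen for this illustration.
    default_lead -- the most senior or on-call person named in the plan
    alternates   -- other people the plan allows to be in charge (unordered)
    escalation   -- people further up the hierarchy, nearest manager first
    responders   -- people reached so far, in the order they answered
    """
    # The default person is in charge whenever he or she can be reached.
    if default_lead in responders:
        return default_lead

    # Otherwise the first of the other possible people to respond is in charge.
    allowed = set(alternates)
    for name in responders:
        if name in allowed:
            return name

    # If none of them are available, work up the hierarchy rather than down.
    reached = set(responders)
    for name in escalation:
        if name in reached:
            return name

    return None
```

The alternates are deliberately unordered, because the contact order is left to whoever is making the calls; only the escalation list has a fixed order.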
If you are at a site with multiple computer facilities, do you want to take the entire site off the Internet if one facility has been compromised, or is it better (or even possible) to take just that facility off the Internet?
At most sites, the reasonable plan is to disconnect the site as a whole from the network as soon as you know for sure that you have an intruder connected to your systems. You may have a myriad of internal connections, with a triply redundant, diversely cabled, UPS-protected routing mesh, which can make "disconnecting" a daunting prospect (the system keeps "fixing" itself). On the other hand, you probably have only one connection (or a small handful of connections) to the outside world, which can be more easily severed.
Your plan needs to say how to disconnect the network, and how the machines should be shut down. Be very careful about this. You do not want to tell people to respond to a mildly suspicious act by hitting the circuit breakers and powering off every machine in the machine room. On the other hand, if an intruder is currently removing all the files on the machine, you don't want them to give that intruder a 15-minute warning for a graceful shutdown.
This is one case in which you need clear, security-specific instructions in your plan. Here's what we recommend you do:
Vendors and service providers
People at other sites
If many people must be notified, you may wish to use a phone tree or an alert tree. In such a tree, shown in Figure 27-2, each person notifies two or three other people; it is a geometric progression, so a large number of people can be rapidly notified with relatively little work for any one person. Everybody should have a copy of the entire tree, so that if people are unavailable, their calls can be taken over by someone else (usually the person above them on the tree). It's best to set it up so that as many calls as possible are toll-free, and so that people are notifying other people they know relatively well (which increases their chances of knowing how to get through). There's no need for an alert tree to reflect an organizational chart or a chain of command.
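To see how quickly such a tree grows, here is a minimal sketch that builds one from a flat list of names. The function names, the placeholder names, and the simple level-by-level assignment are assumptions for illustration; a real tree would be arranged by hand so that people call others they know well.

```python
from collections import deque


def build_alert_tree(people, fanout=3):
    """Map each person to the people he or she calls.

    people[0] starts the alert. The tree is filled level by level, so
    nobody makes more than `fanout` calls. (Illustrative assumption: a
    real tree would be arranged by who knows whom, not by list order.)
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    calls = {person: [] for person in people}
    for index in range(1, len(people)):
        caller = people[(index - 1) // fanout]
        calls[caller].append(people[index])
    return calls


def levels_needed(calls, root):
    """Number of levels below the root, found by breadth-first traversal."""
    deepest = 0
    queue = deque([(root, 0)])
    while queue:
        person, level = queue.popleft()
        deepest = max(deepest, level)
        for callee in calls[person]:
            queue.append((callee, level + 1))
    return deepest


def backup_caller(calls, person):
    """Person above `person` on the tree, who takes over that person's calls."""
    for caller, callees in calls.items():
        if person in callees:
            return caller
    return None  # the root has nobody above it
```

With three calls per person, 13 people are covered in two levels and 40 in three (1 + 3 + 9 + 27), which is the geometric progression at work.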
Your plan should also show a sample notification message for the users of your systems, which can sometimes be tricky. Your message needs to contain enough information so that legitimate users understand what's happening. They need to know:
Exactly which things that they normally do aren't going to work
When service will be restored
What they're supposed to do (including leave you alone so that you can concentrate on the response)
That you're going to tell them the details later
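One way to keep a sample message ready is to store it as a template with the four points above as fields. The sketch below is only an illustration; the class name, field names, and wording are assumptions, not a prescribed format. The text deliberately says nothing about the cause of the outage.

```python
from dataclasses import dataclass


@dataclass
class OutageNotice:
    """Hypothetical template covering the four points users need to know."""
    unavailable: list       # things users normally do that won't work
    restore_estimate: str   # when service will be restored
    user_actions: list      # what users are supposed to do meanwhile

    def render(self):
        lines = ["The following will not work for now:"]
        lines += [f"  - {item}" for item in self.unavailable]
        lines.append(f"Expected restoration: {self.restore_estimate}")
        lines.append("What to do in the meantime:")
        lines += [f"  - {action}" for action in self.user_actions]
        lines.append("Details will follow once service is restored.")
        return "\n".join(lines)


if __name__ == "__main__":
    notice = OutageNotice(
        unavailable=["Email to and from outside addresses", "Remote logins"],
        restore_estimate="not yet known; next update at 15:00",
        user_actions=["Keep working locally",
                      "Please do not call the support staff unless it is urgent"],
    )
    print(notice.render())
```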
Think about how you are going to send your message. If you send it via electronic mail, remember that the intruder may see it. Even if you know that your own systems are clean, don't assume that other people's are. Don't say anything in your message that you don't want the attacker to know. Better yet, use a telephone.
Some sites use a simple code phrase to announce a system attack that they can include in electronic mail. This can rapidly degenerate into bad spy fiction, but if you have an agreed-upon phrase that isn't going to alert an intruder (and isn't going to cause people who don't know it or don't remember it to give the game away by asking what on earth you're talking about), it can be effective. Something like "We're having a pizza party; call 3-4357 to RSVP" should serve the purpose.
Should you contact your organization's security department? At some organizations, the security department is responsible only for physical security. You'll want to have a contact number for them in case you need doors unlocked, for example, but they are unlikely to be trained in helping with an emergency of this kind, so you probably won't need to notify them routinely of every computer security incident. However, if a group within your organization is responsible for computer security, you are probably required to notify that group. Find out ahead of time when the members of the group want to be notified and how, and put that information in the plan. Even if that group cannot help you respond to your particular type of incident (perhaps because they are personal computer specialists or government security specialists), it's advisable to at least brief them on the incident after you have finished responding to it.
Many vendors and service providers have special contact procedures for security incidents. Using these procedures will yield much faster results than going through normal support channels. Be sure to research these procedures ahead of time and include the necessary information in your response plan.
If you are providing Internet service for other sites, however, or have special network connections to other sites, you should have contact information in the plan and should contact them promptly. They need to know what happened to their service and to check that the attacker didn't reach them through your site.
Reinstalling an operating system from scratch is time-consuming and unpleasant, and it often exposes underlying problems. For example, you may discover that you no longer know where some of your programs came from. For this reason, people are extremely reluctant to do it. Unless your incident response plan says explicitly that they need to reinstall the operating system, they probably won't. The problem is that this leads to situations where you have to get rid of the same intruder over and over again because the system hasn't been properly cleaned up. Your response plan should specify what's acceptable proof that the operating system hasn't been tampered with (for instance, a comparison against cryptographic checksums of an operating system known to be uncompromised). If you don't have those tools, which are discussed in Chapter 10, "Bastion Hosts", or if you can't pass the inspection, then you must install a clean operating system, and the plan should say so.
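As a rough illustration of the checksum comparison, the sketch below checks files against a manifest of known-good digests. The function names, the manifest layout (relative path mapped to hex digest), and the choice of SHA-256 are assumptions made for the example. It also assumes that the manifest was recorded while the system was known to be uncompromised and that both the manifest and this script are kept where an intruder could not have altered them.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(root, manifest):
    """Compare files under `root` with a known-good manifest.

    manifest maps a relative path to the digest recorded earlier
    (hypothetical layout). Returns {relative_path: "missing" | "modified"};
    an empty dict means every listed file matches.
    """
    problems = {}
    root = Path(root)
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems[relative_path] = "missing"
        elif sha256_of(target) != expected:
            problems[relative_path] = "modified"
    return problems
```

A check like this covers only the files listed in the manifest; it does not notice files an intruder has added, so an empty result is evidence and not a guarantee.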
The plan should also provide the information needed to reinstall the operating system; for example:
Where are the backups, and how do you restore from them?
Where are the records that will let you reconstruct third-party or locally written programs?
A good time to review your incident response plan is after a live drill, which may have exposed weaknesses or problems in the plan. (See Section 27.5.7, "Doing Drills" at the end of this chapter.) For example, a live drill may uncover any of the following: