TIP 143: An Interpreter Resource Limiting Framework

Login
Author:         Donal K. Fellows <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        25-Jul-2003
Post-History:   
Tcl-Version:    8.5
Tcl-Ticket:     926771

Abstract

This TIP introduces a mechanism for creating and manipulating per-interpreter resource limits. This stops several significant classes of denial-of-service attack, and can also be used to do things like guaranteeing an answer within a particular amount of time.

Rationale

Currently it is trivial for scripts running in even safe Tcl interpreters to conduct a denial-of-service attack on the thread in which they are running. All they have to do is write a simple infinite loop with the for or while commands. Of course, it is possible to put a stop to this by hiding those commands from the interpreter, but even then it is possible to simulate the DoS effect with the help of a procedure like this (which even under ideal conditions will take over three days to run):

proc x s {eval $s; eval $s; eval $s; eval $s}
x {x {x {x {x {x {x {x {x {x {x {x {x {x {sleep 1}}}}}}}}}}}}}}

The easiest way around this, of course, is the resource quota as used by heavyweight operating systems, perhaps combined with the alarm(2) system call. Or at least that would be the way to do it if it wasn't for the fact that those are ridiculously heavy sledgehammers to take to this particular nut. Luckily we control the execution environment - it is a Tcl interpreter after all - so we can implement our own checks. It is this that is the aim of this TIP.

Efficiency

Efficiency of any resource monitoring system is naturally a major concern; it is relatively simple to create a resource monitoring system but quite a lot harder to arrange for that monitoring to be cheap enough that its use does not greatly impact on the general speed of the program (in this case, Tcl.) The costs of checking time limits in particular can be somewhat excessive because of the necessity of performing a system call to carry out the check, but if you're performing any action a lot, it can get costly.

My strategy for limiting the impact upon performance is to provide a programmer-tunable per-limit parameter called the granularity. This specifies how often (out of all the locations in the processing of the Tcl interpreter and bytecode engine) the limits should be checked; the granularity is per-limit because there is this distinct difference in the cost of checking different kinds of limits, and the limits are tunable because the ideal frequency depends very largely on the code being limited. This can be combined with the fact that the test to see if any limits have been turned on can be made extremely cheap (i.e. just a comparison of a memory location to zero) and it allows unlimited interpreters to perform at almost the same speed as before.

A Tcl API for Limit Control

I propose to add a subcommand limit to the interp command and to the object-like interpreter access commands. It will be an error for any safe interpreter to call the limit subcommand; master interpreters may enable it for their safe slaves via the mechanism interpreter aliases, as is normal for security policies.

The first argument to the interp limit command will be the name of the interpreter whose limit is to be inspected. The second argument will be the name of the limit being modified; this TIP defines two kinds of limits:

time: Specifies when the interpreter will be prohibited from performing further executions. The limit is not guaranteed to be hit at exactly the time given, but will not be hit before then.

command: Specifies a maximum number of commands (see info cmdcount) to be executed.

The third and subsequent arguments specify a series of properties (each of which has a name that begins with a hyphen) which may be set (if there are pairs of arguments, being the property name and the value to set it to) or read (if there is just the property name on its own). If no properties are listed at all, all dictionary (see [111]) of all properties for the particular limit is returned. The set of properties is limit-specific, but always includes the following:

-command: A Tcl script to be executed in the global namespace of the interpreter reading/writing the property when the limit is found to be exceeded in the limited interpreter. If no command callback is defined for this interpreter, the -command option will be the empty string. Note that multiple interpreters may set command callbacks that are distinct from each other; an interpreter may not see what other interpreters have installed (except by running a script in those foreign interpreters, of course.) The order of calling of the callbacks is not defined by this TIP.

Where a callback returns any kind of exceptional condition (i.e. the result isn't TCL_OK) a background error is flagged up.

-granularity: Limits will always be checked in locations where the Tcl_Async*() API is called, as this is called regularly by the Tcl interpreter (with a few exceptions relating to event loop handling.) However, the cost of just checking a limit can be quite appreciable (it might involve system calls, say) so this property allows the control of how frequently, out of the opportunities to check a limit, are such limits actually checked. If the granularity is 1, the limit will be checked every time. It is an error to try to set the granularity to less than 1.

When an interpreter hits a limit, it first runs all command callbacks in their respective interpreters. Once that is done, the interpreter rechecks the limit (since the command callbacks might have decided to raise or remove it) and if it is still exceeded it bails out the interpreter in the following way:

When resource limits are being used, unlimited master interpreters should take care to use the catch command when calling their limited slaves. Otherwise hitting the limit in the slave might well smash the master as well, just because of general error propagation. But that is good practise anyway.

Time Limits

Time limits are specified in terms of the time when the limit will be hit. Setting the limit to the current time ensures that the limit will be immediately activated. Time limits have two options for specifying the limit.

-seconds: This sets the absolute time (in seconds from the epoch, as returned by clock seconds) that the time limit is hit. If set to the empty string, the limit is removed. If no time limit is set, this option is empty when inspected.

-milliseconds: This sets the number of milliseconds after the start of the second specified in the -seconds option that the time limit is hit. May only be set to the empty string if the -seconds option is empty and present, and may only be set to a numeric value if the -seconds option is unspecified or present and non-empty (it is always safe to leave this option unspecified.) If no time limit is set, this option is empty when inspected.

Where a time-limited interpreter creates a slave interpreter, the slave will get the same time-limit as the creating master interpreter.

Command Limits

Command limits are specified in terms of the number of commands (see info cmdcount for a definition of the metric) that may be executed before the limit is hit. Command limits have the following extra option for specifying and inspecting the limit.

-value: The number of commands (integer of course) that may actually be executed. If set to the empty string, the limit is removed, and when introspecting, unlimited interpreters return empty strings for this value.

Where a command-limited interpreter creates a slave interpreter, the slave will get the command-limit 0 after initialisation (i.e. after the return from the call to Tcl_CreateSlave()) and will be unable to execute and commands until the limit for the slave is raised. Master interpreters implementing security policies for safe interpreters might want to set such limits semi-automatically to something more useful by deducting command-executions from the creating interpreter to its new slave.

A C-level API for Limit Control

It is also desirable for there to be a general C API for controlling resource limits. Not only does this provide control to extension authors that can't be easily smashed by Tcl scripts by accident, but it also makes implementation of the Tcl API to the limit subsystem easier to create as well.

Above, type is either TCL_LIMIT_COMMANDS or TCL_LIMIT_TIME, handlerProc is a pointer to a function that takes two parameters (a ClientData and a Tcl_Interp*) and returns void, and deleteProc is a pointer to a function that takes a single parameter (a ClientData) and returns void. The key is that handlerProc_s are called when a limit is hit (they are used to implement the guts of the -command option), and when the callback is deleted for any reason (including a call to Tcl_LimitRemoveHandler and deleting the limited interpreter) the _deleteProc is called to release the resources consumed by the clientData context.

Use Cases

WebServices

One use for this sort of code might be in a web-services context where it is important to return a message to some client code within some interval. Using an in-process limiting mechanism allows this to be implemented in a far more light-weight fashion, as the alternative would be to fork off a new small application server for each incoming request and it would be considerably more complex to have a scripted executive that decides (possibly by examining the stack) whether a failure to deliver an answer within bounds is serious, or whether some extra resources should be granted to allow execution to run to completion. Other high-performance server applications would also be likely to gain from this sort of thing.

Profiling

It is possible to use the limiting code (and especially the script callbacks) to write a Tcl profiler. Every time the limit runs out, the callback can examine the Tcl stack in the limited interpreter and then assign some more resources to last up until the next profile trap.

Untrusted Code Execution

As indicated earlier, these limits can be used to increase control over untrusted code running in safe interpreters. While it would be necessary to extend this to memory consumption for every aspect that could be impacted by some malicious code to be controllable, having control over the number of commands that may be executed and how long those commands may take gives a much higher degree of control than currently exists, and is thus a monotonic improvement.

Possible Future Directions

There are some obvious other things that could be brought within this framework, but which I've left out for various reasons:

call stack: We already do limiting of this, so bring that within this framework would be a nice regularisation. On the other hand, such a change would not be backward-compatible, and it might not be safe to perform the callbacks either (especially as overflowing the stack is fatal in a way that overrunning on time is not.)

memory: Limiting the amount of memory that an interpreter may allocate for its own use would be very nice. But conceptually difficult to do (what about memory shared between interpreters?), expensive to keep track of (memory debugging is known to add a lot of overhead, and memory limiting would be more intrusive) and a really big change technically too (i.e. you'd be rewriting a significant fraction of the Tcl C API to make sure that every memory allocation knows which interpreter owns it.)

open channels, file sizes, etc.: Most other things can be limited in Tcl right now without special support.

Implementation

An example implementation of this TIP is available as a patch http://sf.net/tracker/?func=detail&aid=926771&group_id=10894&atid=310894 .

Copyright

This document is placed in the public domain.