Standard Timetable Data Format

Andrew Cumming

Introduction

A standard data format would be useful to many people working on automatic timetabling. It would allow comparisons between the performance of different approaches. A set of benchmark problems may be created and would have a choice of products to create their data files and to solve their timetable problems

Any attempt to create a standard format must negotiate a compromise between allowing sufficient generality to be applicable to a wide range of problems but not so general that the format is unwieldy or difficult to process.

This paper is based on a seminar at the ICPTAT'95 conference in at which the standard was discussed; it is necessarily only partial. A more detailed version of a workable standard will hopefully emerge.

Principles guiding the format

We require that:

It should be easy to extend the format maintaining "backward compatibility":

That is: data in older formats may be read by programs compatible with later versions of the format. Older programs for reading and writing data will degrade gracefully as the standard evolves.

We should attempt to make the format easy to read:

We should use the simplest possible syntax.

It should be popular and widely used:

The format should encompass a wide range of "real-life" constraints. In particular it should include both course and exam timetables.

It should be as small as possible:

We include nothing that does not need to be included. There should be a clear and simple meaning. There should be as little syntax as possible.

We concede that:

The basic format is verbose:

It is not a particularly efficient way of storing the data. While the actual volume of data is unlikely to be of concern in itself, a verbose syntax makes editing the data "by hand" cumbersome. However the use of simple conversion filters or macro languages such as the C pre-processor can reduce the size of files.

We might view the standard as low level language that is generated by higher level languages.

It will carry less data than is required for course administration:

It will probably not be the format that most of us use internally. It should of course include all the data required to produce good timetables.

Scope

The standard format may be used to describe each of the following.

Components

The following are the main proposed components:

Times: A finite number of times are offered for each event to be scheduled. It is possible to determine the day on which a given time-slot occurs - either each time-slot name incorporates the number of the day or possibly each time-slot is declared to be on a given day. It is possible to calculate the number of minutes between time-slots - again, either by incorporating the time in the name of the time slot or by declaring time-slot names to be at given time. The proposal is for every time slot to be referred to in the form day.hh:mm where day is a number from 0 and hh and mm are hours and minutes in two digit twenty-four hour format.

Events: Events are given a unique name. An event may be given a duration, it will in general have a number of attendees, it may have a selection of suitable rooms where it may be placed and it will have a number of time-slots into which it may be scheduled.

Resources: A resource may be a room or a person or a group of people. We need to identify people or named groups of people (rather than simply generate clash lists) so that we can define a penalty function over an individual's timetable.

Staff and students: There seems to be no good reason to distinguish between staff and students, indeed to do so would imply that staff may not also be students, while this may be unusual it would be mean to legislate against it. Usually the member of staff is not considered to contribute to the size of an event - we can simply give staff a size of zero.

Rooms: Rooms differ from people in that they may sometimes be shared by events (mostly in exam timetables) and that they may be distant from each other and require travel time. As each event is given a specific list of rooms we do not need to be concerned special facilities that some rooms may have but others may not. Indeed we do not even need room capacity and class size unless rooms are being shared.

Some omissions from the format

Availability: If a person is unavailable at certain times then those slots must be excluded from the list of slots offered to the events which that person attends. Alternatively a number of placed dummy events may be used; there needs to be one dummy event for each time slot. If a room is unavailable at a certain time then we have no option but to place dummy events in it for each of the times. There is an argument for having a keyword to indicate availability (or non-availability, but not both) for rooms.

Choosing attendees: There is no means of offering a choice of who is to attend an event - the system may not be used to decide who is to teach a given event for example.

Split events: There is no easy way to allow events to be split across more than one room. If we allow events to be split then we must provide a means of penalising such splits.

Zones: The distance keyword specifies the time required to get from one room to another. The amount of data can usually be reduced by considering rooms to be within zones or campuses or sites. Only travel from one site to another is significant.

Share costs: there is no cost associated with sharing rooms. Presumably we would (at least sometimes) prefer not to have events sharing rooms however it does not seem obvious to me how the cost should be calculated.

Atom declaration: It might be useful to have a list of all the atoms declared at the top of the file. It would be a positive boon to those editing by hand as simple typos could be caught; it would be no great chore for those generating the data from programs.

Defaults: We might allow reference to all.sessions, all.rooms or all.events possibly making huge savings in the volume of data. We could simply have a filter which expands such references or make the terms part of the standard which would allow programmers to take advantage of possible memory savings. In order to allow this we would have to insist on declarations of all sessions, rooms, events and people.

Hard and soft constraints: The event-event restrictions may only be hard constraints, the constraints on gaps, spans and day-loads must be soft.

No-load events: We might wish to create "dummy" events to ensure that people get breaks at specific times however these events would considered to be work when evaluating the number of consecutive events or calculating the load per day. It may be desirable to have either a loading to give a measure of the work involved (a teacher might have a load of 2 for a lecture, 1 for a tutorial and 0 for lunch) or a simple no-load key word to indicate placed breaks.

Weeks: The structure of slots permits sessions and days, there is no mechanism for representing events which occur almost regularly. For example while most events in a weekly timetable may be considered to run every week of the term or semester, some events may occur on only a few weeks, the pattern may allow two or more events to be put in the same slot as the do not coincide. It may even be desirable be allowed to move events from week to week - there would have to be a penalty for this.

Set-up times: Some events may require a set-up time. This is a period of time before the event when the room is free but the attendees do not have to be available.

Key word list


Key Word            Notes                                    Parameters         

attends                                                      people event       

capacity            The size of each room may be specified   room integer       

cost.event.session  Gives the cost of placing the given      event session      
                    event in the given session               penalty            

cost.free.day       Gives the cost associated with the       resource integer   
                    given resource having exactly  the       penalty            
                    specified number of free days                               

cost.gap            The cost of an individual having a gap   resource time      
                    of at least  the specified time during   penalty            
                    the day, a late start or early finish                       
                    does not constitute a gap                                   

cost.load.day       Indicates the cost associated with each  resource time      
                    day for which the work load equals or    penalty            
                    exceeds the given time                                      

cost.resource.sessi Gives the cost of any event using the    resource session   
on                  given resource (person or room) at the   penalty            
                    given time                                                  

cost.span           The cost of an individual having to      resource time      
                    work for the specified time or longer    penalty            
                    without a break                                             

cost.travel.count   Gives the cost of each occasion of       resource penalty   
                    travel for the specified people                             

cost.travel.time    Specifies the cost (per hour?) of        resource penalty   
                    travel time spent by the individual or                      
                    group specified                                             

day                 Not required. This declares that a day   integer            
                    number may be  used.                                        

distance            The time required to travel between the  room room time     
                    given rooms, this introduces hard                           
                    constraints on the minimum time between                     
                    distantly placed events with common                         
                    attendees                                                   

ee.atleast.between. There must be at least time minutes      event event time   
time                between the end of one event and the                        
                    start of the other                                          

ee.atmost.between.d There must be at most int days between   event1 event2 int  
ay                  event1 and event2                                           

ee.exactly.before.s event2 must start exactly int slots      events1 event2     
lot                 after event1 starts                      int                

ee.xxx              There are 18 combinations of             event1 event2      
                    ee.[atleast|atmost|exactly].[between|bef [int | time]       
                    ore].[time|slots|days]                                      

event               These need not be explicitly declared.   event              

intersects          Indicates that the two specified rooms   room room          
                    are mutually exclusive. If one is in                        
                    use then the other may not be used                          

lasts               The duration of an event                 event time         


offer.room          Indicates that the given event must be   event room         
                    assigned a room and that the named room                     
                    is one option                                               

offer.session       Indicates that the given event must be   event session      
                    assigned a time and that the given time                     
                    may be suitable                                             

people              These need not be explicitly declared.   people             
                    A label of this type may represent                          
                    either a person or a group of people                        
                    following identical timetables                              

place.room          Indicates that the given event has been  event room         
                    assigned the given room                                     

place.session       Indicates that the given event has been  event session      
                    assigned the given time                                     

room                These need not be explicitly declared.   room               

session             These need not be explicitly declared.   integer            
start-time         
max-duration       

share               Indicates that the event may share a     event              
                    room with another event subject to the                      
                    capacity of the room                                        

show.day            Not required. This gives a  name for a   int dayname        
                    day - it may be useful in printing                          
                    timetables                                                  

size                The number of people in the group        people integer     



There may be additional restrictions that may be regarded as extensions to this basic scheme. Additional key words may be added by individuals; unrecognised key words should be ignored. For example, our data includes key words which give instructions on printing timetables - these may be easily filtered.

Data syntax

Having decided the information that is to be contained in the standard we must consider the syntax of the data that is to represent that information. My view is that we should restrict ourselves to tables of data and find a suitable way to represent these tables.

The proposal given here is that each statement consists of a key word followed by a number of parameters. The number and the type of the parameters are determined by the keyword making the format extremely easy to process. The order in which statements occur is not in general significant.

Delimiters

As all statements have a fixed, known size which depends on the keyword delimiters are not strictly required. However it may be useful to have some means of denoting the end of a statement as this will allow files with unknown keywords to be processed. End of line or semicolon would be ideal candidates.

Attends example

The attends keyword takes two parameters, people and event. For each event there must be one attends statement for every person or group of people attending the event.

The recreational chemistry course consists of one lecture and one tutorial. The tutorial groups are bsc.1.a bsc.1.b and bsc.1.c, Dylan takes all lectures and tutorials.

attends bsc.1.a rec.chem.L 
attends bsc.1.a rec.chem.T.a 
attends bsc.1.b rec.chem.L 
attends bsc.1.b rec.chem.T.b 
attends bsc.1.c rec.chem.L 
attends bsc.1.c rec.chem.T.c 
attends dylan rec.chem.L 
attends dylan rec.chem.T.a 
attends dylan rec.chem.T.b 
attends dylan rec.chem.T.c 

This can be made more succinct by allowing extensions to attends. For example attends.cross takes a list of resources and a list of events and makes all resources attend all events.

attends.cross [bsc.1.a bsc.1.b bsc.1.c dylan] [rec.chem.L] 
attends.cross [dylan] [rec.chem.T.a rec.chem.T.b rec.chem.T.c] 
attends bsc.1.a rec.chem.T.a 
attends bsc.1.b rec.chem.T.b 
attends bsc.1.c rec.chem.T.c 

The cross convention

Each line or statement of a data file consists of a key word; usually the name of a function or relation followed by a fixed number of arguments. The number of arguments depends on the key word. For example, attends takes two parameters: a resource and an event. If a key word has .cross attached then each of the parameter positions is replaced by a list instead of a single item. The meaning of such a statement is the Cartesian product of all the items in all of the lists. So long as someone writes a simple filter in C or awk to convert the more succinct format into the verbose-standard (and this need be done only once) we can be vague about which is permitted. Any number of such filters might be employed and piped together.

When swapping data we may either take the verbose standard version or take the concise version together with the appropriate filter(s).

The cross convention is not proposed as part of the standard - it is simply a tool which may be employed to make writing the data easier.

Data Volume

We estimate that the cost of the proposed format is at worst a three-fold increase in the volume of data. Consider the more succinct format where an event is represented by a record with a number of fields some of which may be repeated:

event-name, duration, share, {attendee}*;

bsc.1.a, 3:00, true, bsc1.a dylan; 

This becomes

lasts rec.chem.T.a 3:00 
share rec.chem.T.a 
attends dylan rec.chem.T.a 
attends bsc.1.a rec.chem.T.a 

For each field we need to have a key word, the name of the event and the value of the field taking roughly three times as much space as the more succinct format. The advantage is that we do not have to think of everything, when a new field is introduced - for example the set-up time - the old data is still perfectly valid, it does not have to be translated.

Time slots

A number of days are declared in the appropriate order. Time slots or sessions are referred to by the day number and the time., for example 0.09:00

Optionally the days may be declared or named:

day 0 
day 1 
day 2 
day 3 
day 4 
show.day 0 mon 
show.day 1 tue 
show.day 2 wed 
show.day 3 thu 
show.day 4 fri 

Or

show.day 0 montag 
show.day 0 dienstag ... 

Using the cross notation...

day [0 1 2 3 4] 

It may be useful to explicitly declare all possible time slots. The session relation does this, it is from a day to a time slot. A time slot is a start time together with the maximum duration. session : day <-> time x time In the following example Weds (day 2) is a half day.

session.cross [0 1 3 4] [09:00 10:00 11:00 12:00 14:00 15:00 16:00] [1:00] 
session.cross [2] [09:00 10:00 11:00 12:00] [1:00] 

The above declarations introduce the following session declarations:

session 0 09:00 1:00 
session 0 10:00 1:00 
session 0 11:00 1:00 
... 
session 4 16:00 1:00 

These sessions are referred to as 0.09:00, 0.10:00, 0.11:00, 0.12:00...

We can specify the length of an event

lasts rec.chem.L 1:00 
lasts.cross [rec.chem.T.a rec.chem.T.b] [3:00] 

Where the duration of an event is not given we may assume that the resources concerned will be free by the beginning of the next slot. Allowing the duration to be omitted may introduce considerable complexity particularly in the calculation of gaps, loads and travel time.

Offering choices

The system is to be offered choices for where and when to place events. Each line of the form

offer.time rec.chem.L 0.09:00 

adds one more option to the list of times at which the event takes place. We would wish to use the cross operator here:

offer.time.cross [rec.chem.L] [0.09:00 0.10:00 ... ] 

Even with this notation the amount of text is uncomfortably large. Simple tools such as the C pre-processor may be employed to reduce the load.

#define mon 0 
#define tue 1 
#define wed 2 
#define thu 3 
#define fri 4 
#define MORN(d) d.09:00 d.10:00 d.11:00 d.12:00 
#define DAY(d) MORN(d) d.14:00 d.15:00 d.16:00 
#define ALLTIMES DAY(mon) DAY(tue) MORN(wed) DAY(thu) DAY(fri) 
offer.time.cross [rec.chem.L] [ALLTIMES] 

The meaning of the above is the huge list of statements:

offer.time rec.chem.L 0.09:00 
offer.time rec.chem.L 0.10:00 
offer.time rec.chem.L 0.11:00 ... 

Penalties

Penalties are added. A typical event will incur penalties from several sources. A democratic approach would be to consider penalties to be proportional to the size of the class effected. Alternatively the number of people-resources attending could be the measure. This would typically give an entire tutorial group the same weighting as a single member of staff.

The evaluation of a timetable is likely to be the most difficult part to standardise. Different institutions are likely to have their own peculiar requirements. We propose that the standard include a relatively small number of basic evaluation metrics and allow for additions. Where an attempt is made to optimise a time table using data which includes an unknown function the optimiser must choose between finding out about the unknown function and writing code to evaluate it, or simply ignoring the function and failing to optimise that feature. We should write programs that cope with unknown functions sensibly - typically to throw up a warning - but continue processing.

References

ttp mail-list

ftp://ftp.cs.notts.ac.uk/Postings/data-syntax David Corne, Kirk Jackson, Peter Ross, Ben Paechter et al.