I propose moving the mapping of user input device events to actions from the application level to a system-wide level.
User input devices include keyboards, mice, trackballs, joysticks and graphics tablets, among others.
Events are what such devices emit to report key presses, mouse movement and similar.
Actions are everything applications do in direct response to user interaction, for example the execution of commands like Delete or Scroll Upwards.
Linux based operating system environments are the target, but the concepts are not necessarily limited to this scope.
There are cases of applications using different shortcuts or mouse events for the same or similar actions. For example, the GNOME image viewer maps the scrollwheel to zooming, while almost
everywhere else, including the GIMP image editor, it is mapped to scrolling. The user can’t easily change hard-coded mappings like that.
Say a user wants Ctrl-P to bring up preferences in every application instead of a printing dialogue (maybe he doesn’t have a printer). Or wants Ctrl-mouse-wheel to scroll sideways everywhere. Not
all applications have customisable shortcuts and only a few have mouse options. Going through the preferences of every single application to achieve consistent behaviour wouldn’t really be an option.
Conflicts between application and windowmanager keyboard shortcuts are a common problem. Compositing WMs with their many shortcuts only make it worse. Avoiding conflicts is hard, as one can’t
easily check the shortcuts of all installed applications.
3 Expected Benefits
Application developers should have no reason to hard-code event-action mappings. Use of the new system should introduce configurability at a low cost.
Listing all mappings in a central place (with searching and filtering, of course) will make it straightforward to apply the same mapping to identical or similar actions in all installed
applications. Conflicts will be obvious immediately and resolving them will no longer require changing preferences in two places or having to accept a hard-coded mapping.
It will become feasible to have different mappings optimised for varying keyboard layouts, left or right-handed users or users with disabilities.
All applications can benefit from support for alternative input devices without an extra cost.
To develop a good concept, one first needs to see what is given: the components of the current system and their interaction.
- User input devices
- Events emitted by the devices
- The software stack concerned with user input
- Interpretation of input events
- The application actions to be mapped
4.1 User Input Devices
- Alphanumeric Keyboards
- Pointing devices
- Pointing sticks (TrackPoint)
- Eye-tracking devices
- Wii Remote
- MIDI Controllers
- Fader / knob-boxes
The list could surely be extended. At least the more common devices should be supported explicitly.
Devices can have several independent means of manipulation. A mouse for example might have one or more buttons and likely a wheel. I will refer to these as Elements.
The specific design of input devices is not so much of concern as the different events they send.
There are also events sent back to devices, for example to control force-feedback.
4.2.1 Discrete Events
Simple events such as key-down and key-up, or directions from digital joysticks.
4.2.2 Continuous Events
Events with a value attached. Examples: x and y coordinates of a pen on a tablet, the position of a fader on a MIDI controller. These can be interpreted relative to earlier events (mouse position)
or as absolute values (pen on tablet in absolute mode).
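The distinction between discrete and continuous events, and between relative and absolute interpretation, can be sketched as a small data model. This is only an illustration: the class name InputEvent, its fields and the device identifiers are assumptions made for the example and do not correspond to any existing kernel or X interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InputEvent:
    """Hypothetical event record; all field names are illustrative."""
    device: str                    # e.g. "mouse0" (made-up identifier)
    element: str                   # e.g. "x", "wheel", "key_a:down"
    value: Optional[float] = None  # None marks a discrete event
    relative: bool = False         # only meaningful for continuous events

    @property
    def is_continuous(self) -> bool:
        return self.value is not None


def apply_axis(position: float, event: InputEvent) -> float:
    """Return the new axis position after a continuous event."""
    if not event.is_continuous:
        raise ValueError("discrete events carry no value")
    if event.relative:
        # Relative: the value is an offset from the earlier position.
        return position + event.value
    # Absolute: the value replaces the earlier position.
    return event.value
```

A mouse movement would be fed in with relative=True, whereas a pen on a tablet in absolute mode would use the default relative=False.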
4.3 Software Stack (Linux, BSD)
Events from input devices are first handled by the driver layer of the kernel. The kernel passes the events on to user-space. The virtual console layer makes sure only applications on the current
console receive the events.
To work with console applications outside an X-server, my proposal will need at least one component below the X-server.
4.3.2 The X-server
If an X-server is running, it grabs all keyboard input (as long as it’s on the current virtual console). If there’s mouse support on the console level via gpm, it has to be stopped so as not to
conflict with the X-server, as far as I remember.
The X-server takes care of interpreting pointing device coordinates and drawing the mouse cursor and passes other events on to the window that has input focus. Windows are not only the obvious
movable and resizable rectangles. Many widgets are their own X windows. Widgets that don’t receive input like labels can be exceptions.
4.3.3 The Windowmanager
A windowmanager is responsible for drawing window frames and title bars and handles the position, size and z-order of windows. It sets the input focus to a window either if the pointer is moved
over it or if it’s clicked on, depending on the focus policy. Windows also receive input focus if they are selected from a panel, window list or similar means.
4.4 Interpretation of Events
4.4.1 Keyboard Mnemonics
Keys or key combinations that can be used instead of a pointing device to access functions of graphical user interface elements. Best known are the keys for navigating menus, for example:
Alt-R, R for View: Ruler.
Shortcuts are single alphanumeric keys or keys combined with one or more modifier keys (Shift, Ctrl, Alt, Super/Windows). Different from keyboard
mnemonics, they are not navigational, but are directly associated with commands.
Shortcuts should be mnemonic, like Ctrl-C for Copy. Obviously this is a problem with translations. Usually shortcuts are assigned with English in mind and left that way.
4.4.3 Key Sequences
Issuing commands not by pressing keys simultaneously, but by chaining keys / key-modifier combinations. Key Sequences are a characteristic feature of Emacs. They are rather hard to memorize but
offer a means to make a large number of commands available.
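How a mapping component might recognise key sequences can be sketched with a small matcher that remembers the chords pressed so far. The class KeySequenceMatcher and its method feed are hypothetical names for this example, and the two Emacs-style bindings are only sample data.

```python
class KeySequenceMatcher:
    """Illustrative matcher for chained key combinations."""

    def __init__(self, bindings):
        # bindings: dict mapping tuples of chords to action names
        self.bindings = dict(bindings)
        # Every proper prefix of a bound sequence means "keep waiting".
        self.prefixes = {
            seq[:i] for seq in self.bindings for i in range(1, len(seq))
        }
        self.pending = ()

    def feed(self, chord):
        """Feed one chord; return an action name, or None if the
        sequence is still incomplete or not bound at all."""
        candidate = self.pending + (chord,)
        if candidate in self.bindings:
            self.pending = ()
            return self.bindings[candidate]
        if candidate in self.prefixes:
            self.pending = candidate
            return None
        self.pending = ()  # unknown sequence: start over
        return None


matcher = KeySequenceMatcher({
    ("Ctrl-x", "Ctrl-s"): "Save",
    ("Ctrl-x", "Ctrl-f"): "Open",
})
```

Feeding "Ctrl-x" and then "Ctrl-s" into matcher yields the Save action on the second call. A real implementation would probably also need a timeout or a cancel key, which this sketch leaves out.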
In some cases keys or shortcuts are used to cycle through a number of modes or options. With only two states, this can be called toggling.
4.4.5 Pointing Device Specialities
Events related to pointing devices are interpreted in some specific ways. In these cases applications do not receive the events directly, but rather meta-events. This is the case for double and
triple clicks and drag-and-drop operations.
4.4.6 Mouse Gestures
Specific pointer movement and click combinations to issue commands. Usually performed by holding down a mouse button and drawing lines or simple shapes.
The Opera web browser is probably the best known mouse gesture supporting application (http://www.opera.com/products/desktop/mouse/).
There are also programs to add gesture support on a system level for several platforms.
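One simple way to recognise such gestures is to reduce the pointer path to a string of strokes (up, down, left, right) and look that string up like a shortcut. The following is a minimal sketch of that idea and not the method used by Opera or any particular gesture program. The function name stroke_directions and the threshold min_distance are assumptions.

```python
def stroke_directions(points, min_distance=20):
    """Reduce a pointer path to a string of U/D/L/R strokes.

    points: list of (x, y) screen coordinates, y growing downwards.
    min_distance: movement below this many pixels is treated as jitter.
    """
    if not points:
        return ""
    directions = []
    last_x, last_y = points[0]
    for x, y in points[1:]:
        dx, dy = x - last_x, y - last_y
        if max(abs(dx), abs(dy)) < min_distance:
            continue  # too small to count as a stroke yet
        if abs(dx) >= abs(dy):
            direction = "R" if dx > 0 else "L"
        else:
            direction = "D" if dy > 0 else "U"
        # Collapse consecutive segments that point the same way.
        if not directions or directions[-1] != direction:
            directions.append(direction)
        last_x, last_y = x, y
    return "".join(directions)
```

The resulting string, for example "DR" for a stroke down followed by a stroke to the right, could then be mapped to an action in the same table as keyboard shortcuts.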
Many applications have modes, where some or all of the same events can be mapped differently. These modes can be expressed and triggered in several ways:
- separate windows
- regions in a window
- on-screen buttons or events to switch between modes
Widgets have their own mappings. GTK+ sliders for example can be moved with the mouse wheel if the pointer is above them. Modifier keys can be used for additional functionality. This is mainly a
matter of the toolkit, often lacking configurability.
Some applications allow starting and stopping of loops synchronized to the beat. So a command can be delayed until the next measure, beat or fraction.
While having such functionality on a more general level would be interesting, it is no core issue here.
Simple commands without arguments (other than provided by the context), e.g. Save.
Setting a value like volume or amount of red in a colour.
4.5.3 Common Actions
There are many common actions with standardized names like Save, Save As, Copy, Paste, Print. They should always have the same shortcuts and are thus
candidates for being handled in an application-independent way.
- Work at least with alphanumeric keyboards and common pointing devices (mice, trackballs, touch pads, pointing sticks, tablets)
- Allow mapping of input events to actions for all applications and for single applications.
- Allow applications to register their actions.
- Support modes in applications (mapping on a per mode level)
- Support keyboard mnemonics and shortcuts.
- Provide information to applications so they can indicate mappings (e.g. list shortcuts in menus).
- Allow mapping also for widgets (ideally cross-toolkit)
- Allow mappings to generic actions (e.g. Save)
- Have a list of all mappings that can be filtered for applications and events used.
- Allow all mappings or subsets to be handled as packages (export, import, distribution, selection).
- Have both CLI and GUI based configuration.
- Allow configuration to be accessed from within applications.
- Allow shortcuts independent of input focus (example use: play/pause for an audio player independent of its window)
- Allow application shortcuts to take precedence over WM shortcuts (WM shortcut will only be triggered if no application with the same shortcut has focus)
- Support pointer gestures (mouse gestures)
- Support key sequences
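The requirement for a filterable list of all mappings, together with the wish to make conflicts obvious, can be illustrated with a flat table of (scope, event, action) entries. This is a sketch under assumed names: find_conflicts, filter_mappings and the sample entries are made up for the example.

```python
from collections import defaultdict


def filter_mappings(mappings, scope=None, event=None):
    """Return the entries matching the given scope and/or event."""
    return [
        m for m in mappings
        if (scope is None or m[0] == scope)
        and (event is None or m[1] == event)
    ]


def find_conflicts(mappings):
    """Return events that are bound to more than one (scope, action)."""
    by_event = defaultdict(set)
    for scope, event, action in mappings:
        by_event[event].add((scope, action))
    return {
        event: sorted(bound)
        for event, bound in by_event.items()
        if len(bound) > 1
    }


# Sample data only; scopes and actions are invented.
mappings = [
    ("wm", "Alt-F1", "Show Menu"),
    ("terminal", "Alt-F1", "Help"),
    ("terminal", "Ctrl-T", "New Tab"),
]
```

With the sample data, find_conflicts(mappings) reports Alt-F1 as bound by both the windowmanager and the terminal, which is the kind of conflict the central list is meant to expose.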
6 System Design
6.1.1 Alternative 1: Deep
Doing the most for consistency, even on the console, at the cost of far-reaching changes.
- Receive input events from the kernel before any other program.
- Allow selection of a pointing device to control the cursor and forward its events to the X-server.
- Interpret input events to trigger application actions directly (applications do not get to see the input events).
6.1.2 Alternative 2: Light
Try to limit the scope and depth of changes to the system and applications.
- Receive non-pointer input events from the X-server or grab for them if the X-server can be changed to not do so.
- Do mapping configuration but let applications check for their mnemonics and shortcuts themselves (like now).
6.2 Mapping Organisation
6.2.1 Common Actions
Common actions need standardised names (already given for many menu items, like Save, Cut, Copy …).
Events could be mapped to an action name alone, without naming an application, putting them into effect for all applications that have actions of the given name. Actions of a specific application
could be referred to with an Application:Action pair.
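Resolving a mapping target against the actions that applications have registered could look roughly like this. The function resolve_targets and the registry layout (a dictionary from application name to a set of action names) are assumptions for the sake of the example.

```python
def resolve_targets(target, registered):
    """Return the (application, action) pairs a mapping target refers to.

    target: either a bare action name ("Save") or "Application:Action".
    registered: dict of application name -> set of registered action names.
    """
    if ":" in target:
        app, action = target.split(":", 1)
        if action in registered.get(app, set()):
            return [(app, action)]
        return []
    # A bare name applies to every application offering that action.
    return sorted(
        (app, target)
        for app, actions in registered.items()
        if target in actions
    )


# Sample registry; the entries are invented.
registered = {
    "gimp": {"Save", "Zoom In"},
    "eog": {"Save"},
    "xterm": {"Paste"},
}
```

Here resolve_targets("Save", registered) covers both eog and gimp, while resolve_targets("gimp:Zoom In", registered) names exactly one action.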
A Mode has to be the smallest item defining the scope of mappings. So for mapping, applications can be groups of modes, but have no other significance.
An application may have both application wide and mode dependent actions. It must be possible to have several modes active at the same time and to put them into a sequence in which they are
checked for mappings.
So modes shall be sets of mappings that can be activated or deactivated depending on the state of the application they belong to (running, input focus, mode switching commands).
Currently WM mappings take priority over application mappings. This could be modeled with modes that always come first. To allow application mappings to take precedence, WM mappings should be
handled with normal modes. The same for focus independent mappings in general.
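The idea of modes as sets of mappings that are checked in sequence can be sketched as follows. The Mode class, the lookup function and the sample shortcuts are hypothetical and only show how the order of modes decides which mapping wins.

```python
from dataclasses import dataclass, field


@dataclass
class Mode:
    """A named set of event-to-action mappings (illustrative)."""
    name: str
    mappings: dict = field(default_factory=dict)  # event -> action
    active: bool = True


def lookup(event, modes):
    """Return (mode name, action) from the first active mode that maps
    the event, or None if no active mode does."""
    for mode in modes:
        if mode.active and event in mode.mappings:
            return mode.name, mode.mappings[event]
    return None


# Sample modes; names and shortcuts are invented.
wm = Mode("wm", {"Alt-Tab": "Switch Window", "Ctrl-W": "Close Window"})
editor = Mode("editor", {"Ctrl-W": "Close Tab"})
```

With the sequence [wm, editor] the windowmanager mapping for Ctrl-W wins, as is the case today. With [editor, wm] the application mapping takes precedence, and the windowmanager mapping is only reached while the editor mode is inactive.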
6.3 Migration and Cross Platform Strategy
There could be a library that allows applications to use the system level mapping infrastructure if present, or to have only per application mappings otherwise. So ideally an application would
stay fully functional on every platform, but would use mapping infrastructure automatically, if present.
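Such a library could hide the choice between the system-level infrastructure and per-application mappings behind one interface. The sketch below assumes a hypothetical factory function that raises OSError when no system service can be reached. Neither the class LocalBackend nor choose_backend exists anywhere yet.

```python
class LocalBackend:
    """Fallback that keeps mappings inside the application process."""

    def __init__(self):
        self.mappings = {}

    def register(self, action, default_event):
        # Without a system service, the default mapping is used as is.
        self.mappings[default_event] = action

    def action_for(self, event):
        return self.mappings.get(event)


def choose_backend(system_backend_factory=None):
    """Prefer the system-wide backend if it can be created,
    otherwise fall back to per-application mappings."""
    if system_backend_factory is not None:
        try:
            return system_backend_factory()
        except OSError:
            pass  # assumed failure mode: service not reachable
    return LocalBackend()
```

An application would call choose_backend once at start-up and use register and action_for in the same way regardless of which backend it got.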
6.4 Positive Side Effects
The mode system could serve as a model for the implementation of modes and mappings in applications.
Registering actions and allowing them to be triggered explicitly should be very interesting for scripting, voice control, remote control, and GUI/Engine separation.
Besides possible corrections and filling in details, the next step will be to design a CLI and a GUI tool for configuring mappings.
Thanks to JM Ibanez, Ross Burton and especially Daniel Stone for explaining input event handling to me on the xorg mailing list (archive).