Using two or more channels.


My aim is to build what could be described as a trainable universal interface device which can be used by people with fairly severe physical disabilities. The "universal" requirement is important; because one-off solutions are very expensive to produce, I want the device to be as widely applicable as possible. That's a difficult condition to satisfy, because pretty well everyone with severe disabilities is unique, with an individual combination of abilities which can be used as communication channels. That in turn is why the device has to be trainable - the universal device must be adapted to individual needs in each case.

The most common "high-tech" communication solution for someone in this group at present is to identify the potential output channel which is likely to work best - usually some movement of a limb, though tongue movements, eye movements, or other signals can be used - and provide some sort of switch which can be operated by exercising the channel. In very favourable circumstances, the person can control the channel well enough - where "well enough" takes in speed, endurance, consistency, cognitive abilities, and other qualities - to use Morse or a similar code directly, but very often that isn't possible. Other means may then be used, such as selection from a grid of letters or words by using switch operations to control the horizontal and vertical grid coordinates. This is slow; a highish speed is 30 selections per minute, and at the lower end "speeds" of about three selections per minute are not uncommon.
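
To make the grid scheme concrete, here is a minimal sketch in Python ( the grid, the names, and the press-counting scheme are all invented for the illustration; real scanning aids differ in detail ) of selecting one character with a single switch by stepping first through the rows and then through the columns :

    # A toy letter grid; real devices arrange symbols by frequency.
    GRID = [
        "ABCDEF",
        "GHIJKL",
        "MNOPQR",
        "STUVWX",
        "YZ.,?!",
    ]

    def select(row_presses, col_presses):
        """One character costs two bursts of presses : step to the
        wanted row, pause to confirm, then step to the wanted column
        and pause again."""
        row = row_presses % len(GRID)
        col = col_presses % len(GRID[row])
        return GRID[row][col]

    print(select(2, 4))   # six presses and two pauses buy one "Q"

Even at a press every second or two, rates of only a few selections per minute follow directly.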

That statement is really insufficiently emphatic : this communication rate is very slow indeed. That might seem obvious, but knowing the number doesn't tell the full story. It took me years to realise just how restrictive it must be to be forced to communicate at that sort of speed - and how do I know whether I've got it right even now ? But if that's what your best output channel can do, how can you do better ? Methods such as predictive word finders, special means of encoding, and so on have been developed, and are helpful, but they're all ultimately limited by the very narrow bandwidth of the primary communication channel between the person and the world.

It is my hope that you can do better by learning to use more than one channel. If the best channel is slow, the next must be slower, but the two taken together must be better - in principle - than the best channel alone. The clever trick is to make both channels available, and to develop encoding schemes which can be used with the two ( or three, or four ... ) channels to take advantage of the augmented bandwidth. It's these encoding schemes which are the root of the flexibility I hope to achieve, and where the trainability comes in. Because the physical condition, the number of channels, and the channels' absolute and relative speeds, reliabilities, and so on will be different for each potential user of the system, optimum performance will require a unique encoding - but if the machinery can be trained to recognise the unique code in each case, there is a good chance of making it work.
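
A toy calculation shows where the augmented bandwidth comes from ( the event set is invented for the illustration, not part of any real encoding scheme ). If each channel can produce one of three things in a given time slot - nothing, a short mark, or a long mark - then one channel offers two usable events per slot, but two channels together offer eight :

    EVENTS = [ "none", "short", "long" ]

    # Every pairing of events on the two channels is one compound
    # symbol, except the empty slot ( none, none ).
    compound = [ (a, b) for a in EVENTS for b in EVENTS
                 if (a, b) != ("none", "none") ]
    print(len(compound))   # 8 usable compound events, against 2 on one channel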

Getting the machinery to work is not the only, or even the most important, part of the job : the machinery is no good if people can't use it. The machinery must be so designed that people can learn to use it at the start, and so that they can improve with practice. When we learn to write, we start by forming characters separately, one by one, with great care and very slowly. Later, we learn to join the characters together - and later still we develop our own characteristic handwriting style. Each step is accompanied by an increase in speed, but the steps take time. An interface supporting such a process of development must be adaptable in the sense that it will learn to recognise how we are forming the various symbols in the unique "handwriting" style which we develop for ourselves. So much for improving with practice - but how do we design the machinery to be learnable from the start ? Ah - that's harder.

I've skipped the question of whether it's possible at all. Can people really learn to communicate using an arbitrary code based on more or less coordinated movements of various body parts ? I believe that there are good grounds for optimism. Our brains seem to be very good at learning and organising such activities. We do it all the time : typists use fingers, drivers use hands and feet ( and watch an expert using something like a mechanical digger some day ), organists use hands, fingers, and feet - and all for communicating messages which are not "naturally" the functions of the body organs concerned. And if that doesn't convince you, think about those who really do communicate as a matter of course, at ordinary conversational speeds, by gesture alone - those who use the various manual sign languages. It really works; we really can do it.

But back to the machinery; it's easier, I know more about it, and we can't get very far with testing the rest until we have some machinery which will do it. What will the signals be like ? In principle, I'd like to accommodate any sort of transducer which might be useful, which includes simple switches, multi-way switches, and analogue sensors of many sorts. In practice, I have to start somewhere, so I'm beginning with the most common current case of one or more on-off switches, each associated with one channel of communication. The sorts of signal which you can make will therefore be combinations of marks and spaces on one or more channels.
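
For concreteness, one plausible way of representing such signals inside the machinery - my sketch, and only one of many possible representations - is as a time-stamped stream of switch events :

    from dataclasses import dataclass

    @dataclass
    class SwitchEvent:
        channel: int    # which switch ( 0, 1, ... )
        closed: bool    # True = mark ( switch down ), False = space
        time: float     # seconds since the session started

    # A dot followed by a dash on one channel might then look like :
    signal = [
        SwitchEvent(0, True,  0.00),
        SwitchEvent(0, False, 0.12),   # 0.12 s mark : a dot
        SwitchEvent(0, True,  0.25),
        SwitchEvent(0, False, 0.61),   # 0.36 s mark : a dash
    ]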

Does that mean that I've missed out the whole analogue range ? Not really. Even though all our traditional means of communication rely on analogue media - sound frequencies for speech, coordinates on paper for writing, coordinates in space for gestures, and so on - we never use true analogue symbols for anything but the most basic of emotional communications. We always quantise the available bandwidth : phonemes in speech, characters in writing, words in manual sign. The reason is obvious, once you've thought about it, and exactly the same as the reason for the success of digital computers : without a certain tolerance, you could never get two people saying exactly the same thing. We each form our words, letters, and signs slightly differently, and there has to be a tolerance band into which we can all fit, reliably and rather quickly, if we are to communicate at conventional speeds. ( Which gets us back to the original problem. ) Restricting the range of interest to on-off channels is therefore a simplification, but it's not an oversimplification.

Coming back to the sorts of signal we have to handle, then, consider just one channel. If all you can do is switch an electric current on and off, how can you encode messages ? The only way is to make patterns of on and off periods in time. You can use periods of different lengths, and arrange them in any order you please, just so long as you can make symbols which you can consistently recognise and reproduce. In practice, unless we have some sort of external help with timing we seem to be able to distinguish between only about three periods - short, long, and very long. ( Yet another example of quantising a continuous variable ! ) Morse code is a prime example, for it is composed of combinations of marks and spaces which can be short, long, or very long.
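
A minimal sketch of that three-way quantisation ( the thresholds here are illustrative; the whole point of a trainable device is that they should be learnt for each user, not fixed like this ) :

    def classify(duration, unit=0.1):
        """Quantise a mark or space duration ( seconds ) as short,
        long, or very long. Morse convention makes a dash 3 units
        long and a word gap 7, so boundaries at 2 and 5 units
        split the three classes reasonably."""
        if duration < 2 * unit:
            return "short"
        elif duration < 5 * unit:
            return "long"
        else:
            return "very long"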

If we use two channels, much the same considerations apply, but we can now use patterns on either channel, and we can use the relative timing of events on the two channels. Three channels give us even more, and so on. Again there are complications from quantisation. We have to distinguish between "simultaneous", "before", and "after" - but none of these can be interpreted as precisely as we would interpret them in physics.
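
The cross-channel quantisation can be sketched in the same way ( the tolerance is again an invented constant, standing in for something learnt per user ) :

    def relative_order(t_a, t_b, tolerance=0.05):
        """Classify the times ( seconds ) of two events on different
        channels as simultaneous, before, or after - with a tolerance,
        because nobody operates two switches at exactly the same
        instant."""
        if abs(t_a - t_b) <= tolerance:
            return "simultaneous"
        return "before" if t_a < t_b else "after"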

So far, I've been working exclusively with Morse. It's a well known and well understood test case, and, as I pointed out, it has exactly the structure I've described. With single-key Morse, the timing of the intervals - both marks and spaces - is what makes the code, while with two-key transmissions the relative timing of the operations in the two channels is important. ( The only thing it doesn't have is analogue encoding, but there's plenty of work to be going on with, and I can wait a bit longer for the hard stuff ! ) I can't be accused of cheating by choosing an oversimplified example, because it's a real code, and real people - including quite a large number of people with various sorts of disability - really use it. There are therefore real examples from which I can get some idea of the characteristics of the signals produced, such as the variability both within one operator's keying and between different operators, which I need in order to ensure that my device will deal with realistic signals.
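
Once the marks and spaces have been quantised, decoding single-key Morse is ordinary table look-up; a sketch, taking the sort of tokens a classifier like the one above might emit :

    MORSE = { ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
              "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
              "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
              ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
              "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
              "--..": "Z" }

    def decode(tokens):
        """Decode a string of quantised tokens : '.' and '-' for
        marks, ' ' for a letter gap, '/' for a word gap."""
        words = []
        for word in tokens.split("/"):
            words.append("".join(MORSE.get(sym, "?") for sym in word.split()))
        return " ".join(words)

    print(decode("- . ... -"))   # TEST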

My approach so far has been centred on neural network techniques, because I think they will help with the flexible quantisation boundaries and with other floppy characteristics of human communication, such as the development of individual "handwriting". It is clear that neural networks alone won't solve the whole problem, but I still think that they - or something like them - have a useful role to play.
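
As an indication of what I mean - and it is only an indication, a deliberately tiny model invented for the illustration - here is a one-input, three-class softmax unit which learns a user's own boundaries between short, long, and very long from labelled examples of that user's keying :

    import math

    CLASSES = [ "short", "long", "very long" ]

    def softmax(zs):
        m = max(zs)
        exps = [ math.exp(z - m) for z in zs ]
        total = sum(exps)
        return [ e / total for e in exps ]

    def train(samples, epochs=3000, rate=0.5):
        """samples : list of ( duration in seconds, class index ).
        Plain stochastic gradient descent on the cross-entropy loss."""
        w = [ 0.0, 0.0, 0.0 ]   # one weight per class
        b = [ 0.0, 0.0, 0.0 ]   # one bias per class
        for _ in range(epochs):
            for x, y in samples:
                p = softmax([ w[k] * x + b[k] for k in range(3) ])
                for k in range(3):
                    grad = p[k] - (1.0 if k == y else 0.0)
                    w[k] -= rate * grad * x
                    b[k] -= rate * grad
        return w, b

    def predict(w, b, x):
        p = softmax([ w[k] * x + b[k] for k in range(3) ])
        return CLASSES[p.index(max(p))]

    # Durations as one particular user might actually produce them :
    data = [ (0.09, 0), (0.13, 0), (0.31, 1), (0.38, 1), (0.75, 2), (0.90, 2) ]
    w, b = train(data)
    print(predict(w, b, 0.12), predict(w, b, 0.35), predict(w, b, 0.80))

The point is not this particular model - anything which can learn floppy boundaries from examples would do - but that the boundaries come from the user's own keying rather than being fixed in advance.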

In the long term, I shall broaden out from Morse to include more general codes of the sorts I've described. Not everyone has the motor control to generate Morse reliably, or quickly enough to be reasonably recognisable. ( While short, long, and very long are relative, if the speed gets too slow everything just sounds very long. ) For the time being, though, and for the immediate future, Morse is just what I want.
