This Recommendation specifies the service description and the requirements for speech-to-speech translation (S2ST) accomplished by connecting distributed S2ST modules all over the world through a network. This service provides S2ST that recognizes the speech in one language, translates the recognized speech into another language, and then synthesizes the translation into speech. People who speak different languages can communicate using this service.

The applications and services using network-based S2ST technologies are characterized by the following components:

– S2ST client:
   • user client for speech/text input and output.
– S2ST servers:
   • speech recognition: speech is recognized and transcribed;
   • machine translation: text in source language is translated into text in target language;
   • speech synthesis: speech signal is created from text.
– Communication protocol:
   • communication protocol to connect user clients and the above S2ST servers.

In order to extend the network-based S2ST to other modalities (e.g., sign language), a communication protocol is incorporated for modality conversion (MC), which converts single/multiple modality information to different single/multiple modality information. The communication protocol for MC needs to have an expandable structure.

– Modality conversion markup language (MCML):
   • XML schema that serves as a data description for data exchanged among modality conversion modules.
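
As an informal illustration of the component chain described above (speech recognition, then machine translation, then speech synthesis), the following minimal Python sketch models each S2ST server as a function that converts one message into another. It is not part of this Recommendation: the `Message` class, the function names, the `audio:` string prefix standing in for a speech signal, and the toy dictionary are all assumptions made for the example, and the message fields are only a hypothetical stand-in for an MCML document, not the MCML schema itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Message:
    """Data passed between modules (hypothetical stand-in for an MCML document)."""
    modality: str  # e.g. "speech" or "text"
    language: str  # e.g. "ja" or "en"
    payload: str   # text, or a tagged string standing in for audio data


# A module converts one message into another. In a real deployment each
# module would be a remote server reached through the communication protocol.
Module = Callable[[Message], Message]

AUDIO_PREFIX = "audio:"  # assumption: marks a string that stands in for speech

# Toy bilingual table so the sketch runs without a real translation engine.
TOY_DICTIONARY: Dict[Tuple[str, str], Dict[str, str]] = {
    ("ja", "en"): {"konnichiwa": "hello"},
}


def recognize(msg: Message) -> Message:
    """Stub speech recognition: speech becomes text in the same language."""
    if msg.modality != "speech" or not msg.payload.startswith(AUDIO_PREFIX):
        raise ValueError("speech recognition expects a speech message")
    return Message("text", msg.language, msg.payload[len(AUDIO_PREFIX):])


def make_translator(target_language: str) -> Module:
    """Build a stub machine translation module for one target language."""
    def translate(msg: Message) -> Message:
        if msg.modality != "text":
            raise ValueError("machine translation expects a text message")
        table = TOY_DICTIONARY.get((msg.language, target_language), {})
        # Unknown words are passed through unchanged in this toy example.
        return Message("text", target_language, table.get(msg.payload, msg.payload))
    return translate


def synthesize(msg: Message) -> Message:
    """Stub speech synthesis: text becomes speech in the same language."""
    if msg.modality != "text":
        raise ValueError("speech synthesis expects a text message")
    return Message("speech", msg.language, AUDIO_PREFIX + msg.payload)


def run_pipeline(msg: Message, modules: List[Module]) -> Message:
    """Apply each module in order, feeding each output to the next module."""
    for module in modules:
        msg = module(msg)
    return msg


def speech_to_speech(utterance: Message, target_language: str) -> Message:
    """Chain recognition, translation, and synthesis for one utterance."""
    modules = [recognize, make_translator(target_language), synthesize]
    return run_pipeline(utterance, modules)


if __name__ == "__main__":
    spoken = Message("speech", "ja", AUDIO_PREFIX + "konnichiwa")
    print(speech_to_speech(spoken, "en"))
```

Because every module shares the same message-in, message-out shape, a further modality (for example, a sign language module) could in principle be added by appending another function to the list, which is one way to read the requirement that the MC protocol have an expandable structure.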