Text-to-speech applications on mobile devices currently require the user to type manually the words to be spoken. This study designs a word input system for text-to-speech applications based on digital image processing. The system lets users simply capture an image of the words to be voiced, without typing them into a text input area. The method comprises image acquisition, image pre-processing, character segmentation, character recognition, and integration with a text-to-speech engine on the mobile device. Image acquisition uses the camera of the mobile device to capture the word to be entered, and character recognition uses the backpropagation algorithm. The image processing system was built successfully and then integrated with the Google Text-to-Speech engine. The character recognition system in this study uses an artificial neural network (ANN) model with an accuracy of 97.58%. The system is able to recognize several fonts, namely Arial, Calibri, and Verdana. The mean recognition accuracy on the test samples used in this study was 94.7%, with a shooting distance in the range of 3-8 cm and the camera held upright, facing the letters.
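
As an illustration of the pre-processing and character segmentation stages named above, the following is a minimal sketch, not the method used in this study. It assumes a fixed-threshold binarization and a vertical projection profile for splitting a single text line into characters; the abstract does not state which techniques were used. The function names `binarize` and `segment_columns`, the threshold value of 128, and the toy image are all hypothetical.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.

    Assumption: dark text on a light background and a fixed threshold.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(binary):
    """Split a binary text-line image into characters by vertical projection.

    Returns a list of (start, end) column ranges, with end exclusive.
    A character is taken to be a maximal run of columns containing ink.
    """
    if not binary:
        return []
    width = len(binary[0])
    # Number of ink pixels in each column.
    profile = [sum(row[x] for row in binary) for x in range(width)]

    spans = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                  # a character begins here
        elif count == 0 and start is not None:
            spans.append((start, x))   # an empty column ends the character
            start = None
    if start is not None:              # character touching the right edge
        spans.append((start, width))
    return spans


# Toy 3x7 "image": two dark glyphs separated by blank columns.
W, B = 255, 0
gray = [
    [W, B, B, W, W, B, W],
    [W, B, W, W, W, B, W],
    [W, B, B, W, W, B, W],
]
spans = segment_columns(binarize(gray))
print(spans)  # [(1, 3), (5, 6)]
```

In a pipeline of the kind described, each column range would be cropped, normalized to a fixed size, and passed to the trained network for recognition before the resulting string is handed to the text-to-speech engine.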