
Abstract

Sign language is a language that conveys meaning through visual sign patterns,
simultaneously combining hand shapes, orientation and movement of the hands,
arms or body, and facial expressions to express a signer's thoughts fluently or
to communicate with others. It is used mainly by people with hearing or speech
impairments.
An automatic sign language system needs fast and accurate methods for
identifying static signs or a sequence of produced signs in order to interpret
their meaning. Hand gestures are a major component of sign languages. In this
paper, a robust approach for the recognition of static, bare-handed sign
language is presented, using a novel combination of features. These include
Local Binary Pattern histogram features based on colour and depth information,
as well as geometric features of the hand. Linear binary Support Vector Machine
classifiers are used for recognition, coupled with template matching in the
case of multiple matches. The research aims at hand gesture recognition for
sign language interpretation as a Human Computer Interaction application.


keywords—Indian Sign Language,
Support Vector Machine, Linear Discriminant Analysis, and Local Binary Pattern.



Sign language is a language used by people with hearing or speech impairments.
It uses hand gestures to convey meaning, as opposed to acoustically conveyed
sound patterns. It is analogous to spoken languages, which is why linguists
consider it to be a natural language, but there are also some notable
differences from spoken languages. Though sign language is used across the
globe, it is not universal. Several hundred sign languages are in use; they
vary from place to place and are at the core of local deaf cultures. Some sign
languages have achieved legal recognition, while others have no status at all.
Regional sign languages such as American Sign Language, German Sign Language,
French Sign Language, British Sign Language, and Indian Sign Language have
evolved.

     Indian Sign Language is
one of the oldest known sign languages and is considered extremely important
historically, but it is rarely used nowadays. Despite the common misconception
that they are not real languages, sign languages are as rich and complex as any
spoken language. Studies of these languages by professional linguists have
found that many sign languages exhibit the basic properties of all spoken
languages. The elements of a sign are Hand shape, palm Orientation, Location,
Movement, and facial Expression, summarized in the acronym HOLME. The core
concept behind the proposed method is to exploit a novel combination of color,
depth, and geometric information of the hand sign to increase recognition
performance, whereas most approaches use a combination of at most two of these.
This makes it possible to recognise a wide range of signs even though many of
them look very similar.


Fig. 1.  Overview of the proposed hand pose recognition system.


Researchers face the very challenging problem of developing a vision-based
human computer interaction system for interpreting sign languages. This survey
presents the theoretical and literature foundation. Research on sign languages
and the challenges faced are reviewed. One of the problems is that the spoken
and written language of one country differs from that of other countries. The
syntax and semantics of a language vary from one region to another, even when
the same language is used in several countries. For instance, English is the
official language of many nations, including the UK and the USA, yet the usage
of English differs at the country level. Sign language likewise varies from one
country to another.

The focus of this survey is on the improvement of sign language interpretation
(SLI) at the global level. Earlier, data gloves and accelerometers were used to
obtain hand data for SLI. Orientation and velocity, in addition to location,
were measured using trackers and/or data gloves. These methods gave exact
positions, but they had the disadvantages of high cost and restricted movement,
which altered the signs. These disadvantages brought vision-based systems to
the fore and made them popular. The input of a vision-based system is a
sequence of images captured by a combination of cameras: monocular, stereo
and/or orthogonal. Feris and team used external light sources to illuminate the
scene and multi-view geometry to construct a depth image.

 Xiaolong Zhu and team proposed advances in hybrid classification architectures
that consider both hand gesture and face recognition. They built the hybrid
architecture from an ensemble of connectionist networks (radial basis
functions) and inductive decision trees, which combines the merits of holistic
template matching with abstractive matching using discrete properties, subject
to both positive and negative learning. C. Huang and team investigated
effective body gestures in video sequences beyond facial reactions, and
proposed fusing body gesture and facial expression at the feature level using
Canonical Correlation Analysis. Z. Ren and team proposed an integration of hand
gesture and face recognition. They argued that the face recognition rate could
be improved by recognizing hand gestures, and proposed a security lift
scenario. They made it clear that the combination of the two search engines
they proposed is generic and is not limited to face and hand gesture
recognition.


In a sign language, a sign consists of three main parts: manual features,
non-manual features and finger spelling. To interpret the meaning of a sign,
all of these parameters must be analysed simultaneously. Sign language
therefore poses the important challenge of being multichannel. Every channel in
the system is built and analysed separately, and the corresponding outputs are
combined at the final level to reach a conclusion.

Research in Sign Language Interpretation started with Hand Gesture Recognition
(HGR). Hand gestures are the form of non-verbal communication most commonly
used by hearing-impaired and speech-impaired persons. Sometimes hearing people
also use sign languages to communicate. Still, sign language is not universal;
sign languages exist wherever hearing-impaired people live. To make
communication between them and hearing people simple and effective, it is
essential that this process be automated. A number of methodologies have been
developed for automating HGR. The overall process of a Hand Gesture Recognition
system is shown as a block diagram in figure 2. There are three main steps in
HGR:

Hand acquisition, which deals with hand extraction from a given static image,
or with hand tracking and extraction from a video.

Feature extraction, which deals with a compressed representation of the data
that enables recognition of the hand gesture.

Classification/recognition of the hand gesture following some rules.

Fig. 2.  Block Diagram for Process of Hand Gesture Recognition






 Two different data sets are used in the ISL recognition system in this survey:
ISL digits (0-9) and single-handed ISL alphabets (A-Z). For data set
acquisition, a dark background is preferred for uniformity and for ease of
image manipulation during feature extraction and segmentation. A digital
camera, Cyber shot H70, is used for capturing images. All the images are
captured with flash in intelligent auto mode and stored in the usual JPEG file
format. Each original image is 4608×3456 pixels and requires roughly 5.5 MB of
storage space. To create an efficient data set of reasonable size, the images
are cropped to 200×300 RGB pixels, so that only about 25 KB of storage is
required per image. The data set is collected from 100 signers, of whom 69 are
male and 31 are female, with an average age of 27. The average height of a
signer is about 66 inches. The data set contains isolated ISL numerical signs
(0-9). Five images per ISL digit sign are captured from each signer. Therefore,
a total of 5,000 images are available in the data set. Sample images of the
data set are shown in figure 3.
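
The data set figures quoted above can be checked with a few lines of
arithmetic. The sketch below only restates numbers from the text; the
variable names and the use of decimal storage units (1 MB = 1000 KB) are
assumptions made for illustration.

```python
# Figures quoted in the text for the ISL digit data set.
signers = 100
digit_classes = 10          # ISL digits 0-9
images_per_sign = 5         # captured per signer, per digit

total_images = signers * digit_classes * images_per_sign

# Approximate storage, using the per-image sizes quoted in the text.
# Assumption: decimal units (1 MB = 1000 KB, 1 GB = 1000 MB).
original_mb_per_image = 5.5
cropped_kb_per_image = 25

original_total_gb = total_images * original_mb_per_image / 1000
cropped_total_mb = total_images * cropped_kb_per_image / 1000

print(total_images)        # 5000
print(original_total_gb)   # 27.5
print(cropped_total_mb)    # 125.0
```

Under these assumptions, cropping reduces the whole digit data set from
roughly 27.5 GB to roughly 125 MB.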


Fig. 3. The ISL Digit Signs Data Set

The single-handed alphabet data set contains a total of 2600 images cropped to
200×300 RGB pixels. The images are collected from four males and six females.
The backgrounds of the sign images are dark, as only hand orientations are
required for the feature extraction process. The images are stored in JPEG
format because it can be easily exported and manipulated in various software
and hardware environments. Each preprocessed ISL sign image requires nearly
25 KB of storage at 72 dpi. The skin tones in these images are neither very
dark nor very fair, because the application is proposed for the Indian
subcontinent only. The sign images mainly capture the colors corresponding to
human skin. The sample data set is shown in figure 4.


Fig. 4. The ISL Single
Handed Alphabet Signs Data Sets




 Segmentation is used to detect the hand against the background. The
experimentation in this work is carried out using two datasets of hand gestures
performed with one hand for the alphabets A to Z in Indian Sign Language.
Images from this dataset before and after the preprocessing stage are shown in
figure 5.




Figure 5. (a) Original images of alphabet ‘A’,
(b) Images after RGB to Gray conversion and resizing.
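
The preprocessing step named in the caption of figure 5 (RGB to gray conversion
followed by resizing) might look like the sketch below. It is not the authors'
implementation: the BT.601 luma weights, the nearest-neighbour resizing, the
reading of 200×300 as 200 columns by 300 rows, and the function names
rgb_to_gray and resize_nearest are all assumptions, and a random array stands
in for a captured photo.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an H x W x 3 uint8 RGB image to an H x W uint8 gray image."""
    weights = np.array([0.299, 0.587, 0.114])  # assumed BT.601 luma weights
    gray = rgb.astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

def resize_nearest(image, out_height, out_width):
    """Resize a 2-D image by nearest-neighbour sampling (assumed method)."""
    in_height, in_width = image.shape[:2]
    rows = np.arange(out_height) * in_height // out_height
    cols = np.arange(out_width) * in_width // out_width
    return image[rows[:, None], cols]

# Synthetic stand-in for a captured photo (the real images are 4608x3456).
rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(600, 400, 3), dtype=np.uint8)

gray = rgb_to_gray(photo)
small = resize_nearest(gray, 300, 200)   # assumed 300 rows x 200 columns
print(gray.shape, small.shape)           # (600, 400) (300, 200)
```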


 Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis (LDA) is used to perform class-specific dimension
reduction. It finds the combination of features that best separates different
classes. To find the class separation, LDA maximizes the ratio of between-class
scatter to within-class scatter instead of maximizing the overall scatter. As a
result, members of the same class group together and members of different
classes stay far away from each other in the lower-dimensional space. Let X be
a vector with samples from c classes:

\[ X = \{X_1, X_2, \ldots, X_c\}, \qquad X_i = \{x_1, x_2, \ldots, x_n\} \]

The between-class and within-class scatter matrices, S_B and S_W, are
calculated as follows, where N_i is the number of samples in class i:

\[ S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T \]

\[ S_W = \sum_{i=1}^{c} \sum_{x_j \in X_i} (x_j - \mu_i)(x_j - \mu_i)^T \]

Fig.6 Example of
LBP code generation


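Here \mu is the mean of all the data and \mu_i is the mean of class i,
where i = 1, ..., c:

\[ \mu = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad
   \mu_i = \frac{1}{|X_i|} \sum_{x_j \in X_i} x_j \]

LDA finds a projection W that maximizes the class separation criterion:

\[ W_{opt} = \arg\max_{W} \frac{|W^T S_B W|}{|W^T S_W W|} \]

The rank of S_W is at most (N - c), where c is the number of classes and N
is the number of samples. Most of the time the number of samples is less than
the dimension of the image data in pixels. Principal Component Analysis (PCA)
is therefore performed on the image data, which is projected onto an
(N - c)-dimensional space. LDA is then performed on this reduced data. With
W_{pca} denoting the PCA projection and W_{fld} the LDA projection computed on
the reduced data, the transformation matrix W projecting a sample into the
(c - 1)-dimensional space is

\[ W = W_{fld}^T W_{pca}^T \]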

 Local Binary Pattern (LBP)

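Ong and team proposed the Local Binary Patterns (LBP). The operator performs
local operations on the neighborhood of an image pixel, where the neighborhood
of a pixel is the set of pixels adjacent to it. In LBP, the 8-bit binary code
for a 3×3 pixel neighborhood of image I is

\[ LBP(x_c, y_c) = \sum_{p=0}^{7} 2^p \, s(i_p - i_c), \qquad
   s(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases} \]

where (x_c, y_c) is the centre pixel with intensity i_c and i_p is the
intensity of the p-th neighboring pixel.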

Fig.7 Example of gesture model generation using LDA features

Local Binary Patterns (LBP) have proved to be a very efficient means of image
representation and have been applied in various kinds of analysis. LBPs are
tolerant of monotonic illumination changes and are able to detect various
texture primitives such as corners, line ends, spots, and edges. The most
popular and efficient version of LBP, i.e. Block LBP (figure 8) with uniform /
non-uniform patterns, is used as the first methodology for the extraction of
hand features.

Figure 8. Local binary patterns histogram
generation for color image: output in order from (a) to (e).

(a) color image

(b) gray scale version of color image

(c) LBP image representing color
information (LBP colour)

(d) LBP colour divided into 16 regions

(e) concatenated histogram
output for LBP colour (the color feature vector).
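
The steps listed for figure 8 (LBP image, division into 16 regions,
concatenated histogram) can be sketched as follows. This is a minimal example
rather than the system's code: the clockwise bit ordering, the full 256-bin
histogram (instead of uniform patterns), the 4×4 grid used to obtain 16
regions, and the function names lbp_image and block_histogram are assumptions.

```python
import numpy as np

def lbp_image(gray):
    """Compute the 8-bit LBP code for every interior pixel of a 2-D image."""
    gray = np.asarray(gray, dtype=np.int32)
    rows, cols = gray.shape
    center = gray[1:-1, 1:-1]
    # Neighbour offsets, clockwise from top-left (bit order is an assumption).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = gray[1 + dr: rows - 1 + dr, 1 + dc: cols - 1 + dc]
        # Set this bit wherever the neighbour is >= the centre pixel.
        codes += (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def block_histogram(codes, grid=4):
    """Split the LBP image into grid x grid regions, concatenate histograms."""
    features = []
    for row_block in np.array_split(codes, grid, axis=0):
        for block in np.array_split(row_block, grid, axis=1):
            features.append(np.bincount(block.ravel(), minlength=256))
    return np.concatenate(features)

# Single 3x3 neighbourhood: the centre pixel (6) is compared with 8 neighbours.
patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp_image(patch))            # [[241]]

# Random stand-in image: 34x34 pixels give a 32x32 LBP image, 16 blocks of 8x8.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(34, 34))
features = block_histogram(lbp_image(image))
print(features.shape)              # (4096,)
```

Each of the 16 regions contributes one 256-bin histogram, so the concatenated
feature vector in this sketch has 16 × 256 = 4096 entries.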


Feature extraction approaches in image processing acquire the valuable
information present in an image. This involves converting a high-dimensional
data space into a lower-dimensional one. The lower-dimensional data extracted
from an image should contain accurate and precise information that represents
the actual image. The image can be reconstructed from the lower-dimensional
data space. The lower-dimensional data is required as input to any
classification methodology, as it is not practical to process
higher-dimensional data with accuracy and speed. The inputs to an automatic
sign language recognition system are either static signs (images) or dynamic
signs (video frames). In order to classify input signs in an automatic sign
language recognition system, valuable features must be acquired from the
signs. All the algorithms that are used for facial feature extraction are used
for hand feature extraction as well.

            Classification is an essential part of
machine learning. The technique is used to classify each item in a data set
into one of a predefined set of groups. Classification methods use mathematical
models including decision trees, linear programming, neural networks and
statistics for pattern classification. In classification, a software module is
created that learns to divide the data items into different groups. In initial
experimentation using multiclass SVM and decision trees, a large number of
misclassifications were observed. Hence these classifiers are not used further
in the final experimentation towards recognition.

During SVM classification, if more than one sign returns a positive match for a
test image pair, the template matching process is executed. At first, the test
image pair is checked against every sign that returned a positive match to see
whether its height-to-width ratio falls within the range of ratios of that
sign, defined by r_min and r_max. If the ratio of the test image pair does not
fall within the range of a sign, that sign is not considered a positive match
in the subsequent template matching steps. The cosine distance d_cosine is then
calculated between the feature vector of the test image pair and the average
feature vector of each sign that returned a positive match. An edge template
similarity metric s_edge is also calculated. Here a bitwise AND operation is
performed between the edge template of the test image pair, X_test, and the
edge template of each sign that returned a positive match, X_sign. The number
of white pixels in the resulting image is taken as s_edge. Even when the image
pairs are of different sizes, resizing the edge templates to a standard size
allows a direct bitwise AND operation to be performed.
The total similarity metric s_tot is then defined according to (4). The two
constants in (4) were set to 0.001 and 1.2, as these values produced the best
results. The sign for which the similarity metric s_tot is maximal is taken as
the final output sign.
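
The individual ingredients of the template matching stage (the height-to-width
ratio check, the cosine distance, and the edge template overlap) can be
sketched as below. This is an illustrative sketch only: the function names, the
definition of cosine distance as one minus the cosine similarity, and the toy
edge templates are assumptions, and the combination of these terms into the
total similarity metric is not shown because equation (4) is not reproduced in
this text.

```python
import numpy as np

def ratio_in_range(height, width, r_min, r_max):
    """Check the height-to-width ratio against a sign's [r_min, r_max] range."""
    return r_min <= height / width <= r_max

def cosine_distance(a, b):
    """Cosine distance, taken here as 1 - cosine similarity (an assumption)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def edge_similarity(edge_test, edge_sign):
    """Count white pixels that survive a bitwise AND of two binary edge maps."""
    return int(np.count_nonzero(np.logical_and(edge_test, edge_sign)))

# Toy binary edge templates, already resized to the same standard size.
edge_test = np.array([[1, 1, 0],
                      [0, 1, 0],
                      [0, 1, 1]])
edge_sign = np.array([[1, 0, 0],
                      [0, 1, 0],
                      [0, 1, 0]])

print(ratio_in_range(300, 200, 1.2, 1.8))        # True
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0
print(edge_similarity(edge_test, edge_sign))     # 3
```

A sign whose ratio check fails would be dropped before the distance and edge
terms are computed for the remaining candidates.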


Figure 9. Output of ISL Digits Produced by the System

Although the ISL single-handed alphabet contains 26 classes, the system is able
to predict single-handed characters with more than 95% accuracy. This is
possible with LBP feature extraction and SVM classification. A sample output
for the single-handed ISL sign ‘B’ is shown in fig. 9. The input sign image is
processed by the system and the prediction is shown on the right-hand side of
the output screen. The sign is interpreted as single-handed ‘B’, which is
correct.
 Performance of
Sign Language Interpretation System

For sign language interpretation, the N-fold cross-validation method was used
with N = 5. For a single hand (left or right), each fold consists of 200
images. The system is trained using 800 images from four of the five folds and
tested against the remaining fold of 200 images. For both hands, each fold has
400 images. The system is trained using 1600 images from four of the five folds
and tested against the remaining fold of 400 images. The system was tested
under three criteria: sign gestures performed by the left hand, the right hand,
and both hands. The accuracy for all these criteria is measured using the
following relation, where N_C is the number of correctly classified sign
gestures and N is the number of all test sign gestures:

\[ \text{Accuracy} = \frac{N_C}{N} \times 100\% \]
Table 1. comparison with
other work using dataset














An overall accuracy of 92.14% was obtained with a relatively small training
dataset. The system coped well with the variation of individual signs caused by
different users as well as with the similarity that exists among different
signs.
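
The 5-fold protocol described in this section can be sketched as follows. The
helper names k_fold_indices and accuracy_percent are made up for illustration,
the folds are taken as contiguous index ranges (in practice the images would
normally be shuffled or stratified by sign first), the totals of 1000 and 2000
images follow from five folds of 200 and 400 images, and the count of 184
correct predictions is a made-up example, not a reported result.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for contiguous, equal-sized folds."""
    fold_size = n_samples // n_folds
    indices = list(range(n_samples))
    for fold in range(n_folds):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test

def accuracy_percent(n_correct, n_total):
    """Accuracy as the share of correctly classified test signs, in percent."""
    return 100.0 * n_correct / n_total

# Single hand: 5 folds of 200 images; both hands: 5 folds of 400 images.
splits_single = list(k_fold_indices(1000, 5))
splits_both = list(k_fold_indices(2000, 5))

print(len(splits_single), len(splits_single[0][0]), len(splits_single[0][1]))
# 5 800 200
print(len(splits_both[0][0]), len(splits_both[0][1]))
# 1600 400

# Hypothetical fold result: 184 of 200 test signs classified correctly.
print(accuracy_percent(184, 200))   # 92.0
```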




 A vision-based automatic sign language recognition system
that can recognize sentences in Indian Sign Language was presented in this
work. Several features and different methods of combining them were
investigated and evaluated experimentally. Tracking algorithms with
applications to hand and head tracking were presented, and experiments were
carried out to determine the parameters of these algorithms. Emphasis was put
on appearance-based features that use the images themselves to represent signs.
Other systems for automatic sign language recognition usually require a
segmentation of the input images in order to calculate features for the
segmented image parts. The algorithm was designed to function in real time
without requiring excessive computational power. The results reveal that it is
possible to train the system to recognize more static Indian Sign Language hand
signs while maintaining high accuracy. It is also feasible to build on the
framework to recognize dynamic sign language. Future depth sensor technology
with higher depth and colour resolution and more accurate skeletal tracking has
the potential to improve the results of the proposed algorithm to a greater
extent. The results conveyed in this work show
that the use of appearance-based features yields promising recognition results.