XML API Documentation
Overview
- Quick Start
- API Overview
- Request Reference
- Response Reference
- Examples
- Posting a Request (PHP & C#)
Quick Start
For now, it's only possible to create classifiers programmatically via our API. The process can be described as:
- Sign up to get your API keys.
- Use your favourite programming language to create classifiers by sending xml POST requests to
http://api.uclassify.com
- Send xml POST requests to classify texts from your web site or application.
Our step-by-step tutorial will guide you in more detail.
API Overview
Our hope is to provide an easy-to-use API based on XML requests. Let us know if this is not the case :) When a request is
made to the server, a response is returned. The response tells you whether the request was successful and returns
whatever is relevant for your calls.
API url: http://api.uclassify.com (port 80)
Flexibility
The API is designed to handle multiple calls in each request, which means that you
can batch multiple texts to one or many classifiers in the same request. This is extremely powerful:
if you want to classify 300 blog posts, you can send each one in a textBase64 element and let each classify
call map one text to a classifier. The mapping is done by referencing a text's id from the classify call (its textId attribute).
Batching is also useful when training, since you can send a bulk of training texts to a classifier. Note that
requests that modify classifiers can only operate on one classifier per request.
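The batching described above can be sketched in Python as follows. This is a minimal illustration, not an official client: the function name, the generated ids (Text0, Classify0, ...) and the placeholder key are assumptions, and placing texts before readCalls simply follows the order used in the Request Reference.

```python
import base64
import xml.etree.ElementTree as ET

REQUEST_NS = "http://api.uclassify.com/1/RequestSchema"

def build_batch_classify(read_api_key, classifier_name, texts):
    """Return request XML that classifies every text in `texts` in one request."""
    # The xmlns attribute on the root puts all child elements in the namespace.
    root = ET.Element("uclassify", {"xmlns": REQUEST_NS, "version": "1.00"})
    texts_el = ET.SubElement(root, "texts")
    calls_el = ET.SubElement(root, "readCalls", {"readApiKey": read_api_key})
    for i, text in enumerate(texts):
        text_id = f"Text{i}"  # hypothetical id scheme, letters and digits only
        encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
        ET.SubElement(texts_el, "textBase64", {"id": text_id}).text = encoded
        # Each classify call maps one text to the classifier via textId.
        ET.SubElement(calls_el, "classify", {
            "id": f"Classify{i}",
            "classifierName": classifier_name,
            "textId": text_id,
        })
    return ET.tostring(root, encoding="unicode")

xml_request = build_batch_classify(
    "YourReadApiKey", "MySpamClassifier", ["first post", "second post"]
)
```

Each text gets its own classify call here; pointing several classify calls at the same textId would send one text to several classifiers.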
Transactional behaviour
If one of the write calls goes wrong, all previous successful calls in the request are rolled back,
meaning that the classifier is guaranteed to be in the same state as prior to processing the request.
This is to prevent leaving classifiers in an undefined state.
Restrictions
- Each xml request can at a maximum be 1MB.
- Every request must conform to our xml schema.
- All attributes of the type 'RestrictedString' are restricted to the letters a-z and A-Z, the digits 0-9 and space.
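The restrictions above can be checked on the client before a request is sent. The sketch below is an assumption-laden illustration: it reads "1MB" as 1,048,576 bytes, measures the request as UTF-8, and treats an empty string as invalid, none of which the text above states explicitly. It does not validate against the schema.

```python
import re

MAX_REQUEST_BYTES = 1024 * 1024  # assumption: "1MB" means 1,048,576 bytes
RESTRICTED_STRING = re.compile(r"[A-Za-z0-9 ]+")  # letters, digits and space

def is_restricted_string(value):
    """True if value only contains characters allowed in a RestrictedString."""
    return RESTRICTED_STRING.fullmatch(value) is not None

def fits_size_limit(xml_request):
    """True if the request, encoded as UTF-8, is within the size limit."""
    return len(xml_request.encode("utf-8")) <= MAX_REQUEST_BYTES
```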
Requests
Each request can contain an unlimited number of calls (as long as it follows the restrictions above).
Requests can consist of either read calls or write calls, but not both in the same request. At this
point the only read call available is the classify call. Write calls include create, remove, addClass,
removeClass, train and untrain. Each call has an id attribute; set this attribute to a unique string
that helps you identify it in the response, e.g. <create id="MyCreateCall">. Calls within the
readCalls and writeCalls elements are executed sequentially.
Post xml encoded requests to the API url: http://api.uclassify.com (port 80)
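Posting a request could look like this in Python, using only the standard library. The Content-Type header and the timeout are assumptions, since the text above only specifies the URL and the POST method; the function names are hypothetical.

```python
import urllib.request

API_URL = "http://api.uclassify.com"

def make_post(xml_request):
    """Build (but do not send) a POST request carrying the XML body."""
    return urllib.request.Request(
        API_URL,
        data=xml_request.encode("utf-8"),
        # Assumption: the server accepts this content type.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )

def send(xml_request, timeout=30):
    """Send the request and return the response body as a string."""
    with urllib.request.urlopen(make_post(xml_request), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the body and headers without touching the network.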
Responses
All responses return a status element. It has a boolean attribute called success; if this is true
the request went through without any trouble. If it is false something went wrong, and the inner text of the status
element contains an error message (e.g. <status success="false" statusCode="3000">The classifier 'MySpamClassifier' doesn't exist.</status>).
A real error message will also indicate which call failed
using its unique id attribute. For now the only call that returns something more than the status is the classify call.
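Checking the status element could be sketched as below. The sample response wraps the status element from the text above in a root element named uclassify, which is an assumption about the response layout; tags are matched by local name so the code does not depend on a particular response namespace.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]

def read_status(response_xml):
    """Return (success, status_code, message) from a response document."""
    for el in ET.fromstring(response_xml).iter():
        if local_name(el.tag) == "status":
            success = el.get("success") in ("true", "1")
            return success, int(el.get("statusCode")), (el.text or "").strip()
    raise ValueError("no status element in response")

# Hypothetical response built around the status example given above.
SAMPLE_ERROR = (
    '<uclassify><status success="false" statusCode="3000">'
    "The classifier 'MySpamClassifier' doesn't exist.</status></uclassify>"
)
ok, code, message = read_status(SAMPLE_ERROR)
```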
Training
Before a classifier can classify any texts it needs training. In the case of a spam classifier with two classes (spam and legitimate)
training would include feeding the spam class with known spam and the legitimate with known legitimate emails. This is called machine learning.
After the training period is over you can start classifying. More training gives better classification results. How much training is required
depends largely on the domain: in some cases 10-20 documents per class is enough, while others require hundreds.
Encoding
It's important that the encoding (ASCII, UTF-8, Unicode, ANSI etc) of the training documents is the same as that of the documents being classified
(the classifier doesn't differentiate between encodings internally). It's possible to mix encodings (this requires more training data); however,
we recommend that you train and classify using the same encoding. Which encoding you use is up to you.
Xml Schemas and Namespaces
The API strictly follows an XML schema; any violation results in an error response indicating what you need to fix in the request.
A schema validates the XML, i.e. checks that it follows a set of rules. Putting the XML into a namespace ensures that there are no datatype
conflicts, e.g. we can have many 'string' types as long as they are in different namespaces. The namespace is set in the xmlns attribute of the root element (uclassify).
We know (by experience) that namespaces can be confusing; please let us know if you think this is the case so we can improve the API.
Request Reference
Here is a reference of all elements
<uclassify>
This is the root element that holds information of what schema and API version to use.
Attributes
- xmlns - the namespace, http://api.uclassify.com/1/RequestSchema
- version - the version, currently 1.00 is the latest.
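As a small illustration of how the xmlns attribute works, the Python sketch below builds the root element and parses it back. Writing xmlns as a plain attribute is one simple way to do this with ElementTree; after parsing, every element name is reported with the namespace in front of it, including children that never mention it themselves.

```python
import xml.etree.ElementTree as ET

REQUEST_NS = "http://api.uclassify.com/1/RequestSchema"

# Root element with the namespace and API version from the attribute list above.
root = ET.Element("uclassify", {"xmlns": REQUEST_NS, "version": "1.00"})
ET.SubElement(root, "texts")  # child inherits the default namespace
xml_request = ET.tostring(root, encoding="unicode")

# After parsing, tags are namespace-qualified as '{namespace}name'.
reparsed = ET.fromstring(xml_request)
qualified_names = [el.tag for el in reparsed.iter()]
```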
<texts>
The texts element specifies a list of <textBase64> that can be indexed from the calls (classify, train and untrain).
<textBase64>
This element contains a text string encoded in base64. The reason for using base64 encoded texts is to not break the XML
(e.g. with texts that contain tags). Use 'base64_encode' to base64 encode strings in PHP and
'System.Convert.ToBase64String' in C#.
Attributes
- id - a unique identifier for this text. (RestrictedString)
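In Python the equivalent is the standard base64 module. The sketch below fixes a single text encoding in one constant so that training and classification texts are encoded the same way; choosing UTF-8 here is an assumption, since the API leaves the encoding up to you.

```python
import base64

TEXT_ENCODING = "utf-8"  # assumption: any encoding works if used consistently

def to_text_base64(text):
    """Encode a text for use as the content of a textBase64 element."""
    return base64.b64encode(text.encode(TEXT_ENCODING)).decode("ascii")

def from_text_base64(payload):
    """Decode the content of a textBase64 element back into a string."""
    return base64.b64decode(payload).decode(TEXT_ENCODING)

encoded = to_text_base64("<b>Buy now!</b>")  # markup no longer breaks the XML
```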
<readCalls>
Can hold one or more <classify> elements. Each classify element can reference different classifiers
(that the readApiKey can access) and texts. Each call (child) is executed in a sequential order.
Attributes
- readApiKey - a read API key that has access to the classifiers specified in classify calls. (RestrictedString)
<classify>
Classify sends a text to a classifier and returns a classification. The default behaviour is to access classifiers in the same
account as the readApiKey given in the readCalls parent element. You can access published classifiers from other users (accounts)
by setting the optional username attribute to that classifier's author.
Attributes
- id - a unique name for this call. (RestrictedString)
- classifierName - the name of the classifier. (RestrictedString)
- textId - the id of the text. (RestrictedString)
- username (optional) - use this to classify with a published classifier in another account. Set this to the username of the classifier's creator. (RestrictedString)
Example with classify on own classifier
Example with classify on another user's published classifier
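Both variants can be sketched in Python by building the classify elements with and without the optional username attribute. The helper name, ids, classifier names, user name and key are all placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

def classify_call(call_id, classifier_name, text_id, username=None):
    """Build a classify element; username is only set for other users' classifiers."""
    attrs = {"id": call_id, "classifierName": classifier_name, "textId": text_id}
    if username is not None:
        attrs["username"] = username
    return ET.Element("classify", attrs)

read_calls = ET.Element("readCalls", {"readApiKey": "YourReadApiKey"})
# Classifier in the same account as the read API key.
read_calls.append(classify_call("ClassifyOwn", "MySpamClassifier", "Text1"))
# Published classifier owned by another (hypothetical) user.
read_calls.append(
    classify_call("ClassifyOther", "SomeClassifier", "Text1", username="SomeUser")
)
fragment = ET.tostring(read_calls, encoding="unicode")
```

The fragment still has to be placed inside a uclassify root element together with the referenced textBase64 element before it is a complete request.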
<getInformation>
Gets information about a classifier. Right now it only returns the names of all classes.
Attributes
- id - a unique name for this call. (RestrictedString)
- classifierName - the name of the classifier. (RestrictedString)
Example
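A getInformation request might be assembled like this in Python. Nesting the call inside readCalls with a read API key is an assumption based on where the element appears in this reference; the function name and default call id are hypothetical.

```python
import xml.etree.ElementTree as ET

REQUEST_NS = "http://api.uclassify.com/1/RequestSchema"

def build_get_information(read_api_key, classifier_name, call_id="GetInfo1"):
    """Return request XML asking for information about one classifier."""
    root = ET.Element("uclassify", {"xmlns": REQUEST_NS, "version": "1.00"})
    # Assumption: getInformation is a child of readCalls, like classify.
    calls = ET.SubElement(root, "readCalls", {"readApiKey": read_api_key})
    ET.SubElement(calls, "getInformation", {
        "id": call_id,
        "classifierName": classifier_name,
    })
    return ET.tostring(root, encoding="unicode")

xml_request = build_get_information("YourReadApiKey", "MySpamClassifier")
```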
<writeCalls>
Write calls allow you to create and remove classifiers, add and remove classes, and train and untrain. The element can hold
one or more write calls. Each call (child) is executed in sequential order.
Attributes
- writeApiKey - a write API key that has access to the specified classifier. (RestrictedString)
- classifierName - the name of the classifier that you wish to modify or create. (RestrictedString)
<create>
Creates a classifier for the account associated with the write API key. The name of the new classifier is the
classifierName attribute set in the <writeCalls> element.
Attributes
- id - a unique name for this call. (RestrictedString)
Example
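A create request could be built like this in Python. Note that the classifier name goes on the writeCalls element, not on the create call itself, as described above. The function name, default call id and placeholder key are assumptions.

```python
import xml.etree.ElementTree as ET

REQUEST_NS = "http://api.uclassify.com/1/RequestSchema"

def build_create(write_api_key, classifier_name, call_id="Create1"):
    """Return request XML that creates a classifier named classifier_name."""
    root = ET.Element("uclassify", {"xmlns": REQUEST_NS, "version": "1.00"})
    calls = ET.SubElement(root, "writeCalls", {
        "writeApiKey": write_api_key,
        "classifierName": classifier_name,  # name of the classifier to create
    })
    ET.SubElement(calls, "create", {"id": call_id})
    return ET.tostring(root, encoding="unicode")

xml_request = build_create("YourWriteApiKey", "MySpamClassifier")
```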
<remove>
Removes a classifier for the account associated with the write API key.
Attributes
- id - a unique name for this call. (RestrictedString)
Example
<addClass>
Adds a class to the classifier. A classifier can have an unlimited number of classes; however, having just one class makes no sense.
Attributes
- id - a unique name for this call. (RestrictedString)
- className - the name of the class to add. (RestrictedString)
Example
<removeClass>
Removes a class from the classifier.
Attributes
- id - a unique name for this call. (RestrictedString)
- className - the name of the class to remove. (RestrictedString)
Example
<train>
Trains the classifier on a text for a specified class.
Attributes
- id - a unique name for this call. (RestrictedString)
- className - the name of the class to train. (RestrictedString)
- textId - the id of the text to use (textId attribute of the textBase64 element). (RestrictedString)
Example
<untrain>
Untrains the classifier on a text for a specified class. The common usage for this call is to correct mistakes: if you
train a classifier on a text and then untrain it on the same text, the classifier returns to its previous state. For
example, if a spam message was incorrectly trained as legitimate, use this call to fix the mistake by untraining the spam
message from the legitimate class and then training it (train call) on the spam class.
Attributes
- id - a unique name for this call. (RestrictedString)
- className - the name of the class to untrain. (RestrictedString)
- textId - the id of the text to use (textId attribute of the textBase64 element). (RestrictedString)
Example
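The correction described above, untraining a text from the wrong class and training it on the right one, could be sent as a single request. The Python sketch below relies on calls being executed sequentially, so untrain is listed first. The function name, ids and placeholder key are assumptions.

```python
import base64
import xml.etree.ElementTree as ET

REQUEST_NS = "http://api.uclassify.com/1/RequestSchema"

def build_relabel(write_api_key, classifier_name, text, wrong_class, right_class):
    """Return request XML that moves one training text between two classes."""
    root = ET.Element("uclassify", {"xmlns": REQUEST_NS, "version": "1.00"})
    texts = ET.SubElement(root, "texts")
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    ET.SubElement(texts, "textBase64", {"id": "Text1"}).text = encoded
    calls = ET.SubElement(root, "writeCalls", {
        "writeApiKey": write_api_key,
        "classifierName": classifier_name,
    })
    # Undo the incorrect training first, then train on the correct class.
    ET.SubElement(calls, "untrain",
                  {"id": "Untrain1", "className": wrong_class, "textId": "Text1"})
    ET.SubElement(calls, "train",
                  {"id": "Train1", "className": right_class, "textId": "Text1"})
    return ET.tostring(root, encoding="unicode")

xml_request = build_relabel(
    "YourWriteApiKey", "MySpamClassifier", "Cheap pills", "legitimate", "spam"
)
```

Because write requests are transactional, a failure in the train call would also roll back the untrain call, leaving the classifier unchanged.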
Response Reference
Responses are in xml, UTF-8 encoded. They tell you if your request was successful and in that case you will find return
values such as classifications. If anything went wrong prior to processing your calls the response will give you information why.
If an error occurred during processing of calls an error will point out exactly which call failed (using the unique id) and
provide an error message, such as "The classifier 'MySpamClassifier' is not found".
<status>
Examine this element to find out if the request was successful. If one of the calls failed no other response is given. If the attribute "success" is false, the error message is given as the element's text. The attribute statusCode gives you a
machine readable status code indicating what the error is.
Attributes
- success - true if the request was successful, false if not. (boolean)
- statusCode - a status code, 2000=OK, 4000=BAD_REQUEST, 4013=REQUEST_ENTITY_TOO_LARGE, 5000=INTERNAL_SERVER_ERROR, 5030=SERVICE_UNAVAILABLE (positiveInteger)
Example of successful call (a classify call would contain more in a successful response)
Example of unsuccessful call, the error shows the call that failed (the one with id 'ClassifyCall1')
<readCalls>
Holds a list of classify calls, each identifiable by its id attribute.
<classify>
Corresponds to one of the classify calls in the request. Contains a <classification> element that is a list of classes.
Attributes
- id - The unique id of the call - matches one of the calls in the request. (RestrictedString)
<classification>
Holds a list of <class> elements, each with a probability [0-1] of the classified document belonging to it. To find the most
probable class, use the one with highest probability.
<class>
This represents a class and the probability that the document belongs to it.
Attributes
- className - the name of the class. (RestrictedString)
- p - the probability that the text belongs to this class. (double)
Example of two successful classify calls to a spam classifier
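Reading such a response could be sketched in Python as follows. The sample document is hypothetical: its element nesting follows the Response Reference above, but the root element name, the ids and the probabilities are made up for illustration. Tags are matched by local name so the code works whether or not the response uses a namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical response; structure assumed from the reference above.
SAMPLE_RESPONSE = """<uclassify>
  <status success="true" statusCode="2000"/>
  <readCalls>
    <classify id="Classify1">
      <classification>
        <class className="legitimate" p="0.2"/>
        <class className="spam" p="0.8"/>
      </classification>
    </classify>
    <classify id="Classify2">
      <classification>
        <class className="legitimate" p="0.9"/>
        <class className="spam" p="0.1"/>
      </classification>
    </classify>
  </readCalls>
</uclassify>"""

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]

def most_probable_classes(response_xml):
    """Map each classify call id to the class with the highest probability."""
    results = {}
    for call in ET.fromstring(response_xml).iter():
        if local_name(call.tag) != "classify":
            continue
        scores = {c.get("className"): float(c.get("p"))
                  for c in call.iter() if local_name(c.tag) == "class"}
        if scores:
            results[call.get("id")] = max(scores, key=scores.get)
    return results

best = most_probable_classes(SAMPLE_RESPONSE)
```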
<getInformation>
Corresponds to one of the getInformation calls in the request. Contains a <classes> element that is a list of classes with their information.
Attributes
- id - The unique id of the call - matches one of the calls in the request. (RestrictedString)
<classes>
Holds a list of <classInformation> elements, each with a name (more information will be added in the future).
<classInformation>
This represents a class and its information.
Attributes
- className - the name of the class. (RestrictedString)
Example of a successful getInformation call on a spam classifier
Note on write calls
At this point, no write call returns any value (void), meaning that you only need to check the status element.
Examples
Gender classifier step-by-step
Let's say that you want to create a web site that finds out whether a blog is written by a man or a woman (see
example). The first steps you take are
- Sign up and get an uClassify.com account.
- Log in and obtain your API keys, one for read and one for write operations.
Now you have an account and can use your write API key to programmatically create classifiers. Before you create a
classifier, you must come up with a name for it. Let's call it 'ManOrWoman'.
- Create a script in your favourite language and send the following POST request to http://api.uclassify.com
That request simply creates a classifier named 'ManOrWoman' using the write call 'create'. The 'id' attribute is set to an arbitrary
identifier for that call. Each request sends back an XML response, informing you whether the request was successful or not.
You can now log in to your account and find a classifier called
'ManOrWoman' under the 'Classifiers' tab. The next thing to do is to add the possible classes. In our gender classifier there are two
classes, 'Man' and 'Woman'.
- Add the classes 'Man' and 'Woman' with this request
Before we can start using the classifier we need to train it. To do this we need to collect training data for each class. In this case it
would be a set of texts written by men and another written by women (these can be found on blogspot, for example). To keep this example simple
we train it on one short message per class.
- Train the classes with texts written by men and women
Note that the texts have been base64 encoded so they don't break the xml. Of course it would need more training than two short texts
to give good results. You can repeat this process (automate it) or batch many texts in the same request. When training is done it's
time to start classifying using the read API key. Add a form to your web site where visitors can insert texts or links. When the form button
is pressed send a POST request to classify the text.
- Use the classifier to find out if a text is written by a man or woman
If this call is successful the XML response will give you the probability of the text being written by a man or a woman.
Creating a fantasy language classifier
This shows how to create a classifier that distinguishes text between the classes Klingon, Sindarin and Huttese. This is done in one request,
however it's possible to split it into one for each call, or any number. And of course it's possible (and necessary) to continue training the
classifier with more texts for each fantasy language.
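A single request that creates the classifier, adds the three classes and trains each of them once might be assembled like this in Python. The classifier name 'FantasyLanguages', the id scheme and the placeholder training strings are assumptions; real training texts would be actual samples of each language.

```python
import base64
import xml.etree.ElementTree as ET

REQUEST_NS = "http://api.uclassify.com/1/RequestSchema"

def build_setup_request(write_api_key, classifier_name, samples):
    """Create a classifier, add one class per key and train it on its text."""
    root = ET.Element("uclassify", {"xmlns": REQUEST_NS, "version": "1.00"})
    texts = ET.SubElement(root, "texts")
    calls = ET.SubElement(root, "writeCalls", {
        "writeApiKey": write_api_key,
        "classifierName": classifier_name,
    })
    # Calls run sequentially: create first, then the classes, then training.
    ET.SubElement(calls, "create", {"id": "Create"})
    for class_name in samples:
        ET.SubElement(calls, "addClass",
                      {"id": f"Add{class_name}", "className": class_name})
    for class_name, text in samples.items():
        text_id = f"Text{class_name}"
        encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
        ET.SubElement(texts, "textBase64", {"id": text_id}).text = encoded
        ET.SubElement(calls, "train", {
            "id": f"Train{class_name}",
            "className": class_name,
            "textId": text_id,
        })
    return ET.tostring(root, encoding="unicode")

xml_request = build_setup_request("YourWriteApiKey", "FantasyLanguages", {
    "Klingon": "placeholder Klingon text",
    "Sindarin": "placeholder Sindarin text",
    "Huttese": "placeholder Huttese text",
})
```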
Requests to train it further could look like this
When training is done, a classify request with two unclassified texts could look like this
Posting a Request (PHP & C#)
Posting from PHP
This example uses stream_context_create to post a message; another possible way is to use cURL.
Posting from C#