XML API Documentation

Overview

  1. Quick Start
  2. API Overview
  3. Request Reference
  4. Response Reference
  5. Examples
  6. Posting a Request (PHP & C#)

Quick Start

For now, classifiers can only be created programmatically via our API. The process is as follows:
  1. Sign up to get your API keys.
  2. Use your favourite programming language to create classifiers by sending XML POST requests to http://api.uclassify.com
  3. Send XML POST requests to classify texts from your web site or application.
Our step-by-step tutorial will guide you in more detail.

API Overview

Our hope is to provide an easy-to-use API based on XML requests. Let us know if this is not the case :) Every request sent to the server returns a response. The response tells you whether the request was successful and returns whatever is relevant for your calls.

API url: http://api.uclassify.com (port 80)

Flexibility

The API is designed to handle multiple calls in each request, so you can batch multiple texts to one or many classifiers in the same request. This is extremely powerful: if you want to classify 300 blog posts, you can send each post in its own textBase64 element and, in each classify call, specify which text goes to which classifier. This is done by referencing the texts by id from the classify calls. Batching is also useful when training, since you can send a bulk of training texts to a classifier. Note that a request that modifies classifiers can only target one classifier.
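The batching described above can be sketched in Python (chosen here only for illustration; the API is language-neutral). This is a minimal sketch, not an official client: it assumes that <texts> and <readCalls> are direct children of <uclassify>, and the function name 'build_batch_classify', the API key and the classifier name are placeholders.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace and version as listed in the <uclassify> request reference.
REQUEST_NS = "http://api.uclassify.com/1/RequestSchema"


def build_batch_classify(read_api_key, classifier_name, texts):
    """Build one request that classifies every text in `texts` (dict of id -> str)."""
    root = ET.Element("uclassify", {"xmlns": REQUEST_NS, "version": "1.01"})
    texts_el = ET.SubElement(root, "texts")
    calls_el = ET.SubElement(root, "readCalls", {"readApiKey": read_api_key})
    for text_id, text in texts.items():
        # Each text travels base64 encoded in its own <textBase64> element.
        text_el = ET.SubElement(texts_el, "textBase64", {"id": text_id})
        text_el.text = base64.b64encode(text.encode("utf-8")).decode("ascii")
        # One classify call per text, linked to the text through textId.
        ET.SubElement(calls_el, "classify", {
            "id": "Classify" + text_id,
            "classifierName": classifier_name,
            "textId": text_id,
        })
    return ET.tostring(root, encoding="unicode")


request_xml = build_batch_classify(
    "YOUR READ API KEY", "MySpamClassifier",
    {"Post1": "first blog post", "Post2": "second blog post"},
)
```

The same loop scales to hundreds of texts as long as the serialized request stays within the size restriction.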

Transactional behaviour

If one of the write calls fails, all previous successful calls in the request are rolled back, so the classifier is guaranteed to be in the same state as before the request was processed. This prevents classifiers from being left in an undefined state.

Restrictions

  • Each XML request can be at most 1MB.
  • Every request must conform to our XML schema.
  • All attributes of the type 'RestrictedString' are restricted to the letters a-z and A-Z, spaces and the digits 0-9.
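A client can check the first and last restriction locally before sending anything. The following Python sketch is only an approximation of the server-side rules: it assumes that 'a-Z' means ASCII letters in both cases, that an empty string is not a valid RestrictedString, and it reads '1MB' conservatively as 1,000,000 bytes. The function names are made up for this example.

```python
import re

# Assumed reading of RestrictedString: ASCII letters, digits and spaces only.
_RESTRICTED = re.compile(r"[A-Za-z0-9 ]+")

# Conservative reading of the 1MB limit; the exact byte count is an assumption.
MAX_REQUEST_BYTES = 1_000_000


def is_restricted_string(value):
    """Return True if `value` only uses characters allowed in a RestrictedString."""
    return _RESTRICTED.fullmatch(value) is not None


def fits_size_limit(request_xml):
    """Return True if the UTF-8 encoded request stays within the assumed limit."""
    return len(request_xml.encode("utf-8")) <= MAX_REQUEST_BYTES
```

Schema conformance itself is still checked by the server, which reports violations in the response.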

Requests

Each request can contain an unlimited number of calls (as long as it follows the restrictions above). A request can consist of either read calls or write calls, but not both. At this point the only read call available is the classify call. Write calls include create, remove, addClass, removeClass, train and untrain. Each call has an id attribute; set it to a unique string that helps you identify the call in the response, e.g. <create id="MyCreateCall">. Calls within the readCalls and writeCalls elements are executed sequentially.

Post XML-encoded requests to the API url: http://api.uclassify.com (port 80)
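As a rough illustration, a request could be posted from Python with the standard library as shown below. This is a sketch under assumptions: the Content-Type header is our guess (this documentation does not specify one), and the function names are hypothetical. The response is decoded as UTF-8, which matches the response reference.

```python
import urllib.request

API_URL = "http://api.uclassify.com"


def make_post(request_xml):
    """Wrap an XML request string in an HTTP POST object; nothing is sent yet."""
    return urllib.request.Request(
        API_URL,
        data=request_xml.encode("utf-8"),
        # Assumed header; the documentation does not name a required content type.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send(request_xml, timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(make_post(request_xml), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Keeping 'make_post' separate from 'send' makes it possible to inspect or log a request without touching the network.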

Responses

Every response contains a status element with a boolean attribute called success. If it is true, the request went through without any trouble. If it is false, something went wrong and the inner text of the status element contains an error message, e.g. <status success="false" statusCode="3000">The classifier 'MySpamClassifier' doesn't exist.</status>. A real error message also indicates which call failed, using its unique id attribute. For now the only call that returns something more than the status is the classify call.

Training

Before a classifier can classify any texts it needs training. For a spam classifier with two classes (spam and legitimate), training means feeding the spam class with known spam and the legitimate class with known legitimate emails. This is called machine learning. After the training period is over you can start classifying. More training gives better classification results. How much training is required depends largely on the domain: in some cases 10-20 documents per class are enough, while others require hundreds.

Encoding

It's important that the encoding (ASCII, UTF-8, Unicode, ANSI etc.) of the training documents is the same as that of the documents being classified, because the classifier doesn't differentiate between encodings internally. It's possible to mix encodings (this requires more training data), but we recommend that you train and classify using the same encoding. Which encoding you use is up to you.

Xml Schemas and Namespaces

The API strictly follows an XML schema; any violation results in an error response indicating what you need to fix in the request. A schema validates that the XML follows a set of rules. Putting the XML into a namespace ensures that there are no datatype conflicts, e.g. we can have many 'string' types as long as they are in different namespaces. The namespace is set in the xmlns attribute of the root element (uclassify). We know (from experience) that namespaces can be confusing; please let us know if you think this is the case so we can improve the API.

API Versions

When something changes in the API, a new version is added in order to maintain compatibility for current API users. This means that users need to explicitly bump their request version to use the latest API version. The version is specified as an attribute of the <uclassify> tag.
Version 1.01 added the textCoverage score.

Request Reference

Here is a reference of all elements.

<uclassify>

This is the root element that holds information of what schema and API version to use.

Attributes
  • xmlns - the namespace, http://api.uclassify.com/1/RequestSchema
  • version - the version, currently 1.01 is the latest.

<texts>

The texts element specifies a list of <textBase64> that can be indexed from the calls (classify, train and untrain).

<textBase64>

This element contains a text string encoded in base64. Base64 encoding keeps texts from breaking the XML (e.g. texts that contain tags). Use 'base64_encode' to base64 encode strings in PHP and 'System.Convert.ToBase64String' in C#.

Attributes
  • id - a unique identifier for this text. (RestrictedString)
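For comparison with the PHP and C# functions named above, here is how a <textBase64> element could be produced in Python. It is a small sketch; the helper name 'text_base64_element' and the sample text are made up. The 'encoding' parameter is exposed because the same encoding should be used for training and for classification.

```python
import base64
import xml.etree.ElementTree as ET


def text_base64_element(text_id, text, encoding="utf-8"):
    """Return a <textBase64> element holding `text` encoded in base64."""
    element = ET.Element("textBase64", {"id": text_id})
    raw = text.encode(encoding)  # use the same encoding for training and classifying
    element.text = base64.b64encode(raw).decode("ascii")
    return element


# A text containing markup is safe once it is base64 encoded.
element = text_base64_element("Text1", "<b>Buy now</b> & save")
print(ET.tostring(element, encoding="unicode"))
```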

<readCalls>

Can hold one or more <classify> elements. Each classify element can reference different classifiers (that the readApiKey can access) and texts. Each call (child) is executed in a sequential order.

Attributes
  • readApiKey - a read API key that has access to the classifiers specified in classify calls. (RestrictedString)

<classify>

Classify sends a text to a classifier and returns a classification. The default behaviour is to access classifiers in the same account as the readApiKey given in the readCalls parent element. You can access published classifiers from other users (accounts) by setting the optional username attribute to the username of that classifier's author.

Attributes
  • id - a unique name for this call. (RestrictedString)
  • classifierName - the name of the classifier. (RestrictedString)
  • textId - the id of the text. (RestrictedString)
  • username (optional) - use this to classify with a published classifier in another account. Set this to the username of the classifier's creator. (RestrictedString)
Example with classify on own classifier
Example with classify on other user's published classifier
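The two cases differ only in the optional username attribute. A minimal Python sketch, with a hypothetical helper 'classify_call' and placeholder values for the key, classifier names and username:

```python
import xml.etree.ElementTree as ET


def classify_call(call_id, classifier_name, text_id, username=None):
    """Create a <classify> element; set `username` only for another account's published classifier."""
    attributes = {"id": call_id, "classifierName": classifier_name, "textId": text_id}
    if username is not None:
        attributes["username"] = username
    return ET.Element("classify", attributes)


read_calls = ET.Element("readCalls", {"readApiKey": "YOUR READ API KEY"})
# Classifier in the same account as the read API key.
read_calls.append(classify_call("OwnCall", "MySpamClassifier", "Text1"))
# Published classifier owned by another (placeholder) user.
read_calls.append(classify_call("SharedCall", "Sentiment", "Text1", username="someuser"))
print(ET.tostring(read_calls, encoding="unicode"))
```

Both calls reference the same text id, which shows that one text can be sent to several classifiers in a single request.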

<classifyKeywords>

ClassifyKeywords sends a text to a classifier and returns a classification and relevant keywords for each class. If you need keywords for a text you should use classifyKeywords instead of classify (don't use both for the same text). The call itself uses the exact same arguments as the classify call.

Attributes
  • id - a unique name for this call. (RestrictedString)
  • classifierName - the name of the classifier. (RestrictedString)
  • textId - the id of the text. (RestrictedString)
  • username (optional) - use this to classify with a published classifier in another account. Set this to the username of the classifier's creator. (RestrictedString)
Example with classifyKeywords on own classifier

<getInformation>

Gets information about a classifier. Right now it only returns the names of all classes.

Attributes
  • id - a unique name for this call. (RestrictedString)
  • classifierName - the name of the classifier. (RestrictedString)
Example

<writeCalls>

Write calls allow you to create and remove classifiers, add and remove classes, and train and untrain. The element can hold one or more write calls. Each call (child) is executed in sequential order.

Attributes
  • writeApiKey - a write API key that has access to the specified classifier. (RestrictedString)
  • classifierName - the name of the classifier that you wish to modify or create. (RestrictedString)
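Because the classifier name lives on the <writeCalls> element, every write call in a request targets the same classifier. The Python sketch below illustrates this with a hypothetical helper 'build_write_request'; it assumes <writeCalls> is a direct child of <uclassify>, and the key, classifier and class names are placeholders.

```python
import xml.etree.ElementTree as ET

REQUEST_NS = "http://api.uclassify.com/1/RequestSchema"


def build_write_request(write_api_key, classifier_name, calls):
    """Build a request whose write calls all target one classifier.

    `calls` is a list of (call_name, attributes) pairs in execution order.
    """
    root = ET.Element("uclassify", {"xmlns": REQUEST_NS, "version": "1.01"})
    write_calls = ET.SubElement(root, "writeCalls", {
        "writeApiKey": write_api_key,
        "classifierName": classifier_name,
    })
    for call_name, attributes in calls:
        ET.SubElement(write_calls, call_name, attributes)
    return ET.tostring(root, encoding="unicode")


# Create a classifier and give it two classes in a single request.
request_xml = build_write_request("YOUR WRITE API KEY", "MySpamClassifier", [
    ("create", {"id": "CreateCall"}),
    ("addClass", {"id": "AddSpam", "className": "spam"}),
    ("addClass", {"id": "AddLegitimate", "className": "legitimate"}),
])
```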

<create>

Creates a classifier for the account associated with the write API key. The name of the new classifier is the classifierName attribute set in the <writeCalls> element.

Attributes
  • id - a unique name for this call. (RestrictedString)
Example

<remove>

Removes a classifier for the account associated with the write API key.

Attributes
  • id - a unique name for this call. (RestrictedString)
Example

<addClass>

Adds a class to the classifier. A classifier can have an unlimited number of classes; however, having just one class makes no sense.

Attributes
  • id - a unique name for this call. (RestrictedString)
  • className - the name of the class to add. (RestrictedString)
Example

<removeClass>

Removes a class from the classifier.

Attributes
  • id - a unique name for this call. (RestrictedString)
  • className - the name of the class to remove. (RestrictedString)
Example

<train>

Trains the classifier on a text for a specified class.

Attributes
  • id - a unique name for this call. (RestrictedString)
  • className - the name of the class to train. (RestrictedString)
  • textId - the id of the text to use (textId attribute of the textBase64 element). (RestrictedString)
Example

<untrain>

Untrains the classifier on a text for a specified class. The common use for this call is to correct mistakes: if you train a classifier on a text and then untrain it on the same text, the classifier returns to its previous state. For example, if a spam message is incorrectly trained as legitimate, fix the mistake by untraining the spam message from the legitimate class and then training it (train call) on the spam class.

Attributes
  • id - a unique name for this call. (RestrictedString)
  • className - the name of the class to untrain. (RestrictedString)
  • textId - the id of the text to use (textId attribute of the textBase64 element). (RestrictedString)
Example
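The correction described above (untrain from the wrong class, then train on the right one) fits in a single request, so the transactional behaviour applies to both calls together. A minimal Python sketch, assuming <texts> and <writeCalls> are direct children of <uclassify>; the function name, ids, key and sample message are placeholders:

```python
import base64
import xml.etree.ElementTree as ET

REQUEST_NS = "http://api.uclassify.com/1/RequestSchema"


def build_correction_request(write_api_key, classifier_name, text, wrong_class, right_class):
    """Untrain `text` from `wrong_class`, then train it on `right_class`."""
    root = ET.Element("uclassify", {"xmlns": REQUEST_NS, "version": "1.01"})
    texts = ET.SubElement(root, "texts")
    text_el = ET.SubElement(texts, "textBase64", {"id": "Mistrained"})
    text_el.text = base64.b64encode(text.encode("utf-8")).decode("ascii")
    calls = ET.SubElement(root, "writeCalls", {
        "writeApiKey": write_api_key,
        "classifierName": classifier_name,
    })
    # Calls run sequentially: first undo the mistake, then train correctly.
    ET.SubElement(calls, "untrain", {
        "id": "UndoMistake", "className": wrong_class, "textId": "Mistrained"})
    ET.SubElement(calls, "train", {
        "id": "TrainCorrectly", "className": right_class, "textId": "Mistrained"})
    return ET.tostring(root, encoding="unicode")


request_xml = build_correction_request(
    "YOUR WRITE API KEY", "MySpamClassifier",
    "You have won a prize", wrong_class="legitimate", right_class="spam",
)
```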

Response Reference

Responses are in XML, UTF-8 encoded. They tell you whether your request was successful and, if so, contain return values such as classifications. If anything went wrong before your calls were processed, the response explains why. If an error occurred while processing the calls, the error points out exactly which call failed (using its unique id) and provides an error message, such as "The classifier 'MySpamClassifier' is not found".

<status>

Examine this element to find out if the request was successful. If one of the calls failed, no other response is given. If the attribute "success" is false, the error message is given as the element's text. The attribute statusCode gives you a machine-readable status code indicating what the error is.

Attributes
  • success - true if the request was successful, false if not. (boolean)
  • statusCode - a status code, 2000=OK, 4000=BAD_REQUEST, 4013=REQUEST_ENTITY_TOO_LARGE, 5000=INTERNAL_SERVER_ERROR, 5030=SERVICE_UNAVAILABLE (positiveInteger)

Example of successful call (a classify call would contain more in a successful response)

Example of unsuccessful call, the error shows the call that failed (the one with id 'ClassifyCall1')
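Checking the status could look like the Python sketch below. The sample responses are hypothetical: the root element name, the pairing of status code 4000 with this message and the exact error wording are assumptions made for illustration. The code matches elements by local name so that it does not depend on the response namespace, which this reference does not state.

```python
import xml.etree.ElementTree as ET

# Hypothetical responses, written by hand for this example.
SAMPLE_OK = '<uclassify><status success="true" statusCode="2000"/></uclassify>'
SAMPLE_ERROR = (
    '<uclassify><status success="false" statusCode="4000">'
    "The classifier 'MySpamClassifier' is not found"
    "</status></uclassify>"
)


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_status(response_xml):
    """Return (success, status_code, message) from the first <status> element."""
    for element in ET.fromstring(response_xml).iter():
        if local_name(element.tag) == "status":
            # XML Schema booleans may be written as "true" or "1".
            success = element.get("success") in ("true", "1")
            code = int(element.get("statusCode"))
            return success, code, (element.text or "").strip()
    raise ValueError("response contains no <status> element")


print(read_status(SAMPLE_OK))
print(read_status(SAMPLE_ERROR))
```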

<readCalls>

Holds a list of classify calls, each identifiable by its id attribute.

<classify>

Corresponds to one of the classify calls in the request. Contains a <classification> element that is a list of classes.

Attributes
  • id - The unique id of the call - matches one of the calls in the request. (RestrictedString)

<classification>

Holds a list of <class> elements, each with a probability [0-1] of the classified document belonging to it. To find the most probable class, use the one with highest probability.

Attributes
  • textCoverage (1.01) - proportion of the classified text that was found in the training data. 1 means that every word was used, 0 that no words were used. Useful to determine the relevance of the classification. Available from API version 1.01. (double)

<class>

This represents a class and the probability that the document belongs to it.

Attributes
  • className - the name of the class. (RestrictedString)
  • p - the probability that the text belongs to this class. (double)

Example of two successful classify calls to a spam classifier
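A response with two classify calls could be processed as in the Python sketch below. The sample response is hypothetical: it is assembled from the elements and attributes described in this section, with an assumed root element and made-up probabilities. Elements are matched by local name so the code does not depend on the response namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical response, written by hand from the element descriptions above.
SAMPLE_RESPONSE = """<uclassify>
  <status success="true" statusCode="2000"/>
  <readCalls>
    <classify id="Classify1">
      <classification textCoverage="0.75">
        <class className="spam" p="0.9"/>
        <class className="legitimate" p="0.1"/>
      </classification>
    </classify>
    <classify id="Classify2">
      <classification textCoverage="0.5">
        <class className="spam" p="0.2"/>
        <class className="legitimate" p="0.8"/>
      </classification>
    </classify>
  </readCalls>
</uclassify>"""


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def best_classes(response_xml):
    """Map each classify call id to (most probable class, probability, textCoverage)."""
    results = {}
    for call in ET.fromstring(response_xml).iter():
        if local_name(call.tag) != "classify":
            continue
        classification = next(e for e in call if local_name(e.tag) == "classification")
        probabilities = {c.get("className"): float(c.get("p")) for c in classification}
        winner = max(probabilities, key=probabilities.get)
        coverage = classification.get("textCoverage")  # absent before version 1.01
        results[call.get("id")] = (
            winner,
            probabilities[winner],
            float(coverage) if coverage is not None else None,
        )
    return results


print(best_classes(SAMPLE_RESPONSE))
```

A low textCoverage value is a hint that the winning class should be treated with caution, since few words of the text were known from training.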

<classifyKeywords>

Same structure as the classify response but with an additional 'keywords' tag containing a list of class elements. Those elements are filled with the space-separated keywords. Keywords that are not XML compliant are not included.

Attributes
  • id - The unique id of the call - matches one of the calls in the request. (RestrictedString)

<classification>

See classify response.

<keywords>

This is a list of class elements. Each class element contains a space separated string of relevant keywords.

<class>

This represents a class and a subset of relevant space separated keywords.

Attributes
  • className - the name of the class. (RestrictedString)

Example of a successful classifyKeywords call to a spam classifier
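Extracting the keywords could look like this Python sketch. The sample response is hypothetical (assumed root element, made-up keywords and probabilities), built only from the elements described above; elements are matched by local name so the response namespace does not matter.

```python
import xml.etree.ElementTree as ET

# Hypothetical response, written by hand from the element descriptions above.
SAMPLE_RESPONSE = """<uclassify>
  <status success="true" statusCode="2000"/>
  <readCalls>
    <classifyKeywords id="Keywords1">
      <classification textCoverage="0.8">
        <class className="spam" p="0.7"/>
        <class className="legitimate" p="0.3"/>
      </classification>
      <keywords>
        <class className="spam">free offer winner</class>
        <class className="legitimate">meeting agenda</class>
      </keywords>
    </classifyKeywords>
  </readCalls>
</uclassify>"""


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def keywords_by_class(response_xml, call_id):
    """Return {class name: [keywords]} for the classifyKeywords call with `call_id`."""
    for call in ET.fromstring(response_xml).iter():
        if local_name(call.tag) == "classifyKeywords" and call.get("id") == call_id:
            keywords = next(e for e in call if local_name(e.tag) == "keywords")
            # Each <class> holds its keywords as one space-separated string.
            return {c.get("className"): (c.text or "").split() for c in keywords}
    raise KeyError(call_id)


print(keywords_by_class(SAMPLE_RESPONSE, "Keywords1"))
```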

<getInformation>

Corresponds to one of the getInformation calls in the request. Contains a <classes> element that is a list of classes with their information.

Attributes
  • id - The unique id of the call - matches one of the calls in the request. (RestrictedString)

<classes>

Holds a list of <classInformation> elements, each with a name (more information will be added in the future).

<classInformation>

This represents a class and its information.

Attributes
  • className - the name of the class. (RestrictedString)

Example of a successful getInformation call on a spam classifier
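Reading the class names from such a response could be sketched in Python as follows. The sample is hypothetical: the root element and the parent of <getInformation> are assumptions, which is why the code searches the whole document by local name instead of relying on a fixed path.

```python
import xml.etree.ElementTree as ET

# Hypothetical response; the parent of <getInformation> is an assumption.
SAMPLE_RESPONSE = """<uclassify>
  <status success="true" statusCode="2000"/>
  <readCalls>
    <getInformation id="Info1">
      <classes>
        <classInformation className="spam"/>
        <classInformation className="legitimate"/>
      </classes>
    </getInformation>
  </readCalls>
</uclassify>"""


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def class_names(response_xml, call_id):
    """Return the class names reported for the getInformation call with `call_id`."""
    for call in ET.fromstring(response_xml).iter():
        if local_name(call.tag) == "getInformation" and call.get("id") == call_id:
            return [
                e.get("className")
                for e in call.iter()
                if local_name(e.tag) == "classInformation"
            ]
    raise KeyError(call_id)


print(class_names(SAMPLE_RESPONSE, "Info1"))
```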

Note on write calls

At this point, no write call returns any value (void), meaning that you only need to check the status element.

Examples

Gender classifier step-by-step

Let's say that you want to create a web site that finds out whether a blog is written by a man or a woman (see example). The first steps you take are:
  1. Sign up and get an uClassify.com account.
  2. Log in and obtain your API keys, one for read and one for write operations.
Now you have an account and can use your write API key to programmatically create classifiers. Before you create a classifier, you must come up with a name for it. Let's call it 'ManOrWoman'.
  3. Create a script in your favourite language and send a POST request containing a create call to http://api.uclassify.com
That request simply creates a classifier named 'ManOrWoman' using the write call 'create'. The 'id' attribute is set to an arbitrary identifier for that call. Each request sends back an XML response, informing you whether the request was successful or not. You can now log in to your account and find a classifier called 'ManOrWoman' under the 'Classifiers' tab. The next thing to do is to add the possible classes. In our gender classifier there are two classes, 'Man' and 'Woman'.
  4. Add the classes 'Man' and 'Woman' with a second request
Before we can start using the classifier we need to train it. To do this we need to collect training data for each class. In this case it would be a bunch of texts written by men and another bunch written by women (these can be found on blogspot, for example). To keep this example simple, we train it on one short message per class.
  5. Train the classes with texts written by men and women
Note that the texts are base64 encoded so they don't break the XML. Of course, it takes more training than two short texts to give good results. You can repeat this process (automate it) or batch many texts in the same request. When training is done it's time to start classifying using the read API key. Add a form to your web site where visitors can insert texts or links. When the form button is pressed, send a POST request to classify the text.
  6. Use the classifier to find out if a text is written by a man or woman
If this call is successful the xml response will give you a probability of the text being written by a man or woman.
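The four requests of this walkthrough can be sketched in Python as follows. This is an illustration only: it assumes that <texts>, <writeCalls> and <readCalls> are direct children of <uclassify>; the helper 'build_request', the call ids, the API keys and the sample texts are placeholders.

```python
import base64
import xml.etree.ElementTree as ET

REQUEST_NS = "http://api.uclassify.com/1/RequestSchema"
CLASSIFIER = "ManOrWoman"
WRITE = {"writeApiKey": "YOUR WRITE API KEY", "classifierName": CLASSIFIER}
READ = {"readApiKey": "YOUR READ API KEY"}


def build_request(texts, container, container_attributes, calls):
    """Serialize one request: optional texts, then a readCalls or writeCalls element."""
    root = ET.Element("uclassify", {"xmlns": REQUEST_NS, "version": "1.01"})
    if texts:
        texts_el = ET.SubElement(root, "texts")
        for text_id, text in texts.items():
            text_el = ET.SubElement(texts_el, "textBase64", {"id": text_id})
            text_el.text = base64.b64encode(text.encode("utf-8")).decode("ascii")
    calls_el = ET.SubElement(root, container, container_attributes)
    for name, attributes in calls:
        ET.SubElement(calls_el, name, attributes)
    return ET.tostring(root, encoding="unicode")


# Create the classifier.
create_xml = build_request({}, "writeCalls", WRITE, [("create", {"id": "Create"})])

# Add the two classes.
classes_xml = build_request({}, "writeCalls", WRITE, [
    ("addClass", {"id": "AddMan", "className": "Man"}),
    ("addClass", {"id": "AddWoman", "className": "Woman"}),
])

# Train each class on one short placeholder text.
train_xml = build_request(
    {"ManText": "a short text written by a man",
     "WomanText": "a short text written by a woman"},
    "writeCalls", WRITE, [
        ("train", {"id": "TrainMan", "className": "Man", "textId": "ManText"}),
        ("train", {"id": "TrainWoman", "className": "Woman", "textId": "WomanText"}),
    ])

# Classify an unknown text with the read API key.
classify_xml = build_request(
    {"Unknown": "text submitted through the web form"},
    "readCalls", READ, [
        ("classify", {"id": "Classify", "classifierName": CLASSIFIER, "textId": "Unknown"}),
    ])
```

Each string would be sent as the body of its own POST request, in the order shown, checking the status element of every response before continuing.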

Creating a fantasy language classifier

This shows how to create a classifier that distinguishes between texts in the classes Klingon, Sindarin and Huttese. This is done in one request; however, it's possible to split it into one request per call, or into any number of requests. And of course it's possible (and necessary) to continue training the classifier with more texts for each fantasy language.
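A single request that creates the classifier, adds the three classes and trains each of them once could be built as in this Python sketch. The classifier name 'FantasyLanguages', the call ids and the sample strings are placeholders, not real training data, and the element nesting is assumed from the request reference.

```python
import base64
import xml.etree.ElementTree as ET

REQUEST_NS = "http://api.uclassify.com/1/RequestSchema"

# Placeholder strings standing in for real training texts.
SAMPLES = {
    "Klingon": "placeholder text in Klingon",
    "Sindarin": "placeholder text in Sindarin",
    "Huttese": "placeholder text in Huttese",
}


def build_fantasy_request(write_api_key, classifier_name="FantasyLanguages"):
    """Create the classifier, add one class per language and train each class once."""
    root = ET.Element("uclassify", {"xmlns": REQUEST_NS, "version": "1.01"})
    texts = ET.SubElement(root, "texts")
    calls = ET.SubElement(root, "writeCalls", {
        "writeApiKey": write_api_key,
        "classifierName": classifier_name,
    })
    # Calls are executed in order: create first, then classes, then training.
    ET.SubElement(calls, "create", {"id": "Create"})
    for language in SAMPLES:
        ET.SubElement(calls, "addClass", {"id": "Add" + language, "className": language})
    for language, sample in SAMPLES.items():
        text = ET.SubElement(texts, "textBase64", {"id": language + "Text"})
        text.text = base64.b64encode(sample.encode("utf-8")).decode("ascii")
        ET.SubElement(calls, "train", {
            "id": "Train" + language,
            "className": language,
            "textId": language + "Text",
        })
    return ET.tostring(root, encoding="unicode")


request_xml = build_fantasy_request("YOUR WRITE API KEY")
```

Since write calls are transactional, a failure in any of the seven calls would leave no half-created classifier behind.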

Requests to train it further could look like this

When training is done, a classify request with two unclassified texts could look like this


Posting a Request (PHP & C#)

Posting from PHP

This example uses stream_context_create to post a message; another possible way is to use cURL.

Posting from C#