This document describes the search capabilities provided by the itemFind and itemFindLayout methods.
The following sections describe the query syntax, the query options and the possible response formats. The document concludes with a large selection of example queries.
The queryXML parameter specifies a pattern which the results will match. Every visible item in the resource that matches the pattern will be present in the results. For example, a simple pattern would be:
which will match any item of the itemType 'Person'.
A more complicated pattern might be:
which would match all 'Person' items whose 'Age' attribute is over 18.
The following sections explain the various elements that can be used to construct patterns, in particular filters (for selecting items based on their name, type or owner), conditions (for finding items based on attributes, tags and fora), links (for picking items based on their connection to other items) and the Boolean logic used to join the elements together.
Filters narrow down the set of possible items that a pattern can match by restricting the item type, owner, name or ID.
Four filters are available:
<ItemType>
matches items of a particular type
<ItemOwner>
matches items owned by a particular user
<ItemName>
matches items with a specific name
<ItemID>
matches a specific item, identified by its itemID
Filters are used like this:
which is a pattern to only match items owned by the user called Daphne, or
which is a pattern that matches a specific item.
A condition is a pair of operands surrounding an operator, e.g. status = done or height > 10.
Operands can be the name of the item, attributes belonging to the item, tags attached to the item and so on. Operands can also be constant values, such as "10" or "'done'".
The following operators are supported:
The ordering operators (GreaterThan and so on) can be used with the Date attribute for simple calendar arithmetic, for example "Expiry Date < 2010-01-01".
The Matches and NotMatches operators offer pattern matching using the SQL syntax. The degree of support will depend on the database backend being used, but it is safe to assume that the _ character matches any single character and the % character matches any sequence of zero or more characters.
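The two wildcards can be illustrated on the client side. The following is a minimal Python sketch, not the server's implementation: it assumes only the _ and % behaviour described above, and the helper names like_to_regex and like are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate an SQL-style pattern into a compiled regular expression.

    Hypothetical helper: only the two wildcards described in the text
    are handled, and no escape character is supported.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any sequence, including the empty one
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole of value matches the SQL-style pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For instance, like("Bob", "B%") and like("Bob", "B_b") are both true, while like("Bb", "B_b") is false because _ must consume exactly one character.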
Operands are either variables (i.e. values extracted from some part of an item, its tags or its fora) or constants supplied explicitly in the query.
Constant operands:
Variable operands:
Variable and constant operands can be used interchangeably, for example, all of the following are valid:
<Attribute name="alpha"/> <Equals/> <Attribute name="beta"/>
<Number>3200</Number> <LessThan/> <Attribute name="frequency"/>
<Attribute name="frequency"/> <GreaterThan/> <Number>3200</Number>
Important note: the semantics of using the NotEquals or NotMatches operators with the Tag operand are not entirely intuitive. If, for example, you ask for 'items which are not tagged x', what you are actually going to get is 'items which have a tag that is not x'. If there is an item with two tags x and y, it will be returned as part of the results because it does indeed have a tag that is not x; the fact that it also has the tag x is neither here nor there. This is an unwanted side-effect of the way that the queries are mapped into SQL for the back-end database to process. It may be resolved in a future version of the software.
Filters and conditions can be combined using the Boolean conjunction elements And and Or, for example:
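The difference between the two readings can be shown with plain Python lists. This is only a sketch of the semantics described in the note; the function names are hypothetical and nothing here reflects how the server generates its SQL.

```python
def has_tag_other_than(tags, excluded):
    """What NotEquals on a tag effectively tests, per the note above:
    at least one tag on the item differs from the excluded value."""
    return any(tag != excluded for tag in tags)


def is_not_tagged(tags, excluded):
    """What a reader might expect instead: no tag equals the excluded value."""
    return excluded not in tags
```

An item tagged x and y satisfies has_tag_other_than(["x", "y"], "x") even though is_not_tagged(["x", "y"], "x") is false. Taking the note literally, an item with no tags at all would not be returned either, since it has no tag that differs from x.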
The And and Or constructs take two or more child Condition elements. The And condition is true if all of the child conditions are true, and the Or condition is true if any of its child conditions are true.
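The truth rules for And and Or can be modelled in a few lines of Python. This is an illustrative sketch only: the tuple representation and the evaluate function are assumptions made for the example and are not part of the query syntax.

```python
def evaluate(node, item):
    """Evaluate a nested condition tree against an item.

    A node is either a callable taking the item (a leaf condition) or a
    tuple ("and" | "or", [child nodes]). Hypothetical representation.
    """
    if callable(node):
        return bool(node(item))
    operator, children = node
    results = (evaluate(child, item) for child in children)
    if operator == "and":
        return all(results)   # true only if every child is true
    if operator == "or":
        return any(results)   # true if at least one child is true
    raise ValueError("unknown operator: %r" % (operator,))


def owned_by_fred(item):
    return item["owner"] == "Fred"


def has_three_tags(item):
    return len(item["tags"]) >= 3


fred_and_tagged = ("and", [owned_by_fred, has_three_tags])
fred_or_tagged = ("or", [owned_by_fred, has_three_tags])
```

An item owned by Fred with only two tags fails fred_and_tagged but passes fred_or_tagged.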
Links between items can be expressed as part of the pattern. For example, the following is a pattern which selects the set of items which are linked to a particular item:
Any items which are linked to the hat identified by this ItemID will be included in the result of this search.
The contents of the Link element can be anything valid as a pattern, for instance:
will match any item which is linked to a suitably large hat.
In the previous examples, any possible links between items will be considered. In the first example, for instance, the query engine will examine the schema to see which item types can link to "hat" and then scan the items of those types, looking for instances which link to the target item.
In many cases, this automatic selection of links is sufficient, but sometimes it will be useful to be able to identify a specific link to use. This is done by adding a name attribute to the Link element, for instance:
is a pattern which selects items that have a link called "has_author" which points at some item called "Jane Austen". Notice that no item types are mentioned in this pattern. Assuming that there is only a single link called "has_author" in the schema, the query processor can easily discover which item types are involved. If there are multiple links with that name, then they will all be considered for checking. If you want to restrict the matching to a particular link, then additional rules, such as a filter on item type, can be added to the pattern.
Links can also be followed in reverse. This is done by specifying the origin item type of the link via a from attribute. For example, the pattern:
will match the author(s) of the book(s) with a matching item name.
Links can be nested. For example, if we assume that the schema has an item type called "shop" which has a link called "has_in_stock" which links to one or more "book" items, then this pattern:
will match any shops which are stocking copies of books by "Dan Brown".
Further example query patterns can be found at the end of this document.
The queryOptions parameter allows for arbitrary extra parameters to be provided for certain query types, and also allows for the control of pagination. It takes the form of a string containing zero or more lines containing "name=value" tuples, for instance:
queryOptions can be null or the empty string if no options are being set.
Lines should be separated using a single '\n' (newline) character.
Pagination allows large result sets to be returned in manageable chunks (or 'pages'). A client-specified page size selects how many results will appear and the start index determines which subset of the results will be returned.
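Building the queryOptions string by hand is straightforward, but a small helper avoids stray separators. The following Python sketch assumes nothing beyond the "name=value" format described above; the function name build_query_options is hypothetical, and the option names used in the usage note are the graph search options described later in this document.

```python
def build_query_options(options):
    """Serialise a dict of options into the queryOptions string format.

    Hypothetical helper: one name=value pair per line, lines joined by a
    single newline character, no trailing newline.
    """
    lines = []
    for name, value in options.items():
        text = str(value)
        if "\n" in name or "=" in name or "\n" in text:
            raise ValueError("option cannot be represented: %r" % (name,))
        lines.append("%s=%s" % (name, text))
    return "\n".join(lines)
```

Calling build_query_options({"maxInboundDepth": -1, "maxOutboundDepth": -1, "maxNodes": 500}) yields three lines separated by '\n', and an empty dict yields the empty string, which is also an acceptable value for queryOptions.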
Clients can step forwards and backwards through the results as they wish, but only one set of paginated results can be maintained at any one time. If a new query is made, then any previously paginated results will be discarded.
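The relationship between the page size and the start index can be pictured as a slice over the full result list. This Python sketch is purely conceptual: the parameter names and the zero-based start index are assumptions, and it says nothing about how the server stores paginated results.

```python
def page_of(results, start_index, page_size):
    """Return the slice of results that one page would contain.

    Assumptions: start_index is zero-based and page_size is positive.
    The final page may be shorter than page_size.
    """
    if start_index < 0:
        raise ValueError("start_index must not be negative")
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return results[start_index:start_index + page_size]
```

Stepping forwards or backwards then amounts to adding or subtracting the page size from the start index.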
Pagination is not compatible with any of the ITEM_GRAPH_ return types. Asking for paginated results with one of these return types will cause an exception to be thrown.
NOTE: Pagination is not implemented yet and requests for pagination of the results will be ignored.
Given a starting item, the query engine will search 'upwards' for parent nodes (i.e. nodes which link to the starting item) and then 'downwards' for child nodes (i.e. nodes which the starting item is linked to).
The maximum search depths are controlled using the queryOptions parameter.
The inbound depth refers to the number of levels of parent nodes which link in to the starting node. This can be limited by setting the maxInboundDepth value.
The outbound depth refers to the number of levels of child nodes which the starting node links out to. This can be limited by setting the maxOutboundDepth value.
In each case, setting the value to zero or to some positive integer restricts the search to that depth. Setting the value to -1 (or indeed, any negative integer) asks for an unlimited depth.
The server will have a hard limit on the number of nodes that can appear in a graph. This value is set in the "omixed.properties" file that is part of the server deployment. If this value is exceeded during the search, then an exception will be thrown.
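The upwards and downwards searches with their depth limits can be sketched as two breadth-first walks. This is a simplified Python model, not the query engine's algorithm: the names graph_search, parent_depth and child_depth are hypothetical, a depth of zero is assumed to mean "the starting item only", and the model does not explore the children of parents or the parents of children.

```python
from collections import deque


def reachable(start, neighbours, max_depth):
    """Breadth-first walk from start; a negative max_depth means unlimited."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth >= 0 and depth >= max_depth:
            continue  # depth limit reached, do not expand this node
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


def graph_search(start, links, parent_depth, child_depth):
    """links maps each item to the items it links to (its children)."""
    reverse = {}
    for source, targets in links.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    parents = reachable(start, reverse, parent_depth)   # search 'upwards'
    children = reachable(start, links, child_depth)     # search 'downwards'
    return parents | children
```

With links = {"A": ["B"], "B": ["C"], "Z": ["A"], "Y": ["Z"]}, a search from "A" with both depths set to 1 finds A, Z and B, while both depths set to -1 finds every item.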
A user-specified maximum node limit can also be set using the maxNodes value. If this is greater than the server-side limit, then the server-side limit will be used instead.
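The interaction between the two limits reduces to taking the smaller value. A minimal Python sketch, with a hypothetical function name and the assumption that an absent maxNodes setting means the server-side limit applies:

```python
def effective_node_limit(server_limit, requested=None):
    """Return the node limit that would apply to a graph search.

    requested is the client's maxNodes value, or None if it was not set
    (assumed here to fall back to the server-side limit).
    """
    if requested is None:
        return server_limit
    return min(requested, server_limit)
```

For a server-side limit of 1000, a maxNodes of 500 gives 500 and a maxNodes of 5000 gives 1000.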
These settings ask for a thorough graph search (i.e. the depths are unlimited) but limit the total number of possible nodes in the response to 500:
anything owned by user Fred
anything which has 3 or more tags
anything owned by user Fred and which has 3 or more tags
Persons which have 3 or more tags
items of a particular item type (e.g. Part) that are owned by a particular person (e.g. Jeremy)
items which are tall, or have been tagged as such
items with the attribute 'Date of Birth' set to before 1950:
anything linked to some particular item
anything linked to something born before 1950:
items linked (via a link called "Employee") to items born before 1950:
items which are linked to a specific 'Company' via a link called 'Operator':
items which are linked to a specific 'Train' via a link called 'Operator':
items of itemType 'Train' which are linked to a specific 'Company'
items of itemType 'Train' which are linked to a specific 'Company' via a link called 'Operator':
items linked to items linked to items born before 1950:
items with more friends than enemies
items whose name begins with 'B' which have the word 'dodgy' in one or more messages in a discussion forum:
items in which the "Date of Birth" attribute is from 2005
When calling itemFind, the returnType parameter is used to select what format the results will be returned in. The following return types are available:
The actual return type of the method is a string or an XML element, depending on how the in-use language library chooses to expose things. If it is a string, it will just be the serialised form of the corresponding XML element.
The response for ITEM_ID is a single ItemIDList element containing one or more itemIDs.
There is one itemID per line; itemIDs are separated by a single '\n' (newline) character.
The above example contains extra CRs and whitespace for clarity. In reality, the first itemID occurs immediately after the end of the <ItemIDList> opening tag. Likewise there is no '\n' between the final itemID and the closing </ItemIDList>. There is no leading whitespace before itemIDs.
In pseudo-Java code, one way to parse the ItemIDList is:
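For readers working in Python rather than Java, the same parsing step might look like the sketch below. It assumes the response arrives as a serialised string, that the ItemIDList element carries no XML namespace, and that itemIDs contain no whitespace; the function name parse_item_id_list is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def parse_item_id_list(response: str) -> List[str]:
    """Extract the itemIDs from a serialised ItemIDList element."""
    root = ET.fromstring(response)
    if root.tag != "ItemIDList":
        raise ValueError("expected an ItemIDList element, got %r" % (root.tag,))
    text = root.text or ""
    # One itemID per line; blank lines and padding are tolerated so that
    # pretty-printed responses parse the same way as compact ones.
    return [line.strip() for line in text.split("\n") if line.strip()]
```

The stripping is defensive: according to the description above, a real response has no leading whitespace or surrounding newlines, so a plain split on '\n' would also work.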
The response for FULL_ITEM_DETAILS is an ItemList element containing one or more Item child elements, like this:
<Item> elements are constructed as follows:
Links are represented by a <Link> element which gives the name of the link and the itemID of the item being linked to.
Attributes are represented by an <Attribute> element which identifies the name and type of the attribute and carries the value of the attribute as a PCDATA section.
Files are represented by a <File> element which identifies the name and fileID of the uploaded file as well as details about the file itself: the originalName of the file when it was uploaded, the sizeInBytes of the file and the format for the file, as specified when it was uploaded.
Tags are represented using <Tag> elements, each detailing the name of the tag and its owner (the name of the user who added the tag). If the user making the request has permission to delete a tag, then the corresponding <Tag> will have a single <Deletable> child element.
Groups are arranged hierarchically using <Group> elements.
If there are any messages in the forum associated with the item, then a single <Forum> element will be present. This element will identify the forumID and information about the forum including: threadCount (the total number of threads), messageCount (the total number of messages), mostRecentPostTime (a timestamp for the last time a message was added) and mostRecentPoster (the name of the person who posted the most recent message).
If the user making the request has write permission for the item, then this will be indicated by the presence of a <Writable> element in the details for that item.
If the user making the request has permission to delete the item, then this will be indicated by the presence of a <Deletable> element in the details for that item.
The ordering of the child elements of the <Item> element is indeterminate, but <Group> hierarchies are guaranteed to match the layout of the item type in the schema definition.
The response for SHORT_ITEM_DETAILS is essentially the same as for FULL_ITEM_DETAILS, except that less information about the items is presented. Specifically, the tag and discussion fora information is omitted. This makes it cheaper for the server to generate, leading to a faster response time.
The response is an ItemList element containing one or more Item child elements, like this:
Compared to the response from FULL_ITEM_DETAILS, the Item element does not have the Tag or Forum child elements. Otherwise, everything else is the same.
The response for ITEM_GRAPH_FULL_DETAILS is a single ItemGraph element containing an item graph of one or more connected Item elements.
For each of the items which matched, there will be an Item element in the response. This element will recursively include all of the items that that item is linked to, and each of those items will contain the items that they are linked to and so on.
Each of the Item elements is essentially the same as that returned for the FULL_ITEM_DETAILS return type.
The difference is that Link elements will be expanded, i.e. instead of having the itemID of the linked item, a full Item element will be present instead. This is most easily shown by example. Firstly, recall how a link is represented in the FULL_ITEM_DETAILS response:
Compare this with the same portion of the ITEM_GRAPH_FULL_DETAILS response, and notice that the link is now expanded into a nested Item element:
IMPORTANT - because item graphs can potentially include cycles (e.g. item A is linked to item B and item B is linked to item A) special handling is required to prevent infinite expansion of the results. If a node appears more than once in a graph, its children will only be expanded on the first occurrence. Subsequent appearances of the node will be represented as ItemRef elements instead of Item elements.
The following example response assumes that the schema has an item type called "Person", and that "Person" has a link called "Friend" which points to another "Person" item. Three items exist: "Alex", "Bob" and "Carol". The "Friend" links have been set up so that "Alex" is friends with "Bob", who is friends with "Carol", who in turn is friends with "Alex". This means that a cycle exists. The following results for a search for "Alex" show how this is dealt with. The occurrence of "Alex" in the "Friend" link for "Carol" is handled using ItemRef because the information for "Alex" has already been presented in the results.
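The expand-once rule can be modelled with a short recursive function. This Python sketch uses nested dicts in place of XML and a depth-first traversal; both are assumptions made for illustration, and the order in which the server visits nodes (and therefore which occurrence counts as the first) may differ.

```python
def expand(item, links, seen=None):
    """Expand an item and its links into a nested structure.

    links maps each item name to the names it links to. An item that
    has already been expanded is emitted as an ItemRef, which stops
    cycles from expanding forever.
    """
    if seen is None:
        seen = set()
    if item in seen:
        return {"ItemRef": item}
    seen.add(item)
    return {
        "Item": item,
        "links": [expand(target, links, seen) for target in links.get(item, [])],
    }


# The "Friend" cycle described above: Alex -> Bob -> Carol -> Alex
friends = {"Alex": ["Bob"], "Bob": ["Carol"], "Carol": ["Alex"]}
graph = expand("Alex", friends)
```

Here graph nests Bob inside Alex and Carol inside Bob, and the innermost entry under Carol is {"ItemRef": "Alex"} rather than a second expansion of Alex.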
The response for ITEM_GRAPH_SHORT_DETAILS is essentially the same as for ITEM_GRAPH_FULL_DETAILS, except that less information about the items is presented. Specifically, the tag and discussion fora information is omitted. This makes it cheaper for the server to generate (and produces slightly more compact results) leading to a faster response time.
The response for ITEM_GRAPH_MINIMAL_DETAILS contains only the information about the item graph. No attribute, tag or forum data is included. Likewise, the Writable and Deletable markers are not provided.
This format is considerably more compact and quick to generate than ITEM_GRAPH_SHORT_DETAILS.
In the minimal format, the example above would be represented as follows: