The tgwf XML schema

The tgwf schema serves two purposes: it spares workflow authors from having to know the semantics of GridWorkflowDL or Petri nets, which are far more complex, and it accounts for some requirements specific to TextGrid workflows. A tgwf document is transformed automatically into GridWorkflowDL by an XSLT stylesheet (see below).

The schema is reproduced below (it contains some inline documentation), followed by an example tgwf document.

tgwf.xsd XML schema

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" xmlns:tgwf="http://textgrid.info/namespaces/middleware/workflow" targetNamespace="http://textgrid.info/namespaces/middleware/workflow">

  <xs:annotation>
    <xs:documentation>
      Defines a simplified Workflow document in TextGrid. A tgwf
      document written by the user will be completed by the
      TextGridLab Workflow component, then xsl-transformed into a
      GridWorkflowDL document which can be processed by the GWES
      Workflow Engine.
    </xs:documentation>
  </xs:annotation>

  <xs:element name="tgwf">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tgwf:description"/>
        <xs:element ref="tgwf:activities"/>
        <xs:element ref="tgwf:datalinks"/>
        <xs:element ref="tgwf:CRUD"/>
        <xs:element ref="tgwf:batchinput"/>
        <xs:element ref="tgwf:metadatatransformation"/>
        <xs:element ref="tgwf:inputconstants"/>
      </xs:sequence>
      <xs:attribute name="version" use="required" type="xs:decimal" fixed="0.5"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="description" type="xs:string">
    <xs:annotation>
      <xs:documentation>
        Description will not be processed and is solely for the
        writer. The title of the workflow will be taken from the title
        of the TextGridObject holding this tgwf document.
      </xs:documentation>
    </xs:annotation>
  </xs:element>

  <xs:element name="activities">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" minOccurs="0" ref="tgwf:service"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="service">
    <xs:annotation>
      <xs:documentation>
        The services proper that will process the _contents_ of the
        TGOs. All data is transferred SOAP-inline, base64-encoded, so
        the services will have to be compatible. CRUDread and
        CRUDcreate for Grid access and StreamingEditor for metadata
        transformation will be inserted automatically.
      </xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:attribute name="description" use="required"/>
      <xs:attribute name="name" use="required" type="xs:NCName">
        <xs:annotation>
          <xs:documentation>
            name for visualisation of workflow
          </xs:documentation>
        </xs:annotation>
      </xs:attribute>
      <xs:attribute name="operation" use="required" type="xs:anyURI">
        <xs:annotation>
          <xs:documentation>
            the operation to be invoked from this WSDL
          </xs:documentation>
        </xs:annotation>
      </xs:attribute>
      <xs:attribute name="serviceID" use="required" type="xs:NCName">
        <xs:annotation>
          <xs:documentation>
            this ID will be used throughout this tgwf document to
            refer to this service
          </xs:documentation>
        </xs:annotation>
      </xs:attribute>
      <xs:attribute name="targetNamespace" use="required" type="xs:anyURI">
        <xs:annotation>
          <xs:documentation>
            If the WSDL specifies a targetNamespace, its value can be
            given here.
          </xs:documentation>
        </xs:annotation>
      </xs:attribute>
      <xs:attribute name="usetns" type="xs:boolean">
        <xs:annotation>
          <xs:documentation>
            Set to true to tell the Workflow Engine that the message
            parameters should be prefixed with the targetNamespace
            given. Hint: set to true if the schema definition part in
            the WSDL has elementFormDefault="qualified". If you
            interact with a Web Service written in a
            namespace-ignorant language (such as PHP, Python, Perl, or
            Tcl), usetns should probably be false.
          </xs:documentation>
        </xs:annotation>
      </xs:attribute>
      <xs:attribute name="wsdlLocation" use="required" type="xs:anyURI"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="datalinks">
    <xs:annotation>
      <xs:documentation>
        Determines how data flows from one service to another,
        i.e. which output parameter in fromService yields the data and
        which input parameter in toService will receive it. Use
        crud/batchinput for fromServiceID/fromParam when the link
        leads to a toService that should receive the data as read
        from the Grid. Similarly, the fromService that delivers the
        final data must have a link to crud/batchoutput. Caveat:
        consistency checks are not performed yet, so an inconsistent
        workflow might fail or loop.
      </xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" minOccurs="1" ref="tgwf:link"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="link">
    <xs:complexType>
      <xs:attribute name="linkID" use="required" type="xs:NCName"/>
      <xs:attribute name="fromServiceID" use="required" type="xs:NCName">
        <xs:annotation>
          <xs:documentation>
            the ServiceID as specified in the activities element for
            the service that yields data
          </xs:documentation>
        </xs:annotation>
      </xs:attribute>
      <xs:attribute name="fromParam" use="required" type="xs:NCName">
        <xs:annotation>
          <xs:documentation>
            the output parameter of the fromServiceID which serves the
            data for this link
          </xs:documentation>
        </xs:annotation>
      </xs:attribute>
      <xs:attribute name="toServiceID" use="required" type="xs:NCName">
        <xs:annotation>
          <xs:documentation>
            the ServiceID as specified in the activities element, of
            the service that receives the data
          </xs:documentation>
        </xs:annotation>
      </xs:attribute>
      <xs:attribute name="toParam" use="required" type="xs:NCName">
        <xs:annotation>
          <xs:documentation>
            the input parameter of the toServiceID which accepts
            the data for this link
          </xs:documentation>
        </xs:annotation>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

  <xs:element name="CRUD">
    <xs:annotation>
      <xs:documentation>
        attribute values to be filled in automatically by the TextGridLab
      </xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:attribute name="instance" use="required" type="xs:string"/>
      <xs:attribute name="logParameter" use="required" type="xs:string"/>
      <xs:attribute name="sessionID" use="required" type="xs:string"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="batchinput">
    <xs:annotation>
      <xs:documentation>
        URIs of the input TextGridObjects, to be filled in
        automatically by the TextGridLab
      </xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tgwf:URI" maxOccurs="unbounded" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="URI" type="xs:anyURI"/>

  <xs:element name="metadatatransformation">
    <xs:annotation>
      <xs:documentation>
        This contains the XSL stylesheet for rule-based transformation
        of the metadata, e.g. setting a new ProjectID, appending text
        to the title, or adding an editor. Please consult an example
        stylesheet for the current TextGridMetadata if you plan to
        write a new one.
      </xs:documentation>
    </xs:annotation>
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="inputconstants">
    <xs:annotation>
      <xs:documentation>
        configuration parameters for the services used in this
        workflow
      </xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" minOccurs="0" ref="tgwf:activity"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="activity">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tgwf:const" maxOccurs="unbounded" minOccurs="1"/>
      </xs:sequence>
      <xs:attribute name="serviceID" use="required" type="xs:NCName">
        <xs:annotation>
          <xs:documentation>
            the ServiceID as specified in the activities element
          </xs:documentation>
        </xs:annotation>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

  <xs:element name="const">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="name" use="required" type="xs:NCName">
        <xs:annotation>
          <xs:documentation>
            the name of this input parameter
          </xs:documentation>
        </xs:annotation>
      </xs:attribute>
      <xs:attribute name="needsB64encoding" type="xs:boolean">
        <xs:annotation>
          <xs:documentation>
            set to true if this parameter, as the content data, has
            to be encoded in Base64 for the service
          </xs:documentation>
        </xs:annotation>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

</xs:schema>
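
The top-level structure that the schema prescribes for a tgwf document (a fixed sequence of seven child elements and a version attribute fixed to 0.5) can be pre-checked without a full XSD validator. The following Python sketch uses only the standard library. The function name `check_toplevel` is illustrative and not part of TextGrid, and the check is deliberately loose; it does not replace validation against tgwf.xsd.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

TGWF_NS = "http://textgrid.info/namespaces/middleware/workflow"

# Child elements of tgwf:tgwf in the order required by the xs:sequence.
EXPECTED_CHILDREN = [
    "description", "activities", "datalinks", "CRUD",
    "batchinput", "metadatatransformation", "inputconstants",
]


def check_toplevel(xml_text):
    """Return a list of top-level problems (hypothetical helper; empty if none)."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{TGWF_NS}}}tgwf":
        return [f"unexpected root element {root.tag}"]
    problems = []
    # version is an xs:decimal fixed to 0.5, so compare by numeric value.
    try:
        if Decimal(root.get("version", "").strip()) != Decimal("0.5"):
            problems.append("version must be 0.5")
    except InvalidOperation:
        problems.append("version must be 0.5")
    # Each child must occur exactly once, in schema order.
    found = [child.tag for child in root]
    expected = [f"{{{TGWF_NS}}}{name}" for name in EXPECTED_CHILDREN]
    if found != expected:
        problems.append("children must be, in order: " + ", ".join(EXPECTED_CHILDREN))
    return problems
```

An empty result only means that the skeleton is in place; attribute types and the content of the individual elements still need a schema-aware validator.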

Example tgwf document

This document defines a two-service pipeline: TextGridObjects are read from the Grid and sent to the TextGrid Tokenizer, then to the Lemmatizer, and the results are written back as new TextGridObjects. See figure XXX for a graphical representation of this workflow in GridWorkflowDL.

<?xml version="1.0" encoding="UTF-8"?>
<tgwf:tgwf xmlns:tgwf="http://textgrid.info/namespaces/middleware/workflow" version="0.5">
  <tgwf:description>
    Lemmatizer Workflow with prepended Tokenizer
  </tgwf:description>
  <tgwf:activities>
    <tgwf:service description="TextGrid Tokenizer"
                  name="Tokenizer"
                  operation="Tokenizer64"
                  serviceID="tok"
                  targetNamespace="http://namespaces.textgrid.de/"
                  wsdlLocation="http://ingrid.sub.uni-goettingen.de/Tokenizer.wsdl"/>
    <tgwf:service operation="LemmatizerTEIBatch64"
                  wsdlLocation="http://ingrid.sub.uni-goettingen.de/lemmatizer_doc.wsdl"
                  name="Lemmatizer"
                  description="The TextGrid New German Lemmatizer"
                  serviceID="lem"
                  targetNamespace="http://namespaces.textgrid.de/"/>
  </tgwf:activities>
  <tgwf:datalinks>
    <tgwf:link linkID="read" fromServiceID="crud" fromParam="batchinput"
               toServiceID="tok" toParam="indata"/>
    <tgwf:link linkID="Tok2Lem" fromServiceID="tok" fromParam="outdata"
               toServiceID="lem" toParam="infile"/>
    <tgwf:link linkID="write" fromServiceID="lem" fromParam="outfile"
               toServiceID="crud" toParam="batchoutput"/>
  </tgwf:datalinks>
  <tgwf:CRUD instance="inserted automatically"
             sessionID="inserted automatically"
             logParameter="inserted automatically"/>
  <tgwf:batchinput/>
  <tgwf:metadatatransformation>
    <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> ... </xsl:transform>
  </tgwf:metadatatransformation>
  <tgwf:inputconstants>
    <tgwf:activity serviceID="tok">
      <tgwf:const name="config" needsB64encoding="true">
        <TokenizerConfig>...</TokenizerConfig>
      </tgwf:const>
    </tgwf:activity>
    <tgwf:activity serviceID="lem">...</tgwf:activity>
  </tgwf:inputconstants>
</tgwf:tgwf>
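
Because the schema documentation notes that consistency checks on the datalinks are not made yet, a workflow author may want to check a document before submitting it. The following Python sketch, using only the standard library, verifies that every link refers to a declared serviceID or to the pseudo-service crud, and that there is a link from crud/batchinput and a link to crud/batchoutput. The function name `check_datalinks` and the abbreviated sample document (activities and datalinks only, so it is not schema-valid by itself) are illustrative assumptions, not part of TextGrid. Loops and the parameter names defined in the WSDLs are not checked.

```python
import xml.etree.ElementTree as ET

NS = {"tgwf": "http://textgrid.info/namespaces/middleware/workflow"}


def check_datalinks(xml_text):
    """Return a list of datalink problems (hypothetical helper; empty if none)."""
    root = ET.fromstring(xml_text)
    declared = {s.get("serviceID")
                for s in root.findall("tgwf:activities/tgwf:service", NS)}
    known = declared | {"crud"}  # "crud" stands for Grid access
    problems = []
    has_input = has_output = False
    for link in root.findall("tgwf:datalinks/tgwf:link", NS):
        link_id = link.get("linkID")
        src, dst = link.get("fromServiceID"), link.get("toServiceID")
        for role, sid in (("fromServiceID", src), ("toServiceID", dst)):
            if sid not in known:
                problems.append(f"link {link_id}: unknown {role} {sid!r}")
        if src == "crud" and link.get("fromParam") == "batchinput":
            has_input = True
        if dst == "crud" and link.get("toParam") == "batchoutput":
            has_output = True
    if not has_input:
        problems.append("no link from crud/batchinput")
    if not has_output:
        problems.append("no link to crud/batchoutput")
    return problems


# Abbreviated copy of the example above: only the parts the check reads.
SAMPLE = """
<tgwf:tgwf xmlns:tgwf="http://textgrid.info/namespaces/middleware/workflow" version="0.5">
  <tgwf:activities>
    <tgwf:service serviceID="tok"/>
    <tgwf:service serviceID="lem"/>
  </tgwf:activities>
  <tgwf:datalinks>
    <tgwf:link linkID="read" fromServiceID="crud" fromParam="batchinput"
               toServiceID="tok" toParam="indata"/>
    <tgwf:link linkID="Tok2Lem" fromServiceID="tok" fromParam="outdata"
               toServiceID="lem" toParam="infile"/>
    <tgwf:link linkID="write" fromServiceID="lem" fromParam="outfile"
               toServiceID="crud" toParam="batchoutput"/>
  </tgwf:datalinks>
</tgwf:tgwf>
"""

print(check_datalinks(SAMPLE))  # prints []
```

A mistyped serviceID in a link, for instance toServiceID="lemma" instead of "lem", would be reported with the linkID of the offending link.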