These are projects related to the Web, XML and, more generally,
semistructured data.
The application areas for this work include e-business and e-learning.
My background is in databases, so I address the topics from a database perspective.
- Security of data on the Web. In the database world, it is possible to
define who can access each piece of data.
This project investigates whether it is possible and/or practical to apply
database security techniques to data on the web.
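To make the question concrete: one database technique is to give each class of user a restricted view of the data. The Python sketch below applies that idea to an XML document. A per-role deny list removes the elements a role may not see. The policy format is hypothetical, invented for illustration, and is not a standard. Real database security also covers updates, grants and revocation, which this sketch ignores.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical policy (not a standard format): for each role, the element
# tags that role is NOT allowed to see.
DENY = {
    "student": {"salary", "phone"},
    "staff": {"salary"},
    "admin": set(),
}

def view_for(role: str, doc: ET.Element) -> ET.Element:
    """Return a copy of doc with every element the role may not see removed.

    The root element itself is never removed.
    """
    denied = DENY[role]
    result = copy.deepcopy(doc)
    # Collect (parent, child) pairs first; removing while iterating is unsafe.
    to_remove = [
        (parent, child)
        for parent in result.iter()
        for child in parent
        if child.tag in denied
    ]
    for parent, child in to_remove:
        parent.remove(child)
    return result

doc = ET.fromstring(
    "<staff><person><name>Ann</name><phone>123</phone>"
    "<salary>100</salary></person></staff>"
)
print(ET.tostring(view_for("student", doc), encoding="unicode"))
# <staff><person><name>Ann</name></person></staff>
```

Like a database view, the filtered copy is computed per request, so the stored document is never changed.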
- The following three projects are related, so they can be done together or individually.
- describe a graphical representation for DTD
- describe a graphical representation for XML Schema
- describe a graphical representation for RELAX NG
How are these projects related?
If we have related graphical representations for all three XML validation
languages, it will allow us to compare the languages from an expressiveness
perspective. This is interesting because it would help users distinguish
between the languages and decide which one meets their needs.
Why is it useful to have graphical representations of the validation languages?
If we treat the validation languages as schema definition languages and want to
integrate schemas, graphical representations would let us view the schemas
visually. It is easier for humans to compare graphical representations than
textual ones. If the same graphical notation were used for each language, then
documents whose schemas are written in different languages could be integrated.
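As a possible starting point for the DTD project, the sketch below derives a graph from simple element declarations and emits it in Graphviz DOT format. Each element type becomes a node. Each child in a content model becomes an edge labelled with its occurrence indicator (`1`, `?`, `*` or `+`). This notation is an assumption chosen for illustration, not an established one. Choices (`|`) are flattened into plain edges. Declarations with nested groups or a group-level occurrence indicator are skipped. Attribute lists are ignored.

```python
import re

# Matches simple declarations such as <!ELEMENT book (title, author+, year?)>.
# Names are limited to \w characters; hyphenated names are out of scope.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)\s*>")
# One particle of a content model: a child name (or #PCDATA) plus indicator.
PARTICLE = re.compile(r"(#PCDATA|\w+)([?*+]?)")

def dtd_to_dot(dtd: str) -> str:
    """Render element declarations as a DOT digraph (parent -> child)."""
    lines = ["digraph dtd {"]
    for name, model in ELEMENT_DECL.findall(dtd):
        for child, indicator in PARTICLE.findall(model):
            if child == "#PCDATA":
                continue  # text content is not drawn as a node
            label = indicator or "1"  # no indicator means exactly once
            lines.append(f'  {name} -> {child} [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)

dtd = """
<!ELEMENT book (title, author+, year?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
"""
print(dtd_to_dot(dtd))
```

The same node-and-labelled-edge scheme could be reused for XML Schema and RELAX NG, which is what would make the three representations comparable.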
- optimize XQuery queries when they are applied to a nested relational database.
I am involved in a project with the National University of Singapore in which
we have defined a data model for semistructured data, called ORA-SS
(Object-Relationship-Attribute model for semistructured data).
We have defined a mapping from XML via ORA-SS to object relational databases.
The challenge now is to map queries that are expressed in XQuery to queries
against the nested relational database and, furthermore, to optimize those
queries against the database. We are likely to use Oracle.
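A deliberately naive first step is translating a path expression into joins. The sketch below assumes a flat, hypothetical storage scheme with one table per element type and a parent pointer. That is not the ORA-SS nested relational mapping. It uses SQLite only so that it runs anywhere, whereas the project itself would likely target Oracle. The one-join-per-step query it produces is the kind of query an optimizer would try to simplify, for example by reading a nested table instead of joining.

```python
import sqlite3

def path_to_sql(path: str) -> str:
    """Translate a child-axis path such as /dept/course/title into SQL.

    Assumes a hypothetical flat shredding in which every element type E is
    stored in a table E(id, parent_id, value); this is not the ORA-SS mapping.
    Step names are interpolated directly, so they must be trusted identifiers.
    """
    steps = [s for s in path.split("/") if s]
    last = len(steps) - 1
    sql = f"SELECT t{last}.value FROM {steps[0]} t0"
    # One join per step: each child row points at its parent row.
    for i in range(1, len(steps)):
        sql += f" JOIN {steps[i]} t{i} ON t{i}.parent_id = t{i - 1}.id"
    # The first step of an absolute path must match a root element.
    return sql + " WHERE t0.parent_id IS NULL"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept(id INTEGER PRIMARY KEY, parent_id INTEGER, value TEXT);
CREATE TABLE course(id INTEGER PRIMARY KEY, parent_id INTEGER, value TEXT);
CREATE TABLE title(id INTEGER PRIMARY KEY, parent_id INTEGER, value TEXT);
INSERT INTO dept VALUES (1, NULL, 'CS');
INSERT INTO course VALUES (10, 1, NULL), (11, 1, NULL);
INSERT INTO title VALUES (100, 10, 'Databases'), (101, 11, 'XML');
""")
sql = path_to_sql("/dept/course/title")
print(sql)
print(sorted(row[0] for row in conn.execute(sql)))  # ['Databases', 'XML']
```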
- define an extended version of XML, or perhaps several extended versions
of XML. XML is a very simple language that is used for many purposes, but
currently it does not serve many of them very well.
Perhaps extending it would make it more useful in general, or perhaps a given
extension would make it more useful for one purpose but not another.
Identify the purposes that XML is used for (like data integration), and for
each purpose consider whether extending XML would make it better for that
purpose.
Once this is done, decide whether there are possible extensions that apply
across all purposes, in which case it would be useful to extend XML in those
directions. If the extensions are different for the different purposes, then
it may be useful to have different versions of XML for different purposes.
I suspect that much of what is missing is metadata, that is, data that
describes the semantics of the XML data. It may make more sense to store this
metadata using a different language for each purpose. What are the different
possibilities, and what are the advantages and disadvantages of each approach?
- do a case study where you translate data from a real-world relational
database to XML. You will need to develop the mapping, or perhaps a set of
mappings from a relational database to XML, then apply them. Discuss which
mapping is most appropriate in different situations, and design an algorithm
or build a tool to do the mappings.
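Two obvious candidate mappings are sketched below. Both assume a hypothetical two-table schema (`dept` and `emp`). The flat mapping mirrors each table. The nested mapping follows a foreign key. The flat form is generic and stays close to the original tables. The nested form avoids repeating department data, but it commits to one nesting direction. Deciding which is better in a given situation is part of the case study.

```python
import sqlite3
import xml.etree.ElementTree as ET

def flat_mapping(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Mapping 1: one <row> element per tuple, one child element per column."""
    root = ET.Element(table)
    cur = conn.execute(f"SELECT * FROM {table}")  # table name must be trusted
    columns = [d[0] for d in cur.description]
    for row in cur.fetchall():
        row_el = ET.SubElement(root, "row")
        for col, val in zip(columns, row):
            ET.SubElement(row_el, col).text = "" if val is None else str(val)
    return root

def nested_mapping(conn: sqlite3.Connection) -> ET.Element:
    """Mapping 2: follow the emp.dept_id foreign key and nest emps in depts."""
    root = ET.Element("company")
    depts = conn.execute("SELECT id, name FROM dept ORDER BY id").fetchall()
    for dept_id, name in depts:
        dept_el = ET.SubElement(root, "dept", name=name)
        emps = conn.execute(
            "SELECT name FROM emp WHERE dept_id = ? ORDER BY name", (dept_id,)
        ).fetchall()
        for (emp_name,) in emps:
            ET.SubElement(dept_el, "emp").text = emp_name
    return root

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp(id INTEGER PRIMARY KEY, name TEXT,
                 dept_id INTEGER REFERENCES dept(id));
INSERT INTO dept VALUES (1, 'Sales'), (2, 'Research');
INSERT INTO emp VALUES (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cy', 1);
""")
print(ET.tostring(flat_mapping(conn, "dept"), encoding="unicode"))
print(ET.tostring(nested_mapping(conn), encoding="unicode"))
```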
- build a data management tool for semistructured data.
Such a tool would have a diagram editor to depict ORA-SS diagrams, and a way
to map the data in the ORA-SS diagrams to XML and object relational databases
(like Oracle).
- build a document management system for the department course pages,
providing search facilities to students and others wanting to view the pages,
and update facilities to staff needing to update the pages.
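The search facility could rest on an inverted index that maps each word to the pages containing it. The sketch below is a minimal in-memory version. The page URLs and text are invented for illustration. A real system would need persistent storage and access control for staff updates, and would probably use an existing search engine rather than hand-written code.

```python
import re
from collections import defaultdict

class PageIndex:
    """Minimal in-memory inverted index: word -> set of page URLs."""

    def __init__(self) -> None:
        self.index = defaultdict(set)  # word -> URLs of pages containing it
        self.pages = {}                # URL -> current page text

    @staticmethod
    def _words(text: str) -> set:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def update(self, url: str, text: str) -> None:
        """Add or replace a page (the staff-side update facility)."""
        old = self.pages.get(url)
        if old is not None:
            # Drop the old version's words before indexing the new text.
            for word in self._words(old):
                self.index[word].discard(url)
        self.pages[url] = text
        for word in self._words(text):
            self.index[word].add(url)

    def search(self, query: str) -> set:
        """Return the pages containing every word of the query."""
        words = self._words(query)
        if not words:
            return set()
        return set.intersection(*(self.index.get(w, set()) for w in words))

idx = PageIndex()
idx.update("/courses/cs1", "Introduction to databases and SQL")
idx.update("/courses/cs2", "XML and semistructured data")
print(idx.search("XML data"))  # {'/courses/cs2'}
```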
- People often need to integrate or merge information: for example, when
companies merge, when engineering specifications are combined, or when
information gathered from disparate web sites is displayed on a common web site.
Usually the documents were not designed to be integrated so the same concepts
in different documents may have different names and/or different structure.
XML (eXtensible Markup Language) and its associated schema definition languages
provide an excellent syntax for structuring the data, but they do not provide
enough information to judge where the documents match, or what a sensible
common view of the data is.
The aim of this project is to build a system that provides a visual
representation of the documents or data to be integrated, does some automatic
matching where possible, allows the user to define other matches, and checks
whether the resulting common view is valid. The final output of the system
would be a common view of the documents. The system will be based on Java
and a semistructured data model called ORA-SS (Object-Relationship-Attribute
data model for semistructured data).
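The automatic matching step could start from element names alone. The sketch below scores pairs of names with Python's `difflib.SequenceMatcher` plus a small synonym table, then greedily proposes one-to-one matches. This is an assumed, purely lexical strategy, and the synonym pairs and threshold are hypothetical. Structural differences are not considered, and everything below the threshold is left for the user to match.

```python
from difflib import SequenceMatcher

# Hypothetical synonym pairs; a real system would let the user extend these.
SYNONYMS = {("surname", "lastname"), ("phone", "telephone")}

def normalize(name: str) -> str:
    """Case-fold and drop separators so that last_name matches LastName."""
    return name.lower().replace("_", "").replace("-", "")

def similarity(a: str, b: str) -> float:
    a, b = normalize(a), normalize(b)
    if a == b or (a, b) in SYNONYMS or (b, a) in SYNONYMS:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()

def suggest_matches(left, right, threshold=0.8):
    """Greedily pair names from two schemas, best score first.

    Names left unpaired (score below threshold) are for the user to match.
    """
    candidates = sorted(
        ((similarity(a, b), a, b) for a in left for b in right), reverse=True
    )
    used_left, used_right, matches = set(), set(), []
    for score, a, b in candidates:
        if score < threshold:
            break  # candidates are sorted, so no later pair can qualify
        if a not in used_left and b not in used_right:
            matches.append((a, b, round(score, 2)))
            used_left.add(a)
            used_right.add(b)
    return matches

left = ["Name", "Surname", "Phone", "Dept"]
right = ["name", "last_name", "telephone", "salary"]
print(suggest_matches(left, right))  # Dept and salary stay unmatched
```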
- Mining association rules in semistructured data.
There has been a lot of work on data mining in the relational database
area. Some of this work has in turn been extended to the object relational
database area, e.g. some authors have shown how SQL3 can be used to mine
association rules. In this project, I would like to investigate whether
XQuery is expressive enough to define the queries necessary when mining
association rules from semistructured data. If it isn't expressive enough,
what features must be added, and what features would it be nice to have?
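For a rule X ⇒ Y, the support is the fraction of transactions that contain both X and Y. The confidence is the number of transactions containing both, divided by the number containing X. The Python sketch below computes these quantities over a hypothetical XML document of shopping baskets. It is restricted to rules between single items, roughly the first levels of an Apriori-style computation. It shows the grouping and counting that the XQuery formulation would have to express.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical data: each <basket> is one transaction of <item>s.
DOC = """
<purchases>
  <basket><item>bread</item><item>milk</item></basket>
  <basket><item>bread</item><item>butter</item><item>milk</item></basket>
  <basket><item>beer</item><item>bread</item></basket>
  <basket><item>milk</item><item>butter</item></basket>
</purchases>
"""

def pair_rules(xml_text, min_support=0.5, min_confidence=0.6):
    """Return (x, y, support, confidence) for rules x => y between items."""
    baskets = [
        frozenset(item.text for item in basket.findall("item"))
        for basket in ET.fromstring(xml_text).findall("basket")
    ]
    n = len(baskets)
    single = Counter(item for b in baskets for item in b)
    # Count each unordered pair once per basket.
    pairs = Counter(p for b in baskets for p in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pairs.items():
        if count / n < min_support:
            continue  # infrequent pair: no rule in either direction
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]
            if confidence >= min_confidence:
                rules.append((x, y, count / n, confidence))
    return sorted(rules)

for x, y, support, confidence in pair_rules(DOC):
    print(f"{x} => {y}: support {support:.2f}, confidence {confidence:.2f}")
```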