uu.seUppsala University Publications
Change search
ReferencesLink to record
Permanent link

Direct link
Suffix trees on words
Uppsala University, Teknisk-naturvetenskapliga vetenskapsområdet, Mathematics and Computer Science, Department of Information Technology. Faculty of Science and Technology, Biology, Department of Ecology and Evolution, Computing Science.
1999 (English)In: Algorithmica, Vol. 23Article in journal (Refereed) Published
Abstract [en]

We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multi-character substrings or words. This word suffix tree represents only the m suffixes that start at word boundaries. These boundaries are determined by delimiters, whose definition depends on the application.

Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In general, construction cost is Omega(n) due to the need of scanning the entire input. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. In either case, this is a significant improvement over previously known solutions.

Furthermore, when the alphabet is small, we may assume that the $n$ characters in the input string occupy o(n) machine words. We illustrate that this can allow a word suffix tree to be built in sublinear time.

Place, publisher, year, edition, pages
1999. Vol. 23
Keyword [en]
suffix trees, algorithms. data structures
URN: urn:nbn:se:uu:diva-25787OAI: oai:DiVA.org:uu-25787DiVA: diva2:53561
Available from: 2007-02-13 Created: 2007-02-13 Last updated: 2011-01-14

Open Access in DiVA

No full text

Search in DiVA

By author/editor
Andersson, Arne
By organisation
Department of Information TechnologyComputing Science

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 277 hits
ReferencesLink to record
Permanent link

Direct link