Oracle Text Russian morphology (Russian with Oracle Text 11g)

Here is how it was done.
1) A ctxsys.context index was built over a large collection of assorted text documents (several million web pages) using AUTO_LEXER with Russian stemming. From the index token table DR$(INDEX_NAME)$I, I extracted every word form that occurred in the texts, for now still in Cyrillic. These words were collected into a table, say WORDS_RU.
2) Next, another ctxsys.context index was built on the WORDS_RU table, also using AUTO_LEXER with Russian stemming. A procedure was written that ran a full-text search for every word of WORDS_RU against that index. This yielded the normal forms for each word. A bit convoluted, of course...
3) The resulting Cyrillic words and their normal forms (nominative case, singular, present tense, etc.) were then transliterated, and a custom dictionary of forms (a stemming user dictionary), dren.dct, was built from them and placed in the folder $ORACLE_HOME/ctx/data/enlx/. I am happy to share this dictionary by personal email.
Now almost everything is ready.
4) To build the ctxsys.context index, use either PROCEDURE_FILTER or USER_DATASTORE to transliterate the text being indexed. The texts themselves stay in Russian, while the index holds the transliterated words. Oh yes: the transliteration had to be done in Java, because using something native, such as TRANSLATE, is orders of magnitude slower.
BASIC_LEXER: 'index_stems' 'ENGLISH'
BASIC_WORDLIST: 'stemmer' 'ENGLISH', 'fuzzy_match' 'GENERIC', 'fuzzy_score' '70', 'fuzzy_numresults' '10'
Build the index, and all the features available for English now work.
5) When searching, of course, transliterate the query first and only then pass it as the parameter to contains.
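
The transliteration applied in steps 4 and 5 might look roughly like the Java sketch below. It is not the author's code: the class name Translit, the method toLatin and the mapping table are all assumptions made for illustration, and a real scheme only has to match whatever was used to build dren.dct. The hard and soft signs are mapped to letter sequences rather than apostrophes, on the assumption that tokens should consist of letters only so that the lexer does not split them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps Russian Cyrillic letters to ASCII letter sequences.
final class Translit {
    private static final Map<Character, String> TABLE = new HashMap<>();

    static {
        // Latin replacements for U+0430..U+044F (the 32 lowercase letters
        // that are contiguous in Unicode, in code point order).
        // The table itself is an illustrative assumption.
        String[] lat = {
            "a", "b", "v", "g", "d", "e", "zh", "z", "i", "j", "k", "l", "m",
            "n", "o", "p", "r", "s", "t", "u", "f", "h", "c", "ch", "sh",
            "shh", "qq", "y", "q", "eh", "yu", "ya"
        };
        for (int i = 0; i < lat.length; i++) {
            TABLE.put((char) (0x0430 + i), lat[i]);
        }
        // U+0451 (yo) sits outside the contiguous range.
        TABLE.put('\u0451', "yo");
    }

    private Translit() {
    }

    // Returns the text with every Russian letter replaced by its Latin
    // sequence; Cyrillic input is lowercased first, all other characters
    // are copied through unchanged.
    static String toLatin(String text) {
        StringBuilder out = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char original = text.charAt(i);
            String mapped = TABLE.get(Character.toLowerCase(original));
            if (mapped != null) {
                out.append(mapped);
            } else {
                out.append(original);
            }
        }
        return out.toString();
    }
}
```

The same function would be applied in both places: to the document text inside the datastore or filter procedure, and to the user's query before it is passed to contains. This particular table is not collision-free (с followed by х and the single letter ш both come out as sh), which may or may not matter for a given corpus.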

Using the data in the WORDS_RU table, both the indexed text and the queries can also be normalized. However, our experiments showed no tangible performance gain in indexing or search from such transformations.

© 2010 - 2019 D@nVitLabs