How do you spellcheck a website with millions of pages? Assuming, of course, that you don’t feel inclined to copy-and-paste each page into Microsoft Word… Following the release of the open-source previews of the Isoxya plugins Crawler HTML and Elasticsearch 2.0, I’m pleased to announce the Isoxya plugin: Spellchecker 2.0 open-source (BSD 3-Clause) preview. Everything has been upgraded to work with the Isoxya 2.0 JSON interfaces, and English (British, American), Czech, German, Spanish (European), Estonian, French, and Dutch are supported.
This completes the release of the open-source previews planned for the new Isoxya 2.0 plugins. Example JSON payloads and links to the source code are available here.
what is it?
Isoxya plugin: Spellchecker is an open-source (BSD 3-Clause) processor plugin for the Isoxya web crawler. This plugin uses the Isoxya 2.0 JSON interfaces to provide spellchecking capabilities to entire websites, even if they have millions of pages.
The spellchecker backend is Hunspell, the same spellchecker used in LibreOffice, Mozilla Firefox, Mozilla Thunderbird, Google Chrome, and various proprietary programs. Support for several languages is included out of the box.
Since Isoxya supports both processor and streamer plugins using the Isoxya interfaces, this plugin is only one of many possibilities for processing human-language or other webpage data.
what does it support?
CODE | LANGUAGE | VARIANTS |
---|---|---|
en * | English | gb (BrE), us (AmE) |
cs | Czech | cz |
de | German | de |
es | Spanish | es (European) |
et | Estonian | ee |
fr | French | fr |
nl | Dutch | nl |

*: this is the default, if no language or variant is specified
Many other languages can be added easily, since both Hunspell and MySpell dictionaries can be used. If a dictionary is available in the build OS, it can probably be added, with appropriate tests and extensions to the Isoxya engine interface.
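To make the table above concrete, here is a minimal Python sketch that validates a language code and variant against it and builds dictionary names in the conventional Hunspell `language_REGION` style (for example `en_GB`). The `SUPPORTED` mapping and the `dictionary_names` function are hypothetical names used for illustration only, and the assumption that the plugin resolves dictionaries this way is mine, not something confirmed by the plugin source.

```python
# Languages and variants copied from the table above.
SUPPORTED = {
    "en": ["gb", "us"],
    "cs": ["cz"],
    "de": ["de"],
    "es": ["es"],
    "et": ["ee"],
    "fr": ["fr"],
    "nl": ["nl"],
}

# The table marks "en" as the default language.
DEFAULT_LANGUAGE = "en"


def dictionary_names(language=DEFAULT_LANGUAGE, variant=None):
    """Return candidate dictionary names for a language and optional variant.

    Assumption: names follow the usual Hunspell file naming pattern,
    e.g. "en_GB". When no variant is given, every listed variant is
    returned rather than guessing which one is preferred.
    """
    if language not in SUPPORTED:
        raise ValueError(f"unsupported language: {language}")
    variants = SUPPORTED[language]
    if variant is not None:
        if variant not in variants:
            raise ValueError(f"unsupported variant for {language}: {variant}")
        variants = [variant]
    return [f"{language}_{v.upper()}" for v in variants]


if __name__ == "__main__":
    print(dictionary_names())            # default language, all variants
    print(dictionary_names("en", "us"))  # a specific variant
    print(dictionary_names("et"))        # a language with a single variant
```

Adding a language in this sketch would mean adding one entry to `SUPPORTED`, which mirrors the point that new dictionaries need matching changes on the interface side.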
Example:
[
  {
    "paragraph": "Global heating is increesing droughts, soil erosion and wildfires while diminishing crop yields in the tropics and thawing permafrost near the Poles, says the report by the Intergovernmental Panel on Climate Change.",
    "results": [
      {
        "correct": false,
        "offset": 19,
        "status": "miss",
        "suggestions": [
          "increasing",
          "screening",
          "resining",
          "cresting",
          "resisting"
        ],
        "word": "increesing"
      }
    ]
  }
]
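As a rough illustration of how such output might be consumed downstream, the following Python sketch parses the example above and lists each misspelt word with its first suggestion. The `misspellings` function is a hypothetical helper, not part of the plugin. It also assumes that `offset` is 1-based, which matches this single example but is not confirmed in general.

```python
import json

# Output copied from the example above.
SAMPLE = """
[
  {
    "paragraph": "Global heating is increesing droughts, soil erosion and wildfires while diminishing crop yields in the tropics and thawing permafrost near the Poles, says the report by the Intergovernmental Panel on Climate Change.",
    "results": [
      {
        "correct": false,
        "offset": 19,
        "status": "miss",
        "suggestions": ["increasing", "screening", "resining", "cresting", "resisting"],
        "word": "increesing"
      }
    ]
  }
]
"""


def misspellings(payload):
    """Collect every result not marked correct, with its first suggestion."""
    found = []
    for block in json.loads(payload):
        paragraph = block["paragraph"]
        for result in block["results"]:
            if result["correct"]:
                continue
            word = result["word"]
            # Assumption: offsets are 1-based character positions.
            start = result["offset"] - 1
            suggestions = result["suggestions"]
            found.append({
                "word": word,
                "in_text": paragraph[start:start + len(word)],
                "best_guess": suggestions[0] if suggestions else None,
            })
    return found


if __name__ == "__main__":
    for item in misspellings(SAMPLE):
        print(f'{item["word"]} -> {item["best_guess"]}')
```

Comparing `in_text` with `word` is a cheap sanity check on the offset assumption before using the offsets to highlight or patch text.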
is it ready?
Not yet, no. Work continues towards the goal of Isoxya 2.0, which for the first time is planned to include a Community Edition, making not only the plugins open-source, but also a minimal edition of the core crawling engine. Stay tuned for more information and previews as this work progresses.