Extensible full-text search

Full-text search can be extended to any file format, provided a command-line (CLI) tool exists that converts that format to plain text.
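A minimal sketch of how such a converter hook can work. It assumes the indexer picks a command template by the file's lower-cased extension, substitutes the file path for $IN and a temporary file for $OUT (as in the Indexer.toTxt.* entries on this page), runs the command through a shell, and indexes whatever text ends up in $OUT. The names TO_TXT, build_command and convert_to_text are hypothetical, and quoting the substituted paths is an assumption; the product's own implementation may differ.

```python
import os
import re
import shlex
import subprocess
import tempfile
from typing import Dict, Optional

# Hypothetical lookup table mirroring two Indexer.toTxt.* entries on this page.
TO_TXT: Dict[str, str] = {
    "doc": "antiword $IN > $OUT",
    "rtf": "unrtf --nopict --text $IN 2>/dev/null | grep -v '^### ' > $OUT",
}


def build_command(template: str, in_path: str, out_path: str) -> str:
    """Replace $IN and $OUT in one pass (quoting the paths is an assumption)."""
    paths = {"IN": shlex.quote(in_path), "OUT": shlex.quote(out_path)}
    return re.sub(r"\$(IN|OUT)\b", lambda m: paths[m.group(1)], template)


def convert_to_text(path: str, table: Dict[str, str]) -> Optional[str]:
    """Run the converter configured for path's extension and return its text."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    template = table.get(ext)
    if template is None:
        return None  # no converter configured: the file's content is not extracted
    fd, out_path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # shell=True because the templates rely on pipes and redirection.
        # A failing converter simply leaves the output file empty here.
        subprocess.run(build_command(template, path, out_path), shell=True)
        with open(out_path, encoding="utf-8", errors="replace") as f:
            return f.read()
    finally:
        os.remove(out_path)
```

For example, convert_to_text("report.doc", TO_TXT) would run antiword on report.doc, while a file whose extension has no entry returns None. Running through a shell is what lets a single entry chain several tools with pipes.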

The built-in Oracle and MSSQL indexers are configured differently, but they can be used in combination with this method.

To index the contents of popular formats such as office documents on Linux, add the following to your .conf file:

# Tested on Linux; install the converters first:
#apt/yum install -y poppler-utils pstotext antiword html2text unrtf python-excelerator libwpd-tools unzip catdoc
Indexer.IndexContent = ALL
Indexer.toTxt.Enabled = true
Indexer.toTxt.pdf = "pdftotext -q -eol unix -enc UTF-8 $IN $OUT"
Indexer.toTxt.doc = "antiword $IN > $OUT"
Indexer.toTxt.html = "html2text -nobs -o $OUT $IN"
Indexer.toTxt.xls = "xls2csv $IN > $OUT"
Indexer.toTxt.mp3 = "id3info $IN | grep '===' | grep -v 'PRIV' | grep -v 'image/' | perl -p -e 's/^.+(\)|\])\:/ /g' > $OUT"
Indexer.toTxt.rtf = "unrtf --nopict --text $IN 2>/dev/null | grep -v '^### ' > $OUT"
Indexer.toTxt.docx = "unzip -p $IN word/document.xml | perl -p -e 's/<.+?>/ /g' > $OUT"
Indexer.toTxt.pptx = "unzip -p $IN 'ppt/slides/*.xml' | perl -p -e 's/<.+?>/ /g' > $OUT"
Indexer.toTxt.xlsx = "unzip -p $IN xl/sharedStrings.xml | perl -p -e 's/<.+?>/ /g' > $OUT"
Indexer.toTxt.odt = "unzip -p $IN content.xml | perl -p -e 's/<.+?>/ /g' > $OUT"
Indexer.toTxt.ods = "unzip -p $IN content.xml | perl -p -e 's/<.+?>/ /g' > $OUT"
Indexer.toTxt.odp = "unzip -p $IN content.xml | perl -p -e 's/<.+?>/ /g' > $OUT"

# TensorFlow image search
#Indexer.toTxt.jpg = "python classify.py --image $IN | grep -P '^1\. ' > $OUT"

## others
#Indexer.toTxt.wpd = "wpd2text $IN > $OUT"
#Indexer.toTxt.jpg = "exiftool $IN > $OUT #for camera type or gps location"
#Indexer.toTxt.xls = "py_xlstoTxt $IN > $OUT # supports sheets but adds sheet = ----"
#Indexer.toTxt.docx = "unzip -p $IN word/document.xml | sed -e 's/<[^>]\{1,\}>/ /g' > $OUT"
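The docx, pptx, xlsx, odt, ods and odp entries above all rely on the same idea: these formats are ZIP archives, so unzip -p streams one or more XML members and the perl one-liner replaces every tag with a space. The sketch below reproduces that pipeline in Python for readers who want to check what text a given file yields. The MEMBERS table and the function name zipped_xml_to_text are illustrative, not part of the product.

```python
import fnmatch
import re
import zipfile

# Archive member(s) read for each format, as in the unzip-based entries above.
MEMBERS = {
    "docx": "word/document.xml",
    "pptx": "ppt/slides/*.xml",
    "xlsx": "xl/sharedStrings.xml",
    "odt": "content.xml",
    "ods": "content.xml",
    "odp": "content.xml",
}

# Same non-greedy pattern as the perl one-liners. As with perl, '.' does not
# match a newline, so a tag split across two lines is left in place.
TAG = re.compile(r"<.+?>")


def zipped_xml_to_text(path: str, ext: str) -> str:
    """Concatenate the matching XML members and replace every tag with a space."""
    with zipfile.ZipFile(path) as zf:
        # fnmatchcase's '*' also matches '/', and matching is case-sensitive.
        # Members are read in the order they are stored in the archive.
        names = [n for n in zf.namelist() if fnmatch.fnmatchcase(n, MEMBERS[ext])]
        xml = "".join(zf.read(n).decode("utf-8") for n in names)
    return TAG.sub(" ", xml)
```

Like the perl version, this leaves XML entities such as &amp; untouched; html.unescape could be applied to the result if that matters. Also note that xl/sharedStrings.xml holds the workbook's text cells only, so numeric cell values are not picked up by the xlsx entry.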

Windows examples

Indexer.IndexContent = ALL
Indexer.Interval = 30
Indexer.toTxt.Enabled = true
Indexer.toTxt.pdf = "\"C:/Program Files/xpdf/pdftotext.exe\" $IN $OUT"
Indexer.toTxt.doc = "C:\antiword\antiword.exe $IN > $OUT"

The same pattern works for any other file type: add an Indexer.toTxt.<extension> entry to your .conf file whose command reads the file given as $IN and writes plain text to $OUT.
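Any program that reads the file passed as $IN and writes plain text to $OUT can serve as a converter, including a short script of your own. As a hypothetical example, the script below (call it xml2txt.py; the name, its location and the matching entry are illustrative, not shipped with the product) extracts the text nodes of a generic XML file using only the Python standard library.

```python
import sys
import xml.etree.ElementTree as ET


def xml_to_text(in_path: str, out_path: str) -> None:
    """Write the text content of every element in in_path to out_path."""
    root = ET.parse(in_path).getroot()
    # itertext() yields element text and tail strings in document order.
    words = " ".join(chunk.strip() for chunk in root.itertext() if chunk.strip())
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(words + "\n")


if __name__ == "__main__":
    # Called by the indexer as: xml2txt.py $IN $OUT
    if len(sys.argv) == 3:
        xml_to_text(sys.argv[1], sys.argv[2])
    else:
        print("usage: xml2txt.py IN OUT", file=sys.stderr)
```

It could then be registered with an entry such as Indexer.toTxt.xml = "python3 /path/to/xml2txt.py $IN $OUT", where /path/to/ stands for wherever you put the script.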

Be sure "System Tools > Settings > General > Enable Full Text Search" is set to "Yes".