Full-text search is extensible to any format, so long as there is a CLI tool to convert it to text.
The built-in ORACLE and MSSQL indexers are configured differently and can be used in combination with this method.
To index the contents of popular formats such as office documents on Linux, add the following to your .conf:
#tested
#apt/yum install -y poppler-utils pstotext antiword html2text unrtf python-excelerator libwpd-tools unzip catdoc
Indexer.IndexContent = ALL
Indexer.toTxt.Enabled = true
Indexer.toTxt.pdf = "pdftotext -q -eol unix -enc UTF-8 $IN $OUT"
Indexer.toTxt.doc = "antiword $IN > $OUT"
Indexer.toTxt.html = "html2text -nobs -o $OUT $IN"
Indexer.toTxt.xls = "xls2csv $IN > $OUT"
Indexer.toTxt.mp3 = "id3info $IN | grep '===' | grep -v 'PRIV' | grep -v 'image\/' | perl -p -e 's/^.+(\)|\])\:/ /g' > $OUT"
Indexer.toTxt.rtf = "unrtf --nopict --text $IN 2>/dev/null | grep -v '^### ' > $OUT"
Indexer.toTxt.docx = "unzip -p $IN word/document.xml | perl -p -e 's/<.+?>/ /g' > $OUT"
Indexer.toTxt.pptx = "unzip -p $IN ppt/slides/*.xml | perl -p -e 's/<.+?>/ /g' > $OUT"
Indexer.toTxt.xlsx = "unzip -p $IN xl/sharedStrings.xml | perl -p -e 's/<.+?>/ /g' > $OUT"
Indexer.toTxt.odt = "unzip -p $IN content.xml | perl -p -e 's/<.+?>/ /g' > $OUT"
Indexer.toTxt.ods = "unzip -p $IN content.xml | perl -p -e 's/<.+?>/ /g' > $OUT"
Indexer.toTxt.odp = "unzip -p $IN content.xml | perl -p -e 's/<.+?>/ /g' > $OUT"
# TensorFlow image search
#Indexer.toTxt.jpg = "python classify.py --image $IN | grep -P '^1\. ' > $OUT"
## others
#Indexer.toTxt.wpd = "wpd2text $IN > $OUT"
#Indexer.toTxt.jpg = "exiftool $IN > $OUT #for camera type or gps location"
#Indexer.toTxt.xls = "py_xlstoTxt $IN > $OUT # supports sheets but adds sheet = ----"
#Indexer.toTxt.docx = "unzip -p $IN word/document.xml | sed -e 's/<[^>]\{1,\}>/ /g' > $OUT"
Windows examples
Indexer.IndexContent = ALL
Indexer.Interval = 30
Indexer.toTxt.Enabled = true
Indexer.toTxt.pdf = "\"C:/Program Files/xpdf/pdftotext.exe\" $IN $OUT"
Indexer.toTxt.doc = "C:\antiword\antiword.exe $IN > $OUT"
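The docx line in the Linux example relies on unzip and perl, which a Windows host may not have. As a rough sketch only, the same extraction could be done by a small Python script. The script name docx2txt.py and the functions docx_to_text and convert are hypothetical, not part of the product, and the sketch assumes Python is installed and that $IN and $OUT are replaced with the input and output file paths, as the examples above suggest.

```python
import re
import sys
import zipfile


def docx_to_text(in_path):
    """Return the text of word/document.xml with every XML tag replaced by a space."""
    with zipfile.ZipFile(in_path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # Same idea as the perl one-liner: replace each <...> tag with a space.
    text = re.sub(r"<[^>]+>", " ", xml)
    # Extra step not in the one-liner: collapse the whitespace left between tags.
    return " ".join(text.split())


def convert(in_path, out_path):
    """Write the extracted text of in_path to out_path as UTF-8."""
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(docx_to_text(in_path))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        convert(sys.argv[1], sys.argv[2])
    else:
        print("usage: python docx2txt.py IN OUT", file=sys.stderr)
```

If this approach suits your setup, a line such as Indexer.toTxt.docx = "python docx2txt.py $IN $OUT" would follow the same pattern as the entries above. That line is an untested assumption, so adjust the interpreter and script paths for your system.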
You can add a command-line conversion method to your .conf file for any file type you want converted to text.
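Before adding a new converter line, it can help to try the command on a sample file outside the indexer. The following is a minimal sketch for a Linux shell environment, not the indexer's own logic: the helper name dry_run is hypothetical, and it assumes the indexer simply substitutes $IN and $OUT with the input and output file paths.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path


def dry_run(template, in_path):
    """Run a converter command template on one file and return the text it wrote."""
    with tempfile.TemporaryDirectory() as tmp:
        out_path = Path(tmp) / "out.txt"
        # Assumption: $IN and $OUT are plain placeholders for file paths.
        # Paths are shell-quoted so names with spaces still work.
        command = (
            template
            .replace("$IN", shlex.quote(str(in_path)))
            .replace("$OUT", shlex.quote(str(out_path)))
        )
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(command, shell=True, check=True)
        return out_path.read_text(encoding="utf-8", errors="replace")
```

For example, calling dry_run("antiword $IN > $OUT", "sample.doc") with antiword installed should return the text that would be indexed for that file. An empty result or an exception suggests the command needs adjusting before it goes into the .conf.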
Be sure "System Tools > Settings > General > Enable Full Text Search" is set to "Yes".