Feature "Indexing file system directories"
C0023741356
Understanding
user@fileserver:~$ tree . # Showing only relevant files!
├── dataspects-indexer # Normal folder
│ ├── dataspects.db # Logging database
│ ├── dataspects-indexer # Go binary from Git repo
│ └── dataspects-indexer-config.json # Indexer configuration
└── documents_to_index
Docker Container "dataspects-indexing-service" (https://github.com/dataspects/dataspects-indexing-service)
~/dataspects-indexer/dataspects-indexer-config.json
{
"FilenameRegex": ".pdf",
"RootFolder": "/home/user/documents_to_index",
"DataspectsIndexingWebserviceURL": "https://indexer.hapa.ch", # Points to NGINX reverse proxy
"TikaURL": "https://tika.hapa.ch", # Points to NGINX reverse proxy
"TikaUsername": "USERNAME",
"TikaPassword": "PASSWORD",
"APIMasterKey": "",
"IndexerClassName": "FindAndLearnDocumentsIndexer"
}
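To illustrate how "FilenameRegex" and "RootFolder" might work together, here is a minimal Python sketch. It is not the indexer's actual code (the indexer is a Go binary), and the names find_matching_files and demo are hypothetical. It assumes the pattern is applied as an unanchored regular-expression search against each file name. Under that assumption, ".pdf" matches any character followed by "pdf", so a stricter pattern such as "\.pdf$" may be closer to the intent of selecting only PDF files.

```python
import os
import re
import tempfile


def find_matching_files(root_folder, filename_regex):
    """Return sorted paths under root_folder whose file name matches filename_regex.

    Assumption: the pattern is used as an unanchored search on the file name only.
    """
    pattern = re.compile(filename_regex)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root_folder):
        for name in filenames:
            if pattern.search(name):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def demo():
    """Build a throwaway folder and compare a loose and a strict pattern."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("report.pdf", "notes.txt", "mypdfnotes.txt"):
            open(os.path.join(root, name), "w").close()
        loose = [os.path.basename(p) for p in find_matching_files(root, ".pdf")]
        strict = [os.path.basename(p) for p in find_matching_files(root, r"\.pdf$")]
    return loose, strict


if __name__ == "__main__":
    loose, strict = demo()
    print("loose :", loose)
    print("strict:", strict)
```

In this sketch the loose pattern also picks up "mypdfnotes.txt", while the anchored pattern selects only "report.pdf". Whether the real indexer anchors the pattern itself is not stated here, so it is worth checking against its behavior before changing the configuration.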
Run Apache Tika somewhere
- For example, use https://github.com/LogicalSpark/docker-tikaserver, served e.g. at http://localhost:9998 as part of https://github.com/dataspects/dataspectsSystem.
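To check that the Tika server is reachable and returns text before running the indexer, a small Python sketch like the following can help. It uses Tika Server's PUT /tika endpoint with an "Accept: text/plain" header. The function names build_tika_request and extract_text are hypothetical, and the HTTP Basic authentication is an assumption about how "TikaUsername" and "TikaPassword" are used by the reverse proxy; the indexer itself may talk to Tika differently.

```python
import base64
import urllib.request


def build_tika_request(tika_url, document_bytes, username="", password=""):
    """Build a PUT request to Tika's /tika endpoint asking for plain text."""
    request = urllib.request.Request(
        tika_url.rstrip("/") + "/tika",
        data=document_bytes,
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    if username:
        # Assumption: the reverse proxy protects Tika with HTTP Basic auth.
        credentials = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        request.add_header("Authorization", "Basic " + token)
    return request


def extract_text(tika_url, path, username="", password=""):
    """Send the file at path to Tika and return the extracted text."""
    with open(path, "rb") as handle:
        request = build_tika_request(tika_url, handle.read(), username, password)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

For a local server without authentication, a call could look like extract_text("http://localhost:9998", "/home/user/documents_to_index/example.pdf"), where example.pdf is a placeholder file name.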
Run dataspects-indexing-service
Execute indexing
user@fileserver:~/dataspects-indexer$ ./dataspects-indexer --config dataspects-indexer-config.json --indexer LEXP51