Feature "Indexing file system directories"

C0023741356





Understanding

user@fileserver:~$ tree
. # Showing only relevant files!
├── dataspects-indexer                 # Normal folder
│   ├── dataspects.db                  # Logging database
│   ├── dataspects-indexer             # Go binary from Git repo
│   └── dataspects-indexer-config.json # Indexer configuration
└── documents_to_index

Docker Container "dataspects-indexing-service" (https://github.com/dataspects/dataspects-indexing-service)

  • --env MASTERKEY=
  • --volume FindAndLearnIndexerClasses:/lib/custom-indexer-classes # Git repo volumed into container
  • (No ports exposed, since traffic goes through the NGINX reverse proxy)
~/dataspects-indexer/dataspects-indexer-config.json
{
  "FilenameRegex": ".pdf",
  "RootFolder": "/home/user/documents_to_index",
  "DataspectsIndexingWebserviceURL": "https://indexer.hapa.ch", # Points to NGINX reverse proxy
  "TikaURL": "https://tika.hapa.ch", # Points to NGINX reverse proxy
  "TikaUsername": "USERNAME",
  "TikaPassword": "PASSWORD",
  "APIMasterKey": "",
  "IndexerClassName": "FindAndLearnDocumentsIndexer"
}
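
The configuration above could be consumed roughly as follows. This is a minimal sketch, not the actual dataspects-indexer code: the `Config` struct and the helpers `parseConfig` and `selectFiles` are hypothetical names. It assumes the `#` comments shown above are annotations only, since standard JSON does not allow comments. It also assumes `FilenameRegex` is applied as an unanchored Go regular expression to the file name; under that assumption `.pdf` matches any character followed by `pdf`, and a stricter pattern such as `\.pdf$` may be preferable.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io/fs"
	"path/filepath"
	"regexp"
)

// Config mirrors the keys shown in dataspects-indexer-config.json.
// The struct itself is hypothetical; only the key names come from this page.
type Config struct {
	FilenameRegex                   string
	RootFolder                      string
	DataspectsIndexingWebserviceURL string
	TikaURL                         string
	TikaUsername                    string
	TikaPassword                    string
	APIMasterKey                    string
	IndexerClassName                string
}

// parseConfig decodes comment-free JSON and checks the two fields
// that file selection depends on.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, fmt.Errorf("parse config: %w", err)
	}
	if cfg.RootFolder == "" || cfg.FilenameRegex == "" {
		return cfg, fmt.Errorf("RootFolder and FilenameRegex must be set")
	}
	return cfg, nil
}

// selectFiles walks root recursively and returns the paths of regular
// files whose base name matches pattern (assumed to be unanchored).
func selectFiles(root, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var matches []string
	err = filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if d.Type().IsRegular() && re.MatchString(d.Name()) {
			matches = append(matches, path)
		}
		return nil
	})
	return matches, err
}

func main() {
	raw := []byte(`{"FilenameRegex": ".pdf", "RootFolder": "/home/user/documents_to_index"}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	files, err := selectFiles(cfg.RootFolder, cfg.FilenameRegex)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(files), "file(s) selected under", cfg.RootFolder)
}
```

Validating the configuration before walking the folder surfaces an empty `RootFolder` or an invalid regular expression early, before any request is sent to Tika or the indexing web service.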

How to get the APIMasterKey

Run TIKA somewhere

Run dataspects-indexing-service

C190110142823

Execute indexing

user@fileserver:~/dataspects-indexer$ ./dataspects-indexer --config dataspects-indexer-config.json --indexer LEXP51
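
For orientation, the two flags used in the command above could be read by a Go binary along the following lines. This is a sketch using the standard `flag` package, not the actual dataspects-indexer source: the `options` type and `parseArgs` function are hypothetical, and treating both flags as required is an assumption.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// options holds the two values passed on the command line (hypothetical type).
type options struct {
	ConfigPath string // value of --config
	Indexer    string // value of --indexer
}

// parseArgs reads --config and --indexer from args.
// Requiring both flags is an assumption made for this sketch.
func parseArgs(args []string) (options, error) {
	var opts options
	fset := flag.NewFlagSet("dataspects-indexer", flag.ContinueOnError)
	fset.SetOutput(io.Discard) // keep flag's own usage output quiet
	fset.StringVar(&opts.ConfigPath, "config", "", "path to the JSON configuration file")
	fset.StringVar(&opts.Indexer, "indexer", "", "indexer identifier")
	if err := fset.Parse(args); err != nil {
		return opts, err
	}
	if opts.ConfigPath == "" || opts.Indexer == "" {
		return opts, errors.New("both --config and --indexer are required")
	}
	return opts, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Printf("config=%s indexer=%s\n", opts.ConfigPath, opts.Indexer)
}
```

Go's `flag` package accepts both `-config` and `--config`, so the double-dash form shown in the command works unchanged with this sketch.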