So version 1 of Dreamcatcher is working nicely. I’ve written up the current features, as well as the implementation details of the project. But as with anything, there’s still plenty of room for improvement. I’ve compiled a short list of the features/improvements I intend to explore:
- Making the training phase easier: At the moment, users have to scan each of their documents within a category and choose keywords that describe its context. This can be a relatively time-consuming process. With a more sophisticated training phase that operates at the Category level and returns something more user-friendly than mere keywords, the training time could be reduced while maintaining (or even improving) classification accuracy.
- Supporting nested Categories: I think this is a must for providing an extra bit of flexibility when creating a filing system.
- “Filing system wizard”: A short wizard that can help with the creation of an appropriate filing system, based on user input.
- Supporting things other than Documents (e.g. Music, Pictures): Doing this may require alternative classification/training approaches for different file types (e.g. perhaps a convolutional neural network for image classification).
- Progress indication: This should be a quick fix: progress bars need to be added to the keyword scanning and classification stages.
- Classification considering document format: Currently, all classification decisions are based on the words within documents, not on their format. As a result, folders for things like meeting minutes, registers, or documents that follow specific templates may not be as effective. I'd like to take document formatting into account during classification, bearing in mind that a variety of document formats are supported.
I’ve made sure to maintain an object-oriented approach throughout the application, so that components can easily be modified, replaced or extended. If you have any ideas of your own for improvements or features, I’d love to discuss them with you (in the comments section or via e-mail).
My current focus is on improving the training phase and the accuracy of the application, and I’ll be sure to update this blog as I progress.