Announcement

**tuuker** · 19 February 2014, 01:22 PM

Most horrible invention ever in KDE history. Finally they will get a cleaner KDE with fewer dependencies.

**anda_skoa** · 19 February 2014, 01:29 PM

Originally posted by justinzane

That was exactly my point! Since plain old grep can find stuff in what I called "proper" PDFs, and nepomuk cannot OCR the innumerable "bogus" PDFs created by document scanners, the value of the index -- and nepomuk -- is limited for many users.

True, but an OCR indexer could be added while it would be difficult to add that functionality to grep (aside from an indexer OCR'ing once per document and grep OCR'ing per search).

Still a good point, but also a good example where an index that can be built using various tools can improve over the base functionality available elsewhere.
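To make the "once per document" versus "once per search" difference concrete, here is a minimal sketch. It is not how Nepomuk or any real OCR indexer works; `ExtractionCache`, `text_for` and the `extract` callback are hypothetical names, and the expensive step (OCR or similar) is simply passed in as a function.

```python
from typing import Callable, Dict, List, Tuple


class ExtractionCache:
    """Keeps the extracted text of each document so the costly step runs once.

    `extract` stands in for an expensive extractor such as OCR (hypothetical).
    """

    def __init__(self, extract: Callable[[str], str]) -> None:
        self._extract = extract
        # path -> (modification stamp seen at extraction time, extracted text)
        self._cache: Dict[str, Tuple[float, str]] = {}
        self.extract_calls = 0  # counts how often the expensive step ran

    def text_for(self, path: str, mtime: float) -> str:
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # unchanged document: reuse earlier work
        self.extract_calls += 1
        text = self._extract(path)
        self._cache[path] = (mtime, text)
        return text

    def search(self, docs: Dict[str, float], needle: str) -> List[str]:
        """Return the paths in `docs` (path -> mtime) whose text contains `needle`."""
        needle = needle.lower()
        return sorted(
            path for path, mtime in docs.items()
            if needle in self.text_for(path, mtime).lower()
        )
```

A grep-style tool with OCR bolted on would pay the extraction cost on every search; with the cache, repeated searches only pay it again for documents whose modification stamp changed.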

Cheers,
_

**justinzane** · 19 February 2014, 02:44 PM

Should be KDE-Semantics...

Originally posted by anda_skoa

True, but an OCR indexer could be added while it would be difficult to add that functionality to grep (aside from an indexer OCR'ing once per document and grep OCR'ing per search).

Still a good point, but also a good example where an index that can be built using various tools can improve over the base functionality available elsewhere.

Cheers,
_

One of the things I love about KDE/GNU/Linux is the 'nix philosophy:

Even though the UNIX system introduces a number of innovative programs and techniques, no single program or idea makes it work well. Instead, what makes it effective is the approach to programming, a philosophy of using the computer. Although that philosophy can't be written down in a single sentence, at its heart is the idea that the power of a system comes more from the relationships among programs than from the programs themselves. Many UNIX programs do quite trivial things in isolation, but, combined with other programs, become general and useful tools.

grep is a wonderful, powerful tool, but it has intentionally limited functionality. The "composition" of find, grep, exiftool (photos), dcraw (raw photos), mpg123-id3dump (mp3s), ogrinfo (maps), pprof (software profiles), etc., etc. is what makes it magical to me.

To have something like nepomuk as a monolithic search/index tool seems, to me, wrong. I would much rather see a part of the KDE SC called kde-semantics that has a set of complementary tools much like kde-pim, kde-games, etc. rather than having semantic search, indexing and what not being part of the base.

**TheBlackCat** · 19 February 2014, 03:18 PM

Originally posted by justinzane

To have something like nepomuk as a monolithic search/index tool seems, to me, wrong. I would much rather see a part of the KDE SC called kde-semantics that has a set of complementary tools much like kde-pim, kde-games, etc. rather than having semantic search, indexing and what not being part of the base.

That is essentially what Baloo is. Different parts of KDE have their own specific indexers implemented in their own ways. So the file indexer is implemented separately and in a different way than the Akonadi indexer. However, they expose their indexes to other applications through a set of common APIs to make interoperability easier.

**anda_skoa** · 19 February 2014, 04:04 PM

And also true for Nepomuk, e.g. all the indexing is done by specialized indexers, helpers that can each deal with certain file formats, etc.

The reason for a service or daemon at the core is not that it does everything; it is to make sure only one process attempts to change the data. File locking is far too unreliable to allow multiple processes to write to the same files if data integrity or consistency is important.
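A rough in-process analogy of that single-writer idea, assuming nothing about how Nepomuk actually implements it: many producers hand their changes to a queue, and exactly one writer thread applies them. `SingleWriterStore` and its methods are hypothetical names for illustration only.

```python
import queue
import threading
from typing import Dict


class SingleWriterStore:
    """Many threads may submit changes; only one thread ever touches the data."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._requests: queue.Queue = queue.Queue()
        self._writer = threading.Thread(target=self._run, daemon=True)
        self._writer.start()

    def submit(self, key: str, value: str) -> None:
        """Called by any producer; it never modifies the data itself."""
        self._requests.put((key, value))

    def _run(self) -> None:
        # The only place where the data is modified.
        while True:
            item = self._requests.get()
            try:
                if item is None:  # shutdown sentinel
                    return
                key, value = item
                self._data[key] = value
            finally:
                self._requests.task_done()

    def snapshot(self) -> Dict[str, str]:
        """Wait until all submitted changes are applied, then return a copy."""
        self._requests.join()
        return dict(self._data)

    def close(self) -> None:
        self._requests.put(None)
        self._writer.join()
```

The producers never need a lock on the data because they never write to it; serializing the writes in one place is what protects consistency.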

An index can also preserve information that would otherwise be lost, e.g. a browser knows the source of a download, a mail client knows who sent you an attachment.

Other tools can then use this to reduce their result set, e.g. only list PDFs that have been downloaded from a certain site. Another type of "find" if you will.
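As a sketch of that other type of "find": the record layout below is entirely made up (`FileRecord`, `source_url`, `sender` are hypothetical fields, not Nepomuk's or Baloo's schema), but it shows how provenance stored at download time can narrow a result set later.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FileRecord:
    path: str
    mime_type: str
    source_url: Optional[str] = None  # would be recorded by a browser (assumption)
    sender: Optional[str] = None      # would be recorded by a mail client (assumption)


def find(records: Iterable[FileRecord],
         mime_type: Optional[str] = None,
         source_contains: Optional[str] = None) -> List[str]:
    """Return paths of records matching every filter that was given."""
    hits = []
    for record in records:
        if mime_type is not None and record.mime_type != mime_type:
            continue
        if source_contains is not None and source_contains not in (record.source_url or ""):
            continue
        hits.append(record.path)
    return hits
```

For example, `find(records, mime_type="application/pdf", source_contains="example.org")` would list only the PDFs whose recorded download source mentions that site; the file contents alone could not answer that question.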

Cheers,
_

**erendorn** · 19 February 2014, 06:19 PM

Originally posted by justinzane

Multitasking in the human brain, silly.

After all, as I tried to point out, an index or cache is rather useless if it has no content -- which is almost guaranteed to be the case for all but the most common text-derived content.

And, no, kmail does **not** by-definition store local IMAP copies. Though it can be set up to do so, it also can be set up to store nearly nothing locally, leveraging tiny clientside flash/SSD storage and gigabit networks.

And, no, firefox does **not** necessarily store anything in a disk cache. As with kmail, this is configurable to individual needs and circumstances. The instance of firefox that I am using right now does not cache anything to disk, though it does use a large RAM cache.

Please stop generalising with that example. It is not relevant to tremendous corpora of user data in all but the most simple and common formats.

So:
- It has no content except the most common content. Well, it's a start (besides, that just means you need to add specific indexers, just as people added specific grep utilities)
- my bad, "Disconnected IMAP" does.
- it does not matter if it's disk or RAM. The point is: Firefox is using cache so that it does not need to read all data from source, so that you don't have to wait, err, "multi-task" while it loads pages.

But you are right, let's not generalize with some example. Let's focus on one simple fact: searching through an index *is* faster than re-reading the whole source.
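For what it's worth, a toy illustration of that fact, assuming a plain word-level inverted index (not the structure any particular KDE indexer uses; all function names here are made up): the scan re-reads every document on each query, while the index answers with one dictionary lookup after a one-time build.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def scan(docs: Dict[str, str], word: str) -> List[str]:
    """Re-read every document on every query: work grows with total text size."""
    word = word.lower()
    return sorted(name for name, text in docs.items() if word in tokenize(text))


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """One pass over all documents, mapping each word to the documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, text in docs.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def lookup(index: Dict[str, Set[str]], word: str) -> List[str]:
    """Answer a query with a single dictionary lookup."""
    return sorted(index.get(word.lower(), set()))
```

Both approaches return the same documents; the index only pays off when there is more than one query per build, and only for content that an indexer could read in the first place.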

**justinzane** · 19 February 2014, 08:00 PM

Originally posted by erendorn

searching through an index *is* faster than re-reading the whole source.

Absolutely. I've been working on trying to see if I can implement some sort of buffer/cache for GRASS so that it does not need to hit the datastore (fs, postgres, mysql, sqlite, whatever) to re-read and re-calculate the map display. It's the exact same reason: an index/hash lookup is vastly faster than a direct read-parse-calculate.

My point is that nepomuk/baloo/various-and-sundry-indexes should be desirable add-ons to the KDE base, not part of the base. That way, users/packagers/distros can choose between stripped lightweight KDE installs with just the indexing the advanced user knows he/she needs or a full SC install that provides everything out-of-the-box but makes space/performance/privacy/whatever compromises in the name of ease of use.

I'm looking forward to trying Baloo and I hope it provides an easy and well documented way for low-skill contributors (like me) to write and contribute domain specific parsing/indexing algorithms. I just do not think that users should **have** to go to Arch or Gentoo or whatnot just to be able to easily disable functionality that they do not want.

-----

On a different side of the topic, I would also love to see the capability to have shared workgroup, organization and global metadata sharing as an option so that the effort that each of us puts into manually tagging content can benefit others and can provide a large knowledgebase for automated content analysis.

**caligula** · 19 February 2014, 10:52 PM

Originally posted by justinzane

I'm looking forward to trying Baloo and I hope it provides an easy and well documented way for low-skill contributors (like me) to write and contribute domain specific parsing/indexing algorithms. I just do not think that users should **have** to go to Arch or Gentoo or whatnot just to be able to easily disable functionality that they do not want.

It's just easier to forget KDE and not switch the distro. KDE is the most bloated desktop on Linux. Has always been.

**kevinf28** · 19 February 2014, 10:56 PM

What I really don't understand about KDE is the completely WHACK application names... I'll stick with my Xubuntu, thank you very much. Lean, well supported, and fast.

**erendorn** · 20 February 2014, 03:28 AM

Originally posted by justinzane

On a different side of the topic, I would also love to see the capability to have shared workgroup, organization and global metadata sharing as an option so that the effort that each of us puts into manually tagging content can benefit others and can provide a large knowledgebase for automated content analysis.

Similar to what is done (a bit) for music files, but generalized?

KDE's Nepomuk Doesn't Seem To Have A Future
