The Brewing Problem Of PGP Short-ID Collision Attacks

  • #11
    Originally posted by Fry-kun View Post
    Why not assign an English word to each quadruple hex segment, leading to a schema similar to https://xkcd.com/936/
    Words are MUCH easier to read and compare by humans, even though the phrases would be mostly nonsense.
    Similar words can be discarded; there are over 150k words in the Oxford English Dictionary; it should be possible to get 65k non-similar ones
    Nice idea but I fear impractical:

    Once you'd sieved out visually similar words you'd have to sieve out homophones. Sore soar saw etc., bow, bough.. and not just in RP but in every existing dialect.. and they'll be everywhere. I doubt there's even a comprehensive reference compiled anywhere! US dialects in particular, with their t/d homophones and spongy vowels are littered with unlikely homophones. Frog & fraud, cold & called for example. Even if there was such a list, it'd be obsolete as soon as it was complete as language is constantly evolving.

    Anyway, once you'd finished that, assuming you ever did, you'd have to do the same for every existing dialect of every other language on earth! That's assuming you're not just relying on the words' lengths as a (shockingly poor) proxy for difference: I suspect comparing enormous strings of letters which supposedly happen to represent words in a foreign language would be worse than comparing the underlying hash. ...and then of course, there are plenty of languages which don't even have 65k words.

    Once you'd sorted out that lot and allowed for future deviations in pronunciation, how would the source choose which language to encode to? You'd have to know the first language of every potential recipient... or publish the digest/"fingerprint" encoded into every known language - which would be rather unwieldy!

    Encoding to a word-list which is compiled from every language on earth, carefully sieved to remove all visual and phonetic similarities could help resolve some of those problems... perhaps... the result might be quite good for visual comparison... but it'd be an absolute bastard to type or read over the phone!

    I'm pretty certain the idea is unworkable.



    • #12
      Originally posted by Dick Palmer View Post
      I doubt there's even a comprehensive reference compiled anywhere!
      https://en.wikipedia.org/wiki/Rhyming_dictionary

      You're assuming that this needs to be a fixed system, built "once and forever". In my mind, it's more of a "dictionary" overlaid on top of the actual code signature to make it easier to compare.
      I think it would get to a pretty workable state after a few iterations -- and could be updated every so often if necessary



      • #13
        Originally posted by Fry-kun View Post

        https://en.wikipedia.org/wiki/Rhyming_dictionary

        You're assuming that this needs to be a fixed system, built "once and forever". In my mind, it's more of a "dictionary" overlaid on top of the actual code signature to make it easier to compare.
        I think it would get to a pretty workable state after a few iterations -- and could be updated every so often if necessary
        I realise your list could be updated... as I said, it'd have to be perpetually updated... but you'd still have to build one to distribute with your system... and it would age. Or are you imagining it to be an online-only "service" like a DICT server? That would be unsafe/unwise for a host of different reasons!

        I think you're completely failing to comprehend the enormity and mind boggling complexity of the linguistic aspect of the task. You're also assuming your rhyme list is exhaustive. That's fine for poetic purposes but recklessly negligent for cryptography. Even if it were exhaustive (no such list exists), visually similar words don't necessarily rhyme, even when they're dangerously close homophones: Orange/arrange, device/devise etc, etc...

        Since you're so insistent it'll be easy to implement in a cryptographically dependable manner, why not just do it? Drop us a note when you've got it working.



        • #14
          Originally posted by Dick Palmer View Post
          [...]Once you'd sieved out visually similar words you'd have to sieve out homophones.[...]
          Why? As long as the letters are different it's easy enough for you to notice that it's not the same. It's not like someone is reading that to you, you read it yourself.

          Originally posted by Dick Palmer View Post
          [...]Encoding to a word-list which is compiled from every language on earth.[...]
          English only, most used 65k words :-) It's not like you need to understand the word in its meaning, you can easily identify different words even without their meaning.
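To make the proposal concrete: each group of four hex digits is a 16-bit number, so it can index a list of 2^16 = 65,536 words, and a 160-bit fingerprint becomes 10 words. The sketch below is only an illustration under that assumption. The function names are made up for this example, and `wordlist` stands in for a curated 65,536-entry list that nobody in this thread has actually produced.

```python
def fingerprint_to_segments(fp_hex: str) -> list[int]:
    """Split a hex fingerprint into 16-bit values, one per 4 hex digits."""
    digits = fp_hex.replace(" ", "")
    if len(digits) % 4 != 0:
        raise ValueError("fingerprint length must be a multiple of 4 hex digits")
    # int(..., 16) raises ValueError on non-hex input, which is what we want.
    return [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]


def segments_to_phrase(segments: list[int], wordlist: list[str]) -> str:
    """Look each 16-bit value up in a (hypothetical) 65,536-word list."""
    if len(wordlist) != 1 << 16:
        raise ValueError("wordlist must contain exactly 65536 entries")
    return " ".join(wordlist[s] for s in segments)
```

The encoding itself is trivial; the disagreement in this thread is entirely about whether a suitable `wordlist` can be built.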



          • #15
            Fingerprints aren't really safe either. They're SHA-1 hashes, which is pretty weak by modern crypto standards. Users should always exchange full keys securely, not just their fingerprints.



            • #16
              Originally posted by droste View Post
              It's not like someone is reading that to you, you read it yourself.
              ...but it is! It's exactly like someone is reading the "fingerprint" to you: That's what they're for! I've verified keys over the phone on numerous occasions. In fact there's very little point trying to authenticate a "fingerprint" in-band! You may as well just assume your downloaded key or file or whatever it is is authentic, as assume that your adversary lacks the wit to substitute fraudulent authentication along with a corresponding in-band "fingerprint." Then what about the visually impaired? Screen readers etc...

              Anyway, even "reading it yourself" is like someone is reading that to you! It's a phonetic exercise and why some people can't manage to differentiate between "your" and "you're", "had" and "hat" etc... That's the whole premise! Substituting our (faulty) word recognition for sequential verification. That fautly word recongition is aslo the resaon funny lttile tricks like this raed so esaily.

              Not cryptographically secure!

              Originally posted by droste View Post
              English only, most used 65k words :-) It's not like you need to understand the word in its meaning, you can easily identify different words even without their meaning.
              Yes, it is like you need to understand the word. That's still the whole crux of the premise. That's the sole reason it's presented as "easier." If you didn't recognise the word, it'd just be a string of characters to verify: Exactly like the digest we use now then... just LONGER!... which is stupid! If it didn't matter, as you suggest, then why do you specify English? Why not use Russian? Or Chinese? Or Tamil? You could sit there studying two strings of ideograms in the hope of definitively establishing there are no meaningless (to YOU) little squiggles or dots out of place. English "works" to you precisely and solely because English is familiar to you.



              • #17
                If you actually read it to someone you could specify hints to what it actually means ("hat like what you wear to cover your hair") and as a last resort you could spell it (which is not worse than the current situation).

                "Not cryptographically secure!" -> Not sure what you're suggesting here. Do you believe our faulty minds are better at reading random numbers, letters, etc. than words? As soon as there's a human involved it stops being secure. The point is to make it _easier_ for humans not to make mistakes. They will still make mistakes and nothing will change that.

                And no, I don't need to understand the word as long as I can make up a pronunciation in my head.
                That's why I chose English: its alphabet is known to almost the whole world (even where the native alphabet is something else).

                "Humpertso" is not a real word; you don't know how to pronounce it or what it means, and yet it's still easier to read than AE67GH86O



                • #18
                  Originally posted by Dick Palmer View Post

                  Nice idea but I fear impractical:

                  Once you'd sieved out visually similar words you'd have to sieve out homophones. Sore soar saw etc., bow, bough.. and not just in RP but in every existing dialect.. and they'll be everywhere. I doubt there's even a comprehensive reference compiled anywhere! US dialects in particular, with their t/d homophones and spongy vowels are littered with unlikely homophones. Frog & fraud, cold & called for example. Even if there was such a list, it'd be obsolete as soon as it was complete as language is constantly evolving.
                  The hard work is already complete. It is used in Bitcoin as a backup method: BIP0039. There is a carefully selected vocabulary of 2048 words, encoding 11 bits each. No homophones, no similarities, no dialect-dependence. All words differ within the first 4 characters, so the rest is more like mnemonic filling.

                  The whole 160-bit fingerprint would fit in 15 words, giving a 165-bit payload; the 5 additional bits could be used as a simple checksum (1/32 error acceptance). Adding one more word would give 160+16 bits, making a CRC16 checksum possible.

                  Such established dictionaries exist for other languages (Japanese, Spanish, Chinese Simplified & Traditional, French and Italian).
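The arithmetic above (15 words x 11 bits = 165 bits = 160 + 5) can be sketched as follows. This is a rough illustration, not a reference implementation: the function names are invented for this example, and taking the 5 check bits from the top of a SHA-256 hash is an assumption modelled on how BIP0039 derives its checksum. The functions return 11-bit indices; mapping them to words needs one of the 2048-entry lists, which is not embedded here.

```python
import hashlib


def fingerprint_to_indices(fp_hex: str) -> list[int]:
    """Encode a 160-bit fingerprint as fifteen 11-bit indices (160 + 5 check bits)."""
    fp = bytes.fromhex(fp_hex.replace(" ", ""))
    if len(fp) != 20:
        raise ValueError("expected a 160-bit (20-byte) fingerprint")
    check = hashlib.sha256(fp).digest()[0] >> 3  # top 5 bits of the hash (assumption)
    value = (int.from_bytes(fp, "big") << 5) | check  # 165-bit payload
    return [(value >> (11 * i)) & 0x7FF for i in reversed(range(15))]


def indices_to_fingerprint(indices: list[int]) -> str:
    """Decode fifteen 11-bit indices and verify the 5 check bits."""
    if len(indices) != 15 or any(not 0 <= i < 2048 for i in indices):
        raise ValueError("expected fifteen indices in the range 0..2047")
    value = 0
    for idx in indices:
        value = (value << 11) | idx
    fp = (value >> 5).to_bytes(20, "big")
    if hashlib.sha256(fp).digest()[0] >> 3 != value & 0x1F:
        raise ValueError("checksum mismatch")
    return fp.hex().upper()
```

With only 5 check bits, a random transcription error still slips through about once in 32 tries, which is the "1/32 error acceptance" mentioned above.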



                  • #19
                    Originally posted by Pyth0n View Post
                    It is used in Bitcoin as a backup method BIP0039.
                    I fixed the link.



                    • #20
                      Originally posted by Pyth0n View Post

                      The hard work is already complete. It is used in Bitcoin as a backup method: BIP0039. There is a carefully selected vocabulary of 2048 words, encoding 11 bits each. No homophones, no similarities, no dialect-dependence. All words differ within the first 4 characters, so the rest is more like mnemonic filling.

                      The whole 160-bit fingerprint would fit in 15 words, giving a 165-bit payload; the 5 additional bits could be used as a simple checksum (1/32 error acceptance). Adding one more word would give 160+16 bits, making a CRC16 checksum possible.

                      Such established dictionaries exist for other languages (Japanese, Spanish, Chinese Simplified & Traditional, French and Italian).
                      Good idea and good solution, I did not think about it before...

                      ZEBRA BLACK MESSIA and ZEBRA LAKE MESSIA

                      may be close, but they are still much easier to differentiate than

                      khOsDF15kh and kh0sDF15kh
