My weird string encoding thing

I’ve been working on a little WebSocket chat app, and wanted a way to separate multiple strings. The usual approach is to use a comma ‘,’ as the separator, an escaped comma ‘\,’ if the string needs a comma, and a double backslash ‘\\’ if I just need an actual backslash. Instead of that, I wanted to try something different.


What we could do is include the string’s length before the value, so “hello” becomes “5hello”.


But what if we have strings longer than 9 characters? We could use a fixed two digits for the length, like this:

05hello11Pirate Ship
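
As a rough sketch of the fixed-width idea, here is how encoding and decoding could look in Python. The function names and the `width` parameter are made up for illustration, and lengths are counted in Python characters rather than bytes, which is an assumption on my part.

```python
def encode_fixed(strings, width=2):
    """Prefix each string with its length, zero-padded to `width` decimal digits."""
    parts = []
    for s in strings:
        if len(s) >= 10 ** width:
            raise ValueError(f"string too long for a {width}-digit length: {len(s)}")
        parts.append(f"{len(s):0{width}d}{s}")
    return "".join(parts)


def decode_fixed(data, width=2):
    """Read `width` length digits, then that many characters, and repeat."""
    strings = []
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + width])
        pos += width
        strings.append(data[pos:pos + length])
        pos += length
    return strings


encoded = encode_fixed(["hello", "Pirate Ship"])
print(encoded)                # 05hello11Pirate Ship
print(decode_fixed(encoded))  # ['hello', 'Pirate Ship']
```

The `ValueError` is the "how far do we go?" problem in code form: two digits stop working at 100 characters.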

But how far do we go? How about, instead of having a fixed number of digits, we use the maximum digit (9) to show that more length information follows? The length is the sum of the digits, and the first digit below the maximum ends it. We can keep chaining this to add more.

1 = 1
8 = 8
90 = 9 + 0 = 9
91 = 9 + 1 = 10
990 = 9 + 9 + 0 = 18
995 = 9 + 9 + 5 = 23
999996 = 9 + 9 + 9 + 9 + 9 + 6 = 51
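
The chained length field on its own can be sketched like this, in base 10 only. `encode_length` and `decode_length` are hypothetical names; the decoder also returns how many digits it consumed, since a real parser would need to know where the value starts.

```python
def encode_length(n):
    """Emit a 9 for every full 9 in n, then one final digit from 0 to 8."""
    out = ""
    while n >= 9:
        out += "9"
        n -= 9
    return out + str(n)


def decode_length(text):
    """Sum digits until one is not 9. Returns (length, digits consumed)."""
    total = 0
    for i, ch in enumerate(text):
        digit = int(ch)
        total += digit
        if digit != 9:
            return total, i + 1
    raise ValueError("length field has no terminating digit below 9")


for field in ["1", "8", "90", "91", "990", "995", "999996"]:
    value, used = decode_length(field)
    assert encode_length(value) == field  # the encoding round-trips
    print(field, "=", value)
```

Note that a length of exactly 9 needs the trailing 0, otherwise the decoder would keep reading into the value.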

This is nice and short for small strings, but the prefix grows with the string: one extra digit for every 9 characters. My data is typically short, and even for long strings the length stays relatively small as a percentage of the value (roughly 11%).

5hello92Pirate Ship
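
Putting the chained length in front of each value, a whole message could be packed and unpacked roughly as below. This is a minimal sketch with made-up function names; it assumes well-formed input and does not try to recover from a truncated message.

```python
def encode_chained(strings):
    """Prefix each string with its chained base-10 length."""
    parts = []
    for s in strings:
        n = len(s)
        # One '9' per full 9 characters, then the remainder (0 to 8).
        prefix = "9" * (n // 9) + str(n % 9)
        parts.append(prefix + s)
    return "".join(parts)


def decode_chained(data):
    """Split a message produced by encode_chained back into its strings."""
    strings = []
    pos = 0
    while pos < len(data):
        length = 0
        while True:
            digit = int(data[pos])
            pos += 1
            length += digit
            if digit != 9:  # any digit below 9 ends the length field
                break
        strings.append(data[pos:pos + length])
        pos += length
    return strings


encoded = encode_chained(["hello", "Pirate Ship"])
print(encoded)                  # 5hello92Pirate Ship
print(decode_chained(encoded))  # ['hello', 'Pirate Ship']
```

Values that themselves start with digits are not a problem, because the decoder always knows exactly where the length field ends.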

We could use something better than base-10 for these lengths. Hexadecimal (base-16), where the maximum digit is F (15), would look like this:

5helloBPirate Ship

FFDThe quick brown fox jumps over the lazy dog

Even better, we could use base-32 (digits 0-9, then a-v), where the maximum digit is v (31):

5helloBPirate Ship

vcThe quick brown fox jumps over the lazy dog
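
The same scheme works for any base up to 32 by swapping the digit alphabet. The sketch below is one way to do it, not the app's actual code: `DIGITS`, `encode_base` and `decode_base` are names I made up, the encoder writes lowercase digits, and the decoder leans on Python's `int(ch, base)`, which accepts either case.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuv"  # enough digits for base 32


def encode_base(strings, base=32):
    """Prefix each string with its chained length written in `base`."""
    top = base - 1  # the maximum digit means "more length follows"
    parts = []
    for s in strings:
        n = len(s)
        prefix = DIGITS[top] * (n // top) + DIGITS[n % top]
        parts.append(prefix + s)
    return "".join(parts)


def decode_base(data, base=32):
    """Split a message produced by encode_base back into its strings."""
    top = base - 1
    strings = []
    pos = 0
    while pos < len(data):
        length = 0
        while True:
            digit = int(data[pos], base)  # upper or lower case both parse
            pos += 1
            length += digit
            if digit != top:
                break
        strings.append(data[pos:pos + length])
        pos += length
    return strings


fox = "The quick brown fox jumps over the lazy dog"
print(encode_base([fox], base=16))     # ffdThe quick brown fox jumps over the lazy dog
print(encode_base([fox]))              # vcThe quick brown fox jumps over the lazy dog
print(encode_base(["1585917453947"]))  # d1585917453947
print(decode_base("5helloBPirate Ship"))  # ['hello', 'Pirate Ship']
```

With base-32, any string up to 30 characters gets a one-character prefix, and the overhead for long strings is about one character per 31.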

This seems to work nicely. The majority of strings for my application are short; for example, a 13-digit timestamp encodes as “d1585917453947”.

Of course, this is purely academic; there are much better and more proven ways to do this. You could use commas and escaped characters, a non-typeable separator, JSON, or even protocol buffers.
