Sunday, September 13, 2009

bXML

Lately, I've been feeling undecided. There are so many great things I want to write about, but I fear that there are fewer and fewer people who want to read them. I'm torn between not caring and just doing my own thing, and trying harder to pick more accessible topics to gain acceptance.

We're drifting farther and farther away from wanting to know -- really know -- about stuff, and more into just having quick but meaningless details at our fingertips. It seems to be a dangerous side-effect of the Information Age. Still, most people will eventually come to realize that even if you have the facts, without any context you can't put them into perspective. They are useless on their own.


bXML

For this post, I figured I'd just dump out an old idea that I had a long time ago for some simple development. The need, requirements and drivers for the project are all long gone, ancient history now, leaving only this little orphaned idea.

It's really just a simple spin on XML, one that compacts the format and allows it to contain binary data. It can be indexed as well, using a small amount of extra space to allow for faster internal access to large files. Split into two related files, it could even anchor some novel type of database.

The acronym I like is bXML, but the full name is probably better described as "indexed binary XML", since the need for and usefulness of the index mechanism is crucial to this being more than just XML.

My inspiration comes from several of the better bits of the PDF format, which are quite intelligent and elegant, all things considered. PDFs marched out of nowhere to become a major technology in a huge number of fields, including both the Internet and the commercial printing industry. Internally they are complex, yet they do not require the demented rocket-science understanding necessary for really bad spaghetti-data formats such as Microsoft Word files. They are easy to learn, and easy to generate.


ON LARGE

I do like XML; it has that great quality where it is mostly simple, yet it contains a significant amount of power in its expressiveness. Most of the best technologies have that property.

Still, there is a necessary redundancy to it, driven by its roots in SGML, a text-based format. The all-text aspect is fine, but sometimes we want to store a larger amount of data in a more condensed form, and even a 20% overhead in size is too significant a price to pay.

No matter how much faster they are getting, machines are always way too limited to compute a lot of what we know would be interesting. Moore's law may help with video games, but it cannot keep up with our expectations for data processing. There are just too many things to capture, and too many ways to analyze the results. We're a long way away from gaining our freedom from the hardware, if it's even possible.

As such, the first and most important point of bXML is to find the smallest possible representation that still matches the power and expressiveness of XML, while also maintaining that degree of simplicity.

An intensely powerful representation that is also very obfuscated would not be particularly useful. Simplicity, especially initially, is an all-important aspect for this to work.

To compact XML data we need two things: cheaply parsable data, and a compact format for the file structure.


PARSABLE DATA

The first thing we need is to make sure the resulting format is "reasonably" parsable. Since this format can also contain binary data, variable length elements become an interesting issue.

The programming language Pascal used variable-length strings that were preceded by a count of the characters in the string. The C language took the opposite approach: it put an "in-band signal" -- the null character -- into the string as a terminator. These two languages represent the two main ways of managing variability within a computer: an explicit size, or a terminator.

In-band signals are easier and often more flexible, but with binary data there is always the chance that the terminator itself can appear in the data.

A way around this could be to use a custom variable-length terminator, one that is explicitly picked each time because it does not appear in the current stream of binary data.

While it is a neat idea, its complexity does seem to be higher than just specifying the size at the beginning. So we should probably take the simple size road for all variable data in the file. It's slightly more expensive in size, but much less so in CPU, and it is simple.
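To make the size road concrete, here is a minimal sketch in Python -- my own illustration, assuming the 4-byte network-encoded lengths used later in this post -- of reading one variable-length field:

import struct

def read_sized(stream):
    # Read the 4-byte network-order (big-endian) size prefix.
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("truncated size prefix")
    (size,) = struct.unpack(">I", header)
    # Then read exactly that many bytes of payload.
    data = stream.read(size)
    if len(data) < size:
        raise EOFError("truncated data")
    return data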

For now, along with variable-length data, we can also have a list of well-known types with explicit sizes and interpretations: things such as shorts, integers, dates and floating-point numbers.

That type of list tends to grow, getting longer and longer as the specification ages. Still, well-defined, compact types represent an easy and reliable way to tightly pack data.
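As a sketch of how such a list might start out -- the type names and struct formats here are my own picks, not part of any spec -- it is just a mapping from type names to fixed sizes and decoders:

import struct

# Hypothetical starter set of well-known, fixed-size types.
WELL_KNOWN_TYPES = {
    "short":  (2, lambda b: struct.unpack(">h", b)[0]),
    "int":    (4, lambda b: struct.unpack(">i", b)[0]),
    "double": (8, lambda b: struct.unpack(">d", b)[0]),
    "year":   (2, lambda b: struct.unpack(">H", b)[0]),  # e.g. 1511
}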


STRUCTURE

The second thing we need is to somehow pack the tags into an arrangement that isn't highly replicated throughout the entire document.

If there are a lot of small elements with long names in a large XML file, the structural burden of the document can easily exceed that of the content. For small data this difference is not significant, but for big things it is a real show-stopper.

The best way to handle this is by stealing a few tricks from PDFs. I've always rather admired how they managed to create a binary format that was still mostly readable in an editor. It can be very useful in many operational circumstances.

They use a reference table, often stored at the end of the document, to list out the position (in characters) of each of the internal objects in the file. While each object is delineated by itself -- you don't need the reference table to parse -- it can be quickly accessed from the table. The whole underlying scheme has a quality to it that we can use.


THE DOCUMENT

The overall document should be pretty simple, with a header, footer, reference table and of course, the data itself.

For the header and footer we can use some syntax similar to PDF's. PDF syntax was really based around PostScript syntax, but for this format, the similarity is just for show.

We can start by defining the fact that the first line of a bXML file references its type and version, and that the second line is just a set of four binary values (like PDF) to ensure that the data-type of the document is correctly understood.

%bXML-1.0
%âãÏÓ

Would do nicely for a header.

The next thing we want is for some of the last bytes in the file (the footer) to be an integer that lists out the starting position of the reference table. Since we can use a network-encoded 4-byte integer directly, we can finish the file with a consistent 11 bytes:

BBBB\n
%%EOF\n

Where BBBB is the four-byte integer character offset of the start of the reference table.
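Here is a minimal sketch of writing and reading those fixed pieces in Python; the four marker bytes are the Latin-1 values of âãÏÓ, and everything else follows the layout above:

import struct

HEADER = b"%bXML-1.0\n%\xe2\xe3\xcf\xd3\n"

def write_footer(f, table_offset):
    # BBBB\n%%EOF\n -- a consistent 11 bytes ending the file.
    f.write(struct.pack(">I", table_offset) + b"\n%%EOF\n")

def read_table_offset(f):
    f.seek(-11, 2)                  # 11 bytes back from the end
    (offset,) = struct.unpack(">I", f.read(4))
    return offset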

In between the header and the footer come first the data, and then the reference table that allows the data to be interpreted correctly.


THE REFERENCE TABLE

The reference table consists of one line for each unique element encoded in the data. Element attributes would be dropped into full tags, as part of normalizing the data, so the following:
<painting>
<img src="madonna.jpg" alt='Foligno Madonna, by Raphael'/>
<caption>This is Raphael's "Foligno" Madonna, painted in
<date>1511</date>-<date>1512</date>.</caption>
</painting>

Would have a list of element names:

painting
img
src
alt
caption
date

Now, since we want to be able to identify these tags in the data as we encounter them, we can implicitly associate each tag with an index into the reference table. Since the number of different tag types could be small or it could be large, we want a mildly sophisticated method for encoding them.

In the data, each tag is one or more bytes. If the 8th bit is set on a byte, then the following byte is also part of the tag. In that case the first 7 bits allow us 128 unique tags, the next 14 allow 16,384 tags, and so on. In this scheme there is no limit to the number of tags, but each later tag starts to eat up more space.

The algorithm for finding the index is fairly simple. Keep reading bytes until one doesn't have its 8th bit set. Pack all of the 7-bit pieces together, and then interpret the result as the index number. Since the indexes are always positive integers, this is a straight binary conversion.
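In Python the whole scheme is only a few lines. This is a sketch, and it fixes one detail the description leaves open: the 7-bit groups accumulate most-significant first.

def encode_tag(index):
    # Split the index into 7-bit groups, most significant first.
    groups = []
    while True:
        groups.insert(0, index & 0x7F)
        index >>= 7
        if index == 0:
            break
    # Set the 8th bit on every byte except the last.
    return bytes(g | 0x80 for g in groups[:-1]) + bytes([groups[-1]])

def decode_tag(stream):
    # Keep reading bytes until one doesn't have its 8th bit set.
    index = 0
    while True:
        byte = stream.read(1)[0]
        index = (index << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return index

For example, index 200 encodes as the two bytes 0x81 0x48, and decoding them packs the pieces 1 and 72 back into 200.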

Now of course, some of the element names are duplicated, over and over again. It is up to the creator of the file as to whether these repeated elements have separate tags or not. Usually 'not' would be better as it is more space efficient.

This leaves us with a reference table that would look something like:

crossref
painting
img
src
alt
caption
date
end-crossref

Where each entry is separated for convenience by a newline (just to make it partially readable).


MORE PACKING

So we have a reference table that relates a variable string back to a byte tag embedded in the data.

In many XML files, there are often larger hierarchies of tags that refer to the same structure of elements, over and over again. If we know the underlying element types, we can parse them out individually without having to have a separate tag for each sub-element. In complex, highly repetitive data that is a considerable savings.

To accomplish this, in the reference table, we can just add a value that says our "depth" in a structural tree.

If our depth is not listed or is 0, the element is parsed as a top-level (explicit) tag. If the depth is higher, then it is an implicit tag packed in sequence behind its parent tag (which may itself be implicit). In that case the two elements of data appear side by side without any intervening tag (and empty elements still need a zero-size indicator).

Another modification we can make is to first assume that all data in the file is variable length. In that case, each element looks like:

BBBBdddddddddddddddddddddddddddd

Where BBBB is a four-byte network-encoded integer, followed by a variable array of data.

Now for some data types, we may know right away that they have a fixed size and a well-known interpretation. As such, while the default in a reference entry is variable binary data, other types can be specified too, such as a network-encoded integer, short or double.

Any type really, so long as the length is fixed and there is some reference tag for that type added to the bXML definition.
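Reading a single value then becomes a two-way branch, sketched here using the hypothetical read_sized and WELL_KNOWN_TYPES helpers from earlier:

def read_value(stream, type_name=None):
    # Well-known fixed types carry no size prefix; everything
    # else is variable and starts with a 4-byte size.
    if type_name in WELL_KNOWN_TYPES:
        size, decode = WELL_KNOWN_TYPES[type_name]
        return decode(stream.read(size))
    return read_sized(stream)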

All of this fiddling with extra information gives us a reference table of rather arbitrary length, where each entry contains a variable-length string, and possibly some other values.

Since each table entry is already variable, we might as well accept that. A separator character like # can be used to delineate the fields within an entry, and a starting and ending string help to delineate the boundaries of the table itself; "crossref" and "end-crossref" are easily readable.

Also, since all of the entries and parameters in the table are explicitly separated, we can use the ASCII representation for numbers, instead of something less readable like network-encoded floats. The small extra size is a reasonable trade-off for easy readability.

Another thing that would be nice is a set of index references for the main tags: we'd like to know where they are in the file. Since this is a one-to-many relationship, we can just use square braces and commas to delineate a list of character positions for each tag.

So a more complete reference file might look something like:

crossref
painting##empty#[0]
img#1#empty#[1]
src#2##[1]
alt#2##[16]
caption#1##[46,103,113]
date#2#year#[98,108]
end-crossref

Given that, we can then encode the file as:

{1}BBBBmadonna.jpgBBBBFoligno Madonna, by RaphaelBBBBThis is Raphael's "Foligno" Madonna, painted in Y1511BBBB-Y1512BBBB.

Where I've used {N} to represent the byte(s) needed to encode the tag index, Y1511 to represent a "year" data-type and BBBB to represent a 4 byte size.

Notice that we don't need the quotes, and that the caption is broken up into three parts.

All in all, it is a pretty compact representation. Of course, the reference table overhead is fairly large for small data, so in a simple example with one entry, XML is a far better representation. Still, it makes for a nearly readable file.
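For concreteness, here is a sketch of parsing that table. The field order (name, depth, type, positions) follows the example above; anything beyond that is my own assumption:

def parse_reference_table(text):
    lines = text.splitlines()
    assert lines[0] == "crossref" and lines[-1] == "end-crossref"
    entries = []
    for line in lines[1:-1]:
        name, depth, type_name, positions = line.split("#")
        entries.append({
            "name": name,
            "depth": int(depth) if depth else 0,
            "type": type_name or None,
            # "[46,103,113]" -> [46, 103, 113]
            "positions": [int(p) for p in positions[1:-1].split(",") if p],
        })
    return entries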

With the exception perhaps of pure binary data (such as an embedded JPEG), this format is entirely translatable into standard XML, and any standard XML is translatable into this format.


CONTAINING THINGS

Still, the really clever reader will no doubt have noticed a fatal flaw in the scheme as described so far.

We can delineate variable elements with a size that tells us where the data ends. We can specify tags, but the current mechanism doesn't really support any type of reasonable containment.

Composite tags do have structure (because of the depth indication), but even that does not help with the positioning and embedding of the data.

This layout has no real way of embedding an arbitrary depth of explicit tags within tags. That type of nesting isn't included yet, but it is crucial to making XML useful, so we will have to add it.

To define a container scheme (particularly a recursive one), we generally need two explicit tokens in the data: a start and an end. It is tempting to use the element tag itself as the start token, but that assumes that most things will be containers.

For this scheme, since we are using it to deal with massive amounts of data, it is best to assume that most tags are not actually composite. That is, most of them have a one-to-one relationship with some discrete piece of data. If the structure outweighs the data, then pure XML is a better choice of representation.

Now, to keep this simple, we'd really only like to introduce the most minimal syntax to accommodate the container start and ends.

One easy and inexpensive way is to embed them directly into the existing tag mechanics. We simply create a one-byte tag for each of them, drawing very little from the reference table. Thus for the original example we could have a table like:

crossref
__BEGIN__###[]
__END__###[]
painting##empty#[]
img##empty#[]
src###[]
alt###[]
caption###[]
date##year#[]
end-crossref

Where we've picked up the convention of wrapping system tokens in leading and trailing double underscores. Then BEGIN and END just delineate the boundaries of any tag container.

This can encode our earlier example as:

{3}{1}{4}{1}{5}BBBBmadonna.jpg{6}BBBBFoligno Madonna, by Raphael{2}{7}{1}BBBBThis is Raphael's "Foligno" Madonna, painted in {8}Y1511BBBB-{8}Y1512BBBB.{2}{2}

And of course the first and second tags are only a byte each.

Now of course BEGIN and END can float to other locations in the table. Any interpreter must just bind the __BEGIN__ token to the operation of pushing a new context onto the stack, and __END__ to popping it. Un-tagged data is then just matched to the current context. Tags, values, data, etc. are all converted back to XML based exactly on their explicit positions in the data.
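Here is a sketch of that interpretation loop, producing XML text again. It builds on the hypothetical decode_tag and read_value helpers from earlier, assumes tag indexes 1 and 2 are __BEGIN__ and __END__ as in the table above, and side-steps un-tagged runs of mixed content (which would need some extra signal to distinguish them from tags):

def decode_to_xml(stream, entries, begin=1, end=2):
    out, stack = [], []
    while True:
        pos = stream.tell()
        if not stream.read(1):
            break                        # end of data
        stream.seek(pos)                 # un-read the probe byte
        index = decode_tag(stream)
        if index == end:
            out.append("</%s>" % stack.pop())
            continue
        entry = entries[index - 1]
        # A __BEGIN__ right after an element tag opens a container;
        # otherwise the element's value follows directly.
        pos = stream.tell()
        if stream.read(1) == bytes([begin]):
            stack.append(entry["name"])
            out.append("<%s>" % entry["name"])
        else:
            stream.seek(pos)
            value = read_value(stream, entry["type"])
            if isinstance(value, bytes):
                value = value.decode("utf-8")  # assume text for the sketch
            out.append("<%s>%s</%s>" % (entry["name"], value, entry["name"]))
    return "".join(out)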


SUMMARY

With the base information and the reference table we can encode all of the information from an XML file into several different, tighter representations, while still allowing some minimal aspects of readability. PDFs too can be quickly examined, even though chunks of them are quite unreadable to the human eye.

There are a whole lot of other games we could play to further bring down the data size, but the most important quality of this format is that it stays mostly readable.

Convenience in an operational environment, for quickly viewing files and diagnosing problems, should not be underestimated in value. Our systems are so flaky that anything we can do to make it even the tiniest bit easier to find problems is going to help ensure that these technologies don't end up in the bit bucket with the rest of the might-have-beens.

When I was first considering this representation, my ultimate goal was to essentially replace "table" in a relational database with an "XML tree". To create a new type of database.

SQL crafts set-oriented arrangements that return tables while operating on tables. We could keep the same "set" mechanics, but return a tree, while operating on forests (sets of trees).

We could also consider an XML tree to be a table with an infinite number of columns and some inter-column relationship.

I'm not sure how these ideas would work in practice, but by keeping the data and index separate in "forest" files, and then using bXML as the basis for the storage mechanism, the actual coding of this esoteric type of database should be fairly well understood.

Another point is that it should be fairly trivial to create some bXML -> XML tools, and vice versa. That way, if this format is being used to move around massive data, it won't require new special parts to get it embedded into the infrastructure. It's almost the equivalent of just zipping up the file, except that it still maintains some readability while in flow.