Code compression with bytecode

pixel · Post by **pixel** » Wed Mar 12, 2014 3:00 am

Don't know if somebody came up with this idea previously, but I'm pretty sure it didn't happen in the VIC community.

The idea is to move repeated code patterns into subroutines which share a single page and to then have them called by a bytecode interpreter. Each bytecode would be the address on that page. There'd be two kinds of subroutines: one for which the register values are preserved and one for which they aren't, so the interpreter can be extended. You probably wouldn't want to write the subroutines yourself, merely have a tool to convert most of your code. If calculated right, if I had dealt with the game logic that way, pulse would have had half a kilobyte more space left and a bunch of people would hate me now.

Here's what the interpreter would look like:

Code: Select all

interpret:                                                                                                                         
.(
    ldy #0
    lda (bytecode_ptr),y
    beq done
    inc bytecode_ptr
    asl  ; drawback of subroutines only on even addresses
    bcs interpreter_extension
    sta jump+1
    lda bytecode_flags
    ldy bytecode_y
    pha
    lda bytecode_a
    plp
jump:
    jsr bytecode_funs
    sta bytecode_a
    sty bytecode_y
    php
    pla
    sta bytecode_flags
    jmp interpret
call_interpreter_extension:
    sta jump2+1
jump2:
    jsr interpreter_extensions
    jmp interpret
done:
    rts
.)

interpreter_extensions:
bc_lda:
    lda (bytecode_ptr),y
    sta bytecode_a
    inc bytecode_ptr
    rts

bc_bcs:
.(
    lda bytecode_flags
    pha
    plp
    bcc r
    lda (bytecode_ptr),y
    sta bytecode_ptr
r:  inc bytecode_ptr
    rts
.)

FD22 · Post by **FD22** » Wed Mar 12, 2014 4:52 pm

I've experimented with bytecode interpreters in 6502 in the past - the problem is always the tradeoff between size and speed, as inevitably the interpreter and its' child functions are slower to execute than the equivalent 'raw' opcodes.

Where space is at a premium (and on the VIC this is pretty-much all the time) I embed Exomizer to manage runtime decompression, and always slap the final executable through the commandline invocation of it just to crunch the whole lot.

Kweepa · Post by **Kweepa** » Fri Mar 14, 2014 5:40 pm

It's called BASIC

groepaz · Post by **groepaz** » Sat Mar 15, 2014 2:13 pm

you could try the "sweet16" interpreter... its originally from the apple2 ROM if i recall correctly, available at wozniacs web site.

and yes, BASIC is the way to go if you are really short on memory

RJBowman · Post by **RJBowman** » Sun Mar 16, 2014 1:00 pm

Search for "huffman threaded code"; probably the most compact coding system ever devised, but at the expense of speed.

pixel · Post by **pixel** » Mon Mar 17, 2014 5:22 am

Ah, cool! Thanks! So the name of the game is "threaded code"... https://en.wikipedia.org/wiki/Threaded_code

So, ermh, I wanted to propose direct threading with the size benefits of token threading. This is faster than SWEET16 or BASIC and can provide much more compression. According to the Wikipedia article you can end up with a quarter to eigth of the original code size (see "Token threading"). Generic data compression IMHO won't go much further and is less practical at run-time. I'm still very excited about this.

I wonder how much space a Huffman tree would take.

Kweepa · Post by **Kweepa** » Mon Mar 17, 2014 10:58 am

Size of Huffman tree depends on the number of tokens N, the size of a token T, and the size of a jump J in the tree (which maxes out at half the size of the tree).
N * sizeof( T ) + ( N - 1 ) * sizeof( J ).
This is for N = 2^X. Might be a little different if 2^(X-1) < N < 2^X, I don't remember.

RJBowman · Post by **RJBowman** » Wed Mar 19, 2014 11:01 pm

There is an overhead in the huffman tree and code required to make it work. It is only worthwhile if the program is large enough that the compression will save more memory than the huffman overhead consumes.

Mike · Post by **Mike** » Thu Mar 20, 2014 10:36 am

IMO, some people should better concentrate on putting out actual, *working* code before theorizing about alternative encoding schemes for their algorithms.

And, for most non-trivial programs, the size of the code is usually dwarfed against the size of the data being processed.

pixel · Post by **pixel** » Thu Mar 20, 2014 12:12 pm

Mike wrote:IMO, some people should better concentrate on putting out actual, *working* code before theorizing about alternative encoding schemes for their algorithms.

Sorry.

Couldn't let the tempation pass by with the only VIC-20 programming forum right under my nose. And I don't wanna code all this really.

Mike · Post by **Mike** » Thu Mar 20, 2014 12:39 pm

PM sent

Vic20-Ian · Post by **Vic20-Ian** » Thu Mar 20, 2014 12:51 pm

Mike wrote:IMO, some people should better concentrate on putting out actual, *working* code before theorizing about alternative encoding schemes for their algorithms.

IMHO people should be more friendly on this forum

rhurst · Post by **rhurst** » Thu Mar 20, 2014 3:03 pm

I think it's a German thing. It does explain why so much engineering efficiency comes from that demographic; but terse can be misunderstood by us regular folk.

pixel · Post by **pixel** » Thu Mar 20, 2014 11:02 pm

rhurst wrote:I think it's a German thing. It does explain why so much engineering efficiency comes from that demographic; but terse can be misunderstood by us regular folk.

No worries. We deal with each other in your preferred efficient manner.

johncl · Post by **johncl** » Mon Nov 24, 2014 6:00 am

Smart ways of crunching data is always welcome in my world, and even if some are just theoretical tickling of the brain - I welcome the discussion. Coming from a Java world, some sort of bytecode thing is interesting, and as several have said BASIC is generally just that, but an idea that is more optimized to whatever program you have at hand is interesting. For example my program so far already have lda #0, ldy #0 and ldx #0 about 30 times each taking 2 bytes - actually often 3 bytes including the sta, sty or stx that often comes after (but naturally into different memory). So some places bytecode is an interesting idea, although I can see it requiring dynamic deflating which would take quite a bit of time and embedding e.g exomizer to unpack bits of code needed might be a better and shorter way to the goal (which I am using in a C64 project now). I actually spent several weeks with sprite compression on the C64, even including frame difference compression (may sprites had many parts of the frame identical to the previous in an animation) - and I was really satisfied with the results. Then I checked out exomizer and it was able to squeeze those sprites down even further than my own highly specialized routine.

- And personally I don't find the exomizer depacker very large (it requires a 156 bytes decrunch table area that can be used freely inbetween depacks). But I am uncertain that I would use it for anything in an unexpanded Vic20 except to allow depacking into areas which ROM does not load into (like the first pages), but even that is solved easily with a boot loader.

I know a lot of what could be optimized through some bytecode thing can be optimized away at a later stage, like figuring out when you can leave out the clc before an adc, although in many cases my adc's are often used to really add the y register being used in an indirect lda/sta in the loop. So there are bit of these (tya, clc, adc #NN, tay) to be found - if not for the variable NN I guess I could do a jsr to save some bytes at the expense of speed. A deflated bytecode version would ofc be faster except for the time deflation happens ofc.

Anyway an interesting thought experiment.

Atm I am happy just finding (as you have seen in my posts) what memory I can use - and I try to do short optimizations like a print routine that does not need zero termination but rather use a bit in the string bytes to indicate termination (or special code for e.g. a jump or color change). The tradeoffs are somewhat vague though at times where perhaps a ROM routine would be easier to use, although I tend to try to avoid having to set full word pointers but bunch text into blocks that can be indexed by a byte.

Edit: Checking the 6502 opcode table I see that low nybble values $3, $7, $b and $f are unused (lower two bits set of opcode) so nice byte keys to use as deflater keys. No doubt exomizer does any compression better so I am sure this is somewhat silly to contemplate.

- But I see a number of "patterns" in my code when I look at it - like ldx before its being used in a sta/lda NN,x (same with y). Of course the many lda NN, sta MM (and similar for x,y). And lets not forget the initialization of a,x,y (varying number of these) before calling some subroutine.

LDXSTA NN,MM = ldx #NN , sta MM,x

Yay! Saved one byte!

Denial

Code compression with bytecode

Code compression with bytecode

Re: Code compression with bytecode

Re: Code compression with bytecode

Re: Code compression with bytecode

Re: Code compression with bytecode

Re: Code compression with bytecode

Re: Code compression with bytecode

Re: Code compression with bytecode

Re: Code compression with bytecode

Re: Code compression with bytecode

Re: Code compression with bytecode

Re: Code compression with bytecode

Re: Code compression with bytecode

Re: Code compression with bytecode

Re: Code compression with bytecode