ML optimizations

Post by **Schlowski** » Tue Sep 19, 2006 4:14 am

I always think of ML programming as two big parts of fun, first to get it up and running and second to optimize it like hell.

For the first part you need some experience and a lot of patience, but for the second part you sometimes need some inspiration.

I would like to start some threads about a few recurring tasks and hope that some people would like to share their ideas and comments about that. Would there be any interest in something like that?

Just as a little appetizer, I thought of something like this:

How to toggle a flag

Code:

InitialiseFlag:
   lda   #0
   sta   onoff

ToggleFlagV1:
   inc   onoff
   lda   onoff
   and   #$01
   sta   onoff
(11 bytes, assuming onoff not ZP)

ToggleFlagV2:
   sec
   lda   #$01
   sbc   onoff
   sta   onoff
(9 bytes, assuming onoff not ZP)

ToggleFlagV3:
   lda   onoff
   eor   #$01
   sta   onoff
(8 bytes, assuming onoff not ZP)

Flag will be 0 for off or 1 for on.
----------------------------------------------------

InitialiseFlag:
   lda   #$55
   sta   onoff
   rts

ToggleFlagV4:
   ror   onoff
(3 bytes, assuming onoff not ZP)

Flag will be $55 or $AA
check with 
   bit adr  
   bmi / bpl for on/off

Post by **Mike** » Tue Sep 19, 2006 7:15 am

adr: some address containing $55 or $AA

ROR adr ; toggle flag. 3 Bytes. (+1 Byte storage)

BCS or BCC ; depending whether 0 or 1 had been rotated out.

To check this flag some time later:

BIT adr ; 3 Bytes. Check bit 7 of address. (+1 Byte storage)

BMI or BPL ; after that
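One caution on the bare ROR toggle, offered as a sketch and not as a correction of anyone's code: on the 6502, ROR rotates through the carry, so whatever the carry held before the instruction ends up in bit 7. A lone ROR therefore only flips between $55 and $AA when the incoming carry happens to equal bit 0 of the flag. One way to guarantee that, assuming A may be clobbered, is to copy bit 0 into the carry first. The label is hypothetical, and some assemblers want `lsr a` for the accumulator form.

```asm
ToggleFlagV4Safe:      ; hypothetical label, not from the thread
   lda   onoff
   lsr               ; bit 0 of the flag -> carry (A is only scratch)
   ror   onoff       ; carry -> bit 7: $55 becomes $AA, $AA becomes $55
```

That is 7 bytes with onoff outside zero page, one byte less than the EOR version, and the BIT / BMI / BPL test works unchanged.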

[...]

Schlowski wrote: first to get it up and running and second to optimize it like hell

Two questions:

- does the program fit into the available space?
- does the program (speed-wise) fulfil the expectations?

Regarding the 2nd point: I would optimise a program if it is expected to execute, say, within one full video frame (for smooth animation) but doesn't manage that so far. Sometimes even heavy optimisation doesn't cut it, and you'll have to use another algorithm altogether. As soon as it works, I won't trade understandability and maintainability for further "optimised" code.

The point is, most of these optimisations tend to break as soon as you alter the program to add further functionality. An optimised part of the code may assume that X=0 on entry, as really was the case at the time. So you leave out LDX #0: 2 bytes saved. Six months later you insert a routine at that place which alters X (you no longer remember that it shouldn't), and suddenly the program barfs.

This happened to me often enough, more than 15 years ago, that I now really refrain from that type of optimisation.

Michael

Post by **Schlowski** » Tue Sep 19, 2006 8:48 am

I agree, so I tend to stick to some sort of 'peephole optimization', i.e. I do not assume anything outside the code I'm optimizing.

These sorts of side effects are deadly, and like you I gave up on them years ago.

Mostly it is not necessary to optimize my program, but since it's fun I do it anyway. And you can always learn something new this way, at least I do...

Post by **Schlowski** » Tue Sep 19, 2006 12:40 pm

Another little proc I optimized over the years:

Increment a word

Code:

IncWordV1:
   clc
   lda   WordPtr
   adc   #1
   sta   WordPtr
   lda   WordPtr+1
   adc   #0
   sta   WordPtr+1

IncWordV2:
   inc   WordPtr
   bcc   NoOverflow
   inc   WordPtr+1
NoOverflow:

This can be used, for example, to calculate offsets in screen and color RAM (assuming column in X and row in Y):

Code:

Screen = $1C00
CRAM = $9600
   
; column in X-reg
; row in Y-reg
CalcPos:
   ; store column directly
   stx   $FB
   stx   $FD
   lda   #>Screen
   sta   $FC
   lda   #>CRAM
   sta   $FE
 
   tya
   beq   NoRows

NextRow:
   clc
   lda   $FB
   adc   #$14
   sta   $FB
   sta   $FD
   bcc   NoCarry
   inc   $FC
   inc   $FE

NoCarry:
   dey
   bne   NextRow
  
NoRows:
   ; now ($FB) points to position in screen
   ; and ($FD) points to position in color RAM
   rts

This uses no lookup table for the offsets, which would take 23*2 bytes = 46 bytes; the whole routine is 34 bytes long. But I'm quite sure that there is a table of offsets in ROM, isn't there? So maybe using that table would make a shorter version...
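For comparison, the table-driven variant might look roughly like the sketch below. It does not use any ROM table; RowLo and RowHi are hypothetical tables of my own holding row*20 for rows 0 to 22, matching the $14 step and the 23 rows assumed above. The `.byte` directive and the label names are assumptions that depend on your assembler.

```asm
; column in X-reg, row in Y-reg, as in CalcPos
CalcPosTable:            ; hypothetical label
   txa
   clc
   adc   RowLo,y         ; column + low byte of row*20
   sta   $FB
   sta   $FD
   lda   RowHi,y         ; high byte of row*20
   adc   #>Screen        ; carry from the low-byte add is included
   sta   $FC
   adc   #$7A            ; >CRAM minus >Screen; carry is clear here
   sta   $FE
   rts

RowLo:                   ; low bytes of row*20, rows 0..22
   .byte $00,$14,$28,$3C,$50,$64,$78,$8C
   .byte $A0,$B4,$C8,$DC,$F0,$04,$18,$2C
   .byte $40,$54,$68,$7C,$90,$A4,$B8
RowHi:                   ; high bytes of row*20, rows 0..22
   .byte 0,0,0,0,0,0,0,0,0,0,0,0,0
   .byte 1,1,1,1,1,1,1,1,1,1
```

By my count that is 21 bytes of code plus the 46-byte table, so it is larger than the loop but runs in constant time regardless of the row.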

Post by **Schlowski** » Tue Sep 19, 2006 1:12 pm

Incorporated Mike's flag solution into the first post, great idea!

Post by **carlsson** » Tue Sep 19, 2006 5:40 pm

Atari 2600 programmers optimize for speed all the time. Or rather, they need to time the code with a frame, vblank etc. Certainly we VIC programmers need to do that as well when working with raster-based stuff, but it is not always required.

Personally I prefer optimizing for size (e.g. MiniGame compo) over optimizing for speed. You find that your program is two bytes too large and go through every routine, trying to find a way to squeeze out those last few bytes. Sometimes you end up completely rewriting a routine or even the calling mechanism, and you keep separate versions of the code to compare in the end whether you really did save some bytes. Perhaps there are two similar routines you try to merge into one, using some parameter to select the functionality. Possibly there is a packer you can use, but for games that are up to 1K large, a packer rarely does a good job considering you need to include the depacker as well.

Post by **Mike** » Wed Sep 20, 2006 1:20 am

Schlowski wrote: This uses no lookup table for the offsets, which would take 23*2 bytes = 46 bytes; the whole routine is 34 bytes long. But I'm quite sure that there is a table of offsets in ROM, isn't there? So maybe using that table would make a shorter version...

Or a routine?

If you don't mind sending the cursor around ...

Code:

LDX row    ;the KERNAL PLOT call takes the row in X
LDY column ;and the column in Y
CLC
JSR $FFF0 ;Read / Set Cursor X/Y Position
LDY column

Access Text RAM with LDA/STA ($D1),Y
Access Colour RAM with LDA/STA ($F3),Y

One small caveat: The routine works with logical, not physical lines!

Michael

Post by **Schlowski** » Wed Sep 20, 2006 1:34 am

I never ever want to fight with logical lines again! I had a lot of trouble with them in my VideoPoker game, and that was enough for the rest of my life.

OTOH, after a call to CLS there should be no linked lines on screen anyway, right?

Björg

Post by **mercifier** » Fri Sep 29, 2006 1:15 pm

About saving instructions:

I think it's OK to leave out an unneeded load instruction, as long as you comment on it. Put the instruction there with a semicolon in front and add a short note. This will make it easy to change it later if needed.

Code:

;LDX #0  -  X is already  zero here
STX $9120

or

Code:

BCS away
;CLC - carry tested by branch instruction above
ADC #x

Using compare instructions is sometimes an interesting way to set the carry for rotating, adding or subtracting.

Code:

CPX #1
ROL q ;shift q left; the new lowest bit is set if X is nonzero

CMP #$80
ROL ; 8 - bit rotation of A

CMP #$80
ROR ;arithmetic shift right

CPY #limit
ADC #$ff ;decrement A if Y is below limit.
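To make the arithmetic-shift idiom concrete, here is a short worked trace. The value $F6 is an arbitrary example of mine, not from the post. CMP #$80 copies bit 7 of A into the carry, and ROR then feeds that copy back in at the top, so the sign survives the shift. Some assemblers want `ror a` for the accumulator form.

```asm
   lda   #$F6     ; -10 in two's complement (%11110110)
   cmp   #$80     ; A >= $80, so carry = 1 (a copy of bit 7)
   ror            ; A = %11111011 = $FB = -5, carry = old bit 0 = 0
```

With a positive value the compare clears the carry instead, and the ROR behaves like a plain LSR.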

Post by **tlr** » Fri Sep 29, 2006 3:22 pm

Schlowski wrote:

Code:

IncWordV2:
   inc   WordPtr
   bcc   NoOverflow
   inc   WordPtr+1
NoOverflow:

inc does not set the carry flag. This should be:

Code:

IncWordV2:
   inc   WordPtr
   bne   NoOverflow
   inc   WordPtr+1
NoOverflow:

Post by **Schlowski** » Mon Oct 02, 2006 3:23 am

Oops

You're right, I always use this with an ADC before incrementing the high-byte.

Thanks for correcting me, such errors can start hours of debugging...

Björg

Post by **carlsson** » Mon Oct 02, 2006 6:19 pm

Of course, sometimes you can take advantage of the fact that some instructions such as INC, DEC, LDA etc. don't affect the carry flag.
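A classic case, sketched here with hypothetical labels NumA and NumB (two 4-byte little-endian values, not from this thread): in a multi-byte addition the index and counter updates sit between the ADCs, and the carry survives them because INX and DEY, like INC and DEC, leave it untouched.

```asm
AddLong:                ; hypothetical label: NumA = NumA + NumB
   ldx   #0
   ldy   #4             ; number of bytes to add
   clc                  ; clear carry once, before the lowest byte
AddLoop:
   lda   NumA,x         ; LDA changes N and Z only
   adc   NumB,x         ; uses the carry left by the previous byte
   sta   NumA,x         ; STA does not touch any flags
   inx                  ; INX and DEY change N and Z only
   dey
   bne   AddLoop        ; carry still holds the result of the ADC
```

If the counter updates did alter the carry, each pass would need to save and restore it, for example with PHP and PLP.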
