ML optimizations

Basic and Machine Language

Are you interested in discussions about ML optimizations?

yes: 16 votes (94%)
no: 0 votes
don't care: 1 vote (6%)

Total votes: 17

Schlowski
NoMess!
Posts: 893
Joined: Tue Jun 08, 2004 12:20 pm

ML optimizations

Post by Schlowski »

I always think of ML programming as two big parts of fun, first to get it up and running and second to optimize it like hell.

For the first part you need some experience and a lot of patience, but for the second part you sometimes need some inspiration.

I would like to start some threads about a few recurring tasks and hope that some people would like to share their ideas and comments about that. Would there be any interest in something like that?

Just as a little appetizer, I thought of something like this:

How to toggle a flag

Code: Select all

InitialiseFlag:
   lda   #0
   sta   onoff

ToggleFlagV1:
   inc   onoff
   lda   onoff
   and   #$01
   sta   onoff
(11 bytes, assuming onoff not ZP)

ToggleFlagV2:
   sec
   lda   #$01
   sbc   onoff
   sta   onoff
(9 bytes, assuming onoff not ZP)

ToggleFlagV3:
   lda   onoff
   eor   #$01
   sta   onoff
(8 bytes, assuming onoff not ZP)

Flag will be 0 for off or 1 for on.
----------------------------------------------------

InitialiseFlag:
   lda   #$55
   sta   onoff
   rts

ToggleFlagV4:
   ror   onoff
(3 bytes, assuming onoff not ZP)

Flag will be $55 or $AA
check with 
   bit adr  
   bmi / bpl for on/off
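
One caveat with V4: ROR rotates through the carry flag, so bit 7 of onoff receives whatever carry held on entry. The flag only alternates cleanly between $55 and $AA if carry equals bit 0 of onoff at that point. A minimal carry-independent sketch, using the same onoff label (ToggleFlagV4Safe is a made-up name; 7 bytes with onoff not in ZP, so only one byte shorter than V3):

```asm
ToggleFlagV4Safe:      ; hypothetical label, not from the posts
   lda   onoff
   lsr                 ; bit 0 of onoff -> carry (some assemblers want "lsr a")
   ror   onoff         ; carry -> bit 7: true 8-bit rotate, $55 <-> $AA
```

Unlike the bare ROR, this destroys A.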
Last edited by Schlowski on Tue Sep 19, 2006 1:16 pm, edited 2 times in total.
Mike
Herr VC
Posts: 5133
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Post by Mike »

adr: some address containing $55 or $AA

ROR adr ; toggle flag. 3 Bytes. (+1 Byte storage)

BCS or BCC ; depending whether 0 or 1 had been rotated out.

To check this flag some time later:

BIT adr ; 3 Bytes. Check bit 7 of address. (+1 Byte storage)

BMI or BPL ; after that

Schlowski wrote:[...] first to get it up and running and second to optimize it like hell
Two questions:

- does the program fit into the available space?
- does the program (speed-wise) fulfil the expectations?

Regarding the 2nd point: I would optimise a program if it is expected to execute - say - within a full video frame (for smooth animation), but doesn't manage that yet. Sometimes even heavy optimisation doesn't cut it, and you'll have to use another algorithm altogether. As soon as it works, I won't trade understandability and maintainability for further "optimised" code.

The point is, most of these optimisations tend to break as soon as you alter the program to add further functionality. One optimised part of the code may assume that X=0 on entry - as was indeed the case. So you leave out LDX #0 - 2 bytes saved. Six months later you insert a routine at that place which alters X - you no longer remembered that it shouldn't - and suddenly the program barfs.

This happened to me often enough, more than 15 years ago, that I really refrain from that type of optimisation.

Michael

Post by Schlowski »

I agree, which is why I tend to stick to some sort of 'peephole optimization', i.e. I do not assume anything outside the code I am optimizing.

Side effects of this sort are deadly, and like you I gave up on them years ago :-)

Mostly it is not necessary to optimize my program, but since it's fun I do it anyway. And you can always learn something new this way, at least I do...

Post by Schlowski »

Another little proc I have optimized over the years:

Increment a word

Code: Select all

IncWordV1:
   clc
   lda   WordPtr
   adc   #1
   sta   WordPtr
   lda   WordPtr+1
   adc   #0
   sta   WordPtr+1

IncWordV2:
   inc   WordPtr
   bcc   NoOverflow
   inc   WordPtr+1
NoOverflow:
This can be used for example for calculation of offsets in screen and color ram (assuming col in X and row in Y):

Code: Select all

Screen = $1C00
CRAM = $9600
   
; column in X-reg
; row in Y-reg
CalcPos:
   ; store column directly
   stx   $FB
   stx   $FD
   lda   #>Screen
   sta   $FC
   lda   #>CRAM
   sta   $FE
 
   tya
   beq   NoRows

NextRow:
   clc
   lda   $FB
   adc   #$14
   sta   $FB
   sta   $FD
   bcc   NoCarry
   inc   $FC
   inc   $FE

NoCarry:
   dey
   bne   NextRow
  
NoRows:
   ; now ($FB) points to position in screen
   ; and ($FD) points to position in color RAM
   rts
This uses no lookup for offsets, which would be 23*2 bytes=46 bytes, the whole routine is 34 bytes long. But I'm quite sure that there is a table of offsets in ROM, isn't it? So maybe using that table would make a shorter version...

Post by Schlowski »

Incorporated Mike's flag solution into the first post, great idea!
carlsson
Class of '6502
Posts: 5516
Joined: Wed Mar 10, 2004 1:41 am

Post by carlsson »

Atari 2600 programmers optimize for speed all the time. Or rather, they need to time the code to fit a frame, vblank etc. Certainly we VIC programmers need to do that as well when working with raster based stuff, but it is not always required.

Personally I prefer optimizing for size (e.g. MiniGame compo) over optimizing for speed. You find that your program is two bytes too large and go through every routine, trying to find a way to squeeze out those last few bytes. Sometimes it ends in completely rewriting a routine or even the calling mechanism, and you keep separate versions of the code to compare at the end whether you really did save some bytes. Perhaps there are two similar routines you try to merge into one, using some parameter to select functionality. Possibly there is a packer you can use, but for games of up to 1K, a packer rarely does a good job considering you need to include the depacker as well.
Anders Carlsson


Post by Mike »

Schlowski wrote:This uses no lookup for offsets, which would be 23*2 bytes=46 bytes, the whole routine is 34 bytes long. But I'm quite sure that there is a table of offsets in ROM, isn't it? So maybe using that table would make a shorter version...
Or a routine? ;) If you don't mind sending the cursor around ...

Code: Select all

LDX row
LDY column
CLC
JSR $FFF0 ;Read / Set Cursor X/Y Position (X = row, Y = column)
LDY column
; Access Text RAM with LDA/STA ($D1),Y
; Access Colour RAM with LDA/STA ($F3),Y

One small caveat: The routine works with logical, not physical lines!

Michael

Post by Schlowski »

I never ever want to fight with logical lines again! I had a lot of trouble with them in my VideoPoker game, that was enough for the rest of my life ;-)

OTOH, after a call to CLS there should be no linked lines on screen anyway, right?

Björg
mercifier
Vic 20 Drifter
Posts: 23
Joined: Sun Jun 18, 2006 4:17 pm

Post by mercifier »

About saving instructions:

I think it's OK to leave out an unneeded load instruction, as long as you make a comment about it. Put the instruction there with a semicolon in front and add a short note. This makes it easy to change later if needed.

Code: Select all

;LDX #0  -  X is already  zero here
STX $9120
or

Code: Select all

BCS away
;CLC - carry tested by branch instruction above
ADC #x

Using compare instructions is sometimes an interesting way to set carry for rotating, adding or subtracting.

Code: Select all

CPX #1
ROL q ;set lowest bit of q if X is nonzero

CMP #$80
ROL ; 8 - bit rotation of A

CMP #$80
ROR ;arithmetic shift right

CPY #limit
ADC #$ff ;decrement A if Y is below limit.
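
To see why the last one works: CPY sets carry when Y >= limit, and ADC #$ff adds 255 plus carry, which wraps around to "minus 1 plus carry". A small worked trace, with a limit of 5 and the register values picked purely for illustration (decimal mode assumed off):

```asm
   ldy   #3
   lda   #10
   cpy   #5            ; 3 < 5, so carry is clear
   adc   #$ff          ; 10 + 255 + 0 = 265, wraps to 9: A decremented

   ldy   #7
   cpy   #5            ; 7 >= 5, so carry is set
   adc   #$ff          ; 9 + 255 + 1 = 265, wraps to 9: A unchanged
```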
tlr
Vic 20 Nerd
Posts: 594
Joined: Mon Oct 04, 2004 10:53 am

Post by tlr »

Schlowski wrote:

Code: Select all

IncWordV2:
   inc   WordPtr
   bcc   NoOverflow
   inc   WordPtr+1
NoOverflow:
inc does not set the carry flag. This should be:

Code: Select all

IncWordV2:
   inc   WordPtr
   bne   NoOverflow
   inc   WordPtr+1
NoOverflow:
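
Counting down has a similar catch: DEC leaves carry alone as well, and the zero flag after DEC only says the byte reached 0, not that it wrapped to $FF. A minimal sketch of the matching decrement, assuming the same WordPtr (DecWord and NoBorrow are made-up labels):

```asm
DecWord:
   lda   WordPtr       ; low byte 0 means the decrement will wrap to $FF
   bne   NoBorrow
   dec   WordPtr+1     ; so borrow from the high byte first
NoBorrow:
   dec   WordPtr
```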

Post by Schlowski »

Oops :oops:

You're right, I always use this with an ADC before incrementing the high-byte.

Thanks for correcting me, such errors can start hours of debugging...

Björg

Post by carlsson »

Of course, sometimes you can take advantage of the fact that some instructions such as INC, DEC, LDA etc. don't affect the carry flag.
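
A minimal sketch of one such case, a multi-byte addition loop. NumA and NumB are hypothetical labels for two 4-byte little-endian numbers; the loop only works because INX, DEY, LDA, STA and BNE all leave carry exactly as ADC set it:

```asm
Add32:
   ldx   #0            ; index of the current byte
   ldy   #4            ; bytes left to add
   clc                 ; clear carry once, before the first ADC
AddLoop:
   lda   NumA,x
   adc   NumB,x        ; includes the carry from the previous byte
   sta   NumA,x
   inx                 ; changes N and Z only
   dey                 ; changes N and Z only
   bne   AddLoop       ; carry survives into the next round
```

The sum ends up in NumA, with the final carry still in C.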
Anders Carlsson
