ML Optimization discussion (split from: ROM calls and other tricks)

Basic and Machine Language

Moderator: Moderators

User avatar
Noizer
Vic 20 Devotee
Posts: 297
Joined: Tue May 15, 2018 12:00 pm
Location: Europa

ML Optimization discussion (split from: ROM calls and other tricks)

Post by Noizer »

viewtopic.php?t=676&start=30

It would be nice if you would give short code examples within a code tag. Thank you in advance ☺️
Last edited by Mike on Sat Oct 24, 2020 2:36 pm, edited 1 time in total.
Valid rule today as earlier: 1 Byte = 8 Bits
-._/classes instead of masses\_.-
User avatar
J.E.E.K.
Vic 20 Drifter
Posts: 23
Joined: Wed Jan 25, 2017 12:31 pm
Website: http://klasek.at/8bit
Location: AT

Re: ML Optimization discussion (split from: ROM calls and other tricks)

Post by J.E.E.K. »

For all points or just the ones I added?
Last edited by Mike on Sat Oct 24, 2020 2:38 pm, edited 1 time in total.
User avatar
Noizer
Vic 20 Devotee
Posts: 297
Joined: Tue May 15, 2018 12:00 pm
Location: Europa

Re: ML Optimization discussion (split from: ROM calls and other tricks)

Post by Noizer »

J.E.E.K. wrote: Sat Oct 24, 2020 11:07 am For all points or just the ones I added?
Maybe for all the points. 😅 It seems Wimoos hasn‘t time left to further explain his points? Oh, probably you have to look at his last posting
Valid rule today as earlier: 1 Byte = 8 Bits
-._/classes instead of masses\_.-
User avatar
chysn
Vic 20 Scientist
Posts: 1205
Joined: Tue Oct 22, 2019 12:36 pm
Website: http://www.beigemaze.com
Location: Michigan, USA
Occupation: Software Dev Manager

Re: ML Optimization tips and tricks

Post by chysn »

It seems like ML optimization and ROM tips could be separate topics.

One solid optimization that I always try to look out for is when a subroutine sets up data and then ends in a subroutine call, like this:

Code: Select all

sub:   ; Subroutine in which something is done
       jsr $cb1e ; and then some call is made to act on data
       rts
JSR is really just a JMP that puts a return address (-1) on the stack, and RTS is really just a JMP that sets PC to a stack address+1; so you save one byte of code, two bytes of stack space, and three cycles by ending such a subroutine with

Code: Select all

sub:   ; Subroutine in which something is done
       jmp $cb1e ; etc.
VIC-20 Projects: wAx Assembler, TRBo: Turtle RescueBot, Helix Colony, Sub Med, Trolley Problem, Dungeon of Dance, ZEPTOPOLIS, MIDI KERNAL, The Archivist, Ed for Prophet-5

WIP: MIDIcast BASIC extension

he/him/his
User avatar
Mike
Herr VC
Posts: 4841
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Re: ML Optimization discussion (split from: ROM calls and other tricks)

Post by Mike »

Noizer wrote:It seems Wimoos hasn‘t time left to further explain his points? Oh, probably you have to look at his last posting
IMO, the list of techniques that wimoos posted (link) is pretty much self-evident. It is you who would have to come up with a specific question about an own coding problem where you don't know how to apply one of the techniques listed, then you stand good chances that someone here will show you how. Conversely I doubt a random collection of code examples explaining these techniques would be helpful in case.
User avatar
srowe
Vic 20 Scientist
Posts: 1340
Joined: Mon Jun 16, 2014 3:19 pm

Re: ML Optimization tips and tricks

Post by srowe »

chysn wrote: Mon Oct 26, 2020 3:49 pm so you save one byte of code, two bytes of stack space, and three cycles
Actually you save 9 cycles using the approach (6+6 vs 3) subroutine calls have quite an overhead on the 6502.
User avatar
chysn
Vic 20 Scientist
Posts: 1205
Joined: Tue Oct 22, 2019 12:36 pm
Website: http://www.beigemaze.com
Location: Michigan, USA
Occupation: Software Dev Manager

Re: ML Optimization tips and tricks

Post by chysn »

srowe wrote: Tue Oct 27, 2020 1:14 am Actually you save 9 cycles using the approach (6+6 vs 3) subroutine calls have quite an overhead on the 6502.
Yes, that's right, thanks!
Last edited by chysn on Thu Nov 05, 2020 7:24 am, edited 1 time in total.
User avatar
Mike
Herr VC
Posts: 4841
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Re: ML Optimization discussion (split from: ROM calls and other tricks)

Post by Mike »

Wimoos already mentioned tail call optimization in his post: viewtopic.php?t=676&start=22 ... and to quote from there:
wimoos wrote: Wed Dec 12, 2012 4:27 am[...]

Change JSRs, followed by RTS into JMPs

[...]
You'll have to be wary about two cases, where this technique cannot (easily) be applied:
  • the RTS may itself be a labeled (jump-)target, or
  • the called sub-routine uses the return address on stack. PRIMM is a good example. If you call PRIMM with JMP, wrong data will be printed (likely, from the return address one level higher) and same return address will be advanced somewhat uncontrolled, likely leading to a crash.
Any other jumps or branches pointing to that RTS (or references using the RTS opcode value as data) would need to be retargeted elsewhere.
User avatar
Noizer
Vic 20 Devotee
Posts: 297
Joined: Tue May 15, 2018 12:00 pm
Location: Europa

Re: ML Optimization tips and tricks

Post by Noizer »

J.E.E.K. wrote: Sat Oct 24, 2020 9:02 am
wimoos wrote: Wed Dec 12, 2012 4:27 amRemove NOP’s
[...]
This list could be extended by
  • Hold flag in bit 7 of a location, so you can check the flag by "BIT flag" followed by "BMI/BPL destination" keeping A untouched, clear the flag with "LSR flag". If carry is set by an condition before, use "ROR flag" to set the flag.
  • Operations without temporarily save the accumulator to memory:
    • To subtract the value in A from location's value ( (mem)-A ) use EOR #$FF; SEC; ADC mem (negate A with two's complement and simply add).
    • Merge some bits from A into a memory location using EOR mem; AND #mask_bits_taken_from_acc; EOR mem; STA mem.
😞 I‘m still waiting for some code examples, any chance ?
Valid rule today as earlier: 1 Byte = 8 Bits
-._/classes instead of masses\_.-
User avatar
chysn
Vic 20 Scientist
Posts: 1205
Joined: Tue Oct 22, 2019 12:36 pm
Website: http://www.beigemaze.com
Location: Michigan, USA
Occupation: Software Dev Manager

Re: ML Optimization tips and tricks

Post by chysn »

Noizer wrote: Sat Oct 31, 2020 5:40 am
[*]Hold flag in bit 7 of a location, so you can check the flag by "BIT flag" followed by "BMI/BPL destination" keeping A untouched, clear the flag with "LSR flag". If carry is set by an condition before, use "ROR flag" to set the flag.
😞 I‘m still waiting for some code examples, any chance ?
Set a flag (3 or 4 bytes)

Code: Select all

sec
ror flag
Check a flag (4 or 5 bytes)

Code: Select all

bit flag
bmi if_set ; ~or~ bpl if_unset
Clear a Flag (2 or 3 bytes)

Code: Select all

lsr flag
The idea here is that you save a couple bytes clearing the flag (leveraging the fact that LSR doesn't use Carry as an input), and you don't need to use a register when checking the flag. For setting the flag, you're leveraging the fact that ROR does use Carry as an input, and saving a byte over LDA/STA. The register-saving benefit of this system is almost as valuable as the smaller memory footprint, in my view.

For the other bullet point, "Operations without temporarily save the accumulator to memory," the code is right there in the nested bullet points.
VIC-20 Projects: wAx Assembler, TRBo: Turtle RescueBot, Helix Colony, Sub Med, Trolley Problem, Dungeon of Dance, ZEPTOPOLIS, MIDI KERNAL, The Archivist, Ed for Prophet-5

WIP: MIDIcast BASIC extension

he/him/his
User avatar
Noizer
Vic 20 Devotee
Posts: 297
Joined: Tue May 15, 2018 12:00 pm
Location: Europa

Re: ML Optimization tips and tricks

Post by Noizer »

chysn wrote: Sat Oct 31, 2020 10:04 pm...
Thanx for you efforts. I certainly will use some of the code snipplets as needed.
Now here are my different approaches on ML optimizations of MHO:
  • Use of self modifying code -- load everything you have to process from outer loops
  • Use of tables for all operations where
    z = x operation y — why slowly calculate values if you can simple do a table lookup with at most two bytes indexing (x and y)
  • Always starting to code keeping in mind execution time is not a bad choice — if you can do it fast, you can do it slow too
  • avoiding of PHA / PLA, if you can use zeropage addresses for storing or indexing — saves cycles and values are still available, this beans counting is more annoying than the necessary and instructive cycle counting
  • use of unintended opcodes, very handy and mind freeing, new coding possibilities with almost zero overhead — example Zeropage indirect addressing indexed in X 😵😉
to be continued
Valid rule today as earlier: 1 Byte = 8 Bits
-._/classes instead of masses\_.-
User avatar
Mike
Herr VC
Posts: 4841
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Re: ML Optimization discussion (split from: ROM calls and other tricks)

Post by Mike »

Noizer, there is yet another dimension of optimization you left out in your list (BTW - been there, done that), and that is minimizing the time spent on writing the code and getting it right, to specification.

I'd presume quite a lot of fellow coders try to optimize their code for size or speed ... and never finish it.

Just to put things into perspective - I worked roughly 100 hours to implement MINIPAINT (spread over half a year in 2009, mostly on rainy afternoons and weekends), resulting in roughly 7 KB library code 'scripted' by a main program written in MINIGRAFIK-enhanced BASIC.

I optimized only a few routines of said library in any of the dimensions speed or size - the zoom routine of the main edit window and the cursor display are the most notable examples. The rest of the lot, around 50 JSR-callable sub-routines, is just straightforward code to get the job done.

So, what track record are you able to show? Finished programs I mean. No cheap talk.
User avatar
chysn
Vic 20 Scientist
Posts: 1205
Joined: Tue Oct 22, 2019 12:36 pm
Website: http://www.beigemaze.com
Location: Michigan, USA
Occupation: Software Dev Manager

Re: ML Optimization discussion (split from: ROM calls and other tricks)

Post by chysn »

Most of the time, when we speak of optimization, we're talking about things that we do to improve code when it's already somewhat-functional. But a great deal of optimization is available in the design stage.

Consider a data structure for a game enemy that requires four properties: direction, x position, y position, hit points. It's tempting to put data related to an instance in consecutive memory locations.

Code: Select all

OBJ_DATA = $03c0
DIRECTION = OBJ_DATA
X_POS = OBJ_DATA + 1
Y_POS = OBJ_DATA + 2
HP = OBJ_DATA + 3
The downside to this is that it can be costly to manage the index to the object, because there's math involved:

Code: Select all

GetHP:  asl        ; Multiply object index by 4 to get the table index
        asl
        tax
        lda HP,x
        rts
I've used this sort of data structure, which comes from my background in C, and my love of structs. Unfortunately, it's inefficient for 6502, and I've now switched to storing data with properties together instead of instances together:

Code: Select all

MAX_OBJ = 6
DIRECTION = $03c0
X_POS = DIRECTION + MAX_OBJ
Y_POS = X_POS + MAX_OBJ
HP = Y_POS + MAX_OBJ
And now all you need is an instance index to access properties

Code: Select all

lda HP,x
Notice that you do need to specify a maximum number of instances (MAX_OBJ) to allocate space. But in VIC-20 practice, this is easy enough.
VIC-20 Projects: wAx Assembler, TRBo: Turtle RescueBot, Helix Colony, Sub Med, Trolley Problem, Dungeon of Dance, ZEPTOPOLIS, MIDI KERNAL, The Archivist, Ed for Prophet-5

WIP: MIDIcast BASIC extension

he/him/his
User avatar
Noizer
Vic 20 Devotee
Posts: 297
Joined: Tue May 15, 2018 12:00 pm
Location: Europa

Re: ML Optimization discussion (split from: ROM calls and other tricks)

Post by Noizer »

Mike wrote: Wed Nov 04, 2020 3:56 pm Noizer, there is yet another dimension of optimization you left out in your list (BTW - been there, done that), and that is minimizing the time spent on writing the code and getting it right, to specification.
...
So, what track record are you able to show? Finished programs I mean. No cheap talk.
Dear Mike, some things just take time. And since it's a hobby, I alone decide when I invest time in it and when I want to make something public. I like to program cheaply in my quiet little room, I don't live for it and nobody pays me for it. When I feel like it, I write a little bit of my tiny experience in the world where you are and that's it. 😑
Valid rule today as earlier: 1 Byte = 8 Bits
-._/classes instead of masses\_.-
User avatar
J.E.E.K.
Vic 20 Drifter
Posts: 23
Joined: Wed Jan 25, 2017 12:31 pm
Website: http://klasek.at/8bit
Location: AT

Re: ML Optimization tips and tricks

Post by J.E.E.K. »

Noizer wrote: Sat Oct 31, 2020 5:40 am
J.E.E.K. wrote: Sat Oct 24, 2020 9:02 am [..]
  • [..]
  • Operations without temporarily save the accumulator to memory:
    • To subtract the value in A from location's value ( (mem)-A ) use EOR #$FF; SEC; ADC mem (negate A with two's complement and simply add).
    • Merge some bits from A into a memory location using EOR mem; AND #mask_bits_taken_from_acc; EOR mem; STA mem.
😞 I‘m still waiting for some code examples, any chance ?
Ad "subtraction":
Request Memory from String Space (BASIC V2, C64)

The length of the request (bytes to allocate) is given in A. The heap pointer ($33/$34, the result is finally in A/Y) has to be moved to lower addresses:

Code: Select all

B4F7: 49 FF     EOR #$FF
B4F9: 38        SEC
B4FA: 65 33     ADC $33       ; Pointer: Bottom of String space
B4FC: A4 34     LDY $34       ; Pointer: Bottom of String space
B4FE: B0 01     BCS $B501
B500: 88        DEY
Ad "merge":

In my Graphic-Extension code I am using a mode byte (savemo), which consists of bits from different sources:

Code: Select all

                                ; put A's bit 4-7 into savemo
        EOR savemo              ; ********
        AND #%11110000          ; ****0000
        EOR savemo              ; AAAAmmmm
        STA savemo              ;
Post Reply