LDA ($4000) and self-modifying code

Robbie
Vic 20 Dabbler
Posts: 84
Joined: Tue Aug 11, 2020 4:36 am
Location: England

LDA ($4000) and self-modifying code

Post by Robbie »

LDA ($4000) is what I want to do, but the 6502 isn't interested.

We can use indexed indirect zero page addressing by replacing $4000 with $FE (or similar):
LDX #0
LDA ($FE,X)

But how would you go about achieving the equivalent of LDA ($4000)? From messing with it for a while, it seems a bit of a nightmare (or a challenge, depending what you're into).
Last edited by Robbie on Sat Sep 26, 2020 2:29 am, edited 1 time in total.
chysn
Vic 20 Scientist
Posts: 1205
Joined: Tue Oct 22, 2019 12:36 pm
Website: http://www.beigemaze.com
Location: Michigan, USA
Occupation: Software Dev Manager

Re: LDA ($4000)

Post by chysn »

Robbie wrote: Mon Sep 21, 2020 1:39 pm LDA ($4000) is what I want to do, but the 6502 isn't interested.

We can use indexed indirect zero page addressing by replacing $4000 with $FE (or similar):
LDX #0
LDA ($FE,X)

But how would you go about achieving the equivalent of LDA ($4000)? From messing with it for a while, it seems a bit of a nightmare (or a challenge, depending what you're into).
This might seem odd, but you could do something like this at the start of your code:

Code:

lda #$ad      ; opcode for LDA absolute
sta $3fff     ; $3FFF-$4001 now reads "LDA <address held in $4000/$4001>"
lda #$60      ; opcode for RTS
sta $4002
; etc...
Then, when you need LDA ($4000), you can just do

Code:

jsr $3fff
; A is the contents of ($4000)
Personally, I'd just do the ,X zero page addressing, and that's my usual practice.
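
To make the memory layout concrete, here is a minimal sketch of the whole trick in one place. It assumes $3FFF-$4002 is free RAM (on a VIC-20 that means memory expansion mapped there), and TARGET is a hypothetical label for the byte to be fetched. The `<`/`>` low/high-byte operators are common but assembler-dependent.

```asm
; Sketch only: TARGET is a hypothetical label, $3FFF-$4002 assumed free RAM.
        lda #<TARGET    ; pointer low byte  -> $4000
        sta $4000
        lda #>TARGET    ; pointer high byte -> $4001
        sta $4001
        lda #$ad        ; opcode for LDA absolute
        sta $3fff       ; $3FFF: AD lo hi  =  LDA TARGET
        lda #$60        ; opcode for RTS
        sta $4002       ; $4002: RTS
        jsr $3fff       ; A = byte at TARGET
```

The pointer bytes in $4000/$4001 double as the operand of the LDA, so changing the pointer changes what the JSR fetches. The price is the JSR/RTS pair, which adds 12 cycles on top of the 4-cycle LDA.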
VIC-20 Projects: wAx Assembler, TRBo: Turtle RescueBot, Helix Colony, Sub Med, Trolley Problem, Dungeon of Dance, ZEPTOPOLIS, MIDI KERNAL, The Archivist, Ed for Prophet-5

WIP: MIDIcast BASIC extension

he/him/his

Re: LDA ($4000)

Post by Robbie »

That's a really elegant, hacker way of doing it.

Have you used that technique 'in anger' anywhere?
Mike
Herr VC
Posts: 4841
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Re: LDA ($4000)

Post by Mike »

On the 65xx, if you (need to) hold pointer/address values 'outside' zeropage, it is a common idiom to copy/swap them over to zeropage when they're on active duty during a part of a routine and store the updated values back to their original places when they're not needed at the moment.

I.e., you were right on track with your OP, extended as follows:

Code:

; "activate" pointer in $4000
LDA $4000
STA $FD
LDA $4001
STA $FE

; use working copy in $FD/$FE, either:
LDY #0
LDA ($FD),Y
; or:
LDX #0
LDA ($FD,X)

; "retire" pointer back to $4000
LDA $FD
STA $4000
LDA $FE
STA $4001
You will likely see the "activate" and "retire" code snippets in the calling procedure, whereas the called procedure will use the fixed zeropage addresses to reference the pointer value as parameter.
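
As a rough illustration of the activate/use/retire pattern in a single routine, the sketch below fetches one byte through a pointer kept at $4000/$4001 and leaves the pointer advanced by one. The names GETBYTE, PTR_SAVE and ZP_PTR are hypothetical, and the syntax is generic (labels with a colon); adjust for your assembler.

```asm
ZP_PTR   = $FD          ; hypothetical zeropage working copy ($FD/$FE)
PTR_SAVE = $4000        ; hypothetical home of the pointer ($4000/$4001)

GETBYTE:
        LDA PTR_SAVE    ; "activate": copy pointer to zeropage
        STA ZP_PTR
        LDA PTR_SAVE+1
        STA ZP_PTR+1
        LDY #$00
        LDA (ZP_PTR),Y  ; fetch the byte the pointer refers to
        INC ZP_PTR      ; advance the working copy by one
        BNE RETIRE
        INC ZP_PTR+1    ; low byte wrapped: carry into high byte
RETIRE: LDY ZP_PTR      ; "retire" via Y so A keeps the fetched byte
        STY PTR_SAVE
        LDY ZP_PTR+1
        STY PTR_SAVE+1
        RTS             ; A = fetched byte, Y clobbered
```

Retiring through Y instead of A is just one way to keep the result in A; pushing A on the stack around the copy would work as well.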

Re: LDA ($4000)

Post by chysn »

Robbie wrote: Mon Sep 21, 2020 3:03 pm That's a really elegant, hacker way of doing it.

Have you used that technique 'in anger' anywhere?
HA! No, I have not. I like the phrase "in anger" here, though.

I always do a variation of the "activate" transfer that Mike suggests above. In most of my real-world code, the pointer is above zeropage because it's part of a larger data structure that requires a pointer to something (screen memory, etc.). So, I reserve a zeropage pointer for handling work on a current member of the data structure.

In my own code, I’ve found places where I used “retire” (or, “update”), and places where I didn’t. It depends :)

I spent some time reviewing my typical handling of pointers to data, and I usually write the "activate" as a subroutine that does something with the data. For example, in TRBo: Turtle RescueBot, there are up to (I think) six Patrols, and each Patrol is represented by an eight-byte data structure, the first two bytes being a pointer to the Patrol's screen location.

Each frame in the game calls a subroutine that iterates through each Patrol in the level, with the iterator being the Patrol index in X. That subroutine calls a second one that moves the Patrol based on its index: it calculates the address of the Patrol's data structure, computes the Patrol's movements, and updates the appropriate data, like the screen location. At the end, it calls a subroutine that places the character in the new position. This is a generic routine that can place any character given a screen address and a character code:

Code:

; Place a Character
; Place the character on the screen at the specified address.
;
; Preparations
;     A - Low byte of the screen address
;     Y - High byte of the screen address
;     X - Character to place
;     Carry flag - Color if set, hidden if unset
PLACE:  STA DATA_L
        STY DATA_H
        TXA
        LDY #$00
        STA (DATA_L),Y
        LDA DATA_L
        LDY DATA_H      ; Falls through to CHRCOL
Only subroutines at the very end of such chains are concerned with the issue of indexed zeropage addressing. The data structure and how it's updated are handled separately.
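
For illustration only, a caller further up such a chain might hand a Patrol's screen pointer to PLACE roughly like this. PATROLS, CHR_PATROL and DRAWPATROL are hypothetical names and values, not taken from TRBo; only the register convention (A = low byte, Y = high byte, X = character, carry = color) comes from the header comment above.

```asm
PATROLS    = $033C      ; hypothetical address of the Patrol table
CHR_PATROL = $2A        ; hypothetical screen code for a Patrol

; Entry: X = byte offset of the Patrol record (index * 8)
DRAWPATROL:
        LDA PATROLS+1,X ; high byte of the stored screen pointer
        TAY
        LDA PATROLS,X   ; low byte of the stored screen pointer
        LDX #CHR_PATROL ; X is reused here, so the record offset is lost
        SEC             ; carry set: also color the character
        JSR PLACE
        RTS
```

The caller never touches zeropage itself; the copy into DATA_L/DATA_H happens inside PLACE.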

My approach to machine language is extremely C-like, when it comes to design. I code subroutines even when it might be more efficient to write code in an inline manner. But I'm employing similar principles (transferring an address to zeropage) even if the ordering is somewhat different.

Re: LDA ($4000)

Post by Robbie »

I remember reading somewhere that the 6502 has so few registers because zero page is pretty much as fast.

I guess I need to just think of zero page in that way, and activating and retiring pointers as pretty much equivalent to copying them into a register for manipulation.
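
The speed claim can be checked against the standard NMOS 6502 instruction timings. The addresses below are arbitrary examples; the byte and cycle counts are the documented ones for each addressing mode.

```asm
        LDA $FD         ; zero page:  2 bytes, 3 cycles
        LDA $4000       ; absolute:   3 bytes, 4 cycles
        INC $FD         ; zero page:  2 bytes, 5 cycles
        INC $4000       ; absolute:   3 bytes, 6 cycles
        TAX             ; register:   1 byte,  2 cycles
```

So zero page saves one byte and one cycle over absolute addressing. It is still slower than a register-to-register transfer, but it is the only place where the (zp),Y and (zp,X) modes can find their pointers.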

The idea of the code writing its own code is really interesting though, used 'in anger' or otherwise. I might have a mess with that, and see if I can come up with any practical uses for it.

Re: LDA ($4000)

Post by Mike »

For code generators, I have used this subroutine now and again:

Code:

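.Write
 STA $FFFF      ; operand ($FFFF is a placeholder) is the write pointer
 INC Write+1    ; advance the pointer's low byte
 BNE Write_00
 INC Write+2    ; low byte wrapped to 0: carry into the high byte
.Write_00
 RTS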
The pointer in Write+1/Write+2 gets initialised in the main program, and then the code generator calls Write with A as parameter to build yet another program in RAM from an algorithmic description. The X and Y registers are not used, which is quite useful for the calling code.
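
A minimal usage sketch, assuming a hypothetical free RAM area at BUFFER and generic label syntax (colon labels instead of the leading dot used above): it initialises the pointer, emits a three-byte program, and runs it.

```asm
BUFFER  = $1400         ; hypothetical free RAM for the generated code

Demo:   LDA #<BUFFER    ; point Write's STA operand at BUFFER
        STA Write+1
        LDA #>BUFFER
        STA Write+2
        LDA #$A9        ; emit "LDA #$2A" (opcode, then operand)
        JSR Write
        LDA #$2A
        JSR Write
        LDA #$60        ; emit "RTS"
        JSR Write
        JMP BUFFER      ; run it; its RTS returns to Demo's caller, A = $2A

Write:  STA $FFFF       ; operand is overwritten by Demo before first use
        INC Write+1
        BNE Write_00
        INC Write+2
Write_00:
        RTS
```

Demo is meant to be entered with JSR, so that the RTS inside the generated code has a return address to use.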

For example, my CGA viewer uses this to build the display routine. The resulting code is ~16K in size and is built from a routine that's just ~350 bytes long, which gives a 46:1 compression ratio. :mrgreen:

Re: LDA ($4000)

Post by Robbie »

I'm reading the words, Mike, but my brain is refusing to process the meaning.
I shall go to bed, and try again tomorrow to understand what it is that you're telling me.
:?

Re: LDA ($4000)

Post by Mike »

Robbie wrote: I'm reading the words Mike, [...]
Let me explain (you should probably try out the CGA viewer beforehand, to get a feeling for what the program actually does):

The CGA viewer takes a hidden bitmap (320x200 pixels in 4 colours) and displays a zoomed part of it, 80x64 pixels, in the screen bitmap of MINIGRAFIK (80x192 pixels in multi-colour, i.e. also 4 colours). As three multi-colour pixels are stacked on top of each other on the VIC-20 screen, you get square zoomed 'pixels' (about perfect for NTSC, only slightly elongated for PAL).

The display routine takes one byte from the hidden bitmap, and writes it three times to the display bitmap, like this:

Code:

.0400  A0 00     LDY #$00
.0402  B1 FB     LDA ($FB),Y
.0404  C8        INY
.0405  8D 00 11  STA $1100
.0408  8D 01 11  STA $1101
.040B  8D 02 11  STA $1102
[...]
$FB/$FC contains a pointer to the hidden bitmap, and the display bitmap starts at $1100, so this code snippet writes the first 4 zoomed pixels onto the screen at once. The display bitmap spans $1100..$1FFF, so another 3837 bytes need to be written until the screen has been updated. As fast as possible!

Putting a loop around this code snippet and updating all addresses inside would slow down the display routine by more than a factor of 2. Therefore the whole display routine has been unrolled, i.e. the code snippet you see above continues like this:
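
To get a feel for that factor, here is one naive way the looped version could be written, with standard 6502 cycle counts in the comments. This is only a sketch under assumptions: $FD/$FE as destination pointer, X as counter and the label names are hypothetical, and a cleverer loop would narrow the gap.

```asm
; Assumes: $FB/$FC -> source, $FD/$FE -> destination,
;          X = number of source bytes (0 means 256), decimal mode off.
LOOP:   LDY #$00        ; 2
        LDA ($FB),Y     ; 5  fetch one source byte
        STA ($FD),Y     ; 6  write it three times
        INY             ; 2
        STA ($FD),Y     ; 6
        INY             ; 2
        STA ($FD),Y     ; 6
        INC $FB         ; 5  source pointer += 1
        BNE SRCOK       ; 3  when taken
        INC $FC
SRCOK:  CLC             ; 2  destination pointer += 3
        LDA $FD         ; 3
        ADC #$03        ; 2
        STA $FD         ; 3
        BCC DSTOK       ; 3  when taken
        INC $FE
DSTOK:  DEX             ; 2
        BNE LOOP        ; 3  when taken
        RTS
```

On the common path (both pointer branches taken, no page-crossing penalties) that is about 55 cycles per source byte. The unrolled snippet needs 5 + 2 + 3 x 4 = 19 cycles for the same work.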

Code:

[...]
.040E  B1 FB     LDA ($FB),Y
.0410  C8        INY
.0411  8D 03 11  STA $1103
.0414  8D 04 11  STA $1104
.0417  8D 05 11  STA $1105
.041A  B1 FB     LDA ($FB),Y
.041C  C8        INY
.041D  8D 06 11  STA $1106
.0420  8D 07 11  STA $1107
[...]
... you get the idea. At some points in this unrolled routine, a few other instructions are interspersed for housekeeping, but we need not consider those here (for what I'd like to explain). As the above code snippet appears 1280 times in memory, at 12 bytes each, the major part of the display routine already needs 15360 bytes, and it would be slightly inelegant (ho-hum ...) to write all of that to storage, i.e. in full in a file. There seems to be quite some regularity in the byte pattern making up this routine, so throwing a compression algorithm at it looks like a feasible way to shrink the storage requirements on disk, no?

Unfortunately, the regularity only extends as far as the first four bytes ($B1, $FB, $C8, $8D); then the changing addresses of the three STA instructions break the pattern. Normal compression algorithms won't work well here: you can only expect the usual compression ratio of 2:1.

Instead, I use a code generator to roll out the loop in memory: it uses the Write sub-routine, and updates the addresses of the STA instructions as it progresses. That in turn need not be done in an unrolled loop (as we would then be no better off and wouldn't have any compression at all, rather a code expansion). After initialisation of the code pointer in $3192/$3193, the main part of that code generator looks like this:

Code:

[...]
.3074  A0 40     LDY #$40
.3076  A9 B1     LDA #$B1   ; opcode byte of LDA ($FB),Y
.3078  20 91 31  JSR $3191
.307B  A9 FB     LDA #$FB   ; operand byte of LDA ($FB),Y
.307D  20 91 31  JSR $3191
.3080  A9 C8     LDA #$C8   ; INY instruction
.3082  20 91 31  JSR $3191
.3085  A9 8D     LDA #$8D   ; opcode byte of STA $xxxx
.3087  20 91 31  JSR $3191
.308A  A5 FB     LDA $FB    ; low-byte running address of STA $xxxx operand
.308C  20 91 31  JSR $3191
.308F  A5 FC     LDA $FC    ; high-byte running address of STA $xxxx operand
.3091  20 91 31  JSR $3191
.3094  20 8A 31  JSR $318A  ; increment value in $FB/$FC
.3097  A9 8D     LDA #$8D   ; ... write ...
.3099  20 91 31  JSR $3191
.309C  A5 FB     LDA $FB
.309E  20 91 31  JSR $3191
.30A1  A5 FC     LDA $FC
.30A3  20 91 31  JSR $3191
.30A6  20 8A 31  JSR $318A
.30A9  A9 8D     LDA #$8D   ; ... three STA $xxxx instructions,
.30AB  20 91 31  JSR $3191
.30AE  A5 FB     LDA $FB
.30B0  20 91 31  JSR $3191
.30B3  A5 FC     LDA $FC
.30B5  20 91 31  JSR $3191
.30B8  20 8A 31  JSR $318A
.30BB  88        DEY        ; ... for 64 times.
.30BC  D0 B8     BNE $3076

[...]

.318A  E6 FB     INC $FB    ; update addresses used in the
.318C  D0 02     BNE $3190  ; STA instructions of the 
.318E  E6 FC     INC $FC    ; display routine
.3190  60        RTS
.3191  8D XX XX  STA $XXXX  ; write opcode/operand bytes to memory
.3194  EE 92 31  INC $3192
.3197  D0 03     BNE $319C
.3199  EE 93 31  INC $3193
.319C  60        RTS
Note that the pointer in $FB/$FC has a different use (and different values) in the code generator than later in the display routine: in the code generator, $FB/$FC contains the running address for the STA instructions being generated; in the display routine, it contains the column start addresses of the hidden bitmap.
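
Read as symbolic source rather than a monitor listing, the generator loop might look like the sketch below. All labels are hypothetical, and the EmitSTA helper is a refactoring for readability: the original listing inlines that sequence three times. Write's operand and $FB/$FC are assumed to be initialised by the caller beforehand.

```asm
ADDR    = $FB           ; running operand address for the generated STAs

GenColumn:
        LDY #64         ; 64 source bytes per column
GenLoop:
        LDA #$B1        ; opcode byte of LDA ($FB),Y
        JSR Write
        LDA #$FB        ; operand byte of LDA ($FB),Y
        JSR Write
        LDA #$C8        ; INY
        JSR Write
        JSR EmitSTA     ; three STA $xxxx with consecutive addresses
        JSR EmitSTA
        JSR EmitSTA
        DEY
        BNE GenLoop
        RTS

EmitSTA:
        LDA #$8D        ; opcode byte of STA absolute
        JSR Write
        LDA ADDR        ; operand low byte
        JSR Write
        LDA ADDR+1      ; operand high byte
        JSR Write
        INC ADDR        ; advance the running address
        BNE EmitDone
        INC ADDR+1
EmitDone:
        RTS

Write:  STA $FFFF       ; operand patched by the caller before use
        INC Write+1
        BNE WriteDone
        INC Write+2
WriteDone:
        RTS
```

Y survives the calls because neither Write nor EmitSTA touches it, which is exactly the property that makes Write convenient for the calling code.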

The pattern is written 64 times, then the code generator writes some housekeeping code - Y is reset to 0, and the pointer in $FB/$FC is advanced by 200 to address the next 4-pixel column of the hidden bitmap - all in all for 20 display columns (4x20 = 80 multi-colour pixels horizontally).
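
The generated housekeeping bytes are not shown in the thread, so the following is only one plausible form of what sits between two columns in the display routine: a reset of Y and a 16-bit add of 200 to the pointer. The label name is hypothetical.

```asm
        LDY #$00        ; restart the index for the next column
        CLC
        LDA $FB         ; $FB/$FC = $FB/$FC + 200 ...
        ADC #200
        STA $FB
        BCC NoCarry
        INC $FC         ; ... with carry into the high byte
NoCarry:
        LDA ($FB),Y     ; first instruction of the next column's block
```

In the generator, each of these instructions would be emitted byte by byte through Write, just like the LDA/INY/STA pattern.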

In effect, you start with a rather small routine that unrolls a much larger routine in memory, to minimize storage requirements on disk and, later, maximize speed for display. What more could you want? :wink:

Hope that helps.

Greetings,

Michael

Re: LDA ($4000)

Post by Robbie »

Thank you for taking the time to explain at that depth, Mike; it's really useful to see that technique used so effectively in a practical application. I find that constant balance between limited speed and limited memory really interesting.

I think it would make the software world such a better place if everyone had a grounding in this kind of thing before progressing on to producing the horrendously bloated modern code we have today!
srowe
Vic 20 Scientist
Posts: 1340
Joined: Mon Jun 16, 2014 3:19 pm

Re: LDA ($4000)

Post by srowe »

Robbie wrote: Fri Sep 25, 2020 1:04 pm I think it would make the software world such a better place if everyone had a grounding in this kind of thing before progressing on to producing the horrendously bloated modern code we have today!
I've suggested that very idea at work, everyone should have to write code for 8 bit computers to understand these trade-offs.
pixel
Vic 20 Scientist
Posts: 1357
Joined: Fri Feb 28, 2014 3:56 am
Website: http://hugbox.org/
Location: Berlin, Germany
Occupation: Pan–galactic shaman

Re: LDA ($4000)

Post by pixel »

srowe wrote: Fri Sep 25, 2020 1:40 pm I've suggested that very idea at work, everyone should have to write code for 8 bit computers to understand these trade-offs.
Might be something to it as in the "modern" world digital illiterates call themselves "senior developers" after two years of experience, still not knowing how a bloody computer works.
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose

Re: LDA ($4000)

Post by Mike »

Robbie wrote:I think it would make the software world such a better place if everyone had a grounding in this kind of thing before progressing on to producing the horrendously bloated modern code we have today!
It's just we demand a little bit more from our computers today, the size of data handled being considerably bigger, programs are expected to work reliably on non-uniform hardware, and you can't get all this without a bunch of abstraction layers between hardware and application.

Re: LDA ($4000)

Post by srowe »

Mike wrote: Tue Sep 29, 2020 1:42 am It's just we demand a little bit more from our computers today, the size of data handled being considerably bigger, programs are expected to work reliably on non-uniform hardware, and you can't get all this without a bunch of abstraction layers between hardware and application.
That's quite reasonable: computers are far more functional now because we can build on the achievements of the past rather than reinventing the wheel. What isn't reasonable is the approach of "just pull in this module/framework" without really understanding the size/performance/security implications of doing so.