LDA ($4000) and self-modifying code
LDA ($4000) is what I want to do, but the 6502 isn't interested.
We can use indexed indirect zero page by replacing $4000 with $FE (or similar):
LDX #0
LDA ($FE,X)
But how would you go about achieving the equivalent of LDA ($4000)? From messing with it for a while, it seems a bit of a nightmare (or a challenge, depending what you're into).
Last edited by Robbie on Sat Sep 26, 2020 2:29 am, edited 1 time in total.
- chysn
- Vic 20 Scientist
- Posts: 1205
- Joined: Tue Oct 22, 2019 12:36 pm
- Website: http://www.beigemaze.com
- Location: Michigan, USA
- Occupation: Software Dev Manager
Re: LDA ($4000)
Robbie wrote: ↑Mon Sep 21, 2020 1:39 pm
LDA ($4000) is what I want to do, but the 6502 isn't interested.
We can use indexed indirect zero page by replacing $4000 with $FE (or similar):
LDX #0
LDA ($FE,X)
But how would you go about achieving the equivalent of LDA ($4000)? From messing with it for a while, it seems a bit of a nightmare (or a challenge, depending what you're into).
This might seem odd, but you could do something like this at the start of your code:
Code: Select all
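lda #$ad ; opcode of LDA absolute
sta $3fff ; goes right before the pointer at $4000/$4001
lda #$60 ; opcode of RTS
sta $4002 ; goes right after the pointer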
; etc...
Code: Select all
jsr $3fff
; A is the contents of ($4000)
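A minimal sketch of the same trick with labels instead of fixed addresses, for anyone who wants the pointer to live inside their own program. The names getbyte, ptr and target are made up for illustration, and the routine has to sit in RAM, since its operand bytes get overwritten:

Code: Select all
getbyte: lda $ffff      ; $ffff is a placeholder that forces absolute addressing
         rts
ptr      = getbyte+1    ; the two operand bytes of the LDA are the pointer

; usage: point ptr at some address, then fetch through it
         lda #<target
         sta ptr
         lda #>target
         sta ptr+1
         jsr getbyte    ; A = byte stored at target

The layout is the same as above: an LDA absolute opcode, two pointer bytes, then RTS.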
VIC-20 Projects: wAx Assembler, TRBo: Turtle RescueBot, Helix Colony, Sub Med, Trolley Problem, Dungeon of Dance, ZEPTOPOLIS, MIDI KERNAL, The Archivist, Ed for Prophet-5
WIP: MIDIcast BASIC extension
he/him/his
Re: LDA ($4000)
That's a really elegant, hacker way of doing it.
Have you used that technique 'in anger' anywhere?
- Mike
- Herr VC
- Posts: 4858
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Re: LDA ($4000)
On the 65xx, if you (need to) hold pointer/address values 'outside' zeropage, it is a common idiom to copy/swap them over to zeropage when they're on active duty during a part of a routine and store the updated values back to their original places when they're not needed at the moment.
I.e., you were right on track with your OP, extended like so:
Code: Select all
; "activate" the pointer stored at $4000/$4001
LDA $4000
STA $FD
LDA $4001
STA $FE
; use the working copy in $FD/$FE, either with
LDY #0
LDA ($FD),Y
; or with
LDX #0
LDA ($FD,X)
; "retire" the pointer back to $4000/$4001
LDA $FD
STA $4000
LDA $FE
STA $4001
You will likely see the "activate" and "retire" code snippets in the calling procedure, whereas the called procedure will use the fixed zeropage addresses to reference the pointer value as parameter.
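To show why the "retire" step matters, here is a rough sketch of a routine that fetches one byte through the pointer at $4000/$4001 and advances it, so the next call returns the following byte. The labels next_byte and nb_done are hypothetical, and $FD/$FE are assumed to be free for use as scratch space:

Code: Select all
next_byte:
         lda $4000      ; "activate": copy the pointer to zeropage
         sta $fd
         lda $4001
         sta $fe
         ldy #0
         lda ($fd),y    ; fetch the byte the pointer refers to
         inc $fd        ; advance the working copy (INC leaves A alone)
         bne nb_done
         inc $fe
nb_done: ldx $fd        ; "retire" via X, so A still holds the fetched byte
         stx $4000
         ldx $fe
         stx $4001
         rts

Without the final four instructions the pointer at $4000/$4001 would never move, and every call would return the same byte.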
- chysn
Re: LDA ($4000)
HA! No, I have not. I like the phrase "in anger" here, though.
I always do a variation of the "activate" transfer that Mike suggests above. In most of my real-world code, the pointer is above zeropage because it's part of a larger data structure that requires a pointer to something (screen memory, etc.). So, I reserve a zeropage pointer for handling work on a current member of the data structure.
In my own code, I’ve found places where I used “retire” (or, “update”), and places where I didn’t. It depends
I spent some time reviewing my typical handling of pointers to data, and I usually write the "activate" as a subroutine that does something with the data. For example, in TRBo: Turtle RescueBot, there are up to (I think) six Patrols, and each Patrol is represented by an eight-byte data structure, the first two bytes being a pointer to the Patrol's screen location.
Each frame in the game calls a subroutine that iterates through each Patrol in the level, with the iterator being the Patrol index in X. For each Patrol, it calls a second subroutine that moves the Patrol based on its index: this one calculates the address of the Patrol's data structure, computes the Patrol's movements, and updates the appropriate data, like the screen location. At the end, it calls a subroutine that places the character in the new position. This is a generic routine that can place any character, given a screen address and a character code:
Code: Select all
; Place a Character
; Place the character on the screen at the specified address.
;
; Preparations
; A - Low byte of the screen address
; Y - High byte of the screen address
; X - Character to place
; Carry flag - Color if set, hidden if unset
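PLACE:  STA DATA_L          ; Set up the zeropage pointer DATA_L/DATA_H
        STY DATA_H
        TXA                 ; Character to place
        LDY #$00
        STA (DATA_L),Y      ; Write it to the screen address
        LDA DATA_L          ; Reload the address into A/Y
        LDY DATA_H          ; Falls through to CHRCOL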
My approach to machine language is extremely C-like, when it comes to design. I code subroutines even when it might be more efficient to write code in an inline manner. But I'm employing similar principles (transferring an address to zeropage) even if the ordering is somewhat different.
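As an illustration only, a caller along the lines described above might look something like this. PATROLS (a table of eight-byte records) and PATROL_CHR (a character code) are hypothetical names, not taken from TRBo, and the sketch assumes that CHRCOL eventually returns with RTS:

Code: Select all
; X = Patrol index on entry (clobbered below, so save it first if needed)
        TXA
        ASL A
        ASL A
        ASL A               ; A = index * 8, the offset of this Patrol's record
        TAY
        LDA PATROLS,Y       ; First byte of the record: address low byte
        PHA
        LDA PATROLS+1,Y     ; Second byte of the record: address high byte
        TAY                 ; Y = high byte, as PLACE expects
        PLA                 ; A = low byte, as PLACE expects
        LDX #PATROL_CHR     ; X = character to place
        SEC                 ; Carry set = color (per the PLACE header)
        JSR PLACE

With at most six Patrols, the largest offset is 40, so the offset fits in Y comfortably.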
Re: LDA ($4000)
I remember reading somewhere that the 6502 has so few registers because zero page is pretty much as fast.
I guess I need to just think of zero page in that way, and activating and retiring pointers as pretty much equivalent to copying them into a register for manipulation.
The idea of the code writing its own code is really interesting though, used 'in anger' or otherwise. I might have a mess with that, and see if I can come up with any practical uses for it.
- Mike
Re: LDA ($4000)
For code generators, I have used this subroutine now and again:
Code: Select all
.Write
STA $FFFF ; operand is a placeholder, overwritten with the output address
INC Write+1 ; advance the low byte of that address
BNE Write_00
INC Write+2 ; carry into the high byte
.Write_00
RTS
The pointer in Write+1/Write+2 gets initialised in the main program, and then the code generator calls Write with A as parameter to build yet another program in RAM by an algorithmic description. The X and Y registers are not used, which is quite useful for the calling code.
For example, my CGA viewer uses this to build the display routine. The resulting code is ~16K in size and is built from a routine that's just ~350 bytes long, which gives a 46:1 compression ratio.
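A small usage sketch, assuming the generated code should start at $0400 (the start address is only an example): the caller first initialises the pointer in Write+1/Write+2, then emits one byte per call.

Code: Select all
LDA #$00 ; low byte of the output address (assumed $0400)
STA Write+1
LDA #$04 ; high byte of the output address
STA Write+2
LDA #$A0 ; emit opcode byte of LDY #$00
JSR Write
LDA #$00 ; emit its operand byte
JSR Write
LDA #$60 ; emit RTS
JSR Write

After this, $0400-$0402 hold A0 00 60, and Write+1/Write+2 point to $0403, ready for the next byte.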
Re: LDA ($4000)
I'm reading the words Mike, but my brain is refusing to process the meaning.
I shall go to bed, and try again tomorrow to understand what it is that you're telling me.
- Mike
Re: LDA ($4000)
Robbie wrote:
I'm reading the words Mike, [...]
Let me explain (you should probably try out the CGA viewer beforehand, to get a feeling for what the program actually does):
The CGA viewer takes a hidden bitmap (320x200 pixels in 4 colours) and displays a zoomed part of it, 80x64 pixels, in the screen bitmap of MINIGRAFIK (80x192 pixels in multi-colour, i.e. also 4 colours). As three multi-colour pixels are stacked on top of each other on the VIC-20 screen, you get square zoomed 'pixels' (about perfect for NTSC, only slightly elongated for PAL).
The display routine takes one byte from the hidden bitmap, and writes it three times to the display bitmap, like thus:
Code: Select all
.0400 A0 00 LDY #$00
.0402 B1 FB LDA ($FB),Y
.0404 C8 INY
.0405 8D 00 11 STA $1100
.0408 8D 01 11 STA $1101
.040B 8D 02 11 STA $1102
[...]
Putting a loop around this code snippet and updating all addresses inside would slow down the display routine by more than a factor of 2. Therefore the whole display routine has been unrolled, i.e. the code snippet you see above continues like this:
Code: Select all
[...]
.040E B1 FB LDA ($FB),Y
.0410 C8 INY
.0411 8D 03 11 STA $1103
.0414 8D 04 11 STA $1104
.0417 8D 05 11 STA $1105
.041A B1 FB LDA ($FB),Y
.041C C8 INY
.041D 8D 06 11 STA $1106
.0420 8D 07 11 STA $1107
[...]
Unfortunately, the regularity only extends as far as the first four bytes ($B1, $FB, $C8, $8D); after that, the changing addresses of the three STA instructions break the pattern. Ordinary compression algorithms won't work well here: you can only expect the usual compression ratio of 2:1.
Instead, I use a code generator to roll out the loop in memory: it uses the Write sub-routine, and updates the addresses of the STA instructions as it progresses. That in turn need not be done in an unrolled loop (as we would then be no better off and wouldn't have any compression at all, rather a code expansion). After initialisation of the code pointer in $3192/$3193, the main part of that code generator looks like this:
Code: Select all
[...]
.3074 A0 40 LDY #$40
.3076 A9 B1 LDA #$B1 ; opcode byte of LDA ($FB),Y
.3078 20 91 31 JSR $3191
.307B A9 FB LDA #$FB ; operand byte of LDA ($FB),Y
.307D 20 91 31 JSR $3191
.3080 A9 C8 LDA #$C8 ; INY instruction
.3082 20 91 31 JSR $3191
.3085 A9 8D LDA #$8D ; opcode byte of STA $xxxx
.3087 20 91 31 JSR $3191
.308A A5 FB LDA $FB ; low-byte running address of STA $xxxx operand
.308C 20 91 31 JSR $3191
.308F A5 FC LDA $FC ; high-byte running address of STA $xxxx operand
.3091 20 91 31 JSR $3191
.3094 20 8A 31 JSR $318A ; increment value in $FB/$FC
.3097 A9 8D LDA #$8D ; ... write ...
.3099 20 91 31 JSR $3191
.309C A5 FB LDA $FB
.309E 20 91 31 JSR $3191
.30A1 A5 FC LDA $FC
.30A3 20 91 31 JSR $3191
.30A6 20 8A 31 JSR $318A
.30A9 A9 8D LDA #$8D ; ... three STA $xxxx instructions,
.30AB 20 91 31 JSR $3191
.30AE A5 FB LDA $FB
.30B0 20 91 31 JSR $3191
.30B3 A5 FC LDA $FC
.30B5 20 91 31 JSR $3191
.30B8 20 8A 31 JSR $318A
.30BB 88 DEY ; ... for 64 times.
.30BC D0 B8 BNE $3076
[...]
.318A E6 FB INC $FB ; update addresses used in the
.318C D0 02 BNE $3190 ; STA instructions of the
.318E E6 FC INC $FC ; display routine
.3190 60 RTS
.3191 8D XX XX STA $XXXX ; write opcode/operand bytes to memory
.3194 EE 92 31 INC $3192
.3197 D0 03 BNE $319C
.3199 EE 93 31 INC $3193
.319C 60 RTS
The pattern is written 64 times, then the code generator writes some housekeeping code - Y is reset to 0, and the pointer in $FB/$FC is advanced by 200 to address the next 4-pixel column of the hidden bitmap - and this is repeated for all 20 display columns (4x20 = 80 multi-colour pixels horizontally).
In effect, you start with a rather small routine that unrolls a much larger routine in memory, to minimize storage requirements on disk and, later, to maximize display speed. What more could you want?
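As a rough cross-check of those figures, using only the byte counts visible in the listings above (the size of the per-column housekeeping code is not shown, so it is left out):

Code: Select all
bytes per pattern = 2 (LDA ($FB),Y) + 1 (INY) + 3 x 3 (STA $xxxx) = 12
bytes per column = 64 x 12 = 768
bytes for 20 columns = 20 x 768 = 15360

That is 15K before housekeeping, which is consistent with the quoted ~16K. Taking 16K as 16384 bytes, 16384 / 350 comes to roughly 46.8, in line with the quoted 46:1.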
Hope that helps.
Greetings,
Michael
Re: LDA ($4000)
Thank you for taking the time to explain at that depth Mike, it's really useful to see that technique used so effectively in a practical application. That constant balance between limited speed and limited memory I find really interesting.
I think it would make the software world such a better place if everyone had a grounding in this kind of thing before progressing on to producing the horrendously bloated modern code we have today!
Re: LDA ($4000)
I've suggested that very idea at work: everyone should have to write code for 8 bit computers to understand these trade-offs.
- pixel
- Vic 20 Scientist
- Posts: 1380
- Joined: Fri Feb 28, 2014 3:56 am
- Website: http://hugbox.org/
- Location: Berlin, Germany
- Occupation: Pan–galactic shaman
Re: LDA ($4000)
Might be something to it as in the "modern" world digital illiterates call themselves "senior developers" after two years of experience, still not knowing how a bloody computer works.
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
- Mike
Re: LDA ($4000)
Robbie wrote:
I think it would make the software world such a better place if everyone had a grounding in this kind of thing before progressing on to producing the horrendously bloated modern code we have today!
It's just we demand a little bit more from our computers today, the size of data handled being considerably bigger, programs are expected to work reliably on non-uniform hardware, and you can't get all this without a bunch of abstraction layers between hardware and application.
Re: LDA ($4000)
Mike wrote: ↑Tue Sep 29, 2020 1:42 am
It's just we demand a little bit more from our computers today, the size of data handled being considerably bigger, programs are expected to work reliably on non-uniform hardware, and you can't get all this without a bunch of abstraction layers between hardware and application.
That's quite reasonable: computers are far more functional now because we can build on the achievements of the past rather than reinventing the wheel. What isn't reasonable is the approach of "just pull in this module/framework" without really understanding the size/performance/security implications of doing so.