Mike wrote: ↑Fri Jul 17, 2020 3:52 am
brain wrote:Hmm, not sure I understand that, because code isn't going to know about "just exec io2, and your data will get loaded into memory" and if it does know that much, it's easy to just change that to "exec io2, which will slow load the internal RAM and sideload the RAM on the cart".
User code is not supposed to have any business calling or accessing anything directly in the I/O area. The magic words in my preceding postings were: "[...] abstract from a bunch of different, incompatible register layouts for memory mapping. Instead standard KERNAL calls are used to move data between memory and files [...]", i.e. a wedge into all relevant KERNAL calls presents the standard API to all existing programs to operate with the new storage device driver.
My point is that your idea was stated thusly:
For faster file I/O, what had been proposed in another thread - nearly 9 years ago - was a method that constructs transfer routines on the fly (most probably in I/Ox to keep RAMx and BLKx free for 'regular' code/data). A microcontroller would access the SD card in a known fashion, but not just providing a single register byte to load a byte from the file. Rather, when loading a file to memory (i.e. KERNAL load replacement/wedge), it would provide unrolled code in I/Ox like thus:
LDA #aa
STA $1001
LDA #bb
STA $1002
LDA #cc
STA $1003
[...]
I responded by questioning the value of going to all that effort, since it would mainly benefit the internal RAM.
You commented "When the file transfer method works there, it also works for expansion RAM ..."
I questioned why you needed to use this method for expansion RAM, since I could just load the expansion RAM directly from the uC.
You then responded "... and, again, it would work the same way there, using standard KERNAL API calls and would not require programs to be adapted to the cartridge."
Which I don't agree with, but here's my full response, since it appears I am either misunderstanding things or you're misunderstanding my response.
IF we have a cart that one can access via std KERNAL routines, but that then populates IO2 or IO3 with unrolled code (as per your example), the code has to be executed from the on board 6502.
IF we assume that the KERNAL APIs are somehow patched or modified to know about the code that needs to be executed from IO2, then the application calls the KERNAL API, the API drives the registers on the cart, and the cart presents an unrolled loop of move instructions in IO2. The API then jumps to IO2 and starts running the code presented by the uC.
Now, if I understand the above correctly (patched KERNAL APIs that know about executing code in IO2), something like this is the result:
IO2:
LDA #aa
STA $1001
LDA #bb
STA $1002
LDA #cc
STA $1003
[...]
RTS
; now in KERNAL API code
CHK_LOOP
LDA trigger_next_dump ; poll until the uC has presented the next chunk of code
BNE CHK_LOOP
JSR IO2 ; run the next chunk of code
Obviously, there needs to be an "are we done loading" check somewhere, but that's just details.
Now, if the above is the idea to load all the data, both internal and expansion RAM data, I was noting that without changing the API again or doing anything else, the uC could present the following code:
IO2
LDA #aa
STA $1001
LDA #bb
STA $1002
LDA #cc
STA $1003
[...]
; OK, we're done loading internal RAM, now to load external expansion RAM
LOOP
lda IO_REGISTER_TO_SIGNAL_SIDELOADING_EXP_RAM
bne LOOP
rts
Since the uC is creating all the code, it can just as easily create the above chunk of code. Polling that register tells the uC that the expansion RAM is not being used, so it can take the RAM offline and sideload all the expansion RAM data needed, letting the 6502 know via the same register when it is done. No more changes to the KERNAL API are needed, other than the original change that allows the API to execute code in IO2.
My point was not that USER code would know about IO2, but that the KERNAL API knows about IO2. And if it knows about executing code in IO2, the uC need not resort to unrolled code that loads the expansion RAM via 6502 moves, when it could just present 6502 code that keeps the CPU off the RAM, take the RAM offline, dump all the data into the RAM, and bring it back online.
I'm in agreement that unrolled is faster. But, since I am not a SW person, I figured I'd ask for the speedup. Looks like 2.3X or a bit better, or ~50K cycles for an unrolled loop of 8kB and ~115K for a normal one. Still, that's 50 ms versus 115 ms, which is pretty marginal, all things considered. Just my opinion, of course...
It is the difference between <10 frames/second and 25 frames/second, if the new file I/O method is used to stream video data into internal RAM for the VIC to display.
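For anyone wanting to check the cycle figures quoted above, here is a quick back-of-the-envelope sketch. It rests on several assumptions of mine: the unrolled code is plain LDA #imm / STA abs pairs as in the example, the "normal" case is an LDA abs,X / STA abs,X / INX / BNE inner loop (my guess at what was counted), per-page and per-chunk handshake overhead is ignored, and the clock is rounded to 1 MHz.

```python
# Standard 6502 cycle counts for the instructions involved.
LDA_IMM = 2      # LDA #imm
STA_ABS = 4      # STA abs
LDA_ABS_X = 4    # LDA abs,X (no page crossing)
STA_ABS_X = 5    # STA abs,X
INX = 2
BNE_TAKEN = 3    # BNE, branch taken, same page

BYTES = 8 * 1024       # 8 kB transfer, as in the quoted estimate
CLOCK_HZ = 1_000_000   # assumption: round 1 MHz, close to the real clock

# Cost per byte moved in each scheme.
unrolled_per_byte = LDA_IMM + STA_ABS                 # 6 cycles
loop_per_byte = LDA_ABS_X + STA_ABS_X + INX + BNE_TAKEN  # 14 cycles

unrolled_cycles = BYTES * unrolled_per_byte   # 49152
loop_cycles = BYTES * loop_per_byte           # 114688
speedup = loop_cycles / unrolled_cycles       # 14/6, about 2.33

unrolled_ms = 1000 * unrolled_cycles / CLOCK_HZ
loop_ms = 1000 * loop_cycles / CLOCK_HZ

print(f"unrolled: {unrolled_cycles} cycles, {unrolled_ms:.1f} ms")
print(f"loop:     {loop_cycles} cycles, {loop_ms:.1f} ms")
print(f"speedup:  {speedup:.2f}x")
```

Under those assumptions the numbers line up with the ~50K and ~115K cycle estimates and the roughly 2.3X ratio; any real implementation would add some overhead for chunk hand-off that is not modelled here.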
Mind you, if the position was that one could use bog standard KERNAL API calls to load from the cart and somehow execute this unrolled code in IO2, I am at a loss as to how that would be done, since the KERNAL has no idea that it should JSR to IO2. I can see it being possible if the "load" command loads a stub that then calls JSR IO2, but that only works for loads, not for opens and such. So, if I am assuming incorrectly about how the code in IO2 gets executed, I need more clarity.