Mike wrote: ↑Thu Jul 23, 2020 3:07 pm
Regarding the SizeOf subroutine: IIRC, there exists a compact code snippet which derives the instruction length directly from the opcode byte, and which is not bigger than about 2 dozen instructions. Even if the check for branch instructions is added, that routine could be an alternative to hardcoding the address of the instruction decode routine in wAx or MINIMON. As a useful side effect, this makes the knapsack generator stand-alone.
Ooh!
Update: I wasn't able to find that after a few minutes of Googling, but I've got a lot of 6502 books I can dive into.
Update 2: I think I see the pattern. I'm going to give this a shot... JSR is a weird duck.
Update 3: Blah, I failed at the two dozen bytes part of the challenge, coming in at 40-ish bytes. I might need to stare at JSR, BRK, RTS, and RTI a little harder.
Code:
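SizeOf: cmp #$20 ; JSR
beq three ; is the only three-byte $?0 opcode
cmp #$00 ; BRK
beq one
cmp #$60 ; RTS
beq one
cmp #$40 ; RTI
beq one
tay ; Save opcode in Y for repeated testing
and #%00001000 ; Bit 3 clear means two bytes
beq two ; ,,
tya ; ????10?0 is always one byte
and #%00000101 ; ,,
beq one ; ,,
tya ; ???11??? is always three bytes
and #%00010000 ; ,,
bne three ; ,,
tya ; ????11?? is always three bytes
and #%00000100 ; ,,
bne three ; ,,
two: ldx #2 ; Whatever is left is two bytes
.byte $3c ; TOP (skip word)
three: ldx #3
.byte $3c ; TOP (skip word)
one: ldx #1
rts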
My methodology involved converting all the opcodes into binary and sorting them by size, and then just looking at them for a while, looking down the columns. Some patterns, like the ????10?0 of 1-byte instructions were easy to spot. The two-byte instructions are a real mixed bag, and are sort of the leftovers. I may be able to gain more insight with more looking. Maybe some beer and then more looking? But for tonight, I'll have to be satisfied with this.
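For anyone who wants to check the bit tests without an assembler, here is a minimal Python sketch that follows the same order of tests as the routine above. The opcode tables are typed in by hand from the documented NMOS 6502 set, and the names `ONE`, `TWO`, `THREE`, `size_of`, and `mismatches` are made up for this illustration. Undocumented opcodes are not covered.

```python
# Documented NMOS 6502 opcodes grouped by instruction length (hand-typed
# for this sketch; undocumented opcodes are deliberately left out).
ONE = bytes.fromhex(
    "00 08 0A 18 28 2A 38 40 48 4A 58 60 68 6A 78"
    " 88 8A 98 9A A8 AA B8 BA C8 CA D8 E8 EA F8")
TWO = bytes.fromhex(
    "09 29 49 69 A0 A2 A9 C0 C9 E0 E9"
    " 05 06 24 25 26 45 46 65 66 84 85 86 A4 A5 A6 C4 C5 C6 E4 E5 E6"
    " 15 16 35 36 55 56 75 76 94 95 B4 B5 D5 D6 F5 F6 96 B6"
    " 01 21 41 61 81 A1 C1 E1 11 31 51 71 91 B1 D1 F1"
    " 10 30 50 70 90 B0 D0 F0")
THREE = bytes.fromhex(
    "0D 0E 20 2C 2D 2E 4C 4D 4E 6C 6D 6E 8C 8D 8E AC AD AE CC CD CE EC ED EE"
    " 1D 1E 3D 3E 5D 5E 7D 7E 9D BC BD DD DE FD FE"
    " 19 39 59 79 99 B9 BE D9 F9")


def size_of(opcode):
    """Mirror the order of tests in the assembly SizeOf routine."""
    if opcode == 0x20:                    # JSR
        return 3
    if opcode in (0x00, 0x60, 0x40):      # BRK, RTS, RTI
        return 1
    if (opcode & 0b00001000) == 0:        # bit 3 clear
        return 2
    if (opcode & 0b00000101) == 0:        # ????10?0
        return 1
    if opcode & 0b00010000:               # ???11???
        return 3
    if opcode & 0b00000100:               # ????11??
        return 3
    return 2                              # leftovers (immediate mode)


def mismatches():
    """Return every documented opcode whose computed size is wrong."""
    expected = {op: n for n, table in ((1, ONE), (2, TWO), (3, THREE))
                for op in table}
    return [hex(op) for op, n in sorted(expected.items())
            if size_of(op) != n]


print(mismatches())   # an empty list means every documented opcode agrees
```

An empty list from `mismatches()` only says the tests agree with these hand-typed tables, so it is a sanity check rather than a proof.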
I should mention the rock-star role of BBEdit in this little project. It's got a regular expression search utility that highlights matches in real time. So I put in something like ^....10.0 and watched things light up. It would have been hard to do without it.
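The same column-staring can be roughly reproduced outside BBEdit. This is a small Python sketch, assuming a hand-typed list of the 29 documented one-byte opcodes (the names `ONE`, `hits`, and `misses` are invented here). It runs the ^....10.0 pattern over the binary form of each opcode and reports which ones it misses.

```python
import re

# The 29 documented one-byte opcodes (hand-typed for this sketch).
ONE = bytes.fromhex(
    "00 08 0A 18 28 2A 38 40 48 4A 58 60 68 6A 78"
    " 88 8A 98 9A A8 AA B8 BA C8 CA D8 E8 EA F8")

pattern = re.compile(r"^....10.0")

# Pair each opcode with its 8-digit binary string, as in the sorted list.
rows = [(op, format(op, "08b")) for op in ONE]
hits = [op for op, bits in rows if pattern.search(bits)]
misses = [f"${op:02X}" for op, bits in rows if not pattern.search(bits)]

print(len(hits), misses)   # 26 ['$00', '$40', '$60']
```

The three misses are BRK, RTI, and RTS, which is why they need their own handling.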
Like you said, it won't take much more work to find the relative branches and clear Carry, and then it'll be a standalone system! That's why I used ldx/TOP (the .byte $3c skip-word trick) instead of ldx/rts: every path ends up at the same spot, so I can dive straight into the relative branch code later.
Update 4: BRK, RTS, and RTI are organically mistaken for two-byte instructions. But two-byte instructions with $?0 always have either bit 4 or bit 7 set, while BRK, RTS, and RTI do not. I can use this, but I really need to spend some time with the wife.
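To see that observation in one place, here is a short Python sketch over a hand-typed table of the documented $?0 opcodes and their lengths. The table name `LOW_NYBBLE_ZERO` and the helper are made up for the example, and $80 is skipped because it is undocumented.

```python
# Documented $?0 opcodes and their lengths (hand-typed; $80 is undocumented).
LOW_NYBBLE_ZERO = {
    0x00: 1, 0x10: 2, 0x20: 3, 0x30: 2, 0x40: 1, 0x50: 2, 0x60: 1, 0x70: 2,
    0x90: 2, 0xA0: 2, 0xB0: 2, 0xC0: 2, 0xD0: 2, 0xE0: 2, 0xF0: 2,
}


def neither_bit_4_nor_7(opcode):
    """True when both bit 4 and bit 7 are clear."""
    return (opcode & 0b10010000) == 0


suspects = [op for op in LOW_NYBBLE_ZERO if neither_bit_4_nor_7(op)]
print([f"${op:02X}: {LOW_NYBBLE_ZERO[op]} byte(s)" for op in suspects])
```

JSR ($20) also has bits 4 and 7 clear, so this test only isolates BRK, RTI, and RTS if JSR has already been peeled off first.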
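Update 5: Determining whether it's a branch instruction is a matter of checking the opcode for $?0, and then 4 x ASL, which leaves the original bit 4 in Carry. Bit 4 is set for every relative branch and clear for every other $?0 opcode, so Carry set means branch.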
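A quick way to convince yourself about the four shifts is to model ASL in Python. This is only a sketch: `asl` and `is_branch` are names invented here, and the model tracks nothing but the 8-bit value and the carry out.

```python
def asl(a):
    """8-bit ASL: return (shifted value, carry out)."""
    return (a << 1) & 0xFF, (a >> 7) & 1


def is_branch(opcode):
    """Low nybble must be 0; then four ASLs leave bit 4 in Carry."""
    if opcode & 0x0F:
        return False
    a, carry = opcode, 0
    for _ in range(4):
        a, carry = asl(a)
    return carry == 1


print([f"${op:02X}" for op in range(256) if is_branch(op)])
# ['$10', '$30', '$50', '$70', '$90', '$B0', '$D0', '$F0']
```

Those are BPL, BMI, BVC, BVS, BCC, BCS, BNE, and BEQ, and nothing else sneaks in.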
Update 6: Here's a standalone knapsack generator using the standalone SizeOf.
Code:
; Code Instrumentation Generator
; Create a breakpoint at the specified address, and a code knapsack
; at $02f0. If there's already a breakpoint set, clear it and restore
; the code.
;
; In preparation for using this, fill $02f0-$02ff with $00
;
; (Originally assembled with xa)
*=$033c
; BASIC Routine
UND_ERROR = $c8e3 ; UNDEF'D STATEMENT ERROR
; Knapsack Generator
EFADDR = $07 ; Temp zeropage pointer to breakpoint address
KNAPSACK = $02f0 ; Knapsack storage (10 bytes)
BREAKPT = KNAPSACK+10 ; Breakpoint address (2 bytes)
KNAPSIZE = BREAKPT+2 ; Knapsack size (1 byte)
; Main routine entry point
; If no breakpoint is set, set one by setting X as the low byte
; and A as the high byte.
;
; If a breakpoint is already set, then clear it by restoring the knapsack code
Main: ldy KNAPSIZE ; If a breakpoint is not already set, then
beq NewKnap ; create it with the address at A/X
restore: lda BREAKPT ; Otherwise, restore the breakpoint to the
sta EFADDR ; original code by copying the
lda BREAKPT+1 ; bytes in the knapsack back to the code
sta EFADDR+1 ; ,,
dey
-loop: lda KNAPSACK+2,y ; Move between 3 and 5 bytes back
sta (EFADDR),y ; to their original locations
dey ; ,,
bpl loop ; ,,
lda #$00 ; Reset the knapsack size so that doing it again
sta KNAPSIZE ; doesn't mess up the original code
restore_r: rts ; ,,
; New Knapsack
; Generate a new knapsack at the address specified by X (low byte) and
; A (high byte)
NewKnap: stx EFADDR ; Save low byte of breakpoint
sta EFADDR+1 ; Save high byte of breakpoint
ldy #$00 ; (BRK) (also set Y to 0 for index)
sty KNAPSACK ; ,,
lda #$ea ; (NOP)
sta KNAPSACK+1 ; ,,
next_inst: tya ; Preserve Y against SizeOf
pha ; ,,
lda (EFADDR),y ; A = Opcode of the breakpoint instruction
jsr SizeOf ; X = Size of instruction (1-3)
pla
tay
bcc xfer ; Error if relative branch instruction
jmp UND_ERROR ; ?UNDEF'D STATEMENT ERROR
xfer: lda (EFADDR),y ; Move X bytes starting at Y index
sta KNAPSACK+2,y ; Y is a running count of knapsacked bytes
iny ; ,,
dex ; ,,
bne xfer ; ,,
cpy #$03 ; If at least three bytes have been knapsacked
bcc next_inst ; we're done
lda EFADDR ; Stash pointer in breakpoint storage for
sta BREAKPT ; later restoration
lda EFADDR+1 ; ,,
sta BREAKPT+1 ; ,,
sty KNAPSIZE ; Save knapsack size for later
lda #$ea ; (NOP)
-loop: cpy #$03 ; Pad code with more than three bytes with
beq add_kjmp ; NOPs after the first three
dey ; ,,
sta (EFADDR),y ; ,,
bne loop ; ,,
add_kjmp: ldy #$00
lda #$4c ; (JMP) This is the JMP to the knapsack
sta (EFADDR),y ;
lda #<KNAPSACK ; Store knapsack JMP low byte
iny ; ,,
sta (EFADDR),y ; ,,
lda #>KNAPSACK ; Store knapsack JMP high byte
iny ; ,,
sta (EFADDR),y ; ,,
lda KNAPSIZE ; Calculate the return jump point (original
tay ; address + Y)
clc ; ,,
adc EFADDR ; ,,
sta KNAPSACK+3,y ; Store return JMP low byte
lda #$00
adc EFADDR+1
sta KNAPSACK+4,y ; Store return JMP high byte
lda #$4c ; (JMP) This is the JMP to the return point
sta KNAPSACK+2,y ; ,,
knap_r: rts
; Size Of Instruction
; Given an opcode in A, return instruction size in X and set Carry flag
; Carry set indicates a relative branch instruction. A and Y are clobbered.
SizeOf: tay ; Save opcode in Y first, so every exit path can retest it
cmp #$20 ; JSR
beq three ; is three bytes
and #%00001000 ; This bit pattern usually means two bytes
beq two_cand ; ,,
tya ; With bit 3 set, bits 0 and 2 both clear
and #%00000101 ; (the pattern ????10?0) always means one
beq one ; byte
tya ; If the previous tests fail, this pattern
and #%00010000 ; always indicates three bytes. Really testing
bne three ; for ???11???
tya ; As does this one. Really testing for
and #%00000100 ; ????11??
bne three ; ,,
two_cand: tya ; Of the two-byte instructions that match
and #%00001111 ; ????0000, all of them have either bit 4 or
bne two ; bit 7 set. If that's not the case, then it's
tya ; actually a one-byte instruction like BRK,
and #%10010000 ; RTS, or RTI
beq one ; ,,
two: ldx #2 ; Set to two bytes and fall through with TOP
.byte $3c ; ,, (undocumented NOP abs,X; skips a word)
three: ldx #3 ; Set to three bytes and fall through with TOP
.byte $3c ; ,, (skip word)
one: ldx #1 ; Set to one byte and fall into the branch test
clc ; Carry clear means the instruction is safe to move
tya
and #%00001111 ; If the instruction low nybble is 0, it's a
bne size_r ; branch instruction if and only if bit 4
tya ; is set. 4xASL moves bit 4 into Carry, so
asl ; Carry set indicates an instruction that
asl ; can't be knapsacked
asl ; ,,
asl ; ,,
size_r: rts
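To picture what the generator leaves in memory, here is a rough Python model of the end result. It is a sketch under some assumptions: `set_breakpoint` and this `size_of` are names invented for the example, memory is just a 64K bytearray, and the model checks for branches before copying anything, whereas the assembly copies as it goes. On success the final bytes should be the same.

```python
KNAPSACK = 0x02F0  # same address as in the listing


def size_of(op):
    """Return (size, is_branch) using the same bit tests as SizeOf."""
    if op == 0x20:                           # JSR
        size = 3
    elif (op & 0x08) == 0:                   # bit 3 clear: two bytes, unless
        size = 1 if (op & 0x9F) == 0 else 2  # it is BRK, RTI, or RTS
    elif (op & 0x05) == 0:                   # ????10?0
        size = 1
    elif op & 0x14:                          # ???11??? or ????11??
        size = 3
    else:                                    # immediate mode
        size = 2
    return size, (op & 0x1F) == 0x10         # $?0 with bit 4 set is a branch


def set_breakpoint(mem, addr):
    """Patch mem (a bytearray) at addr and return the knapsack size."""
    count = 0
    while count < 3:                         # need room for a 3-byte JMP
        size, is_branch = size_of(mem[addr + count])
        if is_branch:
            raise ValueError("relative branch cannot be knapsacked")
        count += size
    ret = addr + count                       # where the knapsack jumps back to
    # Knapsack: BRK, NOP, displaced code, JMP back to the original code.
    mem[KNAPSACK:KNAPSACK + 2] = bytes([0x00, 0xEA])
    mem[KNAPSACK + 2:KNAPSACK + 2 + count] = mem[addr:addr + count]
    mem[KNAPSACK + 2 + count:KNAPSACK + 5 + count] = bytes(
        [0x4C, ret & 0xFF, ret >> 8])
    # Original site: JMP KNAPSACK, then NOP padding past the third byte.
    mem[addr:addr + count] = (bytes([0x4C, KNAPSACK & 0xFF, KNAPSACK >> 8])
                              + bytes([0xEA]) * (count - 3))
    return count


mem = bytearray(0x10000)
mem[0x1000:0x1006] = bytes.fromhex("A9 00 8D 0F 90 60")  # LDA #$00 / STA $900F / RTS
size = set_breakpoint(mem, 0x1000)
print(size, mem[0x1000:0x1006].hex(), mem[KNAPSACK:KNAPSACK + 10].hex())
# 5 4cf002eaea60 00eaa9008d0f904c0510
```

In the sample run, two instructions (5 bytes) get moved: the original site becomes JMP $02F0 plus two NOPs, and the knapsack ends with JMP $1005 to pick up at the untouched RTS.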