Line drawing algorithm

Basic and Machine Language

Moderator: Moderators

User avatar
MrSterlingBS
Vic 20 Afficionado
Posts: 304
Joined: Tue Jan 31, 2023 2:56 am

Re: Line drawing algorithm

Post by MrSterlingBS »

Hello,

I know my search for a quick Linde drawing code is a bit boring for some people. Nevertheless, I would like to share my latest optimization with you.
In the routine for creating the jumps in the main routine, I was able to get a small speed advantage by adjusting the BCS jump. It depends on whether the main direction of the line runs in X or Y. The routine then either jumps straight to the Y set point instruction or the X direction is recalculated.

The initialization routine starts at $2153 and the main line drawing routine at $0010.
Approximately 23-24,000 pixels per second are now drawn on a screen of 144x192 points.
The pixels per second are determined by the largest delta x or delta y and are counted per run.
(I used $59 for the counter, but i see, that it is possible to use $1B.)
The total is then displayed every second and then set to zero.

Hit Enter/Return to clear the screen.

If we have a screen with 128x192 points, up to 26,000 pixels per second are possible.
If you dont like to put the core line drawing routine to the ZP, you can modify the attached one easily.
On a 144x192 pixel screen, you have up to 22,000 pixels the second in a routine without the heavy ZP usage.

Are further optimizations possible in this small routine?

Best regards
Sven

Code: Select all

draw_line:	
	SEC				;				
	LDX #$E8			; INX			
	LDA X2			; X2			
	SBC X1			; X1			
					; X2 - X1
	BCS X3C0F		;				
					;				
	LDX #$CA			; DEX			
	EOR #$FF			;				
	ADC #$01			;				
X3C0F:
	STX $10			; INX or DEX	
	STX $1C			;				
	STA $13			; 		
	SEC				;				
	LDY #$C8			; INY			
	LDA Y2			; Y1			
	SBC Y1			; Y0			
	BCS X3C22		;				
					;				
	LDY #$88			; DEY						
	EOR #$FF			;							
	ADC #$01			;								
X3C22:
	STY $06			; INY or DEY	
	STA $04			;				
	LDX X1			;				
	LDY Y1			;				
	LDA $13			; 			
	CMP $04			;				
	BCS X3C52		;				
					;				
	STA $15			;				
	LDA $06			;				
	STA $10			;				
	LDA $04			;				
	STA $59			; count pixels can deleted
	STA $1B			; count pixels can used				
	LSR A			;				
	STA $13			; 		
	LDA #$C0			; CPY			
	STA $35			;				
	LDA Y2			;				
	STA $36			;				
	lda #$13			; change jump in BCS
	sta $19			; 
	LDA lowb,X		; load first low byte 
	JMP $0026			;				
					;				
X3C52:
	STA $59			; count pixels can deleted
	STA $1B			; count pixels can used 				
	LSR A			;					
	STA $13			;	
	LDA $04			;					
	STA $15			;					
	LDA $06			;					
	STA $1C			;					
	LDA #$E0			; CPX				
	STA $35			;					
	LDA X2			;					
	STA $36			;					
	lda #$05			; change jump in BCS
	sta $19			;	
	LDA lowb,X		; load first low byte 
	JMP $0026
The Zero page code:

Code: Select all

X3C74:
	DEY						
	SEC						
	LDA #$00		
X3C78:
	SBC #$0A				
	STA $13
	BCS X3C81				
X3C7C:
	ADC #$33				
X3C7E:
	DEX						
X3C7F:	
	STA $13			
X3C81:	
	LDA lowb,X
	cmp Pixel_Low					
	beq nextstep
	STA Pixel_low					
	LDA highb,X				
	STA Pixel_High					
nextstep:	
	LDA $0000,Y				
	ORA xTable,X		
	STA (Pixel_low),Y				
X3C92:
	CPY #$1B				
	BNE X3C74				
	;RTS						
	jmp endline
Attachments
program.7z
(771 Bytes) Downloaded 134 times
User avatar
pixel
Vic 20 Guru
Posts: 1548
Joined: Fri Feb 28, 2014 3:56 am
Location: Bielefeld, Germany

Re: Line drawing algorithm

Post by pixel »

Quick scan doesn't reveal nothing to me to optimise. An achievement for its size. But I cannot place the loss of zero page in any bigger picture comfortably.
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
User avatar
Mike
Herr VC
Posts: 5134
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Re: Line drawing algorithm

Post by Mike »

pixel wrote:Quick scan doesn't reveal nothing to me to optimise.
No surprises here. At its heart, this is still the same routine I posted here earlier this year. The optimizations applied by MrSterlingBS mostly act like exerting pressure on an air balloon: where one squeezes on one part, another part of the balloon bulges out.
An achievement for its size.
I am also gladly taking credit for this one. :)

However, this implementation of a line routine utilizes three big tables to speed up the address calculation. In the whole my original version of said routine weighs in at about 151 bytes of code and 480 bytes of tables, 631 bytes in total. The optimizations by MrSterlingBS add a good amount of code in the init part, so we're likely at about 700 bytes in total now, for a marginal speed increase in the general case - oblique lines.

My faster line routine - at a sustained speed of 30000 pixels/second - weighs in at 795 bytes total, some small tables included.
But I cannot place the loss of zero page in any bigger picture comfortably.
As I wrote in the thread already:
Mike wrote:[...] it is however a design decision whether occupying a good part of the zeropage for code is in order. When the code is supposed to be interoperable with BASIC and KERNAL, the answer likely is no.
so there ...
MrSterlingBS wrote:I know my search for a quick [Line] drawing code is a bit boring for some people. [...]
The topic at hand here of course involves an exploration mode on your part, with your attempts to reproduce the speed of my 'record holder' line routine on your own. Nothing wrong with that as general course.

Over the months I dropped quite some hints and also code snippets of that faster routine that should have given you an idea that another change of algorithm is necessary to achieve higher speeds than merely around 20000 pixels/second. As I wrote, the key to increased speed is not stuffing more logic into the main core loop, but less. As the routine still has to do its job as whole, the coder's job thus is to identify, what processing can be factored out of the main loop and placed into the init part of the routine instead. It is actually the big hunks of thought that I put into the init routine, which made the routine workable: it took me quite some weeks to also optimize this part, so the preparation of the main loop would not cost too much extra code and also not too much extra time.

On another aspect (on behalf of the "boring"(?) part): a game like ELITE consists of a lot more than just a fast line routine. A fast line routine ultimately is indispensable, that's clear, but the whole thing needs a good 3-digit figure of routines that interwork with each other, and some quite complex data structures that make up the game model and business logic. You are putting a lot of work into a single aspect of the game, you seem to forget the other aspects of the whole, and maybe you should reconsider things here.
User avatar
pixel
Vic 20 Guru
Posts: 1548
Joined: Fri Feb 28, 2014 3:56 am
Location: Bielefeld, Germany

Re: Line drawing algorithm

Post by pixel »

Sorry for the missing references. Not going for an academic degree. ;) In the end: it's a working line routine and I probably cannot imagine how good that's kicking the first time any more, plus a couple of priceless lessons for free. No time wasted.
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
User avatar
MrSterlingBS
Vic 20 Afficionado
Posts: 304
Joined: Tue Jan 31, 2023 2:56 am

Re: Line drawing algorithm

Post by MrSterlingBS »

I hope it wasn't understood as if I had invented the routine. it was loaded from codebase64 and significantly improved by Mike. Thank you for that. I'm simply interested in the possibility of slightly improving the routine through very slight adjustments. I learned a lot from that. With the few more bytes we have a speed advantage of around 10%, not bad at all, right? In the non ZP version…


I'm of course very excited about the continuation of Mike's course 3D from a scratch.
User avatar
MrSterlingBS
Vic 20 Afficionado
Posts: 304
Joined: Tue Jan 31, 2023 2:56 am

Re: Line drawing algorithm

Post by MrSterlingBS »

Hi,

here is a new version of my line routine.

The SEC can be deleted if only the major direction DEX or DEY is used.
Drawing the lines from right to left or from down to up.

DEX or DEY with CPX or CPY calculate the difference between Y or X and CPY or CPX >= 0 the carry was set.
https://www.c64-wiki.de/wiki/CPY_(RAUTE)$nn

This reduces the cycles in the ZP core routine by 2' cycles each pixel.

Here is the result on a 144x224 screen. The lines the are drawn on a 144x204 screen.
Attachments
Line_without_SEC.zip
(10.35 KiB) Downloaded 104 times
Last edited by MrSterlingBS on Sat Apr 12, 2025 1:10 am, edited 2 times in total.
User avatar
MrSterlingBS
Vic 20 Afficionado
Posts: 304
Joined: Tue Jan 31, 2023 2:56 am

Re: Line drawing algorithm

Post by MrSterlingBS »

Hi,

After reordering the ZP-Core routine (see code), it is possible to use the horizontal line with 37,000 pixels per second, with a line length of 100 pixels, instead of 27.000 pixel per second. A very fast routine for the vertical alignment was added separately. > 60.000 pixels/s.
This may be of interest to some. This time i uses a 128x160 pixel screen.

If you have any questions, I will be happy to answer them.

BR
Sven

Code: Select all

X3C74:
	;SEC					; is no longer needed if we use DEY or DEX
	DEY					; DEY or DEX
	LDA #$00				;
X3C78:
	SBC #$0A				;	
	STA $12
	BCS X3C7F			;	
X3C7C:
	ADC #$33				; delta X or delta Y
X3C7E:					;	
	STA $12				; STA $12 instead of DEX/DEY/INY/INX
X3C7F:					
	DEX					; DEX/DEY/INY/INX instead of STA $12
X3C81:	
	LDA highb,X			; 	
	STA Pixel_High			;	
	LDA $0000,Y			;
	ORA xTable,X			;	
	STA (Pixel_low),Y		;	
X3C92:
	CPY #$1B				; CPY or CPX
	BNE X3C74			; Ändern wenn dX oder dY = 0 Jump to Vert_or_Hori:	
	RTS

Code: Select all

Draw_H_Line:
	jmp Draw_Line_X
Draw_V_Line:
	jmp Draw_Line_Y
	
Draw_Line:	
	SEC				;				
	LDA X2			; X2			
	SBC X1			; X1			
					; X2 - X1
	BEQ Draw_V_Line 	; X2=X1
	BCS X3C0F		;				
					;							
	EOR #$FF			;				
	ADC #$01			;		
	SEC
X3C0F:			
	STA $12			; 	
				
	LDA Y2			; Y2			
	SBC Y1			; Y1
	BEQ Draw_H_Line 	; Y2=Y1
	BCS X3C22		;		
					;								
	EOR #$FF			;							
	ADC #$01			;								
X3C22:	
	STA $04			;	
	
	LDA #$E1
	STA $2E			;
					
	LDA $12			; dx	
	CMP $04			; dy				
	BCS X3C52		; check if dx or dy is greater			
	
;dy ist größer		;	
	STA $14			; dx		
	LDA $04			; dy				
	STA $1A			; count pixels				
	LSR A			;				
	STA $12			; 	
	lda #$0A			; change jump in BCS
	sta $18			; 

	LDA #$C0			; CPY			
	STA $2B			;	
	LDY #$88			; DEY	
	STY $10			; DEY
	
	SEC
	LDA Y2			; Y2			
	SBC Y1			; Y1
	BCS y2greater		;			
;y1 größer
	ldy y2
	sty $2C
	ldy y1
	
	ldx x1
	
	SEC
	LDA X2			; X2			
	SBC X1			; X1			
	BCS y1gx2g		;				

	lda #$CA			; DEX		
	sta $1D
	jmp $001E
y1gx2g:
	lda #$E8			; INX		
	sta $1D
	jmp $001E
	
y2greater:	
	ldy y1
	sty $2C
	ldy y2
	
	ldx x2
	
	LDA X2			; X2			
	SBC X1			; X1			
	BCS y2gx2g		;				

	lda #$E8			; INX		
	sta $1D
	jmp $001E
y2gx2g:
	lda #$CA			; DEX		
	sta $1D
	jmp $001E	
		
;dx ist größer			;						
X3C52:
	STA $1A			; count pixels			
	LSR A			;					
	STA $12			;	
	LDA $04			;					
	STA $14			;									
	
	lda #$05			; change jump in BCS
	sta $18			; 
	
	LDY #$CA			; DEX	
	STY $10			; DEX
	
	LDA #$E0			; CPX				
	STA $2B			;
				
	LDA X2			; X2			
	SBC X1			; X1			
	BCS X2greater		;						

X1greater:
	ldx x2
	stx $2C
	ldx x1
	
	ldy y1
	
	SEC
	LDA Y2			; Y2			
	SBC Y1			; Y1			
	BCS x1gy2			;		
	
	lda #$88			; DEY	
	sta $1D
	jmp $001E
	
x1gy2:
	lda #$C8			; INY		
	sta $1D
	jmp $001E
	
X2greater:
	ldx x1
	stx $2C
	ldx x2
	
	ldy y2
	
	LDA Y2			; Y2			
	SBC Y1			; Y1			
	BCS x2gy2			;		
	
	lda #$C8			; INY	
	sta $1D
	jmp $001E
	
x2gy2:
	lda #$88			; DEY		
	sta $1D
	jmp $001E
	
Draw_Line_X:			
	LDY Y1			
	LDA #$CA			; DEX	
	STA $1D
	LDA #$E0			; CPX				
	STA $2B
	
	LDA #$EE
	STA $2E			;

	SEC				;	
	LDA X2			; X2				
	SBC X1			; X1				
	BCS X3C52YY		; JMP IF X2 >= X1	
	sta $1A
	LDX X1
	LDA X2			;			
	STA $2C			;
	JMP $001E			
X3C52YY:
	sta $1A
	LDX X2
	LDA X1			;			
	STA $2C			;
	JMP $001E
	
Draw_Line_Y:			
	LDX Y1			; Y1 = Startpoint
	SEC				;				
	LDA Y2			; X2			
	SBC Y1			; X1			
					; X2 - X1
	BCS X3C52XX		; JMP IF Y2 >= Y1	
	EOR #$FF		;							
	ADC #$01		;			
	LDX Y2			; Y2 = Startpoint
X3C52XX:	
	TAY
	
	STA $1A
		
	STX LoopFastY+1
	STX Pixel_Low
	STX ORABIT+3			; 
	
	LDX X1
StartFastY:	
	LDA highb,X				; only once per line	
	STA LoopFastY+2
	STA Pixel_High			;	
	STA ORABIT+4			; 
	LDA xTable,X			;	
	STA ORABIT+1			;
LoopFastY:
	LDA $0000,Y
ORABIT:
	ORA #$00				;	
	STA $0000,Y	
FastCPY:
	DEY
	BNE LoopFastY
	LDA (Pixel_low),Y
Last_ORA:
	ORA xTable,X				;	
	STA (Pixel_low),Y	
	RTS
Attachments
fast_vertical_line.png
fast_vertical_line.png (1.27 KiB) Viewed 1474 times
horizontal_line.png
horizontal_line.png (1.24 KiB) Viewed 1653 times
vertical_line.png
vertical_line.png (1.25 KiB) Viewed 1654 times
User avatar
MrSterlingBS
Vic 20 Afficionado
Posts: 304
Joined: Tue Jan 31, 2023 2:56 am

even faster Line drawing algorithm

Post by MrSterlingBS »

Hi,

Here's a little update to my line drawing routine. It's now 2% faster thanks to a small adjustment.
evenFasterLine.zip
(9.23 KiB) Downloaded 48 times

BR
Sven
User avatar
MrSterlingBS
Vic 20 Afficionado
Posts: 304
Joined: Tue Jan 31, 2023 2:56 am

> 30.000 pixel line drawing algorithm 6502

Post by MrSterlingBS »

Hey,

Here's a reverse engineering of the 30,000 pixel/s line algorithm from mike's vector demo. Slightly adapted and a bit faster (20 cycles during initialization). This version use 4 x 168 byte tables. These tables can reduced to 58 bytes with a small adjustment in the sourcecode. (increase the init by 20 cycles)


11kbytes of RAM
PAL only (sorry)
168x192 bitmap @$1040
21x24 charmap @$0200
IRQ driven pixel per second timer

And a new version with a very small increase in speed.
The source code is also attached.

BR
Sven
Attachments
30k_Line_V1.zip
(11.32 KiB) Downloaded 40 times
User avatar
MrSterlingBS
Vic 20 Afficionado
Posts: 304
Joined: Tue Jan 31, 2023 2:56 am

>31.000 Line drawing algorithm 6502 Double Buffer

Post by MrSterlingBS »

Hello,

I discovered another great trick.
If the line routine is combined with a 3D routine that uses the double buffer technique, calculating the memory address at the end of the line routine can be implemented much more easily. Instead of writing,

Code: Select all

	CLC
	LDA PL
	ADC #$C0
	STA PL
	BCC next1
	INC PH
	JMP JTO_0
next1:		  
	SEC
	JMP JTO_0
dYSdX_Exit_Down:
	JMP DrawLineEnd
	
you can simply increment the high byte. The low byte is always zero.

Code: Select all

	
	INC PH
	SEC
	JMP JTO_0
dYSdX_Exit_Down:
	JMP DrawLineEnd
	
This eliminates the need for the low-byte table. The high-byte table can then be created from $4000 to $5100, for example, at a horizontal resolution of 144 pixels. The buffer then only needs to be transferred to the correct screen addresses.
Of course, the corresponding free memory blocks are then created in the double buffer at the end of the individual column entries.
Here, for example, the data for 3D models could be entered in order to use the free memory sensibly.

BR
Sven
User avatar
MrSterlingBS
Vic 20 Afficionado
Posts: 304
Joined: Tue Jan 31, 2023 2:56 am

>100.000 (190.000) pixel/s horizontal line algorithm

Post by MrSterlingBS »

Hello,

I managed to speed up the drawing of the horizontal line enormously.
Mike's line algorithm, runs with 30,000 p/s. If we extract the part that draws the horizontal line from his algorithm, it becomes incredibly short and accelerates significantly, to approximately 45,000 p/s. If we now apply the option of filling an entire byte, the speed can be increased enormously, to approximately 95,000 p/s. To speed things up even further, we can use the SBX opcode instead of TXA/SBC #8/TAX. Then we can even achieve 100,000 p/s. This comparison is during the random line generation of horizontal lines.

If you draw a single line, a speed of up to 195,000 p/s can be achieved! (Line from X1=10 to X2=109; dX=100) :mrgreen:
(see picture and new picture without the two dots to verify the correct length of the line)

Bitmap - 128 x 256
PAL
11kB
fast_H_Line.zip
(4.31 KiB) Downloaded 7 times
press enter/return after starting to clear the screen

This is the init routine:

Code: Select all

Init_H:
	LDX X1					
	LDA highb,X			; load memory address
	STA PH	  
	
	txa				; small jmp table with only 8 bytes
	and #7
	tax
	LDA JumpTableH,X		; JMP Table
	STA JMPH+1			; self-modified code
     
	LDX $F7
	STX counter			; for counting the speed

	INX
	LDY Y1				; Y1=Y2
JMPH:
	JMP $2600		
Here is the drawing routine:

Code: Select all

*=$2600

JTH_0:
	LDA (PL),Y
	ORA #$80
	STA (PL),Y
	DEX
	BEQ Exit_H
JTH_1:
	LDA (PL),Y
   	 ORA #$40
   	 STA (PL),Y
   	 DEX
   	 BEQ Exit_H
JTH_2:
	LDA (PL),Y
	ORA #$20
	STA (PL),Y
	DEX
	BEQ Exit_H
JTH_3:
	LDA (PL),Y
	ORA #$10
	STA (PL),Y
	DEX
	BEQ Exit_H
JTH_4:
	LDA (PL),Y
	ORA #$08
	STA (PL),Y
	DEX
	BEQ Exit_H
JTH_5:
	LDA (PL),Y
	ORA #$04
	STA (PL),Y
	DEX
	BEQ Exit_H
JTH_6:
	LDA (PL),Y
	ORA #$02
	STA (PL),Y
	DEX
	BEQ Exit_H
JTH_7:
	LDA (PL),Y
	ORA #$01
	STA (PL),Y
	DEX
	BEQ Exit_H
JTH_00:
	INC PH  
	CPX #9
	BCC JTH_0
	LDA #$ff
	STA (PL),Y
	;TXA
	;SEC			; set with BCC
	;SBC #8
	;TAX
	db $CB,$08		; opcode SBX (see No more secrets)
	JMP JTH_00
Exit_H:
	RTS
	
JumpTableH:
	db <JTH_0,<JTH_1,<JTH_2,<JTH_3,<JTH_4,<JTH_5,<JTH_6,<JTH_7
Attachments
picture2.png
picture2.png (1.26 KiB) Viewed 19 times
160k_pixel_line.png
160k_pixel_line.png (1.28 KiB) Viewed 82 times
Post Reply