Line drawing algorithm
Moderator: Moderators
- MrSterlingBS
- Vic 20 Afficionado
- Posts: 302
- Joined: Tue Jan 31, 2023 2:56 am
Re: Line drawing algorithm
Here are the results of the faster line drawing routine for a 128x256 Pixel screen.
There is no low byte calculation necessary.
Line from X0,Y0 to X1,Y1 / cycles
0,0 to 99,0 / 4347
0,0 to 99,0 / 4346
0,0 to 99,99 / 4386 !!!
Greeting
Sven
There is no low byte calculation necessary.
Line from X0,Y0 to X1,Y1 / cycles
0,0 to 99,0 / 4347
0,0 to 99,0 / 4346
0,0 to 99,99 / 4386 !!!
Greeting
Sven
Re: Line drawing algorithm
Quite cool! I went for a char boundary check instead of unconditional address calculations. Something like this:
That's at least -5 cycles for your code, I assume. BUT: There's still clipping, so I'm thinking of doing just that with the cpx/cpy, correcting the drawing direction beforehand.
Code: Select all
lda dx
loop2:
sta error
loop:
lda (d),y
ora pixels,x
sta (d),y
dey
cpy y1
beq done
lda error
sbc dy
bcs loop2
adc dx
sta error
inx
lda is_charboundary,x ; <--- !
beq loop
lda column_addrs_l,x
sta d
lda column_addrs_h,x
sta d+1
bne loop ; (jmp)
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
https://github.com/SvenMichaelKlose
- Mike
- Herr VC
- Posts: 5130
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Re: Line drawing algorithm
To get the rounding right, the first LDA dx should be followed by LSR A (i.e., error := dx/2).
The address calculation/lookup is factored out and done simply by adding the column size to the address column pointer. This allows to re-purpose register X to count down the number of remaining pixels.
Other than all that, your code snippet pretty much does what my line routine does for |dy|>|dx|.
The whole adventure laid out in the thread here is about obtaining a fast line routine. The key to that is not stuffing more logic into the loop body, but less!
The routine knows where the character boundaries are without the need to explicitly check, if the loop is unrolled 8 times, and at start, the preparation part jumps into the loop body instance "x1 mod 8". Each loop body instance knows which bit to set (or reset/toggle), so that bit is encoded as immediate constant.pixel wrote:Quite cool! I went for a char boundary check instead of unconditional address calculations.
The address calculation/lookup is factored out and done simply by adding the column size to the address column pointer. This allows to re-purpose register X to count down the number of remaining pixels.
Other than all that, your code snippet pretty much does what my line routine does for |dy|>|dx|.
The whole adventure laid out in the thread here is about obtaining a fast line routine. The key to that is not stuffing more logic into the loop body, but less!
Re: Line drawing algorithm
Thanks!
I know what the adventure is. Some strategies might not include unrolling the loop.
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
https://github.com/SvenMichaelKlose
- Mike
- Herr VC
- Posts: 5130
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Re: Line drawing algorithm
As can be seen by these two posts (here and here), 20000 pixels/second on a 1 MHz 65xx are easily attainable without loop unrolling, mostly by streamlining the address calculation, at the expense of using huge tables.pixel wrote:Some strategies might not include unrolling the loop.
As long as there still happens a full or even only partial address calculation per pixel, you can't improve much on that. Moving the inner loop into the zeropage and applying those measures as hinted at by the last posts of MrSterlingBS may eke out some cycles, but still won't be able to approach the speeds possible by unrolling the loop.
So-called "chunky" Bresenham implementations use (variants of) this code snippet as part of their inner loop:
Code: Select all
.loop
LSR mask ; 5
BEQ fixup_column ; 2 (assumed not taken)
SBC dy ; 3
BCC fixup_y ; 2 (assumed not taken)
DEX ; 2
BNE loop ; 3 (assumed taken)
-----------------------------
total: 17 cycles
As I wrote earlier in this thread, in any case - and already with my above implementation of Bresenham at 20000 pixels/second - we are working with drawing speeds here that demand a good and fast(!) geometry engine. Even my faster line routine, with a sustained speed of 30000 pixels/second, won't save the day when the geometry engine is crap.
*) ... measured in pixel steps. If one takes the PAR into account, that corresponds to a maximum slope angle of phi = arctan(1/2 x 3/5) ~= 16.7° (for PAL). When drawing objects in vector graphics, we can assume an equidistribution of angles (provided the programmer took the PAR into account), which means only 16.7°/90° ^= 18.6% of all lines have a chance to be faster with a chunky Bresenham. For NTSC (with its PAR ~= 3:2), this figure comes out to 20.6%, not much more. This should give you a good idea why I never bothered to actually implement that variant.

**) not entirely in the drawer: it is used in my Vector demo (rotating a icosahedron inscribed in a cube), the "Rotating Earth Globe" demo (see the *.d64 of the MG batch suite), in the Madness demo, and also in MINISKETCH, where it connects the single points that are returned from the lightpen each frame to a line traverse.
Re: Line drawing algorithm
Can be a very productive time figuring one out.
“One of my most productive days was throwing away 1000 lines of code.” (Ken Thompson)
I have a concept that looks very promising with throw-away-day dooming all over it. At least I arrived at only adding, subtracting, negating, swapping and look-up tables and some kind of 8.8 fixed-point math. Feels like doing bloody homework again. Did that with a cold, so there'd better be another round of reassurance and trying to science something out of it. But the new video mode is definitely a new kind of boost.
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
https://github.com/SvenMichaelKlose
- Mike
- Herr VC
- Posts: 5130
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Re: Line drawing algorithm
You mean the 128x256 resolution brought into the discussion here by MrSterlingBS?pixel wrote:[...] the new video mode is definitely a new kind of boost.
I can tell you that the speed increase from saving the addition to the low byte (to get the next column address) is marginal with my faster line routine, as - as I already wrote - the address calculation/lookup already has been factored out of the inner loop.
The geometry engine I implemented for the vector cube/icosahedron demo was a first for me in 65xx code after an effective 16 year hiatus on coding for the VIC-20 (1986..2002). I put the focus on getting it working, with a good leeway on arithmetic precision (actually 2:14 fixpoint with 4:28 intermediate results), not so much on speed. Those are 20 rotated vertices (8 for the cube, 12 for the icosahedron) and 42 lines (12 for the cube, 30 for the icosahedron), around all three axes. Not actually using a 3x3 matrix but 3 single axis rotations, i.e. 12 multiplies instead of 9 so that could be streamlined. Perspective correction (and taking into account the PAR, for NTSC actually) needing another 2 multiplies per vertex (with lookup from a table of inverses). The mult/trig/inverse tables are rather big, and could also be streamlined for less size at the expense of accuracy. As long as the calculated screen co-ordinates still differ less than 1/2 pixel from the ideal value, that would be in order.Can be a very productive time figuring one out.
As I wrote somewhere else here in Denial, in the end the demo spends about the same time on geometry processing as it does on drawing the lines, these two tasks combined take about 80% of the total time. The rest of the time is spent in business logic, clearing the shadow buffer and copying the shadow buffer over to the display buffer when the current animation step has been completely rendered. As stated in the demo, it arrives at roughly 4 to 5 frames/second.
This very demo has been unchallenged for over 20 years now. So here's your benchmark: try to substantially better the frame rate with otherwise the same visual result.

Re: Line drawing algorithm
+20y! That's even longer than my game release delays! True: a kick in the glockenspiel is overdue.

But instead I'll keep tossing around bytes in what's to become the engine of a game, hoping to get away with rather primitive calculations, LOGO-style.
At the moment I'm… well… kinda throwing things away. But I managed to rotate a cube:
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
https://github.com/SvenMichaelKlose
- Mike
- Herr VC
- Posts: 5130
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Re: Line drawing algorithm
That cube must have been exposed to a rather strong gravity field in the vicinity of a black hole.pixel wrote:But I managed to rotate a cube: [...]

Re: Line drawing algorithm
That's the last working version. With the new thing I hope to be able to get the (8-bit) vertices of a rotated cube in about 600 cycles.
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
https://github.com/SvenMichaelKlose
- MrSterlingBS
- Vic 20 Afficionado
- Posts: 302
- Joined: Tue Jan 31, 2023 2:56 am
Re: Line drawing algorithm
Hi,
After implementing the SBC/BCS technique, the line drawing routine became a little bit faster.
The results for a line are around 37 cycles per pixel. X1=10;Y1=10; X2=110;Y2=110
Screen sice 128 by 256 pixels.
The two dots are only to verify the start and end of the line.
Thanks a lot for the feedback.
BR
Sven
After implementing the SBC/BCS technique, the line drawing routine became a little bit faster.
The results for a line are around 37 cycles per pixel. X1=10;Y1=10; X2=110;Y2=110
Screen sice 128 by 256 pixels.
The two dots are only to verify the start and end of the line.
Thanks a lot for the feedback.
BR
Sven
- thegg
- Vic 20 Dabbler
- Posts: 75
- Joined: Mon Aug 30, 2021 4:49 am
- Location: England
- Occupation: retired
Re: Line drawing algorithm
Sorry, I've been following this with some interest, but I can't recall a SBC/BCS technique being discussed. Unless you are referring to Mike's chunky Bresenham snippet. Can you either explain further, or point me to the right post please.
- Mike
- Herr VC
- Posts: 5130
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Re: Line drawing algorithm
Also, regardless how it is implemented, with
as start and end co-ordinates of the line, the routine should only do diagonal steps. However, in the screenshot I see interspersed horizontal steps that are not supposed to be there. So either the screenshot uses another set of co-ordinates, or there is a flaw with the routine. I take the given cycle count with a grain of salt.MrSterlingBS wrote:X1=10; Y1=10; X2=110; Y2=110
Re: Line drawing algorithm
Ah, yes, looks like a carry flag sneaking in with each new char column.
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
https://github.com/SvenMichaelKlose
- MrSterlingBS
- Vic 20 Afficionado
- Posts: 302
- Joined: Tue Jan 31, 2023 2:56 am
Re: Line drawing algorithm

Yes, there was an error in the routine.
I hope I can fix it today.
Sven
The routine is not finished yet and need further optimizations.But the principle works…
Here is the updated routine with no error. The screensize is now 168x192 pixels.