03:02imirkin: karolherbst: fyi, worked out the lowering for shared int64
05:06imirkin: or not...
05:06imirkin: HdkR: what am i doing wrong? https://hastebin.com/ecuhoduway.scss
05:06imirkin: it goes into an infinite loop, it would seem
05:09imirkin: f me.
05:09imirkin: just kidding.
05:09imirkin: apparently the id of the register matters
05:09imirkin: it doesn't just know what you want
05:11HdkR: what's all this then
05:12imirkin: phi fail
05:13imirkin: oh ... the stupid loop detect thing?
05:13imirkin: how did that work....
15:39emersion: i did more investigations wrt. my modifiers issue, here are my findings: https://gitlab.freedesktop.org/drm/nouveau/-/issues/36#note_765017
15:41imirkin: emersion: that high bit on pitch was "pitch" (vs blocklinear) i thought. did i get that wrong?
15:41emersion: yea sorry just fixed it
15:42emersion: PITCH is actually 0x01a
15:42imirkin: afaik the only requirement is that it's aligned to 256
15:43imirkin: but there could well be things i don't know ;)
15:43emersion: eh, that would explain it
15:43emersion: 6912 is divisible by 256, but not 6784
15:44imirkin: oh, so 0x1a is actually not 6784?
15:44imirkin: it's 6656. heh
15:44imirkin: so yeah, then i'm guessing it gets mad that the pitch is lower than the width
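The arithmetic in the exchange above can be sketched in C. This is only an illustration: it assumes the PITCH field holds the pitch in units of 256 bytes, which is inferred from the 0x1a / 6656 figures quoted in the chat rather than from hardware documentation, and the helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption inferred from the chat (0x1a -> 6656): the PITCH field
 * stores the pitch in 256-byte units. Not taken from hardware docs. */
#define PITCH_UNIT 256u

/* Hypothetical helper: value that ends up in the field (low bits are lost). */
static uint32_t pitch_to_field(uint32_t pitch_bytes)
{
    return pitch_bytes / PITCH_UNIT;
}

/* Hypothetical helper: pitch decoded back from the field. */
static uint32_t field_to_pitch(uint32_t field)
{
    return field * PITCH_UNIT;
}

/* Usage: the numbers quoted in the conversation. */
static void pitch_example(void)
{
    /* 6784 / 256 = 26.5, truncated to 26 = 0x1a */
    assert(pitch_to_field(6784) == 0x1a);
    /* decoded pitch is smaller than the 6784 bytes that were asked for */
    assert(field_to_pitch(0x1a) == 6656);
    /* 6912 = 27 * 256 survives the round trip */
    assert(field_to_pitch(pitch_to_field(6912)) == 6912);
}
```

Under that assumption a pitch of 6784 silently becomes 6656, which is consistent with the guess that the hardware complains about a pitch smaller than the surface width.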
15:44emersion: i'm very thankful to hw devs to raise INVALID_STATE on this :)
15:45emersion: where is the pitch chosen? libdrm? mesa?
15:45imirkin: no clue.
15:45imirkin: depends how you create the buffer
15:45imirkin: yeah, but more precisely?
15:46emersion: it goes GBM → DRI interface → Gallium → nouveau in mesa
15:46emersion: then → libdrm
15:46emersion: it's using gallium's resource_create callback
15:48imirkin: pitch_align == 128
15:48imirkin: and 6784 is aligned to 128. so the system works.
15:48imirkin: should make it say if scanout, then use 256
15:49imirkin: (this is called from nvc0_miptree as well, for linear layout)
15:50emersion: was looking at this :P
15:50emersion: makes sense, let's try it
15:51karolherbst: imirkin: btw.. I find very interesting races now that I've enabled tsan in libdrm as well :/
15:52karolherbst: so apparently we can start a nouveau_bo_wait while the bo has no pushbuf bound, but another thread sets it via pushbuf_validate :/
15:52karolherbst: which.. feels wrong
15:53karolherbst: or rather annoying to fix
16:22imirkin: emersion: oh, and iirc we only allocate linear surfaces when PIPE_BIND_SCANOUT is set
16:23emersion: hm, i'm now hitting an assertion failure in nvc0_user_vbuf_range
16:23imirkin: modifiers were kinda bolted onto nouveau, not properly integrated
16:23imirkin: i thought i fixed that
16:23imirkin: did you pull something semi-recent?
16:23emersion: yeah, do you have a commit hash?
16:23imirkin: and are you using instancing on client-side vbo's?
16:24emersion: hm, i don't know what that is
16:24emersion: i'm only doing extremely simple GL stuff
16:24imirkin: instancing = glDrawInstanced
16:24emersion: no, nothing like this
16:24imirkin: client-side vbo's == ... client side vbos...
16:24emersion: i don't use vbos
16:24imirkin: ok, then that's a yes.
16:24imirkin: you just pass a pointer to CPU-side data with vertices right?
16:25imirkin: aka "client-side"
16:25imirkin: do you do instancing?
16:25imirkin: oh crap
16:25imirkin: that was on nv50
16:26imirkin: you're on nvc0, so won't matter
16:26emersion: just glDrawArrays
16:26imirkin: yeah, so no instancing
16:26imirkin: instancing == glDrawArraysInstanced or something
16:26imirkin: so not affected by that bug on two separate counts
16:26imirkin: (a) different hw, (b) not using the affected feature
16:27imirkin: what's the assertion?
16:28imirkin: assert(nvc0->vb_elt_limit != ~0);
16:28imirkin: that guy?
16:28emersion: i think that's the one yeah, but gdb only indicates the function lineno
16:28emersion: it's clearly an abort though, and there's only one
16:29imirkin: that really shouldn't happen
16:29imirkin: with client-side buffers, there should always be limits...
16:29imirkin: perhaps something changed in mesa core
16:37imirkin: unfortunately i can't look into it right now
16:38imirkin: emersion: btw, the whole client side is from the idea of a client/server, where the client is sending GLX calls to a remote X server, and then the X server has a GL impl to do the rendering
16:39imirkin: but nowadays you can think of it as "client=CPU" "server=GPU"
16:43emersion: hm, so i checked out mesa 20.3.2 and tried to reproduce
16:43emersion: and got a different INVALID_STATE error
16:44emersion: oh wait it's still using align=128
16:44emersion: my PIPE_SCANOUT check is wrong
16:47emersion: oh, right, PIPE_SCANOUT is missing when modifiers are supplied
16:47emersion: i mean: oh, right, usage flags are missing when modifiers are supplied
16:49emersion: "DRM: core notifier timeout" means i should reboot, right?
16:51emersion: fwiw, i don't hit the assert with 20.3.2
16:51emersion: i could bisect if that helps
16:55emersion: oh eh, weston uses a BLOCK_LINEAR_2D layout and it works fine
16:58imirkin: so with blocklinear, there is no pitch
16:58imirkin: there are "gobs"
16:58imirkin: which are of various sizes, as specified by the tile mode
17:03emersion: ok, so with the patch and with 20.3.2 it doesn't crash, but just displays a black screen with some gray lines at the top
17:05imirkin: mission accomplished
17:06imirkin: don't worry about the bisect, mareko said he changed it
17:06imirkin: so i need to poke around to figure out how to adapt
17:07emersion: ah, i understand why wlroots picks LINEAR now
17:07emersion: opengl tells us only LINEAR is supported
17:07emersion: ah, no
17:13emersion: it's because the modifiers exposed by the gl impl don't intersect those exposed by kms
17:13emersion: opengl exposes 0x300…0010 through 15
17:13emersion: none of these support scanout
17:15emersion: aaand everything works fine if i don't intersect with the gl modifiers
17:16emersion: well, in any case, there's a bug when using LINEAR for the primary plane
17:16imirkin: due to the lack of layout right?
17:16imirkin: i mean, due to the 128-alignment
17:17imirkin: we should just align to 256 for PIPE_BIND_SCANOUT
17:17imirkin: (&& linear)
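A minimal sketch of the proposed rule, assuming a linear surface whose pitch is rounded up to 128 bytes normally and to 256 bytes when it will be scanned out. The function names and the boolean `scanout` parameter are hypothetical stand-ins for the real PIPE_BIND_SCANOUT check; this is not the actual mesa code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round value up to the next multiple of a power-of-two alignment. */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: pitch for a linear surface. `scanout` stands in
 * for "PIPE_BIND_SCANOUT is set" in the real driver. */
static uint32_t linear_pitch(uint32_t min_pitch_bytes, bool scanout)
{
    uint32_t pitch_align = scanout ? 256u : 128u;
    return align_up(min_pitch_bytes, pitch_align);
}

/* Usage: the pitch values from earlier in the log. */
static void linear_pitch_example(void)
{
    assert(linear_pitch(6784, false) == 6784); /* already 128-aligned */
    assert(linear_pitch(6784, true) == 6912);  /* bumped to 27 * 256 */
}
```

With the 128-byte rule alone, 6784 passes unchanged; only the scanout case needs the stricter alignment.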
17:17emersion: well, that was one issue. the other is that i still get a black screen with gray lines
17:17imirkin: you're trying to render to this buffer, right?
17:17imirkin: and then display it?
17:17emersion: hm, yeah, but maybe there's a sync issue
17:18emersion: because i'm importing the buffer as a dmabuf, rendering to it, then calling glFinish()
17:18emersion: err, glFlush()
17:18imirkin: ah yeah
17:18imirkin: that doesn't do anything
17:18emersion: let's try to see if glFinish() improves things
17:18imirkin: glFinish should wait on a fence
17:19emersion: well it's supposed to
17:19imirkin: glFlush is "send out the commands you have queued up"
17:19imirkin: vs http://docs.gl/gl4/glFinish
17:20emersion: glFlush is also a barrier
17:20emersion: see e.g. https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7384
17:20imirkin: yes, it's a barrier
17:20emersion: then with implicit sync the kernel should wait for the barrier to complete
17:20imirkin: but it still says nothing about when the flushed commands complete executing
17:20imirkin: yeah i don't think we have anything like that for the display pipeline
17:20imirkin: perhaps we should :)
17:20emersion: at least that's what daniels explained
17:21imirkin: implicit sync is whatever the driver wants it to be
17:21imirkin: in nouveau's case, when you submit commands to be executed, it will wait for stuff potentially
17:21emersion: well i'm open to suggestions wrt. rendering to a dmabuf
17:21imirkin: but here you're not submitting commands to be executed
17:21imirkin: you're just saying "flip to this buffer"
17:21emersion: how can i say to the driver that i want it to Do The Thing?
17:21imirkin: well, i think you're right - nouveau probably SHOULD Wait
17:22imirkin: but it totally doesn't :)
17:22emersion: ok :P
17:22imirkin: and adding such a wait could have various implications
17:22imirkin: like ... what does that mean for e.g. front-buffer rendering
17:22emersion: all other drivers i know of are doing it
17:23imirkin: i think there's another thing to go with dma-buf
17:23imirkin: i forget what it's called
17:23emersion: maybe explicit sync helps with it?
17:23imirkin: oh yeah -- syncfd
17:23imirkin: (i think?)
17:23emersion: sync files
17:23imirkin: which addresses this in some way
17:23imirkin: that's the extent of my knowledge though
17:23imirkin: so ... limited :)
17:24emersion: alright, so glFinish() fixes it
17:24daniels: imirkin: you don't execute flips in front-buffer rendering
17:24emersion: i'll submit the pitch align fix
17:25daniels: you provide DirtyFB if you like, but otherwise you just wait for scanout to catch up to whatever it is you haphazardly threw into the buffer
17:25imirkin: daniels: yeah, but how does the driver know what you want?
17:25imirkin: e.g. the first time you try flipping to it
17:26daniels: in the implicit sync model, it waits for the completion of any pending ops flushed to the kernel at the time a flip was requested
17:26imirkin: so if you're front-buffer rendering, it's not guaranteed to ever display :)
17:26imirkin: oh, i see. at the time it was requested.
17:26daniels: _at the time a flip was requested_
17:26imirkin: fair enough
17:26imirkin: so yeah. i'm pretty sure nouveau has none of that, but it totally makes sense that it should.
17:26daniels: then beyond that you don't repeatedly flip, you just call DirtyFB as a courtesy to let the kernel know of the damage region
17:27daniels: used e.g. if you have 'smart' (i.e. caching-sink) displays
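The "at the time a flip was requested" rule can be illustrated with a toy model in C. Everything here is hypothetical (the structs, the sequence counters, the function names); it models fences as increasing sequence numbers and is not how nouveau or the DRM core actually track them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy GPU: work is numbered as it is flushed to the kernel and
 * completes in order. */
struct toy_gpu {
    uint64_t submitted_seq; /* last job flushed to the kernel */
    uint64_t completed_seq; /* last job the GPU finished */
};

/* Toy flip: remembers which job was the newest when it was requested. */
struct toy_flip {
    uint64_t wait_seq;
};

static uint64_t toy_submit(struct toy_gpu *gpu)
{
    return ++gpu->submitted_seq;
}

static void toy_complete_one(struct toy_gpu *gpu)
{
    if (gpu->completed_seq < gpu->submitted_seq)
        gpu->completed_seq++;
}

/* Snapshot the pending work at request time; later submissions are ignored. */
static struct toy_flip toy_request_flip(const struct toy_gpu *gpu)
{
    struct toy_flip flip = { gpu->submitted_seq };
    return flip;
}

static bool toy_flip_ready(const struct toy_gpu *gpu, const struct toy_flip *flip)
{
    return gpu->completed_seq >= flip->wait_seq;
}

static void toy_flip_example(void)
{
    struct toy_gpu gpu = { 0, 0 };
    toy_submit(&gpu);                       /* job 1 */
    toy_submit(&gpu);                       /* job 2 */
    struct toy_flip flip = toy_request_flip(&gpu);
    assert(!toy_flip_ready(&gpu, &flip));   /* jobs 1 and 2 still pending */
    toy_submit(&gpu);                       /* job 3, after the flip request */
    toy_complete_one(&gpu);
    toy_complete_one(&gpu);
    assert(toy_flip_ready(&gpu, &flip));    /* job 3 does not hold it back */
}
```

In this model, rendering that keeps arriving after the request (the front-buffer case) never delays the flip, because the flip only waits on the snapshot taken when it was requested.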
17:27emersion: is the explicit sync stuff completely in DRM core, or are there things the driver needs to implement?
17:27imirkin: daniels: or command-mode panels?
17:27daniels: imirkin: exactement
17:27daniels: emersion: the core has helpers for it
17:27imirkin: emersion: i'd say it's completely in the driver :)
17:27emersion: ok, rip
17:28imirkin: both kernel- and driver-side
17:28imirkin: er, both kernel- and user-side
17:28daniels: drm_gem_fb_prepare_fb() is the generic helper, but I would assume nouveau rolls its own everything so would need to insert its own wait
17:28imirkin: like you can't just remove it from nouveau and expect things to work -- we assume it's there, it's an active part of the synchronization strategy
17:29emersion: for some reason i thought DRM_CAP_SYNCOBJ only depended on a recent enough kernel, but indeed nouveau doesn't have it
17:30imirkin: you need a kernel so recent, the patches to support it haven't been written yet :)
17:30imirkin: i do think skeggsb was looking at it at one point
17:30imirkin: but i dunno what the results, if any, were
17:31emersion: would adding the implicit wait be tricky?
17:31imirkin: SHOULDNT be, but ... could be :)
17:31imirkin: i dunno what the right way to implement it is, tbh
17:31emersion: as always :)
17:32imirkin: the display core has all this "interlock" logic
17:32imirkin: perhaps you can just wait on the fence that way directly
17:32imirkin: if not, you have to get more creative and do the wait somewhere else
17:33imirkin: #define NV827D_UPDATE_NOT_DRIVER_FRIENDLY 31:31
17:33imirkin: #define NV827D_UPDATE_NOT_DRIVER_UNFRIENDLY 30:30
17:33imirkin: don't call it "DRIVER_FRIENDLY", just not as unfriendly as the other thing
17:34imirkin: (i have no clue what those do)
17:34imirkin: (it was just funny)
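For readers unfamiliar with the `31:31` notation in those defines: it appears to be a `high:low` bit range within a 32-bit word. A small sketch of how such a range can be turned into a mask follows, assuming that reading. The FIELD_* macros are hypothetical helpers written for this example, and nothing here says what the two bits actually do.

```c
#include <assert.h>
#include <stdint.h>

#define NV827D_UPDATE_NOT_DRIVER_FRIENDLY   31:31
#define NV827D_UPDATE_NOT_DRIVER_UNFRIENDLY 30:30

/* Hypothetical helpers: the range is dropped into a ternary, so
 * "1 ? hi:lo" evaluates to hi and "0 ? hi:lo" evaluates to lo. */
#define FIELD_HI(range) (1 ? range)
#define FIELD_LO(range) (0 ? range)
#define FIELD_MASK(range) \
    ((0xffffffffu >> (31 - FIELD_HI(range) + FIELD_LO(range))) << FIELD_LO(range))
#define FIELD_GET(range, word) \
    (((uint32_t)(word) & FIELD_MASK(range)) >> FIELD_LO(range))

static void field_example(void)
{
    uint32_t word = 0x80000000u; /* only bit 31 set */

    assert(FIELD_MASK(NV827D_UPDATE_NOT_DRIVER_FRIENDLY) == 0x80000000u);
    assert(FIELD_MASK(NV827D_UPDATE_NOT_DRIVER_UNFRIENDLY) == 0x40000000u);
    assert(FIELD_GET(NV827D_UPDATE_NOT_DRIVER_FRIENDLY, word) == 1);
    assert(FIELD_GET(NV827D_UPDATE_NOT_DRIVER_UNFRIENDLY, word) == 0);
}
```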
17:35imirkin: emersion: ok, so the display stuff *does* have SET_SEMAPHORE_CONTROL stuff
17:35imirkin: (which is aka fences)
17:36emersion: unused right now?
17:36emersion: how does it work when *not* using dmabufs, and using just swapBuffers?
17:36imirkin: looks like it does set those
17:41imirkin: i'm probably blind, but i don't actually see where the semaphore stuff is set, if ever
17:41imirkin: could be very clever grep-evading macros
17:41imirkin: but even grepping for 'sema' doesn't reveal it
17:43daniels: you won't have native semaphores if you're doing presentation from another device
17:43daniels: so those would only ever be an optimisation, not the sole path
17:44imirkin: if it's from another device, how can we sync?
17:44imirkin: (a) i'm pretty sure we can't display from sysmem
17:45imirkin: i don't think there's a need for others. basically we're copying this stuff to vram first, then displaying. sync is out.
17:45imirkin: (and this might not work with dma-buf at all, not sure)
18:07imirkin: emersion: yes - scanout is always 256
18:07imirkin: nv50 and nvc0
18:07imirkin: and nv30 too i'd imagine
18:07imirkin: but it might already do that
18:08emersion: yeah sounds like it checks for that already
18:08emersion: kind of -- screen->eng3d->oclass >= NV40_3D_CLASS ? 1024 : 256
18:09imirkin: int pitch_align = MAX2(
18:09imirkin: screen->eng3d->oclass >= NV40_3D_CLASS ? 1024 : 256,
18:09imirkin: found it at the same time. anyways, it's covered.
18:19emersion: pushed with a nv50 patch as well
18:22emersion: thanks for your help!
18:24RSpliet: emersion: glad you were able to scratch this itch! feel free to come back if you have more itches ;-)
18:25emersion: well i'll definitely come back, need to get that implicit sync issue fixed :P
18:32emersion: opened this for the glFlush issue: https://gitlab.freedesktop.org/drm/nouveau/-/issues/41
18:36imirkin: you'll have to catch skeggsb for working out how to properly address this
18:36imirkin: he knows the hardware and kernel driver inside out
18:56karolherbst: this pcli->kref[bo->handle].push is annoying :/
18:57imirkin: i think i mentioned my thoughts on libdrm's suitability...
18:59karolherbst: apparently using bufctx all the way should solve it, but I don't see how it would solve this particular case where waiting on/mapping a bo has this silly data race
18:59karolherbst: I mean.. I got the pushbuffer race thing all figured out, now it's just those random things
19:03daniels: imirkin: ‘how do we sync’ - you wait until the notification in the kernel
19:04daniels: I mean, this has to work, otherwise every dual-GPU case e.g. laptops would need manual app-side detection and glFinish ...
19:04daniels: (see also non-x86 systems with separate GPU & display)
19:04imirkin: daniels: PRIME just doesn't have vsync
19:05imirkin: i mean, i understand how you'd *like* it to work
19:05imirkin: but currently it's just not a thing with nouveau
19:05daniels: well yes, X11 is not the poster child for perfect display
19:05karolherbst: X11 has tearing? ... so what?
19:05karolherbst: just remove X11, problem solved
19:05HdkR: fbdev, the future of Linux
19:05imirkin: or learn to stop worrying and love the bomb
19:06 * imirkin doesn't care about tearing
19:06karolherbst: I'd close all X11 bugs as -EWONTFIX
19:08karolherbst: ohh, mhh, we shouldn't call pushbuf_validate.. right
19:08karolherbst: that was the problem
19:09karolherbst: okay.. so I can ignore that race until I convert to the bufctx stuff
19:09daniels: imirkin: anyway, long story short, the function I pointed to earlier is what you need to painlessly handle sync whether it's local or foreign
19:09daniels: you can specialise local into the semaphore pipeline if you like
19:09imirkin: daniels: ok, but ignoring sync, i don't think we can scanout foreign buffers in the first place
19:09daniels: but that will only work for nv50(?)+, since it's part of atomic :P
19:10KungFuJesus: imirkin: have any things for me to try on NV43 BE yet?
19:10imirkin: for reverse prime, userspace makes copies
19:10imirkin: KungFuJesus: sorry, no. been looking at other things. i haven't forgotten though
19:10imirkin: KungFuJesus: unrelated - would you be willing to try some kernel patches which mess with display stuff?
19:11imirkin: nouveau uses some "legacy" stuff which should be fixable, but i'm also not willing to just do it without testing
19:41ccr: probably not needed, but I can test at least kernel stuff on my craptop (NV4E, GeForce Go 6150). Mesa also, if needed, but will have to spin few scripts etc. as I can't be assed to actually compile anything on the target itself :P
19:41ccr: it's not BE of course
19:41imirkin: yeah, this is for BE things specifically
19:42imirkin: basically the kernel was super-loose about how it handled stuff for BE. this got tightened up, but without fixing the drivers, the "old" thing had to stick around too
19:42imirkin: i'd like to move nouveau to the "new" thing
19:42imirkin: but without BE to test it on ... it's a wee tricky
19:43ccr: understandably. not surprising either that BE stuff hasn't been tested much, probably not many users for it.
19:44imirkin: my G5 died =/
19:44imirkin: stupid power supply
19:44imirkin: and the thing is built in such a way that it's basically impossible to remove
19:54HdkR: Oh jeez, just looked up the ifixit guide for it. That's terrible
19:54imirkin: basically you just need some dynamite
19:54HdkR: And of course you'd have to get a donor PSU since it isn't standard so that means ebay to find the correct one
19:54imirkin: and then maybe...
19:55imirkin: HdkR: yeah, i tried to follow one of those guides
19:55imirkin: but was unsuccessful at removing it
19:55imirkin: i failed at like the first step :)
19:58imirkin: and i'm not like mechanically incompetent ... it's very tough
20:00RSpliet: What? IFixit actually *sells* power supplies for a G5?!
20:00HdkR: Nah, they just have a guide submitted by customer
20:01HdkR: er, consumer
20:01RSpliet: They sell a power supply
20:02HdkR: Oh, that's an iMac G5. I was thinking PowerMac G5
20:02imirkin: i have a powermac g5
20:03imirkin: PowerMac7,2 iirc
20:03imirkin: or 7,3
20:03RSpliet: oh right
20:03ccr: hmm. now I wonder... we've got 2 or 3 Power Mac G5 machines at work that are going to trash, they've also been stripped of some parts, but I don't know what's left in them. no idea what model they are either.
20:04imirkin: you can be sure they still have their power supplies! :)
20:04ccr: heh, probably, but no idea if those work :)
20:04imirkin: unless someone got a sawzall to rip them out
20:05ccr: I think the RAM and graphics card and such have been taken out at least.
20:06ccr: amazingly enough there's at least one, maybe two machines that were actually even in use at least in 2019. no idea if they are now. probably should work, but I can't go and install Linux in them, unfortunately. :P
20:08ccr: is it possible to boot those things from USB?
20:08ericonr: I know some macs require you to burn a CD
20:08ericonr: no idea if that's the case
20:09imirkin: i did tftpboot
20:09imirkin: but you have to have an apple keyboard if you want to do that
20:09imirkin: coz you have to press some sort of key-combo on boot to enter OF
20:10ccr: I see
20:10imirkin: i don't remember the other boot options. i'm sure CD works, USB was less common for booting back then
20:10ccr: yeah. was wondering if live USB linux was a possibility.
20:10karolherbst: mhh, race on nvc0->screen->cur_ctx :/
20:10imirkin: you'd need to do some research
20:10karolherbst: sounds annoying to fix
20:11karolherbst: macs can boot from USB since forever
20:11karolherbst: I think
20:11karolherbst: at least from USB 2.0
20:12imirkin: yeah, i don't think USB 2.0 was a thing
20:12ericonr: https://voidlinux-ppc.org/ (if you need a distro)
20:13ericonr: and folks at #voidlinux-ppc should know about booting the G5
20:13ericonr: I think someone there even has one
20:14ccr: USB 2.0 was circa 2000, though obviously implementations came later
20:14karolherbst: I know I bought a USB 2.0 PCI card for my G4, but... not sure if I was able to boot from it :D
20:14ccr: wikipedia lists USB 2.0 for all G5 models there
20:15imirkin: ok. well i never tried.
20:15imirkin: i treated it like my other ARM boards
20:15imirkin: booting from tftp was easiest :)
20:15imirkin: it was probably about as powerful ;)
20:18ccr: I once had one of those x86 thin client PCs, a HP thing iirc, that I booted from tftp, then ran the rest from a USB stick because the internal storage in the hardware was not supported by linux. used as a webcam streamer pointed at plants. :P
20:19ccr: somewhat awkward setup
20:37KungFuJesus: imirkin: Sure, sorry was stuck in a meeting for work
20:37KungFuJesus: I'm on kernel 5.7 currently but I can bump it up to 5.10 I believe
20:38imirkin: KungFuJesus: well, this would be a custom kernel with patches
20:38imirkin: any semi-recent kernel would be fine
20:38imirkin: (anything from like ... 4.4 or later)
20:38KungFuJesus: right, I just meant for what you're targeting
20:38imirkin: that code doesn't change very often
20:38KungFuJesus: but yeah, it's not a production machine or anything and backing up the kernel is not a big deal for me
20:38imirkin: anyways, i think i have some patches, will try to put them together
21:22emersion: hm, so nouveau uses dma_resv_get_excl_rcu in prepare_fb
21:23emersion: drm_gem_fb_prepare_fb does the same
21:23emersion: so sounds like this part is fine
21:37imirkin: well, we have to make sure we wait on one of our implicit sync fences
21:37imirkin: i have zero recollection as to how they're tracked
21:37imirkin: i think there may have been a point in time when i looked at them
21:37imirkin: but that was at least 5y ago
22:01imirkin: emersion: some helpful things...
22:01imirkin: asy = "assembly"
22:01imirkin: asyw = "assembly wndw", asyh = "assembly head"
22:01imirkin: wndw = basically, an image. head = basically, an output.
22:02imirkin: unfortunately there are many generations of display hw, and some of them do some things differently
22:02imirkin: so it doesn't always map 1:1 to a particular generation
22:04emersion: what is an assembly?
22:05imirkin: it's the state that's being built up
22:05imirkin: by the driver
22:05imirkin: but hasn't been written out to the hardware
22:06imirkin: (perhaps not THE best term, but just explaining the var names you might see)
22:06imirkin: coz it took ben explaining that convention about 3x to me before i actually remembered it :)
22:18emersion: i see, thanks :)
22:20emersion: Lyude: btw, feel free to CC me on these nouveau patches
22:21Lyude: emersion: np, meant to send them out last night but I was distracted by america
22:21emersion: no worries, take your time