13:00 tagr: karolherbst: no, turns out we don't always flip
13:00 karolherbst: ehh :/
13:00 tagr: we end up flipping, or at least "synchronizing" when there's a GLX application running and that presents
13:01 tagr: it's not actually flipping in that case, but at least some callback (ms_present_flush()) is being called
13:01 tagr: karolherbst: do you know if there's any reason why we don't do double-buffering with flipping in X with modesetting?
13:02 tagr: I'm wondering if I should just add a very normal eglSwapBuffers()/do_pageflip() implementation, which at this point seems to be the cleanest thing we can do
13:03 tagr: this has the downside that it requires X to be modified, but perhaps that's acceptable
13:03 karolherbst: tagr: probably latency?
13:04 karolherbst: I think I'd discuss such changes with X developers first as it's also questionable if those can be backported or not (or if distributions would pick them up)
13:04 tagr: yeah, I'll ask over on #xorg-devel
13:05 tagr: I wonder though, if they can't be backported (and they're likely going to be fairly big), if it's even worth implementing, because with 1.21 there will be modifiers support anyway and this will likely not be used anymore
13:05 karolherbst: right
13:05 tagr: in general it might be nice to have properly vblank-synchronized rendering, though
13:06 karolherbst: I have an idea
13:06 karolherbst: maybe we should focus on fixing modifier unaware wayland compositors first?
13:06 tagr: that should already work with my latest Mesa patch
13:06 karolherbst: did you submit that already?
13:07 tagr: modifier unaware wayland compositors will still properly synchronize on end of frame, so ->flush_resource() ends up getting called
13:07 karolherbst: would like to get that in asap as this would make gnome work out of the box
13:07 karolherbst: and probably other Wayland-defaulting desktops
13:07 tagr: okay, let me clean up the patch a little and submit it
13:07 karolherbst: cool, thanks
13:08 tagr: I was kind of hesitant to submit it as-is because there's still this known issue with windowed X
13:08 tagr: there's one more workaround that I could think of that might work and that's something that James Jones had hinted at
13:08 karolherbst: yeah... but I think we can always fix it on top, even if changes to the original fix are required
13:08 tagr: so there's a couple of glFlush() sprinkled around glamor, and that results in ->flush() getting called on the Mesa context
13:09 karolherbst: mhhh
13:09 tagr: now, we could potentially create a list within the context of all resources requiring a de-tiling blit and then just do the blit every time ->flush() is called
13:10 tagr: I suspect that's going to be pretty miserable in terms of performance because from what I can tell, this ->flush() is called on the order of 5-10 times per frame
13:11 karolherbst: yep
13:11 karolherbst: and applications might flush _and_ page flip
13:11 tagr: yeah, some do, actually
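The "list of resources in the context, blit on every ->flush()" idea tagr describes can be sketched in a few lines. This is a toy model only: `Resource`, `Context`, `track()` and the blit counter are hypothetical names invented for illustration and do not correspond to Mesa's actual structures. It just makes the cost concern visible: every flush touches every tracked resource.

```python
class Resource:
    """Hypothetical stand-in for a tiled resource with a linear shadow copy."""

    def __init__(self, name):
        self.name = name
        self.detile_blits = 0  # how many de-tiling blits were issued for it


class Context:
    """Hypothetical context that remembers which resources need de-tiling."""

    def __init__(self):
        self.pending = []  # resources requiring a de-tiling blit

    def track(self, res):
        # Add a resource once; it stays on the list for later flushes.
        if res not in self.pending:
            self.pending.append(res)

    def flush(self):
        # Stand-in for ->flush(): blit every tracked resource, every time.
        for res in self.pending:
            res.detile_blits += 1
        return len(self.pending)


ctx = Context()
scanout = Resource("scanout")
ctx.track(scanout)

# Assume 8 flushes in one frame, inside the 5-10 range mentioned above.
for _ in range(8):
    ctx.flush()

print(scanout.detile_blits)
```

In this model one frame with 8 flushes costs 8 full de-tiling blits of the same buffer, which is the overhead being worried about.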
13:12 karolherbst: maybe we should approach this from an API perspective, and in GL the most obvious way of making sure stuff gets presented is to page flip... but the other option is front buffer rendering + flushing, anything else?
13:15 tagr: karolherbst: anything that relies only on front buffer rendering has the additional problem that we need extra API to actually flush the front buffer
13:15 tagr: karolherbst: I did a quick proof of concept that basically adds gbm_bo_flush(), which I guess isn't really desirable in the general case
13:15 karolherbst: mhhh
13:16 tagr: so that got me as far as getting windowed glxgears to work, but it would then only update the screen as long as glxgears was running, so it's not really useful
13:17 karolherbst: heh.. I bet
13:17 tagr: hence why I keep coming back to double-buffering and swapping/page-flipping
13:17 karolherbst: yeah.. I think at this point it makes sense to talk with X developers about this, as they might have a good idea on how to handle this situation
13:18 karolherbst: I am also inclined to say we need modifier aware or page flipping applications in order to do anything useful, but that leaves Xorg out as I bet we won't get a 1.21 release anytime soon :/
13:19 karolherbst: tagr: did you check if glxgears was working on modifier unaware wayland compositor?
13:19 karolherbst: wondering if we need additional fixes for xwayland from 1.20 or not
13:20 tagr: hm... I'd have to check, not sure how I'd reproduce that
13:20 karolherbst: I can also check once you create the MR
13:21 tagr: there's now support in Weston's git for disabling the modifier support, but I'm not sure it's emulating exactly modifier-unaware behaviour
13:21 karolherbst: I know that mutter is modifier unaware by default
13:21 tagr: because I think it ends up basically still using ->resource_create_with_modifiers(), only it's passing in DRM_FORMAT_MOD_LINEAR
13:22 karolherbst: anyway. I can test that on my end, wouldn't be a problem
13:22 tagr: I could try to build mutter in my development environment, but that might take me a while
13:22 tagr: yeah, perhaps I should first clean up the code and send it out
13:23 karolherbst: maybe you want to base your image on a real distribution instead, would make a lot of things less painful :)
13:23 karolherbst: I build my own kernel but just use a stock fedora userspace and create the image with l4t scripts :D
13:24 tagr: karolherbst: I've tried that a couple of times but then I usually ended up in a situation where I had to rebuild a bunch of things anyway and that was all really messy
13:24 karolherbst: ahh, I see
13:25 tagr: this way at least I get to cross-compile everything and once I have all build scripts and dependencies in place it just takes a one-liner to rebuild everything
13:25 tagr: it's pretty neat, except that I have to write a new build script every time I add something new
13:25 karolherbst: use gentoo and build your tegra distribution :p :D
13:26 tagr: I sometimes use stock distributions, but only if I know that I don't have to modify any userspace
13:26 tagr: karolherbst: this is kind of like gentoo, but more tailored to my specific workflow
13:26 karolherbst: well, that's where gentoo helps :p you can just apply custom patches to every package by just placing them into the correct directory
13:26 karolherbst: yeah... just needs maintaining I guess
13:27 karolherbst: oh well
13:27 tagr: I added a nice little feature to this system where I can basically create symlinks directly into a working copy and build from there
13:27 tagr: which is really cool because I don't have to write the code, generate a patch and then start the build
13:28 karolherbst: true
13:28 tagr: anyway, not perfect, but works really well in most situations
13:34 imirkin: tagr: kmscube may be easier to test with the precise modifier setup you want to play with
13:35 tagr: imirkin: yeah, I've been using that for very basic testing, but the case that karolherbst was referring to is glxgears on top of non-modifiers aware Wayland compositors
13:35 tagr: since the code paths are slightly different there
13:36 imirkin: tagr: sure, but you could make kmscube modifier-unaware
13:36 imirkin: i guess it's a little different, but somewhat similar?
13:36 tagr: I suspect that Xwayland will work, though, because the compositors must by definition do the correct synchronization
13:36 tagr: imirkin: yes, I do have a version of kmscube that is modifier-unaware
13:37 tagr: imirkin: actually, I do have local patches that implement more Weston-like handling of modifiers in kmscube that I never got around to submitting
13:37 tagr: although... I think someone might have merged something similar upstream, I seem to recall
13:39 imirkin: afaik upstream kmscube does have some sort of modifier support
13:39 imirkin: i've largely stayed away from grokking the implications of modifiers
13:39 imirkin: which is probably a large part of the reason we find ourselves in the current situation ... sorry =/
13:44 tagr: imirkin: there's definitely a way to manually set a modifier, but ideally kmscube would query the KMS driver for a list of supported modifiers
13:45 tagr: I'll try to dig out that series after I've sent out the Mesa patch
13:45 imirkin: ah yeah. i guess kmscube is meant as a simpler demo app
13:45 imirkin: but pretty soon it'll be more sophisticated than gnome/plasmashell :)
13:47 tagr: yeah, applications that are primarily used by developers can tend to grow lots and lots of bells and whistles =)
14:26 imirkin: AndrewR: hey, so escalating Xorg cpu usage fixed by those patches?
14:28 tagr: karolherbst: ironically the one use-case that is now broken is the one use-case that used to work by accident
14:29 karolherbst: ehhh :/
14:29 tagr: I wonder if we can somehow detect that we're running in X with modesetting and take advantage of that situation to just not do detiled blits
14:51 AndrewR: imirkin, I think yes: after two days of uptime X only eats 4% (out of full 3.4Ghz cpu, but still much better than before). qemu 'bug' still around ....but profile points at mesa/libdrm ..
14:51 imirkin: AndrewR: qemu "bug" is just cpu usage *while qemu is running* right?
14:51 imirkin: AndrewR: and you don't see any errors about 100+ stuck events right?
14:52 imirkin: AndrewR: "Event handler iterated %d times" -- that one -- that's death if you hit it
14:58 AndrewR: imirkin, no, no such messages ... qemu sort of stabilized at eating 99% of cpu ...but initially it was only eating like 40% out of 1.4Ghz core ..now it eats 98% out of 3.4Ghz ...
15:02 bencoh: .59
15:02 bencoh: (woops, hey)
15:06 imirkin: AndrewR: ok. qemu sounds like it's hitting some slow copying paths
15:06 imirkin: or ... dunno
15:07 imirkin: the fact that so much cpu is going to libdrm_nouveau is odd.
15:08 imirkin: worth figuring out why, but i think unrelated
15:11 AndrewR: imirkin, maybe vram vs gtt fragmentation/growth? I'll retry with second GPU as dest device (it has 1Gb vs 384 in my primary card)
15:12 imirkin: that cpu usage would not appear in libdrm_nouveau
15:13 AndrewR: mmm...may be? I mean, slower memory will result in more waiting? (and fences about waiting on events, no?)
15:14 imirkin: most of that is not cpu waits
15:14 imirkin: i.e. not cpu usage.
15:25 tagr: karolherbst: ouch... now it looks like X from git is also broken, though I think that's because it's no longer trying to use modifiers at all
15:25 tagr: not sure what happened
15:26 tagr: karolherbst: do you want me to send out the MR anyway so that you can take a look? probably better to hold off on merging it until it's seen more testing anyway
15:34 karolherbst: you can mark it as "WIP: "
15:39 Lyude: ahaha
15:39 Lyude: so nvidia confirmed the maxclk caps stuff is actually quite wonky
15:40 Lyude: and apparently as well we probably can just drop the interlaced cap checks and assume it's not supported anywhere
15:40 imirkin: Lyude: that's not surprising
15:40 karolherbst: *for DP :p
15:40 Lyude: karolherbst: yeah :p
15:41 imirkin: i was a bit surprised that ben went ahead with all that cap stuff
15:41 imirkin: it always seemed like a bunch of lies to me, but who knows
15:41 karolherbst: oh well
15:41 Lyude: imirkin: supposedly the dp interlace was just put there by hw engineers because they thought they might be able to add it to hw someday, but never actually bothered to
15:42 imirkin: hehe
15:42 karolherbst: people there must have loved interlaced
15:42 karolherbst: I would just stop listening if somebody requests interlaced support in hw :p
15:42 Lyude: karolherbst: tbh-it does at least make sense if nvidia were ever to go the route of removing native hdmi encoders like intel did
15:42 Lyude: since you'd still want to support interlaced for adapters and such
15:42 karolherbst: right....
15:45 Lyude: anyway - it does seem like they do read -some- of it, andy said they were going to try to do a writeup of what the proprietary driver does
15:55 qeeg_: hey, i've been working on my nv3 emulation for 86box, and i'm trying to figure out if maybe my ramht hashing algorithm is bad? currently i have this return (handle ^ (handle >> 8) ^ (handle >> 16) ^ (handle >> 24) ^ chanid) & 0xff;
15:57 imirkin: qeeg_: https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nvkm/core/ramht.c#L26
15:57 imirkin: this is what we use in nouveau.
15:57 qeeg_: that's for nv4+
15:57 qeeg_: i need nv3
15:57 imirkin: i have no additional information.
15:58 imirkin: (the nv4+ thing appears to be different than yours though)
15:58 imirkin: e.g. the chanid is shifted up some bits.
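To make the difference imirkin points at concrete, here are the two hash shapes side by side. A minimal sketch: `nv3_guess_hash` is the expression qeeg_ pasted above, unchanged. `fold_hash` is only an approximation of the general shape of the routine in the linked ramht.c (fold the handle in `bits`-wide chunks, then XOR in a shifted channel id). The function names and the exact `bits - 4` shift are assumptions made for illustration and should be checked against the linked source.

```python
def nv3_guess_hash(handle: int, chanid: int) -> int:
    """qeeg_'s expression: XOR the four bytes of the handle with the channel id."""
    return (handle ^ (handle >> 8) ^ (handle >> 16) ^ (handle >> 24) ^ chanid) & 0xFF


def fold_hash(handle: int, chid: int, bits: int) -> int:
    """Fold a 32-bit handle into `bits`-wide chunks, then mix in the channel id.

    Assumption: the channel id is shifted up by (bits - 4) before the XOR.
    """
    if bits < 4:
        raise ValueError("bits must be at least 4 for the assumed channel shift")
    mask = (1 << bits) - 1
    handle &= 0xFFFFFFFF
    result = 0
    while handle:
        result ^= handle & mask
        handle >>= bits
    return result ^ (chid << (bits - 4))


# The channel id lands in the low bits in one version and higher up in the other.
print(hex(nv3_guess_hash(0x1050, 1)))
print(hex(fold_hash(0x1050, 1, 12)))
```

With `bits` as a free parameter, the second form is also easy to re-run for different table sizes.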
15:58 qeeg_: i'd think maybe the right info for nv3 would be in the riva_* files in here but i dunno https://github.com/freedesktop/xorg-xf86-video-nv/tree/master/src
15:59 imirkin: maybe.
16:00 qeeg_: no results when searching for hash or ramht in the entire repo though...
16:00 qeeg_: which is strange
16:02 imirkin: not sure whether -nv actually supported nv3
16:02 imirkin: you can check the in-kernel rivafb
16:02 imirkin: which is an fbdev driver which i *think* supported nv3
16:02 imirkin: but not 100% sure
16:03 imirkin: qeeg_: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/video/fbdev/riva?h=v5.9-rc1
16:03 imirkin: appears to support NV3.
16:09 qeeg_: it seems both of them only support using fixed tables for RAMHT
16:10 imirkin: they're derived from the same codebase
16:16 qeeg_: imirkin, what exactly is the order_base_2 function supposed to do, in layman's terms?
16:16 qeeg_: just a basic int log2 or?
16:17 imirkin: yeah, ilog2 +/- 1
16:18 imirkin: i.e. what is order_base_2(1)? 1 or 0. dunno.
16:30 qeeg_: imirkin, weird, just messing with the hash function changes what's sent to method 0
16:30 qeeg_: handles still aren't found for the majority of the method 0 writes though...
16:53 imirkin: qeeg_: might be easier to run linux + rivafb or something?
16:53 imirkin: i.e. easier to debug
16:53 imirkin: since you're in control of both sides
16:53 qeeg_: i dunno
16:57 qeeg_: interesting... it's writing to RAMHT shortly before sending these objects along
16:58 imirkin: i think that's normal
16:59 qeeg_: however it's writing to a completely different location than i'm calculating as hashes
16:59 qeeg_: like, 0x4610 and i'm calculating 0x10
17:00 imirkin: you have a & 0xff in there
17:00 imirkin: whereas we have a u32. dunno if that matters.
17:00 imirkin: or maybe you're picking the wrong offset
17:03 qeeg_: i'm not anding with 0xff anymore
17:03 qeeg_: now i'm using the exact same code as nouveau, just massaged slightly...
17:03 qeeg_: here's my current code :p https://hastebin.com/dodiyejoge.cpp
17:04 imirkin: that ramht_bits thing seems wrong
17:04 imirkin: 4096 should be 12 or 11
17:04 qeeg_: ah
17:05 imirkin: or 13
17:05 imirkin: heh
17:05 imirkin: er no. definitely not 13.
17:05 imirkin: probably 12
17:05 qeeg_: gimme like, a table of values i should be using, and then i'll try it :p
17:05 imirkin: just look at what order_base_2(4096) will return
17:05 qeeg_: i did lol
17:05 imirkin: heh
17:06 qeeg_: actually
17:06 qeeg_: it did order_base_2(4096 >> 3)
17:06 qeeg_: for... some reason :/
17:07 imirkin: oops
17:07 imirkin: we really should fix that order_base_2 thing -- i thought i had, but it must have reappeared
17:07 imirkin: that should just call ilog2 or ffs or whatever
17:08 qeeg_: well, anyway
17:08 qeeg_: still didn't fix it :c
17:09 imirkin: so it sounds like it should be 11 for 4096?
17:11 qeeg_: yeah that's not working either...
17:12 qeeg_: here's more of my code https://hastebin.com/enelixowop.cpp
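The back-and-forth about order_base_2 above can be checked numerically. A minimal sketch, assuming the helper rounds log2 up (ilog2(n - 1) + 1 for n > 1, otherwise 0), which is assumed here to match the Linux kernel helper; the Python function names simply mirror the kernel ones and are not an existing Python API.

```python
def ilog2(n: int) -> int:
    """Floor of log2(n) for n >= 1."""
    if n < 1:
        raise ValueError("ilog2 is only defined for n >= 1")
    return n.bit_length() - 1


def order_base_2(n: int) -> int:
    """Ceiling of log2(n); 0 for n <= 1 (assumed kernel-like behaviour)."""
    return ilog2(n - 1) + 1 if n > 1 else 0


for n in (1, 2, 3, 4, 5, 4096 >> 3, 4096):
    print(n, ilog2(n), order_base_2(n))
```

Under that assumption order_base_2(1) is 0, order_base_2(4096) is 12, and the order_base_2(4096 >> 3) that qeeg_ found evaluates to 9.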
17:30 qeeg_: any clues, imirkin?
17:30 imirkin: haven't been looking, it's not something i know in any detail
17:31 imirkin: if i were trying to work it out
17:31 imirkin: i'd take the inputs and expected hash values
17:31 imirkin: and then tweak the algo until it gave me that.
17:32 imirkin: the basics are right, but some detail must be slightly off.
17:33 imirkin: 0x4610 should never be an option -- that's a lot of bits.
17:33 imirkin: so perhaps you're looking at the value wrong
17:40 qeeg_: lessee here
17:41 qeeg_: handle 0x1050 channel 0 is giving me a hash of 0x290
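imirkin's suggestion (take inputs and expected values, then tweak the algorithm until it matches) can be mechanised as a small parameter search. A minimal sketch under stated assumptions: `fold_hash` is the same assumed chunk-folding hash as discussed earlier, and the search additionally assumes the reported number may be the hash multiplied by a table entry size (`stride`). Both the hash shape and the stride idea are guesses for illustration. The only data point used is the one qeeg_ quotes (handle 0x1050, channel 0, result 0x290), which is his own computed value, not a value observed from hardware.

```python
def fold_hash(handle: int, chid: int, bits: int) -> int:
    """Assumed hash: fold the handle in `bits`-wide chunks, XOR in shifted chid."""
    mask = (1 << bits) - 1
    handle &= 0xFFFFFFFF
    result = 0
    while handle:
        result ^= handle & mask
        handle >>= bits
    return result ^ (chid << (bits - 4))


def candidates(handle, chid, observed, bit_range=range(4, 17), strides=(1, 2, 4, 8, 16)):
    """Return every (bits, stride) pair whose hash * stride equals `observed`."""
    return [
        (bits, stride)
        for bits in bit_range
        for stride in strides
        if fold_hash(handle, chid, bits) * stride == observed
    ]


print(candidates(0x1050, 0, 0x290))
```

Within this search space only bits = 11 with a stride of 8 reproduces 0x290. Feeding in (handle, channel, offset) triples captured from the actual RAMHT writes would be the more useful test, since those are the values the hash has to match.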
21:16 pmoreau: imirkin: How does the hw know to look at s==2 rather than s==1 for example, inside the tic/tsc array (if I understand it right)? https://gitlab.freedesktop.org/mesa/mesa/-/blob/1ccd681109e80516430a3be489dca1be15316d50/src/gallium/drivers/nouveau/nv50/nv50_tex.c#L312
21:16 pmoreau: I looked at all the usages and definitions of the different arrays containing sampler/texture data, but I still can’t get the index change to work properly.
21:20 imirkin: pmoreau: hw doesn't know or care
21:20 imirkin: (let me check ... heh)
21:20 imirkin: oh that.
21:20 imirkin: right
21:20 imirkin: like i said - hw doesn't know or care
21:21 imirkin: HOWEVER
21:21 imirkin: the shader knows and cares.
21:21 imirkin: so this data has to line up with what the shader does
21:21 pmoreau: Okay, and where would that be specified in the shader? O:-)
21:22 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp#n682
21:22 pmoreau: I was wondering if it might be in the shader, but only had a quick look and failed at finding it.
21:22 imirkin: so yeah. pretty dumb code.
21:22 imirkin: basically nv50 doesn't have support for sample-indexed texture fetches. so we fake it.
21:22 imirkin: but to fake it, we need to know how many samples there are in a particular texture
21:22 pmoreau: Obviously, I looked at nv50_ir_from_tgsi.cpp and nv50_ir_emit_nv50.cpp, but I should have looked at the intermediate step. xD
21:23 imirkin: well. i added that code, so i should hopefully remember about it =]
21:23 pmoreau: ;-)
21:23 pmoreau: Awesome, thanks for the tip! Let me try that…
21:24 imirkin: that seems like _really_ dumb code btw
21:24 imirkin: i can't believe i wrote that
21:24 imirkin: why not just switch (type) { this: off += that, that: off += otherthat }
21:24 imirkin: whatever
21:24 imirkin: i was probably just excited that msaa textures were working
21:25 imirkin: and i didn't know jack shit about any of this stuff at the time
21:39 pmoreau: Sadly it looks like there is something else wrong (probably not related to MS, but just textures in general). I will have to leave it be for today though.
21:41 imirkin: probably something else relying on the order
21:41 imirkin: but in a slightly sneakier way
21:41 imirkin: pmoreau: oh right
21:41 imirkin: so
21:42 imirkin: BEGIN_NV04(push, NV50_3D(BIND_TIC(s)), 1);
21:42 imirkin: the board kinda cares about it
21:42 imirkin: and i guess it expects that particular order
21:42 pmoreau: :-/
21:42 imirkin: <reg32 offset="0x1448" name="BIND_TIC" length="3" stride="8"><!-- VP, GP, FP -->
21:43 imirkin: so ... yeah. s kinda has to be that
21:43 imirkin: you can either use that for all your ordering
21:43 imirkin: or you can remap it here
21:43 pmoreau: Well, I guess there is no fighting the hardware then. :-D
21:45 RSpliet: Hardware has a tendency of winning
21:45 pmoreau: How would I be able to remap it there? Or you mean changing BIND_TIC to do like `BIND_TIC(1) == 2 * offset` and `BIND_TIC(2) == 1 * offset`?
21:48 pmoreau: I think I’ll keep the current ordering for textures then, and instead change the ordering of constbufs to match.
21:49 imirkin: pmoreau: i just mean instead of passing in s
21:49 imirkin: pass in the remapped value of s to be the thing the hw wants
21:50 imirkin: i.e. instead of BIND_TIC(s) you do BIND_TIC(t)
21:50 imirkin: where t = the remapped value of s
21:50 imirkin: and obviously same deal with TSC
21:50 imirkin: however it may be simpler to just store it in the order the hw wants and not mess with it?
21:50 imirkin: and dump compute into [3]
21:51 imirkin: instead of doing all the messing which you're doing for imho little benefit
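The remapping imirkin describes (pass BIND_TIC(t) instead of BIND_TIC(s), with t the remapped value of s) can be sketched as a lookup. The base offset 0x1448, length 3, stride 8 and the VP, GP, FP order come from the reg32 line quoted above. The driver-side order (vertex, fragment, geometry) and all Python names are hypothetical, chosen only to show a remap that actually changes something.

```python
BIND_TIC_BASE = 0x1448   # from the quoted reg32 definition
BIND_TIC_STRIDE = 8
BIND_TIC_LENGTH = 3      # VP, GP, FP

# Slot order the hardware register array uses, per the quoted comment.
HW_SLOT = {"vertex": 0, "geometry": 1, "fragment": 2}

# Hypothetical driver-side stage order that differs from the hardware order.
DRIVER_STAGES = ["vertex", "fragment", "geometry"]


def bind_tic(slot: int) -> int:
    """Offset of BIND_TIC(slot) within the register array."""
    if not 0 <= slot < BIND_TIC_LENGTH:
        raise IndexError("BIND_TIC only has %d slots" % BIND_TIC_LENGTH)
    return BIND_TIC_BASE + slot * BIND_TIC_STRIDE


def remap(s: int) -> int:
    """Translate a driver-side stage index s into the hardware slot t."""
    return HW_SLOT[DRIVER_STAGES[s]]


for s, stage in enumerate(DRIVER_STAGES):
    t = remap(s)
    print(stage, s, t, hex(bind_tic(t)))
```

Storing everything in hardware order makes `remap` the identity, which is the simpler option imirkin recommends.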
21:52 pmoreau: I’ll just keep the ordering the hw expects
21:53 pmoreau: But I’ll still try to change it for constbufs (I don’t think the hw expects a particular order there?) for consistency as currently for textures/samplers 1->geo & 2->frag, but for constbufs 1->frag & 2->geo.
21:54 imirkin: not sure
21:56 pmoreau: I will see how loud the hardware complains tomorrow (or how much I do, depending on how it goes). :-D
22:18 imirkin: pmoreau: but ... what's the motivation to touch the ubo stuff?
22:18 imirkin: just leave it all alone, and use [3] for compute and be done with it