00:02aphysically: I appreciate the information; I'll pass it along to the relevant ffmpeg people
00:08zmike: mareko: hot off today's test branch https://pastebin.com/ubuT4AHA
00:08zmike: I'm coming for you
00:09bnieuwenhuizen: zmike: I like how it is faster with textures
00:09zmike: zink is just a hungry driver
00:09zmike: it can't be bothered with lesser workloads
00:11mareko: zmike: soon it'll be 50 million
00:11zmike: the one from mesa-demos is at 50 million
00:11zmike: I don't know why they have such different numbers
00:15zmike: based on the fact that a cursory glance shows them to be very similar, anyway
00:15mareko: it's glDrawArrays, which is always faster
00:17mareko: you can certainly catch up with draws only
00:17zmike: I'm actually not sure how much more I can do at this point tbh
00:18zmike: I'm more or less just looping over emitting draw packets and nothing else now
00:18bnieuwenhuizen: zmike: time to start GPU limited perf then? :)
00:18mareko: you just need to remove that if (!num_textures) usleep(1);
00:18zmike: I totally forgot that I had that
00:18zmike: bnieuwenhuizen: if I get some ideas there then I will, but otherwise I'm totally stumped atm
00:19mareko: vertex attribs are slower
00:19zmike: slower than what?
00:22mareko: how do you submit multi draws to radv?
00:22zmike: I wrote an extension
00:23zmike: it provides vkCmdDrawMultiMESA
00:23zmike: and an indexed version
00:31zmike: the !textures case is indeed confusing though because it's not a fluke
00:31zmike: that's just how it runs
00:56mareko: for back-to-back draws, you seem to be bound by the mesa frontend like us
00:57mareko: zmike: what's your CPU?
00:57zmike: mareko: i7 6700k
01:06mareko: I'm mostly done decreasing the frontend overhead for back-to-back draws by 50%, so it should be easy to hit 50 mil. draws/s without no_error for DrawElements
01:06jenatali: Out of curiosity, has anyone tried fuzzing Mesa's SPIR-V parser?
01:07zmike: mareko: nice, I'm looking forward to doing some profiling after I pull in your atomics reduction series since I've been seeing that stuff show up a lot
01:07zmike: though also the draw-merger takes up a kinda crazy amount of cpu time in the driver thread
01:08mareko: zmike: that's because it's merging 256 draws and giving you 1 multi draw, try them separately :)
01:08bnieuwenhuizen: jenatali: I think there were some people at Google who did some of it
01:08airlied: jenatali: there are some hits for spirv-fuzz
01:08airlied: in the issues tracker
01:08bnieuwenhuizen: even added some CTS cases :)
01:08zmike: mareko: I have :)
01:08jenatali: Ah cool
01:09zmike: mareko: the pre-tc times were/are rough for zink
01:10mareko: zmike: if we fold vertex buffers into multi draws, it might be beneficial to have a merger thread
01:10zmike: I've been considering doing a rewrite from vertex attribs -> ubos, but if you're looking at that kind of thing then I'll probably hold off
01:10mareko: ubos don't have formats
01:11zmike: yeah, it'd end up being kinda gross I imagine
01:11zmike: but the overhead from changing vertex buffers/bindings is pretty substantial
01:12zmike: not something on my near-term radar in any case since I haven't been cpu limited for a while
01:12zmike: need to figure out a gpu profiler that works without wsi support
01:14mareko: you need a better GPU
01:15mareko: (good luck buying one)
01:15zmike: not yet I don't, radeonsi is still pretty far ahead of me in unigine heaven
01:16zmike: a better gpu won't fix that
01:18mareko: both radeonsi and radv can create RGP traces, which can be compared side by side in our windows app
01:19zmike: they can, but only if your driver has wsi support
01:19zmike: which zink does not
01:19zmike:learned all about that this morning
01:21airlied: zmike: time to jump on the wsi train then :-P
01:21jenatali: Needs some Copper
01:22zmike: well once compute stuff lands then I can look at merging more of the barrier stuff needed for wsi
01:22zmike: but it seems that's still a long ways off
01:56bl4ckb0ne: there are a lot of false positive CI failures on waffle, mostly XOpenDisplay failing
01:56bl4ckb0ne: do you have an issue for that, xexaxo1?
04:35mareko: what's the maintenance cost of having one chip tested by CI?
05:10jekstrand: mareko: Aren't you set up on our Jenkins instance?
05:20anholt: mareko: hopefully we'll have some intel in public CI soon. !8162. I also sent them hardware so we can get i965 covered.
05:41jekstrand: I'm very curious as to why they're starting with APL. Those machines are sloooowwwww
05:42jekstrand: They do test fun corners, though.
05:43jekstrand: I guess they can be pretty cheap but I don't think an i3 costs that much more.
05:47mareko: jekstrand: yes, but I mean Gitlab
05:48mareko: what's the cost in terms of time/effort that has to be dedicated to the Gitlab CI if I have the hw?
05:54anholt: jekstrand: the point is coverage of chromebooks.
05:54anholt: which have fun corners as noted.
05:55anholt: mareko: gitlab ci maintenance is all in initial setup of your farm, and then work in de-flaking when you extend your test set. in between, it's really quiet. the up front setup sucks, though.
05:56mareko: anholt: do you have GPU reset and/or reboot when things hang?
05:56anholt: yes, we reboot on every job, which is part of why up front setup sucks so much.
05:56anholt: but not rebooting on every job means lots of ongoing maintenance.
05:58mareko: anholt: do you soft reboot or do you disconnect the power for some time?
05:58anholt: I've got a bunch of docs of my system at https://docs.mesa3d.org/ci/bare-metal.html,
05:59anholt: the db410c and db820cs are physically powered off with relays. the chezas are sort-of-hard powered off by talking to the EC over servo.
06:01anholt: the two cheza failures in the time that baremetal cheza's been around (since May) have been stable "this board doesn't power on at all any more", the EC's been great otherwise. (which makes sense, this is the same power control that Chrome uses in their pre- and post-merge CI on more boards and more commits than we do)
08:41MrCooper: mareko: might want to get in touch with tomeu about how they're handling the machines for the radeonsi-stoney-* jobs
09:11pepp: MrCooper, mareko: I believe the stoney ci machines are set up using lava
09:12cwabbott: jekstrand: so, I was chatting with Venemo about this shader-merging-in-nir thing, and one annoyingly sticky point is what to do when the different stages have different float-control modes, because apparently that's a thing you can do
09:17Kayden: so...trying to optimize...there would be parts of the shader that used one mode, and parts using another
09:18Kayden: that isn't a thing in SPIR-V, is it? different float controls per entrypoint or something?
09:18cwabbott: the two options we could think of, in increasing order of effort, are (1) add a nir intrinsic to change mode, and just short-circuit a bunch of optimizations like cse as soon as it's inserted via a shader_info flag; (2) add a float control mode thingy to nir_cf_node and nir_block (automatically inherited from its parent, so most places wouldn't have to do anything) and then hash that in cse, and reject
09:18cwabbott: opt_algebraic if it crosses modes
09:18cwabbott: Kayden: no, but different entrypoints per shader stage are a thing
09:19cwabbott: afaict there's no way to say all stages must use the same mode
09:19cwabbott: that would've been great, but no
09:20cwabbott: and this is about shader merging, where certain combinations like VS + TCS or VS + GS get merged into a single mega-shader
09:20cwabbott: right now, it's just done in LLVM/ACO and ACO has apparently gone with (2) already
09:22Kayden: that's kind of nice in that you can do local optimizations assuming that it doesn't change, but it's nicely defined
09:28dschuermann: it even does CSE if the former float control mode is a superset of the latter
09:28dschuermann: I doubt it ever happens, though
09:29cwabbott: the thing is, having different incompatible rounding modes/denorm modes is probably gonna be really really uncommon
09:31cwabbott: but at the same time, having us just bail on everything or having an assert(!shader_info->different_rounding_modes) at the top of almost every optimization is ugly too
09:31xexaxo1: bl4ckb0ne: they used to be 1/10 but have become more regular now. no issue opened yet...
09:32xexaxo1: bl4ckb0ne: longer timeout might help but it's just a crutch :-\
09:32Venemo: I think this is such a rare edge case that it's perfectly okay to just say that it's illegal to use any nir_opt_* which cares about the float controls, when different modes are used in the shaders
09:33dschuermann: cwabbott: one version would be to set a noopt flag on these shaders
09:33cwabbott: dschuermann: I think that's just (1)
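A minimal sketch of option (2) as described above, assuming the float-control mode is carried on each instruction via its enclosing block and is hashed into the CSE key. The `FloatMode` and `Instr` classes and the `cse` function are hypothetical stand-ins for illustration, not NIR or ACO API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatMode:
    """Hypothetical float-control mode attached to a block."""
    flush_denorms: bool
    round_to_zero: bool


@dataclass(frozen=True)
class Instr:
    """Hypothetical ALU instruction; srcs are names of SSA values."""
    op: str
    srcs: tuple
    mode: FloatMode  # inherited from the enclosing block


def cse(instrs):
    """Map each redundant instruction index to the earlier equivalent one.

    The mode is part of the key, so identical ops under different
    float-control modes are never merged.
    """
    seen = {}
    replaced = {}
    for index, instr in enumerate(instrs):
        key = (instr.op, instr.srcs, instr.mode)
        if key in seen:
            replaced[index] = seen[key]
        else:
            seen[key] = index
    return replaced


vs_mode = FloatMode(flush_denorms=True, round_to_zero=False)
gs_mode = FloatMode(flush_denorms=False, round_to_zero=False)
program = [
    Instr("fadd", ("a", "b"), vs_mode),
    Instr("fadd", ("a", "b"), vs_mode),  # same mode: can reuse index 0
    Instr("fadd", ("a", "b"), gs_mode),  # different mode: kept
]
result = cse(program)
```

Requiring an exact mode match is the conservative choice here; it does not attempt the superset case dschuermann mentions.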
09:34tomeu: mareko: yeah, happy to answer any questions regarding our LAVA-based setup
14:01zmike: anholt: are you now okay with that MR I foolished on?
14:59Venemo: anholt: what's up with that TTN UBO bug?
15:09jekstrand: cwabbott: :-/
15:09jekstrand: cwabbott: You could also not merge in that case but maybe that's horrible?
15:10cwabbott: jekstrand: you can't not merge - it's just how the hardware works
15:10cwabbott: there literally aren't separate stages
15:16jekstrand: cwabbott: The way we handle this in our back-end is a set of "change the mode" intrinsics
15:17jekstrand: so roughly 1
15:17Venemo: jekstrand: there is no way around it. we currently "merge" in the backend, but we think there is benefit in doing it in NIR instead
15:17jekstrand: I really don't like the idea of ALU having dependence on intrinsics. That breaks a *lot* of assumptions.
15:18cwabbott: jekstrand: I think the idea would be to just turn off most optimizations if that intrinsic is ever inserted
15:18Venemo: jekstrand: how does your "change the mode" intrinsic work?
15:18cwabbott: if we want to have any chance of optimizing anything, we'd want it part of the block instead
15:18jekstrand: cwabbott: Yeah, but it's hard to know what opts to turn off. :-(
15:18Venemo: jekstrand: we'd be ok to turn off every opt in this case, I think.
15:19cwabbott: jekstrand: you just emit the "change the mode" thing at the beginning of the shader in the backend, right?
15:21cwabbott: or at least you emit in such a way that things are consistent
15:22jekstrand: cwabbott: We have to use the "change the mode" thing for every time a specific mode is used, even on conversions.
15:22jekstrand: So there's a global mode and then there are certain conversion ops with a specific mode.
15:22jekstrand: We implement said conversions by "change the mode; convert; change back if needed"
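The "change the mode; convert; change back if needed" scheme can be sketched roughly as follows, assuming a single global mode plus per-op overrides. The function name, the tuple encoding of instructions, and the mode strings are all hypothetical and are not taken from the Intel back-end.

```python
def lower_modal_ops(ops, global_mode):
    """Flatten ops into a list, inserting mode switches only when needed.

    ops is a list of (name, required_mode) pairs; required_mode is None
    for ops that just use the global mode.
    """
    out = []
    current = global_mode
    for name, required in ops:
        wanted = required if required is not None else global_mode
        if wanted != current:
            # Switch lazily: nothing is emitted if the mode already matches.
            out.append(("set_mode", wanted))
            current = wanted
        out.append((name,))
    return out


lowered = lower_modal_ops(
    [("fadd", None), ("f2f16", "rtz"), ("f2f16", "rtz"), ("fmul", None)],
    global_mode="rte",
)
```

Two back-to-back conversions with the same mode share one switch, and the switch back to the global mode is only emitted because a later op needs it.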
15:22Venemo: jekstrand: is this a NIR intrinsic or is it part of the backend?
15:23cwabbott: jekstrand: ok... on AMD conversions have their own instructions iirc
15:23cwabbott: so we normally only have one global mode
15:23cwabbott: except for this awkward merged-shader-with-different-modes thing where we have to change the mode
15:24cwabbott: and as Venemo said, we currently merge the shaders in the backend but are considering doing it in NIR
15:25Venemo: AFAICT this feature is very rarely used, so this is probably something that never happens, but it's in the spec, so it's gotta work. but we're okay having it less optimal than the sane case
15:29jekstrand: Venemo: It's a back-end intrinsic
15:29Venemo: I see. so it's less of a headache for you
15:29jekstrand: and our back-end is sufficiently dumb that handling a modal thing is easy.
15:34jekstrand: I really don't like the "change modes" thing in NIR unless we're really careful with validating that it can only be in certain very limited places.
15:35Venemo: jekstrand: how do you feel about the alternative, such as specifying the mode in nir_block, or nir_cf_node?
15:35jekstrand: I'm not sure I like that either. I'm trying to decide which I like less. :-P
15:37Venemo: you can also suggest other alternatives
15:38jekstrand: I don't have any off-hand
15:39jekstrand: Well, I had one thought but it's worse. :-P
15:39Venemo: what was it?
15:48jekstrand: Adding a new CF type that's basically just a container and require it to go on that.
15:48jekstrand: But adding a new CF type is painful
15:57jenatali: You could make it a function property, and prevent inlining between functions with different properties until it gets to the backend?
15:57jekstrand: I doubt their back-end supports functions.
15:57jekstrand: My container CF node idea is basically that but without using "real" functions.
15:57jekstrand: I still don't know that it's a good idea, though.
16:05cwabbott: yeah, we don't support functions at all
16:06cwabbott: I think putting it in a container CF node would give you more-or-less the same level of pain as putting the mode in each block/cf node
16:23jekstrand: cwabbott: Possibly
16:25jekstrand: cwabbott: On a different note, since you hack on turnip a lot, could you take a look at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676
16:25cwabbott: jekstrand: full disclosure, I have approximately 0 idea how Vulkan dispatch works
16:26cwabbott: I've had the good fortune to not have needed to look at it :)
16:32jekstrand: cwabbott: This is why you shouldn't be carrying your own copy of ANV's dispatch code. :-P
16:33jekstrand: cwabbott: If you're not the right person, who is?
16:36cwabbott: jekstrand: my git blame says tu_entrypoints_gen.py was last touched by bnieuwenhuizen at the beginning of time (i.e. when it was cargo culted from radv, probably)
16:36Venemo: jekstrand: so, did we convince you of either solution?
16:37cwabbott: I'm not sure if anyone currently working on turnip actually knows how that stuff works
16:37jekstrand: cwabbott: Well, it should be pretty easy to look at the ANV patches and write similar ones.
16:37jekstrand: cwabbott: Or I can code it blind and you can test. :P
16:38cwabbott: jekstrand: one annoying thing is, we haven't implemented KHR_copy_commands2
16:38jekstrand: cwabbott: That's not a requirement
16:38cwabbott: can you opt-in to vk_common_* functions?
16:39cwabbott: if not then it would be a problem since iirc you deleted the non-KHR_copy_commands2 copy/blit entrypoints
16:41jekstrand: cwabbott: Yeah, all the common stuff is opt-in
16:42jekstrand: cwabbott: You opt in by just deleting your entrypoint
16:42cwabbott: ah, ok
16:42jekstrand: cwabbott: It always prefers driver entrypoints.
16:53jekstrand: cwabbott: What it means is that when you go to do VK_KHR_copy_commands2, you can just delete the old ones instead of trying to keep both working.
17:04idr: Sort of like a layered implementation without the layer.
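A rough sketch of the opt-in behaviour described above: common entrypoints fill the table first and driver entrypoints override them, so deleting a driver entrypoint is what opts in to the common one. It assumes, following jekstrand's remark about VK_KHR_copy_commands2, that the common fallback for the old copy entrypoint is written in terms of the newer one. The Python functions and table layout are hypothetical; only the Vulkan command names are real.

```python
def make_common_entrypoints(table):
    """Common fallbacks, written against whatever ends up in the table."""
    def common_cmd_copy_buffer(src, dst):
        # Assumed: the old entrypoint forwards to the copy_commands2 one.
        return table["vkCmdCopyBuffer2KHR"](src, dst)

    return {"vkCmdCopyBuffer": common_cmd_copy_buffer}


def build_dispatch(driver_entrypoints):
    """Build a dispatch table that always prefers driver entrypoints."""
    table = {}
    table.update(make_common_entrypoints(table))
    table.update(driver_entrypoints)  # driver entries win over common ones
    return table


def driver_copy2(src, dst):
    return ("driver copy2", src, dst)


def driver_copy(src, dst):
    return ("driver copy", src, dst)


# Driver that deleted its old entrypoint: the common fallback is used.
opted_in = build_dispatch({"vkCmdCopyBuffer2KHR": driver_copy2})
# Driver that kept its old entrypoint: it is called directly.
kept_both = build_dispatch(
    {"vkCmdCopyBuffer": driver_copy, "vkCmdCopyBuffer2KHR": driver_copy2}
)
```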
17:59jekstrand: I'm getting ir3 CI failures on a branch that only touches intel
17:59jekstrand: robclark: ^^
18:00jekstrand: Maybe it's ok? It's just RA fails.
18:00robclark: there are still a couple shader-db shaders that RA fail.. that is normal, shouldn't cause CI job to fail
18:01jekstrand: I've not watched a CI job run in a while, I guess.
18:02jekstrand: I guess we do lots of shader-db runs
18:04jekstrand: Looks like my CI failure was a flake
18:05anholt: also apologies to anyone that was just running a job on freedrenos. I had to restart gitlab-runner to reconfigure for the new http cache.
18:07jekstrand: anholt: Looks like maybe some unit tests timed out
18:07jekstrand: anholt: Our assembler tests seem to be really close to the timeout
18:09jekstrand: I don't know why. The Haswell assembler tests on my machine run in 0.24s
18:09jekstrand: and 30s in CI
18:10jekstrand: I bet it's slow disk or something
18:19Venemo: cwabbott: speaking of the shader merging, what would be the most painless way to pass some I/O through temporaries instead of shared memory? What I have in mind is create a variable for them and store to that, then load from that, and at the end call nir_lower_vars_to_ssa - does this make sense or is there a better way?
19:06anholt: jekstrand: sorry, back now. so, we just banned the glcpp tests in CI because of intermittent timeouts in them. strong suspicion now that those tests were innocent and we've just moved the intermittent fails to another set of tests
19:16jekstrand: anholt: :-/
19:16jekstrand: anholt: FWIW, the glcpp tests also use lots of disk
19:17jekstrand: Well, open lots of files anyway
19:19airlied: must be a big latency in there somewhere
19:19imirkin: at least explains why no one could repro the issues locally
19:20anholt: jekstrand: presumably i965_asm doesn't use lots of disk, though
19:20anholt: or fork a ton of subprocesses
19:20jekstrand: anholt: It's a few dozen files
19:20anholt: (which were notable features of glcpp)
19:20jekstrand: I don't know if it forks
19:21anholt: it's just way too suspicious to me that we banned glcpp and we immediately got similar fails in a brand new set of tests.
19:23jekstrand: cwabbott: I'm blind coding you a turnip change. :-)
19:23anholt: the unit tests are cheap, perhaps we should disable parallel on them for a while and see if we get some clues.
19:34mdnavare: hwentlan: Could you check with Nicholas on what game application or any test application did he use for a full stack VRR validation with Gnome Desktop?
19:38hwentlan: mdnavare: he wrote his own. that said, i've used this tool before which seems to work well: https://github.com/Nixola/VRRTest
19:39hwentlan: i have not checked which games work well with VRR on Linux. any game should be fine. we've used unigine heaven and superposition for testing as well
19:41Venemo: anholt: ping again
19:50anholt: Venemo: dumped some thoughts in the bug
20:03Venemo: anholt: thx. do you know what causes this? also, if TTN can't know the correct range I'd be okay if it just set 0, ~0
20:06anholt: Venemo: it can know the correct range, you've got it on the decl
20:06anholt: but why are we emitting a load_ubo on const file 0? that looks like the "make a load_uniform" path
20:10Venemo: AFAIK the same function can emit both
20:11Venemo: AFAIU if it's an array it emits a load_ubo otherwise a load_uniform
20:11anholt: given that everyone (basically) is going to lower to ubos anyway after receiving some nir, all that matters is that the decl code and the load code make the same decision as to which path ttn does.
20:15Venemo: As far as I can tell, they do
20:15Venemo: Don't they?
20:15anholt: decl_var uniform INTERP_MODE_NONE vec4 uniform_0 (0, 0, 0) <-- decl took the make a uniform path
20:16Venemo: oh, sigh
20:16anholt: ok, but i'm confused how TTN didn't fail validation
20:16anholt: when it first made it
20:17anholt: this is why I suggested NIR_PRINT
20:17Venemo: It also declares a UBO
20:18Venemo: at least there is one in the attached file
20:18Venemo: So I assume that is what it's trying to access
20:18anholt: ah. the nir validation that failed is after lower_uniforms_to_ubo. so, presumably, we actually did the load_uniforms path and it's only after lowering to ubo that things broke
20:18anholt: and hey look, in the load_uniform path, it uses c->ubo_sizes to set the range.
20:19anholt: so the load_uniform range has been busted the whole time, and we just started noticing because I added validation for range_base.
20:19Venemo: I don't think so
20:19Venemo: ttn_emit_declaration is what sets the ubo size
20:19anholt: yes. and it skips it in the load_uniforms path.
20:20Venemo: what do you mean?
20:21anholt: ttn is correctly doing decl uniforms and load uniforms. but the problem is (and this would have been obvious from the get-go with NIR_PRINT=1) the load_uniform range is bogus because we don't set up ubo_sizes for the uniforms path.
20:21Venemo: Ah, right
20:21anholt: once we lower to ubos, the bogus range becomes a bogus range_base, and then it gets validated, and boom.
20:22Venemo: Okay, I think I see what the problem is
20:22Venemo: It mixes up uniforms with the 1st ubo
20:24anholt: so probably just use num_uniforms * 16 instead of ubo_sizes
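The arithmetic behind that suggestion, as a small sketch: each uniform is treated as one vec4 of 32-bit components, so the range in bytes is `num_uniforms * 16`. The `load_fits_range` check below is a hypothetical simplification of a bounds check, not the actual nir_validate logic.

```python
VEC4_BYTES = 16  # 4 components * 4 bytes each


def uniform_range_bytes(num_uniforms):
    """Byte range covered by num_uniforms vec4 uniforms."""
    return num_uniforms * VEC4_BYTES


def load_fits_range(offset, size, range_base, range_bytes):
    """True if [offset, offset + size) lies inside the declared range."""
    return range_base <= offset and offset + size <= range_base + range_bytes


# Three vec4 uniforms; load the last one (byte offset 32).
good_range = uniform_range_bytes(3)
ok = load_fits_range(32, VEC4_BYTES, 0, good_range)
# A range of 0, as if the size had never been filled in, rejects the same load.
bogus = load_fits_range(32, VEC4_BYTES, 0, 0)
```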
21:09Venemo: OK, I can try that
21:27mdnavare: hwentlan: Cool, has he open sourced the application he wrote? Is it on github? Yes, we are currently using VRRTest as well. Do you happen to remember whether the Vsync option in it refers to Freesync, and whether it should be enabled?
21:31mdnavare: hwentlan: So on unigine, you can enable a scene with variable refresh rates?
21:33hwentlan: mdnavare: no he hasn't open sourced it.
21:34hwentlan: AMD implementation (in DDX and Mesa) will automatically set vrr_enabled if the content is a full screen graphics buffer
21:35mdnavare: hwentlan: Yes, that's how it is for the Intel implementation as well in DDX and Mesa
21:36mdnavare: hwentlan: I meant on the Nixola VRRTest application, there is a Vsync toggle switch, so for testing VRR on the AMD or Intel stack, should that Vsync switch be set to true or false?
21:37mdnavare: hwentlan: I mean if it's false, does it supersede the VRR enable prop and set that to false?
21:42airlied: jekstrand: I'd probably suggest bnieuwenhuizen also as the best reviewer for the joint scripts :-P
21:44bnieuwenhuizen: yeah, it is on my list
21:45hwentlan: Vsync and vrr are not mutually exclusive. If vsync is false and the flip comes during active it will take effect immediately, causing tearing (by design)
21:46hwentlan: If vsync is true we will never flip mid-frame
21:46hwentlan: I suggest testing with vsync=true
21:46jekstrand: airlied: I've converted all but RADV at this point
21:47jekstrand: airlied: Need to run CI to make sure they didn't explode, though. :)
21:47airlied: jekstrand: well I did radv in a branch
21:47airlied: feel free to pull it in
21:47jekstrand: airlied: Yeah, and I can pull that in if you want.
21:47jekstrand: Or it can get rebased on top
21:47jekstrand: I've churned a bit so it'll need a rebase
21:47jekstrand: Not too bad though.
21:47jekstrand: And I'm still churning so don't rebase just yet
21:49jekstrand: It's got even more magic. :)
21:50airlied: yay magic
21:52jekstrand: So far, every driver except ANV and RADV has gotten GetInstanceProcAddr wrong. :-(
21:52bnieuwenhuizen: wait what
21:52bnieuwenhuizen: I thought we copied the freedreno one from RADV
21:52jekstrand: freedreno may have been ok but I think it bitrotted
21:52bnieuwenhuizen: or did we just happen to fix GetInstanceProcAddr in ANV/RADV?
21:52jekstrand: In any case, they're all going to be correct after this. :-)
21:53vsyrjala: hwentlan: if we want to allow async + vrr i guess someone should define whether an async flip causes the vblank to terminate early or not
21:55jekstrand:pulls in the RADV patches because WTH not
21:55mdnavare: hwentlan: So the vsync that they refer to in the VRRTest app is Async basically?
21:56mdnavare: hwentlan: Well I mean setting Vsync = false will allow Async flips, so if we set Vsync = true then it will be synchronous flips?
21:56mdnavare: vsyrjala: so in our testing we want Vsync = true, right, so it's sync flips with VRR
22:06jekstrand: airlied: Did you forget to add a file in that branch?
22:08jekstrand: airlied: Never mind, I'm fixing it.
22:14jekstrand: bnieuwenhuizen: Since when did RADV start building on Windows?
22:15bnieuwenhuizen: jekstrand: trying to get https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6162 in by bits and pieces
22:15jekstrand: bnieuwenhuizen: Fun
22:15emersion:still doesn't understand the point of this
22:18bnieuwenhuizen: shader compilation maybe? maybe onset to something more private for consoles? who knows?
22:21jekstrand: airlied: Ok, I've got your radv patches on the branch
22:32jenatali: Oh wow, that MR is pretty small now, cool
23:03bnieuwenhuizen: jekstrand: finished churning the dispatch MR?
23:07jekstrand: bnieuwenhuizen: Just about
23:07jekstrand: bnieuwenhuizen: Two patches away
23:13agd5f: is there a replacement for fbcon which would allow us to runtime pm the GPU? Right now, our pm ref count never drops to 0 because the fbdev is open for the console
23:15airlied: agd5f: does it not drop when the screen blanks, or does the crtc not get turned off?
23:15agd5f: crtc gets turned off, but I think fbdev can access fb whenever it wants
23:15agd5f: so we need to keep it up for that
23:15airlied: the fb should get evicted
23:16agd5f: I think fbdev provides a physical addr
23:16airlied: though if there are userspace mmaps that is troublesome, but there shouldn't be any of those
23:16airlied: as in it should be possible to know if userspace has ever mmaped the fbdev
23:20agd5f: doesn't fb_info->fix.mem_start have to be available all the time?
23:25airlied: agd5f: only used if userspace does an mmap
23:25airlied: I don't think fbcon relies on it
23:28airlied: like I expect there is some work to deal with userspace doing an mmap, but it shouldn't be too hard to get it evictable and wake things up if fbdev wants in