00:33mareko: tarceri: no, but I think adding some number to GL_ACTIVE_UNIFORMS might be enough
00:52mareko: tarceri: I'm busy with other stuff
01:53imirkin: airlied: you could expose GL 4.6 without exposing the ext. i.e. put the no-op functionality in, and just don't show the string.
01:53imirkin: (for aniso)
02:17airlied: imirkin: nah cts tests for it
02:18airlied: would rather not expose GL versions that aren't at least trying to pass CTS :-P
02:21imirkin: airlied: what does it test for?
02:27airlied: imirkin: not really sure, I think it counts the number of color transitions
02:28Rush: imirkin: no expect by any means but exposing no-op extensions doesn't let software deal with the lack of extension ...
02:28Rush: *I meant "no expert by any means"
02:29imirkin: Rush: nothing for software to do
02:29imirkin: it just improves image quality
02:30Rush: imirkin: yes and I'm building digital painting software so quality is important :-P
02:32imirkin: right, but if the driver doesn't do it, then it doesn't do it
02:32imirkin: nothing software can do about it
02:32imirkin: there's no "workaround"
02:32imirkin: (other than to not use 3d accel at all)
02:33imirkin: and what aniso does isn't even defined anywhere
02:35airlied: yeah hence why implementing is messy, and I would assume a sw impl would be horrible performing
02:40airlied: maybe gl4.6 will lure me in at some point, gotta get vulkan 1.0 done first
02:40airlied: them lines won't draw themselves correctly
02:41Rush: airlied: so I think the issue with "leaks" I had was a single long-running opengl context that was never explicitly destroyed. Will make sure to have short-lived contexts to have the texture memory cleaned up on a regular basis. Now valgrind only sees stuff such as "_mesa_symbol_table_add_symbol (symbol_table.c:197)" which I assume is some one-time allocation for the shader parser or smth like that
02:42airlied: Rush: even a long lived context should kill textures memory
02:42airlied: unless the app still has a handle to it
02:44Rush: airlied: yeah .. the app is Node.JS ... and WebGL ("headless-gl") ... maybe it is an application / library issue and the explicit destroy simply forces things to wind down.
02:47airlied: Rush: yeah killing the context will kill all resources associated with it
02:48airlied: yeah sounds like it might be a bug in those levels, around not deleting texture resources
04:21ccr: I assume post-suspend/resume cycle graphical weirdness is a kernel issue and should be reported there? (ancient NVidia GeForce Go chip + Nouveau in this case)
06:09hanetzer: figure this may be a good place to ask, but is there any kind of public bugtracker for amd gpus? (not specifically in linux)
06:11airlied: hanetzer: no non-linux one
06:11hanetzer: blep. well that sucks.
06:11airlied: not sure I've seen any hw vendor ever have a public bugtracker
06:11airlied: it would be pretty unmanageable
06:11hanetzer: yeah. but then again, intel and amd are the only major gpu vendors playing ball with linux :)
06:12hanetzer: so one would hope.
06:12airlied: for linux you can file an issue in gitlab
06:12hanetzer: yeah, already did for an issue that's relatively linux specific, but this is purely a hw issue :)
06:14airlied: hanetzer: is that the overheating thing?
06:14airlied: if you thought it used to work, just boot an older distro from a livecd
06:14airlied: or install an older kernel
06:14airlied: and see, if you can't find a old stable point, it's likely the hw is actually broken
06:15hanetzer: no; pre-os (post/bios/boot menu/etc) I have no output over dp, so its purely hardware & firmware
06:15airlied: did it used to work pre-os?
06:16hanetzer: no, not with this gpu at least.
06:16hanetzer: current gpu is asrock rx 5700 xt taichi, prior was a rx480 and rx580
06:38ccr: hanetzer, perhaps a silly question, but have you checked that your CMOS battery is good?
06:39hanetzer: its keeping settings (again, I'm able to do stuff over hdmi) and its fairly close to brand new
06:39ccr: okay. just asked because I'm on Intel, but my ASUS uefi bios has a funny "feature" that if the CMOS battery is low/bad, it defaults to VGA output and does not initialize HDMI at all .. so only when the OS initializes gfx does a picture appear on the HDMI output.
06:40hanetzer: I could replace it, just on the off chance
06:40ccr: probably a long shot
06:41airlied: hanetzer: the bios not working might be related though
06:41airlied: if it works over hdmi
06:42airlied: " It didn't happen until several weeks ago, " pretty much means try a kernel that used to work from several weeks ago
06:42airlied: or an old live iso
06:42airlied: and see if it still works
06:43hanetzer: no, "It didn't happen until I put in this gpu". Putting in the old one, works fine.
06:43airlied: ah okay that isn't clear from the issue
06:44hanetzer: could have sworn I said that. Sometimes I mix up whats in my mind and what I type/say :)
08:01tjaalton: looking at mesa CI it seems that i965/iris/anv isn't tested at all?
08:13glennk: hanetzer, what motherboard is it?
08:15hanetzer: uh, Asus ROG CROSSHAIR VII HERO (WI-FI)
08:15hanetzer: god I hate 'gaming motherboard' names
08:17hanetzer: also modern phone names suck too. 'OnePlus 7t Pro 5g McLaren' wtf is all this.
08:19glennk: heh, have the same one with a non-xt 5700
08:19hanetzer: what specific gpu do you have?
08:20glennk: a gigabyte reference one, not sure what the model name is
08:20hanetzer: would you mind dumping its vbios? :P
08:22hanetzer: https://andrealmeid.com/post/2020-05-01-vbios2/ << if ya don't. Interested in picking it apart and seeing what happens. I assume dp-out works for you pre-os?
08:26glennk: https://www.techpowerup.com/vgabios/215273/gigabyte-rx5700-8192-190616 should be equivalent to the one i have
08:26glennk: what mb bios version do you have?
08:27hanetzer: true, but its also possible those guys took some other technique. latest from asrock
08:27hanetzer: erm. asus
08:27hanetzer: ROG-CROSSHAIR-VII-HERO-WIFI-ASUS-3103.CAP << this file
08:28glennk: i haven't tried that version, have something a bit older where pcie 4 is still enabled on it
08:28hanetzer: hmm... that may have something to do with it...
08:28hanetzer: what cpu is in it?
08:28mceier: hanetzer: did you try changing dp standard in the monitor (if you can) to e.g. DP 1.1 ? I have 5700 XT and setting monitor to use DP 1.2↑ causes the same issue during boot (monitor doesn't turn on)
08:29hanetzer: mceier: not an option. have custom monitors, zisworks 4k monitors
08:33hanetzer: hmm. according to the bios 'changelog' it only disables gen4 on ryzen 3xxx
08:36glennk: have you tried disabling fast boot and/or clearing cmos?
08:36hanetzer: fastboot is off, cmos clear no change.
08:39hanetzer: I'ma try a downgrade to before disabling of pcie4
08:43glennk: if you have a pre-3xxx cpu pcie4 isn't a thing
08:46withnomad: now let's talk about cuda and shader and constant root signatures , the last is the first real attempt. But cuda would duplicate the entries in the queues for array based copies of the same instructions. It can accelerate by default small snippets of straightline code. signatures are bit more interesting, as they can preload different instructions into queues and consume them even with using a single iteration (do not need multiple copies to be large)
08:46withnomad: Now the implementation of it is unknown can be multiple versions, but they do not allow branches inside of such code. Which is what I am fixing up.
08:55withnomad: it is fairly understood, that such behavior of the code comes when dispatcher is down and nops are forwarded to the circuit hence, and this really is seeming to be the case when RST signal after halting is captured and made hence unfunctional of flushing the internal queues.
09:00withnomad: i think coherency domains are in core of vulkan already, and inherent thing of SIMD archs. So texture cache can be invalidated to the instruction cache to make self-modifying code to work right! Second option is to use no page tables and write combined memory.
09:03withnomad: the more complex version needing register allocation hacks requires round about 100words in i-cache to be reused and works without self-modifying code.
09:48daniels: tjaalton: Intel's CI system is completely separate from the upstream one
10:25AndrewR: airlied, hm, I compiled git mesa (git-636f770233) but can't see how LP_CL=1 works (clinfo still sees no platforms) . I tried to set MESA_LOADER_DRIVER_OVERRIDE=swrast GALLIUM_DRIVER=llvmpipe LP_CL=1 before clinfo - "No devices found in platform"
10:31MrCooper: AndrewR: did you build Mesa with gallium-opencl & opencl-spirv enabled?
10:32AndrewR: MrCooper, meson ../ --prefix=/usr/X11R7 --strip --buildtype debugoptimized -Degl=true -Ddri-drivers=r100,r200,i965,nouveau -Dplatforms=drm,x11 -Dgallium-drivers=i915,r600,radeonsi,swrast,virgl,nouveau,r300,iris -Dvulkan-drivers=amd,intel -Dgallium-nine=true -Dgallium-opencl=icd -Dgallium-va=true -Dgallium-xvmc=true -Dgallium-xa=false -Dopencl-spirv=true - I think yes ?
10:34withnomad: Actually ancient times there was right away such feature as display lists. This enforced the coherency protocol right away. Compatibility profile needs to enable profiles down to ogl 1.0. I would choose the self-modifying code and hence the codepath which packs the instruction stream too.
11:03airlied: AndrewR: probably missing the libclc spirv library
11:04airlied: since you have to build libclc from git
11:06AndrewR: airlied, yeah ...still have old libclc (will try to rebuild it ...but may be not right now). Thanks!
11:44karolherbst: airlied: we will get the required one with llvm-12, no?
11:57AndrewR: airlied, strange SPIRV-LLVM-Translator-llvm_release_100 (for llvm 10) does not install llvm-spirv ..manually copied it to /usr/bin, now libclc found it and trying to build itself ...
12:19AndrewR: airlied, it works: Device Name llvmpipe (LLVM 10.0.0, 256 bits) \o/
12:31AndrewR: airlied, luxcoreui (2.4alpha from may 2020) seems to work, too :}
12:52Akien: Hi there. Is there a dedicated channel to talk about zink, or would it be here?
12:59withnomad: Last time i looked at zink (though i do not know too much about vulkan) it lacked so many features, that it was not very appealing. As getting vulkan enabled cpu rasterizer you might be better off with gfx-rs and the gfx-compatibility layer and run it off ontop of llvmpipe maybe.
13:00karolherbst: withnomad: what you say makes no sense
13:01withnomad: karolherbst: no sense can be made for a guy who can not think.
13:01withnomad: gfx-rs has ICD compatibility and opengl backend which could probably run ontop of llvmpipe
13:02karolherbst: besides the point, and please stop insulting people
13:08withnomad: https://www.d.umn.edu/~gshute/logic/alu.xhtml all the gates in ieee1164 make use of multiplexers, which reset the state of alus even on bubbles, if one wants to get closer look into how multiplexers work, there is a patent. https://patents.google.com/patent/US6675182
13:13withnomad: if you would not rotate the stuff in mux and wraparound accordingly, it would damage the logic of course, cause according to kirchhoff's law it would possibly drive too big currents to the logic.
13:15daniels: Akien: yep, #zink is it :)
13:15kisak: hello withnomad, looking at the backscroll and stepping back for a moment, what are you trying to achieve at the moment? It appears you're expressing ideas for the sake of doing so, but there's no context or issue you're trying work out in doing so.
13:16withnomad: you might be thinking that perhaps i was wrong, since there is a parallel mux, but parallel case mux does not mean it is parallel.
13:16withnomad: begin is parallel context, but if you embed different cases in different begin blocks it won't be a mux anymore
13:16withnomad: all the items of case inside the begin are executed sequentially still.
13:17withnomad: hence every bubble in the circuit will take as much energy as real work.
13:18withnomad: it is analogous to mechanical parts even like braking inside the electrical car, to use brakes in the car, it requires energy, and this can be directed to the capacitors like supercapacitors developed for that purpose.
13:18karolherbst: kisak: that person is well known, I just wasn't aware who it was...
13:27linkmauve: Hi Intel people, what could be the cause of this kind of artifact on a gen7? https://pix.mathieui.net/o/yCDqS.jpg
13:27withnomad: this is fairly long time ago, when the solution was offered, the easy version is quite simple, some lines in the driver and some things preloaded to cache and it will fill the bubbles subsitutions like doing real work instead of blocking.
13:28linkmauve: The horizontal lines blink, and the “orientation” changes based on resolution.
13:29withnomad: because display lists are in fact cached, from times of ogl1.0 that requires cache coherent protocol to be implemented always, and a route from one cache to another, which on gpus is clockless due to texture arbiters.
13:30withnomad: so by tearing down the dispatcher , this is the first step , colliding the caches and invalidating the content is the other step, and you should have all the perf in the world.
13:30linkmauve: This isn’t my computer, nor am I close to it so debugging is a bit harder, it also didn’t happen on Ubuntu 16.04 to 19.10, and only started now on ArchLinux (Linux 5.8).
13:31withnomad: i also developed new intrinsics for long latency ops and so this gets into more wins, and also all compression, but i won't commit that work.
13:31mripard: danvet_: have you seen https://firstname.lastname@example.org/ ?
13:32mripard: I guess you'll have some comments :)
13:32linkmauve: Sorry not ArchLinux, Debian Sid (so probably same kernel).
13:33mripard: (like do we want to have it on atomic_begin _check and _flush too?)
13:34danvet_: mripard, oh where's that magic cocci script?
13:34danvet_: for one offs like this would be good to include it in the commit message
13:35danvet_: mripard, aside from that I like
13:35danvet_: well the only bikeshed I have is that I'm not sure drm_atomic_state should be a state
13:35danvet_: it's more like the commit
13:35danvet_: but that ship sailed unfortunately :-/
13:35danvet_: or it would at least be a much larger scale renaming exercise
13:37mripard: danvet_: https://pastebin.com/h0AgxWbq
13:37mripard: it's pretty long, but I'll include it in the commit log if you want
13:38danvet_: yeah I think that's useful
13:38danvet_: since we have more hooks that need the same transformation, eventually
13:39bnieuwenhuizen: danvet_: I have been thinking a bit more about getfb2, but isn't changing the number of planes (from 1 without modifiers to 2 with modifiers) returned from getfb2 considered a forbidden change because non-modifier-aware mesa can't handle multi-plane images?
13:39withnomad: i was not interested in vulkan right when the project was started as MANTLE, because this started with totally wrong ideas, goals or missions. there is no point to feed dedicated gpu memory from cpu side of the bus in the ring.
13:39withnomad: and vulkan offers maybe some clean ups, which would be ok but...
13:39withnomad: the ideas are totally wrong.
13:39danvet_: mripard, the other bikeshed: maybe we should add a boilerplate to all the hooks that now have the drm_atomic_state *state argument
13:39danvet_: about what this is good for (aka everything)
13:39danvet_: instead of just removing the existing blabla
13:39danvet_: but I guess also ok as-is
13:40danvet_: mripard, anyway, with the cocci added for reference: a-b: me
13:40mripard: awesome, thanks :)
13:44withnomad: and for reclocking , since there are no clocked logic blocks , since there can not be more than one which is in the dispatcher, otherwise elementary games would not run even simple games would not process well. so the reclocking is a frequency generator with main purpose to fuck up someones eardrums when some uses this as a wheapong or make new kettle for heating water for the tea.
13:49withnomad: to torn a dispatcher a simple halt can be used , several other options, and in opencl that requires redefining one function in the public header.
13:49withnomad: this function sends rst signal to the chip, and it allows to redefine and revere it and call the redefined one instead,
14:35Akien: daniels: thanks!
14:49MrCooper: someone who knows python should go through the "git grep 'env python$'" results and change them to say python3 or python2 instead
14:55jekstrand: dcbaker: ^^
14:56dcbaker[m]: Which project are we talking about?
15:03jenatali: jekstrand: Was there something you were waiting for my review on?
15:04jekstrand: jenatali: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6750 is going to affect you but karolherbst reviewed it and I don't think CF is really your area.
15:04jekstrand: jenatali: You also expressed interest in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871
15:05jenatali: jekstrand: I'll take a peek but yeah CF's not really my area
15:05jenatali: Ah yeah, I'll check that one out too :) thanks!
15:06karolherbst:had to deal with too many random things the past days/weeks :/
15:06pcercuei: I can trigger a bug in Linux' MM code by running "fuser -k -HUP /dev/dri/card1"
15:06pcercuei: one out of ~10 times, I get this: https://pastebin.com/raw/PXFp7k7a
15:07jekstrand: jenatali: There was a bug in the CF structurizer that I came across when we first landed it but I didn't understand the code very well and applied something of a band-aid fix. That MR has what I believe to be a correct fix.
15:07pcercuei: I don't really know what BUG() it is, I don't see any in get_user_pages_fast_only()
15:07jekstrand: jenatali: Also, after spending about a solid week with the structurizer I think I genuinely understand the over-all structure and am reasonably convinced that it's probably correct for the vast majority of cases.
15:08jekstrand: I'm not 100% sure about irreducible cases but those are unlikely to come up in practice when we pass -O0 to clang.
15:10MrCooper: dcbaker: mesa
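For reference, the `env python$` cleanup MrCooper asks for could be scripted roughly as below. This is a minimal sketch, not anything from the Mesa tree: the function name `pin_python_shebang` and the choice to look only at a file's first line are assumptions, and whether a given script should become python3 or python2 still has to be decided per file.

```python
import re

# Matches a shebang whose interpreter is exactly "env python" with no version
# suffix, mirroring the `env python$` pattern from the git grep above.
_UNVERSIONED = re.compile(r"^(#!.*\benv python)[ \t]*$")


def pin_python_shebang(text: str, version: str = "3") -> str:
    """Return `text` with an unversioned `env python` shebang pinned to a version.

    Only the first line is inspected; files that already say python2/python3,
    or that have no shebang at all, are returned unchanged.
    """
    lines = text.split("\n")
    match = _UNVERSIONED.match(lines[0])
    if match:
        lines[0] = match.group(1) + version
    return "\n".join(lines)
```

Feeding it the contents of each file reported by the grep and writing the result back would cover the simple cases; anything the grep finds outside a first-line shebang would still need a manual look.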
15:11jenatali: jekstrand: Awesome, sounds great. One more reason we need to get our code on top of master
15:25pinchartl: does anyone know what NV stands for in semi-planar formats ? is it a historical reference to nvidia, or something else ?
15:29ajax: that's... a really good question.
15:39danvet_: linusw, is your plan to invest a bit of time into fbdev emulation for android with your linaro hat on?
15:40linusw: danvet_: yeah makes sense.
15:40vsyrjala: pinchartl: i always thought the V just comes from YV12. why the N though? native?
15:40danvet_: if you do, I think it would be great to extend the fbdev testcase in igt with that stuff
15:40danvet_: so we have higher chances of things not breaking again
15:40linusw: danvet_: I'm syncing some gigs of Android ftm
15:41danvet_: lol :-/
15:41danvet_: e.g. we have vblank wait support for fbdev, so a testcase that checks the drm driver for vblank support
15:41danvet_: and then makes sure the fbdev one works too
15:42danvet_: or if we really have to add pixel format change support, some test that tries to exercise that
15:42danvet_: it can cheat after all and look at the kms side for what should be supported
15:42linusw: danvet_: once I understand the problem that should be doable ... I guess. I don't know what igt is or anything.
15:42jenatali: vsyrjala: I wonder if N is for iNterleaved
15:43danvet_: linusw, https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#testing-and-validation
15:43danvet_: this igt
15:43danvet_: there's an fbdev testcase, but it's not very smart yet
15:44danvet_: I guess for better integration it should try to open the fbdev device for the drm driver we're trying to test
15:44danvet_: at least as a default
15:52vsyrjala: jenatali: yeah, could be
15:55linusw: danvet_: is this something we plan to fold into the kernel self-tests?
15:56danvet_: linusw, thus far not
15:56danvet_: but it's serving a somewhat similar purpose
15:56danvet_: just for drm
15:57danvet_: other stuff like e.g. xfstest is also out of tree
15:57danvet_: the things we have in selftests are our in-tree unit tests
15:57danvet_: which I'm hoping will become kunit tests eventually, maybe, in some future
15:57danvet_:has too many dreams perhaps
16:01AndrewR: karolherbst, interesting, I tried to apply https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4974 and build fails with "/usr/bin/../lib/gcc/i586-slackware-linux/5.5.0/../../../../include/c++/5.5.0/ext/new_allocator.h:120:23: error: no matching constructor for initialization of 'clover::module::symbol'"
16:05karolherbst: ehh.. ehh?
16:06karolherbst: maybe some conflict
16:06karolherbst: will take a look later
16:43jekstrand: jenatali: Did you want to give any opinions on the structurizer MR? If not, I'd like to assign Marge.
16:44jenatali: jekstrand: I don't think so, I'd say go for it
16:50jekstrand: cmarcelo: Could I get you to look at the first patch in !6871. The nir/find_array_copies one?
16:50jekstrand: Or maybe cwabbott
16:50cmarcelo: jekstrand: will take a look
16:50jekstrand: cmarcelo: You would likely be interested in that whole MR
16:55jekstrand: jenatali: Should we be expecting a MR against master soon? :-)
16:55jekstrand: A very large MR
16:56jekstrand: karolherbst: We really need to think about adding an optimizer to clover. I can write up a basic opt loop if you'd like.
16:57jekstrand: karolherbst: Writing all these CL-based optimizations without having a good way to CI them is making me nervous.
16:57karolherbst: makes sense
16:57karolherbst: I also wouldn't mind if we share the basic stuff with st/mesa
16:57karolherbst: so drivers can more or less expect the same stuff
16:58jenatali: jekstrand: We're pushing through some issues trying to get conformance submissions put together for CL and GL, once those are submitted we'll hopefully have some more cycles to continue rebasing on top of master
16:58jekstrand: I guess you did add a gallium_nir thing, didn't you?
16:58karolherbst: uhm... I don't think so?
16:58karolherbst: but there is a callback drivers can implement so that gallium calls into driver specific opt loops
16:58karolherbst: we could mandate that for clover
16:59jenatali: jekstrand: I also wouldn't say "very large", only moderately large :)
16:59jekstrand: jenatali: I guess you are mostly adding files so, unless you're trying to keep history, it doesn't have to be a lot of patches.
17:00jenatali: jekstrand: Yeah, I think we're going to throw away history for the bits that go upstream probably, but kusma's been doing most of the work trying to stage that
17:00jenatali: And all of the bits that touch upstream should already be there, except for conversions and printf
17:01jekstrand: I've still got a hack for that in my branch
17:01jekstrand: My kernels want round-up/down float->int
17:01jenatali: And on top of conversions, the vload_half/vstore_half
17:02jenatali: I don't remember if we decided how we wanted to handle fp16<->fp32 conversions, since there's already opcodes with rounding modes
17:02jekstrand: I think the conversion intrinsic should handle everything. The lowering pass can then turn them into the opcodes we have.
17:02jenatali: But the CL functions require adding the round-up and round-down variants as well
17:03karolherbst: did we actually decide on how we want to handle conversions in nir longterm?
17:03jekstrand: Honestly, I think the intrinsics may be a good way to do it
17:03jenatali: karolherbst: I think we agreed that a conversion intrinsic + constant indices for mode, plus a lowering pass
17:03karolherbst: ahh, yeah
17:03karolherbst: that sounds sane
17:04karolherbst: sounds like a lot of code needs reworking but I guess...
17:04karolherbst: maybe it helps to know what hw actually supports?
17:04karolherbst: uhm.. supports what
17:04jekstrand: If we really care about optimizing them or constant-folding, we can break the few common ones out into ALU ops.
17:04karolherbst: I could probably come up with a list of conversions nv hw supports directly
17:04jenatali: I thought we said that we'll just always lower the intrinsics when the source is a constant so we can fold it?
17:04jekstrand: I expect the lowering pass will take a callback for "can you handle this one?"
17:04jekstrand: jenatali: Oh, right. Yeah, that works.
17:05karolherbst: mhhh, how do we add the rounding mode in there then?
17:05jenatali: karolherbst: As a constant index?
17:05karolherbst: sure, but for the constant folding
17:06karolherbst: if we convert it to alu ops, then we would need all types anyway
17:06jenatali: We'll lower it into a series of alu ops to do the rounding
17:06jenatali: Then all of those will fold away
17:06karolherbst: I guess
17:06karolherbst: and then drivers can opt in for: _always_ give me the convert intrinsic instead of the alu ops
17:06jenatali: That way we don't have to implement it in both nir_builder and C, and we don't have to explode our opcodes
17:07jenatali: We've already got nir_builder implementations of rounding, it's just baked into vtn right now, so it'll just be a matter of refactoring that into a lowering pass
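To make the rounding-mode discussion concrete, here is a small model of what a float->int conversion with an explicit rounding mode and saturation has to compute. It is only a sketch of the semantics in plain Python, not NIR or nir_builder code: the name `convert_float_to_int_sat` and the `ROUNDERS` table are made up for illustration, the mode names follow the CL `_rte`/`_rtz`/`_rtp`/`_rtn` suffixes, and the NaN-to-zero rule is taken from the saturated conversions in the OpenCL C spec as best recalled here.

```python
import math

# Hypothetical table mapping CL-style rounding suffixes to Python rounders.
ROUNDERS = {
    "rte": round,       # round to nearest, ties to even (Python's round())
    "rtz": math.trunc,  # round toward zero
    "rtp": math.ceil,   # round toward +infinity ("round-up")
    "rtn": math.floor,  # round toward -infinity ("round-down")
}


def convert_float_to_int_sat(value, mode="rtz", bits=32, signed=True):
    """Model a saturating float->int conversion with an explicit rounding mode."""
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(value):
        return 0  # assumed: saturated conversions turn NaN into 0
    if math.isinf(value):
        return hi if value > 0 else lo
    rounded = ROUNDERS[mode](value)
    # Clamp to the destination range; this is the "_sat" part.
    return max(lo, min(hi, rounded))
```

A lowering pass would express the same steps (round, then clamp) as a short series of ALU ops, which is why they can constant-fold once the source is a constant.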
17:07karolherbst: I am just wondering if we have enough conversion opts that it's worth caring about it?
17:08karolherbst: like would we even need to keep the alu ops at all
17:08karolherbst: or could one opt pass on the intrinsic handle everything already
17:08karolherbst: heh.. we have u2fmp now as well.. mhh
17:08jekstrand: jenatali: I'd be up for writing the code to add the intrinsic, pull stuff out, and make the pass. I just don't have a good way to test it thoroughly.
17:09jenatali: jekstrand: CL CTS test_conversions ?
17:09karolherbst: doesn't test constant folding
17:09jekstrand: jenatali: Yea, I guess.
17:09jenatali: Oh sure
17:09jekstrand: karolherbst: I'm not worried about constant folding if the solution is to lower to ALU first
17:09jenatali: jekstrand: test_conversions is *very* thorough
17:09karolherbst: jekstrand: I am wondering if we even need the alu ops
17:09karolherbst: we could just.. get rid of them entirely
17:10jekstrand: karolherbst: Which ALU ops are you suggesting we get rid of?
17:10karolherbst: conversion ones
17:10jekstrand: jenatali: Is it all currently living in your branch? If so, which branch?
17:10jekstrand: karolherbst: I think we want to keep the basics.
17:10jekstrand: karolherbst: We want nir_opt_algebraic to be able to work on them
17:11jekstrand: And extending that to work on intrinsics is probably possible but a massive pile of design work
17:11karolherbst: why though? I doubt it would be a lot of work to port the opts over to work on the convert intrinsic
17:11karolherbst: it's a lot of work
17:11jenatali: jekstrand: Yeah, you can find it in https://gitlab.freedesktop.org/kusma/mesa/-/blob/msclc-d3d12/src/compiler/spirv/vtn_alu.c#L695
17:11karolherbst: not arguing that
17:12jenatali: jekstrand: It's split across a series of MRs of original implementation, bugfixes, adding new fp16 opcodes, etc, but I can track them all down if you'd rather go based on commits rather than tip-of-tree code
17:12jekstrand: jenatali: tip-of-tree is fine. My intention is to pull the stuff out into a builder helper and then work on top of master from there.
17:13karolherbst: AndrewR: compiles fine here
17:13jenatali: jekstrand: Cool, sounds great. Let me know if you need help
17:14karolherbst: ohh.. maybe it's a 32 bit thing?
17:22jekstrand: jenatali: Any reason why it's using both a destination type and a destination bit size rather than a sized type?
17:23jenatali: jekstrand: Probably not a good one
17:23jekstrand: Also, why is it looping over channels rather than using vector ops?
17:24jenatali: jekstrand: https://gitlab.freedesktop.org/kusma/mesa/-/issues/43
17:25jenatali: Please feel free to improve the code as you see fit :)
17:30jekstrand: Oh, this is daniels code. :)
17:30jenatali: Yeah, most of it
17:31jenatali: I added int -> float rounding, and a couple tweaks/bugfixes
17:32daniels: the NIR documentation wasn't very helpful :P
17:33jekstrand: daniels: If you can tell me where the documentation is, I can fix it. :P
17:41jekstrand: A long time ago, I wrote something of a blog post about NIR providing some high-level design information. I should do that again only put it in the mesa tree and make it up-to-date.
17:41anholt: jekstrand: being able to put docs on docs.mesa3d.org now is pretty cool
17:41jekstrand: anholt: Yeah...
17:42anholt: .rst does not delight me, but anything is better than handwritten html I never wanted to touch or awful wikis.
17:42jenatali: jekstrand: Yeah that blog post was my only primer for NIR before diving into it, and it really isn't sufficient :(
17:43jekstrand: anholt: Yeah, rst isn't amazing. I tend to prefer markdown. But it's not bad and sphinx is pretty well made, IMO.
17:44karolherbst: sphinx can accept markdown files though, but I assume it doesn't give you all the features of rst then :p
17:51karolherbst: AndrewR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4974/diffs?commit_id=45c7dba65de658fa5217e1f132e9218ba22a76c6#7604d7c288c0a605c9143852901c247db21d5f92_112_131
17:59lrusak: is triple buffering simply swapping in a queue of buffers rather than just swapping back and forth between two buffers?
18:01lrusak: commit in question -> https://github.com/lrusak/xbmc/commit/e38e97a20940a9c898f99dc293ce8a11a48f2aea
18:01bnieuwenhuizen: lrusak: it is not a queue. frames can be skipped for display if rendering is faster than the monitor
18:09bl4ckb0ne: does the panfrost driver have vulkan support for mali t860?
18:10bnieuwenhuizen: bl4ckb0ne: no
18:10bnieuwenhuizen: no vulkan support yet for panfrost
18:10cwabbott: jekstrand: don't forget about https://people.freedesktop.org/~cwabbott0/nir-docs/
18:10bl4ckb0ne: bnieuwenhuizen: thanks!
18:10cwabbott: definitely horribly out-of-date by now though
18:11jekstrand: cwabbott: I think I have a slightly more up-to-date version of that in my tree somewhere
18:12jekstrand: jenatali: Some of these helpers confuse me. I'm not 100% sure they're correct....
18:12jenatali: jekstrand: They do pass the CTS - what are your concerns?
18:13jekstrand: jenatali: It could be cases that never come up
18:13jenatali: jekstrand: Possible, since we don't support double or half
18:13jekstrand: jenatali: But, for instance, it does a nir_imm_floatN_t(b, FLT_MAX, src->bit_size). What is that going to do if FLT_MAX can't fit?
18:13jenatali: Everything else should come up though
18:14jekstrand: In any case, we can handle this all in review.
18:14jenatali: Hm... I think CL doesn't generate those conversion ops for halves
18:14jekstrand: I'm going to post it more-or-less as-is
18:14jenatali: Sounds good
18:14jekstrand: With a few cosmetic modifications
18:15jenatali: jekstrand: CL doesn't generate the conversion opcode when rounding modes are used for fp16, it can only explicitly specify rounding modes when using the vload/vstore_half extension opcodes
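The FLT_MAX concern can be checked numerically outside of NIR. The sketch below uses Python's `struct` module with the IEEE half-precision format code `e`; the helper name `fits_in_half` is made up for illustration, and it says nothing about what `nir_imm_floatN_t` itself does with an out-of-range value, only that FLT_MAX has no finite fp16 representation.

```python
import struct

# Largest finite fp32 value, rebuilt from its bit pattern (0x7F7FFFFF).
FLT_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]

# Largest finite fp16 value.
HALF_MAX = 65504.0


def fits_in_half(value: float) -> bool:
    """Return True if `value` can be packed as a finite IEEE half float."""
    try:
        struct.pack("<e", value)
    except OverflowError:
        # struct refuses finite values that would overflow the fp16 range.
        return False
    return True
```

So with a 16-bit source, an immediate built from FLT_MAX would have to become infinity or be clamped, which is presumably why the case only matters if those conversions are ever generated for halves.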
18:22daniels: jekstrand: the documentation is in MR comments and #dri-devel backlog afaik
18:23daniels: but yeah, more seriously, if the question is 'why this way?', the answer is 'because I'd not realised there was another way'
18:32lrusak: bnieuwenhuizen, ah okay, so triple buffering will only help with async?
18:42ajax: enh. it helps in pretty much any situation where you render faster than the display and you don't mind using 100% of the cpu/gpu time.
18:42ajax: synced presentation or not
18:43bnieuwenhuizen: In theory what ajax said, in practice it has some issues that prevent it from really helping with latency
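The "not a queue" point can be illustrated with a toy timeline. The sketch below is an assumption-laden model, not how any particular compositor or driver works: `presented_frames` shows the newest finished frame at each vblank (so older finished frames get skipped), while `fifo_frames` shows frames strictly in order. Both names are made up, times are arbitrary units, and the FIFO variant ignores the back-pressure a real queue would put on the renderer.

```python
def presented_frames(render_done_times, vblank_times):
    """At each vblank, show the newest frame finished by then (frames may be skipped)."""
    shown = []
    newest = None
    i = 0
    for vblank in vblank_times:
        # Consume every frame that finished before this vblank; keep only the last.
        while i < len(render_done_times) and render_done_times[i] <= vblank:
            newest = i
            i += 1
        shown.append(newest)
    return shown


def fifo_frames(render_done_times, vblank_times):
    """At each vblank, show the next frame in submission order if it is finished."""
    shown = []
    current = None
    next_frame = 0
    for vblank in vblank_times:
        if next_frame < len(render_done_times) and render_done_times[next_frame] <= vblank:
            current = next_frame
            next_frame += 1
        shown.append(current)
    return shown
```

With frames finishing at 4, 8, 12, 16, 20, 24 and vblanks at 10, 20, 30, the first function shows frames 1, 4, 5 while the second shows 0, 1, 2, which is the latency difference being discussed when rendering is faster than the display.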
19:03AndrewR: karolherbst, thanks, I tried to manually edit diff I downloaded, but currently mesa rebuild in progress - it will take some time ..Thanks a lot for such superfast response!
19:08jekstrand: jenatali: Can you vstore float16?
19:08jenatali: jekstrand: Yep
19:08jekstrand: Oh, yes you can. I see what's happening now.
19:13jekstrand: jenatali: I just pushed to a for/jenatali branch on my gitlab with the refactored conversion helper. I think things are a bit cleaner now. Time to cherry-pick it all onto my iris clover branch so I can do the lowering pass and have some hope of testing it.
19:14jenatali: jekstrand: Sounds good :)
19:27jekstrand: Ugh... Have to figure out how to rebase on anholt's loader changes
19:35austriancoder: MrCooper: any thoughts about https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6790 ?
19:46jekstrand: karolherbst: What do I need to install to get libclc?
19:46karolherbst: uhm.. ask jenatali or airlied for the binary :p
19:46karolherbst: not sure if we have binary packages with the spirv bits out yet
19:46jekstrand: Someone want to shuffle me a sketchy binary?
19:47karolherbst: I could probably forward what I have today, but it might be out of date
19:47airlied: there is one in the ci docker image :-p
19:47karolherbst: ahh, fair enough
19:47jekstrand: airlied: You landed your CI patches?
19:47airlied: but i built one locally yesterday, gimme a few mins to get to a pc
19:48airlied: jekstrand: still in mr,
19:48airlied: piglit has inconsistency in how many tests run for some reason
19:50jekstrand: I could build it but that would require switching LLVM versions somewhere in my stack. 😬
19:51airlied: jekstrand: not really
19:51airlied: you can build it with llvm 10
19:51jekstrand: why is there a trailing -?
19:51airlied: libclc can be built out of tree
19:51airlied: jekstrand: just triples
19:52airlied: originally it was spirv64--.spv, not sure why mesa3d got added :-P
19:52airlied: libclc libraries are all triple named
19:52jenatali: airlied: I think it was suggested from LLVM upstream folks
19:54jekstrand: airlied: Got a pkgconfig to go with that?
19:54airlied: jekstrand: just install distro libclc
19:54airlied: and drop it in the dir
19:55jekstrand: oh, ok.
19:57jekstrand: Woo! I'm up-and-running again!
20:03airlied:will hopefully track down piglit issue today, but have a kernel regression to work out in parallel
20:05airlied: might be worth adding a few conversiony tests to piglit
20:05airlied: at least hit some of the more useful cases
20:05jenatali: As long as they run faster than the CTS...
20:06jenatali: I never want to run that test ever again
20:06anholt: how long is the cts taking? and are you running in parallel?
20:06airlied: yeah the CTS is probably a bit too comprehensive
20:09jenatali: anholt: That specific test has ~900 test groups, for each of (source type, dest type, rounding mode) running through every single possible input value
20:09jenatali: It's... ridiculous
20:09jekstrand: Oh, my....
20:10AndrewR: karolherbst, 6040 also fails after i applied it on top of 4974 ... "../src/gallium/frontends/clover/core/printf.cpp:268:46: error: too few arguments to function call, expected 2, have 1"
20:11airlied: jenatali: hours or days? :-P
20:11jenatali: airlied: On hardware, like 28 hours?
20:11jenatali: I tried on WARP (our D3D software GPU) and gave up after a few days
20:12jenatali: After it only made it through like 50 test groups...
20:12airlied: okay so on llvmpipe we could expect it to take a while :-p
20:12jenatali: Yes, a while :)
20:13jenatali: airlied: Math bruteforce is up there too, though not quite as long
20:17jekstrand: airlied, jenatali: Sounds like we need a CPU rasterizer race!
20:17airlied: jekstrand: we both just need to get threadrippers first :-p
20:17jenatali: airlied: I think we actually have one on order specifically to accelerate running the CL CTS on our software driver......
20:18airlied: someday I'll get CL CTS built locally and try and run it :-P
20:19airlied: can I run cl1.2 from master yet?
20:20jenatali: airlied: Don't think so, https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4974 is still open
20:20jenatali: Though I'm not sure how much of that is AMD-specific vs NIR
20:21airlied: jenatali: I meant with CTS as well :-P
20:21airlied: it had some weird cl 1.2 in a branch
20:21jenatali: airlied: Oh, yeah, CL1.2 CTS is merged into master
20:21jekstrand: CL 3.0 here we come!
20:22jenatali: I'm looking forward to flipping that switch :)
20:22jekstrand: Supposedly, it's going to be very little work.
20:22jenatali: Yeah, just implement some caps and functions
20:24jekstrand: Everything is optional!
20:24airlied: ah cool
20:24jenatali: Yeah, all you need to implement is the caps to say you don't support anything :P
20:24jekstrand: Well, nearly everything. I think they made a few quality-of-life things required
20:25airlied:fails to build CL CTS again
20:25airlied:wonders what is so hard about just making mkdir build; cmake -G Ninja ..; ninja work
20:26jenatali: airlied: What's failing for you? I didn't think it was that hard...
20:26jenatali: airlied: Did you see https://github.com/KhronosGroup/OpenCL-CTS/blob/master/build_lnx.sh ?
20:26airlied: yeah it's full of paths to things it surely can find itself
20:26airlied: but can't
20:26jenatali:is apparently too used to that from Windows development...
20:27airlied: and README.md is very insightful :P
20:36yshui`: will glXSwapBuffers block glFinish in mesa? The impression I got from my experiments is that it doesn't.
20:39anholt: no, glfinish shouldn't care about a previous swapbuffers.
20:39anholt: (but also, if you're calling glfinish, you're almost always doing something wrong)
20:45yshui`: i see. so the opengl wiki is wrong. also the nvidia driver does wait for swap buffers in glfinish.
20:46yshui`: why is glfinish bad?
20:47anholt: generally, you don't want to stall your gpu.
20:48anholt: there's no reason to call glfinish, unless you're doing some external synchronization thing, but if you are you should be using fence fds to keep from stalling the gpu.
20:52yshui`: yes, using fences would be ideal. but if my application is light on gpu usage it shouldn't matter that much.
20:52anholt: but, like, what are you doing where glfinish is something you need to call in the first place?
20:54yshui`: minimize input to screen latency in a X compositor
20:55yshui`: if i let the gl commands sit in the queue waiting for buffer to swap, the latency would be > 1 frame
20:56yshui`: if i wait for swap to complete before starting render, the latency will be within 1 frame
20:58anholt: if you want to wait for a swap, you should probably be using glXWaitForSbcOML() or whatever the egl equivalent is, not a glfinish.
20:59yshui`: nvidia doesn't have that iirc :'(
20:59anholt: (but if you're trying to do that, why specifically the swap as the time to start compositing, as opposed to, say, halfway through the frame time?)
21:02yshui`: well it's a start. to start rendering midway through a frame i need to gauge how long the render will take.
21:03yshui`: so i dont miss the vblank
21:07yshui`: but ideally i do want to start rendering as late as possible
21:08ajax: anholt: did i965/iris ever get a fence implementation that wasn't equivalent to glFinish though?
21:11ajax: yshui`: "let gl commands sit in a buffer" - SwapBuffers implies glFlush, implies rendering is submitted to the hw before SwapBuffers returns. Finish is even more brutal, in that it waits for rendering to _complete_
21:12yshui`: the hw cannot start processing those commands before the buffer is swapped
21:13ajax: i mean... if the rendering destination is awaiting a previous swap to release, maybe?
21:13yshui`: you cannot write to the back buffer when it's still the front buffer
21:13yshui`: yeah, that is why i need to wait for the buffer swap
21:16ajax: but, if the buffer you're waiting to render to is _currently_ the front, then you've already got a swap pending. right? (assuming not triple-buffering)
21:18ajax: (this conversation is giving me deja vu)
21:21AndrewR: karolherbst, https://pastebin.com/nfNEey3e (full compile error with printf patch applied)
21:21yshui`: yes. the pending swap has to finish, before the commands can be processed
21:22ajax: right. so nvidia doesn't have glXWaitForSbcOML, but it does have glXWaitVideoSyncSGI, which is a bit more awkward as a "sleep until the frame posts" api but can be made to work
21:23yshui`: i think i would just use glfinish on nvidia. it's improper, but nvidia's glfinish does wait for swap
21:24ajax: (why - convo in #xorg-devel the other day about what glx extns nvidia supports)
21:24ajax: yeah, fair enough
21:28yshui`: huh, nvidia does have sgi video sync
21:29karolherbst: AndrewR: well, if you merge multiple MRs together locally, you have to resolve your compile errors yourself :p I think the MR is just conflicting with the other one
21:29ajax: i'm not really sure why they don't have OML_sync_control tbh, it's pretty innocuous
21:29yshui`: what is the difference between the "frame counter" in video sync, vs. SBC?
21:30ajax: nothing, afaik
21:30ajax: "swap buffer counter"
21:30yshui`: yeah, weird they don't have sync control
21:31ajax: i do think the reason they don't have OML_swap_method is that any non-trivial swap-method you could specify with it is then required to be a permanent decision for the whole fbconfig (and any drawable created against it), which limits the optimizations you can do
21:31ajax: personally i'd be happy to nuke OML_swap_method from mesa
21:32yshui`: i heard swap_method doesn't make much sense when a compositor is present.
21:40ajax: it's... icky. like it might be cool if there was a way to convey the desired swap method to the compositor and have that matter
21:40ajax: but it's kind of bogus to force it to be a property of the drawable instead of something you optimize for per-swap
21:45anholt: ajax: arb sync may still be weird, I dunno. but the fence fd stuff in iris should be legit.
21:46jekstrand: karolherbst: Does your little CTS runner work on the conversions tests?
21:46jenatali: Probably not, they have a different format
21:47karolherbst: it does
21:47jenatali: Oh, cool
21:47karolherbst: not saying it's pretty though :p
21:47karolherbst: but in python it's also not that bad to create the list
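A minimal sketch of what "creating the list" in Python could look like. The type list, the `_sat` flag and the rounding suffixes are assumptions taken from the OpenCL C `convert_<dst>[_sat][_mode](<src>)` naming scheme; the runner's real lists and the CTS's own test-name format are not shown in this log. `conversion_names` is a hypothetical helper.

```python
from itertools import product

# Assumed lists, based on the OpenCL C convert_* naming scheme.
# "half" is left out, matching the remark that the runner does not handle it.
TYPES = ["char", "uchar", "short", "ushort", "int", "uint",
         "long", "ulong", "float", "double"]
SAT = ["", "_sat"]
ROUNDING = ["", "_rte", "_rtz", "_rtp", "_rtn"]


def conversion_names():
    """Return every convert_<dst>[_sat][_mode](<src>) combination as a string."""
    return [f"convert_{dst}{sat}{mode}({src})"
            for src, dst, sat, mode in product(TYPES, TYPES, SAT, ROUNDING)]


names = conversion_names()
```

A plain cross product like this also emits combinations OpenCL C does not allow, such as `_sat` with a floating-point destination, so a stricter runner would filter those out.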
21:48karolherbst: doesn't handle "half" but that should be easy to add
21:48jenatali: karolherbst: I don't think the conversions test can target half
21:48karolherbst: and it generates some invalid combinations, but the CTS is cool with it
21:48karolherbst: I see
21:48jenatali: There's a dedicated test_half for that
21:49karolherbst: the biggest problem with my runner is, that it depends on the return code, and the CTS isn't consistent with any of that
21:49karolherbst: so I still need to think of a solution besides fixing the CTS
21:49anholt: cl cts is not deqp-ish?
21:49jenatali: The official CTS runner python script looks for ERROR and FAILED output...
21:49AndrewR: karolherbst, ok :}
21:49karolherbst: I think I should do that as well
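A rough sketch of classifying a run by its output instead of its return code. Only the two markers `ERROR` and `FAILED` come from the discussion; the matching rule (plain substring per line) and the name `classify_cts_output` are assumptions, not the official script's logic.

```python
def classify_cts_output(output: str) -> str:
    """Return "fail" if any output line mentions ERROR or FAILED, else "pass"."""
    for line in output.splitlines():
        # Substring match is an assumption; the real script may be stricter.
        if "ERROR" in line or "FAILED" in line:
            return "fail"
    return "pass"
```

This would not help with the garbled output described just below, since the markers themselves may lose characters.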
21:50karolherbst: the python script was also more to get away from my bash script I had before which was even uglier
21:50jenatali: karolherbst: I just hit an issue where a CTS run produced garbage output, and I can't tell if it was trying to pass or fail...
21:50jenatali: E.g. sigwt lmmfas LMMALCHS_T LMMCP_OTPR ufrFL sn tutts asdTsigwt lmmfas LMMUEHS_T
21:50karolherbst: jenatali: that's on windows, isn't it? :p
21:50jenatali: karolherbst: Aye
21:50karolherbst: also threading
21:50karolherbst: windows printf isn't thread safe
21:51jenatali: It looks like characters are missing, I dunno
21:51karolherbst: so lineendings are global
21:51karolherbst: or something
21:51karolherbst: on linux you are guaranteed that printf prints line by line
21:51karolherbst: so the order is messed up, but lines are consistent
21:51karolherbst: on windows it's all garbage
21:51jenatali: Like "etn ihc_e_lg:C_E_LO_OTPR|C_E_OYHS_T" is supposed to be "testing with cl_mem_flags: CL_<something> | CL_<something>"
21:52karolherbst: maybe something else is happening? :D
21:52karolherbst: I just know that printf + threading + windows == bad
21:52jenatali: The real test is does it happen again :P this one's only a half hour to run, so not too bad
21:52jenatali: And it finished and didn't happen again, hooray
21:59jekstrand: Hrm... I seem to have broken char->uchar sat conversions
22:00jenatali: jekstrand: Shouldn't that just be [0, 127] clamping?
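To make the [0, 127] remark concrete, here is a small sketch of a saturating char -> uchar conversion on plain Python integers. The function name is hypothetical and this is not the NIR helper under discussion.

```python
def char_to_uchar_sat(x: int) -> int:
    """Saturating signed 8-bit -> unsigned 8-bit conversion."""
    if not -128 <= x <= 127:
        raise ValueError("input is not a signed 8-bit value")
    # Clamp to the uchar range [0, 255]. Because the input never exceeds
    # 127, the result always lands in [0, 127].
    return max(0, min(x, 255))
```

Negative inputs saturate to 0 and everything else passes through unchanged, which is why the upper clamp never triggers for this type pair.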
22:00jekstrand: jenatali: I'm sure it is
22:00jekstrand: jenatali: I just need to figure out why it's busted
22:00jekstrand: I'm sure it's simple
22:01AndrewR: karolherbst, I added NULL as argument, now clinfo says: printf() buffer size 1048576 (1024KiB) - is this correct?
22:01jenatali: Probably the char I/O rather than the conversion itself
22:01karolherbst: AndrewR: uhhh, maybe?
22:01karolherbst: no clue
22:08jekstrand: Crisis averted...
22:08ajax: huh. so the classic drivers only expose the 'undefined' oml swap method class.
22:09ajax: gallium also explicitly exposes 'copy' for some reason. i guess because internally it has to work anyway.
22:09jekstrand: jenatali: How long should I expect these tests to take?
22:09jenatali: jekstrand: The conversions tests? Forever :D
22:09karolherbst: jekstrand: very long if you go for the proper run
22:10jekstrand: karolherbst: How do I run conversions with your runner?
22:10jenatali: If you add -w, it's maybe an hour for a full run?
22:10jekstrand: jenatali: -w?
22:10jenatali: "wimpy" mode
22:10jenatali: It scopes it down to only run like 1/32 of the tests
22:10karolherbst: I think I disabled conversions... let me see
22:11ajax: i wonder how widely used it actually is. man i miss google code search.
22:11karolherbst: jekstrand: clctsrunner.py -w -i conversions
22:11karolherbst: or without -w if you want to go slower :p
22:12jekstrand: karolherbst: clctsrunner.py: error: unrecognized arguments: -w
22:12jekstrand: karolherbst: Do I need a newer version?
22:12karolherbst: I did quite a couple of changes
22:12karolherbst: even added arguments for overriding clover's OpenCL versions and stuff
22:13jekstrand: Ok, running now
22:13jekstrand: And failing like mad
22:13karolherbst: do some pass?
22:14jekstrand: karolherbst: Yeah, a bunch do
22:14karolherbst: ahh, okay
22:14jekstrand: But I made some cases assert that didn't look like they did the right thing :)
22:16jekstrand: Like rounding an integer to nearest-even. What even is that?
22:16jenatali: jekstrand: What's the full conversion?
22:16jekstrand: convert_uchar_rte( float )
22:17jenatali: That's float -> uchar
22:17jenatali: So what's 120.5 convert to? :P
22:18jekstrand: Yeah, but it wasn't doing round_even. It was just hoping that the conversion did that by magic
22:18karolherbst: _rte is default for float
22:18jekstrand: wait... yeah
22:18jenatali: _rtz is default for float -> int, _rte is default for int -> float
22:18jenatali: Or float -> float
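A small illustration of how the two rounding modes differ for float -> int, using Python's built-ins as stand-ins: `math.trunc` rounds toward zero and `round` on a float rounds half to even. The function names are hypothetical, and out-of-range inputs (which the non-`_sat` conversions leave undefined) are not handled.

```python
import math


def to_int_rtz(x: float) -> int:
    """Round toward zero, like the default float -> int conversion."""
    return math.trunc(x)


def to_int_rte(x: float) -> int:
    """Round to nearest, ties to even, like an explicit _rte suffix."""
    return round(x)
```

For 120.5 both modes give 120 (the tie goes to the even neighbour), while 121.5 gives 121 under rtz and 122 under rte.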
22:18jekstrand: Sorry, was thinking rtne it's rtz
22:19karolherbst: ohh right
22:19karolherbst: to int it was _rtz...
22:19jenatali: Which matches NIR's constant folding for conversions, as well as CL's spec
22:24jekstrand: <built-in>:1:10: fatal error: 'opencl-c.h' file not found
22:24jekstrand: Oh, right... I need to update my LLVM
22:27jekstrand: Ok, now it's really running. :)
22:28jekstrand: I don't really get how float->float rtne is supposed to work...
22:30jekstrand: At 40%. 100% pass so far
22:32karolherbst: jekstrand: ehh.. ignore those which don't make sense
22:32karolherbst: my runner generates tests which are invalid
22:32karolherbst: but CTS just "passes" them
22:33karolherbst: so I just didn't care about removing those
22:33jekstrand: that's fine
22:33jekstrand: I'm also running "for real" on my laptop
22:33karolherbst: I think they are even valid cl functions? not sure
22:33karolherbst: they just don't do much
22:34jenatali: jekstrand: Double -> float rtne makes sense, since double has a mantissa that can represent values that float's can't
22:34jenatali: jekstrand: Same reason int -> float rtne makes sense, you need to round to the closest representable 24bit value from a 32bit integer
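A sketch of why int -> float needs a rounding mode at all: a 32-bit float has a 24-bit significand, so integers above 2^24 are not all representable. The helper below (hypothetical name) goes through Python's `struct` module, which packs to IEEE 754 binary32 with round-to-nearest-even under the default rounding mode.

```python
import struct


def int_to_float32_rte(n: int) -> float:
    """Round an integer to the nearest float32 value, ties to even."""
    # float(n) is exact for any 32-bit integer; packing as "<f" then
    # rounds that double to the nearest binary32 value.
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]
```

Above 2^24 the representable values are 2 apart, so 16777217 sits exactly between two floats and the tie resolves to the one with an even significand.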
22:36jekstrand: Pass 945 Fails 51 Crashes 4 Timeouts 0
22:36jenatali: Not bad
22:37karolherbst: jekstrand: I guess that's all lowered to the normal opcodes or is that with real hw support?
22:38jenatali: karolherbst: This is with code that we're using for CLOn12, hooked straight into vtn
22:38karolherbst: anyway, I would like to wire up the real stuff for nouveau, I just need to get a good overview on what we support directly and what not
22:38jenatali: Slightly refactored and cleaned up
22:38jekstrand: jenatali: I'm done for the night but here's what I've got: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945
22:38karolherbst: so, just reusing whatever we have now
22:39jenatali: jekstrand: Cool, thanks, will take a look :)
22:39jekstrand: karolherbst: I may wire it up "for real" on Intel as well but we need to get the lowering path working first.
22:39karolherbst: that's fair
22:39jenatali: jekstrand: Whoa, didn't realize you actually did hook it up to an intrinsic, that was fast
22:39karolherbst: do you think it makes sense to move to the intrinsic for vulkan and opengl as well?
22:40karolherbst: the intrinsic is closer to how we do conversions in codegen as well
22:53jenatali: That... really wasn't as bad as I expected it to be, cool
23:09AndrewR: karolherbst, ah, according to this https://lists.ffmpeg.org/pipermail/ffmpeg-cvslog/2017-November/111296.html ffmpeg needs OpenCL images :}
23:17jenatali: OpenCL images are in the works
23:17jenatali: Well, for generic Clover at least, can't comment on Nouveau
23:19AndrewR: jenatali, I currently abuse llvmpipe's OpenCL mode :}
23:19jenatali: Ah, then you're looking for airlied to hook up images :P
23:21AndrewR: jenatali, sure, but he is obviously busy - I think I'll just leave xchat open and go to bed for now ....
23:28airlied: jenatali, karolherbst : with CL images when do we know the image formats and sampler?
23:28airlied:expects llvmpipe will need some rework here
23:29karolherbst: formats are runtime only
23:29jenatali: airlied: Formats are known at kernel enqueue time, as are sampler properties, unless the sampler was declared inline
23:34airlied: yeah llvmpipe will have to rebuild shaders at runtime then
23:35airlied: I suppose I could design a texture block in software, might be useful for gl4/vulkan as well
23:43jenatali: airlied: Yeah, CLOn12 rebuilds shaders at runtime, since DXIL requires the type information at the shader decl, and we have to lower instructions based on some sampler properties