00:19bl4ckb0ne: imirkin: i found a solution for my problem last night, but i still have 1 fail on 8 tests when my window is resized
00:20imirkin: if you're overly picky, you can end up with fractional problems
00:20imirkin: a pixel is either covered or not
00:20imirkin: you can't divide it in 2
00:20bl4ckb0ne: i think in this case its an ortho matrix issue
00:21bl4ckb0ne: because my resized window is not a square
00:21bl4ckb0ne: maybe a glScissor or a glViewport could help me
00:36bl4ckb0ne: wew it works
00:40bl4ckb0ne: is one test "good enough" for an extension, or should there be more?
00:40imirkin: depends on the ext
00:41imirkin: esp when this is just a clone of desktop functionality, you should (a) test that it works at all and (b) make sure you cover any departures from the desktop ext
00:41imirkin: if any
00:41bl4ckb0ne: its basically arb_draw_instanced for gles
00:42imirkin: but i haven't looked at the ext
00:42imirkin: does it differ in any way?
00:42imirkin: are there new queries? or things that are allowed in the desktop ext that aren't in the gles one?
00:43bl4ckb0ne: its gl_InstanceIDEXT instead of gl_InstanceIDARB for the keyword
00:43bl4ckb0ne: im gonna look at the files to check
00:44imirkin: note that this ext is also a desktop ext
00:44imirkin: and it says " The error INVALID_OPERATION is generated if DrawArraysInstancedEXT
00:44imirkin: or DrawElementsInstancedEXT is called during display list
00:44imirkin: which i think is different than the ARB one, although i'd have to check
00:45bl4ckb0ne: > The error INVALID_OPERATION is generated if DrawArraysInstancedARB
00:45bl4ckb0ne: or DrawElementsInstancedARB is called during display list
00:45imirkin: then you're good =]
00:46imirkin: also, by the looks of it, EXT_draw_instanced *does not* add gl_InstanceIDEXT to the desktop variant
00:46imirkin: as that is expected to come from EXT_gpu_shader4
00:46bl4ckb0ne: > OES_element_index_uint affects the definition of this extension.
00:46bl4ckb0ne: maybe not
00:46bl4ckb0ne: yeah im re-reading it
00:47imirkin: see issue 1 at the end
00:47imirkin: also note that it only makes changes to the OpenGL ES Shading Language Specification
00:47imirkin: under "Dependencies on OpenGL ES 2.0" section
00:48imirkin: alternatively, you could just choose to only expose it for GL ES
00:48imirkin: since on desktop, there's ARB_draw_instanced
00:48imirkin: i think that's probably the better plan
00:50bl4ckb0ne: will do, thanks
01:10karolherbst: robher: 90dc0d1ce890419f977e460b8258d25187dde64f broke loading nouveau on my jetson nano. Any ideas? Reverting helps on top of 5.6.3
01:11karolherbst: the vdd_gpu node isn't found when nouveau tries to load it
01:17robher: karolherbst: Can you add some prints to of_find_node_by_phandle() to log the lookups?
01:18karolherbst: robher: after or before the revert?
01:22karolherbst: robher: uhm.. I kind of don't see anything obviously useful to dump there.. handle looks like some opaque 32 bit number?
01:25robher: karolherbst: Something like: printk("OF: phandle = %d, node = %pOFf", handle, np)
01:26robher: karolherbst: then we need vdd_gpu node's phandle value out of /proc/device-tree
01:26karolherbst: ahh, I see
01:27karolherbst: robher: it doesn't have a phandle... at least not on my kernel with that one reverted
01:27karolherbst: cat /proc/device-tree/regulators/regulator@6/phandle returns nothing
01:28karolherbst: regulator-name is VDD_GPU
01:29karolherbst: ohh wait
01:29karolherbst: I am silly :)
01:29karolherbst: phandle is dumped as binary
01:34karolherbst: robher: https://gist.githubusercontent.com/karolherbst/5859ede13e28bd67c3749b6325252ff3/raw/27d8f66b5855aeff17db510ba4cafa25e4f808ce/gistfile1.txt
01:35karolherbst: # hexdump /proc/device-tree/regulators/regulator@6/phandle: 0000 1d00
01:35karolherbst: mhh, maybe I need a different path.. let me search
01:37karolherbst: but it seems to be that one
01:38karolherbst: hexdump /proc/device-tree/gpu@57000000/vdd-supply also returns 0000 1d00
01:42bl4ckb0ne: imirkin: https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/205
01:43bl4ckb0ne: and https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3204
01:43airlied: uggh the output ordering for transform feedback and geom invocations is all kinds of messy
01:49robher: karolherbst: what's the phandle value for external-memory-controller@7001b000 ?
01:50robher: karolherbst: that's what the cache finds for phandle 29
01:54karolherbst: robher: ufff: 0000000 0000 1d00
01:55robher: karolherbst: uhh, 64 bits?
01:55karolherbst: no... the first 0 are the address
01:55karolherbst: I forgot to omit them
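The hexdump confusion above can be reproduced in a few lines. This is only a sketch: the four raw bytes are an assumption inferred from the `0000 1d00` output and from robher's "phandle 29", not a dump taken from the board. Device tree cells are 32-bit big-endian, while plain `hexdump` prints 16-bit words in host byte order.

```python
import struct

# Assumed raw bytes of the phandle property (what `hexdump -C` would show
# as 00 00 00 1d); inferred from the chat, not read from real hardware.
raw = bytes([0x00, 0x00, 0x00, 0x1D])

# Device tree cells are 32-bit big-endian values.
(phandle,) = struct.unpack(">I", raw)

# Plain `hexdump` groups bytes into 16-bit words in host byte order, so a
# little-endian machine prints the same four bytes as "0000 1d00".
words = struct.unpack("<2H", raw)
hexdump_view = " ".join(f"{w:04x}" for w in words)

print(phandle)       # 29
print(hexdump_view)  # 0000 1d00
```

Reading the property with `hexdump -C` or `xxd` avoids the word swap, because both print byte by byte.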
02:00robher: karolherbst: Jon Hunter had reported an issue on some Tegra platform with this patch too. IIRC, it was a different failure though. He was going to investigate, but I never heard more. Seems like the bootloader is doing DT mods that aren't valid.
02:01karolherbst: robher: I could dump the entire bootloader log as well if that helps
02:01karolherbst: probably not though
02:04robher: karolherbst: a comparison of the dtb in storage and what's provided to the kernel would be good. It's in /sys/firmware/devicetree/fdt IIRC.
02:04robher: karolherbst: have to go now. Will check on this in the morning.
02:05karolherbst: robher: mhhh....
02:06karolherbst: I can extract the dtb from the partition directly actually... mhh, maybe I should also just update the firmware as well
02:41airlied: uggh have to run invocations per primitive, not primitives per invocation
02:42imirkin: i think the idea is for it to be done in parallel
02:42imirkin: the max invocations only has to be like 32
02:42imirkin: also if you batch primitives per invocation, you end up with non-deterministic order
02:43imirkin: which can affect overdraw as well as like atomic stuff, not to mention xfb
02:43airlied: yeah llvmpipe batches input prims into 4-wide vectors
02:44airlied: and invocations were wrapped around that
02:44airlied: but as I've discovered that is the wrong order :-P
02:44imirkin: SoA vs AoS again
02:44airlied: I can probably change it to batch per invocation rather than primitive if there are invocs
02:44airlied: for now I've hacked it :-P
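The ordering issue discussed above comes down to which loop is the outer one. The sketch below is purely illustrative: the function names are hypothetical and it does not model llvmpipe's 4-wide batching, only the order in which (primitive, invocation) pairs would be emitted.

```python
def invocations_per_primitive(num_prims, num_invocations):
    """Order airlied says is needed: finish every invocation of one
    primitive before moving on to the next primitive."""
    return [(p, i) for p in range(num_prims) for i in range(num_invocations)]


def primitives_per_invocation(num_prims, num_invocations):
    """Order obtained when invocations are wrapped around a batch of
    primitives: every primitive runs invocation 0, then invocation 1, ..."""
    return [(p, i) for i in range(num_invocations) for p in range(num_prims)]


print(invocations_per_primitive(2, 2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(primitives_per_invocation(2, 2))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Both orders do the same work, but anything that observes emission order (transform feedback, atomics, overdraw) can see the difference.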
02:44airlied: now bgra why do you hate me
02:45imirkin: i expect the feeling is mutual?
03:02airlied: totally :-P
03:11airlied: uggh the bgra fails seem to be some sort of 32-bit float to 8-bit fail
03:15HdkR: That fails with bgra and not rgba? Wacky
03:21airlied: HdkR: yeah it's a bit strange I wouldn't see fails elsewhere
03:21airlied: HdkR: the test does a float -> float comparison with no epsilon
03:23HdkR: Which tests are these? :D
03:24airlied: HdkR: the private GL conformance ones
03:24HdkR: ah right
03:26airlied: src 0 0 0.0156862754 0.0313725509 0.0470588244 0.0627451017 0.0784313753 0.0941176489 0.109803922
03:26airlied: src 0 0 0.0156862754 0.0313725509 0.0470588282 0.0627451017 0.0784313753 0.0941176564 0.10980393
03:26airlied: spot the difference :-P
03:26HdkR: whew, that's a small amount
03:27airlied: yeah 3d40c0c1 3d40c0c2 in hex
03:27HdkR: Love being 1ulp off
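The two hex values quoted above can be checked by reinterpreting them as IEEE 754 single-precision floats. A minimal sketch, where `f32_from_bits` is a helper name made up for this example:

```python
import struct


def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


a_bits, b_bits = 0x3D40C0C1, 0x3D40C0C2
a, b = f32_from_bits(a_bits), f32_from_bits(b_bits)

# Adjacent bit patterns with the same sign and exponent are exactly 1 ulp apart.
print(b_bits - a_bits)      # 1
print(b - a == 2.0 ** -28)  # True: 1 ulp at this exponent is 2**-28
print(f"{a:.9g} {b:.9g}")
```

An exact float comparison fails on a difference this small, which is why such tests usually compare with an epsilon or an ulp tolerance.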
04:13vivijim: Lyude: mdnavare: thanks for reporting and sorry for the longlong delay... drm-tip is compiling again
04:13vivijim: airlied: ^
04:21airlied: vivijim: thx!
10:00danvet: bnieuwen1uizen, can you try to move another one?
10:01danvet: since I just moved the one you couldn't move
10:01danvet: ajax, didn't you get Reporter rights for drm for issue moving and stuff?
10:02danvet: hm we added mesa
10:03danvet: bnieuwen1uizen, just realized that you should have Reporter access
10:03danvet: since we added the entire mesa group as Reporters
10:03danvet:no idea what's not working there
10:33bnieuwenhuizen: danvet: my suspicion was that it was attachment not moving because I found some historical gitlab bugs, but if you succeeded then that obviously is not the issue (anymore)
10:34bnieuwenhuizen: danvet: https://gitlab.freedesktop.org/mesa/mesa/-/issues/930 fails for me still
10:35danvet: bnieuwenhuizen, should I try to move that one too?
10:36danvet: or want to update the fd.o issue with that one and hope
10:39danvet: bnieuwen1uizen, as a test gave you developer rights for drm/amd, can you try whether that helps?
10:52MrCooper: bnieuwen1uizen: are you trying with the "Move issue" UI, or with "/move drm/amd" in a comment?
11:34pq: Is there a drm.debug category that shows me the pageflip events DRM sends to userspace?
11:35pq: I can't see anything in 0x1ff that would be relevant.
11:36pq: hm, 0x2 results in no messages at all
11:40cwabbott: jekstrand: so, I just found out that the qualcomm load ubo instruction takes an indirect argument that's in units of 4 dword's, and there's a dword immediate offset/size within the instruction if you only want to load part of a vec4
11:41cwabbott: so, it's basically oriented towards the std140 alignment model
11:42cwabbott: I think that I could reverse-engineer the immediate offset using ALIGN_OFFSET, but reverse-engineering the original indirect offset from the byte offset we get is trickier
11:46cwabbott: we could just subtract ALIGN_OFFSET and then divide, but then we'd still be left with an extra "& 0x3fffffff" at the end due to how we throw away the high bits of the offset when converting to bytes in spirv_to_nir
11:47cwabbott: and I'm not sure we'd be able to optimize away the mess in the more complicated cases
11:49cwabbott: ideally we'd just have the vec4 offset handed to us
11:51emersion: speaking of drm.debug, i'd really love something that just shows the reason why an atomic commit fails
11:51cwabbott: if we don't have that, then I guess we'd have to emit no_unsigned_overflow when generating the address arithmetic, and use it later
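The leftover "& 0x3fffffff" can be seen with plain integer arithmetic. This sketch assumes the conversion to bytes is a 32-bit multiply by 4 that wraps, which matches the mask cwabbott quotes; the function names are hypothetical and ALIGN_OFFSET handling is not modeled.

```python
MASK32 = 0xFFFFFFFF


def to_bytes_u32(dword_offset: int) -> int:
    """Assumed conversion: multiply by 4 in 32-bit arithmetic, which wraps."""
    return (dword_offset * 4) & MASK32


def split_byte_offset(byte_offset: int):
    """Split a byte offset into (vec4 index, dword within that vec4)."""
    return byte_offset // 16, (byte_offset % 16) // 4


x = 0xC0000005                   # top two bits set on purpose
roundtrip = to_bytes_u32(x) >> 2

# The multiply drops the top two bits, so dividing again does not give x
# back; it gives x & 0x3fffffff, which is the mask the optimizer is left with.
print(hex(roundtrip))                  # 0x5
print(roundtrip == (x & 0x3FFFFFFF))   # True
print(split_byte_offset(52))           # (3, 1)
```

If the multiply were known not to overflow, the shift pair would cancel and the mask would disappear, which is the role no_unsigned_overflow would play.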
12:03jvesely: daniels: does this look OK to you? https://github.com/jvesely/llvm-project/commits/libclc-win
12:03jvesely: I'm just waiting for CI to finish before pushing
12:13daniels: jvesely: yep, that lgtm, thankyou!
12:31cwabbott: jekstrand: the other thing is, our load/store instructions all take a "bindless base" (descriptor set) and descriptor index, and do the offset math + descriptor load themselves, so we can't use nir_lower_explicit_io as it stands
12:34cwabbott: and, of course, the descriptor set is an immediate embedded in the instruction
12:36cwabbott: which means that we can't use any addressing format other than logical, as we need to know the descriptor set statically at the load/store instruction
12:37cwabbott: and I don't think that nir_lower_explicit_io is currently setup to handle the logical format
12:38cwabbott: seems like for SSBO's we have a similar problem, and the current solution is to emit the shift and just take the hit
12:43cwabbott: I wonder if we shouldn't just fork nir_lower_explicit_io and write our own version that does what we want
12:49cwabbott: it just sucks that then we won't be able to use the load/store vectorizer
13:11cwabbott: ugh, actually nothing even sets a non-zero value for align_offset now
13:37cwabbott: yeah, I just can't see a way to get what we actually want without rewriting a lot of the i/o lowering code
13:39cwabbott: it's just not set up for hardware with more... peculiar restrictions
14:03MrCooper: danvet: hmm, I moved https://gitlab.freedesktop.org/drm/misc/-/issues/15 last week, but now I can't seem to move https://gitlab.freedesktop.org/mesa/mesa/-/issues/2764 to drm/misc either; maybe the Mesa group membership isn't active anymore for some reason?
14:04danvet: MrCooper, let me crank up your permissions a bit for testing
14:05danvet: MrCooper, hm what's your gitlab nick?
14:05danvet: ah found it
14:05danvet: MrCooper, try again?
14:06danvet: oh crap wrong project
14:06MrCooper: worked now
14:06danvet: I changed nothing!
14:06danvet: not in drm-misc at least
14:07MrCooper: just a fluke then?
14:07danvet: I have no idea
14:08danvet: daniels, ^^ we seem to have some people having trouble with moving issues ...
14:08danvet: hwentlan, did you see my nagging about VRR and async?
14:09danvet: could be I'm wrong on it, since I have no idea how exactly amd hw works for vrr
14:09MrCooper:not having much luck getting any response from DC folks lately...
14:12danvet: MrCooper, well you lost your magic @amd.com powers
14:12danvet: airlied and me have been bickering about the amd firewall since years :-)
14:15imirkin: welcome to ... the other side
14:16MrCooper: there's no "AMD firewall", people on my former team like agd5f, pepp, mareko_ and Christian are always responsive
14:18daniels: danvet, MrCooper: please file an issue on freedesktop/freedesktop about moving issues with some details
14:19MrCooper: ugh, sorry but I'm in the post-Easter swamp and don't care about it all that much right now; bnieuwen1uizen maybe?
14:19danvet: MrCooper, we've not complained about the old hats ...
14:20danvet: daniels, bnieuwen1uizen already filed one, but the issue he's blocked on is now moved because I pressed the button for testing
14:20danvet: and it worked for me
14:26karolherbst: robher: do you want me to send you the two dtb files or what's the best way to continue debugging this?
14:28robher: karolherbst: yes, please send them to me.
14:31karolherbst: robher: sent
14:49jekstrand: cwabbott: *ugh*
14:49jekstrand: cwabbott: As far as explicit_io goes, you should be able to just add a new address format that is a vec3 of (set, binding, offset) and trust constant folding to give you an immediate for set
14:50jekstrand: As far as align_mul/offset goes, yes, nothing currently sets them. However, that's mostly because I'm lazy and didn't bother figuring out how to do any better because I didn't care.
14:50agd5f: danvet, MrCooper many of the DC guys have been tied up on a special project recently so they may be a bit slow to respond.
14:50jekstrand: In theory, we should be able to determine that the alignment is actually 16B for std140 UBOs.
14:52jekstrand: cwabbott: It'll just take more effort in nir_lower_explicit_io. For cases where we can chase the whole chain, we should be able to get perfect alignment information.
14:52jekstrand: Variable pointers makes things substantially harder because we don't currently have an alignment on cast derefs to tell us where to re-start with pointers.
14:53jekstrand: However, variable pointers is only for SSBOs IIRC
14:54cwabbott: the annoying part isn't just the alignment, though... it's also that the units themselves have to be vec4
14:55cwabbott: or you have to trust nir to cleanup the entire deref chain
14:56cwabbott: which at the moment it can't do, because it doesn't record that the multiplications can't overflow and afaik can't use that to clean it up
15:03jekstrand: cwabbott: Right. However, a ">> 2" isn't going to kill you. My #1 concern is correctness.
15:03jekstrand: cwabbott: That said.... you could make an address format that's (set, binding, vec4 offset, byte offset)
15:03jekstrand: or something
15:04jekstrand: or (set, binding, array offset, const offset)
15:04jekstrand: that sounds more practical
15:04robher: karolherbst: I pinged Jon about the issue. Definitely the bootloader creating a broken dtb.
15:04jekstrand: Where array_offset could be in units of vec4s if you assume std140
15:04robher: karolherbst: and Cc'ed you.
15:10karolherbst: robher: ahh okay :)
15:15karolherbst: thanks for looking into this, btw!
15:37danvet: j4ni, [RFC 4/6] drm/bridge/sii8620: fix extcon dependency <- my apologies for the cc
15:55mripard: siqueira: I've been looking at your v7 for the writeback support in IGT, but I couldn't tell from the archive why it hasn't been applied?
15:56emersion: eheh, i guess i still need to review it
16:02hwentlan: danvet: no, didn't see the nagging. apologies for not being very responsive these days
16:02danvet: hwentlan, oh I only nagged today after mdnavare asked some questions over w/e
16:02danvet: could very well be that I'm not understanding the full magic of amd DC hw
16:03danvet: but at least from just reading the code I'm worried we might have a problem
16:10emersion: is the CRTC gamma_lut supposed to be applied to all planes, including the cursor plane?
16:10emersion: it seems like radeon doesn't apply the gamma_lut to the cursor plane while i915 does
16:11imirkin: emersion: unclear
16:11imirkin: there was a proposal to have a per-plane lut property
16:12imirkin: i don't think it was ever merged
16:12imirkin: traditionally, in hw, gamma doesn't apply to cursors. however more recent hw has made things more configurable, i think.
16:16emersion: i see
16:16emersion: and there's no way for user-space to figure that out
16:21tlwoerner: ajax: the application period for google SeasonOfDocs is now open
16:22tlwoerner: ajax: one of the first things we'll need is an "ideas" page, I've taken a stab at a first draft here: https://www.x.org/wiki/SeasonOfDocsIdeas
16:23tlwoerner: we'll need to add more ideas and mentors if we want to make a serious application
16:23tlwoerner: i'll try to flesh out the "about" section with more details/information
16:24tlwoerner: if anyone is curious... we're hoping to consolidate the Xorg/fdo documentation into something more consistent and comprehensive (one voice, one format, etc)
16:24daniels: dcbaker: hey, what was the rationale for https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/meson.build#L1409 ?
16:25tlwoerner: similar to google's GSoC, google also now has a "SeasonOfDocs" program where google will pay someone to work on your project's documentation
16:25daniels: dcbaker: i had to rebuild my Windows machine, and after following the exact same steps as last time, the Mesa/Meson build now fails to find LLVM using CMake, but works if I force the method to config-tool ...
16:26dcbaker: daniels: cmake always statically links, and llvm only builds statically on cmake, and a lot of times llvm on windows doesn't have llvm-config
16:26dcbaker: with meson 0.54 we can stop setting it
16:26danvet: sravn, 20200408121333.GM3456981@phenom.ffwll.local <- did you see that reply from me?
16:27daniels: dcbaker: neat, so >= 0.51 && <= 0.54 would work?
16:27dcbaker: I fixed the cmake dependency for llvm
16:27daniels: *< 0.54
16:27daniels: hmm, maybe your fix is why I can't build it anymore :P one of the things that came with the rebuild was upgrading from Meson 0.53 to 0.54
16:28dcbaker: with 0.54 cmake will bail if you ask to link dynamically
16:28dcbaker: you can't link llvm dynamically on windows
16:28daniels: I have --default-library=static
16:28daniels: i know, and I disabled the dylibs
16:29dcbaker: do you have -Dshared_llvm=false
16:30dcbaker: which it looks like we need to fix, since it defaults to a value that can never work on windows...
16:30dcbaker:has patches to write
16:30daniels: oh hm
16:31danvet: sravn, also did you intentionally skip "[PATCH 07/44] drm/vboxvideo: Use devm_drm_dev_alloc"?
16:34daniels: dcbaker: how about https://static.fooishbar.org/shared-llvm-auto.diff ?
16:35daniels: dcbaker: sorry, F5
16:37dcbaker: you could simplify that to `if shared_llvm == 'auto'\n _shared_llvm = host_machine.system() != 'windows'`, otherwise yeah
16:38daniels: oh, does 'true' and 'false' auto-evaluate to bool?
16:39sravn: danvet, mripard, robher: cherry-picked three panel bindings fixes and pushed to drm-misc-fixes. I hope it is done in the right way..
16:40danvet: sravn, uh cherry-pick is kinda only for if you misplace it
16:40danvet: if it should go to drm-misc-fixes, just push it there
16:40daniels: dcbaker: mm, it doesn't, but updated it now - will pop it into an MR as well
16:40danvet: per the flowchart thing
16:40dcbaker: daniels: no, i mean https://paste.centos.org/view/3144b369
16:41danvet: sravn, but it's pushed now, so *shrug*
16:41daniels: dcbaker: yeah, if you reload, that exact thing is there now
16:42sravn: danvet: "20200408121333.GM3456981@phenom.ffwll.local" - this is the reply where you told me to polish my glasses (dev != pdev). But I did not look too close at the patches. Can take an extra look
16:42danvet: sravn, doesn't need to be, just wanted to check that you've seen it and are ok
16:43danvet: and that I didn't miss something on my side
16:45sravn: danvet: two of the fixes were already in drm-misc-next for some time. The last was pushed to drm-misc-next but yes, should have pushed direct to drm-misc-fixes. Will know better next time
16:45sravn: danvet: I think I skipped vboxvideo because someone else had looked. Thomas IIRC.
16:46danvet: ah if they're already there then yeah just cherry-pick over
16:46danvet: sravn, you skipped only 1 of the 4 patches
16:47danvet: anyway I'll resubmit soon and then I can ping on the remaining patches for some acks
16:47danvet: plus I already have some more :-)
16:50dcbaker: daniels: go ahead and mention me, that seems like a good fix. We should also backport that to 20.0
16:51sravn: danvet: fine, I will wait for the resubmit and check the patches I have not yet acked
16:51sravn: danvet: some more, sounds good!
16:53daniels: dcbaker: cool cool, thanks
17:43Lyude: danvet: btw-do you think the kthread_worker stuff that Tejun mentioned on the lkml looks acceptable? I've been working on reimplementing the vblank workers with it and I'm pretty sure it's exactly what we need (also, using this definitely makes rescheduling and flushing a lot easier)
18:16jstultz: danvet: anholt : So is this just a shared gitlab instance issue, or is there something I need to dig into that the drm_hwc project isn't doing right? https://gitlab.freedesktop.org/drm-hwcomposer/drm-hwcomposer/-/issues/32
18:20danvet: Lyude, msm already uses that for what looks very much like a vblank worker
18:22danvet: well ok not vblank, but some encoder-level frame event thing
18:22danvet: anyway if it fits, sounds good
18:23danvet: jstultz, it happens occasionally since the cleaning isn't fully automated
18:23danvet: re-run the pipeline and pipe up on #freedesktop about which machine needs to be kicked
18:24jstultz: danvet: ok. i just wanted to make sure there wasn't anything wrong with the drm_hwc test (which is really just checking patch style and not much else)
18:25danvet: well maybe you want to look into the recommended ci templates and stuff
18:25jstultz: danvet: thanks. i'll check that out
18:26danvet: jstultz, also on the job page you can see which machine is hurting
18:26danvet: so just go over to #freedesktop with that and ask for someone to deliver a swift kick
18:27jstultz: danvet: jobs page? sorry, the reporter didn't give a specific instance, just a general notice he was seeing failures
18:27danvet: look closer
18:27jstultz: danvet: ah, under the pipeline.. ok found it
18:28jstultz: danvet: ok it looks like it was fdo-packet-m1xl-4 (#555)
18:28danvet: if you look at details on https://gitlab.freedesktop.org/andrii82/drm-hwcomposer/pipelines/132015/builds
18:28danvet: the one that succeed was run on a different box
18:30jstultz: right. ok, thanks for walking me through that. :)
18:39mdnavare: danvet: Yes still not clear why AMD uses the DRM_MODE_PAGE_FLIP_ASYNC flag for testing vrr in IGT, hwentlan?
18:41mdnavare: danvet: vsyrjala: Do you think we need to support asynchronous flips on i915 atomic driver for VRR to be working?
18:43mdnavare: danvet: vsyrjala: My understanding was that with the regular atomic page flips (synchronous), if we enable VRR in the driver, it will automagically either send the push and flip the frame right away or at the flip decision boundary or at the max vblank so we really should be okay using just the non blocking synchronous flip requests from userspace, thoughts?
18:50danvet: mdnavare, nicolas replied with explanations and all on the igt patch thread, I replied too
18:50danvet: it's just igt issues that we should improve
18:51danvet: no need to implement FLIP_ASYNC
18:51danvet: mdnavare, probably best we continue the discussion on the igt
18:51danvet: and for the igt make sure that it still works on amd hw, I'm sure nicolas/harry are happy to test patches on their hw
18:52danvet: mdnavare, but yeah summary is we're fine, just need some igt work
18:56mdnavare: danvet: So from the kernel pov vrr flip request should just happen with regular synchronous non blocking page flip ioctl?
18:56danvet: mdnavare, yup
18:57danvet: just normal atomic commit, nothing special
18:57mdnavare: danvet: cool
18:57danvet: but we need to improve the igt, so there's still work
18:57mdnavare: danvet: and where did nicolas respond?
18:57danvet: igt thread
18:57danvet: you're cced too
18:57mdnavare: okay lemme look that up
19:03mdnavare: danvet: Btw thanks a lot for clarifying things up on the async and vrr together on the igt thread
19:19hwentlan: danvet, mdnavare, yes, we can test the IGT patches on our HW when they're ready
19:50sravn: robher: thanks for all the feedback!
19:52robher: sravn: Thanks for throwing a few issues in to make sure I was paying attention. ;)
20:03alyssa:wonders how much floating point guts to expose to NIR
20:03alyssa: Bifrost has versions of ffma/fadd that let you bias the exponent
20:04alyssa: Intended for special function reductions, but I can imagine opts for things involving ('exp2', 'a(is_integer)') -- no clue if that pattern shows up in real shaders, though
20:05jekstrand: Do we ever support writing system values in mesa? I don't think so. I think writes are always outputs.
20:05airlied: hmm definitely haven't found my memory leak, cts runner is using 32GB of RAM, screw you cts-runner I got 64GB :-P
20:05alyssa: jekstrand: Not that I know of
20:07cwabbott: alyssa: iirc I actually got the compiler to use those instructions for something like "a * b * 2.0"
20:07imirkin: alyssa: fyi, nvidia has that too
20:07cwabbott: I bet that one shows up more often :)
20:07imirkin: powers of 2 between -3 and 3, iirc
20:08cwabbott: imirkin: this one is for any power of 2 :)
20:08imirkin: sure, but that's just more bits for that extra exponent
20:08imirkin: the mul already renormalizes, what's an extra direct add to the exponent
20:08cwabbott: primarily meant for special-purpose things like implementing exp2 and so on
20:08alyssa: cwabbott: That particular sequence isn't fusing in the blob here but the blob isn't always, uh... ahem.
20:09alyssa:has been implementing exp2/log2 this week, always fun
20:09cwabbott: alyssa: I'd implement those sorts of things directly in the bifrost compiler
20:09alyssa: That's the plan
20:10cwabbott: no point exposing your "here's the table o' magic stuff" instruction to NIR
20:10alyssa: But the _MSCALE bits might be a bit more general
20:11alyssa: (Aside - the algo for log2 on newer bifrost is a little funny, not sure if you were following scrollback but for x normalized to [0.75, 1.5), they have one op to generate (x - 1) and the other to do log2(x)/(x-1) and a third to multiply
20:12alyssa: If it's a polynomial approx I guess it makes the Taylor/etc series a little simpler, still a little funny to me :) )
20:13cwabbott: is the log2(x)/(x-1) one one of those special bi-slot ops?
20:14krh: jekstrand: I have a r/w ir3 sysvalue
20:14jekstrand: krh: :(
20:14cwabbott: like the new rcp one
20:14alyssa: cwabbott: Interestingly, no, it's ADD alone and they do the x-1 on the FMA slot
20:14alyssa: (For exp2, they do use a bi-slot one)
20:14krh: jekstrand: actually, it's just marked an output at ir3 level so that it doesn't get clobbered by the shader
20:15alyssa: Opcode neighbors the usual _TABLE ops
20:15krh: jekstrand: it's delivered in the vs, but has to be preserved in the same register if we chain to gs or tcs
20:23cwabbott: alyssa: the ADD slot one takes (x - 1) as an argument, right?
20:24cwabbott: log(x)/(x - 1) seems to have a "nice" taylor series around 1
20:24alyssa: cwabbott: Nope, it takes x directly, normalizes it, and then computes log2(x) / (x - 1)
20:25alyssa: and indeed, it is a "nice" series, in the sense the division knocks the degree of log(x)'s series down
20:25cwabbott: alyssa: maybe there's another argument you missed that's using the bypass source?
20:25alyssa: cwabbott: I don't think so, I'm testing the ops individually.
20:26cwabbott: actually, I guess that makes sense... since the mantissa bits give you an offset from 1
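The three-op log2 scheme alyssa describes can be sketched numerically. Everything here is an assumption for illustration: `math.log2` stands in for whatever table or polynomial the hardware op uses, the function names are made up, and adding the exponent back at the end is inferred rather than stated in the chat.

```python
import math


def normalize(x: float):
    """Split x > 0 into m * 2**e with m in [0.75, 1.5)."""
    m, e = math.frexp(x)  # m in [0.5, 1)
    if m < 0.75:
        m *= 2.0
        e -= 1
    return m, e


def log2_over_xm1(m: float) -> float:
    """Stand-in for the ADD-slot op: log2(m) / (m - 1)."""
    if m == 1.0:
        return 1.0 / math.log(2.0)  # limit of the ratio as m -> 1
    return math.log2(m) / (m - 1.0)


def log2_three_ops(x: float) -> float:
    m, e = normalize(x)
    xm1 = m - 1.0             # op 1: x - 1
    ratio = log2_over_xm1(m)  # op 2: log2(x) / (x - 1)
    return e + xm1 * ratio    # op 3: multiply (plus the exponent, assumed)


for x in (0.1, 1.0, 3.0, 10.0, 1000.0):
    print(x, log2_three_ops(x), math.log2(x))
```

Near m = 1 the ratio stays close to 1/ln(2) and varies smoothly, so it is an easier function to approximate than log2 itself, which fits the "nice series" remark above.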