00:05alyssa: jenatali: VK spec is a little ambiguous but I think you're probably ok
00:06jenatali: Yeah? Atomics in a TCS happening N+1 times because they're on both the critical path to patch constants and control points?
00:06alyssa: My take is that an app doing that is in UB
00:06alyssa: in VK
00:06jenatali: Oh cool
00:07alyssa: in GL, the spec does.. not support that reading
00:07alyssa: but given it's relaxed in VK let's call it a GL spec bug and move on ;)
00:17alyssa: mareko: actually, that raises a good question - do we expect this opt to benefit every proton game using tess, because it undoes the tcs lowering in dxvk/vkd3d-proton?
00:18alyssa: I assume you had a gl workload in mind tho
04:47nowallchanged: a little elaboration on the technique: 256+115+72=443 where as base is 256 index 115 and value 72, similarly 256+120+70=446, so 443-115-256=72 512-72=440, now where 26 and 16 come from, 187 and 190 come from 443-256 and 446-256, where as 184 and 186 come from 1024-840 and 1024-838, now also 26 and 16 is 443+397 i.e last inverse bounded index is 883 and 446+392 838, so 883+115 subtract
04:47nowallchanged: from 1024 is 26 1024-888-120 is 16, so it's in place algorithm, now constants , all arithmetic is done by those . It's pseudo code but many opportunities on how bisg banks as wished, lot more compact for wire protocol than var-uint.
04:57nowallchanged: distance is another term and that comes as 512-443 is 69, the method is very simple, there is also -69, and the vector procedures work without flow control, you build a performant compute alus this way, but this is all very long story, the compute paradigm works intact of data access . This paradigm was the last i worked on, it is for filesystems.
04:58nowallchanged: and this all is for ensurance that the world of science did not make any mistakes on the number systems or numeral systems, it's the most brilliant way to present digits as it is done now.
05:05nowallchanged: if there is a need , i explain how all the system works by specification, commodity hw is very very performant and low-power capable from sw tweaks.
05:13nowallchanged: you'll get a clue very quick from those talks, but artemis project to visit the moon , would give you glimpse as to how powerful ar drones and such with that tech, and as to how the universe really works and radio waves and such, unfortunately estonians do not have air force, swedes and finnish have though, last are our ethnic groups just like us very close, same clan.
05:30nowallchanged: it's literally yes the same tribes which came from central europe and dislocated to this area, had a punch of wars with other nations which today do not exist, because we finished them off in the region, which is also quite brutal and perhaps does not fit to modern age anymore.
05:30nowallchanged: bunch
05:37nowallchanged: And joss is computer systems engineer the abuse from my dad and their friends made me even wiser if not stronger.
05:39nowallchanged: At the moment with oak fragments or javel we can not go against russians so lots of new military establishing is needed , and a lot of it comes down to electronics too, other part is explosives and air force and such.
05:41nowallchanged: it took 40 years to replace the the guns in the army, now they are as lethal as possible
05:41nowallchanged: those are infantry guns
05:55nowallchanged: so technically no more talks by is planned in front of you, i am busy to cover my micro debts as well as to recover from injuries, and i do program a bit too, linux systems has been feeding me well, but there is modernizing needed to be started , otherwise linux is ok, but this bloated method set it contains is only a start , package managements are solid, and work is not too complex.
05:58nowallchanged: the shortcomings of software and very inflexible hardware like r300 as well as nv34 and gma945 that i had covered now, just taught me much, i could tell right away that there must be a way to run them well, but the puzzle lasted for decades as to how exactly
06:02nowallchanged: the code materializes very quickly now when moores law is dead officially, it's only a matter of time when you also start to look back to clean up the old next to new systems, we are doing fine there entirely.
06:03K900: My guy no one cares please just stop
06:04nowallchanged: yeah i stop, you are a monkey i have no time for anyways
06:05nowallchanged: it's an official brain fart and sausage jam you are at , if you come near my locations again i fire bullets
06:06nowallchanged: i treat all of you with respect you deserve and this respect is like for laura keskinen you get crippled entirely
06:07nowallchanged: good bye, never in my hotels you come to scam again, you got it? we invested million dollars to there, and business is busted cause of such people. But the idea was not mine, those dad friends are as retarded as you.
06:08nowallchanged: cheers.
06:09K900: Wow I can't believe that actually worked lol
06:21Ermine: hopefully
06:50airlied: dwfreed: ^
07:30kode54: wow, a /24 from kazakhstan this time
09:51dolphin: airlied, sima: Sent the drm-intel-next-fixes
12:36alyssa: glehmann: do you know off hand how ishl.nuw and ishl.nsw are defined in NIR?
12:37alyssa: i'm trying to fix my address mode optimization, the problem is that the hardware does extend-then-shift but GL things will do shift-then-extend. those are equivalent if you know that the shift won't overflow (because the API level buffer is <4GiB)
12:37alyssa: so I think I need to teach glsl-to-nir to annotate things as .nuw or something
12:39alyssa: ditto for imul.nuw I guess
12:39alyssa: lower_uniforms_to_ubo, for example, does a 32-bit imul_imm which is... difficult to see through
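
A minimal sketch of the equivalence alyssa describes, modelling 32-bit and 64-bit registers with Python integers and masks. The function names are hypothetical and are not NIR or hardware identifiers; the point is only that shift-then-extend and extend-then-shift agree exactly when the 32-bit shift does not overflow.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def shift_then_extend(index: int, shift: int) -> int:
    """Shift in 32 bits (wrapping), then zero-extend to 64 bits."""
    return ((index & MASK32) << shift) & MASK32


def extend_then_shift(index: int, shift: int) -> int:
    """Zero-extend to 64 bits first, then shift in 64 bits."""
    return ((index & MASK32) << shift) & MASK64


def shift_fits_in_32_bits(index: int, shift: int) -> bool:
    """True when no set bit is shifted out of the 32-bit value."""
    return ((index & MASK32) << shift) <= MASK32


# Small offset: both orders agree.
small = 0x1000
assert shift_fits_in_32_bits(small, 2)
assert shift_then_extend(small, 2) == extend_then_shift(small, 2)

# Overflowing offset: the two orders diverge.
big = 0x40000000
assert not shift_fits_in_32_bits(big, 2)
assert shift_then_extend(big, 2) != extend_then_shift(big, 2)
```

If the compiler can prove the no-overflow condition (for example because the buffer is known to be smaller than 4GiB), swapping the order is safe; otherwise it is not.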
12:45Venemo: alyssa: what kind of tcs lowering is there in dxvk / vkd3d-proton? I was unaware of that.
12:47Venemo: alyssa: I think the main benefit of such an optimization would be to games that use tessellation for primitive culling. it would reduce memory accesses done by VS+TCS, except for those that are not used for calculating the tessellation levels, and would allow us to use AMD's "shortcut" instruction to set all tess levels to zero or one instead of a memory write, for tess level outputs.
12:47Venemo: I think that apps / games that cull a lot of primitives using tessellation, would be able to do so faster with this optimization
12:52glehmann: alyssa: nsw/nuw in NIR are the same as SPIR-V. For ishl this means undefined behavior if any of the bits shifted away are non-zero (for nuw) / not the same as the sign bit (for nsw), or if the sign bits of the source and result are different for nsw
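
The conditions glehmann lists can be written as two predicates. This is a sketch of the description in the message above for a 32-bit ishl, not the NIR or SPIR-V specification text; the helper names are hypothetical, and the shift amount is assumed to be in the range 0..31.

```python
BITS = 32
MASK = (1 << BITS) - 1


def ishl_nuw_ok(x: int, shift: int) -> bool:
    """nuw holds if every bit shifted away is zero."""
    x &= MASK
    return (x << shift) >> BITS == 0


def ishl_nsw_ok(x: int, shift: int) -> bool:
    """nsw holds if every bit shifted away equals the sign bit and
    the sign bit of the result matches the sign bit of the source."""
    x &= MASK
    sign = x >> (BITS - 1)
    result = (x << shift) & MASK
    shifted_away = x >> (BITS - shift)           # the top `shift` bits
    all_sign = ((1 << shift) - 1) if sign else 0  # `shift` copies of sign
    return shifted_away == all_sign and (result >> (BITS - 1)) == sign


assert ishl_nuw_ok(1, 31)               # 0x80000000 still fits unsigned
assert not ishl_nsw_ok(1, 31)           # but the sign bit flipped
assert ishl_nsw_ok(0xFFFFFFFF, 4)       # -1 << 4 == -16, no signed wrap
assert not ishl_nuw_ok(0xFFFFFFFF, 4)   # set bits were shifted away
```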
12:52alyssa: Venemo: I haven't looked at the dxvk/vkd3d-proton side, but hull shaders are split up in HLSL so the lowering is happening either in proton or in microsoft's compiler. and jenatali's comments imply the former
12:53alyssa: Venemo: makes sense. is that a common use of tessellation? sorry I just started working on AAA titles, like, Tuesday.
12:54alyssa: glehmann: cool, thanks. that should work
12:54alyssa: not looking forward to plumbing nuw bits throughout the GLSL stack but it's ... fine
13:00glehmann: why would the GLSL shift be nuw?
13:01Venemo: alyssa: I haven't verified that myself, but it sounds like that is a use of tessellation, yeah, unless I misunderstood what mareko was saying.
13:03alyssa: Venemo: alright
13:04alyssa: glehmann: because when glsl-to-nir is emitting load_ubo's and things, it knows that the UBO/SSBO is less than 4GiB
13:04alyssa: (unlike spirv-to-nir, grown up APIs allow bigger buffers potentially)
14:04glehmann: isn't that done in nir_lower_io, not glsl to nir?
14:10mareko: alyssa: Unigine Heaven culls patches that are outside the viewport by setting tess levels to 0, AMD in the past have advised game developers to cull such patches in TCS
14:14alyssa: glehmann: yeah, I guess so
14:14alyssa: more knobs (:
14:14alyssa: mareko: gotcha. makes sense, thanks
14:14alyssa: yeah, such a pass seems reasonable then
14:15mareko: alyssa: TCS outputs are passed to TES via memory; if TCS outputs are written for culled patches, that's wasted bandwidth
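
A rough model of the culling pattern mareko describes, written in Python rather than shader code. The viewport test used here (all control points beyond the same left/right/top/bottom clip plane) is an assumption for illustration; the log does not say which test Unigine Heaven actually uses, and the function names are hypothetical.

```python
def patch_outside_viewport(clip_points) -> bool:
    """True if every control point (x, y, z, w in clip space) lies
    beyond the same side plane of the view volume."""
    plane_tests = (
        lambda x, y, z, w: x < -w,  # left
        lambda x, y, z, w: x > w,   # right
        lambda x, y, z, w: y < -w,  # bottom
        lambda x, y, z, w: y > w,   # top
    )
    return any(all(test(*p) for p in clip_points) for test in plane_tests)


def outer_tess_level(clip_points, wanted_level: float) -> float:
    """A tess level of 0 culls the patch; otherwise keep the level."""
    return 0.0 if patch_outside_viewport(clip_points) else wanted_level


on_screen = [(0.0, 0.0, 0.0, 1.0), (0.5, 0.0, 0.0, 1.0), (0.0, 0.5, 0.0, 1.0)]
off_screen = [(2.0, 0.0, 0.0, 1.0), (3.0, 0.0, 0.0, 1.0), (2.0, 1.0, 0.0, 1.0)]
assert outer_tess_level(on_screen, 4.0) == 4.0
assert outer_tess_level(off_screen, 4.0) == 0.0
```

Once the level is known to be 0 for a patch, any other TCS outputs written for it are never read by the TES, which is the wasted bandwidth the proposed pass would avoid.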
14:23alyssa: sure
14:57alyssa: vectorizing bounds check t_t
18:30karolherbst: do we have a pass somewhere which makes image load/stores to operate on a vec4 for the pixel data?
18:32karolherbst: mhh actually, I only need it for image_load
18:39mareko: karolherbst: what do you mean?
18:40mareko: image loads always return vec4
18:42zf: Hi! Does anyone know how I can get an equivalent of glXSwapBuffersMscOML() + glXWaitForSbcOML() for Vulkan?
18:43zf: VK_KHR_present_wait seems kind of like the right thing, but it seems very vague about when the wait is signaled, and the best language I can find suggests that it's signaled at "first pixel out" which is too early.
19:31jenatali: I think the people here probably don't have much overlap with our normal announcement channels, but this one specifically is probably relevant: Microsoft publicly announced that D3D is switching from DXIL to SPIR-V going forward
19:32mattst88: wow, interesting
19:34zf: (Perhaps I should bring my question to a mailing list, or somewhere else?)
19:36glehmann: jenatali: structured or unstructured spir-v?
19:36zamundaaa[m]: zf: what timing do you want, if not first pixel out?
19:36jenatali: glehmann: I seriously hope structured...
19:37zf: zamundaaa[m]: Last pixel out, I believe? Basically what I want to do is be able to XCopyArea() on the window's drawable and make sure I'm picking up the frame I just presented.
19:38zamundaaa[m]: Why would you use X11 to copy the window content when Vulkan can do that much more efficiently?
19:41zf: The way things are set up I can't just create a swapchain for the drawable I'm trying to copy to
19:42zf: Is that going to be necessary? It's not with GLX_OML_sync_control, but if it's necessary for Vulkan then I'll have to rework a lot of things
19:43zamundaaa[m]: I don't know too much about X11, but in any sanely designed API you wouldn't have to time a copy request exactly in between two frames to get it right
19:43zamundaaa[m]: That would be a very racy thing to do, to put it mildly
19:44zf: We're waiting for the first frame to complete before doing another present/swap
19:45zamundaaa[m]: If you wait for last pixel out, you'd very reliably render at half the refresh rate of the display with that
19:47alyssa: jenatali: spicy.
19:47zamundaaa[m]: Most likely what you actually need is first pixel out, and present with mailbox or fifo, so that the driver only swaps buffers for the next frame and not earlier
19:48zf: That's not quite the problem we're trying to solve unfortunately
19:50karolherbst: mareko: yeah.. except for OpenCL C image loads on `image2d_depth_t` which have scalar return values
19:51karolherbst: and they are also scalar in spirv
19:51karolherbst: but if rusticl is the only thing having to deal with it, I guess I can write the lowering, I was just wondering if anybody already had to
19:53karolherbst: jenatali: great news :)
19:54zf: This is for Wine. We're trying to do something that seems weird and roundabout, in creating a swapchain on an offscreen window and manually compositing it to an onscreen one. This is because Windows lets you create swapchains in certain circumstances that can't be translated directly
19:55zf: In this case we implement wglSwapBuffers() over glXSwapBuffersMscOML() + glXWaitForSbcOML() + XCopyArea()
19:55zf: If we need to virtualize the entire swapchain, well, so be it, but I was hoping that Vulkan would let us do what GLX does
19:58zf: I don't understand why, but I believe the GLX path lets us present at the original refresh rate? Note that we don't actually need the image to be "visible", we just need X11 to be able to see it, so it doesn't copy the image from the last frame
19:58zf: OML_sync_control has clear language that specifies pretty much this. I don't know what language if any Vulkan has for this
20:05linkmauve: zf, how are you going to make that work on Wayland? Wouldn’t it work on both to export the dmabuf you just rendered, map the buffer, wait until the sync thingy is signaled, then do the copy manually?
20:08zf: I haven't looked into Wayland, but I can say we don't necessarily need CPU access ourselves. Is that going to be functionally equivalent to XCopyArea()?
20:19zf: i.e. I don't know if XCopyArea() is itself going to do a download
20:21linkmauve: I don’t know anywhere near enough about Xwayland to be able to answer that.
20:28zf: If we have to download the image just to upload it again, that sounds a lot worse in theory, if not also in practice. This is of course a compositing problem, but...
22:45JoshuaAshton: zf: XCopyArea just results in glamor GPU work