01:22 DemiMarie: dwfreed: banhammer?
01:22 airlied: already done
02:05 alyssa: how it started: I'm going to work on agx for a few minutes
02:05 alyssa: how it's going: this is a fundamental bug in core NIR affecting every driver
02:05 alyssa: :clown:
03:12 idr: alyssa: I looked at that issue. That's... misery.
05:43 glehmann: how do I know if this is an acceptable change for the intel trace CI? https://mesa.pages.freedesktop.org/-/mesa/-/jobs/60404969/artifacts/results/summary/results/trace@gl-intel-kbl@ror@ror-default.trace.html
05:43 glehmann: there is a visual difference, but is it a bug or just a unrelated minor precision change?
05:46 glehmann: also the "Calculating the difference..." doesn't seem to work and the more info link is a 404
07:00 sima: tursulin, is the drm-intel-gt-next-2024-06-12 PR the one that didn't deliver to airlied's inbox?
07:07 airlied:doesn't have an inbox anymore :-)
07:07 airlied: but PR's need to get to lore
07:08 airlied: and match q = (s:\"pull\" t:\"airlied\" d:last.week..)
07:08 airlied: if I take a week off my plan fails :-P
07:09 tursulin: sima, airlied: I did not know it did not get delivered but someone just pinged me that it was missing
07:10 tursulin: https://lore.kernel.org/dri-devel/Zmmazub+U9ewH9ts@linux/
07:11 airlied: I have that in my lore list, I might have failed to process it on my end
07:13 sima: tursulin, hm I thought you've mentioned something that one of your pr bounced on airlied's gmail
07:13 sima: and thought maybe it was this one
07:13 tursulin: hmm don't think that was me
07:15 airlied: and it's all fine if a PR does bounce on my email, since I used to use patchwork and now I use lei to pull them
07:16 airlied: I think I lost it because I was hopping around drm-misc-next pulls to make sure v3d built without warnings and jumped over it
07:21 airlied: pulled it into my local tree now
09:10 jfalempe: I sent a patch for review that affects multiple subsystems https://patchwork.freedesktop.org/series/135356/
09:11 jfalempe: But it was taken in akpm-mm tree, and is now in linux-next/master, causing some build failure.
11:54 Hazematman: Hey all, I've been working on an MR to improve llvmpipe & lavapipe android support to work without kms_swrast, as well as improve the mesa documentation for android to include an out of tree build into an android image. I would appreciate any feedback on my MR https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29344 :)
12:15 alyssa: idr: (:
12:26 FL4SHK[m]: So if I write my own Vulkan driver, do I still benefit from a Mesa backend?
12:46 alyssa: yes
12:55 FL4SHK[m]: neat
12:56 FL4SHK[m]: where does the integration with Mesa come in?
12:56 FL4SHK[m]: I'm a little confused about that
12:57 FL4SHK[m]: I have a GCC port I've written most of for the CPU/GPU (they will have similar instruction sets) I'm developing
12:58 FL4SHK[m]: can I combine this with Mesa
12:58 zmike: DavidHeidelberg: are you back now?
12:59 Hazematman: FL4SHK[m]: Check out `src/vulkan` in mesa, there is a bunch of common code in the folder to help implement a lot of vulkan functionality. In your own driver you would call functions from that module.
12:59 Hazematman: Additionally you can look at the shader compiler backends for different platforms to see how they use common code for shader compilers in mesa. That kind of goes outside vulkan, but there is a lot of common code in `src/compiler/nir` for example.
12:59 FL4SHK[m]: I see
12:59 FL4SHK[m]: thanks
13:00 FL4SHK[m]: can I make use of my GCC port?
13:05 Hazematman: <FL4SHK[m]> "can I make use of my GCC port?" <- Not exactly sure what you're doing, but I assume you have a GCC port for a different cpu arch that you want to use in a cross compiler fashion? In that case yes, you just need to set up a cross compiler environment with meson. You can read this for instructions, but you should be able to set up a cross file that points to your custom gcc and use that to build mesa
13:05 Hazematman: https://docs.mesa3d.org/meson.html#cross-compilation
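[For context, a minimal meson cross file along the lines Hazematman describes might look like the sketch below. All paths and the `mygpu` architecture name are hypothetical placeholders for the custom GCC toolchain; see the linked docs for the real option set.]

```ini
# Hypothetical cross file, e.g. mygpu-cross.ini
[binaries]
c = '/opt/mygpu/bin/mygpu-elf-gcc'
cpp = '/opt/mygpu/bin/mygpu-elf-g++'
ar = '/opt/mygpu/bin/mygpu-elf-ar'
strip = '/opt/mygpu/bin/mygpu-elf-strip'

[host_machine]
system = 'linux'
cpu_family = 'mygpu'
cpu = 'mygpu'
endian = 'little'
```

[Used as `meson setup build --cross-file mygpu-cross.ini`.]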
13:05 FL4SHK[m]: so I have a GCC port for the GPU
13:05 FL4SHK[m]: that's what I'm getting at
13:06 FL4SHK[m]: I'm doing something a bit different for my CPU/GPU design
13:06 FL4SHK[m]: like I mentioned they will run similar instruction sets, though with some modifications if necessary
13:07 FL4SHK[m]: I can compile regular C/C++ code with the GCC port targeting the GPU
13:09 Hazematman: In that case I'm not exactly sure it would be as useful here... Mesa has its own compiler infrastructure, since vulkan expects SPIRV and OpenGL expects GLSL or SPIRV. For drivers in Mesa the common code typically will take SPIRV or GLSL, compile it into Mesa's IR which is called NIR, and then the drivers will ingest NIR and convert it to native machine code for their architecture.
13:09 Hazematman: So i'm not sure what exactly you're hoping to achieve by making use of Mesa if you have your own compiler infrastructure already.
13:09 Hazematman: There is common code for handling things like sync, swapchain, wsi, etc in Mesa that might be of interest to you
13:11 FL4SHK[m]: so, I could just use the GCC port for running regular C/C++ on the GPU, and also write a Mesa port.
13:11 FL4SHK[m]: what I want is to be able to run Vulkan and hopefully OpenGL on the machine
13:12 FL4SHK[m]: the whole system is hopefully going to be a many-core machine where CPU/GPU are hooked up via an interconnect inside the FPGA
13:12 FL4SHK[m]: I am unsure how many cores I can fit.
13:13 FL4SHK[m]: I have 500k logic cells but none of those have to be used to implement any kind of RAM
13:13 FL4SHK[m]: so there's a lot of space for cores if I keep them simple. One core might be 1000 or so logic cells
13:13 FL4SHK[m]: maybe more like 2000
13:14 FL4SHK[m]: oh, wait a minute
13:14 FL4SHK[m]: no, the cores will be more than 2000 logic cells
13:14 FL4SHK[m]: and there'd be fewer of them
13:15 FL4SHK[m]: the 2000 logic cells number is from if I didn't include vector instructions
13:15 FL4SHK[m]: but I want to include vector float instructions at least
13:15 FL4SHK[m]: 32-bit floats that is
13:19 Hazematman: Sounds like an ambitious project 😅 just an FYI then if you want to run generic OpenGL or Vulkan application you'll need to be able to ingest GLSL or SPIRV, which I don't think GCC supports. So at some point you'll need to look at modifying GCC to support that or building a new compiler that can handle those.
13:20 FL4SHK[m]: well
13:20 FL4SHK[m]: yeah, it's an ambitious project
13:20 FL4SHK[m]: haha
13:20 FL4SHK[m]: the GCC port wasn't too bad
13:20 FL4SHK[m]: took a couple months
13:20 FL4SHK[m]: it's a long term project, mind you
13:20 FL4SHK[m]: at least I'm not writing an OS too
13:21 FL4SHK[m]: so I'll go ahead and write a SPIR-V compiler for the system
13:21 FL4SHK[m]: I heard that you can just do a basic translation from SPIR-V to the GPU instruction set
13:21 FL4SHK[m]: which is an assembly-to-assembly transpile
13:22 FL4SHK[m]: I might be oversimplifying it
13:22 DemiMarie: Are similar CPU and GPU ISAs a bad idea?
13:23 FL4SHK[m]: possibly
13:23 karolherbst: yes
13:23 FL4SHK[m]: I'm gonna try it
13:23 karolherbst: GPUs by definition generally don't need vector instructions, because that's implicit in how they run things
13:23 DemiMarie: I suggest going with a conventional design.
13:24 FL4SHK[m]: what I was going to try is hooking up a lot of small cores
13:24 FL4SHK[m]: since that's a novel idea mostly
13:24 FL4SHK[m]: apparently it's been tried with x86
13:25 FL4SHK[m]: in the 2010s
13:25 DemiMarie: FL4SHK: tried and failed
13:25 karolherbst: yeah, and it was a bad idea
13:25 FL4SHK[m]: why did it fail?
13:25 karolherbst: because running the same CPU code on a GPU with the same ISA is a wrong promise
13:26 karolherbst: I suggest reading this series of blog posts: https://pharr.org/matt/blog/2018/04/18/ispc-origins
13:26 karolherbst: as it covers this topic explicitly
13:32 DemiMarie: FL4SHK: There are three areas in which I would like to see something new.
13:32 FL4SHK[m]: tell me
13:32 FL4SHK[m]: my design is absolutely not finalized right now
13:33 FL4SHK[m]: I could just go for the manycore thing for the CPU
13:33 DemiMarie: The first is fault isolation: If a context faults or must be reset, the GPU should guarantee that other contexts are not affected.
13:35 DemiMarie: The second is bounded-latency instruction-level preemption of all hardware, including fixed-function. This means that even a malicious shader cannot prevent itself from being preempted in a bounded amount of time, allowing the GPU to be used in real-time systems.
13:35 alyssa: FL4SHK[m]: for mesa you really want to write a nir backend
13:35 alyssa: it's not hard
13:36 DemiMarie: The third is security isolation: the GPU should guarantee that information cannot be leaked across contexts.
13:37 FL4SHK[m]: alyssa: sounds good
13:38 FL4SHK[m]: I still want to hook up the GPU cores directly to my on-chip interconnect
13:38 FL4SHK[m]: inside the FPGA
13:39 Hazematman: DemiMarie: Isn't this mostly covered by hardware GPU contexts and per context virtual memory. Or there something I'm missing?
13:39 FL4SHK[m]: that's what I thought too
13:39 FL4SHK[m]: per-context virtual memory is like virtual memory per-process on a CPU right?
13:40 Hazematman: 1 & 2 would be a game changer for real time GPU usage especially in SC systems
13:41 Hazematman: FL4SHK[m]: yeah the same concept but applied to GPUs. A graphics context will have its own virtual address space and typically the kernel driver is responsible for mapping GPU virtual memory to physical memory
13:43 FL4SHK[m]: I see. Also, from the looks of it, the vectorization stuff that failed for LRB was for general purpose code right?
13:43 FL4SHK[m]: not as much for the shader code maybe?
13:44 karolherbst: FL4SHK[m]: the point is, that once you rely on auto vectorization for performance you are kinda screwed
13:45 FL4SHK[m]: even for shader code?
13:45 karolherbst: that's why GPUs are generally SIMT rather than SIMD
13:45 FL4SHK[m]: I see
13:45 karolherbst: so the ISA looks all scalar, but implicitly the same instruction is run on multiple threads/lanes/whatever you want to call it
13:45 FL4SHK[m]: gotcha
13:45 FL4SHK[m]: I can do that
13:45 karolherbst: and you either explicitly or implicitly manage thread masks
13:46 karolherbst: like e.g. on nvidia each instruction can be predicated to turn it off for the current thread
13:46 karolherbst: but it still executes on other threads with the predicate being true in the same warp/subgroup
13:46 FL4SHK[m]: can I at least make the instruction set similar to the CPU in some ways?
13:46 FL4SHK[m]: like let's say I use some of the same instruction encoding
13:46 FL4SHK[m]: for stuff like, ALU ops
13:47 karolherbst: yeah, the issue isn't the ISA being the same, just often a GPU ISA is more specialized so you might as well
13:47 FL4SHK[m]: I see
13:47 karolherbst: but you could have certain instructions (e.g. vector ones) only work in the "CPU" mode
13:47 karolherbst: and in the GPU mode, scalar instructions just execute in an entire subgroup
13:47 FL4SHK[m]: I see
13:47 FL4SHK[m]: So that's neat
13:47 karolherbst: each thread still has their own registers, but there is also the concept of "scalar" or "uniform" registers which are the same in each thread of a subgroup
13:48 FL4SHK[m]: I'm going to have to study GPUs further
13:48 FL4SHK[m]: so SIMT is like... partially SIMD right?
13:48 FL4SHK[m]: partitioned SIMD?
13:48 FL4SHK[m]: you have multiple SIMD units
13:48 FL4SHK[m]: many of them
13:49 karolherbst: well.. more like implicit SIMD
13:49 FL4SHK[m]: that was what I was thinking of emulating
13:49 FL4SHK[m]: hm
13:49 FL4SHK[m]: I thought there were actual SIMD engines in the hardware; that was what I learned in school
13:49 karolherbst: like.. e.g. those x86 SIMD instructions with a lane mask map directly to predicated scalar instructions in a SIMT ISA
13:50 FL4SHK[m]: small SIMD engines
13:50 FL4SHK[m]: Hm
13:50 karolherbst: yeah.. but the main difference is, that the ISA is scalar
13:50 karolherbst: (and internal details)
13:50 FL4SHK[m]: I see
13:50 karolherbst: so.. because your ISA is scalar, you don't need auto vectorization to get great performance
13:50 FL4SHK[m]: that makes sense
13:51 karolherbst: and of course that also requires a different programming language e.g. glsl where you describe what each SIMD lane/SIMT thread is doing
13:51 karolherbst: instead of looking at the entire group
13:51 karolherbst: nowadays anything vecN gets scalarized anyway
13:51 FL4SHK[m]: how do you get from a scalar ISA to telling the hardware what SIMD Lanes to use?
13:51 karolherbst: (except for load/stores which some hardware can actually do wide ones of)
13:51 FL4SHK[m]: somehow that has to be figured out
13:52 karolherbst: all lanes execute the same instruction
13:52 FL4SHK[m]: then how do you get the data for them?
13:52 cwabbott: I'd say that it *is* possible to compile a SIMT language to a SIMD architecture, as long as predication is competent enough, but it requires a completely different compiler architecture
13:52 lynxeye: the important thing is that you can switch between threads when one of them is blocked on memory, which allows you to hide memory latency without large caches and sophisticated prefetchers
13:52 karolherbst: FL4SHK[m]: you mean between threads in the same group?
13:52 FL4SHK[m]: I think so
13:52 FL4SHK[m]: you have to somehow specify where the data comes from
13:52 lynxeye: and that's a direct consequence of the programming model, which you won't be able to realize with a SIMD model
13:52 karolherbst: there are subgroup operations e.g. shuffle where you can move values between threads
13:53 cwabbott: AMD's architecture, for example, is basically a SIMD core with a few goodies strapped on
13:53 karolherbst: but normally each thread is just pulling the data it needs directly
13:53 FL4SHK[m]: cwabbott: ah, yeah, that's what I was reading
13:53 FL4SHK[m]: that AMD actually does have SIMD machines
13:53 FL4SHK[m]: well, SIMD cores
13:53 cwabbott: scalar registers are just normal registers, vector registers are SIMD registers, etc.
13:53 FL4SHK[m]: yeah that is actually the model I was going to emulate
13:53 cwabbott: subgroup ops are vector shuffles
13:53 FL4SHK[m]: ooh
13:54 FL4SHK[m]: so then
13:54 FL4SHK[m]: is what karol was talking about for nvidia then?
13:54 FL4SHK[m]: or is it applicable to AMD as well?
13:54 cwabbott: just nvidia
13:54 karolherbst: the concepts are similar, just mostly different terms being used
13:54 karolherbst: or different models :P
13:55 karolherbst: though AMD manages masks explicitly, no?
13:55 FL4SHK[m]: that sounds like putting the masks into the ISA
13:55 cwabbott: yes, AMD manages masks explicitly
13:55 FL4SHK[m]: I'd be happy to emulate that
13:55 cwabbott: the point i was trying to make is, it is possible to have a different model in the hardware that's more explicit
13:55 cwabbott: and more SIMD-like
13:55 karolherbst: yeah, fair
13:55 cwabbott: but the *compiler* still has to be SIMT
13:56 FL4SHK[m]: oh, I see
13:56 cwabbott: i.e. the register allocator has to still be aware of what the original control flow is
13:56 FL4SHK[m]: so if I emulate AMD, I need to translate from SIMT to the ISA's more SIMD like arch?
13:56 FL4SHK[m]: in the compiler
13:57 cwabbott: yes, you would need to do that
13:57 cwabbott: for example, ACO does that when translating from NIR
13:57 FL4SHK[m]: okay, that sounds like something that could be a happy medium for my hardware design
13:57 cwabbott: but it also keeps a representation of the original control flow
13:57 FL4SHK[m]: I could still go with my original design idea?
13:57 FL4SHK[m]: I see
13:57 FL4SHK[m]: so if I port Mesa, can I still access that information in my backend?
13:57 cwabbott: and register allocates vector registers with that
13:58 cwabbott: so an existing backend that's written assuming a SIMD model would be mostly useless
13:58 cwabbott: it has to at least be aware of the higher-level representation
13:59 FL4SHK[m]: right, that's what I'm asking about. Can I access the higher-level representation from my backend?
13:59 cwabbott: mesa's IR (NIR) is explicitly only using the higher-level representation
13:59 FL4SHK[m]: ah, got it
13:59 karolherbst: I think "vector registers" is kinda a dangerous term, because you still don't get a vector reg as you'd get on a SIMD x86 ISA, right? It's still a thread-private register you get, you just encode "this operates on different values per thread" or did I misunderstand how that's working on AMD?
13:59 FL4SHK[m]: I thought AMD actually did use CPU-like SIMD registers
14:00 FL4SHK[m]: based upon their documentation I read
14:00 cwabbott: i'd say it's more like CPU SIMD registers
14:00 karolherbst: I think it depends on how you look at it
14:00 FL4SHK[m]: excellent, that's exactly what I wanted to hear
14:00 cwabbott: yes, it's sort-of a difference in semantics
14:01 FL4SHK[m]: then my question is, if I go with CPU-like SIMD registers in my ISA, can I emulate AMD's model?
14:01 karolherbst: if you look at the entire thing as a SIMD group, yes it makes more sense to say it's SIMD like, but if you look at it from a "single thread" perspective it kinda doesn't
14:01 cwabbott: yes, but beware that you do have to explicitly design things to "be nice" with the SIMT model
14:01 cwabbott: for example, 16-bit values have to go in the upper/lower half of 32-bit values
14:01 FL4SHK[m]: I am willing to change the hardware to better fit the SIMT model
14:02 FL4SHK[m]: since I do have control over the ISA and stuff
14:02 cwabbott: the "stride" for lack of a better word has to always be 32 bits
14:02 karolherbst: I think there were people having a common ISA on both sides, but it operated on SIMD lanes for GPU "code"
14:02 cwabbott: i.e. each vector lane must be 32 bits
14:02 karolherbst: and it was just a scalar ISA
14:02 FL4SHK[m]: oh, that's familiar to me
14:02 FL4SHK[m]: I don't mind making it 32-bit
14:03 FL4SHK[m]: so that would mean, you have 32-bit floats?
14:03 FL4SHK[m]: I was hoping to go with 32-bit floats
14:03 cwabbott: otherwise you get into a world of hurt if you try to use the higher-level control flow in your compiler
14:03 FL4SHK[m]: since those will use less hardware
14:03 FL4SHK[m]: oh
14:03 karolherbst: GPUs normally don't encode the "type" in registers, they use the same registers for int and float operations
14:03 karolherbst: but yeah.. they are most of the time 32 bit wide
14:03 FL4SHK[m]: Okay well I'll keep that in mind
14:03 FL4SHK[m]: right actually I was thinking of doing that as well
14:03 FL4SHK[m]: already had that idea
14:03 FL4SHK[m]: for the scalar stuff as well
14:04 FL4SHK[m]: and for the CPU as well :)
14:04 FL4SHK[m]: I need to write up a version of my ISA spec for a 64-bit version of the CPU
14:04 FL4SHK[m]: or just modify the existing one
14:04 karolherbst: I think the entire reason it's split in x86 is because of legacy
14:04 cwabbott: GPUs tend to use the same cores for int and float operations
14:04 FL4SHK[m]: I see, that makes sense
14:04 cwabbott: because integer stuff just isn't as important
14:05 karolherbst: nvidia has explicit int units these days
14:05 FL4SHK[m]: I've heard that before too
14:05 karolherbst: but yeah...
14:05 cwabbott: that naturally leads to using the same registers, whereas CPUs have the opposite tradeoff
14:06 karolherbst: but nvidia is weird anyway
14:06 karolherbst: if you do a float op on a result of an int alu, you need to wait one cycle more
14:06 karolherbst: vs float -> float or int -> int
14:12 mattst88: karolherbst: how do texture operations handle sources where some arguments are floats and some are ints? do you have to move the ints to the float register file?
14:12 karolherbst: mattst88: it's all raw data
14:13 karolherbst: the instruction is responsible for interpreting the data
14:13 mattst88: right, but can the texture operation take some sources from the float reg file and some from the int file?
14:13 karolherbst: float vs int regs don't exist
14:13 karolherbst: it's all registers
14:14 mattst88: oh, they use the same register file. it's just that there's a different ALU unit and some additional latency when moving results from the int ALU to the fp ALU, etc
14:14 karolherbst: that's why NIR is also entirely untyped, because it just doesn't really make sense to have typed registers
14:14 mattst88: yeah, gotcha
14:14 karolherbst: mattst88: correct
14:14 karolherbst: though on nvidia it's all werid, because the scoreboarding/latency is done at compile time
14:14 karolherbst: so the compiler has to know those rules
14:15 karolherbst: and results just appear in a register at some defined time
14:15 mattst88: yeah, makes sense
14:15 karolherbst: (which also means that an instruction executed later can actually clobber the input of a previous instruction)
14:17 mattst88: so NVIDIA has to do the software scoreboarding stuff in the compiler, like recent Intel GPUs?
14:17 mattst88: and presumably has had that for much longer?
14:20 karolherbst: yeah, it's quite old
14:20 karolherbst: they experimented with that in kepler, but made it a full requirement with maxwell
14:20 karolherbst: so like over 10 years roughly?
14:21 karolherbst: it's quite complicated really. There are also instructions which read some inputs 2 cycles later and stuff 🙃
14:26 mattst88: makes sense
14:26 alyssa: cwabbott: I assume you've seen https://github.com/ispc/ispc ?
14:27 mattst88: it usually takes at least 5 years for some innovation in nvidia gpus to show up in intel gpus :)
14:27 alyssa: mattst88: moof
14:27 alyssa: mood
14:27 cwabbott: alyssa: yes, and I assume that because it just uses llvm under the hood the codegen for more complicated things is bad
14:28 karolherbst: cwabbott: it doesn't rely on an auto vectorizer
14:28 alyssa: ^
14:28 cwabbott: yes, I know
14:28 karolherbst: it's a custom language translated to SIMD
14:28 alyssa: afaik it's just the ACO thing
14:28 karolherbst: at least it seems to perform better than auto vectorizers :D
14:28 cwabbott: but by the time it hits the backend, the higher-level information is lost, no?
14:29 karolherbst: it emits SIMD instructions/intrinsics directly afaik
14:29 alyssa: oh you mean for RA and stuff.. yeah, presumably
14:29 cwabbott: so it becomes a soup of predicated things where RA can't do a good job
14:29 karolherbst: so not sure why it would matter?
14:29 karolherbst: ahhh
14:29 karolherbst: yeah, that could be
14:29 cwabbott: imagine you have a loop, oops everything conflicts with everything else because it's all predicated
14:30 cwabbott: that sort of thing
14:30 karolherbst: though the question is: what's your alternative? use OpenMP declarations or rely on the auto-vectorizer?
14:30 cwabbott: yeah no, there's no good alternative for CPUs, you'd need to write a different backend from scratch to do it properly
14:31 karolherbst: yeah..
14:31 karolherbst: I think ispc is probably good enough of a solution here without reinventing everything
14:31 karolherbst: anyway, fascinating project nonetheless
17:06 DavidHeidelberg: zmike: still around the world trip.. just occasionally doing something until I'll start at some new job :)
18:08 gfxstrand: jenatali: Thinking about WDDM in mesa "for real" and I'm not sure I want to have a hard dependency on libdxg. Thoughts about marshalling things?
18:09 jenatali: Marshaling?
18:09 jenatali: In WSL you can rely on those entrypoints being available in the distro in a libdxcore.so, and in Windows they come from gdi32.dll
18:10 gfxstrand: Yeah but let's say Ubuntu is going to ship Mesa with WDDM enabled
18:10 gfxstrand: Does that mean Ubuntu also ships libdxg and it just doesn't do anything?
18:10 gfxstrand: I guess that's probably fine. It's tiny.
18:11 jenatali: That's an option, but you could also operate like we do with the d3d12/dozen driver, which ships enabled in Ubuntu AFAIK, and just dlopen
18:11 jenatali: If libdxcore.so isn't there at runtime then you don't have WDDM anyway
18:12 gfxstrand: Yeah, but then we have to dlsym everything
18:12 gfxstrand: That's kinda what I was asking for thoughts on
18:12 jenatali: Not necessarily, can't dlopen promote things into the global namespace?
18:14 gfxstrand: Maybe we can with weak symbols of some sort
18:14 jenatali: You'd have to allow unresolved symbols at link time though I guess for that to work
18:14 alyssa: are Windows uapis stable ?
18:14 jenatali: Yes
18:14 alyssa: dang
18:15 jenatali: All APIs provided from any Windows DLL, whether it's a kernel-accessor API or just strictly usermode, are stable once they ship in a retail OS
18:16 gfxstrand: the pPrivateDatas, though, are anyone's guess.
18:16 jenatali: Right, those are generally not considered stable
18:17 jenatali: We require UMD and KMD to match because vendors have refused to commit to making those stable...
18:17 gfxstrand: Yeah, they're usually literally just a struct in a header in a perforce tree somewhere
18:32 DemiMarie: jenatali gfxstrand: does that mean that Mesa can be used as the UMD of a WDDM2 driver?
18:32 jenatali: If the driver actually has a stable KMD interface, yeah
18:32 jenatali: Or if the vendor is shipping Mesa as their UMD along with a matching KMD
18:33 gfxstrand: DemiMarie: Yes, in theory
18:34 DemiMarie: jenatali gfxstrand: Use-case is GPU acceleration in Windows guests on Qubes OS, which will require virtio-GPU native context support. That requires Mesa as the UMD and a proxy as KMD.
18:35 DemiMarie: So the KMD interface would just be a proxy for the host’s KMD.
18:36 DemiMarie: gfxstrand: What about in practice?
18:37 gfxstrand: DemiMarie: https://cdn.masto.host/mastodongamedevplace/media_attachments/files/112/679/919/407/927/273/original/007121e62bc6af56.png
18:38 DemiMarie: gfxstrand: what point are you trying to make?
18:38 gfxstrand: I have it working, more-or-less
18:38 gfxstrand: Whether or not we'll ship remains to be seen
18:40 DemiMarie: gfxstrand: in what context are you considering shipping it?
18:40 gfxstrand: Unclear
18:40 gfxstrand: Right now I'm mostly concerned with proving it possible
18:41 DemiMarie: Interesting.
18:41 DemiMarie: Which KMD are you using?
18:42 gfxstrand: The one AMD ships
18:46 Kayden: radv on windows? nice
18:47 mattst88: that's pretty amazing
18:50 zmike:cackles maniacally
18:57 Ermine: Wowsers
18:57 feaneron: nice
19:01 soreau:blinks
19:02 ccr: "the science has gone too far."
19:07 alyssa: we do what we must, because we can
19:12 DemiMarie:wonders why one would not just use AMD’s UMD on Windows
19:13 gfxstrand: Because they're crazy!
19:15 gfxstrand: Or because daniels bet her $100 she couldn't do it. :D
19:15 alyssa: psh, she can do anything
19:16 Sachiel: silly daniels making that bet when $1 would have been enough
19:19 daniels: gfxstrand: plus the cost of my time to work out PPM encoding so I could get that awesome logo on vkcube
19:21 daniels: also the shipping
19:25 gfxstrand: Okay, now I have semi-competent device enumeration such that RADV doesn't try to open my NVIDIA card
19:35 alyssa: what could go wrong with that
19:59 DragoonAethis: gfxstrand: can I bet you another $100 to make it go the other way around? :D
20:00 gfxstrand: What do you define as the other way around?
20:00 DragoonAethis: Windows Vulkan blobs on amdgpu on Linux
20:00 airlied: that's what amdgpu-pro is
20:00 gfxstrand: Yeah, no
20:00 gfxstrand: Also that
20:00 airlied: I'll take the $100 now :-P
20:00 DragoonAethis: that's cheating ;p
20:00 airlied: https://www.amd.com/en/support/linux-drivers :-P
20:05 soreau: shoulda stuck at a dollar
20:06 DragoonAethis: Should we meet at XDC or something, I'll owe you a beer
21:15 agd5f: gfxstrand, the WSL support in ROCm works that way. Basically a different ROC runtime which converts the KFD calls to dxgi calls
22:03 zf: oop
22:03 zf: sorry, wrong channel