03:32 veryniceUSandA: I told you, I had certain types of math subjects in my secondary school, combinatorics and permutations, at the strongest tech school in my country. A 32-bit variable has 1024 different bit combinations, I know that, the data is very easy to pack. The register file, or in the case of the MILL gpu the belt, takes care of things; the last control lines of an arbitrary ALU write will allow the hardware to write the bookkeeping carry, so how the successive writes will derive up to
03:32 veryniceUSandA: 4.2 billion values, based on where your arithmetic operation left the carry, you know. You do not see this in Verilog, it is schematics or netlists. Access is done on the register file via ALU write or read control lines; access is based on an adder, kind of, since two's complement hardware access sums the bits together as powers of two.
03:34 veryniceUSandA: the reason the packing is necessary is that if you do not do it, the data cache is not preloaded, and you have a memory-to-data-cache bottleneck
03:35 veryniceUSandA: this CUDA/OpenCL random number generator and Kogge-Stone link just starts measuring before the host copies the memory to the cache.
03:35 veryniceUSandA: OpenCL can, yeah, when you use large workgroups, have a very fast pipeline naturally, even if the underlying instructions are dependent.
03:36 veryniceUSandA: if they are added to workgroups, the bases are all forwarded in a single clock edge, as seen in the MIAOW dispatcher's code.
03:43 veryniceUSandA: the corona pandemic is a human made scam, it is meant to be either an attack against jews who rule the world by chinease, or the other way around, attack by jews again for them to be in the ruling position, because banking system can be collapsed during a single day, to go over to cryptocurrency collaterals.
03:47 veryniceUSandA: such an attack is being very afraid of justifyingly , not yet this time by me though, but every country can have biological wheapons really really, even if they lack communication satellites like our clowns who just lie about things that they have a brain and they sent estcube to the outer space, all real forces allready have global positioning systems and nukes and whatnot.
04:01 veryniceUSandA: that actual capabilities of a computers calculating compute chip with alus is so badass superior to our current generation is just common sense, it is a memory of the whole world evolution in earth crust, the resources are so much bigger than our brains energy wise or our given capabilities or muscle memory.
04:14 veryniceUSandA: and as I was saying, fragment shaders do not make use of the dispatcher pipeline, so this signal along with the bases will not come through. Bases still come through from pipeline registers, though, and memory loads are vector-based on GPUs. It works at very full throughput once the dispatcher is torn down, but it cannot load stuff from memory, which in that case has to be delegated to the ringbuffer instead
04:15 veryniceUSandA: due to the ringbuffer needing L2-to-instruction and data cache copies because of, say, vintage display lists and the like already.
04:30 veryniceUSandA: And by the way, I have a French friend here with a master's degree; most French people I have met are very lovely people. He really understood my code within 2 hours of explaining done by me, and he filled in the algorithm on paper.
04:31 veryniceUSandA: and this guy knows almost nothing about programming, he is just accurate with the numbers.
04:57 veryniceUSandA: And also cordic division algorithm i.e the respective nasa developed ieee1164 division is just a bullshit, maybe this is not your fault overall, today i make an intrinsic for the french match guy, the real one instead, there are many real implementations also possible
05:24 veryniceUSandA: This is also a known nation english or british whatever plus some high end jew noses, who bullshit others, why should nasa offer community support of their real technologies, when they want to rule the aviation run in outer space? Does that even make any sense?
05:27 veryniceUSandA: so ronald reagan organized trillions of dollars of printed money and technology to community so they would lose the position in the outer space program of cosmology, against say russian, does it make any sense?
08:26 AndrewR: .йгше ("/quit" typed with a Russian keyboard layout)
08:26 AndrewR: sorry
09:13 mripard: danvet: airlied: can we merge rc2 to drm-next so that we can backmerge it into drm-misc?
09:16 danvet: I was just about to ask airlied whether I should do that or he'll do it tomorrow
09:17 danvet: ascent12, emersion is there an xwayland mr for all your dma-buf hints work?
09:17 danvet: I have all the other pieces I think
09:17 emersion: hm, i don't think so, not yet
09:18 danvet: mripard, btw can you pls roll -fixes to -rc2?
09:23 mripard: danvet: done
09:23 danvet: thx
09:26 airlied: i already have next on rc2 locally
09:27 airlied: gimme a min to see if I can push from my phone
09:27 danvet: hm the backmerge is sitting on a nasty swiotlb conflict anyway
09:27 danvet: see the latest from sfr
09:29 danvet: mripard, ^^ atm we can't backmerge anyway :-/
09:33 airlied: danvet: can you rebuild tip, my ssh conn died
09:33 airlied: next should be on rc2
09:35 stuom1: my MIPI-DSI panel datasheet states in power-on sequence that "keep CLKN, CLKP, D0N, D0P, D1N, D1P in STOP state (LP-11)". Is this something a DRM panel driver should handle?
09:43 airlied: danvet: actually got to it again
09:58 mripard: danvet: tzimmermann is away at the moment right? Should I do the backmerge then ?
09:58 mripard: airlied: thanks!
09:59 tzimmermann: mripard, i'm actually here
09:59 danvet: mripard, atm nouveau doesn't compile too well without SWIOTLB
09:59 danvet: I'm not sure we care
09:59 danvet: I mean, care enough to hold up the backmerge
09:59 tzimmermann: i was AFK last week
09:59 danvet: since it's broken on drm-misc-next already
09:59 danvet: I'd say we do the backmerge
09:59 danvet: too much stuff blocked on it
09:59 mripard: tzimmermann: oh, great then :)
10:00 tzimmermann: mripard, but please go ahead
10:00 tzimmermann: i'm fairly busy today with catching up
10:05 mripard: tzimmermann: ack
10:05 tzimmermann: thanks
10:18 danvet: mripard, backmerge done and all compile-tested?
10:19 mripard: danvet: backmerge done, and currently compiling :)
10:29 danvet: mripard, it doesn't compile
10:29 danvet: nouveau is already broken in drm-misc-next :-/
10:29 danvet: könig didn't compile-test enough it seems
10:32 danvet: mripard, I cc'ed you on some thread
10:33 danvet: I think simplest fix is to just add an include for limits.h to nouveau_ttm.c and call it done
10:33 danvet: a-b: me if you type that
10:33 mripard: yeah, I wanted to make sure that nothing else was broken
10:36 mripard: so you want to make a commit with that commit log ? :)
10:53 danvet: mripard, well tune it down somewhat
10:55 mripard: danvet: done
10:57 danvet: mripard, that Link: is a bit of an abuse
10:57 danvet: the entire thing is meant to make sure patches go to the m-l first :-)
10:58 danvet: but well it's done
10:58 danvet: at least send it out with könig on cc since I expect he's typing the same thing right now too
10:59 j4ni: stuom1: whether the stop state (or ulps) needs to be handled by the driver depends on the hardware you have
11:00 j4ni: stuom1: some dsi host hardware gives you full control and responsibility, some others handle stuff for you at a higher level
11:01 j4ni: https://gitlab.freedesktop.org/ is responding "503 Service Temporarily Unavailable" to me
11:03 mripard: danvet: oops, sorry I misunderstood then :/
11:04 mripard: I'll send it
11:08 danvet: mripard, the upfront a-b: me was just so you could send it out right away
11:08 danvet: in case I'm out for lunch or whatever
11:09 danvet: also the Link: you do should be References
11:09 danvet: *should have been
11:09 danvet: so maybe reply that you've jumped the process and pushed it all already
11:09 danvet: :-)
11:17 Venemo: hey guys
11:17 Venemo: can someone explain to me why nir_opt_copy_prop_vars doesn't copy propagate through barriers?
11:19 Venemo: I don't really see what can break if it just ignores the barriers
11:43 daniels: j4ni: yep, bentiss was changing config to fix some regressions caused by weekend changes which only surfaced this morning
11:45 pendingchaos: Venemo: it could break cross-invocation communication
11:46 pendingchaos: https://pastebin.com/raw/jVUnerAB (this has a race, but I think copy-propagation is still invalid here)
11:46 Venemo: pendingchaos: does that mean that it's safe to do for same-invocation stores and loads?
11:47 Venemo: I mean, sure, cross-invocation IO would always have to respect the barriers of course
11:50 Venemo: but currently it doesn't seem that nir_opt_copy_prop_vars has any sense of cross or same invocation IO
11:53 stuom1: j4ni, thanks for the tip, I will look into it. So it can be done in the driver? Do you know any examples/functions/documentation where I could look for more info? All I know about the mipi HW is that it implements D-PHY
12:04 Venemo: dschuermann: is there an easy way to determine if a loop is divergent?
12:09 dschuermann: Venemo: within the loop: you can iterate over all continues and then check all conditions until top level of the loop for divergence. If you need to ensure that no invocation left the loop early, you'd also have to check the breaks
12:10 Venemo: dschuermann: so basically walk through the CF nodes within the loop somehow?
12:10 dschuermann: you can check the divergence_analysis for loop header phis before the last big refactoring. it did exactly this
12:11 dschuermann: it should be quite fast, actually
12:12 j4ni: stuom1: it gets pretty low level wrt the dsi host hardware. e.g. i915 does not have to take care of it, but the last I looked into omap dss years ago the driver had to do everything
12:12 Venemo: dschuermann: I assume that is before MR 4062 right?
12:13 j4ni: stuom1: i915 is *not* a prime example to look at when implementing dsi drivers, fwiw
12:14 dschuermann: Venemo: exactly
12:15 Venemo: dschuermann: I don't have a nir_phi_instr, though. just a nir_loop
12:16 dschuermann: why would you need a phi?
12:16 Venemo: you said "check the divergence_analysis for loop header phis"
12:16 Venemo: or do you mean I should just do what visit_loop does here: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cf6cae832c9e7c95e2df88b4e86886d1310c505a/src/compiler/nir/nir_divergence_analysis.c#L757 ?
12:17 dschuermann: no, you should do this: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cf6cae832c9e7c95e2df88b4e86886d1310c505a/src/compiler/nir/nir_divergence_analysis.c#L537
12:18 dschuermann: just take the predecessors of nir_loop_first_block except the preheader
12:25 Venemo: dschuermann: would this do? https://paste.centos.org/view/94dfed2f
12:26 Venemo: dschuermann: with a "return false;" at the end, of course
12:27 dschuermann: nir_loop_first_block already exists
12:27 Venemo: yes, so?
12:28 Venemo: I didn't touch it
12:28 dschuermann: ah, nvm
12:29 Venemo: (it's a diff)
12:29 dschuermann: no, it won't work. you have to iterate over all predecessors of nir_loop_first_block except the preheader
12:29 dschuermann: from the predecessors you take the parent node. that's where you start the while loop
12:30 Venemo: I think that is what nir_cf_node *current = loop->cf_node.parent does isn't it?
12:30 dschuermann: no, the parent of the loop is outside the loop
12:30 dschuermann: (or enclosing the loop)
12:30 Venemo: right
12:30 Venemo: so, how do I get the correct node?
12:31 dschuermann: I just told you? take all predecessors of nir_loop_first_block except the preheader?!
12:32 Venemo: this is my first attempt to work with nir's CF, sorry
12:32 dschuermann: no worries
12:32 dschuermann: it is the same as done for the phis: for the phis it takes the blocks of the phi sources, except from the preheader
12:33 dschuermann: these blocks are exactly the continue blocks from the loop, and are also exactly the predecessors of the loop's first block
12:35 Venemo: ok
12:40 Venemo: dschuermann: is this what you meant? https://paste.centos.org/view/9a58a8db
12:41 Venemo: or is the inner while loop unnecessary now?
12:42 dschuermann: no, I think you need one additional current = current->parent; before the while
12:44 Venemo: is that because "current" is not the if node, but just a child of the if node?
12:47 Venemo: also, wouldn't this assertion: assert(current->type == nir_cf_node_if); fail in case we have a nested if, then?
12:49 dschuermann: current is just the block which is also a cf_node
12:50 Venemo: so, current's parent is the if
12:50 dschuermann: the parent is either the loop or an if (because blocks can't be nested in other blocks, nor can the block be inside a nested loop)
12:51 dschuermann: the parent can directly be the loop if the block is the last block of the loop
12:53 Venemo: and how does it look when there is a nested if?
12:53 Venemo: does this work in that case?
12:57 dschuermann: the parent of an if can be another if: the if has two lists of cf_nodes (for then and else), starting and ending with a block
12:58 dschuermann: for a nested if, it would look like this: if () { block1_node, if_node, block2_node } else { block3_node }. the inner if's parent is the outer if
12:59 Venemo: ah, okay
13:00 dschuermann: loops also contain a list of cf_nodes, but as you start at a continue, the first loop you'll encounter is the one where you started
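The continue-block walk dschuermann describes can be sketched in C roughly as follows. This is a minimal model, not Mesa code: struct cf_node, enum cf_type, cond_divergent and both functions are hypothetical stand-ins for NIR's real control-flow types and divergence data. It assumes the caller has already collected the predecessors of the loop's first block minus the preheader (the continue blocks), and, as in the discussion, it only inspects continues; proving that no invocation left the loop early would also require checking the breaks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for NIR control-flow nodes. */
enum cf_type { CF_BLOCK, CF_IF, CF_LOOP };

struct cf_node {
    enum cf_type type;
    struct cf_node *parent;   /* enclosing if or loop; NULL at the top level */
    bool cond_divergent;      /* only meaningful for CF_IF */
};

/* Walk from one continue block up to the loop it belongs to. Because we start
 * at a continue of this loop, every node between the block and the loop must
 * be an if (possibly nested inside other ifs). */
static bool continue_is_divergent(const struct cf_node *block,
                                  const struct cf_node *loop)
{
    assert(block->type == CF_BLOCK);
    for (const struct cf_node *cur = block->parent; cur != loop; cur = cur->parent) {
        assert(cur != NULL && cur->type == CF_IF);
        if (cur->cond_divergent)
            return true;
    }
    return false;
}

/* continue_blocks: predecessors of the loop's first block, excluding the
 * preheader. A divergent path to any continue is treated as a divergent loop. */
static bool loop_has_divergent_continue(const struct cf_node *const *continue_blocks,
                                        size_t count, const struct cf_node *loop)
{
    for (size_t i = 0; i < count; i++) {
        if (continue_is_divergent(continue_blocks[i], loop))
            return true;
    }
    return false;
}
```

A block whose parent is the loop itself (the loop's last block) ends the walk immediately, which matches the "parent can directly be the loop" case above.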
13:02 dschuermann: Venemo: here is the documentation, which young cwabbott wrote https://people.freedesktop.org/~cwabbott0/nir-docs/control_flow.html when he was still a toddler :D
13:04 Venemo: dschuermann: ok, can I take that as a yes?
13:05 dschuermann: yes!
13:05 Venemo: :)
13:40 igorkovalenko: hi, I'm still unable to run firefox with latest master, see last comment on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6477
13:40 gitbot: Mesa issue (Merge request) 6477 in mesa "st/mesa: Use nir-to-tgsi for builtins if the driver needs TGSI" [Nir, Gallium, Merged]
13:41 igorkovalenko: is my r600 doomed now? :) llvmpipe works
14:08 imirkin: anholt: mareko --^
14:30 kisak: igorkovalenko: maybe test if r600/nir is mature enough to run firefox?
14:37 igorkovalenko: kisak: hmm, I was testing a clean build from master, and after that update it started failing with an assertion; what did I do wrong? )
14:39 kisak: There might be a legitimate bug there, I'm suggesting you might be able to sidestep it with a different driver (r600/nir instead of the older r600)
14:41 Venemo: pendingchaos: speaking of the nir_opt_copy_prop_vars question above. do you think it's enough to add some code to check if a load was same-invocation, before removing it in apply_barrier_for_modes, or would that still result in some weird cases?
14:41 kisak: if only I could remember the env variable that enables r600/nir
14:41 igorkovalenko: kisak: I'm happy with reverting two commits locally for now but would be great if that works with next release ;)
14:42 kisak: maybe try running R600_DEBUG=nir firefox in a terminal
14:44 kisak: you should also file a separate bug report for the regression so that it doesn't get forgotten
14:50 pendingchaos: Venemo: I think that would probably be fine
14:57 Venemo: pendingchaos: what I'm worried about is, what happens if there is a cross-invocation load after a same-invocation load. in that case the same invocation load can still be eliminated, but the store still needs to happen and the barrier still needs to be respected by the cross-invocation load
14:59 pendingchaos: the copy-propagation pass doesn't remove stores, just loads
14:59 Venemo: oh
14:59 Venemo: does this mean that the stores remain there even if they're unnecessary?
14:59 Venemo: or do we have a different opt to remove those too?
15:00 pendingchaos: we have nir_opt_dead_write_vars()
15:02 Venemo: ok
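A toy C model of the behaviour discussed above, assuming a variable's mode is enough to decide whether other invocations can see it: a barrier drops remembered values only for the modes it covers, a load is replaced only while a value is still remembered, and stores are never removed (that is left to a separate pass such as nir_opt_dead_write_vars). All names here are hypothetical and are not the internals of nir_opt_copy_prop_vars; telling same-invocation shared-memory accesses apart, as Venemo suggests, would need extra analysis that this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical variable modes, used as a bitmask. */
enum var_mode {
    VAR_FUNCTION_TEMP = 1u << 0,  /* private to one invocation */
    VAR_SHARED        = 1u << 1,  /* visible to the whole workgroup */
};

#define MAX_VARS 8

/* What the pass currently knows about each variable's contents. */
struct copy_state {
    bool known[MAX_VARS];
    int value[MAX_VARS];
    unsigned mode[MAX_VARS];
};

/* Remember the stored value. The store itself stays in the program. */
static void record_store(struct copy_state *s, int var, unsigned mode, int value)
{
    assert(var >= 0 && var < MAX_VARS);
    s->known[var] = true;
    s->value[var] = value;
    s->mode[var] = mode;
}

/* A barrier forgets only values in modes that other invocations may write. */
static void apply_barrier(struct copy_state *s, unsigned barrier_modes)
{
    for (int v = 0; v < MAX_VARS; v++) {
        if (s->known[v] && (s->mode[v] & barrier_modes))
            s->known[v] = false;
    }
}

/* Returns true (and the forwarded value) if the load can be replaced. */
static bool try_forward_load(const struct copy_state *s, int var, int *out)
{
    assert(var >= 0 && var < MAX_VARS);
    if (!s->known[var])
        return false;
    *out = s->value[var];
    return true;
}
```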
15:49 tzimmermann: fbdev write changes semantics depending on the type of video memory :(
15:53 danvet: tzimmermann, uh really?
15:53 danvet: how so?
15:54 tzimmermann: yup
15:54 tzimmermann: write always performs the write op, even if an error is present
15:55 tzimmermann: now the question is: do you return the error code or the number of written bytes
15:55 tzimmermann: ?
15:55 danvet: mripard, I have some unused vc4 variable warn ...
15:55 danvet: but it just scrolled off
15:55 danvet: tzimmermann, ugh
15:55 tzimmermann: the core (I/O memory) code returns the counter
15:55 tzimmermann: the _sys_ function returns the error code
15:55 danvet: tbh I'm wondering whether this read/write stuff is even used
15:56 tzimmermann: no idea
15:56 danvet: mildly tempted to make our fbdev read/write functions just return ENOTTY
15:57 tzimmermann: i'm working on the test case
15:58 tzimmermann: the core code appears to be correct
15:59 tzimmermann: there was even a fix that introduced the difference: 6a2a88668e90c
15:59 danvet: tzimmermann, that smells like core/vfs people fixing bad read/write implementations
16:00 danvet: not fbdev people fixing functional issues they've seen
16:00 tzimmermann: possible. but it is the correct thing
16:02 danvet: yeah, but carrying dead code around isn't the correct thing :-)
16:02 danvet: anyway I guess with the testcase it's at least not totally unused anymore ...
16:03 tzimmermann: we don't know if it is used
16:06 tzimmermann: no one tests read/write for errors. so no one noticed in 20 yrs :)
16:06 tzimmermann: or maybe rather: :(
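A simplified C model of the two return conventions tzimmermann describes, assuming a plain byte-array framebuffer: both paths copy as much as fits, but the I/O-memory (core) path reports the number of bytes written while the _sys_ path reports the error code. The function names are hypothetical, and the real kernel functions differ in other details.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Clamp a write to the end of the framebuffer, flagging -ENOSPC if clamped. */
static size_t clamp_write(size_t fb_size, size_t pos, size_t count, int *err)
{
    assert(pos <= fb_size);
    *err = 0;
    if (count > fb_size - pos) {
        *err = -ENOSPC;
        count = fb_size - pos;
    }
    return count;
}

/* Core (I/O memory) convention: report bytes written if any were written. */
static ssize_t model_write_core(char *fb, size_t fb_size, const char *buf,
                                size_t count, size_t *ppos)
{
    int err;

    if (*ppos > fb_size)
        return -EFBIG;
    count = clamp_write(fb_size, *ppos, count, &err);
    memcpy(fb + *ppos, buf, count);
    *ppos += count;
    return count ? (ssize_t)count : err;
}

/* _sys_ convention: report the error code whenever one was set. */
static ssize_t model_write_sys(char *fb, size_t fb_size, const char *buf,
                               size_t count, size_t *ppos)
{
    int err;

    if (*ppos > fb_size)
        return -EFBIG;
    count = clamp_write(fb_size, *ppos, count, &err);
    memcpy(fb + *ppos, buf, count);
    *ppos += count;
    return err ? err : (ssize_t)count;
}
```

Writing 10 bytes into an 8-byte buffer copies 8 bytes in both models, but the core-style function returns 8 while the _sys_-style function returns -ENOSPC.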
16:10 jekstrand: karolherbst: One thing I've been thinking about: Do we want separate deref_is_* intrinsics or just one deref_mode_is intrinsic which uses a MEMORY_MODES const index?
16:10 jekstrand: jenatali: ^^
16:10 jekstrand: cmarcelo: ^^
16:14 mripard: danvet: on which branch?
16:17 danvet: mripard, everything mixed together
16:17 danvet: and I think it's not one of mine
16:17 mripard: I'll have a look
16:31 jenatali: jekstrand: Makes no difference to me, one sounds kinda cleaner I guess
16:51 karolherbst: jekstrand: I was thinking about the same and didn't come to a conclusion
16:51 karolherbst: deref_mode_is might have the advantage that we can specify a mask and lower it to specific intrinsics
16:52 karolherbst: but yeah... I doubt we'd need it anyway
16:55 jenatali: jekstrand: Considering the +/- line count on switching to one intrinsic, I'm inclined to say go for it
16:56 jekstrand: jenatali: Yeah, I think that's my inclination as well.
16:56 jekstrand: I'll go ahead and make the change.
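A hypothetical sketch of why a mode mask is convenient: with a single deref_mode_is-style query that carries a mask of modes, one helper can fold the query to a constant whenever compile-time analysis bounds the pointer's possible modes, and only the remaining cases need a runtime check. The enum values and fold_mode_is() are illustrative names, not Mesa's intrinsic definitions.

```c
#include <assert.h>

/* Hypothetical memory-mode bits for a pointer. */
enum mem_mode {
    MEM_FUNCTION = 1u << 0,
    MEM_SHARED   = 1u << 1,
    MEM_GLOBAL   = 1u << 2,
};

enum fold_result { FOLD_FALSE, FOLD_TRUE, FOLD_RUNTIME };

/* Fold "is this pointer in one of query_mask's modes?" given the set of modes
 * the pointer may have, as far as compile-time analysis can tell. */
static enum fold_result fold_mode_is(unsigned possible_modes, unsigned query_mask)
{
    assert(possible_modes != 0);
    if ((possible_modes & query_mask) == 0)
        return FOLD_FALSE;    /* none of the possible modes match */
    if ((possible_modes & ~query_mask) == 0)
        return FOLD_TRUE;     /* every possible mode matches */
    return FOLD_RUNTIME;      /* keep a runtime check */
}
```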
17:03 jekstrand: jenatali: I've made the deref_mode_is changes and squashed them into the appropriate patches.
17:04 jenatali: Cool, sounds good
17:04 jekstrand: I really don't like how many changes I've made without testing the generic pointers part. :-/
17:05 jekstrand: I think once we're pretty sure the branch is where we want it, I'll rebase my RT branch on top and test it.
17:09 jenatali: Have you checked if the CL CTS has decent dedicated coverage for generic pointers?
17:10 jekstrand: No, I haven't even looked at the CL CTS for generic
17:10 jekstrand: Sadly, we can't expose them without 3.0
17:10 jenatali: Not for realsies, but you could probably hack it up to get far enough to exercise the coverage
17:10 jenatali: If there is coverage that is
17:11 karolherbst: jekstrand: just force 2.0 and run the tests :p
17:11 karolherbst: anyway, will probably do it this week anyway
17:11 jekstrand: karolherbst: Yeah....
17:12 karolherbst: what concerns me more is generic pointers with unified address space + SVM :/
17:12 karolherbst: that will be painful
17:12 jekstrand: karolherbst: Yeah....
17:12 karolherbst: but probably something like "try to reserve this memory and use it"
17:12 bnieuwenhuizen: why?
17:12 karolherbst: well.. I already reserve a big chunk of memory
17:12 karolherbst: for pushbuffers and stuff
17:12 jekstrand: karolherbst: You have 48 bits of address space
17:13 karolherbst: yeah
17:13 karolherbst: I just have to mmap whatever I reserve
17:13 bnieuwenhuizen: on AMD we reserve that stuff in the upper half of address space
17:13 karolherbst: bnieuwenhuizen: but for real syste SVM
17:13 karolherbst: *system
17:13 bnieuwenhuizen: which a process wouldn't allocate from on the CPU because it is for the kernel
17:13 karolherbst: ahh
17:13 karolherbst: mhhh
17:13 jekstrand: karolherbst: do a MAP_ANONYMOUS without MAP_POPULATE and, as long as you never touch those addresses, it won't burn pages.
17:13 karolherbst: but we can't
17:13 karolherbst: pushbuffers can only use 40 bits
17:14 karolherbst: it's just annoying
17:14 karolherbst: jekstrand: yeah, that's what I am doing already
17:14 karolherbst: but I think I might need a special region for the unified address space
17:14 jekstrand: karolherbst: I don't think that shared and scratch are going to burn that much address space.
17:14 karolherbst: so that the kernel won't place anything in there
17:14 karolherbst: jekstrand: it's more about the annoyance of having to do so
17:15 karolherbst: and this is always a trial and error thing
17:15 karolherbst: but I think all this stuff should be handled in kernel space anyway...
17:15 karolherbst: *sigh*
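A minimal Linux sketch of the reservation trick jekstrand mentions, assuming glibc: reserve a large range of virtual address space without backing it, then make sub-ranges accessible as needed. It uses PROT_NONE plus MAP_NORESERVE, a common variant of "MAP_ANONYMOUS without MAP_POPULATE"; pages are only allocated when first touched. range_fits_bits() is a hypothetical helper for checking a range against an address-width limit such as the 40-bit pushbuffer limit mentioned above; the sketch does not try to place the mapping below that limit.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve `size` bytes of address space with no pages behind it. */
static void *reserve_va(size_t size)
{
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make a page-aligned sub-range usable. Pages are still only allocated
 * when they are first touched. */
static int commit_va(void *addr, size_t len)
{
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}

static int release_va(void *addr, size_t size)
{
    return munmap(addr, size);
}

/* Hypothetical helper: does [addr, addr + size) lie below 2^bits? */
static bool range_fits_bits(uint64_t addr, uint64_t size, unsigned bits)
{
    assert(bits < 64);
    return addr + size <= (UINT64_C(1) << bits);
}
```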
17:16 jekstrand: In any case, I don't think all the SVM problems really impact compiler design.
17:16 karolherbst: yeah.. they don't besides constant memory
17:17 jekstrand: It's more about mangling addresses and fighting between driver, gcc/malloc, and the kernel.
17:17 karolherbst: but USE_HOST_PTR has similar problems already
17:18 jekstrand: Oh, yeah.
17:18 jekstrand: SVM is a kernel interface disaster
17:18 karolherbst: yep
17:18 karolherbst: well, there is the API to promise what parameters are not SVM for real.. but seriously...
17:19 karolherbst: ...
17:19 karolherbst: this belongs in the kernel :D
17:19 karolherbst: anyway, I will give it a go and see what problems I'll run into
19:03 airlied: dang it cl3.0 will require clang changes, might have to help get those upstream
19:12 karolherbst: jekstrand: you didn't do any wiring up in clover, did you?
19:14 airlied: wow the relationals tests take a loooong time
19:14 karolherbst: yep
19:15 karolherbst: like all those math tests
19:15 karolherbst: though bruteforce is quite fast
19:16 airlied: been going for 8 hours now :-P
19:17 karolherbst: I guess running on the CPU doesn't help
19:17 airlied: and it's an old sandybridge machine
19:17 karolherbst: well, once you run the final submission you might want to wait a week or so :p
19:29 karolherbst: *sigh*
19:29 karolherbst: I think we have to rework the type parsing in clover for generic pointers :/
19:30 karolherbst: the spirv parsing I mean
19:46 jenatali: airlied: Have you tried conversions yet?
19:46 jenatali: karolherbst: Why?
19:56 karolherbst: jenatali: because the way we parse it. We already construct clover arguments for all types, even if they are not used as actual arguments
19:56 karolherbst: well... argument types rather
19:57 karolherbst: and just adding a generic type for this sounds.. wrong
19:57 jenatali: I'm not sure I'm following
19:57 karolherbst: not talking about vtn
19:57 karolherbst: I am talking about the stuff we parse inside clover
19:58 jenatali: Ohh
20:01 karolherbst: let's see if I can just ignore it for now
20:01 karolherbst: jekstrand: Unhandled opcode: SpvOpGenericPtrMemSemantics
20:02 karolherbst: uff..
20:03 karolherbst: what the heck is this kernel..
20:05 karolherbst: "Result is a valid Memory Semantics which includes mask bits set for the Storage Class for the specific (non-Generic) Storage Class of Pointer." that phrasing though
20:06 jenatali: Sounds like it tries to subset the storage classes of generic pointers?
20:08 airlied: jenatali: I'm only on wimpy at the moment, but I'm not sure if I hit conversions yet or it crashes
20:08 airlied: karolherbst: I had to do a bit of ptr rework yesterday for local
20:08 karolherbst: airlied: ahh, cool
20:08 karolherbst: jenatali: no idea...
20:10 airlied: https://gitlab.freedesktop.org/airlied/mesa/-/commits/clover-wip
20:10 airlied: see module/args: add a new ptr align field
20:10 airlied: and the Dave Airlie's avatar
20:10 airlied: clover: store the pointer alignment to align local memory storage
20:10 airlied: oops
20:10 karolherbst: airlied: I already discussed that with curro and I already have a patch somewhere though
20:11 karolherbst: airlied: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/d43a0fee1c60a072e1b7505e063dda8f2e393c3b
20:11 karolherbst: curro didn't like the idea of adding a new field
20:12 airlied: it would help if curro could document what target_align means
20:12 curro: didn't object the new field though it would have been mostly redundant
20:12 airlied: because that isn't what the llvm frontend does with it
20:12 airlied: at least from my reading of the llvm datalayout api
20:12 karolherbst: airlied: I think the issue is, that none of this is clearly defined anyway
20:13 karolherbst: but yeah.. no idea about the llvm stuff either
20:13 airlied: my original idea was to use target_align, so I'm happy to r-b your patch instead :-P
20:13 airlied: I just didn't want to change the meaning of it
20:13 karolherbst: yeah...
20:13 karolherbst: I am undecided
20:15 curro: airlied: right now target_align is just the alignment requirement of the target device, clover pads arguments set from the API to this alignment while formatting the kernel input buffer
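A minimal C model of the padding curro describes, assuming each argument's start offset in the kernel input buffer is rounded up to target_align (a power of two) and the gap is zero-filled. append_kernel_arg() and align_up() are hypothetical names, not clover's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round v up to the next multiple of a (a must be a power of two). */
static size_t align_up(size_t v, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Append one argument at the next target_align boundary, zero-filling the
 * padding, and return the offset just past the argument. */
static size_t append_kernel_arg(uint8_t *buf, size_t buf_size, size_t offset,
                                const void *arg, size_t arg_size,
                                size_t target_align)
{
    size_t start = align_up(offset, target_align);

    assert(start + arg_size <= buf_size);
    memset(buf + offset, 0, start - offset);
    memcpy(buf + start, arg, arg_size);
    return start + arg_size;
}
```

With target_align = 8, a 4-byte argument followed by an 8-byte one lands at offsets 0 and 8, leaving 4 bytes of padding between them.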
20:19 airlied:is happy with karolherbst patch if curro is
20:19 karolherbst: the problem is, my patch still requires changes to the llvm backend
20:21 curro: karolherbst: does it?
20:21 karolherbst: yeah
20:21 karolherbst: I also had to change the spirv one
20:21 karolherbst: but maybe the llvm one already works around it?
20:21 karolherbst: dunno
20:21 karolherbst: but doubtful
20:23 curro: karolherbst: i guess it depends... there is a chance those devices were already passing address_bits/8 as target_align so no need to change anything if you're lucky ;)
20:23 airlied: karolherbst: my guess is the llvm backend is broken
20:24 airlied: and fails those tests anyways
20:24 karolherbst: ¯\_(ツ)_/¯
20:24 karolherbst: I have no way of testing it
20:25 airlied: yeah so I'm not caring about it
20:25 airlied: if there are major regressions with it, someone will scream
20:25 curro: karolherbst: the part about aligning the local memory pointer to target_align seems backwards-compatible with the previous behavior, so i guess someone could just look up what the LLVM back-end was passing as target align
20:26 jenatali: karolherbst: That opcode you found corresponds to cl_mem_fence_flags get_fence(gentype *ptr) btw
20:27 airlied: it's passing the getABITypeAlignment which doesn't seem to be this
20:28 karolherbst: jenatali: yep
20:28 karolherbst: jenatali: test_generic_address_space function_get_fence ;)
20:29 jenatali: Looks like it's supposed to return 0x100 (WorkgroupMemory) and/or 0x200 (CrossWorkgroupMemory)
20:29 curro: airlied: so i bet it's the same as address_bits/8, in which case no need to change the llvm back-end itself
20:30 karolherbst: jenatali: probably
20:30 karolherbst: but what about shared?
20:30 karolherbst: uhm.. that's crossWorkgroup
20:30 karolherbst: I meant function
20:31 karolherbst: curro: it's more about that target_align shouldn't be set to the pointer types alignment
20:31 karolherbst: but what it points to
20:31 karolherbst: sure.. at least no regression with llvm
20:31 karolherbst: but also wouldn't fix the issue
20:32 jekstrand: karolherbst: Uh.... I guess my shaders didn't use that. :P
20:32 dcbaker[m]: PSA: my harddrive went poof, I have no idea when the next mesa release is going to happen
20:33 curro: karolherbst: yeah, that would definitely need to be done, but it's something you can fix from clover/llvm/, no need to change the llvm back-end itself
20:33 karolherbst: jekstrand: I can also ignore it and deal with it later :D
20:33 karolherbst: curro: ohh, right, I meant the clover/llvm code
20:33 jekstrand: karolherbst: If you tell me what CTS tests to run and how, I'll fix it quick.
20:33 jenatali: karolherbst: You don't need to fence function memory. No other thread can access it
20:34 karolherbst: jekstrand: test_generic_address_space function_get_fence
20:34 jenatali: Shared = Workgroup, global = CrossWorkgroup
20:34 karolherbst: jenatali: fair
20:35 jenatali: https://github.com/KhronosGroup/OpenCL-CTS/blob/5e84ad0c1911afc3e0e8fc946b4caf50a7432ab7/test_conformance/generic_address_space/base.h#L28
20:35 jekstrand: karolherbst: Ugh.... THat opcode is just gross.
20:36 jekstrand: karolherbst: Easy to implement though
20:36 jenatali: "current spec says get_fence can return any valid fence"
20:36 karolherbst: :D
20:36 karolherbst: yep
20:36 jenatali: Could just hardcode it to 0x300 and be done with it...
20:36 karolherbst: I guess
20:36 karolherbst: but I don't think it's hard for us to be better than that
20:37 jenatali: Nah, we've got enough compile-time knowledge to be able to lower it to a good constant most of the time, and then it's just a question of generating an if ladder or lowering to 0x300
20:37 jekstrand: karolherbst: How do I force a CL version?
20:37 jenatali: For the case where the compiler can't prove it's a single type of memory
20:37 karolherbst: ..
20:37 karolherbst: jekstrand: ^^
20:38 karolherbst: I think we might want to merge those into one env var
20:38 karolherbst: having three is a bit pointless
20:40 jekstrand: karolherbst: Yeah....
20:46 jekstrand: Ugh... the SPIR-V validator doesn't like these CL 2.0 kernels....
20:46 karolherbst: :/
20:47 karolherbst: you mean the translator produces garbage?
20:47 karolherbst: or is it really the validator now?
20:47 jekstrand: Still working on it
20:48 jekstrand: Lots of hacking just to get them to run
20:51 jekstrand: What's get_fence()?
20:53 jenatali: It's supposed to return a value that you can pass to barrier() to appropriately synchronize the generic pointer
20:53 jekstrand: right
20:54 jekstrand: I suppose we'll have to implement generic barriers too. :-/
20:54 karolherbst: jekstrand: why?
20:54 Sachiel: generic everything
20:54 karolherbst: we know at compile time
20:54 karolherbst: and if it can be all three address spaces we just return that
20:55 jenatali: All two, in the case of barriers
20:55 karolherbst: jenatali: "no barrier needed" is also an option, no?
20:55 jenatali: Sure, it's a bitfield with 2 bits, one for shared and one for global, with both/neither also being options
20:59 karolherbst: but I think we can just do a plain analysis and we know the mem type just fence on that
20:59 karolherbst: otherwise fence on all
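The approach settled on in this exchange (return only the fence bits for the address spaces a generic pointer may point to, fall back to both when unknown, and skip function memory because no other invocation can access it) might look like this in C. The SPIR-V Memory Semantics values 0x100 (WorkgroupMemory) and 0x200 (CrossWorkgroupMemory) are the ones quoted above; the ptr_mode bitmask and generic_ptr_mem_semantics() are hypothetical.

```c
#include <assert.h>

/* Hypothetical bitmask of the storage classes a generic pointer may point to,
 * as determined by compile-time analysis. */
enum ptr_mode {
    PTR_MODE_FUNCTION        = 1u << 0,  /* private: never needs a fence */
    PTR_MODE_WORKGROUP       = 1u << 1,  /* __local / shared */
    PTR_MODE_CROSS_WORKGROUP = 1u << 2,  /* __global */
};

/* SPIR-V Memory Semantics mask bits. */
#define SPV_SEM_WORKGROUP_MEMORY       0x100u
#define SPV_SEM_CROSS_WORKGROUP_MEMORY 0x200u

/* Result for OpGenericPtrMemSemantics: fence only what the pointer may alias.
 * An unknown pointer (all modes possible) yields 0x300. */
static unsigned generic_ptr_mem_semantics(unsigned possible_modes)
{
    unsigned sem = 0;

    assert(possible_modes != 0);
    if (possible_modes & PTR_MODE_WORKGROUP)
        sem |= SPV_SEM_WORKGROUP_MEMORY;
    if (possible_modes & PTR_MODE_CROSS_WORKGROUP)
        sem |= SPV_SEM_CROSS_WORKGROUP_MEMORY;
    return sem;
}
```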
22:52 airlied: MrCooper, anholt : last few marge jobs seem to be going wrong
22:53 airlied: my MR 6639 didn't start CI, and it definitely was pushed by marge
22:53 airlied: oh I lie
22:53 airlied: it did get kicked off
22:53 airlied: weird, it must have just been the previous two pipelines from others having issues
22:59 karolherbst: jekstrand: function_to_address_space throws some invalid nir error :)
22:59 karolherbst: https://gist.githubusercontent.com/karolherbst/3729a017fc6f31215a892fd9b84df492/raw/8572955b4b9e9ddaab9459105b886bcff01191c0/gistfile1.txt
23:00 karolherbst: but I think that's because it's a cast to generic
23:00 karolherbst: and we might just want to allow that?
23:00 karolherbst: dunno
23:00 jenatali: Looks like it's a cast from shared => global...
23:00 karolherbst: but CL allows this stupid casting to everything behavior
23:00 jenatali: vec1 64 ssa_17 = deref_var &gint (shared uint)
23:00 karolherbst: ahhh
23:00 karolherbst: right
23:00 jenatali: vec1 64 ssa_29 = deref_cast (uint *)ssa_17 (global uint) /* ptr_stride=0, align_mul=0, align_offset=0 */
23:00 karolherbst: the value isn't used though :/
23:00 anholt: airlied: polishing off the new deqp-runner branch, btw.
23:01 karolherbst: "vec1 64 ssa_70 = deref_cast (uint *)ssa_17 (global uint)" is though
23:01 anholt: triggering some flakes on t720, so getting a whole lot of runs trying to make sure they're all listed.
23:01 karolherbst: but casting also means "just cast, don't convert"
23:02 karolherbst: or do we have to convert?
23:02 karolherbst: no clue really
23:02 karolherbst: but I think I'd leave it
23:02 karolherbst: anyway
23:02 karolherbst: will comment on the MR
23:02 jenatali: "If a pointer into one address space is converted to a pointer into another address space, then unless the original pointer is a null pointer or the location referred to by the original pointer is within the second address space, the behavior is undefined."
23:03 jenatali: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#changes-to-C99 in the second block
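The quoted rule can be modelled as a predicate. The sketch assumes a hypothetical flat address map where `__local` memory is one window and everything else counts as `__global` (private memory is ignored); the constants are made up. It only encodes the condition in the quoted sentence: null, or already pointing into the target space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical unified (flat) address map: __local memory is the window
 * [LOCAL_BASE, LOCAL_BASE + LOCAL_SIZE), everything else is __global. */
#define LOCAL_BASE 0x10000000u
#define LOCAL_SIZE 0x00010000u

enum addr_space { SPACE_GLOBAL, SPACE_LOCAL };

static bool in_space(uint64_t addr, enum addr_space space)
{
   bool in_local = addr >= LOCAL_BASE &&
                   addr < (uint64_t)LOCAL_BASE + LOCAL_SIZE;
   return space == SPACE_LOCAL ? in_local : !in_local;
}

/* Converting to another address space is defined only if the pointer is
 * null or the location it refers to is already inside the target space. */
static bool conversion_is_defined(uint64_t addr, enum addr_space target)
{
   return addr == 0 || in_space(addr, target);
}
```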
23:03 karolherbst: mhhhh
23:03 karolherbst: how writes that spec...
23:03 karolherbst: unbelievable
23:03 karolherbst: *who
23:03 jenatali: jekstrand: Sounds like that deref validation needs to be removed :(
23:04 karolherbst: jenatali: so.. if you got hw which maps everything, then it has to be defined, no?
23:04 jenatali: Or at least be made non-fatal
23:04 karolherbst: at least when you cast shared to global
23:04 jenatali: karolherbst: Yeah if everything's in the same actual address space then it'd work
23:04 karolherbst: *sigh*
23:04 karolherbst: oh well
23:04 karolherbst: no way applications can rely on that
23:04 jenatali: Does nir_validate have warnings?
23:04 karolherbst: but maybe we want to be nice?
23:04 karolherbst: jenatali: why nir_validate?
23:05 jenatali: karolherbst: That's just where it's blowing up right now
23:05 karolherbst: ohh warnings as in warnings to the user
23:05 jenatali: Yeah, or warnings if someone's reading the nir and says "wtf is this doing?"
23:05 karolherbst: mhhh
23:05 karolherbst: we don't
23:05 jenatali: It'd be nice if we could step in and say "yeah we have no idea wtf this is doing either"
23:05 karolherbst: mhhh
23:07 karolherbst: basing opencl c on top of c was a mistake :D
23:07 karolherbst: uhm s/top//
23:08 karolherbst: uhm s/of//
23:15 jekstrand: karolherbst: Yeah... I was looking at that
23:15 jekstrand: karolherbst: Sorry. Annoying meetings. I'm back now
23:24 jekstrand: karolherbst: It looks like these tests are using __global variables
23:24 jekstrand: karolherbst: And NIR somehow thinks they're __local
23:26 karolherbst: heh
23:26 karolherbst: good point though
23:26 karolherbst: but my comment regards explicit casts still stands :p
23:26 karolherbst: *in regards to
23:26 jekstrand: karolherbst: Yeah, I'm not sure why that's breaking yet
23:27 karolherbst: jekstrand: because the assumption of overlapping address spaces doesn't hold
23:27 jekstrand: ?
23:27 * jekstrand is confused
23:27 karolherbst: you have a global*, but now it gets cast to local*
23:27 karolherbst: that's valid
23:28 karolherbst: undefined, but valid
23:28 jekstrand: right
23:28 karolherbst: although it sounds like there are some magic conditions where it can be defined
23:28 karolherbst: like if your address space is unified
23:28 karolherbst: and your global pointer indeed points into local memory
23:29 airlied: undefined means there are no magic conditions
23:29 karolherbst: airlied: "If a pointer into one address space is converted to a pointer into another address space, then unless the original pointer is a null pointer or the location referred to by the original pointer is within the second address space, the behavior is undefined."
23:29 karolherbst: specifically "or the location referred to by the original pointer is within the second address space"
23:29 airlied: yup so undefined means it doesn't matter if what it returns makes sense
23:29 airlied: it's not defined
23:30 airlied: but yeah maybe that's vague enough statement
23:30 karolherbst: sure, but what does this stupid sentence mean
23:30 airlied: it likely means double casts work
23:30 airlied: so doing a local->global->local is defined
23:30 jekstrand: Ok, figured out why it's making things shared. So that's fixed now.
23:30 karolherbst: maybe it is to define (global*)(local*)global...
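A sketch of why a local->global->local round trip can be well defined even when the hardware keeps shared memory segmented, and why such a cast may have to convert rather than just reinterpret bits (the earlier "just cast, don't convert?" question). It assumes a hypothetical layout where a `__local` pointer is an offset into shared memory and a flat `__global` pointer sees shared memory at a made-up `LOCAL_APERTURE`; null pointers are not handled. This illustrates one reading of the discussion, not normative spec text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical segmented layout: a __local pointer is an offset into the
 * work-group's shared memory, a flat (__global) pointer is a full address
 * in which shared memory appears at LOCAL_APERTURE. */
#define LOCAL_APERTURE 0x10000000u
#define LOCAL_SIZE     0x00010000u

typedef uint32_t local_ptr;   /* offset inside shared memory */
typedef uint64_t flat_ptr;    /* address in the unified address space */

/* local -> global: the cast has to add the aperture base, so it is a real
 * conversion and not just a reinterpretation of the bits. */
static flat_ptr local_to_flat(local_ptr p)
{
   return (flat_ptr)LOCAL_APERTURE + p;
}

/* global -> local: only meaningful if the address lies inside the aperture,
 * i.e. the location is "within the second address space". */
static local_ptr flat_to_local(flat_ptr p)
{
   assert(p >= LOCAL_APERTURE && p < (flat_ptr)LOCAL_APERTURE + LOCAL_SIZE);
   return (local_ptr)(p - LOCAL_APERTURE);
}
```

With this layout, `(local*)(global*)l` returns `l` unchanged, while a bit-preserving cast would not.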
23:30 karolherbst: no clue
23:30 airlied: karolherbst: yeah that's what I think
23:31 karolherbst: jekstrand: cool
23:32 jekstrand: karolherbst: Now it's blowing up because NIR doesn't think nir_var_mem_global should be used for variables. :)
23:33 karolherbst: :)
23:33 karolherbst: yeah...
23:33 karolherbst: that's new in CL 2.0
23:33 karolherbst: we need to bind a buffer
23:34 karolherbst: not sure if CL 3.0 requires it though
23:34 karolherbst: so I guess maybe we want to tackle this one first?
23:34 karolherbst: maybe the other tests don't use those features
23:35 karolherbst: *sigh*
23:36 karolherbst: all the other tests use the fence stuff
23:36 karolherbst: ahh
23:36 karolherbst: "multiple_calls_same_function" crashes
23:36 rcdrone: slack
23:36 karolherbst: test_generic_address_space: ../src/compiler/nir/nir.h:1557: nir_deref_mode_is_in_set: Assertion `!(deref->modes & ~modes)' failed.
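The failing assertion `!(deref->modes & ~modes)` is a bitmask subset test: it holds only if every mode the deref may have is also in `modes`. The sketch isolates just that expression with made-up mode values (NIR's real `nir_variable_mode` bits differ) and makes no claim about the rest of `nir_deref_mode_is_in_set`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode bits; not NIR's actual nir_variable_mode values. */
enum var_mode {
   MODE_SHARED   = 1u << 0,
   MODE_GLOBAL   = 1u << 1,
   MODE_FUNCTION = 1u << 2,
};

/* The asserted expression: true iff deref_modes is a subset of modes,
 * i.e. the deref has no possible mode outside the queried set. */
static bool modes_subset(unsigned deref_modes, unsigned modes)
{
   return !(deref_modes & ~modes);
}
```

A generic deref that may be either shared or global is not a subset of `{global}`, so an assertion of this form fires for it.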
23:36 rcdrone: whoops, wrong window
23:37 jekstrand: karolherbst: That's the one I'm looking at right now.
23:37 jekstrand: karolherbst: I've got the new SPIR-V opcode implemented.
23:38 karolherbst: cool :)
23:38 jekstrand: karolherbst: It's the "I'm going to define my own global variable" that's causing me problems
23:38 karolherbst: yeah
23:38 karolherbst: the runtime needs to support it
23:38 jekstrand: For now, I've turned off copy_prop_vars which seems to prevent the deref_mode_is_in_set assertion above
23:38 karolherbst: and we might need to add another implicit arg
23:38 karolherbst: or something
23:38 karolherbst: it's really annoying
23:38 karolherbst: jenatali: did you implement it?
23:39 karolherbst: jekstrand: the thing is, it's just a global buffer declared inside the kernel and it has a static size
23:39 karolherbst: so we just need to bind a buffer to back it up
23:39 karolherbst: probably one for all used variables...
23:40 jenatali: karolherbst: No I haven't hooked up global variables yet
23:40 karolherbst: and we could have a nir_load_global_vars_memory_buffer_ptr or something
23:40 karolherbst: and offset it
23:40 jenatali: That'd work
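A minimal sketch of the "one buffer for all used variables, then offset it" idea: lay out every program-scope `__global` variable in a single allocation and give each one an aligned offset, so a kernel addresses a variable as the buffer base pointer (what the suggested `nir_load_global_vars_memory_buffer_ptr` would return; that intrinsic is only a proposal here) plus its offset. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a program-scope __global variable. */
struct global_var {
   const char *name;
   size_t size;
   size_t align;     /* must be a power of two */
   size_t offset;    /* filled in by layout_global_vars() */
};

/* Packs all variables into one buffer in declaration order. A kernel then
 * addresses variable i as (buffer base pointer + vars[i].offset).
 * Returns the total size the runtime has to allocate. */
static size_t layout_global_vars(struct global_var *vars, size_t count)
{
   size_t offset = 0;
   for (size_t i = 0; i < count; i++) {
      size_t a = vars[i].align;
      assert(a != 0 && (a & (a - 1)) == 0);
      offset = (offset + a - 1) & ~(a - 1);   /* round up to alignment */
      vars[i].offset = offset;
      offset += vars[i].size;
   }
   return offset;
}
```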
23:40 karolherbst: the good thing is, the application can't access it
23:41 jekstrand: karolherbst: And, to make things more fun, you can initialize global variables. :)
23:41 karolherbst: _but_ no idea what happens if the kernel thinks taking the address and pushing it out is a good idea :P
23:41 karolherbst: jekstrand: yep
23:41 karolherbst: but there was something super stupid about it
23:41 jenatali: karolherbst: Same as taking the address of a function local and pushing that out...
23:41 karolherbst: jekstrand: apparently you do it once per program
23:41 karolherbst: not kernel
23:41 jekstrand: karolherbst: ?
23:41 jenatali: karolherbst: For C, you can only data-initialize. For C++ you can have constructors, yeah
23:41 karolherbst: you know what I meant :p
23:42 karolherbst: jenatali: sure, but does data initialization still happen once per program, or is that only for the constructors?
23:42 jenatali: Yeah it's once per program
23:42 karolherbst: yeah...
23:42 jenatali: Writing to the global variable needs to persist from one kernel execution to the next
23:42 karolherbst: jekstrand: fun, isn't it?
23:43 karolherbst: jenatali: at least the data is constant
23:43 jekstrand: karolherbst: What do you mean per-program?
23:43 karolherbst: so we can just preupload once we create the buffer
23:43 jenatali: Right
23:43 karolherbst: jekstrand: cl_program
23:43 jenatali: But for constructors you have to on-demand run the init kernel before any user kernel
23:43 karolherbst: you know, that thing you create cl_kernel objects from
23:43 jekstrand: Right, so we don't re-initialize every dispatch, we just initialize once when we create the program. That seems reasonable.
23:44 karolherbst: jekstrand: so if you create 5 kernels out of the program, they share the buffer across invocations
23:44 karolherbst: jekstrand: nonono, it's worse
23:44 jekstrand: karolherbst: Oh....
23:44 karolherbst: yeah...
23:44 karolherbst: now you got it :)
23:45 karolherbst: anyway
23:45 jekstrand: So we probably have to scrape them straight out of the SPIR-V and not the NIR because we have to share them across a bunch of stuff.
23:45 karolherbst: the c++ constructor stuff is even worse
23:45 karolherbst: but you attach the buffer to the cl_program object instead of cl_kernel
23:45 karolherbst: and just fill it up with default data
23:45 karolherbst: and reuse it in every kernel
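A sketch of the per-program ownership described above: the buffer hangs off the program object, gets the constant initializer once at creation, and every kernel from that program borrows it, so a write in one launch is visible in the next. The `program`/`kernel` structs are hypothetical stand-ins loosely modelled on `cl_program`/`cl_kernel`, and the "kernel" is just host code that bumps a counter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical runtime objects, loosely modelled on cl_program / cl_kernel. */
struct program {
   unsigned char *globals;   /* backing store for all program-scope variables */
   size_t globals_size;
};

struct kernel {
   struct program *prog;     /* kernels borrow the program's buffer */
};

/* Allocate and pre-upload the constant initializer exactly once,
 * when the program is created. Returns 0 on success. */
static int program_init_globals(struct program *p, const void *init, size_t size)
{
   p->globals = malloc(size);
   if (!p->globals)
      return -1;
   memcpy(p->globals, init, size);
   p->globals_size = size;
   return 0;
}

/* Stand-in for a launch that increments an unsigned counter at offset 0.
 * Nothing is re-initialized, so the write persists into later launches. */
static void kernel_launch_increment(struct kernel *k)
{
   unsigned v;
   memcpy(&v, k->prog->globals, sizeof v);
   v++;
   memcpy(k->prog->globals, &v, sizeof v);
}
```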
23:45 karolherbst: worse is the c++ stuff
23:45 karolherbst: where you create a kernel doing the constructor stuff
23:45 karolherbst: then you launch the normal kernels
23:45 jenatali: Yep
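For the C++ constructor case, the init kernel has to run on demand before the first user kernel of the program and never again. A single-threaded sketch with a hypothetical `program` struct; the counters exist only so the behaviour can be checked, and a real runtime would also need locking around the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical program state for the C++ case, where global variables
 * have constructors that run in a separate init kernel. */
struct program {
   bool ctors_ran;
   int init_kernel_launches;
   int user_kernel_launches;
};

static void launch_init_kernel(struct program *p)
{
   p->init_kernel_launches++;
}

/* Every user-kernel launch goes through here: the first one triggers the
 * constructor kernel, later ones (from any kernel of the program) skip it. */
static void launch_user_kernel(struct program *p)
{
   if (!p->ctors_ran) {
      launch_init_kernel(p);
      p->ctors_ran = true;
   }
   p->user_kernel_launches++;
}
```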
23:46 karolherbst: jekstrand: yeah, essentially
23:46 jenatali: jekstrand: Or, you could run the SPIR-V through vtn as a library and scrape it out of that
23:46 jekstrand: ok, that's kind-of terrible.
23:46 jekstrand: jenatali: Yeah, that could be done too.
23:46 jekstrand: Honestly, we're probably moving that direction eventually.
23:46 jenatali: Yeah
23:50 karolherbst: jekstrand: although we can probably get away with keeping the order the same in each kernel and never DCEing them, unless we assign locations and make sure they are the same across all kernels
23:50 jenatali: True, just never remove dead global variables
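Why dead global variables must not be removed per kernel: if each kernel computed its own layout after DCE, two kernels sharing the program's buffer would disagree on where a variable lives. The sketch uses three hypothetical variables, ignores alignment, and compares a program-wide layout with one computed after a kernel dropped a variable it never uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_VARS 3

/* Sizes of three hypothetical program-scope variables, in declaration order. */
static const size_t var_size[NUM_VARS] = { 4, 8, 4 };

/* Offset of variable idx when only the variables marked in live are kept
 * (as if dead ones had been DCE'd in this kernel). Alignment is ignored. */
static size_t offset_of(size_t idx, const bool live[NUM_VARS])
{
   size_t offset = 0;
   for (size_t i = 0; i < idx; i++)
      if (live[i])
         offset += var_size[i];
   return offset;
}
```

With all variables kept, variable 2 sits at offset 12 for every kernel; a kernel that DCE'd variable 0 would look for it at offset 8 instead.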
23:51 jekstrand: karolherbst: Yeah, that might work, actually.
23:54 airlied: jekstrand: you haven't released the CL C for those kernels yet, have you?
23:54 jekstrand: airlied: Nope
23:55 jekstrand: karolherbst: I've hacked my way to some amount of progress but I think I'm done for now. I'll try and get a bit further tomorrow.
23:56 karolherbst: okay
23:56 karolherbst: jekstrand: do you have patches ready for the fence stuff though? or is the test just still failing
23:59 jekstrand: karolherbst: Pushed some hacks to the end of my MR
23:59 jekstrand: karolherbst: But, yeah, more stuff required for that test