00:50HdkR: oh, what fun timing
07:22MrCooper: mareko: EGL doesn't support all the same functionality as GLX, e.g. nothing corresponding to GLX_OML_sync_control
07:25HdkR: I wonder if a new extension could fix that.
07:25HdkR: Extensions tend to fix missing functionality :)
08:51chema: does anybody have issues with access to ssh.gitlab.freedesktop.org ? It seems that the ssh service is down for me. I've already reported an issue https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues/2414 just in case you are also affected.
08:55MrCooper: chema: being discussed on #freedesktop (infra issues in general)
08:56mlankhorst: chema: I have issues. :-)
09:01ccr:has coffee.
09:08MrCooper: should be better now
09:39eric_engestrom: karolherbst: yes, the branchpoint is planned for tomorrow, as in wednesday (since you asked 11h ago that might have been a different "tomorrow" ^^)
09:40karolherbst: yeah I uhm.. noticed too late it was past midnight 🙃
09:54eric_engestrom: FWIW there are still 2 MRs that I think authors want merged before the branchpoint: https://gitlab.freedesktop.org/mesa/mesa/-/milestones/51#tab-merge-requests
09:55eric_engestrom: if anyone wants to help, one MR is about NIR and the other one X11
10:42austriancoder: eric_engestrom: how can I add a MR to a milestone?
10:43eric_engestrom: austriancoder: in the right column of the MR, where all the metadata is, you have a "milestone" section; select it there
10:44austriancoder: eric_engestrom: nice
11:16karolherbst: it would be nice if iris/zink/radeonsi devs could review their part in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36007 but not sure I want it to block the release :)
11:20alyssa: karolherbst: "i will review code anywhere in tree"
11:20alyssa: *clicks MR*
11:21alyssa: "oh god not synchronization"
11:21karolherbst: :)
11:21karolherbst: don't worry, it's the simple kind!
11:32karolherbst:should review more nir MRs, subscribes to the label
11:33alyssa: karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36127/diffs?commit_id=abfebd961df8638f98ee57c1d9ae1f7eaaa085c6 :pleading:
11:36karolherbst: I should wire up semaphores on my macbook :D
11:37karolherbst: but I don't think anything uses those things yet, so whatever.. and the CL CTS tests are... well.. broken
11:37karolherbst: I'm way more interested in getting comments on this "little" MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24515 but it's way too late for this release cycle now :D
13:17sima: tzimmermann, you've forgotten the Fixes: lines in v2, I think without those the stable bots won't pick it up
13:17sima: since at least some are in 6.15 already
13:17sima: or add cc: stable for those that need it
13:18tzimmermann: sima, let me check, but i don't think so
13:18sima: the last one is in 6.15
13:18sima: e8afa1557f4f963c9a511bd2c6074a941c30868
13:23sima: tzimmermann, I think some of the others are also in 6.15
13:23sima: tzimmermann, dim fixes in case of doubt
13:23tzimmermann: sima, just checked. you're right. the core patches went into 6.15 already
13:27tzimmermann: i'll send out an update in a bit
13:36sima: tzimmermann, could have done that while applying, either way is fine with me
17:30robclark: karolherbst: why does flush_events() do pipe.flush().wait() (i.e. the .wait() part)? This seems to be the same thread launching grids, so this introduces a stall
18:01karolherbst: robclark: no good reasons except I need to put the events into the proper state at some point and I'm doing it on the host atm
18:01karolherbst: well with cross queue dependencies it's kinda required because applications rely on it
18:02jenatali: FWIW I push that work into a thread pool
18:02robclark: hmm, you could keep a queue of already flushed but not waited fences and associated events.. then defer the wait until needed (cross queue) and cleanup until signaled?
18:02karolherbst: yeah same
18:03karolherbst: robclark: I wanted to associate fences with events at some point, it's just a bit annoying, because the application can also register callbacks
18:03karolherbst: and they might signal user events in those callbacks
18:03karolherbst: without waiting on anything
18:03karolherbst: so at some point I do have to know when the stuff is actually completed
18:04robclark: sure, you could always wait for pending fences if you get into a corner, but not doing it in the common case would help perf :-)
18:04karolherbst: yeah....
18:04jenatali: Yeah I've got a thread pool per device that basically serializes all queues down to something like a gallium context. Right now it serially calls flush + wait when it submits work
18:04jenatali: I think it's technically against spec though
18:04karolherbst: it was worse in the past where I waited after each event :D
18:05karolherbst: right...
18:05karolherbst: rusticl does the flush + wait after all events sent to the queue were processed (or for other reasons like before waiting on user events)
18:06karolherbst: it's not great.. but I don't really know of a lot of better ways of handling this.. well except I could check if nothing can observe the event
18:06jenatali: > When a task becomes ready, any other tasks that depend on it that can execute on the same device will also be marked ready. Technically, this is a violation of the CL spec for events, because it means that given task A and task B, where B depends on A, both A and B can be considered 'running' at the same time. The CL spec explicitly says that an event should only be marked running when previous events are 'complete', but this seems like a
18:06jenatali: more desirable design than the one imposed by the spec.
18:06karolherbst: easy
18:06karolherbst: you put them to running and complete after they completed :P
18:07karolherbst: the issue is more the callbacks on CL_COMPLETE
18:07jenatali: What's the issue there?
18:07karolherbst: the application could have added a completion callback that signals user events
18:08karolherbst: so you have to eventually put events into CL_COMPLETE state (and call those callbacks) without the application waiting on anything
18:08karolherbst: might even just flush + wait on the user event
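For context, a minimal sketch of the application pattern being described here, using the standard OpenCL host API (register_cb and kernel_done_cb are made-up names for illustration): the app ties a user event to a completion callback and never waits itself, so the runtime has to drive the kernel's event to CL_COMPLETE on its own.

    #include <CL/cl.h>

    /* Completion callback registered by the application: when the kernel's
     * event reaches CL_COMPLETE, it signals a user event that other queues
     * may already be blocked on. */
    static void CL_CALLBACK kernel_done_cb(cl_event ev, cl_int status, void *data)
    {
        cl_event user_ev = data;
        clSetUserEventStatus(user_ev, CL_COMPLETE);
    }

    static void register_cb(cl_context ctx, cl_event kernel_ev)
    {
        cl_int err;
        cl_event user_ev = clCreateUserEvent(ctx, &err);
        /* Note: the app never calls clWaitForEvents() here; forward progress
         * depends entirely on the runtime completing kernel_ev. */
        clSetEventCallback(kernel_ev, CL_COMPLETE, kernel_done_cb, user_ev);
    }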
18:09jenatali: Right, hence the worker thread that basically executes in lock-step with the device
18:09karolherbst: right
18:09karolherbst: that's what I do as well
18:09karolherbst: but as robclark points out, that causes stalls
18:10karolherbst: if only there were a primitive we could use where the GPU could tell the host it is done with something 🙃
18:10karolherbst: anyway.. I think a proper fix would require some gallium changes
18:11karolherbst: basically need the GPU to set the event and the host to be notified about it _somehow_
18:11jenatali: Oh, right. IIRC I've got 2 threads actually, one which kicks off work and one which watches for progress
18:11karolherbst: mhhh
18:12karolherbst: like something that just checks every second or so if the status changed?
18:12karolherbst: not that gallium has any APIs for that atm anyway
18:12robclark: karolherbst: you can wait on a fence from another thread besides the one that is calling pipe_context::whatever()
18:12jenatali: No, the one that kicks off work enqueues a fence signal, and then posts a message to the other thread to wait for the signal to complete
18:12jenatali: Yeah, that
18:12robclark: so you just need a second thread to do the waits and cleanups.. and then sync with that thread for cross-queue or whatever other edge cases
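A rough sketch of that second-thread idea in gallium terms (illustrative only: pop_pending(), mark_event_complete() and the pending list are invented here, and rusticl itself is Rust rather than C). The submit thread flushes and hands the fence over; this worker does the waiting and drives the events to CL_COMPLETE.

    #include <stdlib.h>
    #include "pipe/p_screen.h"
    #include "util/os_time.h"   /* OS_TIMEOUT_INFINITE */

    struct pending_fence {
        struct pipe_fence_handle *fence;  /* already flushed by the submitter */
        void *cl_event;                   /* opaque handle to the CL event */
    };

    /* Placeholders for the frontend's own queue/event machinery. */
    extern struct pending_fence *pop_pending(void);   /* blocks until work */
    extern void mark_event_complete(void *cl_event);  /* runs CL_COMPLETE callbacks */

    static void *fence_waiter(void *arg)
    {
        struct pipe_screen *screen = arg;
        struct pending_fence *p;

        while ((p = pop_pending()) != NULL) {
            /* Waiting from a thread other than the one driving pipe_context
             * is allowed, hence the NULL context argument. */
            screen->fence_finish(screen, NULL, p->fence, OS_TIMEOUT_INFINITE);
            mark_event_complete(p->cl_event);
            screen->fence_reference(screen, &p->fence, NULL);
            free(p);
        }
        return NULL;
    }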
18:12karolherbst: mhhh
18:13karolherbst: yeah... probably
18:13karolherbst: more threads
18:15karolherbst: I just wish there were a better way
18:15karolherbst: like drivers have access to a sequence number
18:15karolherbst: some drivers do
18:16karolherbst: iris has it
18:16jenatali: That's still just polling though?
18:16jenatali: How is that better?
18:16karolherbst: mapped
18:16karolherbst: but yeah, you would have to poll to get the recent number still...
18:16jenatali: WDDM works that way for all devices FWIW, i.e. all fences are timeline fences with that kind of sequence number
18:17karolherbst: would be nice if you could just wait on that number to change
18:17karolherbst: but...
18:17karolherbst: the real solution of course is that the GPU should simply be able to call host code :P
18:18karolherbst: but yeah... a second thread isn't necessarily a bad idea
18:19jenatali: "Wait on that number to change" == "wait for a fence signal"?
18:19karolherbst: mhhh
18:19jenatali: And technically the GPU can call host code by raising an interrupt ;)
18:20karolherbst: I should just add a future runtime and...
18:20karolherbst: in any case, I guess it needs another thread
18:21jenatali: You either accept that the thread submitting work to gallium waits for completion, or you put the completion processing on a separate thread, yeah
18:21jenatali: Those were the only solutions I came up with at least
18:23karolherbst: right...
18:28robclark: karolherbst: I do have userspace fences, ie. just a seqn.. freedreno's fence::wait() will check that before making a syscall
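A hedged sketch of that userspace-fence fast path: the GPU writes a completed sequence number into a mapped page, and the wait only falls back to a syscall when the seqno hasn't advanced far enough yet (the struct layout and kernel_fence_wait() are illustrative, not freedreno's actual interface).

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative: a page shared with the GPU, which writes the sequence
     * number of the most recently retired submission into it. */
    struct uspace_fence_page {
        volatile uint32_t last_completed_seqno;
    };

    /* Placeholder for the driver's real ioctl-based wait. */
    extern bool kernel_fence_wait(uint32_t seqno, uint64_t timeout_ns);

    static bool fence_wait(struct uspace_fence_page *page, uint32_t seqno,
                           uint64_t timeout_ns)
    {
        /* Fast path: signed difference handles seqno wraparound; if the GPU
         * already passed this point, no syscall is needed. */
        if ((int32_t)(page->last_completed_seqno - seqno) >= 0)
            return true;
        return kernel_fence_wait(seqno, timeout_ns);
    }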
18:56jenatali: As it should be
19:52karolherbst: maybe I should write a patch... how hard could it be
20:51karolherbst: robclark: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/52b4f218e6445ef024b24333c8fcfb9020af84b5
20:51karolherbst: not a _great_ impl
20:51karolherbst: but might already help
20:51karolherbst: I want to improve it a bit...
20:51karolherbst: but anyway
20:52karolherbst: it's enough that test_basic seems to be happy
21:16robclark:looks
21:18karolherbst: mhh I think I'll need to rework error handling as well..
21:19karolherbst: though might be fine..
21:34robclark: it seems to work (at least for a case with no errors)
21:36karolherbst: but is it also faster or less stalling?
21:36karolherbst: I wonder if DaVinci Resolve still works with that :D that reminds me.. I wanted to look into threaded compiling...
21:50robclark: a bit faster, so now I need to find the next bottleneck
21:50robclark: freedreno already does threaded compiling ;-)
21:52karolherbst: yeah.. but like.. that doesn't help when a lot of time is spent compiling C code to SPIR-V to NIR :D
21:53karolherbst: also doesn't help that the compiler gets serialized to fetch the workgroup info
21:54karolherbst: anyway, glad to hear it's faster, I might clean it up tomorrow, but I guess it will have to wait for 25.3 now to land
22:13robclark: that's fine, at this point I think we'll be following ToT or cherry-picking, since there are other things in flight ;-)
22:18karolherbst: heh, fair
22:19karolherbst: robclark: oh btw.. I am considering moving to gpu side timestamp queries for profiling.. not sure if that's properly supported with all drivers
22:20karolherbst: with iris get_timestamp and get_query_result_resource aren't well.. matching
22:20robclark: only thing that is sketchy for fd is qbo, like I mentioned before
22:21karolherbst: https://gitlab.freedesktop.org/karolherbst/mesa/-/commits/rusticl/profiling/good?ref_type=heads
22:21robclark:still needs to write up a proposal for fw addition that would let us do ts qbo properly
22:21karolherbst: currently profiling stalls on the host, which isn't great :)
22:21robclark: right
22:21karolherbst: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/ecbebf1f020a9416aaf674ecf65c5045d788b3df is the core change
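In gallium terms, the GPU-side path being discussed is roughly a PIPE_QUERY_TIMESTAMP query whose result the GPU writes into a buffer via get_query_result_resource(), instead of sampling the clock on the host. The sketch below is illustrative and not the linked commit; exact flags/index semantics should be checked against p_context.h.

    #include "pipe/p_context.h"
    #include "pipe/p_defines.h"

    /* Sketch: have the GPU write a 64-bit timestamp into `buf` at `offset`
     * at this point in the command stream, so profiling data can be read
     * back later without stalling the host. */
    static void emit_gpu_timestamp(struct pipe_context *ctx,
                                   struct pipe_resource *buf, unsigned offset)
    {
        struct pipe_query *q = ctx->create_query(ctx, PIPE_QUERY_TIMESTAMP, 0);

        ctx->end_query(ctx, q);  /* timestamp queries only need end_query() */
        ctx->get_query_result_resource(ctx, q,
                                       0,                   /* no wait flags */
                                       PIPE_QUERY_TYPE_U64,
                                       0,                   /* result field 0 */
                                       buf, offset);
        /* Query destruction/reuse omitted for brevity. */
    }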
22:22robclark: yeah, scaling (converting ticks to time) is the problem for us
22:22karolherbst: yeah...
22:22karolherbst: I've committed crimes to make it work for iris: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35281
22:23karolherbst: maybe something similar would work for you?
22:24karolherbst: I lose two low bits of precision with that which.. doesn't matter one bit
22:25robclark: other than the "on_gpu" part ;-)
22:25robclark: at least if I don't spin up a compute shader
22:26robclark: the sqe _could_ do it, but we'd need to add a new pm4 packet
22:26robclark: so capturing the value on the cpu is fine.. writing it to a pbo/resource is fine.. but for now we need to do the ticks->us conversion on the cpu
22:27karolherbst: mhhh right
22:27karolherbst: right we've talked about it, and with CL it's all fine, because there is no buffer mechanism in the API and the frontend could just apply a factor
22:27karolherbst: just need to know about that factor
22:28karolherbst: so `get_time` would just apply that one and everything should be fine
22:29robclark: yeah, if you made an optional pctx or pscreen callback to do the conversion, that would work
22:29karolherbst: mhh, does it have to be a callback?
22:29karolherbst: can't it just be a simple float?
22:30karolherbst: or is it more complex than that?
22:30robclark: _probably_? Upcoming kernel work to enable IFPC does change the counter that is used for cpu timestamp reads. I don't _think_ we'll need to apply an offset but not 100% sure yet
22:31robclark: callback would be more flexible to cover whatever cases
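Such a conversion hook doesn't exist in gallium today; a hypothetical shape for what's being proposed might look like the following (the name query_timestamp_to_ns and its placement on pipe_screen are invented here for illustration).

    /* Hypothetical pipe_screen addition, sketched for illustration only:
     * converts raw GPU timestamp ticks (as written by a PIPE_QUERY_TIMESTAMP
     * query into a resource) into nanoseconds.  A callback rather than a
     * bare scale factor so a driver can also fold in an offset or a counter
     * switch, e.g. the IFPC case mentioned above. */
    uint64_t (*query_timestamp_to_ns)(struct pipe_screen *screen,
                                      uint64_t gpu_ticks);

    /* Frontend side: what a profiling path could do with it. */
    static uint64_t event_timestamp_ns(struct pipe_screen *screen,
                                       uint64_t gpu_ticks)
    {
        if (screen->query_timestamp_to_ns)
            return screen->query_timestamp_to_ns(screen, gpu_ticks);
        return gpu_ticks;  /* driver already reports nanoseconds */
    }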
22:35karolherbst: I see
22:35karolherbst: also it's not like it's a hot path anyway
22:36robclark: right