IRC Logs of #radeon on irc.freenode.net for 2023-09-07

02:47 periontus: I looked a little bit at the old slides too, you know technically EFLAGS can be emulated, you can stitch a texture together into scratch registers, and mark it under some filtering as well as wrap modes, but i do not intend to handle it here, simplest is just fixed mode border color, which likely can be just added to call interrupt, with scratch registers it's that every chip needs to have them. The floating point format and
02:47 periontus: normalization has not caused any confusion here though, partly exposed scratch registers could likely do the reverse too, get regs dumped to memory. So this would help to get more ciscish behavior too. But there are also parallel color, alpha, depth, stencil and such tests, technically in driver much could be changed and shader model deviates quite a lot from reality, there are no read write restrictions, all regs are RW, it's
02:47 periontus: higher level guidance.
03:04 periontus: but about samsung s22 there are three solutions to combat the overheating problem on RDNA2 which with passive cooling would hinder performance under adreno 730 or 750 whatever the other model was, heat is a product of resistance on the circuits, 1. in longer run is to pull it steady states. can combine with 2 and 3. which are 2. this was offered by Tomas Stellard, more parallelism than single wavefront per group. 3. this card is cross
03:04 periontus: domain checking hackable to enter that contiguous dispatching mode also.
03:06 periontus: But it's more AMD's decision i suppose, maybe they deal with some intellectual property hiding or non-exposure ways, and that portfolio may be partly third-party.
03:08 periontus: but i call it off for today again, i am dealing with opener, choosing the fixed function driver plumbing to start my development of HWFakerBird, i need one more day to arrive into conclusion or decision.
03:11 periontus: On pc relative addressing there is nearly nothing needed to be decided, it's all a direct call absolute addressing corner case, which is to me a little bit awkward to support, but possible, out of noble thinking can be offered too, one day they want os to run on gpus etc.
03:11 periontus: though in reality pc relative mode works always, and absolute addressing can convert to pc relative
03:49 periontus: I have been apart from this conversation, to stitch a buffer you should have some drm/dbm/kms or such methods it should already be abstracted away so you can just unite EGL kms and drm to make some ways through ioctls and map that memory.
03:49 periontus: i had seen some demos
03:53 periontus: Yes this is not cross platform, dxgi/egl on windows and whatever egl backend osx has are different
03:53 periontus: I am not sure if this can be done in pure EGL
03:55 airlied: a
04:17 periontus: I am quickly going through the egl, theoretically it could be possible if you could say that one context buffer is backed by a pixmap, with a little bit hacking one would get to so called command buffer address, and just making texture pixmap backings shifted to it, would execute whatever you stored there. It's somewhat partly indicated or documented thing. glMapBuffer has to return the address then, but i am not sure it would, yeah it
04:17 periontus: would cause it's allowed on textures.
04:42 periontus: however you look at it, it technically boils down to finding hacky ways to dig up the command processor head and tail pointers, maybe it is not possible, maybe only driver knows them, so vulkan could do that though
04:57 periontus: I have not seen a driver that implements no arb sync, fences are needed to trace the address, this Demi Marie asked similar things
05:01 periontus: and this is a safe bet that command buffers are all indirect, but it's one way incremental addressing it prepares for, unless the context is somehow restarted, then it jumps only to the beginning
05:07 periontus: however this is called, when driver handles no context indirection a true version, then some soft stitching or soft pinning hack is needed on egl, but i think it by default relocates, i.e. copies the things, which definitely does not meet the performance requirements
05:07 periontus: zero copy hack needed
05:09 periontus: soft pinning is something that provides fast procedure or function based access to the MMU or virtual memory mapping
05:12 periontus: otherwise there is only one idea left, to make it conveniently based off some non-driver hack, if out-of-bounds texture access could be handled back into the beginning of cp
06:00 periontus: so time to investigate the hw command buffers again, i have not done it for years, so to find something that issues command by means of jumps, you know those can be draw call related or geometry, cause draw calls lack the index
07:09 periontus: it's future music, i can not handle it for now, but it's still a scratch register thingy, so when interrupt is received it writes the scratch registers and issues the pm4 commands through indirect buffer, so that indirect buffer can be intruded into, any other chance i did not notice
07:14 periontus: basically it can just call itself one more time, and jump anywhere it wants
07:16 periontus: it needs a bit of driver investigation as to how all gpus call it, probably on UBO updates and such stuff
07:16 periontus: uploads
11:05 penguin42:is trying to understand some OpenCL performance from rusticl and I'm looking at the IR and asm by looking at AMD_DEBUG=cs,asm,llvm
11:06 penguin42: are there any hints on trying to understand the flow a bit more - e.g. can I take that IR and pass it to llvm on the commandline and play around to watch it flow through llvm?
12:47 periontus: i technically understand why indirect buffer is needed, the opengl API requires that data uploads are async to the main ring, so that cpu view at the ring could schedule more commands in.
13:01 periontus: and that is also why it needs to give access to full typeX packets, cause even software context switch not only hw one, could switch away from it.
13:19 periontus: It's just that vulkan conflicts with backward compatibility, not every IoT embedded device has that driver, we'd want to support all the rendering on any device per slogan.
13:26 periontus: I think we have all thought about frankenstein drivers, where part of the contexts are on cpu sw, that is what r300 and gma945 my first gpus did, but generally new gpus could elect to do the same, if they offered some auxiliary IO, but since dma does not distinguish memory from IO, that can be just exposed, and if you pack things and handle through unrolled elimination technique such driver turns out to be very fast too.
13:29 periontus: i've been tempted to try this GUD dri module too, someone knew USB protocol very well it seems, this is great idea...so with a hack on surfaces like EGL based mentioned, this can be cross platformed.
14:21 periontus: Yeah i apologise, too capable ideas, but there is just enough to get easy victory, you want to send all batch of the context through auxiliary memory for usb display to program self-permuting procedure, hdmi or gpu dma, but that goes crazy usb is very good for it, but wifi would suit well too, or gprs even. Estonians have this elmo technology, remote driven cars without pilot in car, but office, but i could help them to back data, they require 4g,
14:21 periontus: but gprs would do more than enough, you just need to pack data, to get into conflict over this is where i back off, but yeah such technology is scary btw. it is so capable.
15:53 penguin42:tries reasking his q from prior to that spam; are there any hints on trying to understand the flow a bit more - e.g. can I take that IR and pass it to llvm on the commandline and play around to watch it flow through llvm?
20:40 mediaim: I am fed up with your bad code, so to clarify how the modern support works, it's not a lot of work to offer that switch, but it's a lot of work to maintain the compilation of all the world binaries, to maintain the compatibility of compiler output and relation to the final end packages. How it works is that it takes cpu code with linear sweep binary rewriter, and it can consume pc relative addressing to produce an executable by hashing all the
20:40 mediaim: contiguous codepath presentation into couple of very short passes, cause it is a flattened output, but since it is a pre-compiler it can consume absolute addressing too, but in an event to parse absolute addressing it still converts to composition which is hashed, by caching the code definitions in a lookup table, it's type of all or nothing, cause if there is even a small bug, nothing would work, so in event it needs no maintaining, cause to go
20:40 mediaim: mainstream it needs that no bugs ever can happen that switch the output to something that would not work, so the event loop is one small part of this overall code compressed container. Its views through the prism so that it turns everything ok, but it can be offered at install time, cause the compiler is so fast, it would take only one hour to compile the whole system binaries needed. Technically this could be a distro that has such binary
20:40 mediaim: rewriting blink fast compiler integrated into package manager.
20:42 mediaim: IT can offer such support for enterprise performance special Red Hat for instance, perhaps security and performance enterprise optimization release.
21:03 mediaim: I intend to serve as project developer and leader only for half to one year the last which is too much for me, then everyone will take over and anyone who wants can switch to modern linux, windows and osx, just one module is a synchronization and presentation to human brain. it just puts that machine to our frequency that can be configured by parameters, how deep it caches, it never blocks or produces heat.
21:09 mediaim: it's simply cause i do not have resources especially timely ones to maintain it anyways, cause i need to treat yet some of my internal things too.
21:10 mediaim: i have the money and equipment but not time for such things as 10 years dealing with parroting people. It wastes my life.
21:17 mediaim: it should serve as some crisis resolution installer, cause it helps a lot when things go bad, and there are not so many resources to resolve the crisis. world is so big.
21:19 mediaim: one day wall street turns, another day earthquake, then unexpected war to continue with, you could possibly die if you do not prepare to resolve those.
21:21 mediaim: other than that, cheers again, at least i am working towards such things. but can remain offline while doing it.