02:15The_Doctors_Life: Does Nouveau support Vulkan?
02:26orbea: couldn't wait more than 10 minutes for an answer...
07:19interestedDude: well, every now and then someone gives sympathy to the doctors, which i actually do not; after years of being humiliated they continued to sentence me to the meds, without legal reason to keep me in custody, trying to kill me with dubious amounts of meds, keeping all my documents; this would open up the door for mainstream morons to always try to sentence the stronger one, saying that one is paranoid, and let idiots continue their brutal kill-off attempts
07:19interestedDude: so this is not how it works, that docs can say "due to you being paranoid we can keep you in"; it should work the other way around by law, so they have to be sanctioned for it
07:26interestedDude: although my life is gone, those are worrying notes for capable persons, that quasimodos continue to wipe out people like me from life, and it has always been so in the past; some precedents need to guard them, and this is why i am building a case there, hopefully our stars will not end up cracked up due to born-incapable halfway handicaps
07:28interestedDude: they are not responsible for what god chose to do, despite what those ill persons who want them dead think about it, bye
10:49interestedDude: robclark: i meant to optimize all the shader stages for different cards, developing a modified version of old well-known theories; in the case of memory fetch masking, and maybe a couple of ringbuffer operations in addition, this could also be done
12:10gsedej: hi! does the 0f pstate work for an old GTS 250?
12:11gsedej: I can read pstate ... 0f: core 675 MHz shader 1458 MHz memory 900 MHz
12:31gsedej: which gpus are supported for pstate?
12:38gsedej: GTS 250 is Tesla. Does Tesla support pstate reclocking?
12:39RSpliet: imirkin: for loads from/stores to local memory, does NVIDIA/TGSI decide to bypass the L1?
12:40RSpliet: gsedej: I implemented some code for that GPU I think. Let me double check which one that is
12:40RSpliet: oh, G92... I'm not confident about that one
12:41gsedej: ok, I thought so, just checking
12:41gsedej: I didn't have a "*" when I cat pstate
12:41RSpliet: I think I had it working on one card, but got reports from other users that it doesn't on theirs
12:42karolherbst: yeah, somebody here with a similar card has issues
12:42RSpliet: so... well, just try and restart if your system crashes
12:42karolherbst: but we got it to work somewhat
12:42RSpliet: only with a very new kernel
12:42karolherbst: RSpliet: it doesn't work, because the G92 is misconfigured
12:42RSpliet: timings slightly off?
12:42gsedej: i have 4.9, and mesa "git" (padoka ppa)
12:42RSpliet: gsedej: for PM only the kernel matters :-)
12:42karolherbst: RSpliet: no, it uses g8x stuff and the reclocking check is for g94+
12:43RSpliet: oh okay, that's probably a safety net to keep people from crashing their machine unintentionally :-P
12:43karolherbst: memory reclocking doesn't work because the hwsq script is too big
12:43karolherbst: engine reclocking works though I think
12:44RSpliet: that sounds unlikely... are we trying to cram too much data through the old interface?
12:45karolherbst: the hwsq script size is doubled for g94
12:45RSpliet: G92, according to envytools
12:45karolherbst: according to nouveau
12:45karolherbst: well sure
12:45karolherbst: but g92 is treated like a g86 in nouveau
12:45RSpliet: then maybe nouveau is wrong
12:45karolherbst: it is
12:46RSpliet: easy enough to fix, rename g94 to g92 and rewire, but needs double-checking from trace
12:47karolherbst: I did it for pcie already
12:47RSpliet: what unit is hwsq_size in?
12:48RSpliet: if bytes, seems to be a bit conservative. If double-words, sounds about right :-D
12:49RSpliet: judging by the upload loop seems to be dw
12:57gsedej: RSpliet, will the "GTS 250" be supported for pstate in the future?
13:33interestedDude: robclark: that was for ringbuffer engines, but talking about optimizing the primitive generator, tessellator, rasterizer, clipping and blending shaders, i see that most of them can be traced with interrupts on amd and nvidia cards; since adreno is old Imageon as i understood, the same applies to adreno as to AMD
13:34interestedDude: maybe with an interrupt we could switch the shader and have an optimized version called, but without using an interrupt i would not know another method to hide the latency of fixed-function kernels on graphics
13:36RSpliet: gsedej_work: I wish I could say. Unfortunately I have very little time to sink into nouveau at the moment, and my primary focus has been on Fermi generation cards
13:43gsedej_work: ok, np
13:43interestedDude: on AMD cards we could investigate RST_VTX/PIX_CNT; i would need to look whether a similar thing exists on NVIDIA cards, i cannot find information about what value the automatic counter gets reset to, it should have been mentioned in the docs
13:44interestedDude: is it either the 0 value, to start totally from the beginning, or the last value needed for my case
13:44gsedej_work: RSpliet, btw, at "default" pstate I am able to play The Long Dark 20FPS on 1280x1024@low. This is good for that old GPU
13:44RSpliet: default speed is about half the max clocks I presume?
13:48gsedej_work: there was no "*"
13:49gsedej_work: and I already switched gpu
14:21RSpliet: ah, double-checked. pmoreau: NVIDIA seems to apply the .cg caching strategy when data is to be transferred from global to local memory (e.g. bypass the L1)
14:22RSpliet: sounds like an easy win for cache efficiency - hope TGSI realises this too
14:25interestedDude: There could be another way robclark, i do not expect this to work, but when you have shared registers as backing lanes, from a shader you may be able to target the fixed-function lanes by using an absolute address
14:25interestedDude: shuffle-broadcasting them to something, same goes for iterations
14:26interestedDude: because the program counter can be manipulated directly on some gpus; imagine writing all the lanes full of some command like setpc, i wonder whether it would jump back from where it was immediately
14:39hakzsam: RSpliet: not sure, our caching strategy is not really smart :)
14:39RSpliet: hakzsam: I think the code emitter just takes the caching labels from TGSI one way or the other
14:40hakzsam: yeah, but it's highly hardware-dependent
14:40hakzsam: I'm pretty sure we could improve that...
14:40RSpliet: hakzsam: some bits are, but bypassing the L1 when you're just transferring from global to local (shared) mem is a pretty universal win presumably
14:41RSpliet: I don't know the front-end well enough unfortunately to check whether that's what we do
14:42RSpliet: but the _from_tgsi translates pretty blindly :-)
14:42hakzsam: yep :)
14:43hakzsam: RSpliet: like this https://cgit.freedesktop.org/~hakzsam/mesa/commit/?h=atomic_fixes&id=a5fd753118b1887ccada05aca6b0e1da090220a3
14:44hakzsam: more investigation is needed though
14:44hakzsam: but I noticed this a few weeks ago
14:45RSpliet: yeah, makes a lot of sense
14:46hakzsam: would be nice to test if it improves perf
14:46hakzsam: but apps which use atomic counters are pretty rare
14:46RSpliet: oh yeah...
14:46hakzsam: and I don't have time to write my own benchmark :)
14:47hakzsam: [totally busy with the gm107 sched data calculator]
14:47RSpliet: I guess for global->local transfers it's easy to write a peephole opt if we need to... but it sounds like the GLSL->TGSI translation should have more knowledge than TGSI->NV50IR or later :-)
14:48RSpliet: perf is easily compared by checking the L1 hit/miss counters (thanks! ;-))
15:26interestedDude: programming the program counter would not necessarily have to go through the lanes, though it would make a lot of sense if it did
15:27interestedDude: gnu guy is flooding here it seems
15:27interestedDude: probably wanting to say my statements are a flood; gnustomp, why not just buzz off and don't come back
16:15orbea: interestedDude: come on...it was only 8 join/quit. Hiding join/quit spam on your end is a better solution.
16:15orbea: probably was his bouncer doing something silly
16:28interestedDude: orbea: ok, i am not really well aware of irc's possibilities, there are quite many i do not like, i mean hell with that
16:30interestedDude: orbea: but the thing with optimizations in the compiler is that not many very efficient ones are possible; things boil down to either masking or managing to do something with the lanes in a more parallel way, for instance using a permute instead of a bitwise operation, or trying some parallel convolution, again using this cross-lane data sharing
16:31interestedDude: the thing is, two of the highest-ranked opts are what i mentioned here, instruction scheduling with masks or opcodes; avoiding divergence can also give very good results, but register reuse possibilities do good too
16:32interestedDude: the last one i have not specifically read much about, but on some loads register reuse reported a 4x improvement under pressure, i pasted the pdf a while back
16:32interestedDude: but it did not give much code, though it had some formulas in it, it was a dutch report
16:33orbea: i'm not sure why you are telling me this... I was just trying to say there are nicer ways to handle it than telling someone to leave and not come back. :)
16:35interestedDude: but yeah, for radeon there are dead code elimination and a couple of other llvm passes that can help; generally the code analysis and modification passes cannot do wonders
16:37interestedDude: i am just saying, if you want performance then go for the real and also easy thing to provide the perf, i mean nowadays when reclocking is handled
16:40interestedDude: or some means by which pointers, i.e. indirect addressing, can work on the circuit; well, there is just a decoder add-on for it: when the user posts a 64-bit address, the source and dest regs will be contained in the address, taking around 16 bits, well the pc is max 48 bits
16:41interestedDude: and it just uses this information to do indirection in the regfile, so one address of a reg will point at the other, this is a very cool feature
16:42interestedDude: i.e. it is like a pointer to a register, so it never even remotely relates to any memory operation and hence is a very fast register operation
17:05interestedDude: because all new gpus support that feature, not sure yet about lane shuffling; then loading the mask costs approximately 8 registers
17:06interestedDude: then you do not fetch them from memory, but you point the virtual addresses at the register file, and fetch from there
23:30barteks2x: I have a small issue with nvidia optimus, when I close something that uses the nvidia gpu, it resets my screen calibration (I set it using xcalib)
23:31barteks2x: any way to fix that?
23:34karolherbst: barteks2x: uhh well we don't do nvidia support in here :p
23:35karolherbst: there is #nvidia for that
23:35barteks2x: so what about nouveau support?
23:35barteks2x: I use nouveau driver
23:35karolherbst: ohh I see
23:35karolherbst: never used xcalib myself
23:36karolherbst: I could check it out tomorrow
23:36karolherbst: you could also create a bug otherwise I might forget about it
23:36barteks2x: if I find where
23:37barteks2x: I don't think it's related specifically to xcalib, but it may be so I said what I used to set it
23:38karolherbst: might be X related
23:38karolherbst: or xrandr
23:39karolherbst: barteks2x: well if I don't forget about it, I will test it tomorrow, actually have to sleep now, because it is getting late here
23:40barteks2x: should I create bug report for it?
23:40karolherbst: no idea
23:40karolherbst: maybe against xcalib
23:40karolherbst: somewhere on the freedesktop bugzilla
23:43karolherbst: anyway, I am off now
23:43barteks2x: that's interesting, it didn't happen this time. Could be that it happens only when I use xcalib while something using the nvidia gpu is running
23:43karolherbst: barteks2x: remind me if I don't get back to you
23:44karolherbst: barteks2x: might be
23:44barteks2x: I will do some more tests
23:44karolherbst: barteks2x: if you have any more info, just write it here and add my name to it, we have a log I usually read the next day