07:26 tomeu: kherbst: ok, will take that into account
07:27 tomeu: kherbst: btw, I see that PIPE_COMPUTE_CAP_MAX_GLOBAL_SIZE in clover_support_nir_target is a constant, instead of using vram_size
07:27 tomeu: guess there's a branch somewhere with more compute stuff?
07:27 kherbst: yeah
07:28 kherbst: tomeu: https://gitlab.freedesktop.org/karolherbst/mesa/commit/8339e2ed1f986344e04dc9cacfad1e082d7ff8f1
07:29 kherbst: but there isn't that much more which is useful anymore
07:29 kherbst: most of the commits would have to be rewritten
07:56 tomeu: thanks!
10:28 mrsinisalu: MrPooper is very obsessively sharp guy, say like cryptomwk instead of playing with the brain neurons and doing sane stuff, he at AMD logs my activity, like usually i behave this and sometimes this, have you already built something that predicts how i think VOLKOV BRAINPAIN over markov sanechain?
10:33 mrsinisalu: So anyways back to the point! I introduced some of the compression of the command stream in a known primitive fashion
10:35 mrsinisalu: The point is you'll have to do some resource computation and delay computation, highly easy arithmetic calculations.
10:36 mrsinisalu: to be able to plug such things in, the code should not be very difficult.
10:36 mrsinisalu: for starters i also said that VLIW is actually slightly nicer with this method than SIMD.
10:41 mrsinisalu: most important is -- how many queue entries the chip has aka num_maxTG*num_maxTG and also maybe how many ALUs it has, to be able to run the decompressor in parallel for various opcodes
10:44 mrsinisalu: and for VLIW chips, for instance cypress, it goes like this: there are 32 waves on a 16-wide VLIW SIMD, queues are hence 4*32*32, so it is calculated as num_maxTG*num_maxTG*bundle_width on VLIW instead
10:45 mrsinisalu: so it is 2048 Per one SIMD
10:45 mrsinisalu: now alus per SIMD can be taken all in parallel on cypress chip so those are obviously 16*4 hence
10:46 mrsinisalu: 64 can be done in parallel on SINGLE CU/SIMD VLIW chips
10:46 mrsinisalu: those specs are quite a bit lower for pure SIMD aka hw scoreboarding chips
10:52 mrsinisalu: next thing that is also preferable to be aware of is how many VGPR or SGPR registers, based on occupancy, a single TG (thread group) would have
10:53 mrsinisalu: this calculation is again a simple one, so if the cluster of 16-wide/lane VLIW SIMD processors has a 256000 byte aka 256kb register file for instance
10:53 mrsinisalu: and maximum TG count is 32
10:54 mrsinisalu: you divide 256000 by 4 then 64 and then 32
10:55 mrsinisalu: result should be 32 VGPRs of float or 16 as double
10:56 mrsinisalu: if you have fewer obviously that will hinder parallelism but you will have more than 32 VGPRs, actually 8*4 vector registers
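The division described in the lines above can be written out as a small sketch. The function name and parameter names here are hypothetical, chosen only for illustration; the figures (4-byte floats, 64 lanes, 32 thread groups) are the ones quoted in the log. Note that the decimal figure 256000 gives 31.25, while reading "256kb" as 256 * 1024 bytes gives exactly the 32 quoted.

```python
def regs_per_thread(register_file_bytes, bytes_per_register, lanes, max_thread_groups):
    """Divide a register file evenly across lanes and thread groups.

    Hypothetical helper: it only reproduces the chained division from
    the log and does not model any real hardware allocation policy.
    """
    return register_file_bytes / bytes_per_register / lanes / max_thread_groups


# Figures as quoted in the log: divide by 4, then 64, then 32.
decimal_case = regs_per_thread(256000, 4, 64, 32)

# Same division with "256kb" read as 256 * 1024 bytes.
binary_case_float = regs_per_thread(256 * 1024, 4, 64, 32)

# 8-byte doubles halve the count.
binary_case_double = regs_per_thread(256 * 1024, 8, 64, 32)

print(decimal_case, binary_case_float, binary_case_double)
```

Halving max_thread_groups doubles the per-thread register count, which is the occupancy trade-off the last line above alludes to.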
10:57 mrsinisalu: now the compression and scheduling i talked about on #dri-devel, this has more variations on how to do it, many ways suit
11:00 mrsinisalu: before any coding is started, those 5 mentioned things should be something that a programmer is aware of.
11:02 mrsinisalu: that is why hw journalists provide some of the cool info on the web, so a developer would have some less nervy start to the stuff, however scheduling and compression bits were not given on those journals or articles
11:03 mrsinisalu: as we already know, i did sniff out those bits intelligently from miaow hunks and multi2sim and attilagpu and such, mostly miaow in this case -- as i was simulating some of the flow and debugged it a bit
11:11 mrsinisalu: and yep, for compressing and redirecting the queue entries to be executed in the corresponding functional units aka ALUs on chip
11:12 mrsinisalu: i suggested the fast pipe of subtract on a two's complement system, this is the equivalent of bitwise AND and OR, basically two gate delays
11:13 mrsinisalu: the correct response or answer comes with just ignoring the carry on two's complement subtract
11:14 mrsinisalu: i think all hw does it this way
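As a minimal sketch of the "ignore the carry" point only: in fixed-width two's complement, a - b can be computed as a + (~b) + 1 with the carry out of the top bit discarded. The function names and the 8-bit default width are assumptions made for illustration; this says nothing about gate delays or about how any particular chip implements it.

```python
def sub_twos_complement(a, b, bits=8):
    """Compute a - b in fixed-width two's complement, discarding the carry out."""
    mask = (1 << bits) - 1
    # Invert b within the word width and add 1: that is -b in two's complement.
    total = (a & mask) + ((~b) & mask) + 1
    # Masking drops the carry out of the top bit.
    return total & mask


def to_signed(value, bits=8):
    """Reinterpret an unsigned fixed-width word as a signed integer."""
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


print(sub_twos_complement(7, 5))             # raw 8-bit word for 7 - 5
print(to_signed(sub_twos_complement(5, 7)))  # signed reading of 5 - 7
```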
11:16 mrsinisalu: I also referenced a turkish uni. page for this, i would have thought all were aware of this, however this does not seem to be the case.
11:17 mrsinisalu: http://eem.eskisehir.edu.tr/egermen/EEM%20232/icerik/Week%206%20Arithmetic%20Functions.pdf just read page 32 and forward from this pretty nice tutorial or paper or article or slides, whatever
14:09 juri_: wow. that was.. something.
15:50 FLHerne: Sometimes I'd swear someone trained one of these OpenAI text-generation bots on the graphics-dev channel logs
15:51 FLHerne: It's got the same characteristic: the words are right, the grammar is right, the sentences are structured coherently -- and yet there's no meaning...
19:15 frobos: I'm using kernel 4.14.118 and decided to test nouveau for a while. It seems that the driver stability has improved since I last used it. I've been running the system for more than a day with no system freeze yet
19:15 frobos: tested on several games and reclocked to max, all seems fine
19:17 frobos: for this Kepler NVE4 it seems to be good