00:39 karolherbst: imirkin_: ohhh, I think I understand the issue with the 0x10 alignment in tgsi now: we don't know the component size the variable index is based on
00:40 imirkin_: :)
00:40 karolherbst: but in nir we know :)
00:41 imirkin_: well
00:41 imirkin_: in tgsi the variable index is well-defined -- it's over the temps
00:41 imirkin_: which are vec4's
00:41 karolherbst: well right
00:41 imirkin_: now we could be smart and determine that some array is actually a vec2
00:42 karolherbst: but that leads to a 4x increase in lmem usage
00:42 imirkin_: and pack it better in lmem
00:42 karolherbst: in the worst case
00:42 imirkin_: i just never got around to it, i think
00:42 imirkin_: i did have some nice wins with some other packing efforts i made
00:42 karolherbst: ahh
00:42 imirkin_: but i don't think i ever saw non-vec4's being indirectly accessed
00:42 karolherbst: in practice?
00:42 imirkin_: in real shaders - obviously piglit has them
00:42 karolherbst: I see
00:42 imirkin_: i didn't look _that_ hard...
00:43 karolherbst: well maybe people figure out that kind of stuff is rather slow on nv hardware and don't use variable arrays at all? dunno. I kind of never saw stuff like that
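To make the 0x10 alignment point concrete: in TGSI the indirect index counts vec4 temps, so element i of an array always sits at base + i * 0x10, no matter how many components it really uses. Packing by the real component count shrinks the stride, at the cost of that alignment guarantee. A minimal Python sketch, assuming 32-bit components; the helper names are hypothetical and not nv50_ir API.

```python
# Hypothetical helpers (not nv50_ir API): lmem offsets for indirectly
# indexed arrays, assuming every component is 32 bits wide.

COMPONENT_BYTES = 4
VEC4_SLOT_BYTES = 4 * COMPONENT_BYTES  # 0x10, one TGSI temp


def tgsi_offset(base: int, index: int) -> int:
    """TGSI-style addressing: the index counts whole vec4 temps."""
    return base + index * VEC4_SLOT_BYTES


def packed_offset(base: int, index: int, components: int) -> int:
    """Packed addressing: the stride is the element's real size."""
    return base + index * components * COMPONENT_BYTES


def lmem_bytes(length: int, components: int, packed: bool) -> int:
    """lmem footprint of an array of `length` elements."""
    stride = components * COMPONENT_BYTES if packed else VEC4_SLOT_BYTES
    return length * stride


# float[8]: 128 bytes in vec4 slots vs. 32 bytes packed, the 4x worst case.
```

Note that a packed offset like packed_offset(0, 3, 2) == 0x18 is no longer a multiple of 0x10, which is the guarantee code written against TGSI indices could silently rely on.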
00:44 Guest83: what are you working on guys?
00:45 imirkin_: yeah, it's pretty rare
00:45 imirkin_: the big win in packing was from fffb559129dd1ae978ec7f9ba30b4ae97a5ebbcc
00:46 imirkin_: and e231f59b6d5b12035a8041305c3a732d39a39c19
00:46 Guest83: i almost know how they work at the LLVM layer, but i think indirect addressing in TGSI should be fairly easy too, since there are extensive docs
00:46 Guest83: does hw nvIR or whatever you have finalize those correctly from tgsi?
00:48 karolherbst: imirkin_: huh, the first one: they don't usually start at 0?
00:48 imirkin_: karolherbst: an array can start anywhere in tgsi temp space
00:49 karolherbst: I see
00:49 imirkin_: which was tied to where it sat in lmem
00:49 karolherbst: in nir I lowered the variables to registers :) and they get their slot inside lmem if they are an array that is indexed with a variable :)
00:49 karolherbst: otherwise they are just ssa values
00:50 imirkin_: erm
00:50 imirkin_: huh?
00:50 karolherbst: so I guess I don't have to care about both of those issues anyway :D
00:50 imirkin_: what if it's, say,
00:50 imirkin_: if () { a = 1 } else { a = 2 } situation?
00:50 imirkin_: doesn't a become a register?
00:51 karolherbst: it does
00:51 karolherbst: but
00:51 karolherbst: but an array one
00:51 karolherbst: not
00:51 imirkin_: so how do you decide where in lmem to stick it?
00:51 karolherbst: I don't
00:51 karolherbst: because it is a reg backed up by a value, not lmem
00:51 karolherbst: only if it is an array, it gets lmem
00:51 imirkin_: right....
00:52 imirkin_: and where in lmem do you place the array?
00:52 karolherbst: I start at 0 and reserve the needed space
00:52 karolherbst: the second var will start at whatever size the previous one had
00:52 imirkin_: right
00:52 imirkin_: so effectively same thing i did :)
00:52 karolherbst: yeah :)
00:52 karolherbst: but less painful
00:52 imirkin_: but then you have to have a lookup table
00:52 imirkin_: to know which array starts where
00:52 karolherbst: right
00:53 imirkin_: so ... pretty much identically painful
00:53 karolherbst: arrayToLemem[reg->index]
00:53 karolherbst: done
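The placement described above (first array at 0, each following array starting where the previous one ended, plus a per-array lookup table in the spirit of `arrayToLemem[reg->index]`) can be sketched as below. This is hypothetical Python, not the actual NIR-to-codegen code; it inserts no padding, so each array inherits whatever alignment the previous sizes leave.

```python
from typing import Dict, List, Tuple


def assign_lmem_bases(arrays: List[Tuple[int, int]]) -> Tuple[Dict[int, int], int]:
    """Place indirectly accessed arrays back to back in local memory.

    `arrays` holds (reg_index, size_in_bytes) pairs; non-array values stay
    in registers/SSA and never appear here. Returns the reg_index -> base
    offset lookup table and the total lmem size.
    """
    array_to_lmem: Dict[int, int] = {}
    next_free = 0
    for reg_index, size in arrays:
        array_to_lmem[reg_index] = next_free  # this array starts here
        next_free += size                     # the next one starts after it
    return array_to_lmem, next_free
```

For example, three arrays of 48, 48 and 16 bytes get bases 0, 48 and 96, for 112 bytes of lmem in total.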
00:53 imirkin_: previous to that change, we just used the tgsi indices
00:53 Guest83: you are spamming the channel pretty cruelly even though i tried to give all the details about arrays; however, i have more
00:53 karolherbst: imirkin_: right
00:53 karolherbst: imirkin_: I was more referring to the 0 rebasing thing
00:54 imirkin_: well, that's what you're doing
00:54 karolherbst: but yeah, in practice it should be pretty much the same idea
00:54 imirkin_: except instead of rebasing, you're basing :)
00:54 imirkin_: since you have nothing to start with
00:54 karolherbst: :)
00:56 karolherbst: ohh by the way, did you write the fix after I found that memory kill bug in metro?
00:57 karolherbst: no idea if that is fixed by now
01:01 imirkin_: which one?
01:02 karolherbst: well, where the host just runs out of memory
01:03 Guest83: seems like circulating dejavu about pciBARs endless ramble not comprehending the names of opencl nor cuda address spaces etc. me inspecting the logs and being banned, everything is in endless loop without progress, seems like nuts are in your heads instead of brains
01:13 Guest83: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/373854.html indirect addressing is only a bit tricky, most of the magic is revealed there, but not all the usage cases; they are studied with testcases, three important test cases are with those
01:20 gesturazing: i solved the cloaking stuff to some degree, could be done even better, but next thing we do is i dispatch some guys to new york to take care of this troll, couple punches and hopefully his brain will function a bit more, and he'll buzz off from that channel for once
03:06 karolherbst: imirkin_: mhh, MemoryOpt doesn't like me packing the arrays :/
03:12 karolherbst: duh....
03:12 karolherbst: mul u32 $r1 $r1 0x0000000c + ld u64 $r2d l[$r1+0x30]
03:13 karolherbst: before that we always shl by 4, so it is _always_ aligned to 0x10
03:13 karolherbst: but....
03:13 karolherbst: pain
03:21 imirkin: yes.
03:22 imirkin: i think that MemoryOpt is making invalid assumptions about the relative offsets
03:22 imirkin: or rather ... they used to be valid ;)
03:22 imirkin: perhaps have a flag in the nv50_ir_prog_info which specifies whether it can merge indirects or not
03:23 imirkin: a sophisticated compiler would keep track of pointer alignment
03:23 karolherbst: well
03:23 karolherbst: we could just check the instruction of the indirect
03:24 karolherbst: if it is a shl by at least 4 or a mul by a multiple of 0x10 or whatever
03:24 karolherbst: but then we could still be a bit smarter
03:24 imirkin: yeah, but there's other ways of knowing the alignment is correct
03:24 imirkin: but yeah
03:24 imirkin: this is why gcc has __align stuff :)
03:24 karolherbst: :D
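One way to implement the check suggested above: derive a guaranteed power-of-two alignment for the indirect register from the instruction that defines it (a `shl` by k guarantees 2^k, a `mul` by an immediate c guarantees the largest power of two dividing c), combine it with the immediate offset, and only merge when the result is aligned to the merged width. This is a hypothetical Python model, not MemoryOpt code; the ('op', imm) tuple encoding is an assumption. With the `mul ... 0x0000000c` from the log, only 4-byte alignment is known, so the `ld u64 ... l[$r1+0x30]` merge is unsafe (index 1 gives 0xc + 0x30 = 0x3c, which is not 8-aligned).

```python
from typing import Tuple

UNKNOWN_ALIGN = 1  # nothing known about the value: only byte alignment


def pow2_factor(value: int) -> int:
    """Largest power of two dividing a non-zero value."""
    return value & -value


def known_alignment(defining_insn: Tuple[str, int]) -> int:
    """Guaranteed alignment of the result of ('shl', k) or ('mul', imm)."""
    op, imm = defining_insn
    if op == "shl":
        return 1 << imm
    if op == "mul" and imm != 0:
        return pow2_factor(imm)
    return UNKNOWN_ALIGN


def can_merge(defining_insn: Tuple[str, int], offset: int, width: int) -> bool:
    """True if (indirect reg + offset) is always a multiple of `width` bytes."""
    align = known_alignment(defining_insn)
    if offset != 0:
        align = min(align, pow2_factor(offset))
    return align >= width
```

As imirkin notes, there are other ways to prove alignment; this only looks one instruction back, so it is conservative rather than complete.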
03:24 karolherbst: this sounds like a horrible painful topic
03:25 imirkin: [and i'm sure llvm does too]
03:25 imirkin: yes.
03:25 imirkin: esp since merging loads gains little it seems
03:25 karolherbst: well
03:25 karolherbst: after disabling it, the shader actually lost 2 instructions :)
03:25 karolherbst: well for ld/st with indirects
03:25 imirkin: yeah
03:25 imirkin: well that can happen since loads have to be on register boundaries
03:26 karolherbst: and you can throw away the indirect
03:26 imirkin: i.e. load 128 has to go to a quad-wide reg, etc
03:26 karolherbst: ohh
03:26 imirkin: which can cause extra movs in some cases
03:26 karolherbst: right
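The register-boundary rule imirkin describes can be modelled as: an access of N 32-bit registers must start at a register index that is a multiple of N, so a 64-bit access needs an even register and a 128-bit ("q") access a multiple of 4. When the merged value's parts are produced in other registers, satisfying this can cost extra movs. A minimal Python sketch under that assumption; the function name is hypothetical.

```python
def wide_reg_ok(first_reg: int, size_bits: int) -> bool:
    """Check that a wide access starting at $r<first_reg> is aligned.

    Assumes 32-bit registers, and that an access spanning N registers
    (N = 1, 2 or 4) must start at a register index that is a multiple of N.
    """
    regs = size_bits // 32
    if regs not in (1, 2, 4):
        raise ValueError("unsupported access size")
    return first_reg % regs == 0
```

By this rule, the `export b128 # o[0xa0] $r13q` that shows up later in this log starts at a misaligned register.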
03:27 karolherbst: the test is a smart one though: two vec3[4] arrays, and it selects between them with a < 4 check
03:27 karolherbst: and subtracts 4 from the index for the second vec3 array then
03:27 karolherbst: I am sure you can catch a lot of issues with that
03:28 karolherbst: but I am actually wondering what would happen with out of bounds accesses
03:28 karolherbst: or does it only matter if you go out of bounds of "memory"?
03:28 karolherbst: I mean for robustness for example
03:28 imirkin: if it's undefined you can return whatever
03:29 imirkin: i don't think you have to return 0 for robustness there
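The test described above (two `vec3[4]` arrays, chosen with an `i < 4` check and re-based with `i - 4`) can be modelled in Python against a packed lmem image. This is a hypothetical model assuming 12-byte packed vec3 elements placed back to back, so the second array starts at 0x30. That address shape (index * 0xc + 0x30) resembles the `mul ... 0x0000000c` / `l[$r1+0x30]` pair quoted earlier, though that is an inference. It also shows why an out-of-bounds index into the first array quietly reads the second one, which is allowed since the result is undefined.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
VEC3_BYTES = 12                     # three packed 32-bit components
SECOND_ARRAY_BASE = 4 * VEC3_BYTES  # 0x30: vec3[4] a, then vec3[4] b


def lmem_image(a: List[Vec3], b: List[Vec3]) -> List[Vec3]:
    """Both arrays back to back, as a packing compiler might place them."""
    return list(a) + list(b)


def load_vec3(image: List[Vec3], byte_offset: int) -> Vec3:
    """Unchecked load: like the hardware, it knows nothing about array bounds."""
    return image[byte_offset // VEC3_BYTES]


def select(image: List[Vec3], i: int) -> Vec3:
    """The shader's logic: a[i] if i < 4 else b[i - 4]."""
    if i < 4:
        return load_vec3(image, i * VEC3_BYTES)
    return load_vec3(image, SECOND_ARRAY_BASE + (i - 4) * VEC3_BYTES)
```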
03:29 karolherbst: mhhh
03:29 karolherbst: I hope no application depends on stuff like that....
03:29 karolherbst: I mean, feeding crappy indexes and depend on the packing of the compiler to do the right stuff
03:29 imirkin: indirect accesses are rare.
03:30 karolherbst: doesn't make such bugs less annoying to figure out
03:30 karolherbst: I was just wondering anyway
03:33 imirkin: btw, i presume no objections to that maxwell+ textureGrad patch?
03:33 imirkin: haven't seen any issues
03:33 imirkin: ?
03:34 karolherbst: well, I didn't run a full piglit without the nir stuff yet
03:34 karolherbst: so no idea how reliable that is
03:34 imirkin: want me to hold on to it for a while? or push?
03:34 imirkin: i.e. are you planning on testing? or not? if not i'll just push
03:34 karolherbst: if I don't forget I will do a full run tomorrow
03:35 imirkin: k
03:35 imirkin: i'll push tmrw night if you don't find issues then
03:35 karolherbst: sounds good
06:53 bazzy: well that's strange. Was gonna do some mmiotracing tonight. Brought the monitor downstairs so I could watch a movie while doing it. Noticed i wasn't getting a signal on hotplug. Set my monitor to force DVI instead of "auto detect" -- nothing; had to enable the monitor via xrandr. Now there has been no flicker for the past 15+ minutes
06:54 bazzy: I've set it back to auto-detect and will see if it flickers. seems as if it won't though.
06:57 rhyskidd: can i get a review on remaining patches 1 & 2 in this series? https://patchwork.freedesktop.org/series/34796/
07:15 bazzy: I can even hot plug the monitor without getting flicker. Did I receive some kind of magical nouveau update Christmas miracle?
07:15 bazzy: s/hot plug/unplug/
07:16 bazzy: dmesg and Xorg logs are reporting nouveau
07:16 gnarface: my first guess is just that maybe forcing a specific resolution with xrandr fixed it somehow
07:17 gnarface: maybe auto-detect was grabbing an invalid one, or just one the monitor didn't like as much
07:17 gnarface: wouldn't be the first time
07:17 gnarface: if you run xrandr without any options, it spits out the current config
07:18 gnarface: maybe comparing that before/after state will provide some clues
07:19 gnarface: if it's a different physical cable and/or display though, i wouldn't rule that out either
07:20 gnarface: even just whether it's plugged in or not before cold power-on (and to which port) is something that sadly, has mattered in some cases, even though it shouldn't
07:21 bazzy: gnarface: good idea! all i did was `xrandr --output DP-1 --auto`. I usually never had to do this in XFCE4. Now i'm using i3wm and those luxuries are gone (which I don't mind). I am also using a different power cable. Still odd, as I remember the nvidia drivers working under prior circumstances. So there could still be something to be gleaned from mmiotrace (if the problem shows itself again)
07:22 gnarface: i doubt the *power cable* is the issue, but anything is possible
07:22 bazzy: gnarface: you're opening my eyes to all the different control variables to this experiment
07:22 gnarface: if the power cable IS an issue, it's probably also a fire hazard ...
07:22 gnarface: (definitely wouldn't be the first time for that either, but usually it's an all-or-nothing thing)
07:28 bazzy: for the record, my xrandr output from the day i filed the flicker bug matches current xrandr output. (when just running `xrandr`)
07:29 bazzy: I wonder if the additional amount of electrical appliances / using a powerstrip from the other room had an effect
07:31 bazzy: or the fact that the monitor was disabled on initial hotplug.
07:32 bazzy: I'm going to try rebooting while it's connected.
07:32 bazzy: brb
07:34 bazzy: it's flickering again
07:34 bazzy: the tranquil Christmas miracle has passed
07:34 bazzy: I've been given a clue
07:35 bazzy: unplugging didn't crash / make my LVDS screen flicker like crazy. *whew*
07:35 bazzy: still flickering after replug (enabled on replug)
07:36 bazzy: I'll continue my reports in a file
08:18 bazzy: https://paste.pound-python.org/show/wUatSkyaJCYW6esomO4U/ -- hastily logging some experiments regarding screen flicker bug
08:19 bazzy: unfortunately, I wasn't able to reproduce the flicker-free scenario I accidentally stumbled into tonight
08:37 bazzy: I think it has something to do with xrandr caching the display somehow. eg it says "DP-1 disconnected (normal left inverted right x axis y axis)" and I'm not sure how to make it a "fresh connection" to see if that makes a difference (my reboot tests didn't seem to help)
08:38 bazzy: ugh this is just motivation to go ahead with the mmiotracing
09:48 karolherbst: bazzy: well, such issues are painful to debug...
09:48 karolherbst: imirkin: no idea why, but codegen/RA decided to do things like this: "export b128 # o[0xa0] $r13q" and I have no clue why. Any ideas?
09:53 karolherbst: mhh, output is a "varying mat3 dst_matrix[3];"
09:54 karolherbst: could be some packing super messup though
09:55 karolherbst: but I don't see why RA would even care
10:03 pmoreau: karolherbst, imirkin: It doesn’t really surprise me that you have sched ops for every instruction: in Volta each lane in a warp has its own PC, making it possible to start running an if statement with part of the warp, wait on a memory request and start running the else branch with the other part of the warp.
10:04 pmoreau: If you had sched ops only every 3 instructions, you would need to keep a stack of previous sched information, which would be messy.
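For context on the encoding difference: on Maxwell and Pascal, code is laid out in 32-byte bundles made of one 64-bit scheduling control word followed by three 64-bit instructions, while on Volta each 128-bit instruction carries its own control bits. A small Python sketch of the byte offset of the i-th instruction under those two layouts; an illustration of the layout only, not an encoder.

```python
def maxwell_insn_offset(i: int) -> int:
    """Byte offset of instruction i; each 32-byte bundle is [sched][insn0][insn1][insn2]."""
    bundle, slot = divmod(i, 3)
    return bundle * 32 + 8 + slot * 8


def maxwell_sched_offset(i: int) -> int:
    """Byte offset of the control word covering instruction i."""
    return (i // 3) * 32


def volta_insn_offset(i: int) -> int:
    """Volta: 128-bit instructions with the control bits embedded in each one."""
    return i * 16
```

With the Maxwell layout, every instruction shares its control word with two neighbours, which is the coupling pmoreau describes; with the Volta layout there is nothing to share.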
10:25 karolherbst: pmoreau: ohh right, with Volta we got instruction level preemption now
10:32 karolherbst: pmoreau: I will send out a RFC for the nir stuff today
10:33 karolherbst: hopefully
11:31 pmoreau: karolherbst: Cool! Hopefully I’ll do a better job at reviewing it than your reclocking series -_-"
13:19 AndrewR: 16:08:23 up 6 days, 54 min - good ...but I have a feeling I'd better not set this machine to max CPU clocks (3.8GHz). Right now it runs at 1.4GHz * 4 cores, and speeds up to 3.4GHz if I need more speed for compilations, etc. My previous attempts usually led to some hang either at resume from S3, or 'randomly'. Maybe I should recompile the kernel with the powersave policy, so it will wake up with min CPU clocks....
13:20 AndrewR: also, for me LKML is down :/
15:51 karolherbst: I say ship it :)
15:52 feaneron: https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-GP108-Firmware \o/
15:52 karolherbst: ohhh
15:52 karolherbst: next nouveau news incoming though (I guess?)
16:01 feaneron: Pascal gpus have no reclocking support at all? or do they just lack automatic reclocking?
16:01 pmoreau: feaneron: No card has automatic reclocking ;-)
16:01 karolherbst: pmoreau: happy reviewing
16:01 pmoreau: Pascal GPUs have no reclocking support at all IIRC
16:02 pmoreau: karolherbst: Yup, thanks, added it to the todo list already :-D
16:02 karolherbst: ahh crap
16:02 feaneron: duh, stupid feaneron
16:02 karolherbst: I wanted to send it as an RFC
16:02 pmoreau: feaneron: No worries :-)
16:02 pmoreau: Boooh karolherbst ! :-p
16:03 karolherbst: :(
16:03 pmoreau: I don’t think it’s that bad though :-)
16:04 karolherbst: well
16:04 karolherbst: the messages are
16:04 karolherbst: having TODOs in those is usually not great
16:07 karolherbst: I am now wondering: ubos or geometry?
16:07 pmoreau: I didn’t remember there were so many files involved for hw accel
16:07 karolherbst: :D
16:08 karolherbst: pmoreau: well, most of mesa is involved in hw accel ;)
16:08 pmoreau: I’m talking about the fw drop for GP108
16:08 imirkin: karolherbst: ubo's are going to be a whole lot more useful if you want to test stuff.
16:08 pmoreau: They didn’t modify Mesa AFAICT
16:08 imirkin: pmoreau: they've broken a bunch of stuff out as separate files
16:08 imirkin: pmoreau: also there's more work involved with GM20x+
16:08 karolherbst: imirkin: mhh.. I kind of hope that I get crappy geometry shader support working quite fast though
16:09 imirkin: karolherbst: good luck with that.
16:09 pmoreau: imirkin: OK, thanks
16:09 karolherbst: imirkin: I fear for my life now...
16:10 karolherbst: pmoreau: well basically it boils down to this: a lot of sigs
16:10 karolherbst: three engines
16:10 imirkin: actually it may not be so bad -- it's definitely annoying to get it all right, but perhaps the input ir stuff is straightforward
16:10 karolherbst: that nvdec crap + sec2
16:11 karolherbst: and acr is kind of the pre loader
16:11 imirkin: i forget where the difficulties lie
16:11 karolherbst: I think I know
16:11 karolherbst: or well
16:11 karolherbst: currently I have no idea how to handle those stream-ids
16:11 imirkin: i guess it was mostly input loading lowering stuff that was tricky back in the day
16:11 imirkin: those are just args to emit/cut
16:11 karolherbst: ohh
16:11 imirkin: immediates
16:11 karolherbst: k
16:12 karolherbst: that sounds easy then
16:12 imirkin: see how tgsi does it and do the same thing
16:12 karolherbst: most tests involving geometry shaders are crashing now because of asserts, you can't really move on from there
16:12 karolherbst: yeah
16:13 karolherbst: for ubos I have to figure out how that constant buffer mapping works out
16:13 imirkin: the thing with GS is that inputs become 2d things
16:13 karolherbst: I doubt it is always in c1 like in the one test I was looking at
16:13 imirkin: i.e. you load attribute X of input vertex Y
16:13 karolherbst: I also saw that restart op
16:13 karolherbst: weird stuff
16:13 imirkin: that's all part of emit/cut
16:14 imirkin: i think cut == restart
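The geometry shader semantics imirkin describes (inputs indexed by vertex and attribute, emit/cut taking the stream id as an immediate, cut ending the current strip) can be modelled behaviourally in Python. In GLSL the stream argument of `EmitStreamVertex`/`EndStreamPrimitive` must be a constant integral expression, which is why it can end up as an immediate. This sketch models those semantics only, not nouveau's codegen; the class and function names are hypothetical.

```python
from typing import Dict, List, Sequence


class GeometryOutput:
    """Collects emitted vertices into strips, one list of strips per stream."""

    def __init__(self, num_streams: int = 4) -> None:
        self.strips: Dict[int, List[List[object]]] = {
            s: [[]] for s in range(num_streams)
        }

    def emit(self, stream: int, vertex: object) -> None:
        # Like EmitStreamVertex(stream): append to the open strip of `stream`.
        self.strips[stream][-1].append(vertex)

    def cut(self, stream: int) -> None:
        # Like EndStreamPrimitive(stream): close the open strip, if non-empty.
        if self.strips[stream][-1]:
            self.strips[stream].append([])

    def result(self, stream: int) -> List[List[object]]:
        """Finished and open strips that contain at least one vertex."""
        return [strip for strip in self.strips[stream] if strip]


def load_input(inputs: Sequence[Sequence[object]], vertex: int, attr: int) -> object:
    """GS inputs are two-dimensional: attribute `attr` of input vertex `vertex`."""
    return inputs[vertex][attr]
```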
16:15 karolherbst: imirkin: did you see my comment about those weird outputs?
16:16 imirkin: no
16:16 karolherbst: no idea why, but codegen/RA decided to do things like this: "export b128 # o[0xa0] $r13q" and I have no clue why. Any ideas?
16:16 imirkin: like why it's using a misaligned reg?
16:16 karolherbst: yeah
16:16 imirkin: i have no idea. you must have done something horrible to it
16:16 karolherbst: :D
16:17 karolherbst: that was my guess, but this doesn't help
16:17 imirkin: it tries really hard to align those.
16:17 imirkin: are you trying to export it in one go?
16:17 imirkin: or is memoryopt merging it?
16:18 imirkin: (hint: let memoryopt merge things, don't manually create > 32-bit loads except for stuff like int64/fp64)
16:18 karolherbst: memoryopt
16:18 karolherbst: but mhh, I think I fixed it? let me check something
16:18 karolherbst: or maybe not
16:18 karolherbst: I have to find the test again
16:20 karolherbst: interesting
16:20 karolherbst: I think I know the issue
16:21 karolherbst: mhh, okay no, I don't know it
16:22 karolherbst: but the most weird thing is not that one export, but the next one
16:22 karolherbst: export b128 # o[0xb0] $r14q (8)
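imirkin's hint (let MemoryOpt merge adjacent accesses rather than creating wide ones by hand) depends on merging only into naturally aligned pieces. A simplified Python model of such a greedy merge over a run of consecutive 32-bit accesses: at each offset it takes the widest size (16, 8 or 4 bytes) that divides the offset and does not run past the end. This is an assumption-laden sketch, not the real MemoryOpt algorithm, and it ignores register allocation entirely.

```python
from typing import List, Tuple


def merge_dword_run(start: int, count: int) -> List[Tuple[int, int]]:
    """Greedily combine `count` consecutive 32-bit accesses starting at `start`.

    Returns (byte_offset, size_in_bytes) pieces; each piece is naturally
    aligned (its offset is a multiple of its size) and sizes are 16, 8 or 4.
    """
    if start % 4:
        raise ValueError("32-bit accesses must be 4-byte aligned")
    pieces: List[Tuple[int, int]] = []
    offset, remaining = start, count
    while remaining > 0:
        for size in (16, 8, 4):
            if offset % size == 0 and size // 4 <= remaining:
                pieces.append((offset, size))
                offset += size
                remaining -= size // 4
                break
    return pieces
```

Eight consecutive dwords at o[0xa0] become two b128 pieces at 0xa0 and 0xb0, matching the exports above; the offsets are fine, so the misalignment in the log is on the register side ($r13q, $r14q).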
16:57 imirkin_: oh
16:57 imirkin_: er no. i dunno
16:57 imirkin_: i'd need to debug it. and i have neither the time nor desire
16:57 karolherbst: :)
16:57 karolherbst: well
16:58 karolherbst: at some point I may figure it out
18:57 karolherbst: nice, geometry shaders are working fine :)
18:58 bazzy: I've got my nv mmio trace, and used envytools' demmio on it :). I made a Gentoo ebuild for envytools to manage the sys-deps
18:58 karolherbst: bazzy: there is one already
18:58 karolherbst: x11 overlay
18:58 bazzy: oh well I needed the ebuild practice
18:59 bazzy: karolherbst: you use Gentoo?
18:59 karolherbst: yeah, on two of my machines
18:59 bazzy: good to know
18:59 imirkin_:too
19:19 bazzy: is it normal to see a fair amount of "unknown instructions" in the demmio?
19:19 bazzy: they are all to PGRAPH.CTXCTL_DATA
19:20 bazzy: it's interspersed with known ones. aside from that register, everything else appears to be known (at a glance)
19:24 karolherbst: bazzy: those are repeat instructions
19:25 karolherbst: those are weird string operations on x86 hardware
19:25 bazzy: let's be sure. Here are a couple examples
19:25 bazzy: [0] MMIO32 W 0x400328 0x0045004d PGRAPH.CTXCTL_DATA <= 0000000b: 0045004d ??? [unknown: ??????4d] [unknown instruction]
19:25 karolherbst: ohh
19:25 karolherbst: this is something else right
19:25 karolherbst: mhh
19:25 karolherbst: mostly garbage
19:25 karolherbst: we never know if they upload real data or binaries
19:26 karolherbst: or scripts or whatever
19:26 imirkin_: bazzy: iirc yes
19:26 bazzy: here is a "known" followed by unknown
19:26 karolherbst: bazzy: luck
19:26 bazzy: [0] MMIO32 W 0x400328 0x0070009d PGRAPH.CTXCTL_DATA <= 0000000a: 0070009d set pm1
19:26 imirkin_: bazzy: i don't think that decoder has kept up with reality
19:26 bazzy: [0] MMIO32 W 0x400328 0x0045004d PGRAPH.CTXCTL_DATA <= 0000000b: 0045004d ??? [unknown: ??????4d] [unknown instruction]
19:27 bazzy: well my hardware is from ~2009 ^_^
19:27 imirkin_: i stand by my comment :po
19:28 bazzy: i haven't gone over it all yet, but at least the lines between plugging in the mDP cable and activating the screen seem to be free of unknowns
19:29 imirkin_: i assume you used the 'MARK' functionality?
19:29 bazzy: yes of course
19:29 imirkin_: like a pro!
19:29 bazzy: all marks are tagged with the string "bazz" as well
19:30 bazzy: i forgot they would be tagged MARK
19:31 bazzy: i marked from boot -> VT, VT -> X, plugin mDP, activate screen via xrandr, deactivate via xrandr, and unplug mDP cable
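When reading a large demmio dump like this, it can help to split it at the trace markers and count unknown decodes per section and register. A rough Python sketch; it assumes marker lines contain the word `MARK` and unknown decodes contain `[unknown instruction]`, as in the lines pasted above. Both are assumptions about the text format, not a documented envytools interface.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def unknowns_per_section(lines: Iterable[str]) -> List[Tuple[str, Counter]]:
    """Split demmio output at MARK lines and count unknown decodes per register."""
    sections: List[Tuple[str, Counter]] = [("(start)", Counter())]
    for line in lines:
        if "MARK" in line:
            sections.append((line.strip(), Counter()))
        elif "[unknown instruction]" in line:
            # e.g. "[0] MMIO32 W 0x400328 0x0045004d PGRAPH.CTXCTL_DATA <= ..."
            fields = line.split()
            register = fields[5] if len(fields) > 5 else "?"
            sections[-1][1][register] += 1
    return sections

# Usage (file name is hypothetical):
# with open("demmio.txt") as f:
#     for mark, counts in unknowns_per_section(f):
#         print(mark, dict(counts))
```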
19:32 bazzy: I will upload an xz of the log to the bug tracker, assuming ~4MB file is permissible?
19:33 karolherbst: hopefully
20:00 bazzy: Re: screen flicker, to whom it may concern ( imirkin_: pmoreau gnarface ?) I've uploaded a de-mmiotrace with trace marks to https://bugs.freedesktop.org/show_bug.cgi?id=88272#c8
20:00 bazzy: Could someone guide me to what part of the nouveau source I should be referring to?
20:00 imirkin_: hopefully the mmiotrace, not the demmio output?
20:01 bazzy: I have both on disk, but only uploaded the demmio output.
20:01 imirkin_: hm ok
20:01 bazzy: I'll upload the raw mmiotrace (xz) to be sure
20:02 imirkin_: xz -9 is your friend :)
20:04 bazzy: imirkin_: OK :)
20:04 pmoreau: bazzy: I would say, most likely https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nv50_display.c and https://github.com/skeggsb/nouveau/tree/master/drm/nouveau/nvkm/engine/disp
20:05 imirkin_: the other option is subdev/fb
20:13 bazzy: I uploaded the raw mmiotrace @ https://bugs.freedesktop.org/show_bug.cgi?id=88272
20:45 pmoreau: karolherbst, imirkin_: Regarding the NIR series, should we start naming new stuff with nvir instead of nv50_ir?
20:45 karolherbst: pmoreau: I would rather do a complete renaming
20:45 karolherbst: a mix is always unpleasant
20:45 pmoreau: I could look into renaming the existing code.
20:51 imirkin_: pmoreau: yeah, renaming sounds fine to me.
20:51 imirkin_: can also move it into src/compiler
20:55 pmoreau: outside of src/gallium/drivers/nouveau/codegen? like the Intel compiler?
20:56 imirkin_: yeah
20:56 pmoreau: OK
20:56 imirkin_: basically mv codegen src/compiler/nvidia
20:56 imirkin_: or ... whatever the convention is
20:56 pmoreau: So move all the RA and other stuff but keep the emitting part in codegen?
20:57 imirkin_: i just meant the whole thing
20:57 imirkin_: anyways, can be done later.
20:57 karolherbst: I would call it nouveau :)
20:57 karolherbst: not nvidia
20:57 imirkin_: the precedent is to use the manufacturer, but wtvr
20:57 karolherbst: really?
20:57 imirkin_: i.e. it's src/amd not src/radv
20:58 karolherbst: I see
20:58 imirkin_: and i guess src/nvidia/compiler rather than src/compiler/nvidia? dunno.
20:58 karolherbst: src/compiler is just for generic stuff
20:58 mwk: wouldn't nvidia imply being "official"?
20:59 karolherbst: I think the idea was to move opengl unrelated stuff into src/$name
20:59 imirkin_: right yeah, so it'd require a bit of splitting.
20:59 karolherbst: well, before we work on vulkan we won't need to split
20:59 karolherbst: and the person working on vulkan, will have to split anyway :)
20:59 karolherbst: problem solved
21:00 imirkin_: yeah, might be more of a pain than it's worth right now to split it up
21:00 imirkin_: ok, so just codegen -> nvir and nv50_ir -> nvir
21:00 karolherbst: robclark: how would you call it if you moved ir3 to top level?
21:01 imirkin_: but keep it all where it is
21:03 robclark: karolherbst, not quite sure I understand the question
21:03 karolherbst: robclark: well, like the src/amd and src/intel stuff
21:03 robclark: you mean outside of gallium?
21:03 karolherbst: how would you name the directory?
21:03 karolherbst: yeah
21:04 robclark: hmm, src/freedreno/ir3 or src/ir3 I guess?
21:04 karolherbst: well we were wondering, because everybody used the vendor names so far, but in our case this wouldn't really be appropriate, right?
21:05 robclark: I think "nouveau" would be fine
21:06 robclark: the ones that use vendor names are ones that used vendor names in the driver in the first place :-P
21:06 imirkin_: why is it any more appropriate for amd?
21:06 robclark: well, I mean amd was never using non-vendor names for things
21:07 robclark: I didn't want to call "freedreno" as "adreno" since that is a (tm) (or (c)?)
21:08 robclark: and presumably same reason for nouveau vs calling the gallium driver "nvidia"?
21:08 RSpliet: yeah I think vendor name is only used when it's contributed by the actual vendor?
21:08 imirkin_: ... like radv?
21:09 karolherbst: well I wouldn't use a vendor name
21:09 karolherbst: unless that vendor contributes a lot
21:09 robclark: I guess radv is getting a bit close to vendor name but at least it isn't a hostile vendor ;-)
21:10 karolherbst: for me it doesn't matter how hostile the vendor is
21:10 RSpliet: or bipolar
21:10 robclark: heheh
21:10 karolherbst: at least there is some form of fun with nvidia
21:12 pmoreau: imirkin_: On the other hand, I could see the point of doing the renaming + the move all at once, to avoid renaming/moving twice.
21:20 karolherbst: mhh, now the world knows
21:21 pmoreau: :-D
21:21 karolherbst: took long enough though
21:24 pmoreau: imirkin_: Ping on https://patchwork.freedesktop.org/patch/191743/
21:26 karolherbst: pmoreau: mhh actually
21:26 pmoreau: Yes?
21:26 karolherbst: pmoreau: I may want to add a few more to those
21:26 imirkin_: pmoreau: why is that pass where it is?
21:26 imirkin_: i.e. why isn't it done as part of LegalizeSSA?
21:26 karolherbst: pmoreau: that's kind of the question I had in mind as well ;)
21:27 pmoreau: Having another look
21:27 karolherbst: we need a place where we can translate everything the hardware can't do away
21:27 karolherbst: but
21:27 karolherbst: still the issue remains about optimisations
21:27 karolherbst: maybe we should do it once pre SSA and once post SSA and be able to move translations back and forth
21:27 karolherbst: well some pre and some post
21:28 karolherbst: or maybe even only do it conditionally
21:28 karolherbst: like if we get a mul with two immediates, we don't have to lower it away
21:29 karolherbst: well, dunno
21:30 pmoreau: imirkin_: Indeed, it should probably be part of LegalizeSSA.
21:31 karolherbst: pmoreau: could you make it so that we can have LegalizeSSA things valid for all chipsets? Or find a place where we can just call comming lowerings
21:32 pmoreau: What do you mean by “comming lowerings”? Common ones?
21:32 karolherbst: well
21:32 karolherbst: currently we don't really share anything between nv50 and nvc0+ here, just something to keep in mind I mean
21:35 pmoreau: True, I would need to duplicate the handling of 64-bit MUL/MAD otherwise.
21:36 karolherbst: those shaders@point-vertex-id tests...
21:36 karolherbst: why do they fail
21:37 karolherbst: ohh, they use edgeflag... right
21:43 pmoreau: I guess we could have a LegalizeSSA class, from which both NV50LegalizeSSA and NVC0LegalizeSSA derive, and implement in it the common code?
21:43 karolherbst: maybe
21:43 karolherbst: but then you end up calling it in the subclasses anyway
21:43 karolherbst: unless you override all methods
21:43 imirkin_: or just call the helper in both
21:43 imirkin_: easy enough
21:43 karolherbst: ;)
21:43 karolherbst: I think we should have a lowering helper
21:44 karolherbst: which always conforms to SSA
21:44 pmoreau: Lowering helper works as well
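A shared, SSA-conforming lowering helper called from both legalizers (imirkin's "just call the helper in both") could look like the sketch below, using the 64-bit MUL case pmoreau mentions. It models the arithmetic in Python: the low 64 bits of a * b equal a_lo * b_lo plus (a_lo * b_hi + a_hi * b_lo) shifted left by 32, where the high word also takes the high half of a_lo * b_lo. The class and method names are hypothetical stand-ins; the real passes are C++ operating on nv50_ir instructions, and their actual lowering may differ.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def split64(x: int) -> Tuple[int, int]:
    """Split a 64-bit value into (lo32, hi32)."""
    return x & MASK32, (x >> 32) & MASK32


def lower_mul64(a: int, b: int) -> Tuple[int, int]:
    """Low 64 bits of a * b built from 32-bit multiplies; returns (lo32, hi32)."""
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    full = a_lo * b_lo
    lo = full & MASK32                      # like a 32-bit mul.lo
    carry = full >> 32                      # like a 32-bit mul.hi
    # Cross terms only matter modulo 2^32 since they are shifted left by 32.
    hi = (carry + a_lo * b_hi + a_hi * b_lo) & MASK32
    return lo, hi


class NV50LegalizeSSA:
    """Hypothetical stand-in for the nv50 pass: delegates to the shared helper."""

    def visit_mul64(self, a: int, b: int) -> Tuple[int, int]:
        return lower_mul64(a, b)


class NVC0LegalizeSSA:
    """Hypothetical stand-in for the nvc0+ pass: same helper, no duplication."""

    def visit_mul64(self, a: int, b: int) -> Tuple[int, int]:
        return lower_mul64(a, b)
```

Compared with a common base class, the free helper keeps each chipset pass independent. As karolherbst notes above, a pass could also skip the lowering and constant-fold when both operands are immediates.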
21:51 karolherbst: mhh, sampler and texture offsets, I might want to implement those as well :)
21:52 karolherbst: but somehow I didn't get my head around those yet...
21:52 imirkin_: texture offsets are just another arg
21:52 imirkin_: this stuff has to be fed in in a very precise order
21:52 imirkin_: so look at the tgsi logic
21:52 imirkin_: for that order
21:53 imirkin_: build util already has the post-ra helper
21:53 imirkin_: might as well have the pre-ra helper
21:53 karolherbst: I thought I would have to do something like with the offsets thing as well?
21:53 karolherbst: or are those really just args?
21:54 imirkin_: yeah
21:54 imirkin_: just args
21:54 imirkin_: i forget where they go in the flow of things
21:54 imirkin_: but all that logic is EXTREMELY fragile
21:54 karolherbst: :)
21:54 karolherbst: I noticed
21:54 imirkin_: so you HAVE to do it in the *precise* order that tgsi does it in
21:54 karolherbst: yeah, I noticed
21:54 imirkin_: since then each generation of nvidia hw wants its TEX arguments in a different order
21:55 imirkin_: so there's all kinds of hacks around reordering them
21:55 imirkin_: and combining
21:55 imirkin_: etc
21:55 karolherbst: sounds... awful
21:55 imirkin_: oh, right - actually the offsets might go on the side
21:55 imirkin_: hold on
21:55 karolherbst: maybe having something like tex->setTextureOffset would be the better solution here ;)
21:55 imirkin_: ok, right
21:55 karolherbst: well, they are going into the args at least, so much I know
21:55 imirkin_: so TexInstruction
21:55 imirkin_: has an offset[4][3]
21:55 imirkin_: which you should stick stuff into
21:55 karolherbst: yeah
21:56 karolherbst: I do that for "offsets" already
21:56 imirkin_: normally you only use offsets[0][i]
21:56 imirkin_: i don't think nir supports the 4-argument textureGatherOffsets equivalent
21:56 imirkin_: so you don't have to worry about that
21:56 imirkin_: but there's a TG4 option to have 4 separate offsets for each texel gathered
21:57 imirkin_: (why, you ask? no clue.)
21:57 karolherbst: uhhh
21:57 imirkin_: and nir doesn't support it coz it doesn't have to
21:57 imirkin_: it just lowers it into 4 tg4's
21:57 imirkin_: and moves on with life
21:58 imirkin_: which is what intel and amd hw want anyways
21:58 imirkin_: (and adreno)
21:58 imirkin_: afaik nvidia is the only one that supports that in hw directly
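The NIR lowering imirkin mentions (one 4-offset gather turned into four single-offset tg4s) follows from the GLSL spec: component i of `textureGatherOffsets` is the i0j0 texel of the footprint located with offsets[i], and a plain gather returns its i0j0 texel in `.w`. A Python model of one channel with texel-space coordinates and clamp-to-edge addressing; it sketches the semantics, not NIR code.

```python
import math
from typing import List, Sequence, Tuple

Texture = Sequence[Sequence[float]]  # single-channel texels, indexed [y][x]


def texel(tex: Texture, i: int, j: int) -> float:
    """Fetch texel (i, j) with clamp-to-edge addressing."""
    height, width = len(tex), len(tex[0])
    return tex[min(max(j, 0), height - 1)][min(max(i, 0), width - 1)]


def gather(tex: Texture, u: float, v: float,
           off: Tuple[int, int] = (0, 0)) -> Tuple[float, float, float, float]:
    """One-channel textureGatherOffset; (u, v) are in texel units."""
    i0 = math.floor(u - 0.5) + off[0]
    j0 = math.floor(v - 0.5) + off[1]
    i1, j1 = i0 + 1, j0 + 1
    # GLSL component order: i0j1, i1j1, i1j0, i0j0.
    return (texel(tex, i0, j1), texel(tex, i1, j1),
            texel(tex, i1, j0), texel(tex, i0, j0))


def gather_offsets_lowered(tex: Texture, u: float, v: float,
                           offsets: List[Tuple[int, int]]) -> Tuple[float, ...]:
    """textureGatherOffsets as four single-offset gathers, keeping .w of each."""
    return tuple(gather(tex, u, v, off)[3] for off in offsets)
```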
22:03 karolherbst: okay
22:14 pmoreau: karolherbst: Just playing a bit with the loops again: increasing func->loopNestingBound when I generate a back edge does solve the register being reused for something else, but I am not getting the correct result, probably something else is wrong.
22:15 pmoreau: But I’ll need to keep a better track of the ongoing CFG.
22:17 pmoreau: BTW, can’t remember whether I pinged you or not about it already today, but there might be some movement for upstreaming the SPIR-V backend in clang/LLVM.
22:17 pmoreau: karolherbst: -^
22:19 karolherbst: I think your test is wrong
22:20 karolherbst: what result do you get, 50?
22:21 karolherbst: I think I got a different result on nvidia and some other opencl driver, not quite sure
22:27 pmoreau: I get 20 :-D
22:27 pmoreau: But yes, the expected result is wrong, I think it should be 55.
22:28 pmoreau: Clearly 20 can’t be right: without the if statement (so, only adding numbers from 0 to 10), I get 45, which is the correct result.
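A quick arithmetic check on these numbers, assuming the loop has the usual `for (i = 0; i < 10; i++)` form: summing 0 through 9 gives 45 (10 * 9 / 2), and a loop that also adds 10 would give 55. The version with the if statement isn't shown in the log, so this sketch does not try to model it.

```python
def loop_sum(limit: int, inclusive: bool) -> int:
    """Sum of i over a C-style loop from 0 to `limit` (i < limit, or i <= limit)."""
    end = limit + 1 if inclusive else limit
    total = 0
    for i in range(end):
        total += i
    return total
```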
22:34 pmoreau: Ah, I think the remaining bug in the loop comes from my out-of-SSA pass.