00:01 imirkin_: i dunno - i think the from_tgsi thing is actually pretty simple
00:01 imirkin_: it's a lot of lines of code, but most of it is just enum <-> enum conversions
00:01 karolherbst: true
00:02 karolherbst: the CFG stuff is kind of messy though
00:02 karolherbst: in nir this is much more straightforward
00:02 karolherbst: or at least you know your context if you parse a nir_if/nir_loop node
00:02 karolherbst: in TGSI you have to push bbs on stacks and have a state machine and so on
00:02 karolherbst: don't need this with nir
00:03 imirkin_: heh
00:03 imirkin_: ok, but in tgsi it's still all pretty simple
00:03 karolherbst: most things are
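A minimal sketch of the contrast karolherbst describes above, not code from either translator. A flat IF/ELSE/ENDIF stream needs an explicit stack to know which construct is currently open, while a tree-shaped IR hands you that context through plain recursion. `FlatOp`, `Node`, `maxDepthFlat` and `maxDepthTree` are hypothetical names made up for this illustration, and the `Node` definition assumes C++17 (a vector of a not-yet-complete type).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical flat opcode stream, loosely modelled on TGSI-style control flow.
enum class FlatOp { If, Else, EndIf, Other };

// Flat walk: an explicit stack remembers which IFs are still open and whether
// their ELSE has been seen. Returns the maximum nesting depth, or -1 if the
// stream is malformed (stray ELSE/ENDIF, or an IF that is never closed).
int maxDepthFlat(const std::vector<FlatOp> &ops) {
    std::vector<bool> seenElse; // one entry per currently open IF
    std::size_t maxDepth = 0;
    for (FlatOp op : ops) {
        switch (op) {
        case FlatOp::If:
            seenElse.push_back(false);
            maxDepth = std::max(maxDepth, seenElse.size());
            break;
        case FlatOp::Else:
            if (seenElse.empty() || seenElse.back())
                return -1;
            seenElse.back() = true;
            break;
        case FlatOp::EndIf:
            if (seenElse.empty())
                return -1;
            seenElse.pop_back();
            break;
        case FlatOp::Other:
            break;
        }
    }
    return seenElse.empty() ? static_cast<int>(maxDepth) : -1;
}

// Hypothetical structured node, loosely modelled on an if-node with two bodies.
struct Node {
    bool isIf = false;
    std::vector<Node> thenBody;
    std::vector<Node> elseBody;
};

// Structured walk: the nesting context is simply the call stack.
int maxDepthTree(const std::vector<Node> &body) {
    int depth = 0;
    for (const Node &n : body) {
        if (!n.isIf)
            continue;
        int inner = 1 + std::max(maxDepthTree(n.thenBody), maxDepthTree(n.elseBody));
        depth = std::max(depth, inner);
    }
    return depth;
}
```

The flat version has to detect malformed input itself; in the tree version a malformed nesting cannot even be represented.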
00:04 karolherbst: well, with nir you don't need those op -> type conversions
00:05 karolherbst: so this removes quite a lot of lines as well
00:06 karolherbst: well, nir is pretty strict about typing and you usually know what types the src/dests are, so there is no need to have a static lookup for that
00:06 imirkin_: so is tgsi
00:06 imirkin_: it's just convenient to have shared handlers
00:06 imirkin_: which take all kinds of ops
00:33 karolherbst: mhh, texelfetch is complicated
00:36 karolherbst: ohhh
00:36 karolherbst: meh
00:37 karolherbst: imirkin_: do we really need those 3 components per coord component?
00:37 karolherbst: in TGSI we get this IMM[0].xyy
00:37 karolherbst: but
00:37 karolherbst: in nir I just get a 2 component value, not three
00:38 imirkin_: you need it for texture3d
00:38 airlied: or ms
00:38 karolherbst: well right
00:38 imirkin_: mmmm you can't do a ms-based offset
00:38 karolherbst: but
00:38 imirkin_: i assume he's talking about offsets
00:38 karolherbst: the point is, we always set three
00:38 karolherbst: with tgsi
00:38 imirkin_: convenience.
00:38 karolherbst: mhh
00:38 karolherbst: I see
00:39 imirkin_: only 2 are used
00:39 imirkin_: but the offsets are defined as 3-component things, and that's how they're printed
00:39 airlied: imirkin_: he said coord
00:39 imirkin_: and yet he probably meant offsets ;)
00:39 karolherbst: I mean both
00:40 karolherbst: we apply the offset to the coords sources
00:40 imirkin_: since offsets are 3 components while coords are 4
00:40 karolherbst: and have this 2 dim array
00:40 karolherbst: texi->offset[s][c].set(offset);
00:40 imirkin_: right
00:40 imirkin_: so ... the reason i did that
00:40 imirkin_: was that for nv50, the offset has to be an immediate
00:41 karolherbst: I see
00:41 imirkin_: you can't reliably track it to the immediate in tgsi
00:41 karolherbst: ohh, okay
00:42 karolherbst: mhh something is still funny though, I don't get that fourth source :(
00:42 karolherbst: well in the SSA form
00:42 imirkin_: it all gets lowered
00:43 karolherbst: I know
00:43 imirkin_: pre-ssa
00:43 imirkin_: for nvc0
00:43 karolherbst: I guess I'm missing something, and that's why the offset values don't get calculated/added
00:43 imirkin_: like why is textureOffset() a thing?
00:44 imirkin_: the offsets are specified in texels
00:44 imirkin_: while the coords are normalized
00:44 imirkin_: i dunno what the practical use-case is for that
00:44 imirkin_: i just know how it works :)
00:44 karolherbst: I read something about raw data access
00:44 imirkin_: nah, you'd do that with a samplerBuffer
00:45 karolherbst: even when we only had glsl-1.30?
00:45 imirkin_: GL 3.0 is what introduced those
00:45 imirkin_: http://docs.gl/sl4/textureOffset
00:45 imirkin_: check the version table at the bottom
00:46 karolherbst: right, so glsl 1.30
00:46 imirkin_: and there's a texelFetchOffset
00:46 imirkin_: but of course txf coords are already integers
00:46 imirkin_: and there's no filtering
00:46 imirkin_: so the use-case seems weaker.
00:46 imirkin_: esp since in GL 3.0, the offset had to be an immediate
00:47 imirkin_: actually i think that might be true in GL 4.x as well
00:47 imirkin_: only the offset for textureGatherOffset can be non-const
00:48 karolherbst: mhh, fun, the $r value gets adjusted accordingly for me
00:48 imirkin_: anyways, i did have a reason for taking a reference
00:48 imirkin_: but by now i don't remember what it is without digging through commit logs
00:48 karolherbst: ....
00:48 karolherbst: I forgot to set .useOffsets
00:48 imirkin_: yeah, that might help.
00:49 karolherbst: do I really need to fill up all 3 slots or would it be enough that I just set as many offset components as I have?
00:51 karolherbst: well it crashes when I set less
00:51 karolherbst: so I guess I stick with three
00:52 imirkin_: karolherbst: f3aa999383074d666d6e3f3506e66b0c937904ca
00:53 imirkin_: right, so they needed to be valueref's
00:53 imirkin_: so i had to stick them *somewhere*
00:53 karolherbst: ahh
00:53 karolherbst: I see
00:53 imirkin_: just some short 3.5y ago
00:53 karolherbst: that explains
00:54 karolherbst: I am kind of happy we have piglit :) makes working on this really nice
00:54 imirkin_: and while you don't have to set a value
00:54 imirkin_: i do think you have to set all of them, so do it with NULL
00:54 imirkin_: not sure.
00:55 karolherbst: I can just repeat with the last one
00:55 karolherbst: at least that's what tgsi does
00:55 karolherbst: or gives us
00:55 imirkin_: or set it to null.
00:55 imirkin_: i.e. .set(NULL)
00:55 karolherbst: mhh, makes the code more complicated though
00:55 karolherbst: or I return NULL for non existing components
00:56 imirkin_: for (; i < 3; i++) foo[i].set(NULL);
00:56 karolherbst: but this might cause other big big troubles
00:56 karolherbst: yeah... that's one more line than I have now
00:56 karolherbst: also it is a 2d array
00:56 karolherbst: not 1d
00:56 imirkin_: no. it'll be fine.
00:57 imirkin_: the offsets are in reference to a single face.
00:57 karolherbst: I mean I have a function to get the nir sources in general
00:57 imirkin_: ah
00:57 karolherbst: and I have an argument for the component
00:57 karolherbst: I would rather assert if I access something non existing
00:57 karolherbst: because that should never happen
00:59 karolherbst: well, the code doesn't like null either
01:10 imirkin_: yeah, i've never tested it
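A hedged sketch of the "repeat the last one" padding karolherbst settles on, which gives the same shape as the `IMM[0].xyy` swizzle seen from TGSI. `OffsetComp` and `padOffset` are hypothetical stand-ins: the real slots are value references in `texi->offset[s][c]`, not plain integers.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one offset component (e.g. an immediate value).
using OffsetComp = int;

// Expand an offset with 1..3 components to exactly three slots by repeating
// the last available component, so every slot holds something valid.
std::array<OffsetComp, 3> padOffset(const std::vector<OffsetComp> &src) {
    // Asking for padding with no source components is a caller bug.
    assert(!src.empty() && src.size() <= 3);
    std::array<OffsetComp, 3> out{};
    for (std::size_t c = 0; c < out.size(); ++c)
        out[c] = src[std::min(c, src.size() - 1)];
    return out;
}
```

Repeating the last component avoids empty slots entirely, which matters here because both leaving slots unset and setting them to NULL reportedly crashed.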
01:17 karolherbst: I need a TexTarget to component count ignoring array flag
01:18 imirkin_: target.getDims() iirc
01:19 imirkin_: getDim()
01:19 karolherbst: ahh
01:19 karolherbst: then I need to build a TexInstruction::Target object, but yeah, should make things easier
01:22 imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp#n943
01:22 imirkin_: first number is "dim", second number is "argc"
01:22 karolherbst: yeah, I already looked it up
01:23 karolherbst: it works as well :)
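A sketch of the kind of table imirkin_ links to, where each texture target carries a "dim" and an "argc". The entries below are illustrative values written from memory, not copied from nv50_ir.cpp, and `TargetDesc`, `kTargets` and `dimOf` are hypothetical names. The point is only that "dim" stays the same for an array target while "argc" grows by the layer index.

```cpp
#include <cstring>

// Hypothetical descriptor: one row per texture target.
struct TargetDesc {
    const char *name;
    int dim;  // dimensionality of a single layer/face; this is what sizes an offset
    int argc; // number of coordinate arguments, including any layer index
};

// Illustrative values only; consult the real table before relying on them.
static const TargetDesc kTargets[] = {
    {"1D", 1, 1},
    {"2D", 2, 2},
    {"2D_ARRAY", 2, 3},
    {"3D", 3, 3},
    {"CUBE", 2, 3},
};

// Look up the dimensionality for a target name; returns -1 if unknown.
int dimOf(const char *name) {
    for (const TargetDesc &t : kTargets)
        if (std::strcmp(t.name, name) == 0)
            return t.dim;
    return -1;
}
```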
01:26 karolherbst: I should probably sleep now, tomorrow is also a day to finish up all the other tex types....
01:27 karolherbst: although I am sure I didn't even finish any of those I worked on today :D
03:31 bazzy: note: i missed all msgs from 17:30 to 19:48. If you sent me anything at that time, please resend. ( eg. imirkin_ rperier gnarface )
03:33 imirkin: chan is logged
03:33 imirkin: see title.
03:41 bazzy: oh, thanks
03:43 bazzy: that's a nice feature :)
09:30 karolherbst: interesting, 'spec@glsl-1.30@execution@tex-miplevel-selection texturegrad cube' fails on pascal
09:33 karolherbst: the cubearray and cubeshadow one as well
14:22 imirkin: karolherbst: yeah, that's the textureGrad thing :)
14:22 imirkin: fails on kepler too without those patches that move it to lane0 i think
17:51 npnth: I'm getting a slow, persistent memory leak, and /sys/kernel/debug/kmemleak is full of lines like this: http://sprunge.us/VWCh .
17:53 npnth: I'm running mainline kernels, but based on my past experience, this is probably already known and fixed somewhere in a dev branch. If that's right, could I please be pointed to the appropriate patch?
17:53 imirkin_: hm, well - it's unclear that it's a bug
17:54 imirkin_: a process ("X") is allocating GEM objects and holding on to them
17:54 npnth: imirkin_: The leak doesn't go away when I close X.
17:54 imirkin_: hmmmmmm
17:54 imirkin_:doesn't remember how GEM objects work
17:54 imirkin_: iirc they're global, so that sounds like correct behavior. although horrible. perhaps that's not how GEM objects work.
17:54 npnth: If this doesn't sound familiar, I can open a full bug report with dmesgs and lsacpis and all that.
17:55 imirkin_: yeah, open a bug about it
17:55 npnth: Will do.
17:55 imirkin_: i assume this isn't on some ancient kernel right?
17:55 npnth: imirkin_: Nope, compiled last night.
17:55 npnth: Ditto for libdrm, mesa.
17:56 imirkin_: like a 2.6.32 kernel compiled last night? :)
17:56 npnth: Nah, it's 4.15 rc4 :)
19:21 dupondje: https://pastebin.com/raw/G3cbRX66
19:22 dupondje: do I create some bugreport for these things so it gets tracked? :_
19:22 dupondje: :)
19:24 imirkin_: what's the problem?
19:25 dupondje: imirkin_: well it works, but errors are never good? :)
19:26 imirkin_: meh
19:26 imirkin_: they're also not necessarily bad
19:26 imirkin_: we're a lot more verbose than the nvidia blob about reporting errors from the gpu
19:26 imirkin_: iirc some set of errors is triggered by the nvidia blob firmware -- not sure what gpu you have, as you appear to have carefully cut that out
19:27 imirkin_: 22554 not being there is definitely a bit surprising
19:27 dupondje: [ 1.218763] nouveau 0000:01:00.0: NVIDIA GM107 (117360a2)
19:27 imirkin_: iirc that's the workarounds reg?
19:27 imirkin_: might be gone on maxwell though
19:27 imirkin_: in which case we should stop reading it
19:27 imirkin_: drm/nouveau/nvkm/subdev/fb/ramgk104.c: ram->pmask = nvkm_rd32(device, 0x022554);
19:27 imirkin_: hrm
19:28 imirkin_: whereas on maxwell we're supposed to do
19:28 imirkin_: const u32 mask = nvkm_rd32(device, 0x021c14);
19:29 imirkin_: maybe :) skeggsb would know for sure
19:29 dupondje: hehe, bugreport to fix: 7 minutes
19:29 dupondje: gotta love open source :)
19:30 imirkin_: well - questionable. but yeah ... you can file that bug
19:36 dupondje: imirkin_: https://bugs.freedesktop.org/show_bug.cgi?id=100423
19:36 dupondje: seems like there is an existing one
19:50 imirkin_: ah, cool
22:56 imirkin_: skeggsb: OOPS! :)
22:57 imirkin_: [for the memory leak]
22:57 skeggsb: yes, that's what happens when you write code without sleeping ;)
22:57 imirkin_: or while sleeping
22:57 skeggsb: that too
22:57 imirkin_: probably a happy combination
22:58 skeggsb:is living in fear of use-after-free bugs showing up now
22:58 skeggsb:is also very pessimistic these days
22:58 imirkin_: yeah, should just drop all free's
22:58 imirkin_: that way you won't get any use-after-free issues
22:58 skeggsb: perfect!
22:58 imirkin_: why do people even bother with those, anyways
22:59 imirkin_: just causes issues
22:59 skeggsb: my main system hasn't died yet though, so, fingers crossed
22:59 imirkin_: usually when you say things like that
22:59 imirkin_: is precisely the moment it explodes
22:59 skeggsb: OOPS!
22:59 skeggsb: ;)
22:59 imirkin_: but you can't cheat it -- if you're saying it just to make it oops, then it doesn't work
23:00 imirkin_: you have to mean it :)
23:00 skeggsb: unfortunately true, especially for hunting down race conditions.. they magically disappear when you start looking
23:00 RSpliet: surely we should just make the kernel do garbage collection. Scan all stacks and regs for references, mark, sweep... what could possibly be the problem?
23:01 skeggsb: that wouldn't suck at all!
23:01 imirkin_: RSpliet: yeah, and just lock while you do that
23:01 imirkin_: RT systems will be *real* happy
23:01 RSpliet: heh
23:01 RSpliet: RT on Linux
23:01 skeggsb: in fairness, the RT people are never happy
23:01 imirkin_: yeah, but this will make them extremely unhappy
23:01 imirkin_: as opposed to their general lack of happiness
23:02 imirkin_: (so ... goal achieved!)
23:02 RSpliet: at least it would once and for all determine Linux isn't suitable for HRT systems
23:31 karolherbst: skeggsb: wanna do some vulkan stuff?
23:31 skeggsb: karolherbst: what kind of stuff? i still need to do some kernel-side improvements :P
23:32 karolherbst: writing the userspace bits :D
23:32 karolherbst: I think I got enough done on NIR so that somebody could start with the other things for vulkan
23:32 skeggsb: i'll see if i can't get *something* out there after the xmas break ;)
23:32 karolherbst: nice
23:33 skeggsb: i have a branch somewhere where i started, just need to find it :P
23:33 karolherbst: I still need to implement some features like ubos and that kind of stuff
23:33 karolherbst: but
23:33 karolherbst: basic texturing is already done
23:33 karolherbst: and the 32bit alu should be finished as well
23:33 imirkin_: i've been waiting for an ioctl that can do explicit BO placement
23:34 imirkin_: pretty sure that's a hard requirement for VK
23:34 imirkin_: separately i've been building up the courage to rewrite all of the guts of the nouveau mesa driver
23:34 imirkin_: like ... all the bo/buffer/etc handling
23:34 skeggsb: imirkin_: \o/ yes please!
23:35 imirkin_: and get rid of stupid libdrm stuff, etc
23:35 karolherbst: imirkin_: I think I will move the 64bit translation out of from_tgsi
23:35 karolherbst: because it has no relevance to tgsi
23:35 imirkin_: "64-bit translation"?
23:35 karolherbst: that we can't do certain ops with 64bit values
23:36 imirkin_: meh.
23:36 karolherbst: and have to lower it away before going into codegen
23:36 karolherbst: well
23:36 imirkin_: wtvr.
23:36 karolherbst: I basically have to duplicate all the tgsi stuff
23:36 imirkin_: i don't see that as a problem.
23:36 karolherbst: well, why should we do the same thing twice?
23:36 imirkin_: it'll always be subtly different
23:36 karolherbst: well
23:36 karolherbst: no
23:36 imirkin_: and then you want to make a change, etc
23:36 robclark: karolherbst, I thought there was some nir lowering for 64b.. although tbh 64b has been least of my concerns so far..
23:36 karolherbst: some things yes
23:36 karolherbst: but for most things, no
23:36 karolherbst: robclark: well, we have 64bit support
23:36 imirkin_: i dunno. copy/paste is pretty easy.
23:37 robclark: ahh.. fancy..
23:37 karolherbst: robclark: but we can't do every alu instruction in 64bit
23:37 karolherbst: and some only half
23:37 imirkin_: there are even shortcuts for it :)
23:37 karolherbst: well right
23:37 karolherbst: but the API is different
23:37 imirkin_: robclark: there's basically zero 64-bit int support on nvidia
23:37 karolherbst: so I need to translate it over to the nir stuff
23:37 imirkin_: robclark: basically just int64 <-> float64
23:37 karolherbst: right, the int stuff is annoying
23:37 imirkin_: and a handful of similar items
23:38 imirkin_: that would be a MAJOR disaster to handle without hw support
23:38 robclark: ok, well I can't guarantee that the nir lowering supports pick/choose what stuff to lower.. that might be a reasonable addition.. otoh moving things out of tgsi->nvir might be even more reasonable ;-)
23:38 karolherbst: imirkin_: 64bit fmax
23:38 karolherbst: we can't do that as well ;)
23:38 karolherbst: well
23:38 imirkin_: fmax?
23:38 karolherbst: not in one instruction
23:38 imirkin_: that's 64-bit double stuff
23:38 imirkin_: not int
23:38 karolherbst: right, but as an example of 64bit float stuff which is annoying to do
23:39 imirkin_: yeah
23:39 karolherbst: but double mul si perfectly fine I think, right?
23:39 karolherbst: *is
23:39 imirkin_: yep
23:39 karolherbst: robclark: ;) see
23:39 imirkin_: rcp/rsq aren't great either
23:39 karolherbst: right
23:39 karolherbst: but I would rather lower those into the right things not in the IR translators
23:40 imirkin_: for rcp/rsq, yeah
23:40 karolherbst: but before going into SSA or so
23:40 imirkin_: the question is basically...
23:40 imirkin_: "would it be helpful for the optimizer to keep the operation intact or not"
23:40 karolherbst: well, it makes kind of sense
23:40 karolherbst: except nir is the last IR we have to support
23:40 imirkin_: if the answer is "yes", then it has to be lowered after the ssa opt passes
23:40 imirkin_: if the answer is "no" then it should be lowered before
23:40 karolherbst: right
23:40 karolherbst: but currently we lower a lot in from_tgsi
23:40 imirkin_: eh - depends whether SPIR-V lands first or not :p
23:41 karolherbst: and I would move it a bit deeper into codegen
23:41 karolherbst: :D
23:41 karolherbst: well, maybe we need to support other things
23:41 karolherbst: who knows
23:41 karolherbst: anyway
23:41 karolherbst: because I need it for nir anyway
23:41 imirkin_: really? i thought there was basically none, with the exception of a handful of weirdo tgsi-only ops like "LIT"
23:41 karolherbst: I would just move it
23:41 karolherbst: there are quite a lot
23:41 karolherbst: a lot of 64bit ones
23:41 imirkin_: hmmm maybe
23:41 karolherbst: basically all 64 bit ones
23:42 karolherbst: check the file
23:42 karolherbst: it is full of special code for doubles and 64bit ints
23:42 karolherbst: well maybe not full
23:42 karolherbst: but there is quite a lot
23:43 karolherbst: I pass half of arb_gpu_shader_int64 for example
23:43 karolherbst: and I already added some special handling for some of those
23:43 imirkin_: first half is always easiest ;)
23:43 karolherbst: welll
23:43 karolherbst: the issue is that 64bit is special
23:43 imirkin_: fscking 64-bit shifts :p
23:43 karolherbst: right
23:43 karolherbst: but as I said
23:44 karolherbst: if that weren't inside from_tgsi in the first place, I wouldn't need to bother about those now
23:44 imirkin_: well, i'm not TOTALLY opposed to moving appropriate bits of logic around
23:44 karolherbst: well
23:44 imirkin_: but it can't be done willy-nilly
23:44 karolherbst: I shouldn't break stuff, right
23:45 karolherbst: I would just move it up one step
23:45 imirkin_: unfortunately there can be a lot of subtlety involved which is hidden away behind a particular implementation
23:45 karolherbst: right
23:45 karolherbst: stuff like min/max is fairly trivial though
23:45 imirkin_: e.g. this way xyz sequences aren't generated, which causes abc problems
23:46 karolherbst: mhh, I think I already ran into a few such issues
23:46 imirkin_: sometimes it's happy coincidence, other times design
23:46 karolherbst: I already had some fun with a merge having an immediate value :)
23:46 imirkin_: right
23:47 imirkin_: which makes sense in principle, but in practice "don't do that"
23:47 karolherbst: robclark: well, the point is, that there are special extensions to ops in nv hw
23:47 karolherbst: robclark: most of it we really don't want to express in NIR I think
23:47 karolherbst: maybe we want
23:47 karolherbst: depends on the situation
23:47 imirkin_: the nv isa is very flaggy
23:47 karolherbst: comparing two 64bit values is very strange in a way
23:48 imirkin_: how *do* i do that? i forget
23:48 karolherbst: two sets
23:48 imirkin_: i remember it wasn't obvious
23:48 karolherbst: and some magic
23:48 imirkin_: no, u64seq seems pretty straightforward
23:49 imirkin_: the u64min/max - those are fun :)
23:49 karolherbst: right
23:49 karolherbst: already did those though
23:49 karolherbst: the carry is the thing
23:49 imirkin_: right, so the subop generates a flag
23:49 robclark: so drive-by comment.. take it or leave it (or take it with a grain of salt because I'm not a codegen expert by any means).. but any sort of moving stuff into common bits out of tgsi->nvir or nir->nvir could be broken up into two steps (i.e. extract out and use in nir->nvir, and then later change tgsi->nvir).. although if that approach is losing information the backend finds useful, maybe just copy/paste isn't the end of the world, and if tgsi
23:49 robclark: eventually goes away the duplicated code problem solves itself
23:49 imirkin_: and the other one consumes it
23:49 karolherbst: this got me quite hard: low->setFlagsSrc(2, flag);
23:50 karolherbst: low is the minmax for the lower bits
23:50 karolherbst: why 2 though?
23:50 karolherbst: ohh wait
23:50 karolherbst: it is src
23:50 karolherbst: not def
23:50 imirkin_: robclark: mmm ... that sounds a little trickier - it's more of a flow thing, i.e. you generate one IR, then transform it
23:50 karolherbst: how did I get this right/
23:50 karolherbst: ?
23:50 imirkin_: karolherbst: yeah, the low thing consumes the flag
23:50 imirkin_: there's also a "MED" version
23:51 imirkin_: i guess for >64-bit ints
23:51 karolherbst: fun
23:51 karolherbst: robclark: well, we have different stages in codegen
23:51 imirkin_: which both consumes the previous and generates the next $c
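A plain C++ emulation of the data flow discussed above for unsigned 64-bit values built from 32-bit halves: equality as two compares, and max as a high-half step that produces a flag plus a low-half step that consumes it. This is only a sketch under that reading. The real hardware subops and the `$c` encoding are not modelled, and `U64Pair`, `HighFlag`, `eqU64` and `maxU64` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>

// A 64-bit value held as two 32-bit halves, as a 32-bit ALU would see it.
struct U64Pair {
    std::uint32_t lo;
    std::uint32_t hi;
};

U64Pair split(std::uint64_t v) {
    return {static_cast<std::uint32_t>(v), static_cast<std::uint32_t>(v >> 32)};
}

std::uint64_t join(U64Pair p) {
    return (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
}

// Equality: two 32-bit compares whose results are combined.
bool eqU64(U64Pair a, U64Pair b) {
    return a.lo == b.lo && a.hi == b.hi;
}

// Outcome of the high-half comparison; stands in for the condition flag that
// the first operation generates and the second one consumes.
enum class HighFlag { AWins, BWins, Tie };

U64Pair maxU64(U64Pair a, U64Pair b) {
    // Step 1: the high halves decide the high result and produce the flag.
    HighFlag flag = a.hi > b.hi   ? HighFlag::AWins
                    : a.hi < b.hi ? HighFlag::BWins
                                  : HighFlag::Tie;
    std::uint32_t hi = std::max(a.hi, b.hi);

    // Step 2: the low half consumes the flag. Only on a tie do the low
    // halves themselves get compared.
    std::uint32_t lo = 0;
    switch (flag) {
    case HighFlag::AWins: lo = a.lo; break;
    case HighFlag::BWins: lo = b.lo; break;
    case HighFlag::Tie:   lo = std::max(a.lo, b.lo); break;
    }
    return {lo, hi};
}
```

A wider integer would need a middle step that both consumes the incoming flag and produces the next one, which matches the "MED" variant imirkin_ mentions.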
23:51 karolherbst: robclark: basically pre SSA, post SSA and post RA
23:51 karolherbst: and most of the opts are done post SSA of course
23:52 karolherbst: but I guess mhh
23:52 karolherbst: well
23:52 karolherbst: I would move those things into pre SSA for now
23:52 karolherbst: and write the code in a way that is compatible with SSA
23:53 karolherbst: robclark: both (tgsi and nir) are just making a "stupid" to nvir translation and then we run the full codegen stuff on top of that
23:54 karolherbst: so moving stuff doesn't really mean losing information unless you move it somewhere deep down into codegen
23:55 imirkin_: robclark: a more likely pattern is to move stuff from the tgsi -> nvir adapter into nvir proper
23:55 karolherbst: yeah, this as well
23:55 karolherbst: well
23:55 imirkin_: karolherbst: looking back at some of the stuff i ended up doing in from_tgsi, you're totally right - that stuff belongs in a legalizessa step or something
23:55 karolherbst: we don't need to move stuff actually
23:55 imirkin_: esp the min/max stuff
23:55 karolherbst: tgsi -> nvir could still do its thing
23:55 imirkin_: i think i was in "bang on keyboard until it works" mode, and never finished cleaning up
23:56 karolherbst: and nir -> nvir would rely on the moved/copied bits
23:56 karolherbst: and at some point we can remove the lowering from tgsi -> nvir
23:56 karolherbst: actually I think I will do it this way
23:56 karolherbst: so I won't touch tgsi -> nvir
23:56 karolherbst: but nir -> nvir already makes use of the new stuff
23:57 karolherbst: well "new"