00:01imirkin_: i dunno - i think the from_tgsi thing is actually pretty simple
00:01imirkin_: it's a lot of lines of code, but most of it is just enum <-> enum conversions
00:02karolherbst: the CFG stuff is kind of messy though
00:02karolherbst: in nir this is much more straightforward
00:02karolherbst: or at least you know your context if you parse a nir_if/nir_loop node
00:02karolherbst: in TGSI you have to push bbs on stacks and have a state machine and so on
00:02karolherbst: don't need this with nir
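A minimal sketch of the stack-based bookkeeping described above for a flat instruction stream, where openers and closers have to be paired by hand. The `Op` enum and `matchControlFlow` are hypothetical names for illustration only; they are not the from_tgsi code, which tracks basic blocks rather than indices.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical opcodes for a flat, TGSI-like instruction stream.
enum class Op { If, Else, EndIf, BeginLoop, EndLoop, Other };

// For every opener (If / BeginLoop) record the index of its closer, and
// vice versa. Entries for all other instructions stay at -1.
std::vector<int> matchControlFlow(const std::vector<Op> &stream)
{
    std::vector<int> match(stream.size(), -1);
    std::vector<std::size_t> open; // stack of indices of still-open constructs

    for (std::size_t i = 0; i < stream.size(); ++i) {
        switch (stream[i]) {
        case Op::If:
        case Op::BeginLoop:
            open.push_back(i);
            break;
        case Op::Else:
            // An ELSE is only valid while the innermost open construct is an IF.
            if (open.empty() || stream[open.back()] != Op::If)
                throw std::runtime_error("ELSE without IF");
            break;
        case Op::EndIf:
        case Op::EndLoop: {
            const Op want = stream[i] == Op::EndIf ? Op::If : Op::BeginLoop;
            if (open.empty() || stream[open.back()] != want)
                throw std::runtime_error("unbalanced control flow");
            match[open.back()] = static_cast<int>(i);
            match[i] = static_cast<int>(open.back());
            open.pop_back();
            break;
        }
        case Op::Other:
            break;
        }
    }
    if (!open.empty())
        throw std::runtime_error("unterminated control flow");
    return match;
}
```

With a tree-shaped IR the nesting is already given by the node being visited, so no such stack is needed.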
00:03imirkin_: ok, but in tgsi it's still all pretty simple
00:03karolherbst: most things are
00:04karolherbst: well, with nir you don't need those op -> type conversions
00:05karolherbst: so this removes quite a lot of lines as well
00:06karolherbst: well, nir is pretty strict about typing and you usually know what types the src/dests are, so there is no need to have a static lookup for that
00:06imirkin_: so is tgsi
00:06imirkin_: it's just convenient to have shared handlers
00:06imirkin_: which take all kinds of ops
00:33karolherbst: mhh, texelfetch is complicated
00:37karolherbst: imirkin_: do we really need those 3 components per coord component?
00:37karolherbst: in TGSI we get this IMM.xyy
00:37karolherbst: in nir I just get a 2 component value, not three
00:38imirkin_: you need it for texture3d
00:38airlied: or ms
00:38karolherbst: well right
00:38imirkin_: mmmm you can't do a ms-based offset
00:38imirkin_: i assume he's talking about offsets
00:38karolherbst: the point is, we always set three
00:38karolherbst: with tgsi
00:38karolherbst: I see
00:39imirkin_: only 2 are used
00:39imirkin_: but the offsets are defined as 3-component things, and that's how they're printed
00:39airlied: imirkin_: he said coord
00:39imirkin_: and yet he probably meant offsets ;)
00:39karolherbst: I mean both
00:40karolherbst: we apply the offset to the coords sources
00:40imirkin_: since offsets are 3 components while coords are 4
00:40karolherbst: and have this 2 dim array
00:40imirkin_: so ... the reason i did that
00:40imirkin_: was that for nv50, the offset has to be an immediate
00:41karolherbst: I see
00:41imirkin_: you can't reliably track it to the immediate in tgsi
00:41karolherbst: ohh, okay
00:42karolherbst: mhh something is still funny though, I don't get that fourth source :(
00:42karolherbst: well in the SSA form
00:42imirkin_: it all gets lowered
00:43karolherbst: I know
00:43imirkin_: for nvc0
00:43karolherbst: I guess I'm missing something and that's why those offset values don't get calculated/added
00:43imirkin_: like why is textureOffset() a thing?
00:44imirkin_: the offsets are specified in texels
00:44imirkin_: while the coords are normalized
00:44imirkin_: i dunno what the practical use-case is for that
00:44imirkin_: i just know how it works :)
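To make the texel/normalized distinction concrete, here is a rough sketch of how an integer texel offset relates to a normalized coordinate. `applyTexelOffset` is a hypothetical helper; it assumes a single mip level of known size and ignores wrap modes and filtering, so it only shows the unit conversion and not what the hardware does.

```cpp
// Shift a normalized coordinate by a whole number of texels.
// sizeTexels is the extent of the (assumed) selected mip level along this axis.
double applyTexelOffset(double coord, int offsetTexels, int sizeTexels)
{
    return coord + static_cast<double>(offsetTexels) / sizeTexels;
}
```

For an 8-texel-wide level, an offset of 2 texels moves a coordinate of 0.5 to 0.75.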
00:44karolherbst: I read something about raw data access
00:44imirkin_: nah, you'd do that with a samplerBuffer
00:45karolherbst: even when we only had glsl-1.30?
00:45imirkin_: GL 3.0 is what introduced those
00:45imirkin_: check the version table at the bottom
00:46karolherbst: right, so glsl 1.30
00:46imirkin_: and there's a texelFetchOffset
00:46imirkin_: but of course txf coords are already integers
00:46imirkin_: and there's no filtering
00:46imirkin_: so the use-case seems weaker.
00:46imirkin_: esp since in GL 3.0, the offset had to be an immediate
00:47imirkin_: actually i think that might be true in GL 4.x as well
00:47imirkin_: only the offset for textureGatherOffset can be non-const
00:48karolherbst: mhh, fun, the $r value gets adjusted accordingly for me
00:48imirkin_: anyways, i did have a reason for taking a reference
00:48imirkin_: but by now i don't remember what it is without digging through commit logs
00:48karolherbst: I forgot to set .useOffsets
00:48imirkin_: yeah, that might help.
00:49karolherbst: do I really need to fill up all 3 slots or would it be enough to set just as many offset components as I have?
00:51karolherbst: well it crashes when I set less
00:51karolherbst: so I guess I stick with three
00:52imirkin_: karolherbst: f3aa999383074d666d6e3f3506e66b0c937904ca
00:53imirkin_: right, so they needed to be valueref's
00:53imirkin_: so i had to stick them *somewhere*
00:53karolherbst: I see
00:53imirkin_: just some short 3.5y ago
00:53karolherbst: that explains
00:54karolherbst: I am kind of happy we have piglit :) makes working on this really nice
00:54imirkin_: and while you don't have to set a value
00:54imirkin_: i do think you have to set all of them, so do it with NULL
00:54imirkin_: not sure.
00:55karolherbst: I can just repeat with the last one
00:55karolherbst: at least that's what tgsi does
00:55karolherbst: or gives us
00:55imirkin_: or set it to null.
00:55imirkin_: i.e. .set(NULL)
00:55karolherbst: mhh, makes the code more complicated though
00:55karolherbst: or I return NULL for non existing components
00:56imirkin_: for (; i < 3; i++) foo[i].set(NULL);
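The one-liner above can be read as the following padding pattern. This is only a sketch: `Value`, `ValueRef`, `kOffsetSlots`, and `setOffsets` are hypothetical stand-ins and not the real codegen classes. Whether the consumer of the slots actually tolerates null entries is a separate question that the log itself leaves open.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for a codegen value and a reference to it.
struct Value { int id; };
struct ValueRef {
    Value *value = nullptr;
    void set(Value *v) { value = v; }
    Value *get() const { return value; }
};

constexpr std::size_t kOffsetSlots = 3; // offsets are defined as 3-component

// Fill the first `count` slots from `src` and explicitly clear the rest,
// so every slot is in a known state regardless of what it held before.
void setOffsets(std::array<ValueRef, kOffsetSlots> &slots,
                Value *const *src, std::size_t count)
{
    assert(count <= kOffsetSlots); // asking for more than 3 is a caller bug
    std::size_t i = 0;
    for (; i < count; ++i)
        slots[i].set(src[i]);
    for (; i < kOffsetSlots; ++i)
        slots[i].set(nullptr);
}
```

The alternative mentioned in the log, repeating the last real component, would replace `nullptr` in the second loop with `src[count - 1]` for a non-zero `count`.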
00:56karolherbst: but this might cause other big big troubles
00:56karolherbst: yeah... that's one more line than what I have now
00:56karolherbst: also it is a 2d array
00:56karolherbst: not 1d
00:56imirkin_: no. it'll be fine.
00:57imirkin_: the offsets are in reference to a single face.
00:57karolherbst: I mean I have a function to get the nir sources in general
00:57karolherbst: and I have an argument for the component
00:57karolherbst: I would rather assert if I access something non existing
00:57karolherbst: because that should never happen
00:59karolherbst: well, the code doesn't like null either
01:10imirkin_: yeah, i've never tested it
01:17karolherbst: I need a TexTarget to component count ignoring array flag
01:18imirkin_: target.getDims() iirc
01:19karolherbst: then I need to build a TexInstruction::Target object, but yeah, should make things easier
01:22imirkin_: first number is "dim", second number is "argc"
01:22karolherbst: yeah, I already looked it up
01:23karolherbst: it works as well :)
01:26karolherbst: I should probably sleep now, tomorrow is also a day to finish up all the other tex types....
01:27karolherbst: although I am sure I didn't even finish any of those I worked on today :D
03:31bazzy: note: i missed all msgs from 17:30 to 19:48. If you sent me anything at that time, please resend. ( eg. imirkin_ rperier gnarface )
03:33imirkin: chan is logged
03:33imirkin: see title.
03:41bazzy: oh, thanks
03:43bazzy: that's a nice feature :)
09:30karolherbst: interesting, 'email@example.com@execution@tex-miplevel-selection texturegrad cube' fails on pascal
09:33karolherbst: the cubearray and cubeshadow one as well
14:22imirkin: karolherbst: yeah, that's the textureGrad thing :)
14:22imirkin: fails on kepler too without those patches that move it to lane0 i think
17:51npnth: I'm getting a slow, persistent memory leak, and /sys/kernel/debug/kmemleak is full of lines like this: http://sprunge.us/VWCh .
17:53npnth: I'm running mainline kernels, but based on my past experience, this is probably already known and fixed somewhere in a dev branch. If that's right, could I please be pointed to the appropriate patch?
17:53imirkin_: hm, well - it's unclear that it's a bug
17:54imirkin_: a process ("X") is allocating GEM objects and holding on to them
17:54npnth: imirkin_: The leak doesn't go away when I close X.
17:54imirkin_:doesn't remember how GEM objects work
17:54imirkin_: iirc they're global, so that sounds like correct behavior. although horrible. perhaps that's not how GEM objects work.
17:54npnth: If this doesn't sound familiar, I can open a full bug report with dmesgs and lsacpis and all that.
17:55imirkin_: yeah, open a bug about it
17:55npnth: Will do.
17:55imirkin_: i assume this isn't on some ancient kernel right?
17:55npnth: imirkin_: Nope, compiled last night.
17:55npnth: Ditto for libdrm, mesa.
17:56imirkin_: like a 2.6.32 kernel compiled last night? :)
17:56npnth: Nah, it's 4.15 rc4 :)
19:22dupondje: do I create some bugreport for these things so it gets tracked? :_
19:24imirkin_: what's the problem?
19:25dupondje: imirkin_: well it works, but errors are never good? :)
19:26imirkin_: they're also not necessarily bad
19:26imirkin_: we're a lot more verbose than the nvidia blob about reporting errors from the gpu
19:26imirkin_: iirc some set of errors is triggered by the nvidia blob firmware -- not sure what gpu you have, as you appear to have carefully cut that out
19:27imirkin_: 22554 not being there is definitely a bit surprising
19:27dupondje: [ 1.218763] nouveau 0000:01:00.0: NVIDIA GM107 (117360a2)
19:27imirkin_: iirc that's the workarounds reg?
19:27imirkin_: might be gone on maxwell though
19:27imirkin_: in which case we should stop reading it
19:27imirkin_: drm/nouveau/nvkm/subdev/fb/ramgk104.c: ram->pmask = nvkm_rd32(device, 0x022554);
19:28imirkin_: whereas on maxwell we're supposed to do
19:28imirkin_: const u32 mask = nvkm_rd32(device, 0x021c14);
19:29imirkin_: maybe :) skeggsb would know for sure
19:29dupondje: hehe, bugreport to fix: 7 minutes
19:29dupondje: gotta love open source :)
19:30imirkin_: well - questionable. but yeah ... you can file that bug
19:36dupondje: imirkin_: https://bugs.freedesktop.org/show_bug.cgi?id=100423
19:36dupondje: seems like there is an existing one
19:50imirkin_: ah, cool
22:56imirkin_: skeggsb: OOPS! :)
22:57imirkin_: [for the memory leak]
22:57skeggsb: yes, that's what happens when you write code without sleeping ;)
22:57imirkin_: or while sleeping
22:57skeggsb: that too
22:57imirkin_: probably a happy combination
22:58skeggsb:is living in fear of use-after-free bugs showing up now
22:58skeggsb:is also very pessimistic these days
22:58imirkin_: yeah, should just drop all free's
22:58imirkin_: that way you won't get any use-after-free issues
22:58imirkin_: why do people even bother with those, anyways
22:59imirkin_: just causes issues
22:59skeggsb: my main system hasn't died yet though, so, fingers crossed
22:59imirkin_: usually when you say things like that
22:59imirkin_: is precisely the moment it explodes
22:59imirkin_: but you can't cheat it -- if you're saying it just to make it oops, then it doesn't work
23:00imirkin_: you have to mean it :)
23:00skeggsb: unfortunately true, especially for hunting down race conditions.. they magically disappear when you start looking
23:00RSpliet: surely we should just make the kernel do garbage collection. Scan all stacks and regs for references, mark, sweep... what could possibly be the problem?
23:01skeggsb: that wouldn't suck at all!
23:01imirkin_: RSpliet: yeah, and just lock while you do that
23:01imirkin_: RT systems will be *real* happy
23:01RSpliet: RT on Linux
23:01skeggsb: in fairness, the RT people are never happy
23:01imirkin_: yeah, but this will make them extremely unhappy
23:01imirkin_: as opposed to their general lack of happiness
23:02imirkin_: (so ... goal achieved!)
23:02RSpliet: at least it would once and for all determine Linux isn't suitable for HRT systems
23:31karolherbst: skeggsb: wanna do some vulkan stuff?
23:31skeggsb: karolherbst: what kind of stuff? i still need to do some kernel-side improvements :P
23:32karolherbst: writing the userspace bits :D
23:32karolherbst: I think I got enough done on NIR so that somebody could start with the other things for vulkan
23:32skeggsb: i'll see if i can't get *something* out there after the xmas break ;)
23:33skeggsb: i have a branch somewhere where i started, just need to find it :P
23:33karolherbst: I still need to implement some features like ubos and that kind of stuff
23:33karolherbst: basic texturing is already done
23:33karolherbst: and the 32bit alu should be finished as well
23:33imirkin_: i've been waiting for an ioctl that can do explicit BO placement
23:34imirkin_: pretty sure that's a hard requirement for VK
23:34imirkin_: separately i've been building up the courage to rewrite all of the guts of the nouveau mesa driver
23:34imirkin_: like ... all the bo/buffer/etc handling
23:34skeggsb: imirkin_: \o/ yes please!
23:35imirkin_: and get rid of stupid libdrm stuff, etc
23:35karolherbst: imirkin_: I think I will move the 64bit translation out of from_tgsi
23:35karolherbst: because it has no relevance to tgsi
23:35imirkin_: "64-bit translation"?
23:35karolherbst: that we can't do certain ops with 64bit values
23:36karolherbst: and have to lower it away before going into codegen
23:36karolherbst: I basically have to duplicate all the tgsi stuff
23:36imirkin_: i don't see that as a problem.
23:36karolherbst: well, why should we do the same thing twice?
23:36imirkin_: it'll always be subtly different
23:36imirkin_: and then you want to make a change, etc
23:36robclark: karolherbst, I thought there was some nir lowering for 64b.. although tbh 64b has been least of my concerns so far..
23:36karolherbst: some things yes
23:36karolherbst: but most things, no
23:36karolherbst: robclark: well, we have 64bit support
23:36imirkin_: i dunno. copy/paste is pretty easy.
23:37robclark: ahh.. fancy..
23:37karolherbst: robclark: but we can't do every alu instruction in 64bit
23:37karolherbst: and some only half
23:37imirkin_: there are even shortcuts for it :)
23:37karolherbst: well right
23:37karolherbst: but the API is different
23:37imirkin_: robclark: there's basically zero 64-bit int support on nvidia
23:37karolherbst: so I need to translate it over to the nir stuff
23:37imirkin_: robclark: basically just int64 <-> float64
23:37karolherbst: right, the int stuff is annoying
23:37imirkin_: and a handful of similar items
23:38imirkin_: that would be a MAJOR disaster to handle without hw support
23:38robclark: ok, well I can't guarantee that the nir lowering supports pick/choose what stuff to lower.. that might be a reasonable addition.. otoh moving things out of tgsi->nvir might be even more reasonable ;-)
23:38karolherbst: imirkin_: 64bit fmax
23:38karolherbst: we can't do that as well ;)
23:38karolherbst: not in one instruction
23:38imirkin_: that's 64-bit double stuff
23:38imirkin_: not int
23:38karolherbst: right, but as an example of 64bit float stuff which is annoying to do
23:39karolherbst: but double mul is perfectly fine I think, right?
23:39karolherbst: robclark: ;) see
23:39imirkin_: rcp/rsq aren't great either
23:39karolherbst: but I would rather lower those into the right things not in the IR translators
23:40imirkin_: for rcp/rsq, yeah
23:40karolherbst: but before going into SSA or so
23:40imirkin_: the question is basically...
23:40imirkin_: "would it be helpful for the optimizer to keep the operation intact or not"
23:40karolherbst: well, it makes kind of sense
23:40karolherbst: except nir is the last IR we have to support
23:40imirkin_: if the answer is "yes", then it has to be lowered after the ssa opt passes
23:40imirkin_: if the answer is "no" then it should be lowered before
23:40karolherbst: but currently we lower a lot in from_tgsi
23:40imirkin_: eh - depends whether SPIR-V lands first or not :p
23:41karolherbst: and I would move it a bit deeper into codegen
23:41karolherbst: well, maybe we need to support other things
23:41karolherbst: who knows
23:41karolherbst: because I need it for nir anyway
23:41imirkin_: really? i thought there was basically none, with the exception of a handful of weirdo tgsi-only ops like "LIT"
23:41karolherbst: I would just move it
23:41karolherbst: there are quite a lot
23:41karolherbst: a lot of 64bit ones
23:41imirkin_: hmmm maybe
23:41karolherbst: basically all 64 bit ones
23:42karolherbst: check the file
23:42karolherbst: it is full of special code for doubles and 64bit ints
23:42karolherbst: well maybe not full
23:42karolherbst: but there is quite a lot
23:43karolherbst: I pass half of arb_gpu_shader_int64 for example
23:43karolherbst: and I already added some special handling for some of those
23:43imirkin_: first half is always easiest ;)
23:43karolherbst: the issue is that 64bit is special
23:43imirkin_: fscking 64-bit shifts :p
23:43karolherbst: but as I said
23:44karolherbst: if that would be not inside from_tgsi in the first place, I wouldn't need to bother about those now
23:44imirkin_: well, i'm not TOTALLY opposed to moving appropriate bits of logic around
23:44imirkin_: but it can't be done willy-nilly
23:44karolherbst: I shouldn't break stuff, right
23:45karolherbst: I would just move it up one step
23:45imirkin_: unfortunately there can be a lot of subtlety involved which is hidden away behind a particular implementation
23:45karolherbst: stuff like min/max is fairly trivial though
23:45imirkin_: e.g. this way xyz sequences aren't generated, which causes abc problems
23:46karolherbst: mhh, I think I already run into a few such issues
23:46imirkin_: sometimes it's happy coincidence, other times design
23:46karolherbst: I already had some fun with a merge having an immediate value :)
23:47imirkin_: which makes sense in principle, but in practice "don't do that"
23:47karolherbst: robclark: well, the point is, that there are special extensions to ops in nv hw
23:47karolherbst: robclark: most of it we really don't want to express in NIR I think
23:47karolherbst: maybe we want
23:47karolherbst: depends on the situation
23:47imirkin_: the nv isa is very flaggy
23:47karolherbst: comparing two 64bit values is very strange in a way
23:48imirkin_: how *do* i do that? i forget
23:48karolherbst: two sets
23:48imirkin_: i remember it wasn't obvious
23:48karolherbst: and some magic
23:48imirkin_: no, u64seq seems pretty straightforward
23:49imirkin_: the u64min/max - those are fun :)
23:49karolherbst: already did those though
23:49karolherbst: the carry is the thing
23:49imirkin_: right, so the subop generates a flag
23:49robclark: so drive-by comment.. take it or leave it (or take it with a grain of salt because I'm not a codegen expert by any means).. but any sort of moving stuff into common bits out of tgsi->nvir or nir->nvir could be broken up into two steps (ie. extract out and use in nir->nvir, and then later change tgsi->nvir).. although if that approach is losing information the backend finds useful, maybe just copy/paste isn't the end of the world, and if tgsi
23:49robclark: eventually goes away the duplicated code problem solves itself
23:49imirkin_: and the other one consumes it
23:49karolherbst: this got me quite hard: low->setFlagsSrc(2, flag);
23:50karolherbst: low is the minmax for the lower bits
23:50karolherbst: why 2 though?
23:50karolherbst: ohh wait
23:50karolherbst: it is src
23:50karolherbst: not def
23:50imirkin_: robclark: mmm ... that sounds a little trickier - it's more of a flow thing, i.e. you generate one IR, then transform it
23:50karolherbst: how did I get this right?
23:50imirkin_: karolherbst: yeah, the low thing consumes the flag
23:50imirkin_: there's also a "MED" version
23:51imirkin_: i guess for >64-bit ints
23:51karolherbst: robclark: well, we have different stages in codegen
23:51imirkin_: which both consumes the previous and generates the next $c
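A rough software model of the split described above: a 64-bit unsigned max built from two 32-bit steps, where the high step produces a flag and the low step consumes it. The tri-state `HiFlag` is an assumption made for readability, not the hardware's condition-code encoding, and `U64Parts` and `umax64` are hypothetical names.

```cpp
#include <cstdint>

// Outcome of comparing the high halves; stands in for the flag ($c).
enum class HiFlag { AWins, BWins, Tie };

struct U64Parts { std::uint32_t lo, hi; };

U64Parts umax64(U64Parts a, U64Parts b)
{
    // High step: pick the larger high word and produce the flag.
    const HiFlag flag = a.hi > b.hi ? HiFlag::AWins
                      : a.hi < b.hi ? HiFlag::BWins
                                    : HiFlag::Tie;
    const std::uint32_t hi = flag == HiFlag::BWins ? b.hi : a.hi;

    // Low step: consume the flag. Only on a tie do the low words decide.
    std::uint32_t lo;
    switch (flag) {
    case HiFlag::AWins: lo = a.lo; break;
    case HiFlag::BWins: lo = b.lo; break;
    default:            lo = a.lo > b.lo ? a.lo : b.lo; break;
    }
    return {lo, hi};
}
```

A wider integer would chain a middle step between the two, one that both consumes the incoming flag and produces the next one, which matches the "MED" variant mentioned above.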
23:51karolherbst: robclark: basically pre SSA, post SSA and post RA
23:51karolherbst: and most of the opts are done post SSA of course
23:52karolherbst: but I guess mhh
23:52karolherbst: I would move those things into pre SSA for now
23:52karolherbst: and write the code in a way it is compatible to SSA
23:53karolherbst: robclark: both (tgsi and nir) are just doing a "stupid" translation to nvir and then we run the full codegen stuff on top of that
23:54karolherbst: so moving stuff doesn't really mean losing information unless you move it somewhere deep down into codegen
23:55imirkin_: robclark: a more likely pattern is to move stuff from the tgsi -> nvir adapter into nvir proper
23:55karolherbst: yeah, this as well
23:55imirkin_: karolherbst: looking back at some of the stuff i ended up doing in from_tgsi, you're totally right - that stuff belongs in a legalizessa step or something
23:55karolherbst: we don't need to move stuff actually
23:55imirkin_: esp the min/max stuff
23:55karolherbst: tgsi -> nvir could still do its thing
23:55imirkin_: i think i was in "bang on keyboard until it works" mode, and never finished cleaning up
23:56karolherbst: and nir -> nvir would rely on the moved/copied bits
23:56karolherbst: and at some point we can remove the lowering from tgsi -> nvir
23:56karolherbst: actually I think I will do it this way
23:56karolherbst: so I won't touch tgsi -> nvir
23:56karolherbst: but nir -> nvir already makes use of the new stuff
23:57karolherbst: well "new"