00:33Lyude: Hm, so I'm working on the clockgating stuff for kepler+, and since it looks like the stuff for doing clockgating on all of those gens is going to be more or less the same I want to do the same kind of code organization nvkm/engine/gr/gf100.c has, e.g. providing a bunch of generation specific hooks on top of struct nvkm_gr (of course in this case, it would be struct nvkm_therm instead since we're working in
00:33Lyude: nvkm/subdev/therm). So, I notice that nouveau has the convention for subdevs of providing a function such as nvkm_therm_new_() that handles allocating these structures and initializing everything in them. Now, the problem I'm running into is that I would like to be able to hook into nvkm_therm_new_() for everything except the actual kzalloc() call that allocates the memory for the structure, and call that
00:33Lyude: code from a new nvkm_therm_gf100_new_() func. So, I figured I could split out nvkm_therm_new_() into one function for allocating the structure, and one for initializing it, but I can't seem to find any other files that actually do this. So, what is the actual convention I should be following for doing something like this?
00:35skeggsb: that's done in heaps of places
00:35Lyude: ah, i might have just not been grepping for the right thing then
00:37skeggsb: gf100_subdev_new() is the top-level function called by device for a given chipset, gf100_subdev_new_() is a "caller passes its own function pointer struct in, function allocates struct + fill in initial values"
00:37skeggsb: if you want to do it separately, have the top-level function allocate, and have a gf100_subdev_ctor() do common initialisation
00:38skeggsb: do NOT ever do a nvkm_subdev_CHIPSET_blah() name either :P
00:38skeggsb: CHIPSET_subdev_blah() is the correct form
00:40skeggsb: summary: _new() == entrypoint, _new_() == allocate+initialise, called from an entry-point, _ctor() is just the initialise part
00:40Lyude: ahhhhhhhh, gotcha
00:40skeggsb: it depends on what you need to do / what is possible to share as to which you decide to use
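The three-way split skeggsb summarises (_new() as entry point, _new_() as allocate+initialise, _ctor() as initialise only) can be sketched in plain userspace C. This is only an illustration of the naming pattern: struct therm, struct therm_func, therm_ctor(), therm_new_() and the clkgate_init hook are hypothetical stand-ins, not the real nvkm structures or signatures, and calloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-chipset hook table; not the real nvkm structures. */
struct therm_func {
	int (*clkgate_init)(void);
};

struct therm {
	const struct therm_func *func;
	int initialised;
};

/* _ctor(): initialise only; the caller owns the allocation. */
void therm_ctor(struct therm *therm, const struct therm_func *func)
{
	therm->func = func;
	therm->initialised = 1;
}

/* _new_(): allocate + initialise; the caller passes its own hook table. */
int therm_new_(const struct therm_func *func, struct therm **ptherm)
{
	struct therm *therm;

	assert(func && ptherm);
	therm = calloc(1, sizeof(*therm));
	if (!therm)
		return -1;
	therm_ctor(therm, func);
	*ptherm = therm;
	return 0;
}

/* Placeholder hook; the return value is arbitrary. */
static int gf100_clkgate_init(void)
{
	return 100;
}

static const struct therm_func gf100_therm = {
	.clkgate_init = gf100_clkgate_init,
};

/* _new(): CHIPSET_subdev_new() entry point for one chipset. */
int gf100_therm_new(struct therm **ptherm)
{
	return therm_new_(&gf100_therm, ptherm);
}
```

A chipset that embeds struct therm in a larger structure of its own would allocate that structure in its _new() and call therm_ctor() on the embedded member instead of going through therm_new_().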
00:40Lyude: it still boggles my mind how consistent all of this styling stuff is
00:42Lyude: hopefully the kernel driver I eventually write someday for mali ends up being structured in the same manner
00:42airlied: Lyude: after the 2nd/3rd rewrite maybe :-P
00:43Lyude: that one is definitely going to end up being weird since we're starting with bifrost, and i'm assuming if we ever end up doing the earlier gens it's going to mean we'll be adding chipset support backwards chronologically
00:49Lyude: btw skeggsb what does ctor actually stand for?
00:50skeggsb:had every intention of documenting the general design of nvkm when he "rewrote" it last, but, other stuff kept being more important
00:50skeggsb: should do that some day
00:52airlied: Lyude: also helps if your hw interface is sane :-P
00:52Lyude: that's gonna be a fun one to deal with
00:53Lyude: we still haven't figured out how the heck to decode the command packets yet
00:53Lyude: can't imagine the rest of the GPU is much better, lol
00:58imirkin: i thought mali was at least partly worked out via lima?
00:58imirkin: or is this for mali-t?
00:59Lyude: imirkin: yeah it's worked out, but only partly
01:00Lyude: p sure more has been figured out about t200/t400
01:00Lyude: but they started going with a unified shader design for t600
01:02airlied: did they change the command submission as well?
01:02Lyude: airlied: not sure
01:20imirkin: Lyude: well, you're probably nowhere close to 'there' yet, but when the time comes, i'd be happy to help with GL driver feature bringup - i've helped both freedreno and, to a lesser extent, etnaviv with some stuff
01:22airlied: just give him a triangle :-P
01:22airlied: well maybe a shaded triangle
01:25Lyude: imirkin: you're more than welcome to help! #BiOpenly is the channel btw
02:05imirkin: Lyude: why that name? how does it relate to mali?
02:10airlied: imirkin: bifrost is the core name
02:11imirkin: ah, i see
06:58karolherbst: imirkin: maybe we should only split mad on fermi+? But maybe this complicates things too much again
08:05dboyan: hakzsam: Could you take a look at https://lists.freedesktop.org/archives/mesa-dev/2017-June/158734.html ?
08:05hakzsam: yeah, I will do
08:06dboyan: thanks, I want to know if the path I took was correct
08:06hakzsam: sure, I will have a look later today :)
11:19dboyan: imirkin: I plan to construct a DAG according to def-use chains in a bb. I'm curious how it is possible that a Value can have several defs
11:21dboyan: Also, I noticed that serial is generated from LocalCSE pass, I assume that it should be regenerated after scheduling pass?
11:28karolherbst: dboyan: an Instruction writing to several registers
11:28karolherbst: ohh, wait you mean a Value
11:30karolherbst: dboyan: remember that the same structures are also used post SSA, where each Value isn't assigned just once anymore
11:31karolherbst: dboyan: think of if clauses: $p0 mov $r0 0x1; not $p0 mov $r0 0x2; mov $r1 $r0 // $r0 should have two defs here if I didn't get it wrong
12:37imirkin: dboyan: can't happen in SSA
12:38imirkin: dboyan: however it can happen pre-SSA and can happen during RA (and obviously after RA)
12:44dboyan: ah, I see. I was overlooking non-SSA forms.
12:46imirkin: during RA, merged defs get literally added together too
12:46imirkin: i mean if you have foo = merge(a, b, c, d)
12:46imirkin: then foo.defs will get a.defs, b.defs, etc added into it
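The "foo.defs will get a.defs, b.defs, etc added into it" behaviour can be pictured with a small C sketch. struct value, MAX_DEFS and value_merge_defs() are hypothetical names made up for illustration; the real codegen is C++ and keeps its def lists differently.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEFS 8

/* Hypothetical stand-in for a value and the instructions that define it. */
struct value {
	int defs[MAX_DEFS];   /* ids of the defining instructions */
	size_t ndefs;
};

/* Append all of src's defs to dst, as described for merged values. */
void value_merge_defs(struct value *dst, const struct value *src)
{
	for (size_t i = 0; i < src->ndefs; i++) {
		assert(dst->ndefs < MAX_DEFS);
		dst->defs[dst->ndefs++] = src->defs[i];
	}
}
```

After merging a (one def) and b (two defs) into foo, foo carries three defs, which is why anything walking def-use chains during or after RA cannot assume a single def per value.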
12:50dboyan: I think I'm going to use dependency tracking logic similar to that of i965. So the framework would be: 1) Take instructions from bb 2) build dag 3) take instructions out of bb according to some policies
12:51dboyan: I think I don't have to take out phi nodes if it's in ssa?
12:52imirkin: you can't move phi nodes
12:52imirkin: at least not for now.
12:54dboyan: Yeah, they are just at the beginning of each bb, if i see it correctly.
12:56dboyan: Actually the first step in ir3's scheduling is to move phis and inputs to the head of each bb
13:00imirkin: phi's are always at the start of each bb
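The three steps dboyan lists (take instructions from the bb, build a DAG, pick instructions by some policy) can be sketched as a tiny list scheduler in C. Everything here is an assumption for illustration: struct insn with one destination and two sources, the NO_REG marker, and the deliberately naive policy of picking the ready instruction with the highest original index. Phi nodes are assumed to be left out of the input, since they stay at the head of the bb.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSNS 16
#define NO_REG (-1)

/* Hypothetical instruction: one destination, up to two sources. */
struct insn {
	int dst;
	int src[2];
};

/*
 * Schedule the n instructions of one basic block into order[].
 * Build step: instruction i depends on an earlier instruction j when
 * they conflict on a register (read-after-write, write-after-write or
 * write-after-read).
 * Pick step: repeatedly take a ready instruction; the policy here is
 * simply the highest original index whose dependencies are all done.
 */
void schedule_bb(const struct insn *insns, int n, int *order)
{
	bool dep[MAX_INSNS][MAX_INSNS] = {{false}};
	int pending[MAX_INSNS] = {0};
	bool done[MAX_INSNS] = {false};

	assert(n <= MAX_INSNS);
	for (int i = 0; i < n; i++) {
		for (int j = 0; j < i; j++) {
			bool raw = insns[j].dst != NO_REG &&
				   (insns[i].src[0] == insns[j].dst ||
				    insns[i].src[1] == insns[j].dst);
			bool waw = insns[i].dst != NO_REG &&
				   insns[i].dst == insns[j].dst;
			bool war = insns[i].dst != NO_REG &&
				   (insns[j].src[0] == insns[i].dst ||
				    insns[j].src[1] == insns[i].dst);
			if (raw || waw || war) {
				dep[i][j] = true;
				pending[i]++;
			}
		}
	}
	for (int k = 0; k < n; k++) {
		int pick = -1;

		for (int i = n - 1; i >= 0 && pick < 0; i--)
			if (!done[i] && pending[i] == 0)
				pick = i;
		assert(pick >= 0); /* a DAG always has a ready node */
		done[pick] = true;
		order[k] = pick;
		for (int i = 0; i < n; i++)
			if (dep[i][pick])
				pending[i]--;
	}
}
```

A real scheduler would replace the pick loop with a latency- or register-pressure-aware heuristic; the DAG construction and the ready/pending bookkeeping stay the same shape.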
15:03karolherbst: Sophira: long time no see
17:09fitzgen: given that myself and a coworker are both using gallium 0.4 llvmpipe with mesa, what might cause one of us to get gl 2.1 and gles 2.0 vs the other to get gl 3.0 and gles 3.0?
17:10fitzgen: my coworker's `LIBGL_ALWAYS_SOFTWARE=1 glxinfo` https://pastebin.com/wwL4ZCZd vs mine https://pastebin.com/KNkDKRj6
17:15prg: one of you compiled with --enable-texture-float, the other one didn't?
17:16fitzgen: prg: and the one who used `--enable-texture-float` is the one who gets gles 3?
17:17prg: should be required for gl 3.x at least, not sure about gles but seems possible
17:18fitzgen: ok. the fedpkg's configure line seems to be suggesting that --enable-texture-float is used by default in fedora, but I can try building from source as a sanity check
18:01imirkin_: fitzgen: could be a silly difference somewhere ... like you're using xf86-video-nouveau with NoAccel while your coworker is using xf86-video-modesetting (similarly without acceleration), but the latter might expose GLX_ARB_create_context_profile while the former doesn't. or vice-versa. or something else entirely.
18:02imirkin_: fitzgen: either way, not a nouveau issue if you're using llvmpipe.
18:02fitzgen: imirkin_: so the --enable-texture-float configure flag is a red herring?
18:02imirkin_: float textures are the usual culprit
18:02imirkin_: of not getting GL 3.0 / GLES 3.0
18:02imirkin_: however it's not the only culprit
18:03imirkin_: the other culprits almost never happen though
18:03imirkin_: but the fact that you're using e.g. nomodeset could precipitate one of the odd things to happen
18:03fitzgen: imirkin_: I removed the nomodeset from my grub config
18:03imirkin_: since then you can't even use xf86-video-modesetting...
18:04imirkin_: and iirc you got GL to work with nouveau, right?
18:04imirkin_: iirc they apply a patch
18:04imirkin_: which limits the swrast's to GL 2.x
18:04imirkin_: while enabling it on hw drivers
18:04fitzgen: hm ok
18:05karolherbst: imirkin_: why would they do this?
18:05imirkin_: karolherbst: patents
18:05karolherbst: ohhhh I see
18:05imirkin_: IANAL, but i believe the theory is that nvidia & co have already paid the patent fees
18:05karolherbst: makes sense
18:05fitzgen: none of these patch names seems related
18:06fitzgen: so I guess I get to dig through their contents
18:06imirkin_: patent licensing is a landmine inside a landmine though, so ... i don't pretend to understand it
18:06imirkin_: fitzgen: do you have a list?
18:06fitzgen: er actually that last one...
18:07fitzgen: mentions texture
18:07imirkin_: f and g are close together.
18:07imirkin_: on QWERTY keyboards at least
18:07fitzgen: From 00bcd599310dc7fce4fe336ffd85902429051a0c Mon Sep 17 00:00:00 2001
18:07fitzgen: From: Igor Gnatenko <firstname.lastname@example.org>
18:07fitzgen: Date: Sun, 20 Mar 2016 13:27:04 +0100
18:07fitzgen: Subject: [PATCH 2/4] hardware gloat
18:07fitzgen: Signed-off-by: Igor Gnatenko <email@example.com>
18:08fitzgen: my bad
18:08fitzgen: accidentally hit shift+enter
18:08fitzgen: which skips the pastebin
18:12imirkin_: which disables float RT's for llvmpipe and softpipe
18:13fitzgen: ok, so delete this patch and rebuild
18:13imirkin_: and as you may have guessed, GL 3.0 and GLES 3.0 require float RT's
18:14imirkin_: were you having trouble with nouveau, or just want to play with llvmpipe?
18:16fitzgen: imirkin_: both
18:17fitzgen: imirkin_: I need gl es >= 3 for the application I'm hacking on, and I'm getting intermittent crashes, which are easiest to debug with rr, which forces software rendering for deterministic record/replay
18:19imirkin_: hm ok
18:19imirkin_: nouveau should be pretty deterministic
18:20imirkin_: anyways, as far as GL ES 3.0 goes, nouveau should be *pretty* conformant
18:20imirkin_: the only bit that isn't presently supported is, iirc, the "invariant" varying qualifier
18:21fitzgen: servo runs fine with nouveau, at least so far
18:21imirkin_: could be some special GM10x bugs of course... although those are much more likely to be misrenders than crashes
18:39karolherbst: imirkin_: you remember those weird issues in talos principle? It also uses precise....
18:44karolherbst: imirkin_: and it seems like I fixed those flickers there as well, will verify later
18:44imirkin_: iirc those flickers were fixed a long time ago
18:45imirkin_: by some magical change; i never figured out why it made any difference
18:45imirkin_: could easily have been some rejiggering of mul/add stuff
18:45imirkin_: iirc it was kayden's change to change the way functions were inlined in glsl
18:45karolherbst: imirkin_: ohhh, okay
18:45karolherbst: I remember
18:45karolherbst: so now we have an explanation
18:47imirkin_: not really
18:47imirkin_: but we have a possible theory
18:47imirkin_: also explains why it never happened on tesla - tesla only has muladd, no fma
18:48imirkin_: iirc it didn't happen on fermi either though, just kepler. or maybe both? dunno, it's been a while.
18:48karolherbst: no idea
18:48karolherbst: never checked fermi
18:48karolherbst: just tesla and kepler
18:49karolherbst: but there are a lot of things which might alter the generated binary
18:49karolherbst: pmoreau also had this one hack which chose a different code path and it was fixed as well
19:14karolherbst: mhhhh, I think I don't respect "precision highp float;" stuff in the glsl->tgsi translation right now
19:15tobijk: what's the firmware situation for pascal, thermal management and reclocking btw? is it theoretically doable with the released firmware?
19:15karolherbst: that's actually the reason why those CTS tests fail
19:15karolherbst: tobijk: same as always
19:15tobijk: karolherbst: so no? :D
19:15tobijk: i'm really out of the loop
19:15karolherbst: don't expect things to change next year
19:15karolherbst: I don't
19:16tobijk: mh well maybe if nvidia releases the next generation
19:17karolherbst: fun, glsl doesn't handle highp
19:17karolherbst: although "highp" is the default
19:18tobijk: imirkin_: had you time to remember what was the problem with tess and clip? maybe i could figure something out, but i'd need a starting point :/
19:34imirkin_: i did not.
19:58karolherbst: imirkin_: I looked a bit deeper into that CTS precision test and I think we might have a problem doing mul+add=mad in general: https://gist.github.com/karolherbst/80c0f36c7dbaa059fa357aff17af7c4f
19:58karolherbst: the only mads generated are from the calculations for weightedSum.w
19:58karolherbst: and with those mads it fails the test
19:59karolherbst: only if we split the mad up it succeeds
19:59karolherbst: I've added the tgsi as well
21:27fitzgen: does rpmfusion have a mesa with --enable-texture-float ?
21:27fitzgen:is having tons of trouble building from source
21:27imirkin_: you might want to ask in a distro support channel... we generally don't do distro support here
21:28imirkin_: too many distros with too much crazy
21:28fitzgen: ok, thanks
21:31RSpliet: fitzgen: mesa from Fedora has texture float enabled, see https://kojipkgs.fedoraproject.org//packages/mesa/17.0.5/3.fc25/data/logs/x86_64/build.log
21:31imirkin_: he wants it without their "disable float rt's on swrast" patch
21:31fitzgen: RSpliet: but they have a patch that disables it for llvmpipe
21:34RSpliet: fitzgen: lyude built the most recent package, he's the best person to help you out I bet (either here or in #fedora-devel)
21:37RSpliet: also, I don't think mesa has a "disable float rt's on swrast" patch, looking through the patches that are applied ( https://koji.fedoraproject.org/koji/rpminfo?rpmID=9909359 )
21:39fitzgen: RSpliet: is it not https://koji.fedoraproject.org/koji/fileinfo?rpmID=9909359&filename=0002-hardware-gloat.patch ?
21:39RSpliet: oh, scrap that, the "0002-hardware-gloat.patch" does exactly that.
21:39RSpliet: yeah, was already halfway into typing that
23:09Lyude: RSpliet: pst, it's she :P
23:10Lyude: also, anyone know if there's a way that I can make nouveau dump all of the mmio registers it's reading to/from for a certain subdev?
23:11karolherbst: Lyude: not really
23:11imirkin_: Lyude: yeah
23:11karolherbst: Lyude: well in theory yes, but there are some nasty registers having some effects on read
23:11imirkin_: Lyude: there are trace levels above 'trace' although you have to build your kernel for it
23:11karolherbst: so although this might sound like a good idea, reading out certain regs could mess up the driver/GPU
23:12imirkin_: Lyude: and then you can enable that super-high debug level for just the one subdev
23:12imirkin_: (paranoid? i forget the names...)
23:12karolherbst: imirkin_: but this doesn't include all register reads, does it?
23:12imirkin_: it includes *everything*
23:12imirkin_: reads, writes, everything
23:12Lyude: karolherbst: I don't need to actually read the registers, just need nouveau to say something in dmesg when it writes to things
23:12imirkin_: it's compiled out by default since otherwise stuff can't get inlined
23:12karolherbst: ohh, mhh I see
23:13karolherbst: I'll check
23:15karolherbst: mhh I doubt this exists, except ioread32 has a logging functionality internally
23:17karolherbst: there are two higher levels though: paranoia and spam
23:18skeggsb: yeah, they don't really do anything
23:18karolherbst: I just checked
23:18imirkin_: spam used to print stuff
23:18imirkin_: i guess that got nixed in one of the rewrites?
23:18skeggsb: pretty much, yeah
23:18skeggsb: i figured it was pointless, we have mmiotrace...
23:20imirkin_: skeggsb: btw, you probably missed this bug as it was miscategorized... https://bugs.freedesktop.org/show_bug.cgi?id=101368
23:20imirkin_: skeggsb: i've gone as far as i can with debugging it
23:21karolherbst: maybe that's the one regression we know already?
23:22karolherbst: doubtful though because it's different
23:22skeggsb: i haven't seen that specific bug, but i'm aware of an issue that could be related... i played with it for quite a while, but couldn't manage to reproduce
23:22karolherbst: skeggsb: I can
23:23karolherbst: and your patch fixes it
23:23skeggsb: i identified several things that are horribly wrong with the pmu/falcon stuff on gt21x though, which could be related
23:23skeggsb: karolherbst: which patch?
23:23karolherbst: the one you wrote on the bug
23:23skeggsb:is not on that bug
23:23skeggsb: oh, the one ilia linked to?
23:23karolherbst: I even replied on the bug and poked you on IRC
23:24skeggsb: that's not the same issue
23:24karolherbst: ahh, I see
23:24karolherbst: but how is that going by the way?
23:24skeggsb: it's been sent to -fixes, and i've seen greg queue them up for stable
23:24karolherbst: nice :)
23:25karolherbst: but it could be the same issue somehow
23:25skeggsb: more awesome if it didn't happen to begin with
23:25karolherbst: 4.10.13 was fine
23:25karolherbst: and your patch was also added for 4.10.14 or so?
23:26skeggsb: yeah, he's seeing the issue on 4.11, not any 4.10.x
23:26karolherbst: ohh wait, 4.10 is EOL
23:29karolherbst: I should sleep already...
23:31karolherbst: I could check on my tesla machine
23:31Lyude: karolherbst: before you go
23:31Lyude: how do I turn up the perf level for nouveau?
23:32RSpliet: Lyude: /sys/kernel/debug/dri/<node>/pstate
23:32RSpliet: lists a number of perflvls, the identifier it starts with can be written to that debugfs node to set the respective perflvl
23:32karolherbst: except the last line
23:32karolherbst: the last line is like the status line
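Setting a perflvl as RSpliet describes amounts to writing the identifier a listed line starts with into the pstate file. A minimal C sketch, assuming the caller supplies the full path (the debugfs file named in the chat, with <node> filled in) and has permission to write it; pstate_set() is a hypothetical helper name, not part of nouveau or any library.

```c
#include <stdio.h>

/*
 * Write a perflvl identifier (the token a listed line starts with) to a
 * pstate node. The path is supplied by the caller. Returns 0 on success
 * and -1 if the file cannot be opened or written.
 */
int pstate_set(const char *path, const char *id)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(id, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes; a failed flush also counts as a failed write. */
	return fclose(f) == 0 ? 0 : -1;
}
```

Per karolherbst's caveat, the last line of the file is a status line rather than a selectable level, so its leading token should not be treated as one of the listed perflvls.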
23:32Lyude: cool, getting close enough to having some real code for full blcg on kepler that I want to try crashing things now
23:33RSpliet: Lyude: great stuff!
23:33Lyude: still probably need to do some stuff on reclocking though, it looks like some blcg stuff gets turned off/on (one of the two) on reclocking operations, or at least something related to the pmu
23:34RSpliet: (and my apologies for getting gender wrong... don't take it personally! :-) )
23:35Lyude: oh it's np! just letting you know :)
23:35RSpliet: which cards did you observe this on? I recall something like this on... GT215-ish?
23:36Lyude: both fermi and kepler seem to do it according to the vbios traces
23:37Lyude: trying to find the exact one I actually saw it happening in
23:39Lyude: hm, I might be misremembering now that I look at it
23:58imirkin_: skeggsb: well, that GT218 guy seems willing to test stuff, so if you need more info ... :)