00:26feep: tried to load modded minecraft... hardlocked
00:26feep: to be fair, this is not very surprising
00:26imirkin_: do the mods involve threading?
00:27feep: it's plausible
00:29feep: also NvBoost appears to be doing nothing, how do I know if it ostensibly worked?
00:30imirkin_: just lets you use higher cstates iirc
00:30feep: this is on a gt 645m, nve7.. presumably they just don't have that?
00:31imirkin_: dunno, karol would probably have to look at your vbios
00:32feep: boost entry and rated tdp entry is the same
00:33feep: but it's a bit weird because pstate lists it as 708 MHz when running, and max listed is 1417
00:33feep: idk if that's even right tho.
00:33imirkin_: pastebin pstate file
00:34feep: to clarify, this is nouveau
00:34feep: to clarify, this is optimus, ie. DRI_PRIME offloading
00:35feep: https://gist.github.com/e623ef5f1fe2aed3244a8dad893de317 pstate https://gist.github.com/730a60b5a58db2df2e7306fe0bd89cd0 nvbios dump
00:35feep: ah nm, it's correct.
00:36feep: notebookcheck lists my card as not having boost.
00:41feep: trying a second time to start minecraft
00:49Lyude: RSpliet: doh; figured it out: I had a fb clkgate programming step previously + one in gr, but I completely forgot I had an fb one
00:49Lyude: so that spot you found was the correct spot that apparently I'd already found, lol
00:50Lyude: i suppose that's what happens when you stop working on something for a couple months...
00:55feep: hm, now it dies before it gets to the menu :( nvm I guess
00:57imirkin_: hm, the "turbo boost" entry is entry 0
00:57imirkin_: which should get you 780mhz?
00:59imirkin_: not sure how to read all the stuff together though
07:52karolherbst: Lyude: doing some piglit runs today with your clock gating patches
10:09karolherbst: skeggsb: mhh, you should run a full piglit run like multiple times until you get a lot of errors
10:09karolherbst: Mesa: User error: GL_OUT_OF_MEMORY in glTexStorage3D :/
10:11karolherbst: /home/karol/Dokumente/repos/piglit/bin/tex3d-maxsize is fixed by reloading nouveau
10:11karolherbst: I guess we are leaking vram now
10:27karolherbst: duh... and I also made a mistake in my madsp patch
10:39karolherbst: also pascal CTS: https://trello.com/c/wkJVv5cS/17-cts-master-460x-khronos-mustpass-gl45-master-status
10:43pmoreau: Not too bad at all!
10:44karolherbst: could have been worse
10:45karolherbst: I think I would still rather fix most of the stuff for kepler first
10:46karolherbst: because to be honest, kepler and maxwell are by far the only GPUs where we actually have a chance to get decent perf in the near future
11:06karolherbst: pmoreau: if you have some time, it would be cool if you could express everything missing/broken in your current spir-v thing as little tests in your repository
11:07karolherbst: then I can simply work on that and try to fix those issues
11:07karolherbst: or is there a list of known issues?
11:21pmoreau: karolherbst: I have tasks on my Trello about some broken things, there is the async_work_copy test that is failing, and for missing features, you can have a look at https://phabricator.pmoreau.org/w/mesa/spirv_support/ and https://phabricator.pmoreau.org/w/mesa/opencl_through_spirv_current_status/ (for a higher level view)
11:22karolherbst: ohh, you added support for clCreateProgramWithSource?
11:23pmoreau: Going to finish some cleaning in the apartment + changing my desk configuration, and then work on clCreateProgramWithBinary (need to fix a few things) and wrap up a second version of the clover series, before cleaning up the whole nv_ir_from_spirv.cpp file.
11:23karolherbst: well, if all program creation methods are supported, I would just go ahead and run the CTS and fix issues I find there
11:24pmoreau: Well, it’s hacky and uses KhronosGroup/SPIRV-LLVM repo, which is based on LLVM 3.6. So no chance of upstreaming that patch.
11:24karolherbst: I would mainly only fix issues inside codegen and don't touch your spirv_to_nvir stuff
11:24karolherbst: well right
11:24karolherbst: if we are able to run the OpenCL CTS that is fine by me
11:24karolherbst: then we can just add missing features in codegen
11:24pmoreau: Sounds good. But for the CTS, it’s going to miss some SPIR-V opcodes.
11:24karolherbst: like instructions missing or stuff like that mad24 thing
11:25karolherbst: well, if I get 50% fails, I just choose to fix the one I like :p
11:25pmoreau: I started running the CTS, see https://trello.com/c/mHuV36bJ/20-get-testbasic-from-the-opencl-cts-to-pass
11:26pmoreau: If you could try to fix the loop, plus look into the images after that (I did a bit of work, but not enough to get them working), that would be awesome and should get us very close to 100% on the basic tests.
11:26karolherbst: well robclark suggested that we might switch over to nir in the long term, but even then I would keep that direct spir-v to nvir approach alongside. I could imagine that we have more information and need less CPU time overall if we directly convert stuff
11:26karolherbst: not quite sure how we want to do the cl to nvir in the end
11:27karolherbst: pmoreau: mhh, well, I would not like to touch your spir_to_nvir file if you say you gonna clean it up still
11:27pmoreau: For the loop, I have a simple test in control_flow/loop_with_if (IIRC) which fails due to buildLiveSets failing to compute the proper live range of a reg. Could be an error in the codegen code, or me not setting the proper edges.
11:27karolherbst: as long as there are other important issues
11:27karolherbst: so codegen only issue most likely
11:28karolherbst: I see
11:28karolherbst: yeah well
11:28karolherbst: well, I would focus on adding missing instructions for now
11:28karolherbst: but I will just check how well that CTS run turns out
11:29karolherbst: and just go from there
11:29pmoreau: Have a look at the images then. Some things are missing in codegen I think, like uploading some information.
11:29pmoreau: But it was more than a year ago that we looked into it with hakzsam_ so I might misremember.
11:30karolherbst: images seems to be borked on pascal anyway
11:30karolherbst: I got a lot of crashes in KHR-GL45.shader_image_load_store.basic-allTargets-* tests
11:31karolherbst: and we also have non working 3d images on kepler
11:31karolherbst: nice, no regressions with my madsp patch :)
11:36karolherbst: imirkin: I did a piglit run with my madsp patch on nve6 (although I wrote nve4 in the mail... silly me) and fixed a little issue. Should be fine now
14:41pmoreau: tobijk: The third source does not distinguish between unsigned or signed, we only need to specify how many bits. That’s why there is no ‘U’.
14:44tobijk: pmoreau: oh right, i should read the defines :D
14:44karolherbst: tobijk: and you should understand the commit messages :p
14:45tobijk: (skipped that this time) :/
14:45pmoreau: No worries :-)
14:55imirkin: karolherbst: seems generally fine, but i'll poke at the encodings a bit, hope you don't mind
14:55karolherbst: please do
14:55imirkin: since it's not a perfect match for what the old code was doing, but the old code wasn't 100% used, so who knows
14:55karolherbst: I didn't test gk110
14:55imirkin: i have a GK208 here
14:55karolherbst: the old code tried to be smart about something it shouldn't be in the first place
14:56imirkin: i just mean that only a handful of actual values were ever passed to the macro
14:56imirkin: but the emitter handled all kinds of junk
14:56karolherbst: I know
14:56imirkin: so ... i just want to play around with the opcode ;)
14:56imirkin: i suspect you covered it, but since it's not actively fixing anything, doesn't seem as urgent
14:56karolherbst: just keep in mind, that the old code produced equivalent instructions for (0, 0, 3) and (1, 1, 0)
14:57feep: weird question. is there a way to run a gpu so that it produces maximal heat, but doesn't get damaged? this room is really cold even with the heating on max
14:57karolherbst: feep: clock to highest and run furmark
14:57feep: XD this is a laptop tho, so I'm worried it'll hurt itself
14:58feep: I guess that's rather intrinsic in "as hot as possible"
14:58karolherbst: well furmark increases the power consumption to 70W here on my GPU
14:58karolherbst: and it is a laptop one
14:58karolherbst: 80W is max
15:00karolherbst: imirkin: well my main intention was to clean up that code and make it actually usable and understandable. I doubt we will need it that much, but I would prefer the suggested version over the old one
15:01feep: oh wow yeah, there goes the temperature
15:01feep: interestingly furmark runs at 11fps on intel, 15fps on nouveau
15:01karolherbst: slow GPU you have there
15:01feep: gt 645m
15:01karolherbst: did you clock to the highest clocks?
15:02feep: and yeah it's at 0xf
15:02feep: core 708 mem 1800
15:02karolherbst: feep: the pstate file should specify higher clocks, right?
15:02feep: though that's with a lightly patched kernel, otherwise it won't run at all
15:02feep: karolherbst: AC: core 708 MHz memory 1800 MHz
15:02feep: that's the max listed, both in pstate and by the vendor
15:02karolherbst: then your GPU has no boost entries I guess
15:03karolherbst: except you boot with nouveau.config=NvBoost=2 already
15:03feep: the vendor also lists no boost
15:03karolherbst: and if the 0f line also says 708, then it should be the max
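The check described here (compare the currently running "AC" line against the highest listed performance level) can be sketched as follows. This is only an illustration: the "AC" line is quoted from the log, the other sample lines and the exact line layout are assumptions, and `parse_pstate` / `running_at_max` are hypothetical helper names, not part of nouveau.

```python
import re

# Sample text for illustration. Only the "AC" line is quoted from the log;
# the "07" and "0f" lines are assumed values in the same layout.
SAMPLE = """\
07: core 405 MHz memory 810 MHz
0f: core 708 MHz memory 1800 MHz
AC: core 708 MHz memory 1800 MHz
"""

# name, core clock (optionally a "min-max" range), optional memory clock
LINE_RE = re.compile(
    r"^(\w+):\s+core\s+(\d+)(?:-(\d+))?\s+MHz(?:\s+memory\s+(\d+)\s+MHz)?"
)


def parse_pstate(text):
    """Map each entry name to its highest core clock and memory clock (MHz)."""
    entries = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore lines that do not fit the assumed layout
        name, lo, hi, mem = m.groups()
        entries[name] = {
            "core_max": int(hi or lo),
            "memory": int(mem) if mem else None,
        }
    return entries


def running_at_max(entries):
    """True if the 'AC' (current) line reaches the highest listed level."""
    levels = {k: v for k, v in entries.items() if k != "AC"}
    top = max(v["core_max"] for v in levels.values())
    current = entries.get("AC")
    return current is not None and current["core_max"] >= top
```

With the sample above, the current line matches the 0f level, which is the situation feep reports.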
15:04feep: this is on 4.14.1, mesa 17.3, using PRIME offloading
15:04feep: and it does push temp up to 80°C, which is okay
15:04feep: sensors lists "high" as 95, so I feel like this gpu is still being underutilized
15:05feep: though 80 is probably pretty sustainable for heating
15:11feep: 443 points nouveau, 292 points intel
15:12karolherbst: feep: furmark is a heavily memory-constrained benchmark, and because your GPU only has DDR3 there isn't much of a difference
15:12feep: ah, that makes sense.
15:13karolherbst: feep: I am sure if you clock to 0a, you get a big perf impact
15:13feep: I mean sure
15:13feep: btw, any news on the ~mysterious~ 0x0f entry
15:15feep: (to recap: my timing mapping table has a 0x0f entry in the second row (ramcfg=1), first column; however, the timing table only has twelve entries, and only six are not zeroes)
15:16feep: if I just skip the weird timing mapping table entry, everything works fine
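What "skipping the weird entry" amounts to can be sketched as a bounds check on the index taken from the mapping table. This is a minimal sketch, not the nouveau code path: `resolve_timing` is a hypothetical helper and the table contents are made up, with only the sizes (twelve entries, six non-zero, index 0x0f) taken from the recap above.

```python
def resolve_timing(mapping_index, timing_table):
    """Return the timing entry for a mapping index, or None when the
    index points past the end of the timing table (entry is skipped)."""
    if mapping_index >= len(timing_table):
        return None
    return timing_table[mapping_index]


# Hypothetical table: twelve entries, six of them zero, as in the recap.
TIMING_TABLE = [11, 12, 13, 14, 15, 16, 0, 0, 0, 0, 0, 0]

skipped = resolve_timing(0x0F, TIMING_TABLE)  # index 15 is out of range
valid = resolve_timing(0x02, TIMING_TABLE)    # index 2 is in range
```

As the discussion that follows points out, whether skipping is safe on other cards is unknown, so this only illustrates the local workaround.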
15:17karolherbst: feep: well right, we need to take a deeper look at those tables anyway
15:17karolherbst: it just won't happen any time soon
15:17feep: ah, fair enough
15:17feep: think I should write up a placeholder patch?
15:17karolherbst: maybe we will have some proper things in two or three months
15:17feep: just so reclocking basically works for now
15:17karolherbst: feep: won't really help, because you don't know what other cards you might break with it
15:17feep: that's true. :/
15:17karolherbst: right and that's the painful part
15:17feep: config flag?
15:18karolherbst: might be an idea
15:18karolherbst: ask skeggsb first though
15:18feep: tbf, I don't think this is a problem for many people.
15:18feep: might even be just this laptop :P
15:19karolherbst: yeah, I doubt this as well
15:19feep: the person who filed the bug has the exact same model https://bugs.freedesktop.org/show_bug.cgi?id=91523
15:19karolherbst: this still makes it even more painful to change anything there
15:20feep: eventually they'll all be broken and you won't have to worry anymore :P
15:20feep: hm, turn flag on by default for this specific card?
15:20karolherbst: that's the problem
15:20feep: is there a way to do things like that, card specific fixups?
15:20karolherbst: it is kind of hard to actually detect a specific card
15:20karolherbst: there are some subvendors/submodel ids
15:20karolherbst: but I have no idea how much we can trust those
15:21karolherbst: most of the time it is good enough
15:21karolherbst: but who knows
15:21feep: "lenovo gk107m" seeeems pretty unambiguous
15:21karolherbst: and then it is
15:21karolherbst: I am sure it isn't the only gk107m lenovo has
16:44pmoreau: If we have `mad u64 %r3 %r0 %r1 %r2` where %r0 is a power of 2 immediate, is it legal to swap %r0 and %r1 to get `shladd u64 %r3 %r1 log2(%r0) %r2`?
16:46RSpliet: pmoreau: don't see why not
16:47pmoreau: Oh yeah, it should only be with floating points that swapping inputs to a multiplication might not be legal/recommended.
16:50pmoreau: Eh, the log2 function no longer seems to work for u64: log2(0x4) -> 0x22 :o
16:56imirkin: pmoreau: except that shladd u64 isn't a thing
16:56imirkin: otherwise yes, it's legal
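The legality argument can be checked numerically: unsigned integer multiplication modulo 2^64 commutes, and multiplying by a power of two equals a left shift by its log2. The sketch below models the two operations in Python. `mad_u64`, `shladd_u64` and `log2_u64` are hypothetical stand-ins for the IR ops and the helper being discussed, not the codegen implementation.

```python
MASK64 = (1 << 64) - 1


def is_pow2(x):
    """True for 1, 2, 4, ... (exactly one bit set)."""
    return x > 0 and (x & (x - 1)) == 0


def log2_u64(x):
    """Exact log2 of a power of two; Python ints have no width limit,
    so this is correct for the full 64-bit range."""
    assert is_pow2(x)
    return x.bit_length() - 1


def mad_u64(a, b, c):
    """Model of `mad u64 d a b c`: d = a * b + c (mod 2^64)."""
    return (a * b + c) & MASK64


def shladd_u64(a, shift, c):
    """Model of `shladd u64 d a shift c`: d = (a << shift) + c (mod 2^64)."""
    return ((a << shift) + c) & MASK64


# %r0 is the power-of-two immediate; swapping it with %r1 is fine because
# integer multiplication commutes.
r0, r1, r2 = 0x4, 0x123456789ABCDEF0, 0xFFFFFFFFFFFFFFFF
same = mad_u64(r0, r1, r2) == shladd_u64(r1, log2_u64(r0), r2)
```

For reference, a correct log2 of 0x4 is 2, so the 0x22 reported above is not the expected value.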
16:56imirkin: you'd have to add a splitter for the shladd u64
16:56pmoreau: Should be doable
16:56imirkin: note that SM35 adds nice ops for this
16:56imirkin: but SM20/SM30 ... suck
16:57imirkin: i had to account for this in the OP_SHL/SHR impls for int64
16:57imirkin: (i totally forget the details, just the pain)
16:59pmoreau: Thanks! I’ll have a look.
17:00imirkin: i left some comments.
17:00imirkin: not sure how to decompose that into a SHLADD... can it output a carry bit?
17:02imirkin: btw, that algo relies on OR but it could just as well rely on ADD. and SM50+ has an IADD3 :)
17:02imirkin: although ... one of the args has to be an imm i think? hm.
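The OR-versus-ADD remark can be illustrated with a generic split of a 64-bit left shift by a constant into 32-bit halves. This is a textbook decomposition written as a sketch, assumed for illustration and not the actual nouveau lowering; `shl64_split` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def shl64_split(lo, hi, s, combine="or"):
    """Shift the 64-bit value (hi:lo) left by a constant s using only
    32-bit pieces. Returns (new_lo, new_hi)."""
    assert 0 <= s < 64
    if s == 0:
        return lo, hi
    if s >= 32:
        # Everything that survives comes from the low word.
        return 0, (lo << (s - 32)) & MASK32
    carried = lo >> (32 - s)         # top s bits of lo, moving into hi
    shifted_hi = (hi << s) & MASK32  # low s bits are now zero
    if combine == "or":
        new_hi = shifted_hi | carried
    else:
        # The two operands have no overlapping bits, so ADD gives the
        # same result as OR and cannot overflow.
        new_hi = (shifted_hi + carried) & MASK32
    return (lo << s) & MASK32, new_hi


def join(lo, hi):
    """Reassemble a 64-bit value from its halves."""
    return (hi << 32) | lo
```

Because the shifted high word and the carried bits never share a set bit, either combining operation works, which is what would let a three-input add fold the step into one instruction where the hardware offers it.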
17:02pmoreau: No clue. I think I’ll leave it as-is for now, and keep a note to fix it at some point.
17:03imirkin: might be easiest ;)
17:46pmoreau: imirkin: Was the SHL u64 implemented recently?
17:47pmoreau: (or merged to master recently)
17:47imirkin: like ... 6 months ago
17:47imirkin: depends on when recent is ;)
17:47imirkin: whenever we added int64 support
17:47imirkin: (i added? i forget tbh)
17:47pmoreau: It would have been less than that, so that’s not the issue then.
17:48imirkin: commit 1e4f5988edd2fb9eafcf5010498b0e93bae1ae26 and its parents
17:48imirkin: commit 61d7676df779829e713cdbc2569f7ab50492078d added the shl stuff
17:49pmoreau: I rebased for the last time mid November, and when I rebased today, I am hitting some new issues, which go away if I disable ConstantFolding.
17:50imirkin: mmmm.... i might have pushed something recently
17:50imirkin: and a few of its parents
17:50pmoreau: Let’s have a look
17:51imirkin: shouldn't affect constfolding directly, but it's all stuff that's downstream of that
17:53imirkin: oh, i did change mod around
17:53imirkin: i wonder if i broke mod u64
17:55pmoreau: I’m not using mod, so that should be OK.
17:56pmoreau: But somehow, I end up with an `add u32 %r97 neg 0x00000002 0x00000020` which triggers an assert when emitting the code, as src0 should not be an immediate.
17:56imirkin: yeah, i definitely broke mod
17:57imirkin: (for 64-bit)
17:57karolherbst: pmoreau: interesting
17:58karolherbst: pmoreau: that should be opted to a mov anyway
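Folding that instruction to a mov means evaluating it at compile time, since both sources are immediates. A minimal sketch of the arithmetic, assuming the `neg` modifier applies to the first source (0x00000002) and that the result wraps at 32 bits; `fold_add_u32` is a hypothetical helper, not the ConstantFolding pass itself.

```python
MASK32 = 0xFFFFFFFF


def fold_add_u32(src0, src1, neg0=False, neg1=False):
    """Evaluate `add u32` on two immediates, honouring neg modifiers,
    and return the 32-bit immediate a `mov` would load instead."""
    a = -src0 if neg0 else src0
    b = -src1 if neg1 else src1
    return (a + b) & MASK32


# add u32 %r97 neg 0x00000002 0x00000020  ->  mov u32 %r97 <folded>
folded = fold_add_u32(0x00000002, 0x00000020, neg0=True)
print(f"mov u32 %r97 0x{folded:08x}")
```

Under these assumptions the folded immediate is 0x20 - 0x2.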
17:58pmoreau: Here is the program before and after constant folding: https://hastebin.com/quhokedigo.pl (I disable the other optimisations)
18:00karolherbst: pmoreau: this makes no sense...
18:01hakzsam_: karolherbst: pmoreau, images should work on fermi, kepler and maxwell at least, 3D images are unsupported though
18:01imirkin: oh good. we can't get a 64-bit mod.
18:01imirkin: (something lowers it away in glsl ir or something)
18:02imirkin: 3d images are unsupported on kepler and earlier
18:02imirkin: they work on maxwell
18:02pmoreau: hakzsam_: When we looked at it (at XDC 2016), there was something missing to get it working with SPIR-V, like some format not being uploaded by Nouveau; does Nouveau do bindless nowadays?
18:02karolherbst: well, pascal says no
18:02hakzsam_: no bindless
18:03imirkin: pmoreau: i think that's after that expandMAD thing
18:03hakzsam_: pmoreau: wait, you mean ARB_bindless_texture?
18:03imirkin: pmoreau: er, expand 64->32 bit ops thing
18:03imirkin: pmoreau: i have patches for bindless, but they're incomplete
18:03pmoreau: I think SPIR-V “expects” bindless, which is what was missing.
18:04imirkin: spir-v expects separate texture/sampler bindings
18:04imirkin: which nouveau should be able to support
18:04pmoreau: I should have another look at it, some days
18:05imirkin: either way, most of the problems i ran into were on the driver level
18:05imirkin: the compiler patches were pretty trivial
18:05imirkin: have a look in my 'cts' branch
18:12pmoreau: imirkin: You are right, it’s not in the ConstantFolding pass.
18:31rhyskidd: imirkin: thanks for that comment. I'll wait for any other review feedback and then respin the series
19:46rhyskidd: so this was a pleasant surprise with v4.15-rc1: GP107M isn't hitting that MMIO timeout fault at 409800
19:47rhyskidd: karolherbst: do you want me to try on top of v4.15-rc1 this patch? https://gist.github.com/karolherbst/d82372046148582d8204f26d62af670b
19:47rhyskidd: ^^ i am disabling runtime pm, so imagine won't link a sleep cycle too much ...
19:50karolherbst: we kind of know what is going on and skeggsb thought the current code actually does this in some way already
19:51rhyskidd: haven't really stressed it, but glxgears did at least appear to render via DRI_PRIME
19:52karolherbst: well, right
19:53karolherbst: not getting that "MMIO timeout fault at 409800" and prime offloaded OpenGL kind of goes hand in hand here
20:28rhyskidd: perhaps -- but it's still an improvement over 4.14 where i was getting the "MMIO timeout fault at 409800" with prime offloaded OpenGL ... https://paste.debian.net/998798/
20:53Lyude: karolherbst: nice! let me know how it goes
20:53karolherbst: Lyude: pretty good so far
20:53Lyude: I'm still working on getting kepler2 to work with BLCG, although it looks like it's just a simple mistake of me enabling CG_CTRL too early
22:23rhyskidd: has anyone here got intel-gpu-tools working on nouveau?
22:24rhyskidd: i saw tagr had some older wip patches
22:25pmoreau: I doubt I tried. Also, I’m not even sure I tried it on Intel.
22:25karolherbst: rhyskidd: you mean the tools or the test?
22:25imirkin_: there's been talk, but i'm not aware of anyone actually running it
22:26karolherbst: intel runs them afaik
22:38rhyskidd: the tests, given they exercise the drm interfaces --- i can see some work's been going on to definitely make them useful beyond i965 hw
22:38rhyskidd: yeh, they run them on their CI
22:39rhyskidd: and anholt had a fork running on vc4 which was apparently helpful finding weird corner cases
22:40karolherbst: well I don't think there is so much intel specific code in there actually
22:40karolherbst: I am sure they can be run on nouveau in less than a day
22:41karolherbst: they actually want to rename it
22:41rhyskidd: just to igt?
22:41karolherbst: well, intel shouldn't be inside the name anymore
22:41karolherbst: maybe it will be igt, where i doesn't stand for intel
22:54imirkin_: it's a lot of non-intel-specific tests, in addition to some intel-specific ones
22:54imirkin_: it does seem like the sort of thing that should be mostly plug & play for nouveau
22:55rhyskidd: that's my thinking too
22:55imirkin_: of course a lot of the automation apparently relies on being able to read out crc's for the scanned out fb
22:56imirkin_: i suspect nvidia hw from the semi-modern era has this
22:56imirkin_: but it'll take some snooping to find it
22:57imirkin_: somewhere in the SOR/PIOR registers
22:57imirkin_: or crtc