00:26 feep: welp
00:26 feep: tried to load modded minecraft... hardlocked
00:26 feep: to be fair, this is not very surprising
00:26 imirkin_: do the mods involve threading?
00:27 feep: it's plausible
00:29 feep: also NvBoost appears to be doing nothing, how do I know if it ostensibly worked?
00:30 imirkin_: just lets you use higher cstates iirc
00:30 feep: this is on a gt 645m, nve7.. presumably they just don't have that?
00:31 imirkin_: dunno, karol would probably have to look at your vbios
00:32 feep: boost entry and rated tdp entry is the same
00:33 feep: but it's a bit weird because pstate lists it as 708 MHz when running, and max listed is 1417
00:33 feep: idk if that's even right tho.
00:33 imirkin_: pastebin pstate file
00:34 feep: to clarify, this is nouveau
00:34 feep: er
00:34 feep: *facepalms*
00:34 feep: to clarify, this is optimus, ie. DRI_PRIME offloading
00:35 feep: https://gist.github.com/e623ef5f1fe2aed3244a8dad893de317 pstate https://gist.github.com/730a60b5a58db2df2e7306fe0bd89cd0 nvbios dump
00:35 feep: ah nm, it's correct.
00:36 feep: notebookcheck lists my card as not having boost.
00:36 feep: :/
00:41 feep: trying a second time to start minecraft
00:49 Lyude: RSpliet: doh; figured it out: I had a fb clkgate programming step previously + one in gr, but I completely forgot I had an fb one
00:49 Lyude: so that spot you found was the correct spot that apparently I'd already found, lol
00:50 Lyude: i suppose that's what happens when you stop working on something for a couple months...
00:55 feep: hm, now it dies before it gets to the menu :( nvm I guess
00:57 imirkin_: hm, the "turbo boost" entry is entry 0
00:57 imirkin_: which should get you 780mhz?
00:59 imirkin_: not sure how to read all the stuff together though
07:52 karolherbst: Lyude: doing some piglit runs today with your clock gating patches
10:09 karolherbst: skeggsb: mhh, you should run a full piglit run like multiple times until you get a lot of fails
10:09 karolherbst: Mesa: User error: GL_OUT_OF_MEMORY in glTexStorage3D :/
10:11 karolherbst: /home/karol/Dokumente/repos/piglit/bin/tex3d-maxsize is fixed by reloading nouveau
10:11 karolherbst: I guess we are leaking vram now
10:27 karolherbst: duh... and I also made a mistake in my madsp patch
10:39 karolherbst: also pascal CTS: https://trello.com/c/wkJVv5cS/17-cts-master-460x-khronos-mustpass-gl45-master-status
10:43 pmoreau: Not too bad at all!
10:44 karolherbst: yeah
10:44 karolherbst: could have been worse
10:45 karolherbst: I think I would still rather fix most of the stuff for kepler first
10:46 karolherbst: because to be honest, kepler and maxwell are by far the only GPUs where we actually have a chance to get decent perf in the near future
11:06 karolherbst: pmoreau: if you have some time, it would be cool if you could express everything missing/broken in your current spir-v thing as little tests in your repository
11:07 karolherbst: then I can simply work on that and try to fix those issues
11:07 karolherbst: or is there a list of known issues?
11:21 pmoreau: karolherbst: I have tasks on my Trello about some broken things, there is the async_work_copy test that is failing, and for missing features, you can have a look at https://phabricator.pmoreau.org/w/mesa/spirv_support/ and https://phabricator.pmoreau.org/w/mesa/opencl_through_spirv_current_status/ (for a higher level view)
11:22 karolherbst: ohh, you added support for clCreateProgramWithSource?
11:23 pmoreau: Going to finish some cleaning in the apartment + changing my desk configuration, and then work on clCreateProgramWithBinary (need to fix a few things) and wrap up a second version of the clover series, before cleaning up the whole nv_ir_from_spirv.cpp file.
11:23 karolherbst: well, if all program creation methods are supported, I would just go ahead and run the CTS and fix issues I find there
11:23 karolherbst: well
11:24 pmoreau: Well, it’s hacky and uses KhronosGroup/SPIRV-LLVM repo, which is based on LLVM 3.6. So no chance of upstreaming that patch.
11:24 karolherbst: I would mainly only fix issues inside codegen and don't touch your spirv_to_nvir stuff
11:24 karolherbst: well right
11:24 karolherbst: but
11:24 karolherbst: if we are able to run the OpenCL CTS that is fine by me
11:24 karolherbst: then we can just add missing features in codegen
11:24 pmoreau: Sounds good. But for the CTS, it’s going to miss some SPIR-V opcodes.
11:24 karolherbst: like instructions missing or stuff like that mad24 thing
11:25 karolherbst: well, if I get 50% fails, I just choose to fix the one I like :p
11:25 pmoreau: I started running the CTS, see https://trello.com/c/mHuV36bJ/20-get-testbasic-from-the-opencl-cts-to-pass
11:25 karolherbst: cool
11:26 pmoreau: If you could try to fix the loop, plus look into the images after that (I did a bit of work, but not enough to get them working), that would be awesome and should get us very close to 100% on the basic tests.
11:26 karolherbst: well robclark suggested that we might switch over to nir in the long term, but even then I would keep that direct spir-v to nvir approach alongside. I could imagine that we have more information and need less CPU time overall if we directly convert stuff
11:26 karolherbst: not quite sure how we want to do the cl to nvir in the end
11:27 karolherbst: pmoreau: mhh, well, I would not like to touch your spir_to_nvir file if you say you gonna clean it up still
11:27 pmoreau: For the loop, I have a simple test in control_flow/loop_with_if (IIRC) which fails due to buildLiveSets failing to compute the proper live range of a reg. Could be an error in the codegen code, or me not setting the proper edges.
11:27 karolherbst: well
11:27 karolherbst: as long as there are other important issues
11:27 karolherbst: ahh
11:27 karolherbst: so codegen only issue most likely
11:28 karolherbst: I see
11:28 pmoreau: Possibly
11:28 karolherbst: yeah well
11:28 karolherbst: well, I would focus on adding missing instructions for now
11:28 karolherbst: personally
11:28 karolherbst: but I will just check how well that CTS run turns out
11:29 karolherbst: and just go from there
11:29 pmoreau: Have a look at the images then. Some things are missing in codegen I think, like uploading some information.
11:29 karolherbst: right
11:29 pmoreau: But it was more than a year ago that we looked into it with hakzsam_ so I might misremember.
11:29 karolherbst: i
11:30 karolherbst: images seems to be borked on pascal anyway
11:30 karolherbst: I got a lot of crashes in KHR-GL45.shader_image_load_store.basic-allTargets-* tests
11:31 karolherbst: and we also have non working 3d images on kepler
11:31 pmoreau: OK
11:31 pmoreau: bbiab
11:31 karolherbst: nice, no regressions with my madsp patch :)
11:36 karolherbst: imirkin: I did a piglit run with my madsp patch on nve6 (although I wrote nve4 in the mail... silly me) and fixed a little issue. Should be fine now
14:41 pmoreau: tobijk: The third source does not distinguish between unsigned or signed, we only need to specify how many bits. That’s why there is no ‘U’.
14:44 tobijk: pmoreau: oh right, i should read the defines :D
14:44 karolherbst: tobijk: and you should understand the commit messages :p
14:45 tobijk: (skipped that this time) :/
14:45 pmoreau: No worries :-)
14:55 imirkin: karolherbst: seems generally fine, but i'll poke at the encodings a bit, hope you don't mind
14:55 karolherbst: please do
14:55 imirkin: since it's not a perfect match for what the old code was doing, but the old code wasn't 100% used, so who knows
14:55 karolherbst: I didn't test gk110
14:55 karolherbst: well
14:55 imirkin: i have a GK208 here
14:55 karolherbst: the old code tried to be smart about something it shouldn't be in the first place
14:56 imirkin: i just mean that only a handful of actual values were ever passed to the macro
14:56 imirkin: but the emitter handled all kinds of junk
14:56 karolherbst: I know
14:56 imirkin: so ... i just want to play around with the opcode ;)
14:56 karolherbst: yeah
14:56 imirkin: i suspect you covered it, but since it's not actively fixing anything, doesn't seem as urgent
14:56 karolherbst: just keep in mind, that the old code produced equivalent instructions for (0, 0, 3) and (1, 1, 0)
14:57 feep: weird question. is there a way to run a gpu so that it produces maximal heat, but doesn't get damaged? this room is really cold even with the heating on max
14:57 karolherbst: feep: clock to highest and run furmark
14:57 feep: XD this is a laptop tho, so I'm worried it'll hurt itself
14:58 feep: I guess that's rather intrinsic in "as hot as possible"
14:58 karolherbst: well furmark increases the power consumption to 70W here on my GPU
14:58 karolherbst: and it is a laptop one
14:58 karolherbst: 80W is max
15:00 karolherbst: imirkin: well my main intention was to clean up that code and make it actually usable and understandable. I doubt we will need it that much, but I would prefer the suggested version over the old one
15:01 feep: oh wow yeah, there goes the temperature
15:01 feep: interestingly furmark runs at 11fps on intel, 15fps on nouveau
15:01 karolherbst: slow GPU you have there
15:01 feep: gt 645m
15:01 karolherbst: did you clock to the highest clocks?
15:01 karolherbst: mhh
15:02 feep: and yeah it's at 0xf
15:02 feep: core 708 mem 1800
15:02 karolherbst: feep: the pstate file should specify higher clocks, right?
15:02 feep: though that's with a lightly patched kernel, otherwise it won't run at all
15:02 feep: karolherbst: AC: core 708 MHz memory 1800 MHz
15:02 feep: that's the max listed, both in pstate and by the vendor
15:02 karolherbst: ahh
15:02 karolherbst: then your GPU has no boost entries I guess
15:03 karolherbst: except you boot with nouveau.config=NvBoost=2 already
15:03 feep: yep
15:03 feep: the vendor also lists no boost
15:03 karolherbst: okay
15:03 karolherbst: and if the 0f line also says 708, then it should be the max
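For anyone wanting to do the same comparison in a script, here is a minimal sketch that parses one line of the pstate file. The "AC:" line is the one feep quoted above; the "low-high MHz" range form is an assumption about how other lines may look, and parse_pstate_line is a made-up helper name, not part of any nouveau tooling.

```python
import re

def parse_pstate_line(line):
    """Split a pstate line into its name and a {domain: (low, high)} dict (MHz)."""
    name, _, rest = line.partition(":")
    clocks = {}
    # Accepts "core 708 MHz" and, as an assumption, ranges like "core 405-708 MHz".
    for domain, low, high in re.findall(r"(\w+) (\d+)(?:-(\d+))? MHz", rest):
        clocks[domain] = (int(low), int(high or low))
    return name.strip(), clocks

# Line quoted in the log above.
name, clocks = parse_pstate_line("AC: core 708 MHz memory 1800 MHz")
```

If the highest clock in the 0f line matches the AC line, the card is already running at the top state it lists.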
15:03 feep: yeah
15:04 feep: this is on 4.14.1, mesa 17.3, using PRIME offloading
15:04 feep: and it does push temp up to 80°C, which is okay
15:04 feep: sensors lists "high" as 95, so I feel like this gpu is still being underutilized
15:05 feep: though 80 is probably pretty sustainable for heating
15:11 feep: 443 points nouveau, 292 points intel
15:12 karolherbst: feep: furmark is a heavily memory-constrained benchmark and because your GPU has only DDR3 there isn't much of a difference
15:12 feep: ah, that makes sense.
15:13 karolherbst: feep: I am sure if you clock to 0a, you get a big perf impact
15:13 feep: XD
15:13 feep: I mean sure
15:13 karolherbst: well
15:13 feep: btw, any news on the ~mysterious~ 0x0f entry
15:15 feep: (to recap: my timing mapping table has a 0x0f entry in the second row (ramcfg=1), first column; however, the timing table only has twelve entries, and only six are not zeroes
15:15 feep: )
15:16 feep: if I just skip the weird timing mapping table entry, everything works fine
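The workaround feep describes (skipping a timing mapping entry whose index points past the end of the timing table) could look roughly like the sketch below. It is Python rather than kernel C, and the table layout, names, and values are invented for illustration; none of it is nouveau's actual vbios parsing code.

```python
# Hypothetical sketch: drop timing-mapping entries that point past the end
# of the timing table. The data layout here is invented for illustration.

def valid_mapping_entries(mapping_rows, timing_table):
    """Return (ramcfg, column, timing_index) for every usable entry."""
    usable = []
    for ramcfg, row in enumerate(mapping_rows):
        for column, timing_index in enumerate(row):
            if timing_index >= len(timing_table):
                # e.g. index 0x0f into a twelve-entry table: skip it
                continue
            usable.append((ramcfg, column, timing_index))
    return usable

# Twelve timing entries, as in the case described above (contents are dummies).
timing_table = [b"\x00"] * 12
mapping_rows = [
    [0x00, 0x01],  # ramcfg=0
    [0x0f, 0x02],  # ramcfg=1: first column holds the out-of-range 0x0f
]
entries = valid_mapping_entries(mapping_rows, timing_table)
```

Whether silently skipping such entries is safe on other boards is exactly the open question raised in the discussion that follows.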
15:17 karolherbst: feep: well right, we need to take a deeper look at those tables anyway
15:17 karolherbst: it just won't happen any time soon
15:17 feep: ah, fair enough
15:17 feep: think I should write up a placeholder patch?
15:17 karolherbst: maybe we will have some proper things in two or three months
15:17 feep: just so reclocking basically works for now
15:17 karolherbst: feep: won't really help, because you don't know what other cards you might break with it
15:17 feep: that's true. :/
15:17 karolherbst: right and that's the painful part
15:17 feep: config flag?
15:18 feep: NvSkipBadTimingMappingEntries
15:18 karolherbst: might be an idea
15:18 karolherbst: ask skeggsb first though
15:18 feep: tbf, I don't think this is a problem for many people.
15:18 feep: might even be just this laptop :P
15:19 karolherbst: yeah, I doubt this as well
15:19 feep: the person who filed the bug has the exact same model https://bugs.freedesktop.org/show_bug.cgi?id=91523
15:19 karolherbst: yeah
15:19 karolherbst: well
15:19 karolherbst: this still makes it even more painful to change anything there
15:20 feep: eventually they'll all be broken and you won't have to worry anymore :P
15:20 feep: hm, turn flag on by default for this specific card?
15:20 karolherbst: that's the problem
15:20 feep: is there a way to do things like that, card specific fixups?
15:20 karolherbst: it is kind of hard to actually detect a specific card
15:20 karolherbst: there are some subvendors/submodel ids
15:20 karolherbst: but I have no idea how much we can trust those
15:21 karolherbst: most of the time it is good enough
15:21 karolherbst: but who knows
15:21 feep: "lenovo gk107m" seeeems pretty unambiguous
15:21 karolherbst: and then it is
15:21 karolherbst: I am sure it isn't the only gk107m lenovo has
15:22 feep: bbiab
16:44 pmoreau: If we have `mad u64 %r3 %r0 %r1 %r2` where %r0 is a power of 2 immediate, is it legal to swap %r0 and %r1 to get `shladd u64 %r3 %r1 log2(%r0) %r2`?
16:46 RSpliet: pmoreau: don't see why not
16:47 pmoreau: Oh yeah, it should only be with floating points that swapping inputs to a multiplication might not be legal/recommended.
16:50 pmoreau: Eh, the log2 function no longer seems to work for u64: log2(0x4) -> 0x22 :o
16:56 imirkin: pmoreau: except that shladd u64 isn't a thing
16:56 imirkin: otherwise yes, it's legal
16:56 imirkin: you'd have to add a splitter for the shladd u64
16:56 pmoreau: Should be doable
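The rewrite being discussed relies on integer multiplication being commutative, so a power-of-two immediate in either factor can become a shift amount. A small sketch of that equivalence in Python, modelling u64 wraparound with a mask; the helper names are illustrative and are not codegen functions.

```python
MASK64 = (1 << 64) - 1

def mad_u64(a, b, c):
    """a * b + c with u64 wraparound."""
    return (a * b + c) & MASK64

def shladd_u64(value, shift, addend):
    """(value << shift) + addend with u64 wraparound."""
    return ((value << shift) + addend) & MASK64

def is_pow2(x):
    return x != 0 and (x & (x - 1)) == 0

def log2_pow2(x):
    """Exact log2 of a power of two, valid for any width."""
    assert is_pow2(x)
    return x.bit_length() - 1

def mad_as_shladd(imm, b, c):
    """Swap the factors and use a shift when imm is a power of two."""
    if not is_pow2(imm):
        return mad_u64(imm, b, c)
    return shladd_u64(b, log2_pow2(imm), c)
```

Here log2_pow2(0x4) is 2, which is the value the shift would need; the sketch only shows the arithmetic identity and says nothing about whether the hardware has a 64-bit shladd.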
16:56 imirkin: note that SM35 adds nice ops for this
16:56 imirkin: but SM20/SM30 ... suck
16:57 imirkin: i had to account for this in the OP_SHL/SHR impls for int64
16:57 imirkin: (i totally forget the details, just the pain)
16:58 imirkin: NVC0LegalizeSSA::handleShift
16:59 pmoreau: Thanks! I’ll have a look.
17:00 imirkin: i left some comments.
17:00 imirkin: not sure how to decompose that into a SHLADD... can it output a carry bit?
17:02 imirkin: btw, that algo relies on OR but it could just as well rely on ADD. and SM50+ has an IADD3 :)
17:02 imirkin: although ... one of the args has to be an imm i think? hm.
17:02 pmoreau: No clue. I think I’ll leave it as-is for now, and keep a note to fix it at some point.
17:03 imirkin: might be easiest ;)
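As a rough illustration of the point about OR versus ADD: when a 64-bit left shift is built from 32-bit halves, the bits carried out of the low word and the shifted high word never overlap, so either operation can combine them. The following is a generic sketch of that decomposition, not a transcription of NVC0LegalizeSSA::handleShift.

```python
import operator

MASK32 = 0xffffffff

def shl64_split(lo, hi, shift, combine=operator.or_):
    """Shift the 64-bit value hi:lo left by 0..63 using only 32-bit pieces."""
    assert 0 <= shift < 64
    if shift == 0:
        return lo, hi
    if shift >= 32:
        # Everything in the old high word is shifted out.
        return 0, (lo << (shift - 32)) & MASK32
    carry = lo >> (32 - shift)            # bits moving from lo into hi
    shifted_hi = (hi << shift) & MASK32   # low `shift` bits are zero
    # carry < 2**shift, so it never overlaps shifted_hi: OR and ADD agree.
    return (lo << shift) & MASK32, combine(shifted_hi, carry)
```

Passing combine=operator.add gives the same results as the default OR for every shift amount.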
17:46 pmoreau: imirkin: Was the SHL u64 implemented recently?
17:47 pmoreau: (or merged to master recently)
17:47 imirkin: like ... 6 months ago
17:47 imirkin: depends on when recent is ;)
17:47 imirkin: whenever we added int64 support
17:47 imirkin: (i added? i forget tbh)
17:47 pmoreau: It would have been less than that, so that’s not the issue then.
17:48 imirkin: commit 1e4f5988edd2fb9eafcf5010498b0e93bae1ae26 and its parents
17:48 imirkin: commit 61d7676df779829e713cdbc2569f7ab50492078d added the shl stuff
17:49 pmoreau: I rebased for the last time mid November, and when I rebased today, I am hitting some new issues, which go away if I disable ConstantFolding.
17:50 imirkin: mmmm.... i might have pushed something recently
17:50 imirkin: 0bd83d04612520ff97e21d41bcc3ad2e68e160df
17:50 imirkin: and a few of its parents
17:50 pmoreau: Let’s have a look
17:51 imirkin: shouldn't affect constfolding directly, but it's all stuff that's downstream of that
17:53 imirkin: oh, i did change mod around
17:53 imirkin: i wonder if i broke mod u64
17:55 pmoreau: I’m not using mod, so that should be OK.
17:56 pmoreau: But somehow, I end up with an `add u32 %r97 neg 0x00000002 0x00000020` which triggers an assert when emitting the code, as src0 should not be an immediate.
17:56 imirkin: yeah, i definitely broke mod
17:56 imirkin: gr
17:57 karolherbst: mhh
17:57 imirkin: (for 64-bit)
17:57 karolherbst: pmoreau: interesting
17:58 karolherbst: pmoreau: that should be opted to a mov anyway
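Since both sources of that add are immediates, it can be evaluated at compile time and replaced by a mov of the result. A minimal sketch of the folding arithmetic, assuming the printed "neg" is a modifier on the first source and that the result wraps at 32 bits; fold_add_u32 is a made-up name.

```python
MASK32 = 0xffffffff

def fold_add_u32(src0, src1, neg0=False, neg1=False):
    """Fold an add of two immediates, applying neg modifiers, modulo 2**32."""
    a = -src0 if neg0 else src0
    b = -src1 if neg1 else src1
    return (a + b) & MASK32

# add u32 %r97 neg 0x00000002 0x00000020  ->  mov u32 %r97 <folded>
folded = fold_add_u32(0x00000002, 0x00000020, neg0=True)
```

Under those assumptions the folded value is 0x1e.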
17:58 pmoreau: Here is the program before and after constant folding: https://hastebin.com/quhokedigo.pl (I disable the other optimisations)
18:00 karolherbst: pmoreau: this makes no sense...
18:01 hakzsam_: karolherbst: pmoreau, images should work on fermi, kepler and maxwell at least, 3D images are unsupported though
18:01 imirkin: oh good. we can't get a 64-bit mod.
18:01 imirkin: (something lowers it away in glsl ir or something)
18:02 imirkin: 3d images are unsupported on kepler and earlier
18:02 imirkin: they work on maxwell
18:02 pmoreau: hakzsam_: When we looked at it (at XDC 2016), there was something missing to get it working with SPIR-V, like some format not being uploaded by Nouveau; does Nouveau do bindless nowadays?
18:02 karolherbst: well, pascal says no
18:02 hakzsam_: no bindless
18:03 imirkin: pmoreau: i think that's after that expandMAD thing
18:03 hakzsam_: pmoreau: wait, you mean ARB_bindless_texture?
18:03 imirkin: pmoreau: er, expand 64->32 bit ops thing
18:03 imirkin: pmoreau: i have patches for bindless, but they're incomplete
18:03 pmoreau: I think SPIR-V “expects” bindless, which is what was missing.
18:04 imirkin: no
18:04 imirkin: spir-v expects separate texture/sampler bindings
18:04 imirkin: which nouveau should be able to support
18:04 pmoreau: I should have another look at it, some days
18:05 imirkin: either way, most of the problems i ran into were on the driver level
18:05 imirkin: the compiler patches were pretty trivial
18:05 imirkin: have a look in my 'cts' branch
18:12 pmoreau: imirkin: You are right, it’s not in the ConstantFolding pass.
18:31 rhyskidd: imirkin: thanks for that comment. I'll wait for any other review feedback and then respin the series
19:46 rhyskidd: so this was a pleasant surprise with v4.15-rc1: GP107M isn't hitting that MMIO timeout fault at 409800
19:46 rhyskidd: https://paste.debian.net/998791/
19:47 rhyskidd: karolherbst: do you want me to try on top of v4.15-rc1 this patch? https://gist.github.com/karolherbst/d82372046148582d8204f26d62af670b
19:47 rhyskidd: ^^ i am disabling runtime pm, so i imagine it won't like a sleep cycle too much ...
19:49 karolherbst: well
19:50 karolherbst: we kind of know what is going on and skeggsb thought the current code actually does this in some way already
19:51 rhyskidd: haven't really stressed it, but glxgears did at least appear to render via DRI_PRIME
19:52 karolherbst: well, right
19:53 karolherbst: not getting that "MMIO timeout fault at 409800" and prime offloaded OpenGL kind of goes hand in hand here
20:28 rhyskidd: perhaps -- but it's still an improvement over 4.14 where i was getting the "MMIO timeout fault at 409800" with prime offloaded OpenGL ... https://paste.debian.net/998798/
20:53 Lyude: karolherbst: nice! let me know how it goes
20:53 karolherbst: Lyude: pretty good so far
20:53 Lyude: I'm still working on getting kepler2 to work with BLCG, although it looks like it's just a simple mistake of me enabling CG_CTRL too early
22:23 rhyskidd: has anyone here got intel-gpu-tools working on nouveau?
22:24 rhyskidd: i saw tagr had some older wip patches
22:25 pmoreau: I doubt I tried. Also, I’m not even sure I tried it on Intel.
22:25 karolherbst: rhyskidd: you mean the tools or the test?
22:25 imirkin_: there's been talk, but i'm not aware of anyone actually running it
22:26 karolherbst: intel runs them afaik
22:38 rhyskidd: the tests, given they exercise the drm interfaces --- i can see some work's been going on to definitely make them useful beyond i965 hw
22:38 rhyskidd: yeh, they run them on their CI
22:39 rhyskidd: and anholt had a fork running on vc4 which was apparently helpful finding weird corner cases
22:40 karolherbst: well I don't think there is so much intel specific code in there actually
22:40 karolherbst: I am sure they can be run on nouveau in less than a day
22:40 rhyskidd: hrmm
22:41 karolherbst: they actually want to rename it
22:41 rhyskidd: just to igt?
22:41 karolherbst: well, intel shouldn't be inside the name anymore
22:41 karolherbst: maybe it will be igt, where i doesn't stand for intel
22:54 imirkin_: it's a lot of non-intel-specific tests, in addition to some intel-specific ones
22:54 imirkin_: it does seem like the sort of thing that should be mostly plug & play for nouveau
22:55 rhyskidd: that's my thinking too
22:55 imirkin_: of course a lot of the automation apparently relies on being able to read out crc's for the scanned out fb
22:56 imirkin_: i suspect nvidia hw from the semi-modern era has this
22:56 imirkin_: but it'll take some snooping to find it
22:57 imirkin_: somewhere in the SOR/PIOR registers
22:57 imirkin_: or crtc
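The reason CRC readout matters for automation is that a test can compare one checksum per frame instead of reading pixels back. A software-only sketch of that comparison follows, using zlib.crc32 as a stand-in; whatever CRC the display hardware computes is unknown here and would almost certainly differ, and the frame buffers are dummy byte strings.

```python
import zlib

def frame_crc(framebuffer_bytes):
    """Checksum of one scanned-out frame (software stand-in for a hardware CRC)."""
    return zlib.crc32(framebuffer_bytes) & 0xffffffff

def frames_match(reference, candidate):
    """True when two frames have the same checksum."""
    return frame_crc(reference) == frame_crc(candidate)

# Dummy 4x4 XRGB frames: identical, then one with a single byte changed.
reference = bytes([0x00, 0x80, 0xff, 0x00] * 16)
same = bytes(reference)
changed = bytearray(reference)
changed[5] ^= 0x01
```

frames_match(reference, same) holds, while the frame with the flipped bit produces a different checksum.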