00:23 imirkin: pmoreau: if you use the MOV variant rather than the LD variant, i think so... LD requires a barrier on maxwell while mov does not.
00:24 pmoreau: Ok
00:24 pmoreau: Is the mov variant available on NV50+ ?
00:31 imirkin: pmoreau: i think so yea
00:31 imirkin: either way, emitMOV() knows what to do
00:32 pmoreau: Oooh, maybe I should use that rather than doing it manually… :-D
00:32 imirkin: there's logic that splits up wide loads as well
00:32 imirkin: iirc it splits into 64-bit chunks for kepler+, and 32-bit chunks for maxwell
00:35 pmoreau: Ah, emitMOV() is not a function to be directly called from a *_from_*.cpp file.
00:36 imirkin: no
00:37 imirkin: and nv50_ir_target_* specifies the splitting level
00:37 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp#n404
00:38 pmoreau: Good to know, thanks!
00:38 pmoreau: Now I can check why the ld were not automatically converted to mov…
00:42 imirkin: e.g. mov can't handle indirect offsets
00:48 pmoreau: I was wrong: Nouveau did convert some of the ld to mov, but I was looking at the last print of the program, which had everything as ld. But using envydis on the binary showed some were mov.
00:49 imirkin: right, the selection happens in emitMOV/emitLD
00:49 imirkin: the thing that nouveau prints is the IR, not the actual emitted instructions
00:50 pmoreau: Right
00:52 imirkin: RSpliet: did you ever do more work on fermi reclocking after your little sprint in april or so?
00:53 pmoreau: I could add a pass to split direct ld from cmem into 32-bit chunks, so they can be mov'ed upon emit.
00:58 imirkin: we already have that.
00:59 imirkin: ohhh hrm. maybe not. MemoryOpt normally does it
00:59 imirkin: but it's more about *combining* ld's (or rather, not doing it) based on parameters
00:59 imirkin: rather than *splitting* existing ones
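The splitting discussed above (a wide cmem load broken into 64-bit or 32-bit pieces so each piece can be emitted as a mov) can be sketched roughly as follows. This is only an illustration: `Chunk` and `splitLoad` are hypothetical names, not nouveau code, and the chunk size is passed in by the caller rather than read from nv50_ir_target_*.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical type, not part of nouveau: one piece of a split load.
struct Chunk {
    uint32_t offset; // byte offset into the constant buffer
    uint32_t size;   // size of this piece in bytes
};

// Hypothetical helper: split a load of `size` bytes starting at `offset`
// into pieces of at most `maxChunk` bytes each.
std::vector<Chunk> splitLoad(uint32_t offset, uint32_t size, uint32_t maxChunk)
{
    std::vector<Chunk> chunks;
    if (maxChunk == 0)
        return chunks; // nothing sensible to do, avoid looping forever
    while (size > 0) {
        uint32_t piece = size < maxChunk ? size : maxChunk;
        chunks.push_back({offset, piece});
        offset += piece;
        size -= piece;
    }
    return chunks;
}
```

For example, `splitLoad(0x8, 16, 4)` describes a 128-bit load at c0[0x8] as four 32-bit pieces, while `splitLoad(0x8, 16, 8)` describes it as two 64-bit pieces.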
01:55 dboyan_: imirkin: I've found a useful tool to read perf counters: https://github.com/trtt/apitrace
01:56 dboyan_: It can read perf counters per draw call or frame
01:59 dboyan_: So if I have a set of traces, I can compare the numbers by replaying them
04:06 ptx0: does nouveau prevent the system from sleeping?
07:58 hakzsam: dboyan_: because some perf counters should return a floating point value, like metric-ipc but the driver query interface doesn't really support floats
07:59 hakzsam: dboyan_: for documentation, please refer to NVIDIA directly. I used the same names and you can find descriptions in the code (ie. nvc0_query*)
08:46 dboyan_: hakzsam: Thanks, I think I've found some useful information. But the ipc-related values are really strange. Also, some percentage-based values are shown as floats (in modified apitrace) but they are really ints
08:48 hakzsam: yeah, those should be between 0.0 and 1.0 actually like what blob does
08:57 dboyan_: hakzsam: Just curious, does the inability to return proper floats lie within the interfaces in mesa (or the spec) or rather within nouveau's implementation?
08:58 hakzsam: the HUD IIRC, but it might work with amd_perf_monitor
08:59 dboyan_: I'm actually using AMD_perfmon
08:59 dboyan_: via a modified version of apitrace
09:00 hakzsam: and those queries don't return floats?
09:01 dboyan_: yeah, seems so
09:01 dboyan_: metric-issued_ipc is UINT64, only 0 or 1 can be seen
09:02 dboyan_: metric-issue_slot_utilization is PERCENTAGE (float) but the values are really small, and seems it should have been interpreted as int
09:02 hakzsam: you might want to try to return floats instead, or to multiply by 100
09:03 dboyan_: I'll try when I have access to the test machine
09:05 hakzsam: okay, let me know
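One possible reading of the "really small" PERCENTAGE values, not confirmed anywhere in this log, is that an integer percentage is being read back as if its bits were a float. The sketch below shows what that would look like, next to the "return floats instead" alternative. Both function names are hypothetical and neither is taken from mesa or apitrace.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the bits of a 32-bit integer as an
// IEEE-754 float, which is what happens if an int result is read as a float.
float bitsAsFloat(uint32_t raw)
{
    float f;
    std::memcpy(&f, &raw, sizeof f);
    return f;
}

// Hypothetical helper: convert an integer percentage (0..100) into a
// fraction between 0.0 and 1.0.
float percentToFraction(uint32_t percent)
{
    return static_cast<float>(percent) / 100.0f;
}
```

Under that assumption, an integer value such as 50 read through `bitsAsFloat` comes out as a tiny denormal number, whereas `percentToFraction(50)` gives 0.5.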
11:55 voxadam: Be forewarned, this question is definitely off-topic for #nouveau, it's just that I don't really know anywhere else to ask it. What is it about an older system like something from ~2012 with an X79 type PCH and system firmware of the same vintage that keeps it from successfully booting/posting with a modern GPU such as a recent 9 or 10 series Nvidia based card?
11:56 RSpliet: imirkin: I didn't... waiting for Ben to push his current tree
11:56 voxadam: Well, other than system firmware based blacklists/whitelists.
11:57 RSpliet: (yes, the branch on github is outdated, and I don't want to risk wasting more of my time)
12:33 pmoreau: voxadam: I know I had issues getting my MB at work to accept a Titan X card, until I updated the BIOS. My guess was that the initial firmware got confused by the amount of VRAM.
14:45 imirkin: voxadam: assuming it's not some super-special-snowflake system, the only things i can think of are maybe the GPU doesn't support PCIe 2.0 (unlikely), or maybe the board isn't supplying enough power (more likely)
14:46 imirkin: i guess if it's EFI boot, there's a lot more things that could go wrong
14:50 spacebug^: I have a 'NVIDIA Geforce GTX 960' card. Currently using NVIDIA closed source driver in Debian Jessie. It does not work with the nouveau driver in Jessie. Could it work in Stretch which uses a newer version of nouveau and libdrm, or is it as I think that I still need that binary blob to run the card?
14:53 pmoreau: spacebug^: To get hardware acceleration, you will need Linux 4.6 (and Mesa 11.2). If using xf86-video-nouveau rather than modesetting, you will need 1.0.14 to get acceleration through EXA.
14:54 pmoreau: You won't be able to reclock it with Nouveau though, as we are still lacking the proper firmwares from NVIDIA to change the fan speed.
14:58 spacebug^: pmoreau: ok. Seems Debian will use 1.0.13-3 in stretch :/ How about hardware decoding?
15:00 pmoreau: spacebug^: No idea… I think it's not supported but I could be wrong
15:01 spacebug^: pmoreau: ok. Thanks.
15:01 spacebug^: I really should just get another card.
15:11 RSpliet: spacebug^: if by other you mean "newer NVIDIA", you'll just get yourself into more trouble
21:07 pmoreau: imirkin: Would you have some insights on what I did wrong, causing RA to be quite confused: https://hastebin.com/paqotariqu.pl (I am loading a struct { int; long; float; char[12] } from cmem to regs, and then storing from regs to gmem.)
21:07 pmoreau: (The char[12] is there for padding purposes, and got inserted by the compiler.)
21:08 pmoreau: And all optimisations should be off; I have the same weird results with them on.
21:30 RSpliet: pmoreau: ... padding of 12 bytes? Is it trying to pad to 28-byte boundaries?
21:32 pmoreau: RSpliet: You get { int; implicit_padding_4bytes; long; float; explicit_padding_12bytes }, so 32 bytes total
21:32 RSpliet: ah ok :-)
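The 32-byte layout described above can be checked with a small host-side sketch. It assumes a 64-bit `long` (as in OpenCL C, modelled here with `int64_t`) that is 8-byte aligned, which holds on common 64-bit ABIs. The struct and field names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative stand-in for struct { int; long; float; char[12] }.
struct Example {
    int32_t i;       // offset 0, followed by 4 bytes of implicit padding
    int64_t l;       // offset 8, needs 8-byte alignment (assumed ABI)
    float   f;       // offset 16
    char    pad[12]; // offset 20, explicit padding up to 32 bytes
};

static_assert(offsetof(Example, i) == 0, "int at offset 0");
static_assert(offsetof(Example, l) == 8, "long after 4 bytes of padding");
static_assert(offsetof(Example, f) == 16, "float right after the long");
static_assert(offsetof(Example, pad) == 20, "char[12] right after the float");
static_assert(sizeof(Example) == 32, "32 bytes in total");
```

With the gmem store pointer occupying the start of c0 and the structure starting at c0[0x8], as mentioned later in the log, these offsets would each be shifted by 8 in the constant buffer.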
21:32 RSpliet: So the original shader is trying to reconstruct the individual char bytes, which is silly but oh well
21:32 RSpliet: why is it doing mov u32 %r16 c0[0x20] (0) ?
21:34 pmoreau: Why not? I am not sure what you mean by that
21:34 RSpliet: If the input is a struct of 32-bytes, I'm curious why it needed the constant in c0[0x20]... and why it knows it can be optimised away :-D
21:35 pmoreau: And yes, it is silly to try to reconstruct the individual bytes as they are only there for padding, but they are not marked as such in the SPIR-V (well, SPIR-V does not have a way to express that anyway)
21:36 pmoreau: Oh! Because c0[0x0-0x4] is the gmem store pointer
21:36 pmoreau: The structure itself starts at c0[0x8]
21:37 RSpliet: That makes sense...
21:38 RSpliet: So in the final compiled shader, the last eight chars are eliminated
21:38 RSpliet: by a pass of DeadCodeElim
21:41 pmoreau: That is true, I had missed that
21:41 pmoreau: Mhhh
21:42 pmoreau: Right, because they aren't used, which is a bug in my code…
21:42 RSpliet: Not even for the blind copy to gmem? okay
21:44 pmoreau: The second store is garbage: it should be coalescing the float and the 12 chars
21:44 RSpliet: yes, but it isn't
21:45 pmoreau: Yeah…
21:45 RSpliet: if you look at the original shader, line 63, 64, 65, you'll see that it's treating each char as 4 bytes
21:46 RSpliet: wait... no this is weird :-D
21:46 pmoreau: Yes, cause codegen isn't that happy if you give it 1 byte values, and ultimately they will be in 32-bit regs
21:47 RSpliet: but look at your paste line 156, 157, 158
21:47 RSpliet: %r55, %r60 are each a 32-bit reg containing only one of the chars in its low 8 bits
21:48 RSpliet: but instead of merging that into a single 32-bit reg, it's merging the four into a 128-bit consecutive reg
21:48 pmoreau: Yes, I realised that the second store is messed up
21:49 pmoreau: It's a bit complicated, because at that point, from the SPIR-V's pov, I am dealing with a flow of bytes, there are no notions of a structure left.
21:50 RSpliet: is it your SPIR-V -> NV50_IR translation that emits these incorrect merges?
21:51 pmoreau: So the second store being messed up is my fault, but the first one seems fine until RA.
21:51 pmoreau: Yes
21:51 RSpliet: generating a ton of superfluous code is one thing, but correctness would be nice :-)
21:51 RSpliet: ok
21:51 pmoreau: :-p
21:51 pmoreau: Here is the SPIR-V code: https://hastebin.com/labejitehe.pl
21:52 RSpliet: is it really correct?
21:52 pmoreau: The function body is 3 lines: 36, 37, 38
21:52 RSpliet: look at line 72-74 of your hastebin paste
21:52 RSpliet: it merges %r1 and %r40 in line 72... so the padding you had disappeared
21:53 pmoreau: That is… true
21:54 pmoreau: But that shouldn't make Nouveau generate lines 1312, 1313, 1314
21:55 pmoreau: I'll try to fix those errors and see what happens
21:56 pmoreau: Thanks for the help! :-)
21:56 RSpliet: line 1312 makes sense, line 1313 should've been $r2, $r3 is mysteriously missing :-D
21:57 pmoreau: Yeah :-D
22:05 pmoreau: Ok: missing alignment fixed (and now the generated first load looks way better, though I do not get the correct value for the long). Onto the chars
22:08 RSpliet: It might be confused by how two compounds are merged :-P
22:09 pmoreau: Could be
22:25 pmoreau: Mhhhh… I need to pass along some additional data to solve that char issue
22:25 pmoreau: Or literally transform the whole struct into chars stored in 32-bit values
22:27 pmoreau: But I can't handle having some regs being 32-bit ints, and some being chars stored in a whole 32-bit value, while only knowing that they are in theory all chars.
23:58 imirkin: pmoreau: sub-32-bit value support is, at best, untested