03:45rhyskidd: skeggsb: thanks for the comment on the DP code
03:45rhyskidd: probably should try to get myself a copy of the current DP spec
11:04karolherbst: pmoreau: nice, struct with global pointers example works :) https://github.com/karolherbst/HMM-examples/blob/master/fma_struct_ptrs.cl https://github.com/karolherbst/HMM-examples/blob/master/fma_struct_ptrs.h https://github.com/karolherbst/HMM-examples/blob/master/fma_struct_ptrs.cpp
11:04karolherbst: pmoreau: imagine doing this without SVM :p
11:05pmoreau: Cool! I gave up on getting something to show up on my computer. :-/ I’ll try again over the weekend.
11:05karolherbst: you need my _v3 branch for this
11:05karolherbst: and glisse's HMM kernel
11:06karolherbst: pmoreau: git://people.freedesktop.org/~glisse/linux nouveau-hmm
11:06pmoreau: Okay, and a Maxwell+ card for this as well?
11:07pmoreau: How can you be using constexpr but still sticking with srand()? oO
11:07karolherbst: no clue :D
11:08pmoreau: Also, you are leaking memory! std::unique_ptr to the rescue :-D
11:08karolherbst: I have no idea on what chipsets that HMM stuff works
11:08karolherbst: I test on pascal
11:08karolherbst: pmoreau: meh..
11:08pmoreau: I didn’t know there was a std::fma
11:08karolherbst: it's only ns until the resources are freed anyway
11:09karolherbst: pmoreau: a memory leak is if you overwrite a pointer before calling free ;)
11:10pmoreau: Well, not calling free at all is not better
11:10karolherbst: freeing directly before returning from main is kind of pointless though
11:11karolherbst: I mean sure, the process could get stuck somewhere
11:11karolherbst: but this isn't production code either
11:11karolherbst: so, meh
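To make pmoreau's std::unique_ptr suggestion concrete, here is a minimal sketch. The helper name make_input and its parameters are hypothetical and are not taken from the linked fma_struct_ptrs.cpp; the point is only that the buffer is released automatically when the owner goes out of scope, so no explicit free is needed before returning from main.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper (not from the linked example): allocate a float
// buffer that frees itself when the returned unique_ptr is destroyed.
std::unique_ptr<float[]> make_input(std::size_t count, float value) {
    auto buffer = std::make_unique<float[]>(count);
    for (std::size_t i = 0; i < count; ++i)
        buffer[i] = value;  // fill with a known value
    return buffer;          // ownership moves to the caller
}
```

A caller would write `auto a = make_input(1024, 1.0f);` and pass `a.get()` wherever a raw pointer is expected.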
11:11pmoreau: Anyway, nice example
11:11karolherbst: I tried some linked list thing
11:11karolherbst: but this just blows up spirv_to_nir
11:11karolherbst: and llvm-spirv
11:12karolherbst: global struct pointers are kind of not working
11:12karolherbst: within structs
14:14Subv: do maxwell vertex shaders always access their input attributes' elements as if they were 32 bits, regardless of the actual input data size?
14:14Subv: i have a shader with an input attr of gf100_vtxelt_size 8_8_8_8, but the shader accesses the elements with ld b32 $r4 a[0x90] 0x0 and so on
14:16karolherbst: Subv: registers are 32 bit
14:16karolherbst: although I am sure you can do 64 bit loads from a space
14:17imirkin: Subv: the vertex data fetcher converts into 32-bit elements
14:17karolherbst: but that only actually makes sense if you also read a[0x94]
14:17imirkin: so if it's 8888_UNORM it'll convert into 4x fp32
14:17imirkin: if it's 8888_UINT it'll convert into 4x u32
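The conversion imirkin describes can be pictured on the CPU side as follows. This is only an illustration of what the shader ends up seeing, not a description of how the hardware fetcher works; both function names are made up for this sketch. UNORM uses the usual normalized-integer rule (value / 255 for 8 bits), and UINT is a plain zero-extension.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of an 8_8_8_8 UNORM attribute as seen by the shader:
// each byte becomes one 32-bit float in [0, 1].
std::array<float, 4> fetch_unorm8x4(const std::uint8_t (&raw)[4]) {
    std::array<float, 4> out{};
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<float>(raw[i]) / 255.0f;
    return out;
}

// Hypothetical model of an 8_8_8_8 UINT attribute: each byte is
// zero-extended to one 32-bit unsigned integer.
std::array<std::uint32_t, 4> fetch_uint8x4(const std::uint8_t (&raw)[4]) {
    std::array<std::uint32_t, 4> out{};
    for (int i = 0; i < 4; ++i)
        out[i] = raw[i];
    return out;
}
```

Either way the shader reads four 32-bit components, which matches the `ld b32` accesses Subv observed.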
14:18imirkin: there MAY be an option to disable the conversion... i think there might be for textures, at least. but i've never played around with such a thing.
14:18karolherbst: imirkin: there wouldn't be much of a benefit to that, or would there?
14:19Subv: i see, thanks
14:19imirkin: can't really think of one, but that doesn't mean there isn't one :)
15:44karolherbst: imirkin: do you know if there are some differences in CPU vs Nvidia GPU fma?
15:45karolherbst: just super rare or basically the same if the GPU fma is configured the right way
15:45imirkin_: things like mul are well-defined by ieee754
15:45imirkin_: i'm not 100% sure that fma is equally well-defined
15:45imirkin_: long story short - i don't know :)
15:45karolherbst: well I am testing with 1k random values and can == compare the CPU and GPU result
15:45imirkin_: i suspect they should be equivalent, assuming identical rounding settings
15:45imirkin_: but same could be said of "mul" as well
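For reference, IEEE 754-2008 does define fusedMultiplyAdd: it computes a*b+c with a single rounding at the end. A mismatch would therefore show up mainly if one side rounds the product before adding. The sketch below shows that difference on the CPU only; it says nothing about what a particular GPU does. The function names are illustrative, and the volatile is there solely to stop the compiler from contracting the two operations into an fma on its own.

```cpp
#include <cmath>

// Multiply, round to float, then add: two roundings.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;  // volatile forces the intermediate rounding
    return product + c;
}

// Fused multiply-add: the exact product feeds the addition, one rounding.
float fused(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-23 and c = -(1 + 2^-22), the exact product a*a is 1 + 2^-22 + 2^-46. The unfused version loses the 2^-46 term and returns 0, while the fused version returns 2^-46.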
16:07lachs0r: https://0x0.st/sBN8.png oops. this happened after resuming from suspend (g218)
16:07lachs0r: lots of errors like nouveau 0000:03:00.0: fb: trapped write at 006008f9b0 on channel 2 [1fb9e000 DRM] engine 0d [PCE0] client 13  subclient 02  reason 00000000 [PT_NOT_PRESENT]
16:08lachs0r: but the system doesn’t seem to be unstable (so this is still better than nvidia’s proprietary driver)
16:11karolherbst: pmoreau: is it even possible to do the fma_struct_ptr example without SVM?
16:11karolherbst: pmoreau: afaik global pointers inside structs can't be filled
16:11karolherbst: except by other kernels
16:12imirkin_: lachs0r: PCE0 = copy engine 0. this is used for background copies by the ttm to move things in and out of vram.
16:12imirkin_: skeggsb: --^
16:13imirkin_: some kind of race between ttm and PTE population on resume?
16:14imirkin_: [on a GT218 by the sounds of it... so not PCE0, that's just the message lying. probably PCOPY, same general idea though.]
16:14imirkin_: lachs0r: did it fix itself after it had a chance to redraw?
16:17lachs0r: imirkin_: no but restarting xorg did the trick… except now the kworker stuff is back and spams the kernel log with [20348.788272] [drm] Got external EDID base block and 1 extension from "edid" for connector "HDMI-A-1"
16:17imirkin_: can't win 'em all
16:19lachs0r: oh looks like it settled
17:05pendingchaos: imirkin_: I think I've finished implementing GL_NV_conservative_raster, GL_NV_conservative_raster_dilate and GL_NV_conservative_raster_pre_snap_triangles
17:05pendingchaos: I also have tests for the three extensions though the GL_NV_conservative_raster test fails with the blob on a few subtests
17:05pendingchaos: should I start composing patches for piglit and mesa?
17:06imirkin_: do you understand why your test fails on blob?
17:06imirkin_: is it a bug in their thing, or a misunderstanding on your part?
17:06imirkin_: (or you don't know)
17:07imirkin_: either way, yes, definitely put patches together for mesa :)
17:07pendingchaos: It fails because the blob moves the vertices slightly
17:07pendingchaos: it seems unrelated to conservative rasterization as results for one subtest differ from nouveau without it enabled
17:07pendingchaos: I think I mentioned it sometime earlier
17:08imirkin_: it always feels a bit ... wrong ... that the same person writes the tests and the implementation, but that's how it tends to happen in mesa
17:08imirkin_: yeah, you did mention
17:08imirkin_: you still don't know why it happens?
17:08imirkin_: if you have a test that's sensitive to it, might be interesting to see what other hw does with it
17:08pendingchaos: no, i don't
17:08imirkin_: if you publish the test (or even just mail it out) you could request other people to provide results on their hw
17:08pendingchaos: the blob also seems to render it differently with an older version
17:09imirkin_: huh. maybe they messed something up?
17:09imirkin_: but it's not like i know of some "randomly shift vertices over" feature either
17:10pendingchaos: how should the code be split into patches (e.g. each extension with its own patch or one patch for all three)?
17:10imirkin_: generally it's split as (a) add core bits for extension (b) add gallium APIs / caps for extension (c) modify st/mesa to make use of these apis, and finally (d) modify driver to support new thing
17:11pendingchaos: It should be simple to create a file usable with piglit's shader_runner that replicates one of the subtests that fails
17:11pendingchaos: i mean how is the work split into threads on the mailing list
17:11imirkin_: ah cool. that should make it easy to get people to think about it
17:12imirkin_: well, once you have the patches, git format-patch --cover-letter origin.. and edit and send
17:12imirkin_: as for doing the exts separately vs together, if they modify the same bits, then do them together
17:16pendingchaos: ok, thanks
17:19karolherbst: pendingchaos: I have no idea if anybody (including me) asked you if you are a student and might be interested in GSoC/EvoC. Seems like you are capable enough to do it :p
17:23pendingchaos: I'm not a student
17:26karolherbst: k, I hope I won't forget and ask again :D
17:30karolherbst: pmoreau: as I understand stuff, every driver needs to support CL_DEVICE_SVM_COARSE_GRAIN_BUFFER for OpenCL 2.0, right?
17:30karolherbst: it isn't enough to just do CL_DEVICE_SVM_FINE_GRAIN_SYSTEM
17:31karolherbst: I am sure we could do noops for most of the functions and have trivial svmAlloc implementation doing malloc
17:31karolherbst: not quite sure, wondering what you think about that?
19:15moben: Is reclocking supposed to work on nva8 by now? xorg locks up here when I try it :(
19:15moben: dmesg | grep nouveau: https://paste.pound-python.org/raw/WyBrSjtIORjdnMS4gKTr/
19:16imirkin_: yes, it is.
19:16imirkin_: obviously something gets messed up though
19:16imirkin_: does this happen with all perf levels?
19:18moben: 03 and 07 seemed to work but I didn't test as heavily because I wanted more fps for them games. Can try again though
19:18moben: 0f locks up
19:18imirkin_: and nothing in between presumably
19:18imirkin_: the actual messages right after reclock would be useful
19:18imirkin_: what you have shown are not that interesting
19:18imirkin_: it's just genero pile-o-fail type messages
19:19moben: any additional debugging options I could/should turn on?
19:19karolherbst: moben: could you cat the pstate file after reclocking? sometimes the set clocks are total garbage compared to what nouveau tried to set
19:20karolherbst: I saw weird stuff like that on some Tesla cards actually
19:21karolherbst: moben: otherwise creating a mmiotrace with the nvidia driver might help
19:21karolherbst: and having your vbios
19:21moben: karolherbst: https://paste.pound-python.org/raw/txAnD7jPdYOT1dRVPDuw/
19:21karolherbst: this should help us figure out what we do wrong
19:21karolherbst: okay, at least that seems fine
19:21moben: karolherbst: vbios is here, https://bugs.freedesktop.org/attachment.cgi?id=112241 (from an earlier report)
19:22moben: this is a mobile Quadro NVS 3100M btw
19:23imirkin_: i think tesla reclocking has had a lot less testing than kepler, unfortunately
19:23imirkin_: and regressions can sneak in too
19:25karolherbst: imirkin_: the PM_Mode table is kind of weird
19:25imirkin_: you know about 100x more about reclocking than i do
19:25karolherbst: engine entry 7/8 are set to 0MHz for 0x3 and 0x5, but to some clocks at 0xf
19:25karolherbst: any idea what that's about?
19:26imirkin_: all i know about it is that it has something to do with clocks.
19:26karolherbst: moben: so yeah, mmiotraces should help
19:27karolherbst: moben: one with nouveau with setting marks before/after the reclock
19:27karolherbst: and one with nvidia where you check via nvidia-settings that it drops to the lowest clocks and clock to max from there
19:27karolherbst: there is a performance mode you set to max perf or something
19:28karolherbst: it is usually set to adaptive
19:28karolherbst: moben: what kernel are you on by the way?
19:28moben: alright, I'll try to get the nvidia driver running
19:29moben: karolherbst: 4.14.27
19:29karolherbst: that is reasonably new
19:30moben: I also have a /sys/kernel/debug/dri/128 in addition to /sys/kernel/debug/dri/0, both have the same 'name' but only the latter has outputs. Idk if that is expected, seemed weird to me
19:30karolherbst: wow, my mmiotrace fix landed in 4.14.22 already
19:30imirkin_: moben: expected
19:31imirkin_: it's based on dri nodes, and there's a card0 and renderD128
21:44imirkin_: pendingchaos: btw, if you wanted to beat your head against the wall but implement a potentially useful thing, you'd figure out how to get ARB_transform_feedback_overflow_query implemented
21:44imirkin_: the trouble is that the hw provides the per-stream queries, but there is no way (i know of) for the non-per-stream version
21:45imirkin_: so one needs to do something clever. i haven't quite figured out what that might be.
21:45imirkin_: the reason this is useful is that it's part of GL 4.6 core features
21:47pmoreau: karolherbst: Right, that’s my understanding of the spec. (re: “every driver needs to support CL_DEVICE_SVM_COARSE_GRAIN_BUFFER for OpenCL 2.0”)
21:58karolherbst: pmoreau: k, thanks for confirming
21:59karolherbst: pmoreau: so we can just do clSVMAlloc -> malloc, and make clEnqueueSVMMemcpy a simple memcpy or something
21:59karolherbst: though, right?
22:01pmoreau: You might be able to get away with it like that
22:01pmoreau: (and have the map/unmap operations be nops)
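The idea being floated here can be sketched roughly as below. It assumes a device that can already access any host pointer (the fine-grained system SVM / HMM case), so the coarse-grained entry points have nothing left to do. All emu_* names are hypothetical stand-ins for driver-side hooks; they are not real OpenCL or Mesa functions, and real implementations would also have to honour alignment, flags and event ordering.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical driver-side hooks, assuming host pointers are directly
// usable by the device. Not real OpenCL or Mesa entry points.
void *emu_svm_alloc(std::size_t size) { return std::malloc(size); }
void emu_svm_free(void *ptr) { std::free(ptr); }

// Map/unmap become no-ops: host and device already see the same memory.
int emu_svm_map(void *, std::size_t) { return 0; }
int emu_svm_unmap(void *) { return 0; }

// The copy collapses to a plain memcpy.
void emu_svm_memcpy(void *dst, const void *src, std::size_t size) {
    std::memcpy(dst, src, size);
}
```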
22:02HdkR: When working with Nvidia hardware, crying is always an option. :)
22:02HdkR: Whoa, my scrollback was way gone
22:03HdkR: Still relevant to current discussion ;)
22:04imirkin_: every discussion, really
22:06lachs0r: seems like it’s going to be the only option if their proprietary drivers keep getting worse at the current pace
22:07lachs0r: so many issues recently, it’s not even funny
22:07lachs0r: soon it’s going to be the new fglrx
22:22karolherbst: pmoreau: yeah, that was my thought. But that means we need more complex CAPs in gallium, so that devices can say: I implement SVM 1, 2 and 3, so I don't want to map all ops to 3
22:22karolherbst: and if a device can only do 3, we emulate everything
22:22karolherbst: or do noops
22:23karolherbst: in the end it only comes down to the application rather using clEnqueueSVMMemcpy than writing into the result of svmAlloc/malloc directly
22:24karolherbst: pmoreau: the enqueue functions are a bit hard to deal with though. There is clEnqueueSVMFree and clSVMFree :(
22:24karolherbst: and I guess we have to still keep the ordering with the enqueue functions
22:25karolherbst: so we would end up doing two copies with clEnqueueSVMMemcpy, because it could refer to stack memory
22:25karolherbst: stack memory is a big problem anyway with those enqueue commands
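The "two copies" concern can be illustrated with a toy in-order queue. This is a sketch of the scenario karolherbst describes, under the assumption that the source may be gone by the time the command runs; whether an implementation is actually obliged to stage the data is not settled in this conversation. InOrderQueue and its methods are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy in-order command queue (hypothetical, for illustration only).
class InOrderQueue {
public:
    void enqueue_memcpy(void *dst, const void *src, std::size_t size) {
        // Copy 1: snapshot the source now, since it may live on the
        // caller's stack and disappear before the command executes.
        std::vector<unsigned char> staging(size);
        if (size != 0)
            std::memcpy(staging.data(), src, size);
        pending_.push([dst, staging = std::move(staging)] {
            // Copy 2: performed later, in queue order.
            if (!staging.empty())
                std::memcpy(dst, staging.data(), staging.size());
        });
    }

    // Run all pending commands in the order they were enqueued.
    void flush() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

private:
    std::queue<std::function<void()>> pending_;
};
```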
22:49moben: karolherbst: sent nouveau and nvidia mmiotrace, let me know if I fucked it up :)
22:50karolherbst: moben: where to?
22:55moben: "mmio dot dumps at gmail dot com" as the wiki says
22:55karolherbst: ahh okay
22:55karolherbst: I will look into that tomorrow
22:55karolherbst: quite late here
22:56moben: same here. thank you for taking a look
23:13pendingchaos: imirkin_: patches sent. this was the first time I ever sent patches to a mailing list. hopefully I got it right
23:13pendingchaos: I might try implementing ARB_transform_feedback_overflow_query sometime
23:13imirkin_: pendingchaos: missing a name
23:13imirkin_: (actually i dunno what mesa's policy is on pseudonyms)
23:14imirkin_: looks like you got it mostly right though, well done :)
23:14pendingchaos: missing one where?
23:14imirkin_: on the patches.
23:14skeggsb:wonders how secure that firmware method is
23:14imirkin_: and in the copyright header... unless your legal name is 'PendingChaos'
23:15skeggsb: that looks like a register number being poked in there
23:15imirkin_: skeggsb: we have their firmware, we could look
23:15pendingchaos: ah forgot about the copyright thing
23:15skeggsb: i don't feel like reading falcon assembly today :P
23:15pendingchaos: I wasn't sure if it was meant to be one's legal name or not so I left it for later
23:15imirkin_: or you represent PendingChaos LLC or Inc or Corp ;)
23:15imirkin_: (or Plc or GmbH or whatever)
23:16pendingchaos: I don't think I've ever seen anyone in the mailing list use pseudonyms? though that does not mean they are not meant to be used
23:17imirkin_: i haven't looked in great detail, but your patches look fine at first glance. i'll have a more thorough look later.
23:17imirkin_: copyrights can't be assigned to pseudonyms
23:17imirkin_: i'm also not aware of any active mesa contributor who uses a pseudonym
23:18gyroninja: imirkin_: >copyrights can't be assigned to pseudonyms
23:18imirkin_: gyroninja: my general understanding of copyrights. nothing specific. also, IANAL
23:19imirkin_: all ownership tends to be in terms of legal entities
23:19imirkin_: so nothing wrong with assigning copyright to Secret Illegal Offshore Holdings Corp, as long as it's registered
23:20imirkin_: pendingchaos: and yes, i think that method is a mask method. i.e. the equivalent of nvkm_mask() in the kernel
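For readers unfamiliar with nvkm_mask(): it is a read-modify-write helper that clears the bits selected by a mask, ORs in the new value, and returns the previous register contents. The sketch below models that behaviour on a fake register file; FakeRegs and mask_reg are hypothetical names for illustration and say nothing about how the firmware method itself is implemented.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a device's MMIO register space.
struct FakeRegs {
    std::unordered_map<std::uint32_t, std::uint32_t> regs;

    std::uint32_t rd32(std::uint32_t addr) const {
        auto it = regs.find(addr);
        return it == regs.end() ? 0u : it->second;  // unwritten regs read as 0
    }
    void wr32(std::uint32_t addr, std::uint32_t value) { regs[addr] = value; }
};

// Read-modify-write in the style of nvkm_mask(): clear the masked bits,
// OR in the new value, and return the old register contents.
std::uint32_t mask_reg(FakeRegs &dev, std::uint32_t addr,
                       std::uint32_t mask, std::uint32_t value) {
    const std::uint32_t old = dev.rd32(addr);
    dev.wr32(addr, (old & ~mask) | value);
    return old;
}
```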
23:21skeggsb: is that extension something earlier GPUs would support too btw? (assuming we implemented the method in our FECS firmware?)
23:21imirkin_: but the FIRMWARE calls are
23:21skeggsb: ok, just curious :)
23:22imirkin_: and i've seen blob do firmware calls for some stuff
23:22imirkin_: on earlier hw
23:22skeggsb: yes, i know that, just curious about that particular feature
23:22imirkin_: ah yeah. that one's GM200+
23:22pendingchaos: seems relevant: https://law.stackexchange.com/questions/10910/how-to-express-copyright-when-you-use-a-pen-name
23:23gyroninja: >You can use a pseudonym for the claimant name
23:24skeggsb: this has been tested on nouveau btw?
23:24skeggsb: nvgpu code shows their gr setup providing an "access map" to FECS for regs allowed to be touched
23:24skeggsb: we don't do such a thing
23:24skeggsb: perhaps there's a default list in the fw image already though
23:24skeggsb: oh, there's also an "allow_all" option
23:26skeggsb: yeah, we'll be in the "allow all" mode from what i can tell
23:29imirkin_: "But be aware that if a copyright is held under a fictitious name, business dealings involving the copyrighted property may raise questions about its ownership"
23:29imirkin_: having bits of mesa copyrighted in such a way is likely to raise eyebrows
23:34imirkin_: wow, you even covered displaylists. impressive.
23:43pendingchaos: how does one reply to a message on a mailing list? does one simply reply to a message like with normal email?
23:43imirkin_: i tend to do reply-to-all
23:43imirkin_: but yeah
23:46imirkin_: and you even hit attribs. that's the easiest one to forget :)
23:51imirkin_: pendingchaos: others can test for you if you don't have easy access to hw
23:51imirkin_: don't forget to mail the piglits (firstname.lastname@example.org)
23:52pendingchaos: I'll start composing patches for piglit tomorrow (it's getting rather late here)
23:52imirkin_: ah ok. i figured you had them
23:53imirkin_: anyways, very strong initial patchset. very well done imo
23:55imirkin_: ah crap. you're right - i didn't see your no_error variants.
23:55imirkin_: they all looked so similar, my eyes glazed over them