03:45 rhyskidd: skeggsb: thanks for the comment on the DP code
03:45 rhyskidd: probably should try to get myself a copy of the current DP spec
11:04 karolherbst: pmoreau: nice, struct with global pointers example works :) https://github.com/karolherbst/HMM-examples/blob/master/fma_struct_ptrs.cl https://github.com/karolherbst/HMM-examples/blob/master/fma_struct_ptrs.h https://github.com/karolherbst/HMM-examples/blob/master/fma_struct_ptrs.cpp
11:04 karolherbst: pmoreau: imagine doing this without SVM :p
11:05 pmoreau: Cool! I gave up on getting something to show up on my computer. :-/ I’ll try again over the weekend.
11:05 karolherbst: you need my _v3 branch for this
11:05 karolherbst: and glisse's HMM kernel
11:06 karolherbst: pmoreau: git://people.freedesktop.org/~glisse/linux nouveau-hmm
11:06 pmoreau: Okay, and a Maxwell+ card for this as well?
11:07 pmoreau: How can you be using constexpr but still sticking with srand()? oO
11:07 karolherbst: no clue :D
11:08 karolherbst: mhh
11:08 pmoreau: Also, you are leaking memory! std::unique_ptr to the rescue :-D
11:08 karolherbst: I have no idea on what chipsets that HMM stuff works
11:08 karolherbst: I test on pascal
11:08 karolherbst: pmoreau: meh..
11:08 pmoreau: I didn’t know there was a std::fma
11:08 karolherbst: it's only a few ns until the resources are freed anyway
11:09 karolherbst: pmoreau: a memory leak is if you overwrite a pointer before calling free ;)
11:10 pmoreau: Well, not calling free at all is not better
11:10 karolherbst: well
11:10 karolherbst: freeing directly before returning from main is kind of pointless though
11:11 karolherbst: I mean sure, the process could get stuck somewhere
11:11 karolherbst: but this isn't production code either
11:11 karolherbst: so, meh
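pmoreau's two suggestions above — <random> instead of srand(), and std::unique_ptr instead of a bare allocation that never gets freed — fit in a few lines. A minimal sketch (not the actual fma_struct_ptrs.cpp; the function name is made up):

```cpp
#include <cstddef>
#include <memory>
#include <random>

// Fill a heap buffer with random floats in [0, 1). std::mt19937 replaces
// srand()/rand(), and std::unique_ptr frees the buffer automatically on
// every exit path -- there is no free() call to forget.
std::unique_ptr<float[]> make_random_buffer(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    auto buf = std::make_unique<float[]>(n);
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = dist(gen);
    return buf; // ownership moves to the caller
}
```

Same seed, same sequence — handy when you want to compare CPU and GPU results on identical inputs.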
11:11 pmoreau: Anyway, nice example
11:11 karolherbst: yeah
11:11 karolherbst: I tried some linked list thing
11:11 karolherbst: but this just blows up spirv_to_nir
11:11 karolherbst: and llvm-spirv
11:12 karolherbst: global struct pointers are kind of not working
11:12 karolherbst: within structs
14:14 Subv: do maxwell vertex shaders always access their input attributes' elements as if they were 32 bits, regardless of the actual input data size?
14:14 Subv: i have a shader with an input attr of gf100_vtxelt_size 8_8_8_8, but the shader accesses the elements with ld b32 $r4 a[0x90] 0x0 and so on
14:16 karolherbst: Subv: registers are 32 bit
14:16 karolherbst: although I am sure you can do 64 bit loads from a[] space
14:17 imirkin: Subv: the vertex data fetcher converts into 32-bit elements
14:17 karolherbst: but only if you also read a[0x94] does this actually make sense
14:17 imirkin: so if it's 8888_UNORM it'll convert into 4x fp32
14:17 imirkin: if it's 8888_UINT it'll convert into 4x u32
14:18 imirkin: there MAY be an option to disable the conversion... i think there might be for textures, at least. but i've never played around with such a thing.
14:18 karolherbst: imirkin: there wouldn't be much of a benefit to that, or would there?
14:19 Subv: i see, thanks
14:19 imirkin: can't really think of one, but that doesn't mean there isn't one :)
14:20 karolherbst: :)
15:44 karolherbst: imirkin: do you know if there are some differences in CPU vs Nvidia GPU fma?
15:45 karolherbst: i.e. are differences just super rare, or is it basically the same if the GPU fma is configured the right way?
15:45 imirkin_: things like mul are well-defined by ieee754
15:45 imirkin_: i'm not 100% sure that fma is equally well-defined
15:45 imirkin_: long story short - i don't know :)
15:45 karolherbst: well I am testing with 1k random values and can == compare the CPU and GPU result
15:45 imirkin_: i suspect they should be equivalent, assuming identical rounding settings
15:45 imirkin_: but the same could be said of "mul" as well
15:46 karolherbst: okay
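For the record, IEEE 754-2008 does define fusedMultiplyAdd as a correctly-rounded single operation, so a conforming CPU fma and GPU fma should agree bit-for-bit under the same rounding mode. What is not interchangeable is fma versus a separate mul-then-add — a small illustration in plain C++ (nothing nouveau-specific) of why an == comparison can fail between the two:

```cpp
#include <cmath>

// std::fma rounds once (IEEE 754 fusedMultiplyAdd); mul-then-add rounds
// twice. Keeping the product in a named variable forces the intermediate
// rounding, so the two functions can legitimately disagree.
double fused(double a, double b, double c) {
    return std::fma(a, b, c);
}

double unfused(double a, double b, double c) {
    double p = a * b; // first rounding happens here
    return p + c;     // second rounding here
}
```

With a = 1 + 2^-27, fused(a, a, -1.0) keeps the low product bit that unfused() loses in the intermediate rounding.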
16:07 lachs0r: https://0x0.st/sBN8.png oops. this happened after resuming from suspend (g218)
16:07 lachs0r: lots of errors like nouveau 0000:03:00.0: fb: trapped write at 006008f9b0 on channel 2 [1fb9e000 DRM] engine 0d [PCE0] client 13 [] subclient 02 [] reason 00000000 [PT_NOT_PRESENT]
16:08 lachs0r: but the system doesn’t seem to be unstable (so this is still better than nvidia’s proprietary driver)
16:11 karolherbst: pmoreau: is it even possible to do the fma_struct_ptr example without SVM?
16:11 karolherbst: pmoreau: afaik global pointers inside structs can't be filled
16:11 karolherbst: except by other kernels
16:12 imirkin_: lachs0r: PCE0 = copy engine 0. this is used for background copies by the ttm to move things in and out of vram.
16:12 imirkin_: skeggsb: --^
16:13 imirkin_: some kind of race between ttm and PTE population on resume?
16:14 imirkin_: [on a GT218 by the sounds of it... so not PCE0, that's just the message lying. probably PCOPY, same general idea though.]
16:14 imirkin_: lachs0r: did it fix itself after it had a chance to redraw?
16:17 lachs0r: imirkin_: no but restarting xorg did the trick… except now the kworker stuff is back and spams the kernel log with [20348.788272] [drm] Got external EDID base block and 1 extension from "edid" for connector "HDMI-A-1"
16:17 imirkin_: can't win 'em all
16:17 lachs0r: :D
16:19 lachs0r: oh looks like it settled
17:05 pendingchaos: imirkin_: I think I've finished implementing GL_NV_conservative_raster, GL_NV_conservative_raster_dilate and GL_NV_conservative_raster_pre_snap_triangles
17:05 pendingchaos: I also have tests for the three extensions though the GL_NV_conservative_raster test fails with the blob on a few subtests
17:05 pendingchaos: should I start composing patches for piglit and mesa?
17:06 imirkin_: do you understand why your test fails on blob?
17:06 imirkin_: is it a bug in their thing, or a misunderstanding on your part?
17:06 imirkin_: (or you don't know)
17:07 imirkin_: either way, yes, definitely put patches together for mesa :)
17:07 pendingchaos: It fails because the blob moves the vertices slightly
17:07 pendingchaos: it seems unrelated to conservative rasterization as results for one subtest differ from nouveau without it enabled
17:07 pendingchaos: I think I mentioned it sometime earlier
17:08 imirkin_: it always feels a bit ... wrong ... that the same person writes the tests and the implementation, but that's how it tends to happen in mesa
17:08 imirkin_: yeah, you did mention
17:08 imirkin_: you still don't know why it happens?
17:08 imirkin_: if you have a test that's sensitive to it, might be interesting to see what other hw does with it
17:08 pendingchaos: no, i don't
17:08 imirkin_: if you publish the test (or even just mail it out) you could request other people to provide reuslts on their hw
17:08 pendingchaos: the blob also seems to render it differently with an older version
17:09 imirkin_: huh. maybe they messed something up?
17:09 imirkin_: but it's not like i know of some "randomly shift vertices over" feature either
17:10 pendingchaos: how should the code be split into patches (e.g. each extension with its own patch or one patch for all three)?
17:10 imirkin_: generally it's split as (a) add core bits for extension (b) add gallium APIs / caps for extension (c) modify st/mesa to make use of these apis, and finally (d) modify driver to support new thing
17:11 pendingchaos: It should be simple to create a file usable with piglit's shader_runner that replicates one of the subtests that fails
17:11 pendingchaos: i mean how is the work split into threads on the mailing list
17:11 imirkin_: ah cool. that should make it easy to get people to think about it
17:12 imirkin_: well, once you have the patches, git format-patch --cover-letter origin.. and edit and send
17:12 imirkin_: as for doing the exts separately vs together, if they modify the same bits, then do them together
17:16 pendingchaos: ok, thanks
17:19 karolherbst: pendingchaos: I have no idea if anybody (including me) asked you if you are a student and might be interested in GSoC/EvoC. Seems like you are capable enough to do it :p
17:23 pendingchaos: I'm not a student
17:26 karolherbst: k, I hope I won't forget and ask again :D
17:30 karolherbst: pmoreau: as I understand stuff, every driver needs to support CL_DEVICE_SVM_COARSE_GRAIN_BUFFER for OpenCL 2.0, right?
17:30 karolherbst: it isn't enough to just do CL_DEVICE_SVM_FINE_GRAIN_SYSTEM
17:31 karolherbst: I am sure we could do noops for most of the functions and have a trivial svmAlloc implementation doing malloc
17:31 karolherbst: maybe
17:31 karolherbst: not quite sure, wondering what you think about that?
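A sketch of how that could look — names invented, not clover/gallium code: on a device with fine-grain system SVM, any host allocation is already GPU-visible, so the mandatory coarse-grain-buffer entry points can be backed by ordinary aligned allocation, and the map/unmap hooks become no-ops:

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Illustration only of the "svmAlloc -> malloc, map/unmap -> noop" idea.
namespace svm_sketch {

// clSVMAlloc analogue: with fine-grain system SVM the GPU can access any
// host pointer, so an aligned host allocation is sufficient.
void *svm_alloc(std::size_t size, std::size_t alignment) {
    void *p = nullptr;
    if (posix_memalign(&p, alignment, size) != 0)
        return nullptr;
    return p;
}

void svm_free(void *ptr) { std::free(ptr); }

// clEnqueueSVMMemcpy analogue: collapses to a plain memcpy (command-queue
// ordering still has to be respected by the caller).
void svm_memcpy(void *dst, const void *src, std::size_t size) {
    std::memcpy(dst, src, size);
}

// clEnqueueSVMMap / clEnqueueSVMUnmap analogues: no-ops, the memory is
// always mapped on this kind of device.
void svm_map(void *) {}
void svm_unmap(void *) {}

} // namespace svm_sketch
```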
19:15 moben: Is reclocking supposed to work on nva8 by now? xorg locks up here when I try it :(
19:15 moben: dmesg | grep nouveau: https://paste.pound-python.org/raw/WyBrSjtIORjdnMS4gKTr/
19:16 imirkin_: yes, it is.
19:16 imirkin_: obviously something gets messed up though
19:16 imirkin_: does this happen with all perf levels?
19:18 moben: 03 and 07 seemed to work but I didn't test as heavily because I wanted more fps for them games. Can try again though
19:18 moben: 0f locks up
19:18 imirkin_: and nothing in between presumably
19:18 imirkin_: the actual messages right after reclock would be useful
19:18 imirkin_: what you have shown are not that interesting
19:18 imirkin_: it's just generic pile-o-fail type messages
19:19 moben: any additional debugging options I could/should turn on?
19:19 karolherbst: moben: could you cat the pstate file after reclocking? sometimes the set clocks are total garbage compared to what nouveau tried to set
19:20 karolherbst: I saw weird stuff like that on some Tesla cards actually
19:21 karolherbst: moben: otherwise creating a mmiotrace with the nvidia driver might help
19:21 karolherbst: and having your vbios
19:21 moben: karolherbst: https://paste.pound-python.org/raw/txAnD7jPdYOT1dRVPDuw/
19:21 karolherbst: this should help us figure out what we do wrong
19:21 karolherbst: okay, at least that seems fine
19:21 moben: karolherbst: vbios is here, https://bugs.freedesktop.org/attachment.cgi?id=112241 (from an earlier report)
19:22 moben: this is a mobile Quadro NVS 3100M btw
19:23 imirkin_: i think tesla reclocking has had a lot less testing than kepler, unfortunately
19:23 imirkin_: and regressions can sneak in too
19:25 karolherbst: imirkin_: the PM_Mode table is kind of weird
19:25 imirkin_: you know about 100x more about reclocking than i do
19:25 karolherbst: engine entry 7/8 are set to 0MHz for 0x3 and 0x5, but to some clocks at 0xf
19:25 karolherbst: any idea what that's about?
19:26 imirkin_: all i know about it is that it has something to do with clocks.
19:26 imirkin_: ;)
19:26 karolherbst: :D
19:26 karolherbst: k
19:26 karolherbst: moben: so yeah, mmiotraces should help
19:27 karolherbst: moben: one with nouveau with setting marks before/after the reclock
19:27 karolherbst: and one with nvidia where you check via nvidia-settings that it drops to the lowest clocks and clock to max from there
19:27 karolherbst: there is a performance mode you set to max perf or something
19:28 karolherbst: it is usually set to adaptive
19:28 karolherbst: moben: what kernel are you on by the way?
19:28 moben: alright, I'll try to get the nvidia driver running
19:29 moben: karolherbst: 4.14.27
19:29 karolherbst: that is reasonably new
19:30 moben: I also have a /sys/kernel/debug/dri/128 in addition to /sys/kernel/debug/dri/0, both have the same 'name' but only the latter has outputs. Idk if that is expected, seemed weird to me
19:30 karolherbst: wow, my mmiotrace fix landed in 4.14.22 already
19:30 imirkin_: moben: expected
19:31 imirkin_: it's based on dri nodes, and there's a card0 and renderD128
21:44 imirkin_: pendingchaos: btw, if you wanted to beat your head against the wall but implement a potentially useful thing, you'd figure out how to get ARB_transform_feedback_overflow_query implemented
21:44 imirkin_: the trouble is that the hw provides the per-stream queries, but there is no way (that i know of) to do the non-per-stream version
21:45 imirkin_: so one needs to do something clever. i haven't quite figured out what that might be.
21:45 imirkin_: the reason this is useful is that it's part of GL 4.6 core features
21:47 pmoreau: karolherbst: Right, that’s my understanding of the spec. (re: “every driver needs to support CL_DEVICE_SVM_COARSE_GRAIN_BUFFER for OpenCL 2.0”)
21:58 karolherbst: pmoreau: k, thanks for confirming
21:59 karolherbst: pmoreau: so we can just do clSVMAlloc -> malloc, and have clEnqueueSVMMemcpy do a simple memcpy or something, right?
22:01 pmoreau: You might be able to get away with it like that
22:01 pmoreau: (and have the map/unmap operations be nops)
22:02 HdkR: When working with Nvidia hardware, crying is always an option. :)
22:02 HdkR: Whoa, my scrollback was way gone
22:02 pmoreau: :-D
22:03 HdkR: Still relevant to current discussion ;)
22:04 imirkin_: every discussion, really
22:06 lachs0r: seems like it’s going to be the only option if their proprietary drivers keep getting worse at the current pace
22:07 lachs0r: so many issues recently, it’s not even funny
22:07 lachs0r: soon it’s going to be the new fglrx
22:22 karolherbst: pmoreau: yeah, that was my thought. But that means we need more complex CAPs in gallium, so that devices can say: I implement SVM 1, 2 and 3, so I don't want to map all ops to 3
22:22 karolherbst: and if a device can only do 3, we emulate everything
22:22 karolherbst: or do noops
22:23 karolherbst: in the end it only comes down to whether the application uses clEnqueueSVMMemcpy rather than writing into the result of svmAlloc/malloc directly
22:24 karolherbst: pmoreau: the enqueue functions are a bit hard to deal with though. There is clEnqueueSVMFree and clSVMFree :(
22:24 karolherbst: and I guess we have to still keep the ordering with the enqueue functions
22:25 karolherbst: so we would end up doing two copies with clEnqueueSVMMemcpy, because it could refer to stack memory
22:25 karolherbst: stack memory is a big problem anyway with those enqueue commands
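The stack-memory hazard with a non-blocking clEnqueueSVMMemcpy is the usual async-copy problem: the source pointer may point into a stack frame that is gone by the time the queued command executes. Hence the two copies karolherbst mentions — stage the source into owned memory now, do the real copy later. A toy illustration with hypothetical names (no real CL queue involved):

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// A fake command queue: enqueued operations run only when flush() is
// called, like a deferred clEnqueueSVMMemcpy.
struct FakeQueue {
    std::vector<std::function<void()>> ops;
    void flush() {
        for (auto &op : ops)
            op();
        ops.clear();
    }
};

// First copy: stage `src` into an owned buffer immediately, so the caller's
// stack frame may die. Second copy: the queued memcpy into `dst` later.
void enqueue_svm_memcpy(FakeQueue &q, void *dst, const void *src,
                        std::size_t n) {
    std::vector<unsigned char> staged(
        static_cast<const unsigned char *>(src),
        static_cast<const unsigned char *>(src) + n);
    q.ops.push_back([dst, staged]() {
        std::memcpy(dst, staged.data(), staged.size());
    });
}
```

Without the staging buffer, the deferred lambda would dereference a dangling stack pointer.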
22:49 moben: karolherbst: sent nouveau and nvidia mmiotrace, let me know if I fucked it up :)
22:50 karolherbst: moben: where to?
22:55 moben: "mmio dot dumps at gmail dot com" as the wiki says
22:55 karolherbst: ahh okay
22:55 karolherbst: I will look into that tomorrow
22:55 karolherbst: quite late here
22:56 moben: same here. thank you for taking a look
23:13 pendingchaos: imirkin_: patches sent. this was the first time I ever sent patches to a mailing list. hopefully I got it right
23:13 pendingchaos: I might try implementing ARB_transform_feedback_overflow_query sometime
23:13 imirkin_: pendingchaos: missing a name
23:13 imirkin_: (actually i dunno what mesa's policy is on pseudonyms)
23:14 imirkin_: looks like you got it mostly right though, well done :)
23:14 pendingchaos: missing one where?
23:14 imirkin_: on the patches.
23:14 skeggsb: wonders how secure that firmware method is
23:14 imirkin_: and in the copyright header... unless your legal name is 'PendingChaos'
23:15 skeggsb: that looks like a register number being poked in there
23:15 imirkin_: skeggsb: we have their firmware, we could look
23:15 pendingchaos: ah forgot about the copyright thing
23:15 skeggsb: i don't feel like reading falcon assembly today :P
23:15 pendingchaos: I wasn't sure if it was meant to be one's legal name or not so I left it for later
23:15 imirkin_: or you represent PendingChaos LLC or Inc or Corp ;)
23:15 imirkin_: (or Plc or GmbH or whatever)
23:16 pendingchaos: I don't think I've ever seen anyone on the mailing list use pseudonyms? though that does not mean they are not meant to be used
23:17 imirkin_: i haven't looked in great detail, but your patches look fine at first glance. i'll have a more thorough look later.
23:17 imirkin_: copyrights can't be assigned to pseudonyms
23:17 imirkin_: i'm also not aware of any active mesa contributor who uses a pseudonym
23:18 gyroninja: imirkin_: >copyrights can't be assigned to pseudonyms
23:18 gyroninja: Source?
23:18 imirkin_: gyroninja: my general understanding of copyrights. nothing specific. also, IANAL
23:19 imirkin_: all ownership tends to be in terms of legal entities
23:19 imirkin_: so nothing wrong with assigning copyright to Secret Illegal Offshore Holdings Corp, as long as it's registered
23:20 imirkin_: pendingchaos: and yes, i think that method is a mask method. i.e. the equivalent of nvkm_mask() in the kernel
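A mask method is a read-modify-write on a register: clear the bits selected by the mask, then OR in the new data. A trivial sketch of just the value computation (the kernel's nvkm_mask() wraps the same expression around an MMIO read/write pair; the function name here is made up):

```cpp
#include <cstdint>

// Read-modify-write helper: bits set in `mask` are cleared from the old
// register value, then `data` is ORed in -- the caller is trusted to keep
// `data` within `mask`, as nvkm_mask() does in the kernel.
uint32_t mask_value(uint32_t old_val, uint32_t mask, uint32_t data) {
    return (old_val & ~mask) | data;
}
```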
23:21 skeggsb: is that extension something earlier GPUs would support too btw? (assuming we implemented the method in our FECS firmware?)
23:21 imirkin_: no
23:21 imirkin_: but the FIRMWARE calls are
23:21 skeggsb: ok, just curious :)
23:22 imirkin_: and i've seen blob do firmware calls for some stuff
23:22 imirkin_: on earlier hw
23:22 skeggsb: yes, i know that, just curious about that particular feature
23:22 imirkin_: ah yeah. that one's GM200+
23:22 pendingchaos: seems relevant: https://law.stackexchange.com/questions/10910/how-to-express-copyright-when-you-use-a-pen-name
23:23 gyroninja: https://www.copyright.gov/fls/fl101.pdf
23:23 gyroninja: >You can use a pseudonym for the claimant name
23:24 skeggsb: this has been tested on nouveau btw?
23:24 skeggsb: nvgpu code shows their gr setup providing an "access map" to FECS for regs allowed to be touched
23:24 skeggsb: we don't do such a thing
23:24 skeggsb: perhaps there's a default list in the fw image already though
23:24 skeggsb: oh, there's also an "allow_all" option
23:26 skeggsb: yeah, we'll be in the "allow all" mode from what i can tell
23:29 imirkin_: "But be aware that if a copyright is held under a fictitious name, business dealings involving the copyrighted property may raise questions about its ownership"
23:29 imirkin_: having bits of mesa copyrighted in such a way is likely to raise eyebrows
23:34 imirkin_: wow, you even covered displaylists. impressive.
23:43 pendingchaos: how does one reply to a message on a mailing list? does one simply reply to a message like with normal email?
23:43 imirkin_: i tend to do reply-to-all
23:43 imirkin_: but yeah
23:46 imirkin_: and you even hit attribs. that's the easiest one to forget :)
23:51 imirkin_: pendingchaos: others can test for you if you don't have easy access to hw
23:51 imirkin_: don't forget to mail the piglits (piglit@lists.freedesktop.org)
23:52 pendingchaos: I'll start composing patches for piglit tomorrow (it's getting rather late here)
23:52 imirkin_: ah ok. i figured you had them
23:53 imirkin_: anyways, very strong initial patchset. very well done imo
23:54 pendingchaos: thanks
23:55 imirkin_: ah crap. you're right - i didn't see your no_error variants.
23:55 imirkin_: they all looked so similar, my eyes glazed over them