00:00 imirkin: also if you could test that viewport_relative is working as expected, that'd be nice
00:00 imirkin: e.g. if you have something already which would be sensitive to that
00:00 fincs: Heh, that needs me to implement a test that uses layered framebuffer
00:00 imirkin: yea
00:00 imirkin: don't worry about it
00:00 imirkin: i have to write piglits anyways
00:00 fincs: Maybe I should do the cubemap thing once and for all
00:01 fincs: But I am pretty sure that reg controls viewport relative
00:01 fincs: Like, really really sure
00:01 imirkin: i'm sure you're right
00:01 imirkin: but if you already had a program that used it, would be nice to have confirmation
00:01 imirkin: if not, then not
00:02 fincs: I made the blob compile a program without viewport_relative, certain flag was 0; then another program with viewport_relative, flag was 1
00:02 fincs: And then the code reads that flag and puts it into that reg
00:03 imirkin: 20.0-branchpoint happened jan 24, so i'd expect a couple of weeks and then the 20.1 branchpoint happens
00:05 imirkin: fincs: btw, what's the main use-case for switch for this stuff?
00:05 fincs: Using OpenGL and porting everything under the sun to the Switch
00:05 imirkin: someone writes a game in GL for the switch?
00:05 fincs: More like, someone ports an existing game/emulator/etc to the Switch
00:06 imirkin: i.e. is it likely to use those NV_* exts
00:06 fincs: Who knows
00:06 imirkin: fair enough
00:06 imirkin: so same as on nouveau, basically :)
00:06 imirkin: except no one uses nouveau
00:06 fincs: Except everyone uses nouveau on Switch :p
00:06 imirkin: right
00:07 fincs: We even have SDL2 hooked up to this thing
00:19 karolherbst: huh.. where do we allocate the copy class.. mhh
00:21 karolherbst: ohhh there
00:21 karolherbst: magic
00:21 karolherbst: huh
00:22 karolherbst: fincs: now that I look at it.. your copy class binding patch looks like it would be required but.. heh :D
00:22 fincs: ( ͡° ͜ʖ├┬┴┬┴
00:24 imirkin: binding things is no longer required on kepler+ iirc
00:24 imirkin: things have fixed places
00:24 karolherbst: imirkin: what are you saying now about this: https://gist.githubusercontent.com/karolherbst/000ff42968129c5c37b93178b731d420/raw/a828d38d08ea5c2e0bfc348482c57e02e11ea801/gistfile1.txt
00:24 karolherbst: :)
00:25 karolherbst: slowly that tool becomes useful :)
00:25 imirkin: _much_ better
00:25 imirkin: i think adding = in there would make it much better
00:25 karolherbst: ahh yeah
00:25 imirkin: also the whole sub-chN is super-verbose, how about just like [N] or something
00:25 imirkin: try to match what demmt does
00:26 imirkin: i think demmt does it quite well
00:26 karolherbst: https://gist.githubusercontent.com/karolherbst/d25c08f0b08231d1f2696c0241803d9e/raw/ede55c79aa4593ad0b4a4fb4fc1ee5c233d7bf6d/gistfile1.txt
00:27 imirkin: much more readable already, imo
00:27 karolherbst: 7 is SW btw.. still need to add this as well
00:27 imirkin: SW isn't like a defined thing though
00:27 imirkin: i.e. sw methods are ... well ... software
00:27 karolherbst: yeah...
00:27 karolherbst: I doubt rnndb will help us here, since we put the nv stuff there, no?
00:28 karolherbst: or do we try to match them?
00:28 imirkin: could
00:28 karolherbst: the class name is still missing the color.. how to get this done properly.. mhh
00:29 karolherbst: now the code got a little messy though :/ https://github.com/envytools/envytools/pull/203/files
00:29 karolherbst: I wish there were an easier way to define the classes
00:29 karolherbst: it's not that bad though
00:30 imirkin: check demmt, you put funny chars in to tell the term to flip colors
00:30 karolherbst: I know.. but it's not easy to extract the added variants...
00:30 karolherbst: I think demmt cheats here a lot
00:30 imirkin: who doesn't like cheating
00:31 karolherbst: ahh yeah
00:31 karolherbst: sprintf(dec_obj, "%s%s%s", colors->rname, obj->desc, colors->reset); :)
00:31 karolherbst: classic
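The demmt trick quoted above just brackets the name with one escape sequence that switches a colour on and another that resets it. A minimal sketch, assuming a `struct colors` with the same `rname`/`reset` fields as the snippet (the struct, its initialisers and `format_obj()` are hypothetical, and demmt's real colour tables may differ). It uses snprintf rather than sprintf so a short buffer cannot overflow:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical colour table with the two fields the demmt snippet uses. */
struct colors {
    const char *rname;  /* escape sequence that switches the colour on */
    const char *reset;  /* escape sequence that restores the default */
};

/* ANSI SGR sequences: 31 = red foreground, 0 = reset all attributes. */
static const struct colors term_colors = { "\x1b[31m", "\x1b[0m" };
/* Empty sequences, e.g. for output that is not going to a terminal. */
static const struct colors no_colors = { "", "" };

/* Wrap an object/class description in colour escapes, demmt style.
 * Returns what snprintf reports: the length of the full formatted string. */
int format_obj(char *buf, size_t len, const struct colors *c, const char *desc)
{
    assert(buf != NULL && c != NULL && desc != NULL);
    return snprintf(buf, len, "%s%s%s", c->rname, desc, c->reset);
}
```

Keeping the escapes in a table makes it trivial to fall back to `no_colors` when output is piped to a file, which also sidesteps the "pastebins don't render colour codes" problem.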
00:31 imirkin: :)
00:34 karolherbst: nice
00:34 karolherbst: that works though
00:35 imirkin: like i said - cheating is best
00:41 karolherbst: why is there no working pastebin tool which supports color codes :D
00:41 karolherbst: the heck
00:41 karolherbst: nvm
00:42 imirkin: there's that terminal capture thing
00:42 karolherbst: mhhh
00:42 imirkin: which presents in a youtube-style player, but it's all text
00:44 imirkin: karolherbst: http://showterm.io/
00:44 karolherbst: mhhh
00:45 karolherbst: https://showterm.io/84f3f1c91e06b2c8e0e03
00:45 karolherbst: mhhh
00:45 karolherbst: dunno if I like this :D
00:46 imirkin: you can scroll around
00:46 imirkin: i think it's fine
00:46 karolherbst: could be better
00:46 imirkin: go to the end (click "stop")
00:46 karolherbst: why doesn't gist just support that :D
00:46 karolherbst: they support everything else
00:46 imirkin: yea dunno
00:46 imirkin: gah, page up/down don't work
00:48 imirkin: yeah, it's made more for recording terminal
00:48 karolherbst: why is the subchan mapping different in nv50 :D
00:48 karolherbst: how annoying :D
00:48 imirkin: it's different on fermi too
00:48 imirkin: i think
00:48 karolherbst: no
00:48 imirkin: maybe not in any substantial way though
00:48 imirkin: the video stuff is done differently
00:49 karolherbst: nv50/nv50_winsys.h vs nvc0/nvc0_winsys.h
00:49 karolherbst: the SUBC_ macros
00:49 imirkin: ya
00:49 imirkin: ideally you should look for the object bindings
00:49 karolherbst: well
00:49 karolherbst: how :p
00:49 imirkin: i.e. whatever class is written to method 0 is the right one
00:49 imirkin: iirc we bind things
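Instead of hardcoding the SUBC_ macros, the subchannel-to-class mapping can be recovered by scanning the pushbuffer for writes to method 0x0000 (NV01_SUBCHAN_OBJECT). The header decoding below assumes the Fermi+ layout as understood from Mesa's NVC0_FIFO_PKHDR_* macros in nvc0_winsys.h (SQ, NI, IL and 1I packets). `pkhdr()`, `scan_bindings()` and the class values 0xa097/0xa0b5 used in the usage are illustrative, not envytools code. As noted later in the log, Kepler+ keeps some subchannels at fixed places, so a trace may contain no bind for them at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SUBC 8

/* Assumed header layout: bits 31:29 type, 28:16 count (or inline data),
 * 15:13 subchannel, 12:0 method offset >> 2. */
enum pk_type { PK_SQ = 1, PK_NI = 3, PK_IL = 4, PK_1I = 5 };

uint32_t pkhdr(unsigned type, unsigned subc, unsigned mthd, unsigned count)
{
    return ((uint32_t)type << 29) | ((uint32_t)count << 16) |
           ((uint32_t)subc << 13) | ((uint32_t)mthd >> 2);
}

/* Method offset targeted by the k-th data word of a packet. */
static unsigned word_mthd(unsigned type, unsigned mthd, unsigned k)
{
    switch (type) {
    case PK_SQ: return mthd + 4 * k;              /* increments every word */
    case PK_1I: return k == 0 ? mthd : mthd + 4;  /* increments once */
    default:    return mthd;                      /* PK_NI: never increments */
    }
}

/* Record in classes[subc] the last class written to method 0x0000 on each
 * subchannel; 0 means "no bind seen in this buffer". */
void scan_bindings(const uint32_t *pb, size_t n, uint32_t classes[NUM_SUBC])
{
    assert(pb != NULL || n == 0);
    for (unsigned s = 0; s < NUM_SUBC; s++)
        classes[s] = 0;

    size_t i = 0;
    while (i < n) {
        uint32_t h = pb[i++];
        unsigned type  = h >> 29;
        unsigned count = (h >> 16) & 0x1fff;
        unsigned subc  = (h >> 13) & 0x7;
        unsigned mthd  = (h & 0x1fff) << 2;

        if (type == PK_IL) {             /* value is inline, no data words */
            if (mthd == 0)
                classes[subc] = count;
            continue;
        }
        if (type != PK_SQ && type != PK_NI && type != PK_1I)
            continue;                    /* unhandled header: skipped in this sketch */

        for (unsigned k = 0; k < count && i < n; k++, i++)
            if (word_mthd(type, mthd, k) == 0)
                classes[subc] = pb[i];
    }
}
```

For a partial dump (like the last pushbuffer before a hang) the scan may find nothing, which is where a hardcoded fallback table still earns its keep.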
00:49 karolherbst: thing is, I also want to be able to use the parser when you trace a game
00:49 karolherbst: and just put in the last pushbuffer
00:49 karolherbst: because you debug a hang
00:49 imirkin: yeah
00:50 karolherbst: so.. I rather just hardcode that stuff
00:50 karolherbst: less trouble
00:50 imirkin: k
00:50 karolherbst: well, the idea is to just dump in the pushbuffer we get on errors
00:50 imirkin: should allow arguments then
00:50 karolherbst: like when submission fails
00:50 imirkin: since blob might do whatever
00:50 karolherbst: no, that's just for nouveau :p
00:51 imirkin: ok
00:51 karolherbst: the idea was to make use of that synced pushbuffer features
00:51 karolherbst: *feature
00:51 karolherbst: so a user can say "FORCE_PB_SYNC=1"
00:51 imirkin: ok
00:51 karolherbst: and the pushbuffer printed when submission fails is the one messing stuff up :p
00:51 karolherbst: maybe it helps, maybe not
00:51 karolherbst: dunno
00:51 karolherbst: just thought that might be useful to have
00:59 karolherbst: ohh there is a gt215 compute class, interesting
01:12 imirkin: karolherbst: and a G80 one...
01:12 imirkin: compute has been around since then
01:12 karolherbst: yeah...
01:12 karolherbst: but we don't really use all of them
01:12 karolherbst: so dunno if it actually makes a difference that much
01:12 karolherbst: but I guess some hw supports multiple classes
01:13 karolherbst: but then.. how does the copy stuff even work
01:14 karolherbst: imirkin: so.. we have those NV01_SUBCHAN_OBJECT methods to bind an object to a subchannel, right?
01:14 karolherbst: but we really only bind the COPY one for 0xeX GPUs
01:14 imirkin: we should bind everything...
01:16 karolherbst: imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c#n1133
01:16 karolherbst: but we use NVF0_P2MF_CLASS on newer GPUs
01:16 imirkin: o
01:16 imirkin: that's ... probably an oversight
01:16 karolherbst: yeah..
01:16 karolherbst: why does it work though
01:17 karolherbst: fincs needed to patch that on the switch
01:17 imirkin: because it's probably not necessary on nvf0+
01:17 karolherbst: but.. how does the hw know what's bound on that subchan id
01:17 fincs: https://github.com/devkitPro/mesa/commit/20ca89912d8f234472a0cea2409b826502d0b9de
01:17 imirkin: it's fixed.
01:17 karolherbst: ahh
01:17 fincs: And I just realized the commit msg is wrong, it should be p2mf not m2mf
01:17 karolherbst: I thought it's more or less dynamic
01:17 imirkin: on fermi, yes
01:18 imirkin: on kepler+, no
01:18 karolherbst: ohhh
01:18 karolherbst: interesting
01:18 karolherbst: so why would the nvidia blob care on the switch then
01:18 karolherbst: or the hw or whatever
01:18 fincs: All I know is, without this commit, copy engine stuff is broken
01:18 imirkin: dunno. perhaps there's more to it.
01:18 karolherbst: probably
01:20 karolherbst: imirkin: btw, do we allocate the CP subchan dynamically? don't see it in your trace
01:20 imirkin: i had a lot of stuff commented out =]
01:20 imirkin: including compute.
01:20 karolherbst: ahhh
01:20 karolherbst: I see
03:08 imirkin: fincs: fyi, looks like success with the viewport-relative thing
03:17 imirkin: next stop, passthrough GS.
03:17 imirkin: gotta catch up on other things though, so not for a bit.
03:23 HdkR: Who needs passthrough GS anymore when you can just convert to a mesh shader right? :P
03:30 imirkin: that's a lot more typing.
03:30 imirkin: not to mention, not available everywhere
03:31 HdkR: It's been long enough, it's time to drop support for pre-Turing. Holding back innovative features </s>
03:32 imirkin: =]
03:32 imirkin: we don't even have a compiler for volta+ :)
03:33 imirkin: does TU11x have mesh shaders btw?
03:33 HdkR: yea
03:33 HdkR: TU11x just drops the RT and Tensor bits
03:33 imirkin: RT?
03:33 HdkR: ray tracing
03:33 imirkin: i thought that was mesh
03:33 HdkR: RT != Mesh
03:33 HdkR: They are independent features
03:34 HdkR: There's also questions about how one would use mesh shading with RT. Don't think it has been answered
03:35 HdkR: GPU generated polygons don't exist in the BVH structure, how do you intersect test them?
03:42 imirkin: ah ok
03:42 imirkin: it wasn't clear to me that the raytracing stuff was actual help for raytracing
03:42 imirkin: i never looked into the details
03:43 imirkin: i did see mesh shaders and assumed that was the lot of it
03:43 HdkR: Yea, gives you BVH and triangle intersection testing acceleration
03:43 imirkin: BVH sounds nice
03:45 HdkR: One problem is that it doesn't accelerate BVH generation/updating so RT with super dynamic scenes /hurt/
03:45 imirkin: heh ok
03:46 HdkR: PowerVR had that bit accelerated. Called something like scene conversion
04:29 imirkin: oh annoying. GLES3 requires xfb pause/resume support
04:29 imirkin: which is not available on the first half of the tesla gen
06:23 imirkin: skeggsb: i'm seeing a test which allocates a 4096x4096(x4 bytes) depth texture, and then clears it. then deletes it. then allocates an identical texture, and clears again.
06:23 imirkin: skeggsb: the second clear fails with
06:24 imirkin: fb: trapped write at 002add5500 on channel 2 [0faac000 glcts[20359]] engine 00 [PGRAPH] client 0b [PROP] subclient 08 [ZETA] reason 00000002 [PAGE_NOT_PRESENT]
06:24 imirkin: the address happens to be 0x1555500 from the allocation. which strikes me as an odd number.
06:24 imirkin: as a result of deleting and reallocating the same size texture, the base address stays the same
06:24 imirkin: this is on nv50 btw (well, g84)
06:26 imirkin: hm, actually i lied. the base address does not stay the same. but in other cases it does.
06:27 imirkin: could there be a race with moving bo's into vram?
06:27 imirkin: this is using the CRYPT method for buffer copies
06:28 imirkin: although it seems unlikely that it would have gotten evicted in the first place...
06:28 imirkin: dunno. it has 256MB vram, each one of these textures is 64MB + 64MB for color, yeah ok, i guess it could get tight
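The sizes in this exchange work out as follows: one 4096x4096 surface at 4 bytes per pixel is 64 MiB, so the depth texture plus the matching colour buffer is 128 MiB, half of the 256 MiB of VRAM. If the deleted pair had not actually been released when the identical pair was allocated (an assumption, not something the log shows), the two pairs together would need all 256 MiB. The trapped address 0x2add5500 minus the 0x1555500 offset puts the allocation at 0x29880000, and 0x1555500 sits roughly one third of the way into the 64 MiB surface. The helpers below are hypothetical and ignore tiling padding:

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ull * 1024ull)

/* Bytes for a linear w x h surface with bpp bytes per pixel; the real
 * allocator adds tiling/alignment padding on top of this. */
uint64_t surface_bytes(uint64_t w, uint64_t h, uint64_t bpp)
{
    return w * h * bpp;
}

/* Base of the buffer, given a faulting address and its offset into it. */
uint64_t alloc_base(uint64_t fault_addr, uint64_t offset)
{
    assert(offset <= fault_addr);
    return fault_addr - offset;
}
```

Comparing `alloc_base()` across runs is a quick way to tell whether the reallocated texture really landed at the same address as the freed one.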
09:58 karolherbst: finally Ofast binaries and the CTS is _soooo_ much faster already
10:48 RSpliet: karolherbst: https://patchwork.kernel.org/project/alsa-devel/list/?series=269931
10:48 karolherbst: :)
10:48 karolherbst: do those patches help?
10:50 RSpliet: The top one is mine ;-)
10:50 RSpliet: Yep, they solve the problem
10:51 karolherbst: cool
11:39 fincs: "fyi, looks like success with the viewport-relative thing" <-- :)
11:39 fincs: "when you can just convert to a mesh shader right" <-- Not everyone can afford the luxury of owning a Turing card lmao
11:40 fincs: Plus, I like my Switch (Tegra X1) and I want to make full use of its GPU
11:40 karolherbst: I just upgraded the kernel on my jetson nano to 5.6.3 :)
11:40 karolherbst: but I think there is still plenty of stuff wrong with the dts stuff
11:41 karolherbst: https://gist.githubusercontent.com/karolherbst/332b2eca147b2e5526fb9c0b8ea071ff/raw/01f524217d6fe66e8caea76462a1f7b39ec69e57/gistfile1.txt
11:41 karolherbst: those gpio errors and the therm stuff especially
11:41 fincs: Is that TX1?
11:41 karolherbst: jetson nano .p
11:41 karolherbst: :p
11:41 fincs: "NVIDIA Maxwell™ architecture with 128 NVIDIA CUDA® cores"
11:42 karolherbst: "Machine model: NVIDIA Jetson Nano Developer Kit"
11:42 karolherbst: the jetson nano is soo tiny
11:42 karolherbst: it's essentially just a so-dimm
11:42 fincs: Looks like a Pi going by photos
11:42 karolherbst: smaller
11:42 karolherbst: what you see is the dev board :D
11:42 fincs: Cute heatsink
11:42 karolherbst: under the heatsink is the nano
11:43 karolherbst: it's really just the so-dimm
11:43 fincs: I guess if it's a SoC then it can be really really tiny
11:43 karolherbst: yeah
11:43 karolherbst: the dev board just adds some luxuray like all the prots :p
11:43 karolherbst: *ports
11:44 karolherbst: *luxury
11:44 karolherbst: it even has a mpci slot I think
11:44 karolherbst: uhm
11:44 karolherbst: how is it called
11:44 karolherbst: M.2 Key E
11:44 karolherbst: that's the stuff
11:49 karolherbst: ufff
11:49 karolherbst: X server fails to start
11:51 karolherbst: yeah uhhh.. something is odd with the tegradrm stuff
11:52 karolherbst: ohh shit kernel/dma/swiotlb.c:683
12:44 karolherbst: need to go back to 5.4 due to kernel bugs.. *sigh*
12:46 fincs: Hmm, trying to figure out why upgrading mesa made the final compiled binary grow by almost 900KB
12:46 karolherbst: libnir?
12:46 fincs: Yeah that's disabled
12:46 karolherbst: mhh
12:47 fincs: I don't see anything obvious in the symbol list file
12:47 karolherbst: probably other stuff, we do add tons of features over time
12:47 karolherbst: what was the version jump?
12:47 fincs: Sure but... 900KB worth of new features?
12:47 fincs: 19.0.8 -> master + the viewport_array2 branch
12:47 karolherbst: mhh, that's a big jump
12:47 fincs: Yeah it is
12:47 karolherbst: there were some 4.6 stuff going on
12:47 karolherbst: especially the spirv stuff
12:47 fincs: spirv stuff is also disabled
12:49 karolherbst: looking through the new features
12:51 karolherbst: fincs: mind checking which .a file increased in size that much? I kind of expect gallium to have the biggest impact here
12:51 fincs: Everything is within libEGL.a :p
12:51 karolherbst: yeah sure.. but you still have the build dir
12:52 fincs: For the old mesa I don't
12:52 fincs: That hasn't survived several OS reinstallations + a computer swap :p
12:52 karolherbst: you can still build the old version I assume :p
12:53 fincs: I have the symbol list file for a binary compiled with the old version of mesa
12:53 karolherbst: find build -iname '*.a' -exec du -h {} +
12:53 fincs: Maybe I can write a script that checks each symbol and its size with the new symbol list file
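Comparing the two symbol lists can be sketched as below. The input format is an assumption: one `NAME SIZE` pair per line with the size in decimal bytes, which would need reshaping from whatever the toolchain's map file or `nm --print-size` output looks like. `struct sym`, `parse_syms()` and `total_growth()` are hypothetical helpers; the report prints every symbol whose size changed, so a cluster of new functions from one feature would stand out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol-list entry: "NAME SIZE" per line, size in bytes. */
struct sym {
    char name[128];
    unsigned long size;
};

/* Parse up to max entries from a NUL-terminated buffer; returns the count.
 * Lines that do not match the format (including blank lines) are ignored. */
size_t parse_syms(const char *text, struct sym *out, size_t max)
{
    assert(text != NULL && out != NULL);
    size_t n = 0;
    char line[256];
    const char *p = text;
    while (*p != '\0' && n < max) {
        size_t len = strcspn(p, "\n");            /* length of this line */
        size_t copy = len < sizeof line - 1 ? len : sizeof line - 1;
        memcpy(line, p, copy);
        line[copy] = '\0';
        if (sscanf(line, "%127s %lu", out[n].name, &out[n].size) == 2)
            n++;
        p += len;
        if (*p == '\n')
            p++;
    }
    return n;
}

static const struct sym *find_sym(const struct sym *v, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(v[i].name, name) == 0)
            return &v[i];
    return NULL;
}

/* Print per-symbol size changes and return the net growth in bytes.
 * Linear lookups keep the sketch short; sort both lists for big binaries. */
long total_growth(const struct sym *old, size_t n_old,
                  const struct sym *cur, size_t n_cur)
{
    long growth = 0;
    for (size_t i = 0; i < n_cur; i++) {          /* new or resized symbols */
        const struct sym *o = find_sym(old, n_old, cur[i].name);
        long delta = (long)cur[i].size - (o ? (long)o->size : 0);
        if (delta != 0)
            printf("%+8ld %s\n", delta, cur[i].name);
        growth += delta;
    }
    for (size_t j = 0; j < n_old; j++) {          /* symbols that went away */
        if (find_sym(cur, n_cur, old[j].name) == NULL) {
            printf("%+8ld %s\n", -(long)old[j].size, old[j].name);
            growth -= (long)old[j].size;
        }
    }
    return growth;
}
```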
12:54 karolherbst: maybe
12:54 karolherbst: sometimes compilers get better
12:54 karolherbst: also to consider
12:54 karolherbst: unless you build with -Os
12:55 karolherbst: mhh build/src/mesa/libmesa_common.a is the biggest file here
12:55 karolherbst: but that's a debug build
13:01 karolherbst: imirkin: btw, astc seems to work on the gm20b, I just got a random hang in the run... trying to figure out what that was all about
13:01 imirkin: karolherbst: ok, send patches =]
13:02 imirkin: nuke that comment about GM107 btw
13:02 karolherbst: yeah
13:02 karolherbst: we have no 3D class for gm20b though, so I guess I'll add a chipset check there :/
13:02 imirkin: i had a mistaken notion that it was supported starting with the gm107 "new" TIC format, but that notion was ... mistaken.
13:02 karolherbst: I am still wondering about the hang though... it just stopped making progress
13:03 imirkin: yeah, can't help with that.
13:03 karolherbst: saw some nouveau error in dmesg though
13:03 imirkin: seems unlikely they are related to astc/etc2
13:04 karolherbst: I didn't test etc2 yet though, only astc
13:04 karolherbst: but yeah
13:04 karolherbst: probably not
13:04 imirkin: those tests would be in GLES3
13:04 karolherbst: yeah.. I am using the gles31 runner anyway
13:05 karolherbst: any plan to add support for 3d textures? :D
13:05 imirkin: yeah, probably need to flip to the gles3 one for the etc2 tests
13:05 karolherbst: although I doubt the hw supports this HDR astc stuff
13:05 imirkin: you mean 3d astc?
13:05 imirkin: or hdr astc? they're different things.
13:06 karolherbst: NotSupported (TEXTURE_3D target requires HDR astc support. at es31fCopyImageTests.cpp:1905)
13:06 imirkin: hmmmmmmm
13:06 imirkin: right ok
13:06 imirkin: so there's 2 things
13:06 imirkin: 1 is OES_texture_compression_astc or whatever
13:06 imirkin: which introduces 3d astc modes (4x4x4 and whatnot)
13:06 imirkin: afaik there's no hw out there with support for that
13:06 imirkin: there's also this other thing...
13:07 karolherbst: actually there are some 3d texture tests which are enabled.. interesting
13:07 imirkin: ah right. HDR does add support for the sliced thing.
13:07 karolherbst: dEQP-GLES31.functional.copy_image.mixed.viewclass_128_bits_mixed.rgba32f_rgba_astc_10x5_khr.texture3d_to_texture2d eg
13:07 imirkin: The HDR profile is a superset of the LDR profile, and also supports
13:07 imirkin: texture target TEXTURE_3D for images made up of multiple two-dimensional
13:07 imirkin: slices of compressed data.
13:07 karolherbst: but not dEQP-GLES31.functional.copy_image.mixed.viewclass_128_bits_mixed.rgba32f_rgba_astc_10x5_khr.texture3d_to_texture3d
13:07 imirkin: i thought there was a separate "sliced" extension too
13:08 imirkin: yeah, there is! KHR_texture_compression_astc_sliced_3d
13:08 karolherbst: ahh
13:08 imirkin: not 100% sure we support it. if we don't, we probably should.
13:08 karolherbst: well.. it seems just a little random which tests are disabled or not
13:08 imirkin: nah, it makes sense
13:08 imirkin: rgba32f 3d -> astc 2d
13:09 imirkin: which is ok, coz they're in the same view class
13:09 imirkin: 3d -> 3d requires astc to work with 3d, which it decides it doesn't
13:09 imirkin: either because we don't advertise the sliced ext
13:09 imirkin: or because the test doesn't know about it
13:10 karolherbst: well, it always skips 3D to 3D, yes
13:10 karolherbst: but 2D to 3D seems random
13:10 imirkin: if the 3d side is !astc, should be fine
13:10 imirkin: do we advertise the sliced ext?
13:11 karolherbst: GL_KHR_texture_compression_astc_sliced_3d
13:11 karolherbst: yes
13:11 imirkin: yeah, we should
13:11 imirkin: so the tests just don't know about it
13:11 imirkin: i'm less interested in the copy image tests btw
13:11 imirkin: and more in the proper astc texturing tests
13:12 karolherbst: give me the proper regex and I'll run it on the hw :p
13:12 imirkin: heh
13:12 imirkin: sec
13:12 karolherbst: I am running *astc* so that should hit everything :p
13:13 imirkin: dEQP-GLES3.functional.texture.compressed.astc.*
13:13 imirkin: it's in gles3 =]
13:13 imirkin: actually, all of dEQP-GLES3.*astc*
13:14 imirkin: there are a few more
13:14 imirkin: and also *etc* while you're at it
13:14 imirkin: (which will get a couple of texturefetch tests, but you can handle it)
13:15 karolherbst: I forgot to clock up the jetson :)
13:15 karolherbst: but the bottleneck really seems to be deqp
13:16 karolherbst: "dEQP-GLES3.*astc*" would also select copy_image, no?
13:16 imirkin: no copy image in GLES3
13:16 karolherbst: or is that 3.1 only
13:16 karolherbst: ahhh
13:17 karolherbst: nice, the jetson consumes 5.5W now :)
13:17 karolherbst: but something is wrong with the voltage regulation
13:17 karolherbst: at least nouveau reports garbage
13:17 karolherbst: GPU core: -0.02 V (min = +4294967.28 V, max = +4294967.28 V)
13:17 imirkin: sounds like -1 :)
13:17 karolherbst: min max, yes
13:17 karolherbst: the value is weird
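The min/max numbers fit the "-1"-style explanation: a small negative millivolt count that ended up read as an unsigned 32-bit value. Assuming the usual hwmon convention that voltage attributes are in millivolts, -20 mV (the same -0.02 V the core input shows) wraps to 4294967276, which prints as +4294967.28 V. Where the wrap actually happens (driver, sysfs or sensors) is not established here; the hypothetical helpers only reproduce the arithmetic:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* hwmon voltage values are millivolts; print them as signed volts with two
 * decimals, like the sensors output above. */
void format_volts(char *buf, size_t len, double millivolts)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%+.2f V", millivolts / 1000.0);
}

/* The assumed bug: a signed millivolt value reinterpreted as unsigned 32-bit. */
uint32_t as_u32(int32_t mv)
{
    return (uint32_t)mv;
}
```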
13:18 karolherbst: on 0xb pstate it is more sane
13:18 karolherbst: "GPU core: -0.02 V (min = +4294967.28 V, max = +4294967.28 V)"
13:18 karolherbst: ...
13:18 karolherbst: GPU core: +1.11 V (min = +4294967.28 V, max = +4294967.28 V)
13:18 imirkin: maybe the GPU core is just producing power, and they're indicating it with a negative voltage? :)
13:18 karolherbst: but I suspect the dts stuff to be buggy still as well
13:18 imirkin: nuclear-powered nano
13:19 karolherbst: still have random errors like this: pwm-regulator regulators:regulator@6: Failed to get enable GPIO: -517
13:19 imirkin: that's EPROBE_DEFER iirc
13:19 RSpliet: They split an Intel Atom to power that Jetson
13:19 imirkin: RSpliet: nice :)
13:19 karolherbst: imirkin: yeah.. but I am sure to have enabled the correct drivers
13:19 imirkin: karolherbst: yeah, that's a huge problem with ARM builds
13:19 karolherbst: but I get also stuff like this: sb2-0: usb2-0 supply vbus not found, using dummy regulator
13:20 imirkin: that's expected
13:20 karolherbst: "Tegra210: unknown SKU 0x8f" :)
13:20 karolherbst: also nice
13:20 imirkin: i assume you've enabled the tegra pinctrl thing?
13:20 karolherbst: yeah
13:20 imirkin: otherwise you wouldn't get far
13:20 karolherbst: I enabled the SOC_TEGRA210 option
13:20 karolherbst: which enables a bunch of stuff
13:21 imirkin: ah yeah, ideally that should be up-to-date
13:21 karolherbst: still needed to add a few things, but yeah
13:21 imirkin: well, it selects stuff for the SOC
13:21 karolherbst: one error vanished with 5.5+
13:21 karolherbst: but... uff
13:21 imirkin: not for random shit that happens to be attached to it
13:21 karolherbst: 5.5+ is just broken
13:22 imirkin: on the bright side, 5.6 includes enough fixes to work with ifc6410 again :)
13:22 imirkin: (apq8064)
13:22 karolherbst: yeah well :/
13:22 karolherbst: with 5.6 I get this: https://gist.githubusercontent.com/karolherbst/ab31b41d600d20dc00ffa8b38d6281ab/raw/3a1395155a8119a04286683f678fc787e6dee615/gistfile1.txt
13:22 imirkin: can't have 'em all
13:22 imirkin: good one...
13:22 imirkin: i think tagr would be interested --^
13:22 karolherbst: yep
13:23 karolherbst: swiotlb was broken in 5.5 as well, but different
13:24 karolherbst: ahh
13:24 karolherbst: hit the error again
13:25 karolherbst: nouveau 57000000.gpu: fifo: FB_FLUSH_TIMEOUT
13:25 karolherbst: https://gist.githubusercontent.com/karolherbst/80e809ad768fd285e280471e42e002d3/raw/b85bedb623c2745d5fd7c7f93ad78b39d3813c96/gistfile1.txt
13:28 karolherbst: anyway.. running the 3.0 tests now
13:28 karolherbst: looks fine for now
13:38 karolherbst: imirkin:
13:38 karolherbst: Passed: 1332/1334 (99.9%)
13:38 karolherbst: Failed: 2/1334 (0.1%)
13:39 imirkin: look for "Fail" in TestResults.qpa
13:39 karolherbst: imirkin: ../../executor/testlog-to-csv TestResults.qpa | grep ,Fail :p
13:39 imirkin: or that
13:39 karolherbst: dEQP-GLES3.functional.negative_api.texture.compressedteximage3d_invalid_astc_target and dEQP-GLES3.functional.negative_api.texture.texstorage3d_invalid_astc_target
13:39 imirkin: i didn't want to look up how the to-csv stuff worked, wanted to give you something easy :)
13:39 karolherbst: API tests.. so what :D
13:39 imirkin: ok, so those tests are most likely wrong
13:39 imirkin: because they don't know about the sliced_3d ext
13:40 karolherbst: GL_INVALID_OPERATION should be generated if using TEXTURE_3D with LDR ASTC.
13:40 karolherbst: yeah...
13:40 imirkin: good times.
13:41 imirkin: this is with hw astc or sw?
13:41 karolherbst: hw
13:41 imirkin: coolio
13:41 karolherbst: what annoys me the most is that I can't turn off the tegra device :(
13:41 imirkin: coz of the nuclear powerplant inside of it?
13:41 karolherbst: well.. at least I hoped I could just cut off the PoE power through the switch
13:42 karolherbst: but it only allows me to power cycle it
13:42 karolherbst: :D
13:42 imirkin: heh
13:42 imirkin: i'm sure some number of wires could be unplugged to cause it to lose power
13:42 karolherbst: and powering the device off just leaves it stuck somewhere
13:42 imirkin: for example, you could flip the main breaker in the house
13:42 karolherbst: well..
13:42 karolherbst: it works with l4z
13:42 karolherbst: *l4t
13:42 imirkin: annoying.
13:42 karolherbst: yes
13:43 imirkin: you can ask in like #tegra or something.
13:43 karolherbst: the jetson is just 3m away, but still :D
13:43 imirkin: your wingspan isn't that long? :)
13:44 karolherbst: it's more about the cable I have to remove :/
13:44 karolherbst: uff.. the poe splitter has an idle power consumption of 0.5W uff
13:44 karolherbst: I blame PoE
13:44 karolherbst: the jetson can do PoE, but only at 5V
13:44 karolherbst: and PoE is usually like 52V...
13:45 karolherbst: so I needed a splitter to convert that down :(
13:45 imirkin: it doesn't support standard?
13:45 karolherbst: not the 52V at least
13:45 imirkin: weird.
13:45 karolherbst: but I think it supports the standard PoE controls
13:46 karolherbst: anyway, the splitter gives me the same feature set
13:46 karolherbst: imirkin: could have bought something like this, but.. well https://www.iotamy.com/20W-PoE-Module-for-Jetson-Nano
13:46 karolherbst: it's essentially the same :)
13:47 imirkin: who makes non-standard PoE inputs... sigh
13:47 karolherbst: well
13:49 karolherbst: imirkin: I think the jetson only needs the voltage step-down regulator.. the wiring is nearly done or something
13:49 karolherbst: at least there are pins for this
13:50 karolherbst: imirkin: seems like the raspberry pi has the same flaw here
13:50 karolherbst: but also pins
13:54 karolherbst: anyway, those poe splitters are a nice thing to have :) mine can even switch between 5V/9V/12V
13:55 imirkin: the nice thing about standards...
13:55 imirkin: is that there are so many to choose from
13:55 karolherbst: well.. the port itself is compliant :)
13:55 karolherbst: you just get the 52V from some pins on the board
13:58 karolherbst: ohh.. I forgot to test etc
14:02 karolherbst: fincs: etc seems to work in hw as well
14:03 karolherbst: so I guess nv just disables it for no reason
14:07 imirkin: presumably astc is better all around
14:07 imirkin: no need to create confusion for developers
14:08 imirkin: i guess etc may be better for the r11 / rg11 things
14:08 imirkin: dunno
14:08 karolherbst: well
14:08 karolherbst: I could imagine there might be some corner cases where etc is better
14:08 karolherbst: but yeah
14:08 karolherbst: maybe
14:09 imirkin: they probably also don't expose s3tc, rgtc, or bptc either
14:12 karolherbst: probably not
14:12 karolherbst: maybe s3tc though
14:12 karolherbst: given how much it's used for games
14:28 fincs: "etc seems to work in hw as well" <-- Nice, good to hear confirmation
14:28 fincs: NVN exposes all the BCn formats from DirectX
14:28 fincs: So they do expose S3TC/RGTC/BPTC
14:28 imirkin: huh ok
14:28 imirkin: so just ETC got the axe? weird.
14:28 fincs: They specifically only left out ETC
14:28 imirkin: patent fees? dunno
14:28 fincs: They even expose BC7 ffs
14:28 imirkin: or someone was lazy
14:29 karolherbst: imirkin: etc is royalty free :)
14:29 imirkin: the hardware?
14:29 imirkin: they may have had to license some IP thing, who knows
14:29 karolherbst: ahh, maybe
14:30 imirkin: esp since it's tegra-only to begin with, that's not completely crazy
14:30 karolherbst: ehhh, nouveau doesn't load on 5.6
14:30 karolherbst: ufff
14:30 karolherbst: what the heck is wrong
14:30 karolherbst: well it loads, but doesn't bind to the device
14:30 imirkin: karolherbst: it's because apq8064 got fixed. gotta maintain a balance of working-ness
14:30 karolherbst: at least I got patches to fix the swiotlb bug
14:30 imirkin: if you fix thing, another one has to get broken. it's the law.
14:31 karolherbst: ahh.. the gpu is gone.. uff
14:31 imirkin: welcome to ARM boards
14:32 karolherbst: the heck...
14:35 karolherbst: actually.. the gpu still exists
14:35 karolherbst: mhhh
14:35 karolherbst: why doesn't it bind to nouveau
14:36 imirkin: did you forget to enable CONFIG_NOUVEAU_PLATFORM or whatever?
14:36 imirkin: (is it a separate enable)
14:36 karolherbst: I just updated from 5.4
14:36 karolherbst: but maybe it got disabled or so
14:37 karolherbst: nope, it's enabled
14:37 imirkin: nevermind, i don't think it's a separate enable
14:37 karolherbst: CONFIG_NOUVEAU_PLATFORM_DRIVER
14:37 imirkin: oh, ok
14:37 imirkin: well, that would have been easy :)
14:37 imirkin: modinfo nouveau
14:38 imirkin: what does it say for "alias"?
14:38 karolherbst: it's builtin
14:39 imirkin: hrmph
14:40 karolherbst: mhh "No such device"
14:40 karolherbst: when trying to bind it
14:40 karolherbst: there is this "Failed to set up IOMMU for device 57000000.gpu; retaining platform DMA ops" error, but that never caused any issues in the past
14:40 karolherbst: _but_
14:40 karolherbst: wasn't there some commit in this direction?
14:41 imirkin: dunno
14:53 karolherbst: anyway.. it's probably a nouveau bug :( sigh
15:03 karolherbst: ehh
15:03 karolherbst: now we require the vdd to be valid
15:03 karolherbst: and "pwm-regulator regulators:regulator@6: Failed to get enable GPIO: -517" is the vdd the gpu points to
15:03 karolherbst: uff
15:06 karolherbst: but heh.. why.. nothing really changed here
15:17 imirkin: could be a load ordering thing if nouveau doesn't handle the EPROBE_DEFER
15:18 karolherbst: doubtful
15:18 karolherbst: the VDD is indeed disabled
15:18 karolherbst: wondering why
15:18 karolherbst: that dts stuff is just super annoying to debug
15:19 karolherbst: the vdd_gpu is the regulator@6 thingy
15:19 karolherbst: and it has this entry: enable-gpios = <&pmic 6 GPIO_ACTIVE_HIGH>;
15:20 imirkin: let's seeeeee here
15:20 imirkin: 5.6, right?
15:20 karolherbst: yes
15:20 imirkin: which dts is it?
15:20 imirkin: there's like 30 tegra210's
15:21 fincs: Okay, I see a ton of functions that appear to come from GL_EXT_direct_state_access (this was introduced in mesa 20.0)
15:21 karolherbst: arch/arm64/boot/dts/nvidia/tegra210-p3450-0000.dts
15:21 imirkin: fincs: correct
15:21 fincs: All this stuff is compatibility profile grrrrr
15:21 imirkin: fincs: correct
15:21 karolherbst: imirkin: # cat /sys/bus/platform/devices/regulators:regulator@*/regulator/regulator.*/state
15:21 karolherbst: all enabled except that one
15:21 karolherbst: so.. things seem to work somewhat at least
15:22 fincs: Lots of _mesa_* _mesa_marshal_* _mesa_unmarshal_* funcs related to cmds added by that extension
15:22 karolherbst: yep
15:22 karolherbst: threading stuff
15:22 karolherbst: more perf and so
15:22 fincs: And yeah I saw threading stuff
15:23 imirkin: karolherbst: do you have tegra pwm built?
15:23 karolherbst: yes
15:23 imirkin: karolherbst: pastebin your .config
15:24 karolherbst: imirkin: https://gist.githubusercontent.com/karolherbst/1684dd72a992d96042f31f9a613c828f/raw/08b50a4d5e8372b40b2c9254377a94984b6e48da/gistfile1.txt
15:24 fincs: So it really is related to new features after all
15:24 karolherbst: yep
15:24 karolherbst: that threading stuff is a bit huge though
15:24 karolherbst: you could probably get rid of it.. but... mhhh
15:25 karolherbst: it helps if you are CPU bound
15:25 karolherbst: which you are
15:25 karolherbst: probably
15:25 fincs: Is it even possible to disable it? I don't feel right removing threading fixes because we do in fact have threads
15:25 imirkin: one has to enable it though - it's off by default
15:25 karolherbst: fincs: it's async execution of stuff
15:25 fincs: Hmm
15:25 karolherbst: so we marshal the gallium stuff and enqueue on a worker thread
15:25 fincs: Ohhhhhhhh
15:25 fincs: Yeah I don't want that lol
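Mesa's glthread, which the `_mesa_marshal_*`/`_mesa_unmarshal_*` symbols above belong to, records GL calls on the application thread and replays them on a worker thread. What follows is only a toy model of that producer/consumer idea using a pthread ring buffer; glthread's real batch format, names and synchronisation are different. It does illustrate the trade-off being discussed: every call needs a marshal side and an unmarshal side (plausibly where much of the extra code comes from), while the application thread returns as soon as a call is queued. `struct cmd`, `enqueue()`, `worker()` and `run_calls()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical marshalled call: an opcode plus one integer argument. */
enum op { OP_ADD, OP_STOP };
struct cmd { enum op op; long arg; };

#define QUEUE_CAP 64

struct queue {
    struct cmd buf[QUEUE_CAP];
    size_t head, tail, count;           /* ring-buffer bookkeeping */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
    long state;                         /* stands in for driver state */
};

/* Application thread: record the call and return without executing it. */
static void enqueue(struct queue *q, struct cmd c)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)       /* back-pressure when the queue is full */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->buf[q->tail] = c;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Worker thread: replay calls in submission order; only it touches state. */
static void *worker(void *arg)
{
    struct queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0)
            pthread_cond_wait(&q->not_empty, &q->lock);
        struct cmd c = q->buf[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->lock);

        if (c.op == OP_STOP)
            return NULL;
        q->state += c.arg;              /* "execute" the call */
    }
}

/* Submit ADD(1..n), then STOP, and wait for the worker (like a glFinish). */
long run_calls(int n)
{
    assert(n >= 0);
    struct queue q = { .head = 0, .tail = 0, .count = 0, .state = 0 };
    pthread_t t;
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);

    if (pthread_create(&t, NULL, worker, &q) != 0)
        return -1;
    for (int i = 1; i <= n; i++)
        enqueue(&q, (struct cmd){ OP_ADD, i });
    enqueue(&q, (struct cmd){ OP_STOP, 0 });
    pthread_join(t, NULL);              /* join makes q.state safe to read */

    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return q.state;
}
```

The win only shows up when the application thread is CPU bound, which is the case being argued for here; with a single core or a GPU-bound workload the extra copying is pure overhead.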
15:25 karolherbst: well
15:25 karolherbst: you want that
15:25 imirkin: you say that
15:26 karolherbst: it helps with perf
15:26 imirkin: but you kinda do :)
15:26 fincs: Yeah but... code size...
15:26 karolherbst: well
15:26 karolherbst: perf or size ;)
15:26 fincs: I thought it was shit like
15:26 fincs: Locking/unlocking fixes mostly
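A toy model of the glthread idea described above (marshal each GL call on the application thread, replay it on a worker thread). This is a Python sketch of the producer/consumer pattern only, not Mesa's implementation; the class and method names are hypothetical. The point is that the caller only pays for an enqueue, and `finish()` is the sync point where it has to wait for the worker to drain the queue.

```python
import queue
import threading


class ToyGLThread:
    """Hypothetical stand-in for glthread: record calls now, execute them later."""

    def __init__(self):
        self._queue = queue.Queue()
        self.executed = []  # calls replayed by the worker, in submission order
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # shutdown sentinel
                self._queue.task_done()
                return
            name, args = item
            # Stand-in for "unmarshal and call into the driver".
            self.executed.append((name, args))
            self._queue.task_done()

    def marshal(self, name, *args):
        # Cheap on the app thread: enqueue only, nothing is executed here.
        self._queue.put((name, args))

    def finish(self):
        # Sync point: block until the worker has replayed everything queued so far.
        self._queue.join()

    def destroy(self):
        self._queue.put(None)
        self._worker.join()


if __name__ == "__main__":
    gl = ToyGLThread()
    gl.marshal("glClear", 0x4000)          # GL_COLOR_BUFFER_BIT
    gl.marshal("glDrawArrays", 4, 0, 3)    # GL_TRIANGLES, first=0, count=3
    gl.finish()
    print(gl.executed)
    gl.destroy()
```

The win only shows up when the app thread is CPU bound and the driver work can overlap with it; the cost is the extra marshalling code plus a thread, which is the code-size trade-off being argued about here.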
15:26 karolherbst: you could have multiple builds though
15:26 karolherbst: that would be interesting
15:26 karolherbst: "super small" and "super fast"
15:27 karolherbst: and you change some defaults around
15:27 imirkin: karolherbst: can you check that pwm@7000a000 is "ok"?
15:27 karolherbst: like you could potentially even enable the spirv bits... but I suspect I still need to support some nir stuff for that
15:28 imirkin: vdd_gpu is the only regulator that sets it
15:28 karolherbst: imirkin: how...
15:28 karolherbst: it's there, but no idea if it's in a good state
15:28 imirkin: same as for the regulators
15:28 karolherbst: sadly no
15:28 fincs: What's the name of the extension that does the worker thread stuff?
15:28 imirkin: fincs: no ext
15:29 imirkin: has to be enabled via driconfig setting
15:29 imirkin: (can also do it via env var)
15:29 fincs: What's the name of the env var?
15:29 imirkin: mesa_glthread=1 iirc
15:30 imirkin: karolherbst: pastebin dmesg
15:30 fincs: Hmm, but this seems to be inside dri stuff and we don't use dri
15:30 karolherbst: imirkin: https://gist.githubusercontent.com/karolherbst/a365edce97dfb2f24a6f49039b5559ab/raw/5587fcbca8f534e5e2ea05a4523e43657fabdf37/gistfile1.txt
15:30 karolherbst: no change in regards to 7000a000 from a working boot
15:30 imirkin: hrm. there's a way to get more info...
15:31 imirkin: actually, no, this makes sense.
15:31 imirkin: [ 0.104048] pwm-regulator regulators:regulator@6: Failed to get enable GPIO: -517
15:31 imirkin: this is an EPROBE_DEFER
15:31 imirkin: [ 1.014597] max77620 4-003c: PMIC Version OTP:0x35 and ES:0x8
15:31 imirkin: this is the "pmic" which is loading
15:31 karolherbst: ehh
15:31 imirkin: and is required by the regulator
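A rough model of why -517 (EPROBE_DEFER) is normally harmless: a probe that returns it is parked and retried after some other driver binds successfully, so the regulator should come up once the PMIC side is there. The dependency graph below is hypothetical (loosely based on pwm-regulator needing an enable GPIO from the max77620 PMIC), and the kernel retries on bind events rather than in rounds like this sketch does.

```python
def probe_all(drivers):
    """Toy deferred-probe loop.

    drivers: dict of driver name -> set of driver names it depends on
             (hypothetical graph, not read from a real device tree).
    Returns (bound_in_order, still_deferred).
    """
    bound, pending = [], list(drivers)
    while pending:
        progress, deferred = False, []
        for name in pending:
            if drivers[name] <= set(bound):
                bound.append(name)       # probe succeeded
                progress = True
            else:
                deferred.append(name)    # probe returned -EPROBE_DEFER, retry later
        pending = deferred
        if not progress:                 # nothing new bound: stuck for good
            break
    return bound, pending


if __name__ == "__main__":
    graph = {
        "pwm-regulator": {"max77620-gpio"},
        "max77620-gpio": {"max77620"},
        "max77620": set(),
        "nouveau": {"pwm-regulator"},
    }
    print(probe_all(graph))
```

If a dependency never binds (for example its driver is not built), the dependent drivers stay deferred forever, which matches the "regulator never ends up re-loading" symptom.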
15:31 karolherbst: what's funny is, that I got this error also on working boots, so...
15:32 imirkin: yeah, it's not an error
15:32 imirkin: it's fine
15:32 karolherbst: okay
15:32 imirkin: i mean - it's an error, but it's not a bad error :)
15:32 imirkin: but that regulator never ends up re-loading
15:32 imirkin: should look at pwm-regulator
15:32 imirkin: perhaps it doesn't handle EPROBE_DEFER correctly
15:33 imirkin: which would be hugely surprising
15:33 karolherbst: yeah..
15:33 imirkin: hold on
15:33 karolherbst: I could git bisect it...
15:33 imirkin: it's probably nouveau
15:33 imirkin: no
15:33 imirkin: give me a few minutes
15:33 imirkin: yeah
15:34 fincs: Thread code doesn't even seem reachable
15:34 karolherbst: fincs: it's very magical
15:34 imirkin: ok, so in nvkm_device_tegra_new, it tries to get the regulator
15:34 karolherbst: like the no_error stuff
15:34 karolherbst: also magic
15:34 fincs: Like, it's only enabled/called from dri stuff?
15:34 karolherbst: ohhh
15:34 karolherbst: might be.. yeah
15:35 fincs: And... we aren't using dri at all
15:36 karolherbst: ohh, you never call into dri_create_context...
15:36 imirkin: karolherbst: hrmph. it looks like it should work :(
15:36 karolherbst: yep...
15:36 imirkin: karolherbst: i bet if you build nouveau as a module, it *will* work
15:36 karolherbst: ufff
15:36 karolherbst: annoying
15:36 karolherbst: but
15:36 karolherbst: I can't enable that regulator dynamically
15:37 imirkin: nothing uses it
15:37 imirkin: i dunno
15:37 imirkin: maybe something else is missing
15:37 karolherbst: ahh, mhh
15:37 imirkin: maybe module won't help
15:37 imirkin: dunno
15:37 imirkin: in that case that regulator depends on something funky
15:37 imirkin: but i don't see it
15:38 karolherbst: well.. a git bisect will probably tell us
15:38 imirkin: it depends on the pwm
15:38 karolherbst: the compilation time is also super low
15:38 imirkin: which you've enabled
15:38 karolherbst: yep
15:38 karolherbst: as I said, it used to work
15:39 karolherbst: $ git fetch --unshallow is really the most painful part of bisecting here ...
15:39 karolherbst: oohhh.. I have an idea
15:39 fincs: Yeah so I can't see any codepath that would actually call into the thread creation/teardown code
15:39 imirkin: could be this tegra-car thing
15:40 karolherbst: "$ git fetch /home/kherbst/git/linux --unshallow" wtf
15:40 karolherbst: fincs: you might want to add support for that
15:40 karolherbst: should give a significant boost in CPU bound workloads
15:41 fincs: Hmm, I think I should indeed create multiple builds of mesa
15:41 karolherbst: yeah
15:41 karolherbst: the small one you can even build with Os then
15:41 karolherbst: and the fast with Ofast :)
15:41 fincs: Problem is, I don't know what's the best way to install things side by side
15:41 karolherbst: mhhhh
15:41 fincs: We have libEGL.a
15:41 karolherbst: that's the only thing, no?
15:41 fincs: And libglapi.a and the GLES libs too
15:41 karolherbst: mhhh
15:42 imirkin: karolherbst: heh
15:42 imirkin: i think i see it...
15:42 fincs: But really: https://github.com/mesa3d/mesa/search?q=start_thread&unscoped_q=start_thread
15:42 imirkin: oh wait, no
15:42 imirkin: gr
15:43 imirkin: i got nothin'
15:43 imirkin: good luck.
15:43 karolherbst: well, I cross compile the kernel, so that's not that painful :p
15:43 karolherbst: I just hope I won't need the serial console :D
15:44 imirkin: there's a CONFIG_DRIVER_DEBUG which should give you more info
15:44 karolherbst: ohh
15:44 imirkin: it makes load attempts much more verbose
15:44 karolherbst: well.. I just bisect, maybe that's enough
15:44 karolherbst: shouldn't take too long
15:44 karolherbst: the kernel builds in like a minute
15:52 fincs: Okay, I'm happy now :)
15:53 fincs: Now to wait for 20.1 release
15:53 imirkin: if you're lucky, i'll have finished passthrough GS in time
15:54 fincs: ( ͡° ͜ʖ ͡°)
15:54 karolherbst: fincs: so how did you solve your problem :p
15:54 fincs: Just a simple two-line ifdef
15:54 karolherbst: you threw away the threading stuff?
15:55 fincs: I allowed the linker to discard it, yes
15:55 fincs: (as it was already unreachable)
15:55 karolherbst: ahh
15:55 fincs: This is something I'd only enable for real in a separate build
15:55 fincs: As it has a cost not every homebrew app will want to pay
15:55 karolherbst: well.. isn't there a linker option to discard everything not exported? or is that not a thing in *.a files.. well I guess not
15:56 fincs: This is yet another case of "pointer to unused thing stored in function pointer table"
15:57 fincs: Which neither the compiler nor the linker can discard
15:58 fincs: So I had to give it a bit of help :)
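Why a pointer stored in a function table keeps code alive can be modelled as the mark phase of `--gc-sections` (with `-ffunction-sections`/`-fdata-sections`): every section reachable from a root is kept, and taking a function's address counts as a reference just like calling it. The symbol graph below is a hypothetical simplification using names from this discussion, not Mesa's real call graph.

```python
def reachable(roots, refs):
    """Mark phase of a --gc-sections style link.

    refs maps a symbol to the symbols it references, whether by a direct
    call or by storing a function pointer (e.g. filling in an iface table).
    """
    keep, stack = set(), list(roots)
    while stack:
        sym = stack.pop()
        if sym in keep:
            continue
        keep.add(sym)
        stack.extend(refs.get(sym, ()))
    return keep


# Hypothetical graph: st_api_create_context fills an iface table that points
# at start_thread/thread_finish, which pull in the glthread code.
with_iface = {
    "main": ["st_api_create_context"],
    "st_api_create_context": ["start_thread", "thread_finish"],
    "start_thread": ["_mesa_glthread_init"],
    "thread_finish": ["_mesa_glthread_finish"],
}
# The ifdef: stop filling in those two table entries.
without_iface = dict(with_iface, st_api_create_context=[])

if __name__ == "__main__":
    print(sorted(reachable(["main"], with_iface)))
    print(sorted(reachable(["main"], without_iface)))
```

Neither the compiler nor the linker can prove the table entries are never called, so the only way to drop the code is to remove the reference itself, which is what the two-line ifdef does.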
16:00 karolherbst: mhhh... might be worth having a whitelist of symbols
16:00 karolherbst: and some unreachable check
16:01 karolherbst: fincs: ever thought about enabling LTO? This sounds like something lto should be able to take care of as this function was obviously not used
16:01 fincs: I'm thinking of enabling LTO, however
16:01 fincs: - I don't think LTO is smart enough to throw out this code, and
16:01 karolherbst: imirkin: nouveau loads with 5.5 at least :)
16:01 fincs: - mesa built with LTO on means every single homebrew compilation needs to compile the entirety of mesa
16:02 fincs: Which is... ouch
16:02 karolherbst: why?
16:02 fincs: Because all compilation is done at link time
16:02 karolherbst: you can embed the LTO stuff into the *.a file
16:02 karolherbst: and if the user doesn't use LTO it won't matter
16:02 karolherbst: the file will be... big
16:02 fincs: Even if the user doesn't use LTO it still needs to compile
16:02 karolherbst: but it doesn't matter
16:02 fincs: Because mesa is built with LTO
16:02 fincs: And only LTO
16:02 karolherbst: no
16:02 fincs: Yes
16:02 karolherbst: you can support both
16:02 fincs: It needs to codegen at link time
16:03 karolherbst: and the lto stuff is embedded in the *.a file
16:03 karolherbst: so you can just ship it
16:03 fincs: But the LTO stuff is only half compiled
16:03 karolherbst: you can have a hybrid binary
16:03 fincs: And even if your program doesn't use LTO, it still needs to finish off mesa's LTO object files into actual code
16:03 fincs: That's how LTO works...
16:04 karolherbst: and even if, so what?
16:04 karolherbst: but you can just enable -ffat-lto-objects
16:04 karolherbst: which is the default afaik
16:04 fincs: mesa is a big codebase and we'd be talking about a non-trivial amount of minutes for every single time a user wants to build a homebrew app
16:04 fincs: It's not the default
16:05 karolherbst: ohh, right
16:05 karolherbst: anyway
16:05 karolherbst: there is a solution
16:05 karolherbst: and if that leads to smaller binaries, why is this a problem
16:05 fincs: A solution which isn't really feasible
16:05 karolherbst: why not?
16:05 fincs: Due to the issues I've pointed out
16:05 karolherbst: those are no issues as there is -ffat-lto-objects
16:05 fincs: Yeah, so then non-LTO apps can't benefit
16:06 karolherbst: well, then they don't care about size as much anyway :p
16:06 fincs: Assuming -ffat-lto-objects does what you say it does
16:06 karolherbst: "Fat LTO objects are object files that contain both the intermediate language and the object code."
16:06 fincs: Yes but
16:06 fincs: You're assuming the linker will pick out the non-LTO part if you don't link with LTO
16:06 karolherbst: "This makes them usable for both LTO linking and normal linking. "
16:06 fincs: I think -ffat-lto-objects is meant for linking with LTO unaware linkers
16:06 fincs: But we don't ship any LTO unaware linkers
16:07 karolherbst: then it's not an issue
16:07 karolherbst: and it's not minutes
16:07 fincs: Yes it is, as we would still have astronomically slow linking times
16:07 karolherbst: lto is fast these days
16:07 karolherbst: seriously
16:07 fincs: Not really
16:07 karolherbst: it is
16:07 karolherbst: I run an entire system with lto
16:07 fincs: We have a project that uses LTO
16:07 fincs: And it takes ages to build
16:08 karolherbst: new gcc?
16:08 fincs: gcc 9.x
16:08 karolherbst: weird
16:08 karolherbst: it
16:08 fincs: And it takes up gargantuan amounts of RAM too
16:08 karolherbst: it's really not that slow anymore
16:08 karolherbst: -Ofast has a bigger impact than lto
16:09 karolherbst: yeah.. it needs more RAM, that's true
16:09 karolherbst: but then again, you still have the hybrid stuff
16:09 karolherbst: and it works
16:10 fincs: Tbh, throwing LTO at something is kind of a bodge
16:10 HdkR: ThinLTO is super nice but it isn't an end all solution :)
16:10 karolherbst: fincs: why?
16:10 fincs: Because it's just a bandaid that tries to cover design flaws
16:10 karolherbst: LTO is nice as it gives you usually smaller and faster binaries
16:11 karolherbst: well...
16:11 karolherbst: it's an optimization for a hard problem
16:11 karolherbst: and what do you mean by "design flaw"?
16:11 karolherbst: is the design flaw there is no inter object opts?
16:11 karolherbst: then LTO should be the default :p
16:12 fincs: The design flaw is that some software is built in a way that makes it impossible to tell what's unused code
16:12 karolherbst: that's not something LTO addresses directly
16:12 fincs: Mostly due to function pointer tables
16:12 karolherbst: it's just a side effect
16:12 karolherbst: fincs: this issue is not a func pointer table one
16:12 fincs: Yes
16:12 fincs: LTO isn't even meant to solve this
16:13 fincs: But anyway
16:13 karolherbst: fincs: why is dri_create_context even picked up in your build?
16:13 fincs: dri_create_context is not used nor picked up
16:13 fincs: It's not even compiled
16:14 karolherbst: okay, so which functions get removed by your solution. All the marshal ones I suspect?
16:14 fincs: st_api_create_context sets up a table of function pointers ("iface"), which includes the start_thread/thread_finish "methods" of the interface
16:14 fincs: I just ifdef'd out the part that fills in those function pointers
16:15 fincs: And that cascades into removing 560KB
16:15 karolherbst: yeah well..
16:15 karolherbst: func ptr vs conditional code
16:15 fincs: So basically... all of glthread.c/h
16:16 fincs: And related files
16:16 fincs: Funnily enough, the marshalling code is still here
16:16 karolherbst: yeah...
16:16 fincs: But that's probably due to the apigen stuff
16:16 karolherbst: thats probably forced exported or so
16:16 karolherbst: that hooks into the dispatch table, no?
16:16 fincs: Yeah I believe so
16:17 karolherbst: you should probably take a look in there as well and be able to get rid of the validation layer as well :p
16:17 fincs: Need to find where the marshalling code lives though
16:18 karolherbst: generated
16:25 karolherbst: ahhhhhhh
16:25 karolherbst: I hate embedded devices :D
16:27 karolherbst: yeah.. nice.. my serial console is probably at work
16:32 fincs: Okay
16:32 fincs: Looks like there's no marshalling code here... there's *un*marshalling code
16:35 fincs: Ahh I know why this is happening
16:35 fincs: gl_marshal.py generates a table (_mesa_unmarshal_dispatch), and that's fine
16:35 fincs: However I think we compile without -fdata-sections
16:35 fincs: So that gets linked in anyway
16:36 fincs: Oh, nope, we are in fact using -fdata-sections, hmm
16:38 fincs: _mesa_glthread_finish/_mesa_glthread_destroy are still getting linked in
16:39 fincs: Yup it's _mesa_glthread_destroy
16:43 karolherbst: *sigh*...
16:44 fincs: An additional 120KB removed, nice :D
17:02 karolherbst: this starts wonderfully: https://gist.githubusercontent.com/karolherbst/a87a8af3d617662934ccb02426c32a66/raw/ba236f07fce0e8f076fab31949fbf6a92620dc05/gistfile1.txt
17:05 imirkin: lol
17:06 karolherbst: btw, had to skip three times in the meantime
17:07 karolherbst: stupid compilation error
17:07 imirkin: it seems to happen a lot more in the arm world
17:07 imirkin: things just get broken
17:07 karolherbst: linux/drivers/gpio/gpio-max77620.c:44:42: error: ‘struct gpio_chip’ has no member named ‘irq’
17:07 karolherbst: well
17:07 imirkin: it's such a wide array of things that'd need to get tested
17:07 karolherbst: sure
17:07 karolherbst: but my config is super minimal
17:07 karolherbst: but.. yeah
17:07 imirkin: compilation seems easier to nail down
17:07 karolherbst: well if tegra only used stuff doesn't get tested
17:07 imirkin: but i mean runtime issues
17:08 karolherbst: yeah..
17:08 karolherbst: that's normal and that's also less annoying
17:08 karolherbst: but I also had runtime issues where the jetson doesn't boot :(
17:08 imirkin: i guess i'm more easily annoyed
17:08 imirkin: compilation is usually a stupid fix later on
17:08 karolherbst: yep..
17:08 karolherbst: but that driver is only used by jetson devices
17:08 imirkin: which is easy to locate and cherry-pick during the bisect
17:08 karolherbst: so.. testing is very minimal I guess
17:09 imirkin: runtime issues are like "well, this doesn't work. let me find a base kernel to start the bisect at"
17:09 imirkin: and you end up going back to like 4.2
17:09 karolherbst: well..
17:09 imirkin: coz it's been broken this whole time
17:09 karolherbst: the issue with cherry-picking is, that it doesn't work as well on a linux tree
17:09 karolherbst: I normally pick such fixes and insert them into the tree on the correct place.. but mhhh
17:09 imirkin: works fine, esp if you don't commit
17:09 karolherbst: you can't rebase
17:09 imirkin: nah, just run the cherry-pick at the bisect point
17:09 karolherbst: yeah... well...
17:09 imirkin: you have to keep track of which things to pick when
17:09 karolherbst: ahh mhh
17:10 karolherbst: right.. that should work as well
17:10 imirkin: if you make a commit, bisect will get upset
17:10 imirkin: since it'll think you tested a different kernel than it wanted
17:10 imirkin: (easily worked around too, but even easier is to use -n or whatever to make cherry-pick not make a new commit)
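A sketch of the "keep track of which things to pick when" bookkeeping: a build fix is needed at the current bisect point when the commit that broke the build is already in HEAD but the fix is not. `git merge-base --is-ancestor` (exit status 0 means "is an ancestor") and `git cherry-pick -n` (`--no-commit`) are real git commands; the `fixes` list and the helper names are hypothetical. Something like `git reset --hard` before `git bisect good`/`bad` keeps the uncommitted changes from carrying over into, or blocking, the next checkout.

```python
import subprocess


def git_is_ancestor(a, b):
    """True if commit a is an ancestor of (or equal to) commit b."""
    result = subprocess.run(["git", "merge-base", "--is-ancestor", a, b])
    return result.returncode == 0


def fixes_to_apply(fixes, is_ancestor, head="HEAD"):
    """fixes: hand-maintained list of (breaking_commit, fixing_commit) pairs."""
    return [fix for broken, fix in fixes
            if is_ancestor(broken, head) and not is_ancestor(fix, head)]


def cherry_pick_commands(fixes):
    # -n applies the change to the work tree and index without committing,
    # so bisect still sees HEAD as the commit it asked you to test.
    return [["git", "cherry-pick", "-n", fix] for fix in fixes]


if __name__ == "__main__":
    tracked = []  # e.g. [("<commit that broke gpio-max77620>", "<its fix>")]
    for cmd in cherry_pick_commands(fixes_to_apply(tracked, git_is_ancestor)):
        subprocess.run(cmd, check=True)
```

Passing `is_ancestor` in as a function keeps the decision logic testable without a repository.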
17:12 imirkin: so yeah, in all i much prefer compilation failures. they tend not to last long, and get resolved easily.
17:12 imirkin: runtime issues can linger for ages
17:13 karolherbst: I just test the commit fixing it
17:13 karolherbst: ...
17:14 karolherbst: the heck...
17:14 karolherbst: "Restart config..." *sigh*
17:14 karolherbst: I hit all the issues this time I think
17:18 karolherbst: ahh no... that's caused by somebody not able to merge stuff properly
17:33 karolherbst: this restarting config bug is probably the most annoying and most stupid of them all :(
17:34 karolherbst: well.. heck
17:38 karolherbst: ufff
18:24 karolherbst: finally.. 60% done
18:25 karolherbst: that took longer than expected...
18:39 imirkin: and the winner is?
18:40 karolherbst: well... not there yet. I was more referring to finding a proper range to have a less annoying bisection
18:41 imirkin: i like to look at "git bisect visualize" while it's building and place bets (with myself, i suppose), on which way it'll go
20:15 karolherbst: this is by far the most painful bisect I ever did :/
20:19 Lyude: karolherbst: I always just do `make olddefconfig`
20:19 karolherbst: Lyude: yeah.. but sometimes that won't work
20:19 karolherbst: and the kernel starts the config from scratch
20:20 Lyude: karolherbst: you sure? i've done a lot of really long/painful bisects and that command was what I ended up with so I'd never have to actually input more than two or three config options
20:20 imirkin: Lyude: arm boards?
20:21 imirkin: where drivers are added/removed/moved around?
20:21 karolherbst: Lyude: yes...
20:21 Lyude: imirkin: oh lord for arm boards that's a whole other story
20:21 Lyude: x86 usually isn't terrible though
20:21 karolherbst: adhshdjakshdjkashdkasd
20:21 karolherbst: this is terrible
20:21 karolherbst: I have like 2200 commits to go still
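Rough arithmetic for "2200 commits to go": each good/bad answer halves the remaining range, so the worst case is about ceil(log2(n)) more test builds. Commits marked with `git bisect skip` (the non-compiling or unbootable ones) don't shrink the range, so each skip is an extra build on top of that estimate. The helper name is hypothetical.

```python
def bisect_steps(candidates: int) -> int:
    """Worst-case good/bad tests to isolate one commit among `candidates`,
    i.e. ceil(log2(candidates)), using integer bit_length to avoid float rounding."""
    if candidates <= 1:
        return 0
    return (candidates - 1).bit_length()


if __name__ == "__main__":
    print(bisect_steps(2200))  # 2**11 = 2048 < 2200 <= 4096 = 2**12
```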
20:22 karolherbst: and now I moved a few merges down... doesn't boot
20:22 karolherbst: and if I am lucky to get a commit which boots, I am able to reduce the bisect by 50 commits
20:22 karolherbst: ....
20:22 karolherbst: trying out 5.7-rc1.. the hell
20:22 Lyude: karolherbst: last time I dealt with issues like that I had to bisect what broke the boot, then restart the previous bisect from where I left off while applying the patches to fix things over and over again
20:23 Lyude: git rerere makes that slightly less nightmarish at least
20:26 imirkin: Lyude: yeah, on x86 bisects are fairly painless
20:27 imirkin: hardware's basically stable
20:27 imirkin: config options change rarely
20:27 imirkin: the ones that do don't really matter
20:27 karolherbst: Lyude: any further questions? https://gist.githubusercontent.com/karolherbst/eaeab4dcaeb9d31b2e22f2759f23babf/raw/080bb6dec51e42f6408f0d510e15997eda55dbb2/gistfile1.txt
20:28 imirkin: i spent like a full day bisecting something which turned out to be a kconfig issue
20:28 imirkin: bisected it to the commit which added the config entry
20:28 karolherbst: yeah.. I expect the same here as well
20:28 imirkin: i had been enabling it, but it was poison for the system
20:28 imirkin: for no apparent reason either - unrelated driver
20:28 imirkin: but it selected something which in turn was bad
20:29 karolherbst: uff
20:29 karolherbst: I hope 5.7 fixes all my problems...
20:31 imirkin: ;)
20:31 imirkin: ask in #tegra
20:31 karolherbst: I already did
20:31 imirkin: chances are others are running into similar issues
20:31 imirkin: oh
20:31 imirkin: what'd they say?
20:31 imirkin: "it's easter monday, wait until tomorrow"? :)
20:31 karolherbst: there are patches for the swiotlb bug :p
20:32 karolherbst: ahh yeah.. 5.7 doesn't work either
20:56 Akronym: err... what had one to do again if he has issues with nouveau on debian? A few weeks back someone suggested to blacklist $something and force $something_else to load. Not talking about the blob, foss driver stuff only.
20:56 Lyude: hooray, crc work unblocked
20:57 imirkin: Lyude: thread_worker's got what ails ya?
20:57 Lyude: Akronym: file a bug report on gitlab usually :)
20:57 imirkin: Lyude: not great advice for nouveau
20:57 Lyude: imirkin: well I think that's what we were already using
20:57 imirkin: those bugs go nowhere
20:57 Lyude: imirkin: we should probably start looking at those bugs, *looks at self*
20:57 imirkin: Akronym: what are you looking to do? kill nouveau?
20:57 Akronym: Lyude, not what I was talking about, but thanks for the input. :p
20:58 imirkin: Lyude: if it doesn't come on a ML, i ain't looking at it
20:59 Akronym: imirkin, no no (not yet anyway *g*), it has to do with something debian did that loaded some broken / glitchy nouveau driver instead of the more sensible $thing (not called nouveau... somehow).
20:59 imirkin: Akronym: oh right
20:59 imirkin: add Device "nouveau" in your xorg.conf
21:00 Akronym: imirkin, no blacklisting needed?
21:00 imirkin: the issue is with "modesetting", which debian has patched in to be the preferred driver
21:00 imirkin: by copying the same patch in fedora
21:00 imirkin: no blacklisting needed
21:01 Akronym: imirkin, funny, that could explain why I didn't find the stuff I was doing back then... as I was SURE xorg.conf is only needed for the blob.
21:01 imirkin: i probably told you to create a 90-nouveau.conf or something?
21:01 imirkin: or 00-nouveau.conf
21:01 Akronym: imirkin, yeah
21:01 imirkin: in /etc/xorg.conf.d
21:02 Akronym: -rw-r--r-- 1 root root 73 Nov 4 22:32 20-nouveau.conf :)
21:05 Akronym: btw. how up to date is https://nouveau.freedesktop.org/wiki/FeatureMatrix/ ?
21:06 imirkin: the items on there are pretty generic
21:06 imirkin: oh yeah, should update the 2D stuff to DONE
21:06 imirkin: good point.
21:06 imirkin: thank you.
21:08 Akronym: imirkin, exactly what I was about to ask, as I did not notice any issues on 110 and 130 generation cards (just switched in this rig from a 740ti to a 1050ti).
21:08 Akronym: 750ti even
21:08 imirkin: done.
21:08 imirkin: no reclock on the 1050ti, unfortunately
21:09 imirkin: i expect that the dual-link dvi/dual-head, multicard should be set to DONE for volta/turing as well
21:09 imirkin: as well as suspend and hdmi audio
21:10 Akronym: yes, hdmi-audio did actually work I noticed :)
21:11 imirkin: but those categories just don't make a ton of sense anymore
21:11 Akronym: imirkin, reclock == power management stuff?
21:11 imirkin: yes
21:11 imirkin: reclock is a subset of pm stuff
21:11 Akronym: imirkin, did PM work on 110?
21:11 imirkin: ya
21:11 Akronym: wtf?
21:11 imirkin: manual, but you could do it
21:11 imirkin: starting with GM20x, all the firmware is locked down
21:11 imirkin: and nvidia doesn't release PM-capable fw
21:12 karolherbst: ...
21:12 karolherbst: I found my serial console
21:12 Akronym: you mean apart from the stuff baked into the blob?
21:12 imirkin: Akronym: which isn't so easy to dig out in the first place
21:12 imirkin: but yes, it's in there
21:12 imirkin: but not separately redistributable
21:13 Akronym: imirkin, but I guess the source of the fw is needed anyway, just with a blobby fw you couldn't do much anyway?
21:14 RSpliet: We'd be able to do an awful lot with a blobby firmware. Interfacing with it is fairly easy
21:14 Lyude: Akronym: i mean if we had firmware from nvidia, we could
21:14 imirkin: nah, it presents an interface, it'd be useable.
21:14 imirkin: but redistribution is a big sticking point
21:14 Lyude: ^
21:14 imirkin: we have scripts to extract e.g. video decoding firmware (where supported) from blob
21:14 imirkin: but that's a niche feature
21:16 RSpliet: and a tool to grab it from a running system... which worked back on Tesla/Fermi
21:16 RSpliet: It being the PM firmware
21:16 imirkin: yep
21:17 RSpliet: Back when I didn't know how to spell retrieve
21:18 imirkin: those were the days!
21:18 imirkin: i before e, except after c, and a bunch of other exceptions :)
21:18 RSpliet: Heh yes, that rule to which there are more counterexamples than examples
21:20 Akronym: imirkin, is the FW regularly updated like the blobby driver?
21:20 Akronym: or is it more or less stable and doesn't change much?
21:21 imirkin: i think they break API with every major blob release
21:21 imirkin: or at least many major blob releases
21:21 imirkin: but if firmware works, no need to update it
21:21 karolherbst: imirkin: defconfig doesn't even enable everything I need :D
21:22 karolherbst: eg r8169 is n
21:22 karolherbst: :/
21:22 Akronym: imirkin, so one could use a once extracted blobby FW for ages with a floss driver? In theory I mean.
21:22 karolherbst: also loading usbx firmware just fails as well... it's a mess
21:22 imirkin: yes
21:22 RSpliet: karolherbst: welcome to ARM
21:22 imirkin: karolherbst: why would it enable r8169?
21:22 karolherbst: because the jetson nano has a r8169 driven ethernet port :p
21:22 karolherbst: at least the dev board has
21:23 Akronym: imirkin, and extracting PLUS distributing just the FW is forbidden?
21:23 imirkin: you're not building the jetson nano defconfig
21:23 imirkin: you're building the tx1 soc defconfig. is the r8169 inside the soc?
21:23 karolherbst: it's not inside the soc :(
21:23 imirkin: Akronym: distributing is definitely forbidden
21:23 imirkin: extracting is more questionable
21:23 imirkin: karolherbst: do you expect some random raid controller to be enabled in that defconfig?
21:24 karolherbst: it's from the official dev board
21:24 imirkin: here's what'd be AWESOME
21:24 imirkin: is if you could take a *dts* file and do a make config on that
21:24 imirkin: and then add in any extra peripherals like r8169
21:24 imirkin: which can never really be defaulted
21:25 imirkin: but figuring out what all you have to enable for a particular dts/dsti is a HUGE dog
21:25 RSpliet: Whoa, not too sensible there
21:25 RSpliet: Creating the "compatible" string to driver mapping sounds like a huge PITA
21:25 imirkin: is it?
21:25 karolherbst: imirkin: hey... do you want to kill jobs? :D
21:26 imirkin: it's all declared
21:26 imirkin: karolherbst: i want to not have to kill myself.
21:26 karolherbst: :p
21:26 imirkin: if that kills jobs in the process, so much the merrier.
21:27 RSpliet: Pretty sure those people could be put to good use by writing maintainable drivers at a pace that we don't need CodeAurora anymore. Might take a bit of training...
21:28 imirkin: RSpliet: or at least something that makes suggestions
21:28 imirkin: that can be appended to some reasonable defconfig
21:30 karolherbst: imirkin: firmware stuff is sometimes also a problem
21:30 karolherbst: but yeah..
21:30 RSpliet: Oh yeah, I'll be darned, compatible strings are just in the module device table. So all we need is an intern* that parses source code to extract it... because having to build all modules before you can create a defconfig sounds like defeating the purpose
21:31 RSpliet: Even if it isn't
21:31 RSpliet: It sounds like it
21:31 karolherbst: yeah...
21:31 karolherbst: use clang+llvm for the parsing :p
21:32 RSpliet: or grep to look for the .compatible= "blah" pattern, and scan the Kconfig to see which module option corresponds with that
21:32 RSpliet: What could go wrong...
21:34 imirkin: grep seems best.
21:34 imirkin: look, this doesn't have to be perfect
21:34 imirkin: it has to be decent.
21:34 imirkin: and talk about things it didn't find
21:35 imirkin: e.g. couldn't find any compatible strings for device X, go look for it yourself ya lazy bum!
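A minimal sketch of the two-step "dts in, config suggestions out" tool being described. Step one indexes `.compatible = "..."` strings in driver sources and maps each source file to its Kconfig symbol through `obj-$(CONFIG_...) += foo.o` lines in the Makefile; step two reads a DTS, collects the CONFIG_ symbols it can match, and lists the nodes it could not. It is grep-level regex matching, so compatibles built through macros or multi-object modules will be missed; the function names and the sample inputs (modeled on pwm-tegra) are illustrative assumptions.

```python
import re
from collections import defaultdict

COMPAT_RE = re.compile(r'\.compatible\s*=\s*"([^"]+)"')
# Kbuild lines such as: obj-$(CONFIG_PWM_TEGRA) += pwm-tegra.o
OBJ_RE = re.compile(r'obj-\$\((CONFIG_[A-Z0-9_]+)\)\s*\+?=\s*(.+)')
DTS_COMPAT_RE = re.compile(r'compatible\s*=\s*([^;]+);')


def index_sources(c_files, makefile_text):
    """c_files: dict of 'foo.c' -> source text (one directory).
    Returns compatible string -> set of CONFIG_ symbols."""
    obj_to_cfg = {}
    for cfg, objs in OBJ_RE.findall(makefile_text):
        for obj in objs.split():
            obj_to_cfg[obj] = cfg
    index = defaultdict(set)
    for fname, text in c_files.items():
        cfg = obj_to_cfg.get(fname.rsplit(".", 1)[0] + ".o")
        if cfg is None:
            continue  # file not built by a CONFIG_-guarded obj- line
        for compat in COMPAT_RE.findall(text):
            index[compat].add(cfg)
    return index


def suggest_config(dts_text, index):
    """Returns (sorted CONFIG_ symbols found, list of unmatched compatible lists)."""
    found, missing = set(), []
    for m in DTS_COMPAT_RE.finditer(dts_text):
        compats = re.findall(r'"([^"]+)"', m.group(1))
        cfgs = set().union(*(index.get(c, set()) for c in compats))
        if cfgs:
            found |= cfgs
        else:
            missing.append(compats)  # "go look for it yourself"
    return sorted(found), missing
```

In practice step one would walk the whole source tree once and save the index, so step two can run against any dts/dtsi without rebuilding anything.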
21:35 imirkin: i'm gonna try it tonight.
21:35 imirkin: if i remember.
21:36 imirkin: i've spent _days_ on this shit
21:37 RSpliet: imirkin: think you can make a hosted solution? E.g. just scan the entire source tree, extract all "string, source file, KConfig option" combos and just stick'em in a textfile/SQlite DB/whatever... so that I can just upload a DTC/DTB to a web service and it'll tell me what to do? :-P
21:38 imirkin: yes, that's the plan
21:38 imirkin: maybe not hosted, but definitely 2-step
21:38 imirkin: RSpliet: speaking of hosted, i had some fun with emscripten:
21:38 imirkin: https://people.freedesktop.org/~imirkin/edid-decode/
21:38 imirkin: https://people.freedesktop.org/~imirkin/nvbios/
21:39 RSpliet: Nice one! Maybe I'll finally get off my lazy arse coming weekend and properly fix up the vbios DRAM timing parsing...
21:39 imirkin: the same lazy arse that's finishing a PhD thesis?
21:40 RSpliet: Handed that in two weeks ago
21:40 imirkin: oh nice!
21:40 imirkin: already signed by your advisor, or still being reviewed?
21:40 RSpliet: Hence the sudden patches against snd-hda-intel and mdp5
21:40 RSpliet: I will be examined within the next 6 weeks, then I'll get a list of corrections back, depending on how bad they are I'll have 3-6 months for those
21:41 imirkin: until then, it's get-reclocking-working-on-G92 time? :)
21:41 RSpliet: Ehm, yes about that.
21:41 RSpliet: Did I tell you about the colossal mistake I made last Christmas?
21:41 RSpliet: It's nothing to do with giving away hearts
21:41 imirkin: guessing the boards didn't survive the mistake?
21:42 RSpliet: I thought "hmm, if I have to move back after that PhD, better make my life easier and ship the stuff I haven't touched in a while back to the Netherlands"
21:42 RSpliet: Ergo: my electric guitar, and graphics cards
21:42 RSpliet: Then COVID happened
21:44 RSpliet: As a result, my nouveau hacking is strictly limited to a perfectly reclocking GT640, an "Optimus" 940M without a sound codec, and a Jetson TX1
21:44 RSpliet: The boards are unharmed, just... not within arms reach. And I have pretty long arms.
21:45 imirkin: heh, ok :)
21:45 imirkin: ooo well
21:46 Akronym: Hmm... I was sure "doesn't support nouveau driver" was a warning added to every steam game, now I don't see that anymore. Am I blind or did something change there?
21:47 RSpliet: I mean, there's still stuff worth REing. The NISO buffers for one. Then global perfomance counters. Then load-based automatic reclocking. Y'know, the stuff nobody's touched but really should.
21:47 imirkin: RSpliet: go for it =]
21:48 imirkin: Akronym: they assume people know now :)
21:48 imirkin: [tbh, i don't remember those]
21:48 imirkin: RSpliet: having flicker-free reclock would be huge
21:48 Akronym: imirkin, hum... I tried Deus Ex Human Revolution once (with proton obviously) did work, slow, but did work
21:49 RSpliet: That's precisely what the NISO buffers are for
21:49 Lyude: also there's still my powergating work that's unfinished
21:49 imirkin: Akronym: if you had reclocked your 750ti it would have been modestly slow rather than very slow
21:49 Lyude: would be more than happy to help someone pick that back up again
21:49 Lyude: especially since maxwell1/2 shouldn't need much work at all to get it going
21:49 Akronym: imirkin, slower than in windows?
21:49 imirkin: Lyude: btw, since you seem to have gravitated towards display stuff...
21:49 RSpliet: Lyude: if your boss wants to hire me for that in London, we might have a win-win situation :-P
21:50 imirkin: Lyude: could i interest you in having a closer look at HDMI 2.0? several people have had trouble getting screens to turn on
21:50 imirkin: Akronym: yes, probably about half speed
21:50 Lyude: imirkin: yeah sure
21:50 Lyude: i'm still planning on going forward with the disp cap stuff so hopefully that should help with that
21:50 imirkin: Lyude: i did try to implement it reasonably, but i only have the one display, and it worked
21:50 Akronym: imirkin, why? PM just not there yet?
21:51 RSpliet: Akronym: death by 1000 paper cuts
21:51 imirkin: Akronym: it turns out that having 1000 engineers optimize a driver with full documentation generates better results than 1-2 engineers working part time on it without docs.
21:51 Akronym: RSpliet, O_o
21:51 Lyude: imirkin: tbh, depends on the hardware ;)
21:51 imirkin: no one could have predicted it, but that's how it works out
21:51 imirkin: at least in nvidia's case...
21:51 imirkin: perhaps i'm just bad at programming.
21:52 imirkin: along with everyone else here. could be.
21:52 RSpliet: I'm not going to deny that hypothesis
21:53 imirkin: Lyude: and yeah, the ARM driver situation is laughable compared to how much effort is invested in them
21:53 Lyude: imirkin: yeah it's quite interesting
21:53 Lyude: it seems like so much more effort goes into keeping it closed vs. the actual effort required to just RE the hardware
21:53 imirkin: i do think the drivers have gotten better with better conformance suites
21:53 imirkin: coz it's all outsourced development
21:54 imirkin: so they just want to call it done ASAP without much regard to quality
21:54 imirkin: so the metric is "does it pass CTS"
21:54 imirkin: and now it's harder to do a BS job and still pass CTS
21:55 Akronym: imirkin, oh... 1.) I wasn't aware that only two dudes work on nouveau 2.) Obviously I have no idea how a driver works, so, blame my ignorance that I didn't know more people on development brings a better outcome. :)
21:55 imirkin: Akronym: i actually expect that docs play a big fraction of the difference
21:56 Lyude: btw skeggsb - now that I (think) we have the bikeshedding questions out of the way for the nouveau crc stuff, I'm going to work on getting a respin out asap (especially since I'm trying to get better at not leaving things hanging…), just figured I'd let you know in case there's anything you wanted me to change while I'm at it
21:56 imirkin: some perf features are very hard to RE properly
21:56 imirkin: like all this subtiling bullshit
21:56 imirkin: but the last 20% is definitely manpower
21:57 Akronym: imirkin, so easy fix, we just have to ask nvidia to release ALL documentation on there hardware very nicely and all is good? ;)
21:57 Lyude: I think we'd have a much easier time getting manpower if we didn't have the huge issue of pm looming over our heads
21:57 Akronym: their even
21:57 RSpliet: Only 20%? that's a lot of docs you're anticipating then. Who's going to read all those docs?
21:57 imirkin: RSpliet: i'm anticipating a programming guide for zcull, subtiling, and compression
21:58 imirkin: and probably some register usage guidelines
21:58 imirkin: (bank conflicts, etc)
21:58 imirkin: that would get us probably within 20% of blob
21:59 fincs: "a programming guide" <-- huh?
21:59 RSpliet: Oh that one's been sort-of explained by patents. TL;DR: we shouldn't have to worry about register bank conflicts.
21:59 imirkin: the rest of it is lots of perf analysis
21:59 imirkin: which requires a lot of manpower
21:59 karolherbst: imirkin: for perf we need an instruction scheduler with volta :)
21:59 RSpliet: scrap "with volta"
21:59 imirkin: yeah
21:59 karolherbst: RSpliet: before that it's not that relevant actually
22:00 karolherbst: sure, you can move some loads around
22:00 karolherbst: but normally it's enough to optimize against lower stall count
22:00 karolherbst: but... volta...
22:00 RSpliet: it's death by 1000 paper cuts. All those 1%s add up
22:00 airlied:wonders if skeggsb has made any progress on mem alloc uapis
22:00 karolherbst: RSpliet: volta has an alu split
22:00 karolherbst: so you have int and float alu being separate
22:01 karolherbst: of course, you have half of those per SM now
22:01 karolherbst: compared to previous gens
22:01 karolherbst: so....
22:01 karolherbst: you kind of need to alternate int and float alu ops
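A toy model of why alternating int and float ops matters on Volta, under assumptions that are not stated in the log: each of the separate int and fp pipes stays busy for two cycles per warp instruction, and the scheduler can issue at most one instruction per cycle. It models a single warp with no data dependencies, so it ignores the other warps that can fill the gaps (RSpliet's point below). `issue_cycles` is a hypothetical helper, not anything from codegen.

```python
from typing import Iterable


def issue_cycles(ops: Iterable[str], occupancy: int = 2) -> int:
    """Return the cycle after the last issue for an in-order stream of 'int'/'fp' ops.

    Toy model: one issue slot per cycle; each pipe stays busy for
    `occupancy` cycles after accepting an instruction.
    """
    busy_until = {"int": 0, "fp": 0}
    cycle = 0
    for op in ops:
        # stall until the target pipe can accept another instruction
        cycle = max(cycle, busy_until[op])
        busy_until[op] = cycle + occupancy
        cycle += 1  # move on to the next issue slot
    return cycle


print(issue_cycles(["fp"] * 8))         # fp-only stream: 15
print(issue_cycles(["fp", "int"] * 4))  # alternating int/fp stream: 8
```

Under these assumptions an fp-only stream of eight instructions needs 15 cycles while the alternating stream needs 8, which is the effect an instruction scheduler would try to exploit within one warp.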
22:01 RSpliet: Oh I'm sure it matters "more" on Volta, but I don't want to dismiss its importance :-)
22:01 karolherbst: yeah..
22:01 karolherbst: but there are bigger things
22:02 karolherbst: even dual issuing on maxwell doesn't matter _that_ much
22:02 karolherbst: even though I got some nice perf values
22:02 karolherbst: +3% perf in some benchmarks
22:02 RSpliet: That's valuable!
22:02 karolherbst: sure
22:02 karolherbst: just not well enough understood yet
22:02 fincs: Did you get around to implementing the "first instruction must be ALU" rule?
22:02 RSpliet: About the fpu/alu split... how much of that will be masked by the parallel warps
22:02 karolherbst: fincs: not yet
22:02 karolherbst: RSpliet: nothing
22:03 karolherbst: fp only shaders have reduced perf
22:03 karolherbst: that's just how it is
22:03 RSpliet: Oh sure
22:04 karolherbst: why does hw have to suck so much...
22:04 karolherbst: generally
22:04 RSpliet: But warp 1's int instruction can run in parallel with warp 2's FP instruction. Using insn scheduling to interleave fp with int insns doesn't *have* to have a big impact.
22:04 RSpliet: (that is: if the kernel has both instructions ;-))
22:05 karolherbst: ohh sure
22:05 karolherbst: we still have this sched stuff as well
22:05 karolherbst: but you can dual issue int and fp alu ops
22:05 karolherbst: which.. brings us back to where we were with kepler
22:06 RSpliet: Oh can you! Resurrected from that brief death, dual-issue is kind of like Jesus :-P
22:06 karolherbst: well.. if int and float ops use different parts of the SM :p
22:06 karolherbst: with kepler they focused too much on that, so you could dual issue two alu ops
22:06 RSpliet: If you can dual-issue then yes, scheduling them to dual issue probably makes sense
22:06 karolherbst: but...
22:06 karolherbst: I guess in maxwell they removed too much
22:06 karolherbst: so with volta they found the perfect middle ground
22:07 karolherbst: most modern shaders are like 30% int ops anyway
22:07 karolherbst: so...
22:07 karolherbst: who cares
22:08 RSpliet: Most OpenCL kernels (at least the ones I looked at, a completely arbitrary set of benchmarks from rodinia/parboil) are 30-50% int too
22:08 RSpliet: But that's pointer arithmetic
22:08 karolherbst: yep
22:10 HdkR: karolherbst: "fp only shaders have reduced perf" That's still not necessarily true. The instruction latency has reduced, so it'll be about the same :P
22:11 RSpliet: HdkR: instruction latency doesn't matter, throughput of an FPU is still 1 IPC presumably
22:14 HdkR: I mean, latency matters when you're calculating raw float ops per second, so..flops?
22:15 RSpliet: Nobody really defined what this latency actually means. But, this stuff is parallel. The latency is presumably the time between instruction issue and result. Other warps can issue instructions to the same FP in the meanwhile
22:15 RSpliet: Like with DRAM, throughput is what matters, not latency.
22:18 karolherbst: ahh cool, with the serial console I can actually see if nouveau gets loaded even though the system ends up not being usable.. nice
22:19 HdkR: RSpliet: But if you can issue one instruction per cycle and you have a latency of six cycles then you're going to fill a queue and stall somewhere in the pipeline?
22:25 HdkR: I guess I just know too much about how it works as to why it isn't really an issue :P
22:26 RSpliet: HdkR: the instruction having a latency of 6 cycles presumably means it takes 6 cycles to go through that pipeline. But if that pipeline is 6 stages, the other 5 can be occupied in the meanwhile.
22:27 RSpliet: The whole point of pipelining is to hide instruction latency
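RSpliet's argument in numbers: a minimal sketch assuming N independent instructions (no data dependencies) and a pipeline that accepts one new instruction per cycle, using the 6-cycle latency from HdkR's example. `cycles_to_finish` is a hypothetical helper for illustration.

```python
def cycles_to_finish(n: int, latency: int, pipelined: bool = True) -> int:
    """Cycles until n independent instructions have all produced results."""
    if n == 0:
        return 0
    if pipelined:
        # the first result appears after `latency` cycles; after that one
        # instruction completes every cycle, since a new one entered each cycle
        return latency + (n - 1)
    # without pipelining each instruction waits for the previous one to finish
    return n * latency


for n in (1, 6, 1000):
    print(n, cycles_to_finish(n, 6), cycles_to_finish(n, 6, pipelined=False))
# 1 6 6
# 6 11 36
# 1000 1005 6000
```

Throughput tends toward one instruction per cycle regardless of latency, as long as there is enough independent work to keep the stages full; on a GPU much of that work comes from other warps.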
22:35 karolherbst: RSpliet: I expect that the pipeline depth could be smaller with volta as the alu is simpler
22:36 karolherbst: but maybe that doesn't matter
22:36 karolherbst: who knows
22:39 RSpliet: I don't expect it to matter too much. Deeper pipelines can help bring the frequency (and thus the throughput) up, which is why most CPUs have like 15(+) pipeline stages. Pipeline depth matters less for GPU performance than CPU performance, because of all the data parallelism :-)
22:39 RSpliet: If anything, simplifying the ALU means reduced power consumption on int operations, allowing them to clock the beast a little higher
22:39 karolherbst: if nvidia has shown anything with turing, it's that they can bring up the frequency no matter what :p
22:39 RSpliet: without overheating or exceeding the power budget :-)
22:40 karolherbst: turing has insanely high clocks
22:40 karolherbst: not sure if they broke 2GHz though
22:40 RSpliet: Highly optimised 12nm finfet, woohoo
22:41 karolherbst: I think the next gen will be brutal
22:41 karolherbst: that will probably be the "AMD can't do the same to us as they did to Intel" gen :p
22:42 fincs: I hope next gen doesn't cost an arm and a leg
22:43 karolherbst: it's nvidia...
22:43 RSpliet: Is NVIDIA going 10nm or smaller for that one?
22:43 karolherbst: RSpliet: I would be surprised if they go for anything besides 7nm
22:43 karolherbst: why 10nm?
22:43 karolherbst: Nvidia doesn't make the same mistakes Intel does
22:43 fincs: Weren't they confirmed to go 7nm
22:44 RSpliet: I always had the impression GPUs lagged behind smartphones in terms of feature size a little bit
22:44 karolherbst: no
22:44 karolherbst: AMD is already at 7nm as well, no?
22:44 karolherbst: yeah
22:44 fincs: Yes they are
22:44 RSpliet: Oh yeah
22:44 karolherbst: RDNA is 7nm
22:44 karolherbst: GCN 5th gen was already 7nm
22:44 RSpliet: Heh, they caught up, nice
22:44 karolherbst: ahh no
22:44 karolherbst: that was 14nm
22:45 fincs: RDNA, aka "We renamed GCN in order to stop people asking "when are you going to replace GCN" :p"
22:45 karolherbst: well.. rdna is different
22:45 karolherbst: and more sane
22:45 RSpliet: Yeah, just reading that Ampère will be 7nm
22:45 RSpliet: Nice
22:45 fincs: But isn't it the same ISA
22:45 karolherbst: no
22:45 karolherbst: rdna is completely different
22:45 karolherbst: well
22:45 karolherbst: more different
22:45 fincs: Is it?
22:45 karolherbst: sure
22:45 fincs: I saw the document and it looked exactly the same as GCN but with additions
22:46 karolherbst: well..
22:46 karolherbst: the details are important
22:46 karolherbst: nvidias ISA also _looks_ the same since forever
22:46 karolherbst: but on the hw it's completely different still
22:46 fincs: I understand they threw out their previous impl and made a new one
22:46 karolherbst: yeah
22:46 karolherbst: this allows for a lot of random goodies
22:46 RSpliet: Well, Turing/Volta is... I had the impression Maxwell and Pascal were uninspired :-P
22:46 fincs: But that's the actual impl
22:46 karolherbst: like changing amount of regs, etc...
22:47 karolherbst: RSpliet: yeah.. volta was the first big change
22:47 karolherbst: since tesla..
22:47 fincs: And I was surprised to learn Pascal is basically yet another hacked upon Maxwell
22:47 karolherbst: pascal is essentially the same ISA as maxwell
22:47 karolherbst: even the encoding is the same
22:47 RSpliet: fincs: it's called evolution, rather than revolution ;-)
22:47 karolherbst: it's even mostly compatible
22:47 karolherbst: mostly
22:47 karolherbst: there are subtle differences
22:47 fincs: So from now on I'll call it Maxwell 3rd gen ;)
22:48 karolherbst: 2nd
22:48 karolherbst: :p
22:48 RSpliet: It's all marketing anyway
22:48 fincs: There's already a 2nd gen
22:48 karolherbst: all SM5X are really the same
22:48 karolherbst: SM52 added sqrt
22:48 karolherbst: ...
22:48 karolherbst: maybe something else?
22:48 karolherbst: dunno
22:48 fincs: Also on the subject of shader stuff
22:48 fincs: TX1 has half float stuff
22:48 karolherbst: I think that's the only difference we have in mesa
22:48 karolherbst: ahh yeah...
22:48 karolherbst: we could actually do something about it at some point
22:48 karolherbst: but.. uggg
22:48 fincs: I wonder if there's any sane way to make it usable
22:49 karolherbst: all this simd in a reg is annoying
22:49 karolherbst: not really
22:49 fincs: GLSL doesn't really have the concept of half floats :p
22:49 karolherbst: except for texturing
22:49 fincs: There's this precision shit which seems to be a GLES exclusive
22:49 RSpliet: OpenCL perhaps?
22:49 karolherbst: you _could_ load 2x16 bit vals into a 32 reg directly
22:49 karolherbst: and skip the merging
22:49 fincs: According to emu devs, Switch games do in fact use half float stuff
22:50 karolherbst: fincs: the problem with SIMD in a reg is the (un)folding
22:50 karolherbst: the ALU is still mostly 32 bit
22:50 karolherbst: there are just some half precision ops with hi/lo flags
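A minimal sketch of the "2x16 bit values in one 32-bit register" layout, using Python's `struct` module, whose `e` format code is IEEE 754 binary16. Treating the low half as bits 0-15 is an assumption for illustration; `pack_half2` and `unpack_half` are hypothetical helpers, and the code only shows the packing and the hi/lo selection, not any actual instruction encoding.

```python
import struct


def pack_half2(lo: float, hi: float) -> int:
    """Pack two floats as binary16 into one 32-bit word (lo in bits 0-15)."""
    (word,) = struct.unpack("<I", struct.pack("<ee", lo, hi))
    return word


def unpack_half(word: int, high: bool = False) -> float:
    """Select one binary16 half of a 32-bit word and widen it back to a float."""
    bits = (word >> 16) & 0xFFFF if high else word & 0xFFFF
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value


w = pack_half2(1.0, 2.0)
print(hex(w))                                     # 0x40003c00
print(unpack_half(w), unpack_half(w, high=True))  # 1.0 2.0
```

Every op that works on only one half needs this hi/lo selection, and results have to be merged back into the shared register afterwards, which is the (un)folding cost mentioned above.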
22:50 karolherbst: but.. uff
22:51 RSpliet: If there's GL or GLES extensions for it, I'm sure it can be made to work.
22:51 karolherbst: mhhh...
22:51 karolherbst: fincs: fp16 was it, right?
22:51 fincs: Yeah, 16-bit floating point
22:51 karolherbst: RSpliet: GL_AMD_gpu_shader_half_float
22:51 karolherbst: have fun :p
22:51 fincs: https://github.com/ReinUsesLisp/nxas/blob/master/table.h#L64-L79
22:52 RSpliet: NV_gpu_shader5 is mentioned in there
22:52 fincs: We even have some hwtests for those instructions
22:52 karolherbst: fincs: .. yeah.. that's not much
22:52 karolherbst: I am more concerned about codegen's general state and optimizations and everything :/
22:52 karolherbst: codegen was never written with !32 in mind really
22:53 karolherbst: there are some 64 bit thingies, but... uff
22:53 fincs: I saw 64-bit add with constants isn't exactly codegenned right
22:53 karolherbst: yeah.. there are bugs
22:53 fincs: Missed opportunity for using c[] in the add instructions
22:53 karolherbst: and CL hits them regularly
22:54 fincs: Instead there are two movs + two adds which is a bit of a shame
22:54 karolherbst: really thinks that volta is a good opportunity to clean up a lot of the mess
22:54 karolherbst: fincs: well.. the alu is mostly 32 bit though
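Why a 64-bit add turns into more than one instruction on a mostly 32-bit ALU: a sketch of the general decomposition into a low 32-bit add that produces a carry and a high 32-bit add that consumes it. This shows the technique only; the exact instruction sequence codegen emits is not given in the log, and `add64_via_32` is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF


def add64_via_32(a: int, b: int) -> int:
    """Add two 64-bit unsigned values using only 32-bit halves plus a carry."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32          # 1 if the low add overflowed 32 bits
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | lo        # wraps modulo 2**64 like a hardware register


print(hex(add64_via_32(0xFFFFFFFF, 1)))  # 0x100000000
print(add64_via_32(2**64 - 1, 1))        # 0
```

If one operand is a constant in c[], both halves could in principle be read straight from the constant buffer, which is the missed opportunity fincs describes instead of materializing them with two movs first.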
22:54 RSpliet: Yes, it'll be quite a major overhaul.
22:54 fincs: Volta/Turing go together, right?
22:54 karolherbst: ye
22:54 karolherbst: s
22:54 karolherbst: turing has some more annoying additions though
22:54 karolherbst: like uniform regs/predicates
22:54 fincs: But still... reclocking... argh
22:54 RSpliet: Borrow some code from the 4vec vieux "compiler" perhaps :-P
22:55 karolherbst:is not going to write a uniform value checker for codegen
22:56 karolherbst: we have some which would work exactly until the first loop
22:56 karolherbst: but ufff
22:56 karolherbst: *up to
22:57 karolherbst:needs more time
22:57 karolherbst: :D
22:58 fincs: We really need more docs and support :\
22:58 karolherbst: we need more people
22:58 fincs: And yeah, more people
22:59 karolherbst: I know tons of stuff I can work on, and I need others I can tell what to work on :D
23:00 fincs: I still don't understand why nvidia is the only company making this stuff difficult
23:00 karolherbst: they aren't really
23:00 karolherbst: at least not deliberately
23:01 fincs: Then why are there still so many things unpublished, and the reclocking problem unsolved?
23:01 karolherbst: if they did, they wouldn't ship tools like nvdisasm
23:01 karolherbst: fincs: ohh, not caring and annoying others deliberately are still two different things
23:01 fincs: Meanwhile other vendors have full ISA docs and users can pick two different high performance open source drivers
23:02 karolherbst: fincs: well.. usually those two options are not equally open source
23:02 karolherbst: more like open source and proprietary software with source code attached
23:03 karolherbst: I would never call amdvlk open source
23:03 fincs: Isn't it MIT?
23:03 karolherbst: so what?
23:03 fincs: mesa is MIT too
23:03 karolherbst: that's beside the point
23:04 karolherbst: amdvlk is still developed like proprietary software
23:04 karolherbst: no community really
23:04 karolherbst: only "oh hey.. bi-weekly code drops" crap
23:04 karolherbst: got better though
23:04 fincs: Well
23:04 fincs: It used to be proprietary
23:05 karolherbst: it's not developed in the open is what I mean
23:05 karolherbst: AMD decides what happens there
23:05 fincs: And old habits die hard I guess :p
23:05 karolherbst: not a community
23:05 karolherbst: AMD can always say "ahh well, tough for older gens, but this patch increases perf on new gens by 5% and lowers perf on older ones by 50%"
23:05 karolherbst: and nobody can do much against it except.. downstream patches
23:05 fincs: Then users can revert :p
23:06 karolherbst: and then it becomes pointless
23:06 karolherbst: yeah well
23:06 karolherbst: then you hurt users with modern cards
23:06 fincs: Or fork or whatever
23:06 karolherbst: yeah.. well
23:06 karolherbst: or just create radv and not deal with this mess :p
23:06 fincs: I mean
23:06 karolherbst: valve agrees btw :p
23:06 fincs: I looked at radv source code and I liked it
23:06 fincs: Then I looked at amdvlk code and... I was in vtable hell :p
23:06 karolherbst: yeah well :p
23:07 fincs: Nvidia is also guilty of vtable hell btw
23:07 karolherbst: everybody is
23:07 karolherbst: even the kernel :p
23:07 karolherbst: seriously. vtables are better then if else ladders
23:07 karolherbst: *than
23:07 fincs: "It's not a vtable because this is C, not any of that C++ crap" yeah well :p
23:07 fincs: Yeah, tables of function pointers have their use
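The "tables of function pointers" style under discussion, sketched in Python: a per-generation ops table selected once, versus an if/else ladder that repeats the generation check at every call site. The generation names and hooks are hypothetical placeholders, not nouveau's actual structures.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GpuOps:
    """A table of callbacks, in the spirit of a C struct of function pointers."""
    name: str
    init: Callable[[], str]
    fini: Callable[[], str]


# Hypothetical per-generation tables; each entry is just a stand-in hook.
MAXWELL = GpuOps("maxwell", lambda: "maxwell init", lambda: "maxwell fini")
VOLTA = GpuOps("volta", lambda: "volta init", lambda: "volta fini")


def init_ladder(gen: str) -> str:
    # if/else ladder: every call site repeats the generation check
    if gen == "maxwell":
        return "maxwell init"
    elif gen == "volta":
        return "volta init"
    raise ValueError(f"unknown generation {gen!r}")


def init_table(ops: GpuOps) -> str:
    # table dispatch: the generation was decided once, when `ops` was picked
    return ops.init()


print(init_table(VOLTA), init_ladder("volta"))  # volta init volta init
```

With the table approach, supporting another generation means adding one more table rather than another branch in every ladder.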
23:09 fincs: Anyway, the thing is
23:09 fincs: I sense a mix of lack of care, secrecy, or maybe even contempt
23:10 fincs: And this is all opportunity cost
23:10 karolherbst: yeah well..
23:10 karolherbst: it's still their choice
23:10 karolherbst: it's getting better, just super slowly
23:10 fincs: Of course
23:10 fincs: But still, if I hear someone wanting to build a computer and run Linux on it, I'll recommend AMD to them
23:11 fincs: I hope at least we get *anything* out of what was promised back in... December I think it was? Can't remember now
23:12 imirkin: stuff gets promised all the time
23:12 fincs: Even if it's only for Turing, I'm sure some stuff will be relatable to older gens
23:12 karolherbst: was there a promise?
23:12 fincs: There was supposed to be a GDC talk but of course that was cancelled
23:12 imirkin: i dunno, like in 2014 there was talk of greater cooperation
23:12 airlied: karolherbst: rdna really isnt that different an ISA
23:12 airlied: internally maybe, but the ISA was pretty close
23:13 karolherbst: airlied: well, but the encoding is new, no?
23:13 karolherbst: and I thought there was some substantial difference actually
23:14 airlied: karolherbst: nope same encoding
23:14 karolherbst: imirkin: yeah... sadly the nv <-> nouveau community stuff is really lacking :(
23:14 fincs: The "substantial difference" is the impl :p
23:14 airlied: though I think it actually matches an earlier encoding
23:14 karolherbst: airlied: ohh, interesting
23:14 karolherbst: then I misremembered
23:14 airlied: it's like they reverted a bunch of vega changes
23:14 karolherbst: ahh :D
23:14 airlied: and went back to the older gen encoding
23:15 fincs: Was Vega good? I hear for some reason Polaris is more popular
23:16 airlied: vega wasn't great, lots of new design features that didn't quiet work
23:16 airlied: quite
23:17 fincs: I see
23:17 airlied: the next-gen shaders stuff is quite the mess even on navi
23:19 karolherbst:is still annoyed by the entire runpm situation
23:21 fincs: I wish I could play around with the "next-gen" shader stuff
23:21 fincs: To me it sounds like they finally made the VTG part of the pipeline not suck
23:26 airlied: fincs: it's interesting, just seems to only benefit some use cases I guess we don't see in games :-)
23:27 fincs: There's also the fact that games don't really write their own graphics code anymore; everyone seems to be using all sorts of 3rd party engines
23:27 fincs: And well, engines tend to cater to the lowest common denominator in order to be compatible with all sorts of potatoes
23:27 airlied: tell that to doom eternal :-P
23:27 fincs: Yeah not saying there aren't exceptions
23:27 airlied: most of the NGG stuff seems to benefit more CAD things
23:28 airlied: and possibly stuff doing a bit more TESS
23:28 karolherbst: what's the big change with ngg btw?
23:29 airlied: they merged shader stages
23:29 imirkin: one big happy vertex stage, right?
23:29 karolherbst: huh
23:29 karolherbst: how does that make sense
23:30 airlied: so vs/tcs run together
23:30 airlied: and tes/gs
23:31 imirkin: do you end up having to do primitive assembly by hand?
23:31 imirkin: oh, i guess not
23:31 imirkin: it just hands you the points from the tessellator and you evaluate them however you like
23:31 airlied: the biggest problem is transform feedback is hard, and ended up being quite broken
23:32 karolherbst: yeah.. sounds like a big issue with that
23:32 airlied: you can also merge vs/gs if no tess
23:33 karolherbst: I kind of see the point, I just fail to see the bigger benefit here.. you essentially serialize instead of running in parallel, no?
23:33 karolherbst: so the vs becomes a loop body in the merge shader or do I miss anything here?
23:33 karolherbst: *merged
23:34 airlied: yeah pretty much that, the main thing is it keep stuff on-chip
23:34 airlied: you can avoid writing out the VS output to a VRAM ring and then reading it back in a lot of cases
23:34 karolherbst: why does the VS output have to be a VRAM ring though
23:35 karolherbst: you could also just add a huge on chip cache and just not add a completely new way of doing shaders
23:35 airlied: I think the huge on chip cache was hitting issues as well :-P
23:35 karolherbst: probably :p
23:36 airlied: but I haven't found a great explanation for what the benefits of NGG are
23:36 airlied: mareko mumbles something about culling and needs early culling to see it
23:36 karolherbst: maybe it helps if your VS is just a passthrough or small one
23:39 karolherbst: uffff.....
23:39 karolherbst: this is terrible: https://gist.githubusercontent.com/karolherbst/7dc57407d385f90945033884340bd9ce/raw/84329155b44b982a5b7fabd2abe3e578b56e7b11/gistfile1.txt
23:39 karolherbst: one of my worst git bisects by far
23:39 imirkin: as if you didn't know you had some sort of DT binding issue ;)
23:39 karolherbst: yeah...
23:39 karolherbst: thanks I guess?..
23:39 karolherbst: will recheck the last bad/goods though
23:40 karolherbst: damn compile errors
23:40 karolherbst: seriously
23:40 imirkin: none of those look like obviously "it" though =/
23:41 karolherbst: yeah.. well, let me bisect a second time with the last good/bads
23:41 karolherbst: maybe I messed something up somewhere
23:41 karolherbst: not compiling and not booting kernels do make things interestingly hard
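Why commits that don't compile or boot make bisecting awkward: a simplified model of bisection where a test can report good, bad, or "can't test" (the case `git bisect skip` exists for). This is not git's actual commit-selection logic; `bisect_commits` is a hypothetical helper that, like git, can end up with a range of candidates instead of a single culprit when only skipped commits are left.

```python
from typing import Callable, List, Optional, Sequence


def bisect_commits(commits: Sequence[str],
                   test: Callable[[str], Optional[bool]]) -> List[str]:
    """Find the first bad commit; commits[0] must be good, commits[-1] bad.

    test() returns True (good), False (bad) or None (untestable, e.g. it
    doesn't compile or boot). Returns a single culprit, or every remaining
    candidate when only untestable commits are left in between.
    """
    good, bad = 0, len(commits) - 1
    skipped = set()
    while bad - good > 1:
        mid = (good + bad) // 2
        # try the commit closest to the midpoint that hasn't been skipped yet
        order = sorted(range(good + 1, bad), key=lambda i: abs(i - mid))
        probe = next((i for i in order if i not in skipped), None)
        if probe is None:
            return list(commits[good + 1:bad + 1])
        result = test(commits[probe])
        if result is None:
            skipped.add(probe)
        elif result:
            good = probe
        else:
            bad = probe
    return [commits[bad]]


commits = [f"c{i}" for i in range(10)]  # hypothetical history; c6 is the culprit


def always_builds(c: str) -> Optional[bool]:
    return int(c[1:]) < 6


def c5_c6_broken(c: str) -> Optional[bool]:
    n = int(c[1:])
    return None if n in (5, 6) else n < 6  # c5 and c6 don't build


print(bisect_commits(commits, always_builds))  # ['c6']
print(bisect_commits(commits, c5_c6_broken))   # ['c5', 'c6', 'c7']
```

When untestable commits sit right next to the culprit, the best the search can do is narrow it to a range, which is one reason a bisect through broken kernels can point at commits that don't look like "it".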
23:44 fincs: Tsk tsk... committing broken code :p
23:57 imirkin: karolherbst: so what was wrong with 893e591b59036f9bc629f55bce715d67bdd266a2?
23:57 imirkin: i just looked at it, seems perfectly sane
23:58 imirkin: it does update the dtc compiler
23:58 imirkin: you marked the previous merge commit (1c715a659a16e193a23051ddff4becdad8e18ba1) as good