00:20cmarcelo: anyone that knows / maintains margebot: wondering if I just put marge in an odd state by pushing on an MR that marge pushed to.
00:21cmarcelo: (is there a "marge log" somewhere I can peek at in those cases?)
00:22airlied: cmarcelo: what it processing that MR?
00:22airlied: if so you should unassign it, and kill any pipelines it was running
00:25cmarcelo: oh, I missed canceling the pipeline (it was already failing)
00:25airlied: yeah I think if it get cancelled, marge might wake up before the 1hr expiry
00:25cmarcelo: it did wake up
00:25airlied: though it may not
00:25cmarcelo: thanks
01:03memleak: Hey I was debugging PREEMPT_RT latency spikes with amdgpu and radeon DRM drivers, I finally have a consistent stack trace now which is exceeding 30-50 microseconds (occasionally even spikes to above 200 microseconds)
01:04memleak: I discovered a tool called timerlat and it's been a huge help. trace: https://dpaste.com/42WHYK5EQ
01:04memleak: With `nomodeset` the spikes go away, radeon_ib_schedule radeon_cs_ib_vm_chunk and/or radeon_cs_ioctl must be the culprit
01:06memleak: 6.5.2 is my kernel version
01:15memleak: dpaste.com is acting up, new link: https://dpaste.org/G3y6y
01:22airlied: memleak: might be worth trying the amdgpu driver instead of radeon, though it might end being the same or worse
01:23airlied: that radeon trace shows it latency when it interacts with the hw interface
01:23airlied: which is kinda hard to avoid
01:23airlied: cik_gfx_set_wptr
01:23airlied: is mostly just a register write
01:30memleak: I'll get a stack trace with amdgpu one sec
01:42memleak: new trace: https://dpaste.com/FZRVR327P
01:42memleak: https://dpaste.org/7eSaD
01:43airlied: okay so yes you are hitting a hw register and hw takes time to react
01:43memleak: ok :)
01:43airlied: not sure there's much can be done about it
01:43airlied: mmio register reads/writes can stall the cpu, don't think there's any nice way around it
01:44memleak: well hey! that at least solves the mystery!
01:44memleak: and it's not user error!
01:52memleak: Thank you! :D
04:04memleak: hey airlied I just wanted to come back and say I'm sorry for possibly annoying the shit out of you years ago, I was really hyper, I talked too much and I was a handful for everybody.
04:05memleak: I was in junior high when I first started dabbling with X.org, anyways, thank you for everything.
08:14karolherbst: itoral: any clues? https://gist.github.com/karolherbst/407dea07c0d8fd9ff04b28d81823614f
08:15karolherbst: or rather ideas..
08:15karolherbst: apparnetly setting "V3D_DEBUG=" starts to trigger gpu memory faults
08:44itoral: karolherbst: doesn't make any sense to me... if you don't V3D_DEBUG at all, don't you see any mem faults?
09:11karolherbst: itoral: correct
09:11karolherbst: maybe something something VM placement or something
09:12karolherbst: I think the shader accesses OOB no matter what, but I'll debug more thouroughly today on what's going on here... I was able to get rid of that error by doubliing buffer sizes
09:12karolherbst: it's just _very_ confusing that setting that env var makes a difference :D
09:13karolherbst: the value of v3d_mesa_debug doesn't change, but I suspect something changes in that handling of that env var which changes something else? dunno.. it's just very odd :D
09:15itoral: yeah, I think that what happens is that for some reason when the envvars are set some allocation patterns change and that makes some OOB accesses land into valid memory addresses
09:21itoral: karolherbst: do these tests use global address intrinsics to read memory from a buffer that is then used to compute global addresses for other global reads/writes?
09:23itoral: I ask because if that is not happening then you can do a simple trick to identify the bad access(es): you drop all global reads/writes from the kernel (for example by not emitting the global intrinsic from the compiler) and then start putting them back into the kernel one by one until you see the OOB error again
09:25itoral: actually, you could also use this tactic even if you use the results from a read to compute the address for follow-up reads, since you are adding later global intrinsics progressively one by one
09:25itoral: one you know the first global intrinsic that causes the problem then we can just look at how the address is generated to figure out what is wrong
10:00karolherbst: itoral: yeah, it's just one sized buffer bound and then read/write to
10:01karolherbst: the kernel is really trivial
10:01karolherbst: it's literaly this:
10:01karolherbst: int tid = get_global_id(0);
10:01karolherbst: dst[tid] = ((1<<16)+1);
10:04karolherbst: I wonder if the test is slightly buggy... maybe I pass the buffer size into it and see what I can do with that
10:09karolherbst: mhhhhh
10:09karolherbst: itoral: I think it has something to do with how the kernel is launched
10:10karolherbst: there is an OOB read and if I cap the tid to the buffer size it doesn't cause those
10:10karolherbst: _but_
10:10karolherbst: the kernel also launches threads according to the buffer size so that should be impossible
10:10karolherbst: however.. CL has a cursed feature.. :D printf
10:11karolherbst: ahh I can't use it as it needs global atomics, which I haven't looked at yet
10:15karolherbst: huh....
10:23karolherbst: itoral: mhhh... maybe it's also something to do with me overclocking the rpi with +400 MHz...
10:23karolherbst: let me try without it first
10:25karolherbst: ahh no..
10:25karolherbst: ahh no..
10:26karolherbst: but higher CPU load does make it more likely at least.. yeah so something odd is going on
10:32karolherbst: I think the test is doing silly things...
11:03glehmann: how do the fdot_replicated opcodes work? can they have any number of output components or does e.g. fdot4_replicated always have a 4 component output?
11:06itoral: overclocking should't really have any impact
11:06karolherbst: well.. I'm quite close to the point where increasing the clock a bit further causes the CPU to do wrong things :D
11:06karolherbst: I've configured it in a way to not increase the voltage over the limit
11:07karolherbst: but yeah.. the setting is fine it seems and never caused any problems
11:07karolherbst: it clearly reads OOB but I have no idea why...
11:08itoral: so capping the TID fixes the issue? mmm...
11:08karolherbst: ehh.. no
11:08karolherbst: I just got (un)lucky
11:08itoral: ah :)
11:09itoral: is that write to dst the only global address access in the kernel?
11:09karolherbst: now I'm running "stress -c8" int the background and things are a bit more interesting
11:09karolherbst: https://gist.githubusercontent.com/karolherbst/0258aba25982ebf84001d09cd8e3423e/raw/602f91c1d69de6a079083f65c76ab5a16c0475af/gistfile1.txt
11:09karolherbst: yes
11:11itoral: interesting, in that case the only way we can have an OOB is that tid is out of bounds.... have you tried making the dst buffer larger and write the tid into it?
11:11itoral: then inspect te buffer when you trigger the mem faults and check it the tids are sane
11:11itoral: I don't quite imagine why they wouldn't be, but something weird is happening so...
11:12karolherbst: maybe something with the shader?
11:14itoral: can you dump the kernel with V3D_DEBUG=cs?
11:14karolherbst: yeah... something is odd
11:16karolherbst: doing this instead makes the fault go away: if (&dst[tid] < 0x70000 || &dst[tid] >= 0x80000) dst[tid] = ((1<<16)+1);
11:16karolherbst: at least it seems that way
11:16karolherbst: itoral: the odd thing is, the test passes no matter what, so maybe it's just more threads running than expected? Anyway, will dump the plain shader
11:24itoral: wht would that if fix anything? isn't dst bound to different addresses in various iterations? At least it looks like that from thaces you pasted
11:25karolherbst: yeah.. there are three pre allocated buffers in that test
11:25itoral: karolherbst: does v3d_csd_choose_workgroups_per_supergroup return a number other than 1?
11:26karolherbst: each 16384 elements big, once with int/int2/int4
11:28karolherbst: itoral: nah, that's always 1 it seems
11:28itoral: ok
11:29karolherbst: tried to denoise the V3D_DEBUG=cs output as much as possible: https://gist.githubusercontent.com/karolherbst/2a2b981e458d59119debeeaf2f9d3e01/raw/cbab9a385ecec66c2362c614f9ff8bc16e4d7c09/gistfile1.txt
11:30karolherbst: mhhh.. maybe I should add an option to disable that offset nonsense...
11:35karolherbst: huh
11:40itoral: karolherbst: what is the workgroup size and the dispatch size for that kernel?
11:41karolherbst: 256x64
11:41karolherbst: uhm.. 256 blocks and 64x1x1 block size
11:45itoral: ok, thats within the limits
11:47itoral: I don't see anything obviously wrong with the shader, but I'll give it a deeper look tomorrow
11:52karolherbst: huh!
11:52karolherbst: I think I know what's up....
11:52karolherbst: uhhhhhhhhhhhhhhhhhhhhh
11:54itoral: karolherbst: before I go, I noticed this:
11:54itoral: con 32x2 %71 = load_const (0x00010001, 0x00010001) = (0.000000, 0.000000) = (65537, 65537)
11:54itoral: @store_global (%71 (0x10001, 0x10001), %70) (wrmask=xy, access=none, align_mul=8, align_offset=0)
11:54karolherbst: nah... it has nothing to do with that
11:54karolherbst: I think I found it
11:54itoral: ah, cool
11:54karolherbst: dst contains the address of the wrong buffer
11:54itoral: what is it?
11:54karolherbst: not sure why
11:54itoral: interesting
11:55karolherbst: but it points to the int buffer for the int2 run sometimes
11:55karolherbst: and then it accesses invalid memory
11:55itoral: uh
11:55karolherbst: the test all passes as it just reuses the buffers from the first run
11:55karolherbst: :D
11:55karolherbst: yeah.. probably something like that
11:55karolherbst: not quite sure, but maybe there is a sync issue with the ubo0 thing I'm using
11:56itoral: ok, at least now we know what to look for
11:56itoral: I gotta go now
11:56karolherbst: yeah...
11:56karolherbst: I hope I'll figure out what's wrong
11:56karolherbst: ohhh....
11:56karolherbst: duh
11:56karolherbst: it's a user const buffer at 0 :)
11:56karolherbst: and I suspect v3d doesn't copy it
11:56karolherbst: or something something
11:57karolherbst: I'll dig a bit deeper
11:57itoral: ok, I'll be back tomorrow if you don't find it
11:57itoral: good luck! :)
11:57karolherbst: yeah...
11:57karolherbst: thanks
11:57itoral: np
12:16karolherbst: yeah... it's definetly that
13:08karolherbst: or maybe not? uhhhh
13:28zmike: mareko: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25180
13:37MrCooper: daniels: FWIW, sending plain-text e-mails with Thunderbird works mostly fine for me (still on 102 though, since one extension I use doesn't support 115 yet); I disabled mailnews.send_plaintext_flowed and set mailnews.wraplength to 0
13:39austriancoder: what is the official definition of a nir system value?
13:51mareko: austriancoder: a value that doesn't come from the user
14:00austriancoder: mareko: thanks
19:21lina: DemiMarie & others: You might be interested in this ^^ https://www.youtube.com/shorts/ToulgVAofw8
19:23ccr: \o\
19:30DemiMarie: lina: I will, though it will be 10PM my time then!
19:34lina: Not sure how long it will be yet, but hopefully not 5 hours ^^
19:35airlied: if a stream is less than 5 hours is it really a stream :-P
20:15Lynne: the amd dc code somehow got worse in 6.6-rc1
21:04anholt__: eric_engestrom: looks like pipelines in forks are busted? https://gitlab.freedesktop.org/anholt/mesa/-/pipelines/984424
21:38zf: hi! I'm encountering an assertion failure in Mesa, with radv, while trying to use a geometry shader that uses clip distances:
21:38zf: d3d10core\tests\x86_64-windows\d3d10core_test.exe: ../mesa/src/compiler/nir/nir_validate.c:1390: validate_var_decl: Assertion `glsl_type_is_array(type)' failed.
21:38zf: I hesitate to immediately file a bug, since this could be our bug, although the validation layers don't complain... but I'd appreciate if someone could give me some pointers where to look in the source?
21:39zf: since I am wholly unfamiliar with nir
22:14DemiMarie: zf: if the validation layers don’t complain and your program isn’t corrupting memory (use Address Sanitizer to check that), it’s a Mesa bug
22:15DemiMarie: That error message means that Mesa is generating invalid IR; it’s the equivalent of an internal compiler error in GCC, Clang, or MSVC.
22:15zf: well, it could always be a bug in the validation layers, i.e. a missing validation
22:16DemiMarie: is your program open source?
22:16zf: but I can certainly file a mesa bug on the assumption that it's safe
22:16DemiMarie: yeah
22:17zf: yes, this is actually something we're running into in the Wine self test suite
22:17DemiMarie: Ah, so that is why you have Windows-style pathnames :)
22:18DemiMarie: If you have a small reproducer that should help the Mesa developers fix the issue.
22:18zf: yeah, that's... the hard part
22:18zf: it's Vulkan, so "small reproducer" isn't really a thing
22:19zf: and Wine is not exactly a lightweight piece of software
22:19zf: if this was GL, I could record an apitrace, but I guess no such thing exists for Vulkan
22:19DemiMarie: It probably should
22:19Sachiel: gfxreconstruct exists
22:20Sachiel: you can go back a while and see if the same issue exists and if not, try bisecting
22:20DemiMarie: BTW geometry shaders are generally not very efficient, so if this shader comes from Wine then it is probably best to use something else
22:21zf: we're a translation layer, so we do need the geometry shaders :-)
22:21zf: thanks, I'll try gfxreconstruct
22:22zf: I don't see the issue with stock distribution Mesa, but I wouldn't be surprised if that's because it's built with NDEBUG
22:24DemiMarie: To quote the Arm Mali docs (or possibly an old version of them): “Most use-cases for Geometry shading are better handled by compute shaders.” and “Find a better solution to your problem. Geometry shaders are not your solution.”
22:24zf: trust me, I'm well aware of the problems with geometry shaders, but we don't really have a choice in the matter
22:25DemiMarie: It is actually possible to emulate geometry shaders using nothing but compute shaders.
22:25DemiMarie: AGX will need to do that because Apple hardware doesn’t support geometry shaders at all.
22:25Sachiel: not being in control of either the source of the shaders nor the driver, it'd be a ton of work to probably get a ton of weird failure cases
22:26zf: and worse performance
22:26DemiMarie: That said, I understand you not wanting to take that route.
22:28DemiMarie: Probably