06:37udovdh: Question about weird issue:
06:37udovdh: Upgraded to fedora 33 last week.
06:37udovdh: When screensaver activates after the timeout, and I wake up the screen afterwards, the display shows a smaller active area in the upper left corner
06:38udovdh: This makes the GUI fairly unusable
06:38udovdh: a large L-shaped area around the active area is black after logging in
06:39udovdh: is this a gnome bug? gnome-shell? mutter?
06:39udovdh: or mesa? amdgpu?
06:39udovdh: xrandr does not show different info
06:39udovdh: what gui tool could show what the gui thinks is the size of the desktop?
06:40HdkR: arandr would likely pull the same information as xrandr
06:45udovdh: what would be the likely source of the problem?
06:50airlied: udovdh: control center should show it, likely a mutter bug
07:11udovdh: airlied, using arandr I can get rid of the problem by changing the resolution back and forth
07:12udovdh: mutter then; I will file it in the bugzilla for redhat
07:14udovdh: FWIW: https://bugzilla.redhat.com/show_bug.cgi?id=1922740
07:14udovdh: thanks HdkR and airlied
07:27jadahl: ugh, that bug is because xrandr --scale is used to mess around with the monitor configuration, which is not supported
09:12xexaxo1: yeah, I need owner permissions to finish updating waffle 1.6.2 on github. chadv jljusten can you do the honours?
09:39hverkuil: danvet: can you take a look at my email from Jan 12th? https://lkml.org/lkml/2021/1/12/217 This clearly fell through the cracks.
09:47danvet: hverkuil, Lyude has been looking at dp mst stuff recently the most
09:48danvet: Lyude can also merge the entire pile
09:48danvet: I can merge as the ultimate fallback, but I generally try to avoid this
09:49danvet: hverkuil, or you can have drm-misc commit rights and push yourself :-)
09:51hverkuil: danvet: I know I can commit, but since this touches heavily on non-CEC things (MST) it would be nice to have someone review or ack it :-)
09:51hverkuil: I'll check with Lyude.
09:52danvet: oh you're already on the list
09:52danvet: and yeah for some mst checking, Lyude is best
09:57emersion: oh, danvet is back \o/
09:58danvet: emersion, do I need to run away?
09:58hverkuil: danvet: sent a new email to Lyude, thank you for the referral.
09:59emersion: running away won't be enough, we'll hunt you down till you process that big email pile!
10:03MrCooper: run gtk4-demo in a Wayland session, open the "OpenGL → Transitions and Effects" demo, resize window, drool! :)
10:11linkmauve: MrCooper, thanks, just found and reported a bug with that. :)
10:11MrCooper: np :) a bug where, out of curiosity?
10:12linkmauve: Just a bogus warning: (gtk4-demo:1185180): Gdk-WARNING **: 11:06:08.758: GdkToplevelSize: geometry size (330, 936) exceeds bounds (1366, 768)
10:13linkmauve: I have two outputs, the second one (on which gtk4-demo is displayed) is 1920×1200, yet it warns when the window is bigger in size than only the first output.
10:13emersion: ah, just got that as well
10:13emersion: didn't realize it was because of my multi-output setup
10:13emersion: why does GTK even care about it…
10:15jadahl: linkmauve: that warning was torched iirc
10:16linkmauve: Ah, I downgraded to the stable release of gtk4 at some point, maybe I should go back to master. :)
10:16emersion: ah, cool
10:16jadahl: emersion: added "bounds" to one day be able to implement https://gitlab.freedesktop.org/wayland/wayland-protocols/-/issues/17 but the warning for when that "bound" was not respected was useless thus removed
10:21baedert: The warning is still in gdk_toplevel_size_validate()
10:25jadahl: i thought it was removed?
10:26jadahl: oh, it was just a comment about we should be removing it :P
10:34udovdh: again about that mutter bug:
10:34udovdh: They write:
10:34udovdh: It's likely either because mutter gets confused by the use of xrandr --scale, it doesn't do that itself, or Xorg not being able to handle xrandr --scale in combination with power save mode changes. When it unlocked, it takes a second or two to enable power saving mode (the reason why you need to wait for a bit until it reproduces), and when turning the monitor on again, things are not restored properly.
10:34udovdh: If mutter is made to try to restore (it gets such an event from Xrandr), it will not try to change any scaling factor; I don't know if Xorg itself do.
10:35udovdh: So why did it work 100% OK in fedora 32?
10:35jadahl: why are you asking that here and not in the bug?
10:37udovdh: I did but I wonder why they act like it could not have worked at all, not even in fedora 32
10:37udovdh: could it be that mutter still does the same as before but that due to changes elsewhere this fails?
10:37udovdh: or would mutter be the sole cause?
10:39udovdh: would there be a way to work around the problem by running xrandr each time the screensaver is deactivated?
10:39udovdh: i.e.: automagically?
10:39pq: udovdh, FWIW, "they" is jadahl :-)
10:39udovdh: yes, redhat/fedora/etc
10:40jadahl: udovdh: you're talking to the same person that responded to your bug report..
10:40udovdh: yes, I see.
10:41jadahl: anyway, you're using 'xrandr --scale'. mutter doesn't handle this, so if anything told mutter "please reconfigure yourself", then it would do so without changing any "scale"
10:41udovdh: but I am not completely aware of the ins and outs of the graphics stack that is xorg. Therefore I was asking the people here to share insights in causes, ways to work around etc
10:41udovdh: as usually knowledge is lurking here....
10:42udovdh: jadahl, yes but it worked FINE in Fedora 32 and before.
10:42jadahl: not unlikely due to luck
10:42udovdh: how come if we set the scaling only on startup?
10:42udovdh: (actually: first logon)
10:43udovdh: 200% scaling gives stuff that is too big.
10:43jadahl: mutter tries to handle X11 configuration using RANDR. when you try to do so yourself from the outside, mutter tries its best to just accept that, but when it needs to configure things again because something changed (e.g. hotplug), it'll overwrite whatever was done before to not get stuck in a broken state
10:43jadahl: so one way it could have stopped working is that something looking like a hotplug started being emitted
10:43udovdh: we have no option for fractional scaling
10:43udovdh: so we have to use `xrandr --output HDMI-A-0 --scale 1.25x1.25`
10:44pq: ISTR proposing some Xorg patch to make more hotpluggy events a year ago...
10:44MrCooper: the amdgpu DC code is known to generate spurious hotplug events around DPMS in some cases, in particular with HDMI connections
10:44udovdh: this implies that fedora does not support 4k monitors or hidpi screens in general that well
10:45pq: or two years ago? three? it's all a blur
10:45udovdh: we use hdmi as the board only has hdmi; monitor also has dp
10:46jadahl: udovdh: that is true, mutter on x11 doesn't support fractional scaling. manipulating xrandr yourself means you need to handle working against mutter trying to configure things itself. the way X11 works is more or less that you have to "override" whatever mutter tried to do on a hotplug event, i.e. run your "--scale 1.25x1.25" things again
10:46jadahl: MrCooper: that would explain it
10:47udovdh: how can I do this automagically?
10:47udovdh: i.e.: when I unlock the screensaver?
10:47jadahl: run some script slightly after every hotplug
10:47udovdh: it appears that simply fiddling with the resolution using arandr restores the OK situation
10:47udovdh: how do I 'see'/detect hotplug?
10:48MrCooper: udovdh: which GPU is that?
10:48jadahl: its a udev event
10:48udovdh: amd 3400g
10:48udovdh: AMD Ryzen 5 3400G with Radeon Vega Graphics
10:48udovdh: udev. ok, so an udev rule could work?
10:51udovdh: simply running arandr and clicking on the 'ok' thingie (green V) restores situation OK
10:51udovdh: We see udev events:
10:51udovdh: KERNEL[175812.622173] change /devices/pci0000:00/0000:00:08.1/0000:0a:00.0/drm/card0 (drm)
10:51udovdh: UDEV [175812.629878] change /devices/pci0000:00/0000:00:08.1/0000:0a:00.0/drm/card0 (drm)
10:54udovdh: https://stackoverflow.com/questions/5469828/how-to-create-a-callback-for-monitor-plugged-on-an-intel-graphics might be helpful here
11:01udovdh: hmmm `cat /sys/class/drm/card0-HDMI-A-1/status` yet for xrandr it is HDMI-A-0....
11:05udovdh: appears to work, I see events but display remains OK
11:05udovdh: needs further testing of course
11:05udovdh: so THANKS for the help thus far!
11:06udovdh: We simply run `/usr/bin/xrandr --verbose --output HDMI-A-0 --auto` and all is OK
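The workaround discussed above (re-run xrandr shortly after each drm "change" event) could be sketched roughly as follows. This is only a sketch under assumptions: the helper names (`is_drm_change`, `xrandr_command`, `handle_events`) are hypothetical, the line format is taken from the two event lines udovdh pasted, and the one-second delay is a guess at what "slightly after every hotplug" means.

```python
import re
import time

# Matches event lines like the ones pasted above, e.g.
#   UDEV  [175812.629878] change /devices/.../drm/card0 (drm)
_EVENT_RE = re.compile(
    r"^(?P<source>KERNEL|UDEV)\s*\[(?P<time>[0-9.]+)\]\s+"
    r"(?P<action>\w+)\s+(?P<devpath>\S+)\s+\((?P<subsystem>\w+)\)\s*$"
)


def is_drm_change(line, source="UDEV"):
    """Return True if the line is a 'change' event on the drm subsystem."""
    m = _EVENT_RE.match(line.strip())
    if m is None:
        return False
    return (
        m.group("source") == source
        and m.group("action") == "change"
        and m.group("subsystem") == "drm"
    )


def xrandr_command(output="HDMI-A-0", scale=None):
    """Build the xrandr argv; output name and scale are per-machine assumptions."""
    cmd = ["/usr/bin/xrandr", "--output", output, "--auto"]
    if scale is not None:
        cmd += ["--scale", f"{scale}x{scale}"]
    return cmd


def handle_events(lines, run, cmd, delay_s=1.0, sleep=time.sleep):
    """Call run(cmd) a little after every drm change event; return the count."""
    count = 0
    for line in lines:
        if is_drm_change(line):
            sleep(delay_s)  # give mutter time to finish its own reconfiguration
            run(cmd)
            count += 1
    return count
```

One way to drive it would be to feed `handle_events` the output of `udevadm monitor --udev --subsystem-match=drm` from inside the X session and pass `subprocess.run` as `run`. A udev rule would also work, but then the script needs DISPLAY and XAUTHORITY set up by hand.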
16:31jekstrand: flto: I added a patch to my common dispatch MR which drops the struct zero-init. If you'd like to review that patch, I'll squash it in. Also, I'm still missing review/ack on the turnip bits.
16:37jekstrand: dj-death: There's 2-3 more patches on top of the common dispatch MR that need review.
16:45Venemo: is there a way to silence SPIR-V warnings in a debug build?
16:45jekstrand: We could add an ENV var, I guess.
16:45Venemo: freaking CTS is full of them
16:47jekstrand: Yeah, it is.
16:47jekstrand: There's a GLSLang bug or two that trips them regularly
16:48jekstrand: We should probably just fix GLSLang.....
16:57dj-death: jekstrand: was it the zero init that made things slow?
16:57bnieuwenhuizen: on radv?
16:58bnieuwenhuizen: no we were hashing the complete descriptor set layout
16:58bnieuwenhuizen: including the vk_object_base
16:58bnieuwenhuizen: which blew up when you put a pointer in it
16:58jekstrand: dj-death: Could you look at "turnip: Drop some legacy wrappers in favor of common code" and review the added helpers too.
16:59jekstrand: flto acked the patch but it adds common code so I'd like more than an ack
16:59jekstrand: I think we're getting very close to having everything reviewed. \o/
16:59jekstrand: But, as hakzsam requested, I'll wait for bnieuwenhuizen to land the RADV hashing fix first.
17:00bnieuwenhuizen: jekstrand: I'll also take a last look over some of the generators & the final patches today and then rb everything except v3dv/lavapipe
17:00hakzsam: jekstrand: yes, please. I would like to launch a CTS run to be sure everything is fine
17:00jekstrand: bnieuwenhuizen: Thanks! That'd be awesome.
17:00jekstrand: hakzsam: Feel free.
17:00jekstrand: hakzsam: I'd like to land this week if I could just so we can all move on with life
17:00jekstrand: hakzsam: But I also don't want to screw anyone up. :-)
17:01hakzsam: once !8809 lands, I launch CTS :)
17:04bnieuwenhuizen: merging it now
17:18bnieuwenhuizen: of course I can start CTS by just manually rebasing, and 9 min for a CTS run is really nice :P
17:18hakzsam: 9 min? got a new CPU? :)
17:19bnieuwenhuizen: hakzsam: https://lists.freedesktop.org/archives/mesa-dev/2021-January/224850.html :)
17:19hakzsam: bnieuwenhuizen: oh, so it's really much faster than our runner?
17:20bnieuwenhuizen: about 25% less time or so. AFAIU mostly due to larger batch sizes
17:20hakzsam: wow, I will try it :P
17:23hakzsam: dcbaker: btw, this week is -rc4 or final release?
17:23jekstrand: hakzsam: Just pushed a rebased version of the MR
17:24hakzsam: but if bnieuwenhuizen can run CTS in 9 minutes, I'm out of game :P
17:24bnieuwenhuizen: hakzsam: CTS seems ok on navi21 here
17:24bnieuwenhuizen: (running github master)
17:27hakzsam: the remaining CTS time is expected here, I can confirm the issue is fixed
17:27jekstrand: Intel CI got an upgrade over the week-end and it's not picking up my branches so merging may get held up by that a bit.
17:33anholt: bnieuwenhuizen: for me, manual -j is pegging at full occupancy with a correct deqp count, up to -j72.
17:34bnieuwenhuizen: hmm :(
17:34bnieuwenhuizen: maybe it gets confused because this is a NUMA machine?
17:35bnieuwenhuizen: was yours?
17:35anholt: I'm on dual 36-thread xeons
17:35anholt: were you using "cargo run" by chance (debug build)?
17:36anholt: trying to come up with how we could possibly not keep up with spawning deqps
17:37jekstrand:wishes he could run on dual 36-thread xeons. :-(
17:38anholt: it's a delight
17:38jekstrand: That's my #1 reason for wanting an Intel discrete GPU. :)
17:38anholt: though when you run out of 192gb of ram, it's kind of distressing.
17:40bnieuwenhuizen: anholt: no, just plain deqp-runner after a cargo install deqp-runner (and then adding the cargo bin dir to the path)
17:41jekstrand: craftyguy: Have you looked at anholt's deqp-runner? If it's as awesome as people are saying, it might save us some CI time.
17:41jekstrand: craftyguy: Not to add more things to your ToDo list. :-/
17:42craftyguy: no, this is the first I've heard of it (I think?), got a link to it?
17:42bnieuwenhuizen: jekstrand: I think the timing thing is mostly increasing the batch size. If you were using the old runner that is also easy to fix if you can't migrate right now
17:43jekstrand: bnieuwenhuizen: We're not using "the old runner" either. We've got some thing janesma wrote eons ago, I think.
17:43jekstrand: It works fine, last I knew, but it'd be nice if we could roll less of our own infrastructure.
18:07Lyude: hverkuil: oh cool! I will try to get a look at the MST stuff today
18:09Lyude: imre: btw-I'm guessing that those patches you sent are about the NULL deref you were seeing? (if not, I was planning on taking a look at that today)
18:10imre: Lyude: yes, should fix an i2c-adapter NULL deref problem
18:11Lyude: imre: cool, thank you for the fixes! I'll make sure to review them today
18:11imre: ok, thanks
18:11anholt: bnieuwenhuizen: got a nice clean fix for the cpus-1 issue now.
18:12anholt: when you saw low deqp counts, were you by chance using a test list such that jobs * 500 > test count?
18:34anholt: bnieuwenhuizen: that should be fixed too in 0.5.1, hopefully your runs will be nice and busy now.
18:35jekstrand:assigns common dispatch to Marge. \o/
18:35bnieuwenhuizen: no, full CTS
18:35bnieuwenhuizen: aka 788k tests or so
18:35bnieuwenhuizen: (side note: guess we'll be hitting 1M in a year or so? :P)
18:36keithp: it happens
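anholt's question above (was "jobs * 500 > test count"?) suggests a simple model of why a short test list could leave cores idle. The sketch below is only that model, not deqp-runner's scheduling code. It assumes 500 is the number of tests handed to one deqp invocation, and the function name is hypothetical.

```python
import math


def concurrent_runs(test_count, jobs, batch_size=500):
    """Upper bound on deqp processes that can be busy at the same time.

    Assumption: tests are split into batches of batch_size and each batch
    is run by one deqp process, so there can never be more busy processes
    than there are batches.
    """
    batches = math.ceil(test_count / batch_size)
    return min(jobs, batches)
```

With a 10,000-test list and -j72 this model allows at most 20 busy processes, while the roughly 788k-test full CTS run mentioned above has far more batches than jobs and is limited only by the job count.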
18:36anholt: jekstrand: thank you so much for cleaning up dispatch!
18:36jekstrand: anholt: YW
18:36jekstrand: anholt: It needed to be done so badly
18:36Venemo: jekstrand: can you help me out with a NIR validation issue? we added a few new intrinsics, but I don't understand why they don't pass validation
18:37jekstrand: Venemo: What's going on?
18:37jekstrand: Venemo: I'm going to take off in a few minutes to acquire groceries and lunch but feel free to type at me.
18:37Venemo: jekstrand: sure, enjoy your groceries
18:37Venemo: the issue is this:
18:37Venemo: error: src->ssa->num_components == num_components (../src/compiler/nir/nir_validate.c:208)
18:38Venemo: the new intrinsic loads N dwords from a given src address
18:39jekstrand: Venemo: Right. you probably want your address source to have an explicit (non-zero) number of components or -1 if you want it to be unvalidated.
18:39jekstrand: We use -1 for derefs because they can be anything
18:40Venemo: for example, this is what it looks:
18:40jekstrand: But if this is AMD-specific, maybe an explicit number of components
18:40Venemo: vec4 32 ssa_108 = intrinsic load_smem_gcn (ssa_106, ssa_107) (4, 4) /* align_mul=4 */ /* align_offset=4 */
18:40jekstrand: Venemo: You probably just have src_comps wrong in nir_intrinsics.py
18:40Venemo: ssa_106 is vec2, and ssa_107 is vec1
18:41jekstrand: Venemo: What are the semantics of the intrinsic? Why is the first source a vec2? Is it supposed to be?
18:41Venemo: the py has: src_comp=[1, 1] <- is this wrong, then?
18:41jekstrand: Venemo: I don't know if that's wrong. Is the first source supposed to be a vec2?
18:43Venemo: err... I see your point. I'll check
18:45Venemo: yes, it's supposed to be 2
18:45jekstrand: Ok, then you want src_comp=[2, 1]
18:46Venemo: that's better, thanks jekstrand
18:47Venemo: jekstrand: if the src size can be anything, then the comp should be -1?
18:47jekstrand: Venemo: Ish
18:48jekstrand: Venemo: 0 means "must match instr->num_components" which is typically used for vectorized things like loads and stores where the instruction has a "width" of sorts.
18:48jekstrand: Venemo: -1 means "don't validate" so it can literally be anything and there's no checking. You shouldn't use this unless you really need it.
18:49jekstrand: Venemo: We use it for pointer type things because we have so many different nir_address_modes that it would be kind-of nuts to try and solidly validate it and when you do a load, for instance, it isn't going to match instr->num_components because that's the number of components to load.
18:49jekstrand: Venemo: If you know how many components, set the actual number.
18:50Venemo: jekstrand: I have 2 new intrinsics which can load N dwords, and 1 new intrinsic which can store any dwords
18:50Venemo: "N" means "any" too
18:51jekstrand: Venemo: Then you want the source or destination (wherever the data goes) to be 0 and the "address" bits to be an actual number.
18:51Venemo: I thought -1 means any number of components
18:51Venemo: and 0 means it must match
18:52jekstrand: Venemo: -1 means "don't validate" and 0 means "must match instr->num_components" which is what you want for the data in a vector load/store
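The three src_comp cases jekstrand describes (positive, 0, -1) can be modelled in a few lines. This is a stand-alone illustration of the rule as stated in the chat, not the code in nir_validate.c. Both function names are hypothetical.

```python
def expected_components(src_comp, instr_num_components):
    """Component count a source must have, or None if it is not validated."""
    if src_comp > 0:
        return src_comp              # explicit, fixed number of components
    if src_comp == 0:
        return instr_num_components  # must match instr->num_components
    return None                      # -1: "don't validate"


def validate_sources(src_comp, actual, instr_num_components):
    """Return a list of mismatch messages; an empty list means valid."""
    errors = []
    for i, (rule, got) in enumerate(zip(src_comp, actual)):
        want = expected_components(rule, instr_num_components)
        if want is not None and got != want:
            errors.append(f"src[{i}]: expected {want} components, got {got}")
    return errors
```

For the load_smem_gcn example above (sources vec2 and vec1, vec4 destination), `validate_sources([1, 1], [2, 1], 4)` reports a mismatch on the first source, and `validate_sources([2, 1], [2, 1], 4)` reports none.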
18:52Venemo: right, ok
18:52clever: i'm trying to write some drm based tests for tearing, and i'm wondering, which is more effective, sliding some stripes left/right, or blinking the whole screen between 2 different images?
18:53imirkin: clever: probably not giving someone epilepsy if things are working would be ideal
18:54clever: heh, i'm not sensitive, and that blinking case even made my head hurt a little
18:54imirkin: i'd be in favor of vertical stripes sliding to the right/left
18:54ccr: I've used a video with sliding vertical white bar on black background
18:54clever: and in some setups (blinking between blue and green), i instead just saw a single color (the mix of the 2)
18:55imirkin: "run test case. if you wake up in the hospital, you know it worked"
18:55clever: i also want the actual pageflipping to be as low cpu usage as possible, so i'll pre-generate maybe a dozen frames of the stripe animation, and then just flip thru them
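Pre-generating a dozen stripe frames, as clever describes, might look something like this. It is a sketch that assumes a 32-bit XRGB8888 layout and tightly packed rows, and it only produces the pixel bytes. Copying them into mapped buffers and flipping between them is left out. The function names and default sizes are made up for illustration.

```python
import struct

WHITE = 0x00FFFFFF  # XRGB8888, X byte left as zero
BLACK = 0x00000000


def stripe_row(width, stripe_width, shift):
    """One scanline of alternating white/black vertical stripes."""
    pixels = [
        WHITE if ((x + shift) // stripe_width) % 2 == 0 else BLACK
        for x in range(width)
    ]
    # Little-endian 32-bit words, one per pixel.
    return struct.pack(f"<{width}I", *pixels)


def stripe_frames(width, height, stripe_width=32, frame_count=12):
    """Frames covering one full stripe period, so cycling them loops cleanly."""
    period = 2 * stripe_width
    frames = []
    for i in range(frame_count):
        shift = (i * period) // frame_count  # stripes slide left as i grows
        frames.append(stripe_row(width, stripe_width, shift) * height)
    return frames
```

Since every row in a frame is identical, the cost is one row per frame plus a repeat, so generating these up front is cheap and the flip loop itself does no drawing.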
18:56imirkin: clever: you could (ha ha) be clever about it
18:56imirkin: and just have one image
18:56imirkin: that is super-wide
18:56clever: ive already driven the 2d subsystem baremetal, and know the hw is capable of that
18:56imirkin: and just adjust the sliding window over that wide image
18:56clever: it can also pan, but i dont know how to do that with the drm api
18:56clever: one sec
18:56imirkin: you'd have to create new framebuffer objects, but could use the same backing image
18:56imirkin: there's a crtc_x iirc? something like that
18:57clever: imirkin: this is how i create each frame: https://github.com/librerpi/rpi-tools/blob/master/utils/drm-utils.cpp#L31-L69
18:58imirkin: i don't remember precisely how it works
18:58clever: and then i just loop over every pixel, and fill in an image
18:58clever: the part i dont know, is how to make 2 framebuffers, from the same chunk of ram, in the drm api
18:59imirkin: i guess you have to use drmModeAddFB2 to get the offset functionality
19:00imirkin: aha, right
19:00imirkin: the crtc_x stuff is for setting planes with plane offsets and whatnot
19:00imirkin: which is not at all what we want here
19:00clever: at the baremetal level, the 2d subsystem just wants a list of: phys addr of image, w/h, stride (bytes per row), dest xy, and optionally a dest wh
19:00imirkin: drmModeAddFB2 will let you do what you need
19:01clever: i could get such motion out of the hw, by just varying the dest xy, or by making it 1 stripe too wide (in ram), and varying the phys addr of the start
19:01vsyrjala: src_x is what you want for panning inside a large fb
19:02imirkin: vsyrjala: where do you supply that?
19:02clever: extern int drmModeAddFB2(int fd, uint32_t width, uint32_t height, uint32_t pixel_format, const uint32_t bo_handles[4], const uint32_t pitches[4], const uint32_t offsets[4], uint32_t *buf_id, uint32_t flags);
19:03vsyrjala: atomic/setplane ioctl
19:03vsyrjala: can't do it with page flip ioctl iirc
19:03imirkin: clever: right, so you just set offsets to be the x offset
19:03vsyrjala: offsets may not be supported
19:03vsyrjala: it's not something anyone really does
19:03vsyrjala: apart from planar formats
19:03vsyrjala: eg. i915 just rejects offsets != 0
19:04imirkin: well then. nevermind.
19:04clever: vsyrjala: at least at the hw level, the gpu is capable of offsets, but i dont know about the driver level...
19:04clever: in a planar format, it just takes a list of start addresses, for each plane
19:05clever: hmmm, so if i create one large buffer-object, and then i create say 10 framebuffers over it, using drmModeAddFB2
19:05clever: then i can use drmModePageFlip to swap between each framebuffer, showing a different section of the buffer-object?
19:06vsyrjala: if the driver allows it. i suspect many don't, or have bugs since there's no userspace that actually does this
19:06imirkin: clever: apparently some drivers don't support offsets with arbitrary formats
19:07imirkin: which seems odd, since it's just adding? but whatever
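The "it's just adding" arithmetic for several framebuffers over one wide buffer object can be written out as below. This only computes the pitch and byte offset each framebuffer would be created with (the pitches[0] and offsets[0] arguments of drmModeAddFB2). It assumes a single-plane 32-bit format and a tightly packed wide buffer, and per vsyrjala's remarks a driver may simply reject non-zero offsets. The function names are hypothetical.

```python
def pan_offset(x, y, pitch, bytes_per_pixel=4):
    """Byte offset of pixel (x, y) inside a buffer with the given pitch."""
    return y * pitch + x * bytes_per_pixel


def sliding_window_offsets(visible_width, total_width, step, bytes_per_pixel=4):
    """(pitch, offset) pairs for windows sliding right over a wide buffer.

    Every window keeps the pitch of the wide buffer; only the start offset
    changes, so all framebuffers can share the same backing memory.
    """
    pitch = total_width * bytes_per_pixel
    max_x = total_width - visible_width
    return [
        (pitch, pan_offset(x, 0, pitch, bytes_per_pixel))
        for x in range(0, max_x + 1, step)
    ]
```

For a 1920-pixel-wide mode over a buffer that is 64 pixels wider, stepping 16 pixels at a time gives five framebuffers with offsets 0, 64, 128, 192 and 256 bytes. Where offsets are not accepted, vsyrjala's src_x route through the atomic/setplane ioctl is the alternative mentioned above.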
19:07clever: i'm also trying to deal with 3 different drivers for the same hw, over 2 kernel versions
19:07clever: because i want to see which api has the least problems with tearing
19:09clever: for context, i'm testing the rpi drivers
19:11vsyrjala: if you use kms apis correctly you won't get tearing. or the driver is broken if you do
19:11vsyrjala: unless you use setcursor/async flips that us
19:11clever: thats why i'm doing a first pass with the raw kms api as root, to confirm the driver is not broken
19:12anholt: clever: note that due to implicit syncing, if you're doing gpu rendering to that shared BO that you're flipping to, you're going to have disappointing results.
19:12clever: i have experience tearing under youtube+chromium and mpv, but there are too many layers in that stack
19:13clever: anholt: which forms of gpu rendering? without a desktop-like separation of "gpu" and "cpu" ram, i would expect it to need less flushing
19:13anholt: the vc4 gpu.
19:13anholt: aka anything with opengl
19:14clever: prior to the pi4, anything the v3d/opengl did, was capable of writing either to uncached ram, or via the VPU L2 cache (128kb) to ram
19:14anholt: if you're doing anything with your buffer other than pure cpu access or opengl, you're also going to be disappointed because other things on the rpi platform don't participate in syncing
19:14clever: and the HVS also had the option to read by either path, so there is no need to discard caches, just ensure v3d has flushed the last macroblock
19:15clever: but the pi4 changes things up (due to 8gig of ram support), by adding an mmu between v3d and ram, and i dont know how that impacts the caching
19:15anholt: you should really not worry about caching.
19:16anholt: other than the usual "don't read back gpu bos because perf will suck"
19:16clever: yeah, seen that on the forums already, somebody asking why reading was so much worse than writing
19:18clever: assuming the kms layer was perfectly tear-free, what could cause mpv under X to tear?
19:18keithp: not flipping
19:19anholt: or mpv rendering to BOs before getting the notification that the bo has been swapped away from.
19:19keithp: urf, that would also be bad
19:19clever: i was testing with 3 different rendering backends, gpu, sdl, and xv
19:19anholt: or mpv flipping before some rendering is done (such as by providing a bad in fence, or unfinished cpu-side rendering)
19:19clever: no combination of any variable entirely eliminated tearing
19:19anholt: or by mpv setting the wrong flags on the flip to request tearing.
19:19anholt:can never remember the name of it
19:20clever: include/libdrm/drm_mode.h: * Flag DRM_MODE_PAGE_FLIP_ASYNC requests that the flip happen
19:20clever: anholt: this one?
19:20keithp: X might also queue a blt at vblank, which would get stalled in the kernel to occur after rendering
19:20anholt: you would need to check the docs for what that one does.
19:20clever: but when running under X, which api is it even going to use, to issue the flips and get the BO's on-screen? does it vary with the -vo backend?
19:21anholt: clever: oh, and also if you're doing all this under X, anything other than full screen will definitely tear.
19:21keithp: anholt: if it's using GL, it should be using present, which will queue the blt at vblank time
19:21keithp: which should avoid tearing
19:21anholt: keithp: if you're blitting, you're losing.
19:21keithp: true 'nuf
19:22keithp: but it should work
19:22anholt: because you're trying to be clever and queue the blit at the right time, but you're on the same ring as all the other work so when you submit doesn't really matter.
19:22clever: anholt: why would fullscreen make it better? just one BO fired directly at the gpu, and swap the whole pointer out?
19:22anholt: clever: fullscreen is the only way to page flip, which is the only way to not tear.
19:22clever: the 2d subsystem in the gpu is capable of doing the composition on its own, but neither X nor wayland tries to use it
19:22anholt: there is no 2d subsystem in the gpu.
19:22keithp: it should block until the rendering is done before queuing the blt, but that may not actually work
19:23clever: anholt: by "2d subsystem", i'm referring to the HVS/PV blocks
19:23anholt: clever: that's the scanout engine.
19:23keithp: and, yes, we should use it
19:23keithp: it's really hard though
19:23anholt: there's writeback, so you *could* use it as a bad 2d gpu, but...
19:24clever: anholt: from baremetal arm, i have used the HVS to animate ~170 sprites at once, in the drm lingo, that was basically 170 framebuffers? on a single crtc, sharing one bo
19:24clever: id consider that a 2d-only gpu
19:25anholt: ok, that's just a usage that doesn't match with the usage that is typical in Linux graphics.
19:25anholt: (usage of the term)
19:25clever: which term?
19:25anholt: 2d gpu.
19:25anholt: when you say "2d", people think fixed function blit engines from memory to memory
19:26clever: maybe i'm thinking in terms of 8bit era computers
19:26anholt: not scanout engines reading from memory as overlay planes on the screen.
19:26clever: it can essentially composite multiple 2d images, including scaling on either axis, pixel format conversion, and alpha blending
19:26anholt: having written the hvs implementation, yes. I know.
19:26clever: but yeah, it does it one scanline at a time, as the output hw (vec/hdmi) needs it
19:27clever: more explaining for others like keithp
19:27clever: i did see your name all over the git logs
19:27anholt: anyway: yes, xorg fails to use overlay planes. so do wayland compositors in general, unfortunately.
19:27clever: from what ive heard, the kms layer limits you to 16 overlays?
19:27keithp: I would love to spend some time fixing that
19:28clever: because x86 gpu's cant do more
19:28Lyude: wait, you can only have 16 universal planes? :(
19:28anholt: we just picked some arbitrary number because we had to have an arbitrary number
19:28keithp: Lyude: iirc, it's just a fixed number (no actual limit)
19:28keithp: because kms
19:28clever: anholt: what about ancient apis like xvideo, which was meant to do an overlay, can X map that to a kms overlay, and page-flip it independently of the rest of the desktop?
19:28keithp: every time we try to describe how hardware works, we get it wrong
19:29keithp: clever: yes, Xv *can* do that
19:29Lyude: ah, because eventually nouveau will be supporting 32 planes + 1 cursor plane per head
19:29anholt: clever: xv hasn't done overlays since compositors showed up.
19:29anholt: and it's a disaster api because it's based on cpu rendering
19:29keithp: "sort of"
19:29clever: anholt: so when a compositor comes into play, all of the BO's get handed off to a compositor program? and then that feeds a final BO back to X?
19:30clever: how would i debug what Xv is actually doing, and if a compositor is in the mix?
19:30keithp: you should explore if it works without a compositor first
19:30clever: let me boot the system back up...
19:30keithp: because a compositor means your application has no control over tearing
19:31anholt: also ensure that you're full screen.
19:35clever: round 1, mpv, windowed, -vo gpu, fkms, i can see 4 separate tears at once
19:35clever: round 2, same config but fullscreen, still tearing, between 1 and 3 tears
19:36anholt: oh, I would expect fkms to be full of loss
19:36clever: *reboot to kms*
19:36clever: fkms is having to fake the vsync irq, by triggering a fake transaction on SMI
19:37keithp: that's fantastic
19:37clever: but i would expect the firmware to at least try and post the real update on the next vsync, but without source, i cant say for sure
19:38clever: anholt: ok, real kms now, the entire desktop BO is offset by 1 pixel to the right
19:38clever: the right most column of pixels is visible on the left edge of the screen
19:38anholt: never heard of that one
19:38keithp: that's a cool bug
19:39keithp: 'work around that in software'
19:39anholt: @mripard :)
19:39clever: round 3,
19:39anholt: (I find it totally believable, though, given the touchy fifos in the hvs/pv pipeline)
19:39clever: anholt: oh, and the entire monitor randomly blanks when i move the mouse
19:40dcbaker: hakzsam: there's still a bunch of blockers opened, so I think -rc4 (unless they all get closed in the next couple of days) :)
19:40clever: round 3, mpv, -vo gpu, windowed, still 3-4 tears visible, but it tears a lot less often
19:41clever: round 4, fullscreen now, same thing, a lot less tearing, but it can still tear
19:42keithp: sounds like fencing is not working then
19:42keithp: the flip shouldn't be queued until the rendering is complete
19:42clever: round3 and round4 also have a new bug
19:42clever: on startup, it plays 2 frames, then hangs for ~4 seconds, then plays normally
19:43keithp: that's using GL?
19:43clever: in theory, nothing, its just sitting at a terminal emulator under lxde
19:43clever: but i can murderize lxde...
19:44keithp: no, what rendering backend is mpv using?
19:44clever: it was using `-vo gpu`, not sure what that uses internally
19:45keithp: ah. well, sounds like it's using 'present' incorrectly (which isn't hard; present uses 'absolute' times, which is kinda a disaster for many apps)
19:45clever: one sec while i re-arrange things...
19:45anholt: are you on raspbian?
19:46anholt: I think they start an xcompmgr by default, which will cause tearing
19:46keithp: easy enough to kill that
19:46clever: thats why i'm switching to a naked X server
19:46clever: hmmm, but the hdmi shuts off when i run `X :0 tty7` ...
19:46clever: (II) modeset(0): Initializing kms color map for depth 24, 8 bpc.
19:47clever: root@pi400:~# echo 255 > /sys/module/drm/parameters/debug
19:47clever: [ 651.896879] [drm:drm_ioctl [drm]] pid=2091, dev=0xe280, auth=1, V3D_SUBMIT_CL
19:47clever: [ 651.897337] [drm:drm_ioctl [drm]] pid=2091, dev=0xe280, auth=1, V3D_WAIT_BO
19:47clever: logs do occur, when i run an xterm, but nothing is actually visible...
19:48clever: ah, found it on `chvt 2`, weird
19:50clever: round 5, naked X server, mpv -vo xv -fs, real kms, still tears
19:51keithp: xv will tear
19:51keithp: there's no vsync with that
19:51clever: libEGL warning: DRI2: failed to create dri screen
19:51clever: libEGL warning: DRI2: failed to create dri screen
19:51clever: -vo gpu no longer works
19:51keithp: well, that'll take some debugging
19:52clever: -vo sdl still works...
19:52clever: and still tears
19:52clever: libGL error: failed to load driver: vc4
19:52clever: [vo/sdl] Using opengl
19:56vsyrjala: is it sdl1 or sdl2? sdl1 i think still enables backing store by default which is going to suck. not sure it affects sdl+gl though. anyways, i always use 'X -bs' to make sure this doesn't happen
19:56clever: pi@pi400:/media/videos/4tb $ ldd /usr/bin/mpv | grep SDL libSDL2-2.0.so.0 => /lib/arm-linux-gnueabihf/libSDL2-2.0.so.0 (0xb6d3b000)
19:56clever: vsyrjala: looks like its linked to SDL2
19:59clever: vsyrjala: X -bs, and mpv -vo sdl -fs, still tears
21:31karolherbst: if tearing is annoying: don't use X. Sorry, but that's the way it is
21:34clever: karolherbst: looking at it more as a software challenge than a real problem
21:34karolherbst: clever: it's not fixable
21:35karolherbst: sure, you can throw a lot of tricks at it to make it tear less, but ultimately you have to choose between oversyncing or performance
21:35vsyrjala: fullscreen gl/vulkan apps are fine with x
21:36clever: in the case of the rpi hw, its capable of compositing multiple 2d images together dynamically as it generates scanlines
21:36karolherbst: yeah okay, I'd say for fullscreen apps it's fixable ;)
21:36clever: so in theory, each app can issue its own atomic page-flip call to xorg
21:36clever: and X would then collect the most recent version of each frame, and issue them all to drm
21:36karolherbst: problem is, X doesn't know the concept of a frame
21:37karolherbst: but sure, rewriting X and changing the API could help
21:37clever: and rather than fix that, you could maybe jump ship to wayland?
21:37karolherbst: but that's why we have wayland
21:37karolherbst: exactly this
21:37clever: but from what ive heard, wayland doesnt even try to use drm composition
21:37clever: it throws the whole 3d core at the problem
21:37karolherbst: the compositor decides
21:37airlied: yeah gnome-shell wayland doesn't do planar composition at all
21:37karolherbst: you don't have to use GL for compositing
21:38karolherbst: which.. I'd say is a drawback with wayland as each compositor needs to do the same thing, but often they don't because it's a lot of work to get everything right
21:40clever: there are ~4 different graphical api+driver combinations you could potentially be using on the rpi
21:40clever: the original one on launch-day, was the mailbox framebuffer
21:41karolherbst: it's all about choice :p
21:41clever: you configure a framebuffer of a given virtual w/h (sometimes double the real height), and then use a mailbox function to set the xy pan within it (usually panning to the top or bottom half)
21:42clever: android for example, uses that for pageflip, when other drivers are missing (at least it did, when i read its source last)
21:42karolherbst: airlied: but yeah.. I'd also wish that gnome-shell would be a bit more efficient than it is atm. It's a huge CPU sink sadly :/
21:42clever: karolherbst: for extra fun, the rpi firmware had a bug, when you change the panning parameters, it would clear the framebuffer! lol
21:43clever: that led me in circles for a month, until i just gave up
21:43clever: then i stumbled upon a forum thread of somebody else that had figured it out, a year later
21:43bnieuwenhuizen: do we know where all the CPU time is going?
21:43karolherbst: bnieuwenhuizen: at least I don't
21:43vsyrjala: .js ?
21:44karolherbst: bnieuwenhuizen: the biggest issue is also that Intel's CPU freq scheduling is bad
21:44bnieuwenhuizen: well if all the time is going to js then nothing we can do here
21:44clever: karolherbst: i think the 2nd api to come out was dispmanx: https://github.com/raspberrypi/userland/blob/master/host_applications/linux/apps/hello_pi/hello_dispmanx/dispmanx.c#L97-L159
21:44karolherbst: so you spend a lot of CPU time with lower clocks :/
21:44karolherbst: so the reporting might not be valid
21:44clever: karolherbst: basically, allocate object, write data to it (or mmap it), then create a list of operations to atomically apply, to hide/show it, and set the xy position
21:44karolherbst: bnieuwenhuizen: what I know is, that when setting the min freq to let's say 2.4GHz, everything gets smoother :)
21:45karolherbst: but I am also on a 4k screen
21:45clever: karolherbst: behind the scenes, the mailbox framebuffer is just another dispmanx image (i think drm calls that a framebuffer too?)
21:45bnieuwenhuizen: res shouldn't really matter if it is the GPU doing the work (as long as you're talking about CPU freq)
21:45karolherbst: bnieuwenhuizen: well...
21:45karolherbst: I think it's a mix of both
21:45karolherbst: the GPU having to do quite a lot of stuff
21:45karolherbst: and the CPU being slow processing it
21:46karolherbst: so the CPU ramps up slowly
21:46karolherbst: and then the GPU is also slow with ramping up clocks
21:46clever: karolherbst: for extra fun, the raw images backing dispmanx, can be defragged!
21:46clever: the firmware can freely move them around ram, to consolidate free ram
21:47clever: karolherbst: there is then an api similar to what palmos had, where you can allocate, lock/unlock, and free memory, all referenced by a handle, rather than a pointer
21:47karolherbst: bnieuwenhuizen: eg. when I run glxgears at 1000x500 or so, everything is smoother :)
21:48clever: lock returns the current address of the object, and temporarily stops it from being moved, unlock allows it to move again
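The handle scheme clever describes (allocate, lock/unlock, free, with the owner free to move unlocked blocks) can be sketched as a toy model. This is only an illustration of the idea, not the firmware's actual interface: `HandleHeap` and all its method names are hypothetical, addresses are plain integers, and allocation simply bumps a top pointer instead of reusing holes.

```python
class HandleHeap:
    """Toy relocatable heap: callers hold handles, not addresses (hypothetical API)."""

    def __init__(self):
        self._blocks = {}       # handle -> [address, size, lock_count]
        self._next_handle = 1
        self._top = 0           # first address past the last block

    def alloc(self, size):
        """Reserve `size` bytes at the top of the heap and return a handle."""
        handle = self._next_handle
        self._next_handle += 1
        self._blocks[handle] = [self._top, size, 0]
        self._top += size
        return handle

    def lock(self, handle):
        """Pin the block and return its current address."""
        block = self._blocks[handle]
        block[2] += 1
        return block[0]

    def unlock(self, handle):
        """Drop one pin; at zero pins the block may be moved again."""
        block = self._blocks[handle]
        if block[2] == 0:
            raise ValueError("handle is not locked")
        block[2] -= 1

    def free(self, handle):
        """Release the block entirely."""
        del self._blocks[handle]

    def compact(self):
        """Slide unlocked blocks down to consolidate free space; locked blocks stay put."""
        cursor = 0
        for block in sorted(self._blocks.values(), key=lambda b: b[0]):
            if block[2] == 0:
                block[0] = cursor          # unlocked: move down to the cursor
            cursor = block[0] + block[1]   # next block must start after this one
        self._top = cursor
```

An address returned by `lock` is only valid until the matching `unlock`; after that, a later `compact` may have moved the block, so the caller has to lock again to learn the new address.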
21:48karolherbst: it's still not smooth, but hey..
21:48clever: karolherbst: and the rpi foundation added this api call at my request, it returns the memory handle for a given dispmanx image handle
21:49karolherbst: and setting the clocks high makes everything smooth...
21:49karolherbst: it's super annoying
21:49clever: so you can then lock it, and mmap /dev/mem, to write directly into it (or tell 3d to blast right into it)
21:49clever: that can then save a whole frame copy
21:49karolherbst: yeah.. zero copy stuff is always nice :)
21:50clever: the 3rd api to come out (i think it was 3rd?) is the firmware kms/fake kms
21:50clever: and its just the normal DRM/kms api, but with the kernel issuing dispmanx calls kernel-side, into the firmware
21:50clever: so it just translates all DRM layers into dispmanx layers
21:50karolherbst: clever: but I'd say that the GPU mapping the compositor buffer usually makes more sense from an architecture pov
21:50clever: the firmware then translates it further into HVS layers (the hw compositor), and it composites
21:51clever: and then the 4th api is real kms, where linux just drives the HVS directly, using the drm/kms api
21:51karolherbst: I just hope we get this modifier stuff all sorted out quickly :D
21:51karolherbst: that really sounds like a huge benefit overall
21:51clever: and finally...
21:53clever: karolherbst: page 71, opcode 113, the first 32 bits of operand (offset 0 bits 32), is the memory address the 3d hw writes a frame to
21:54clever: for zero-copy 3d rendering, you just arrange for opcode 113 to write to an address, that the HVS will layer fetch image data from
21:54clever: both 2d and 3d subsystems operate in the VPU space, and can either access uncached ram, or route via a 128kb L2 cache
21:55karolherbst: the issue is just, what do you do if your client is too slow and you get tearing? :p
21:55clever: in theory, speed shouldnt matter at all
21:56clever: once you tell the HVS to display an image, dont modify it at all
21:56clever: and issue the update during vsync
21:56clever: the next image goes to a second buffer, and then you swap
21:56clever: and you may need a 3rd buffer in the mix, so you can render to 3, when you're not sure if 1 or 2 is visible
21:57clever: 2 has been scheduled to display, but the vsync hasnt happened yet, and it could flip at any moment, render to 3
21:58clever: karolherbst: i assume thats all a sane idea, right?
21:58karolherbst: ahh, yeah, sure
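The three-buffer scheme clever outlines above can be modelled in a few lines. This is a minimal sketch under stated assumptions: `TripleBuffer` and its method names are made up for illustration, buffers are just the indices 0 to 2, there is a single renderer, and `vsync()` stands in for the hardware flip.

```python
class TripleBuffer:
    """Toy model: one visible buffer, at most one queued, the rest free to render into."""

    def __init__(self):
        self.front = 0        # buffer currently being scanned out
        self.pending = None   # buffer queued to become visible at the next vsync
        self.free = [1, 2]    # buffers that are neither visible nor queued

    def acquire(self):
        """Take a buffer that is safe to render into (never front or pending)."""
        return self.free.pop()

    def submit(self, buf):
        """Queue a finished frame; a queued frame that was never shown is dropped."""
        if self.pending is not None:
            self.free.append(self.pending)
        self.pending = buf

    def vsync(self):
        """At vsync the queued buffer becomes visible and the old front is recycled."""
        if self.pending is not None:
            self.free.append(self.front)
            self.front = self.pending
            self.pending = None
        return self.front
```

With only two buffers, `acquire()` would have nothing to hand out while one frame is visible and another is queued; the third buffer is what lets the renderer keep going without touching either of them.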
21:59clever: the problem then, is getting the graphics server (X or wayland) to deal with window A wanting a pageflip, and window B not doing any graphical updates
22:00clever: if i was to try and implement such a beast, i would want to just forward every window to the drm layer, but hmmm
22:01clever: karolherbst: can drm deal with atomically swapping out a list of framebuffers, that are all rendered to a single output port?
22:01karolherbst: no clue
22:01karolherbst: I think I never actually looked at the drm API :)
22:01clever: and what if window 1 issues a pageflip, then window 2 issues its own pageflip....
22:01clever: ideally, both can happen on the same vsync
22:01clever: but getting the api to understand that
22:03keithp: clever: so, you get the app to tell you how long each frame will be shown, then you wait for all apps scheduled to update at the next frame, then draw the result and present. Handling late apps is a matter of figuring out when you have to bail and give up
22:04clever: keithp: another problem, is that the drm api has a limit on layers, and some gpu's are limited to 2 or 3 layers total
22:04keithp: clever: so you draw 'slow' apps together into a single layer using the GPU and put 'fast' apps in their own layer
22:04clever: i think in the old era, one layer was dedicated to the pointer, and one layer to xvideo
22:05clever: so just the video&pointer get composited in hw, and the rest in software
22:05keithp: yes, that was often true
22:05clever: yeah, i was thinking the same thing
22:05clever: by default, all apps are "slow"
22:05clever: but an app wanting to be tear free, asks to be "fast" and have pageflip support
22:05keithp: by asking apps to tell you in advance how long the current frame will be shown, you don't have to guess either
22:06clever: composite the slow ones however you want (cpu or opengl), and then have drm merge that "slow" frame, and the fast ones
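The "slow"/"fast" split keithp and clever describe, combined with the limited layer count mentioned earlier, amounts to a small assignment problem. The sketch below is one possible policy, not anything a real server does: `assign_planes` is a hypothetical helper, windows are `(name, wants_fast)` tuples, and when planes run out it simply demotes the most recently listed fast window.

```python
def assign_planes(windows, max_planes):
    """Split windows into those given their own hardware plane and those composited together.

    windows: list of (name, wants_fast) tuples (hypothetical representation).
    max_planes: total number of hardware layers available on the output.
    Returns (fast, slow): fast windows each take one plane; all slow windows
    share a single composited plane.
    """
    fast = [name for name, wants_fast in windows if wants_fast]
    slow = [name for name, wants_fast in windows if not wants_fast]

    # One plane is needed for the composited layer whenever any slow window exists.
    while fast and len(fast) + (1 if slow else 0) > max_planes:
        slow.append(fast.pop())   # out of planes: demote a fast window to the shared layer

    return fast, slow


fast, slow = assign_planes(
    [("mpv", True), ("xterm", False), ("game", True), ("clock", True)],
    max_planes=3,
)
# fast == ["mpv", "game"], slow == ["xterm", "clock"]
```

A real policy would presumably pick which window to demote by update rate or size instead of list order; the point here is only that the composited layer itself consumes one of the scarce planes.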
22:07keithp: that's been the plan, it's just a lot of work to get it implemented
22:07clever: is that the plan with X or wayland?
22:07keithp: "wayland" isn't a piece of software
22:08keithp: so you'd have to figure out what the plan for every different wayland implementation is
22:08clever: yeah, its more of a protocol
22:09clever: sort of like X11 vs Xorg?
22:10keithp: well, Xorg is the only current X implementation, since X/NeWS died
22:10clever: yeah, less variants
22:11clever: only other ones i know of are Xvnc, that windows one, and an android one
22:11clever: but Xvnc shares a lot of code with Xorg
23:25alanc: I thought Exceed was still alive for Windows, but I don't know if it uses Xorg code under the hood or its own
23:31ccr: does Solaris also use Xorg nowadays (I've no idea, haven't touched Solaris since .. 2005 I think)?
23:31jekstrand: What else would they use?
23:32ccr: dunno, some proprietary thing. :D
23:32ccr: used to be plenty of those around back in the day
23:32jekstrand: Ok. If you'd said wayland, I would have fallen off my chair.
23:33ccr: apparently it is Xorg
23:45alanc: ccr: yes, Solaris 10 (released 2005) had both Xorg & Xsun, Solaris 11 (released 2011) and later are Xorg-only
23:45Venemo: Are there any plans of how to express mesh shader outputs in NIR? Do we want to rename the current tess patch output stuff to per primitive, or do we want to introduce a new kind of output?
23:45alanc: (well, Xorg, Xephyr, Xvnc, & Xvfb, but all based on the X.Org code)
23:46alanc: jekstrand: right, no Wayland, too many Linux APIs we'd have to recreate, and we'd need to port a lot of KMS drivers
23:47jekstrand: alanc: Or you could just switch to Linux kernel and GNU userspace. :-P
23:47alanc: Xsun was basically the old X Consortium server with proprietary DDX/drivers
23:47alanc: jekstrand: that product is named Oracle Linux
23:47keithp: alanc: surely Solaris is a reasonable alias still
23:48emersion: alanc: wayland doesn't _depend_ on linux APIs but… many compositors do
23:48alanc: the customers who still need to run GUI apps on Solaris are running legacy X11 apps - if they have Wayland apps, they're already running those on Linux
23:49alanc: emersion: yes, the compositors we'd be likely to ship I should say (either Weston or gnome-shell)
23:49airlied: are there directions for running the traces locally?