00:41anholt: jekstrand: we've got a bunch of arrays of BOs, and need to look up bo ptr index in those arrays. would you use sparse_arrays next to each of those with the bo ptr as the key to do that reverse mapping?
00:41anholt: (we're using the HT right now, and I keep wishing we had something else for this.)
00:42jekstrand: I'm using sparse_array right now in ANV to manage our list of all BOs ever
00:42jekstrand: The anv_bo struct lives in the sparse array
00:42jekstrand: it's indexed by GEM handle
00:42jekstrand: Not sure if that's the kind of thing you're thinking of
00:43jekstrand: The primary reason for that is that it's thread-safe, more compact than a hash table of pointers, and the pointers survive forever unlike an array that you occasionally realloc.
00:44jekstrand: anholt: I'm afraid I don't really understand your use-case. It sounds like a pretty good use of a hash table on the surface.
00:44jekstrand: I'm sure there's some reason why it's not
00:46anholt: the HT is sparse, and the data pointer is sparse, and the hashing of the pointer as a key is expensive and I've had this problem in multiple drivers ("how do I put together my list of BOs used in a submit, given many sets of BO pointers")
00:47jekstrand: We have that problem in ANV right now
00:47jekstrand: I haven't done it yet, but I'm tempted, now that I have the sparse_array, to make them bitsets
00:47jekstrand: Oh, no, they are bit sets
00:47jekstrand: I did that
00:47jekstrand: indexed on GEM handle
00:48anholt: hmm. bitset by GEM handle for presence in the list. interesting.
00:48anholt: there's still the lookup from BO ptr or handle to index, for ORing in some submit flags attached per BO.
00:49jekstrand: Cut 3% off a benchmark
00:49jekstrand: Yeah, at the end, once you've ORed them all together, you have to walk the bitset and look up each BO in the sparse_array
00:50jekstrand: But the theory is that a bunch of sparse_array lookups at execbuf time is way less expensive than the hash table search/insert for every single BO reference.
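The bitset-by-GEM-handle approach described above can be sketched in C as follows. This is a minimal illustration, not ANV's code: `MAX_HANDLES`, `struct bo_set` and the `bo_set_*` functions are hypothetical names, the handle space is assumed to be small and dense, and the final walk only collects handles where a real driver would look each one up in its BO table.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_HANDLES  1024u                /* hypothetical bound on GEM handles */
#define BITSET_WORDS (MAX_HANDLES / 32u)

struct bo_set {
    uint32_t words[BITSET_WORDS];         /* bit N set => handle N is referenced */
};

/* Record a BO reference: one OR, no hashing and no search. */
static void bo_set_add(struct bo_set *set, uint32_t handle)
{
    assert(handle < MAX_HANDLES);
    set->words[handle / 32u] |= 1u << (handle % 32u);
}

/* Merge the references of src into dst by ORing word by word. */
static void bo_set_merge(struct bo_set *dst, const struct bo_set *src)
{
    for (uint32_t i = 0; i < BITSET_WORDS; i++)
        dst->words[i] |= src->words[i];
}

/* At submit time, walk the bitset and emit each referenced handle once,
 * in ascending order. Returns the number of handles written to out[]. */
static uint32_t bo_set_collect(const struct bo_set *set,
                               uint32_t *out, uint32_t max_out)
{
    uint32_t count = 0;

    for (uint32_t i = 0; i < BITSET_WORDS; i++) {
        uint32_t word = set->words[i];

        while (word != 0 && count < max_out) {
            uint32_t bit = 0;
            while (!(word & (1u << bit)))
                bit++;                    /* find the lowest set bit */
            out[count++] = i * 32u + bit;
            word &= word - 1u;            /* clear the lowest set bit */
        }
    }
    return count;
}
```

The per-reference cost is a single OR, and the lookup cost is paid once per distinct BO at submit time rather than once per reference.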
00:50jekstrand: FYI: On Intel, a util_sparse_array lookup is about 3x as expensive as a bare array lookup.
00:50jekstrand: I have no idea what the expense is like on ARM
00:51anholt: we avoid the HT search in the common case by having a uint32_t in the BO for a "likely index" set by the last thread doing a lookup in its array
00:51jekstrand: Yeah, that's what iris does, roughly.
00:51jekstrand: Actually, iris falls back to a linear array search if the index is ever wrong
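A minimal C sketch of the "likely index" idea with a linear-search fallback, as discussed above. The struct layouts, `SUBMIT_MAX_BOS` and `submit_find_or_add` are hypothetical and only illustrate the lookup order (hint first, then linear scan, then append); this is not the iris or v3d implementation.

```c
#include <stdint.h>

#define SUBMIT_MAX_BOS 64u       /* hypothetical fixed capacity */

struct bo {
    uint32_t handle;
    uint32_t likely_index;       /* hint: slot this BO occupied in the last list that used it */
};

struct submit {
    struct bo *bos[SUBMIT_MAX_BOS];
    uint32_t count;
};

/* Return the BO's index in the submit list, appending it if absent.
 * Returns -1 if the list is full. */
static int submit_find_or_add(struct submit *s, struct bo *bo)
{
    uint32_t hint = bo->likely_index;

    /* Fast path: the hint is still valid for this list. */
    if (hint < s->count && s->bos[hint] == bo)
        return (int)hint;

    /* Slow path: the hint was wrong, so scan the list. */
    for (uint32_t i = 0; i < s->count; i++) {
        if (s->bos[i] == bo) {
            bo->likely_index = i;
            return (int)i;
        }
    }

    if (s->count == SUBMIT_MAX_BOS)
        return -1;

    /* Not present: append and remember the slot for next time. */
    s->bos[s->count] = bo;
    bo->likely_index = s->count;
    return (int)s->count++;
}
```

With a single list in flight the hint almost always hits; once the same BO is used by several lists at once, each list overwrites the hint and the scan cost shows up.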
00:52anholt: that's what I had in v3d
00:52anholt: it sucked
00:52jekstrand: Well, iris only has two batches going at a time and usually only one.
00:52jekstrand: That helps keep the miss count low
00:52anholt: (in vc4 for a long time at least I didn't even have the likely value!)
00:53jekstrand: over-all, I think the bitsets work pretty well.
00:54jekstrand: If you're using < 1/128 of your BOs in any given batch, it can be more space than the hash table.
00:54jekstrand: But also, if that's the case then your batches are either really small or else your app has a *lot* of resources.
00:54anholt: I think the bitset idea for us would be "when the likely value misses, check our bitset. if not, then add to the list and to the bitset. if we find a hit in the bitset, it means we're doing multithreaded submits with this BO, so go make the HT now"
00:54jekstrand: Why bother with the hash table if you're going to use the bitset?
00:55jekstrand: There's not that much advantage to building the array as-you-go.
00:55jekstrand: Sounds unnecessarily complicated to me
00:55anholt: because we still need to find our index, so this is keeping us from needing to ht in the common case of single threaded submits per bo
00:55jekstrand: Why? Are you accumulating more than just a set of BOs?
00:56jekstrand: Do you need to also accumulate per-BO usage flags or something like that?
00:56anholt: right now each reloc finds the BO's location in the submit's array and ORs in its flags (or appends to the submit array with the BO and the flags)
00:56jekstrand: ORs its flags into what?
00:56anholt: the submit array's flags
00:57jekstrand: Ok, so you have some flags on the reloc that you need to collect per-BO like read/write or similar?
00:57anholt: ("this buffer is written" and "this is a buffer you should dump in a hang")
00:57jekstrand: Oh, in that case, I've got nothin'
00:57anholt: there's a read flag too, and I'm wondering what a read flag possibly does
00:57jekstrand: We make the "should dump in a hang" choice in ANV globally and don't bother tracking read/write.
00:59jekstrand: We set EXEC_OBJECT_CAPTURE on batches, shader buffers, and state buffers.
00:59jekstrand: All of which we know at allocation time
00:59anholt: I like this idea a lot.
01:00jekstrand: And if you really want to track "written", just have two bitsets. :-P
01:00jekstrand: Or use two bits per bo
01:00jekstrand: Or you can use a uint8_t * instead of a bitset and you get 8 bits!
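The "uint8_t per BO" variant suggested above could look roughly like this in C. The flag names, `MAX_HANDLES` and the helper functions are hypothetical; the only point is that one byte per GEM handle leaves room for a presence bit plus a few per-BO usage flags that are ORed in on every reference.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_HANDLES     256u     /* hypothetical bound on GEM handles */

#define BO_FLAG_USED    0x01u    /* BO is referenced by this submit */
#define BO_FLAG_WRITTEN 0x02u    /* "this buffer is written" */
#define BO_FLAG_CAPTURE 0x04u    /* "dump this buffer in a hang" */

struct bo_flags {
    uint8_t flags[MAX_HANDLES];  /* indexed by GEM handle */
};

/* Record a reference and OR in its usage flags. */
static void bo_flags_reference(struct bo_flags *f, uint32_t handle, uint8_t flags)
{
    assert(handle < MAX_HANDLES);
    f->flags[handle] |= (uint8_t)(BO_FLAG_USED | flags);
}

/* Count the distinct BOs referenced so far. */
static uint32_t bo_flags_count_used(const struct bo_flags *f)
{
    uint32_t count = 0;

    for (uint32_t i = 0; i < MAX_HANDLES; i++) {
        if (f->flags[i] & BO_FLAG_USED)
            count++;
    }
    return count;
}
```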
01:43bnieuwenhuizen: anholt: we track the used buffers by GEM handle directly in a weird hashtable in RADV
01:43bnieuwenhuizen: if you store handle->flags we don't have any need to go back to the actual struct
01:55jekstrand: bnieuwenhuizen: We do have to go back to the original struct. :-(
01:55jekstrand: Because we have to pass the kernel the address of each BO on every exec
02:00pinchartl: now that Android is switching from .mk to .bp for the build system, has anyone given a thought to how this will be supported in mesa ? looking at what they did in https://android.googlesource.com/platform/external/mesa3d/+/refs/heads/master for aosp makes me cry
04:25thaytan: airlied, ta
06:57danvet: tzimmermann, I think samuel zou doesn't have commit rights, are you going to push that patch?
06:57danvet: (good to leave a note to that effect to avoid coordination confusion)
06:57tzimmermann: danvet, already working on it
06:58tzimmermann: i actually did leave a note
06:58danvet: hm the one I looked at only has r-b: you
06:58danvet: but there's a bunch
06:59tzimmermann: danvet, at the top of my reply (?)
06:59danvet: oh indeed
06:59danvet:obviously not awake yet
06:59danvet: sry for the noise
06:59tzimmermann: get a coffee :)
06:59danvet: done, but I guess it's not kicked in yet
07:00tzimmermann: danvet: btw, could you ack my ast suspend fix? unless you really want to have the check in the helpers
07:01danvet: tzimmermann, oh just r-b stamped it
07:01danvet: and that "maybe should be in helpers" was idle ranting
07:01danvet: I don't think we have a good solution for this
07:01tzimmermann: i guess, i have to get some coffee myself :D
07:01tzimmermann: danvet, thanks
07:02tzimmermann: i think i'd like to have that in a helper. but i have no good idea about how to do it in a flexible and less-complex way than this simple test
07:04danvet: it's kinda like suspend/resume helpers, we can boil it down to a oneliner, but drivers still need to wire it up correctly
07:07tzimmermann: exactly, but wiring it up correctly requires driver writers to understand what's going on and why. at that point, testing for !state->enable is easy
07:08tzimmermann: i could imagine something like drm_atomic_helper_check_plane_state(), but for crtcs. and it needs three types of return values: success, errno code, and early-out
07:10danvet: yeah the plane version has the same problem, solved by filling out ->visible
07:10danvet: same problem = needs a three-kind return value
07:13tzimmermann: danvet, oh right! there's ->visible. if they have the same problem to solve, maybe a consistent solution can be found. driver writers would only have to learn semantics once for all those functions. that's at least something
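One way to picture the three-kind result discussed above (success, errno code, early-out) is an int return for the error plus an output flag for the early-out, analogous to how the plane helper fills in ->visible. The sketch below is plain C with hypothetical names (`sketch_crtc_state`, `sketch_crtc_check`, `needs_work`); it is not an existing DRM helper.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a CRTC state. */
struct sketch_crtc_state {
    bool enable;
    int hdisplay;
    int vdisplay;
};

/* Returns 0 or a negative errno. On success, *needs_work tells the caller
 * whether there is anything left to validate or program (the early-out). */
static int sketch_crtc_check(const struct sketch_crtc_state *state,
                             int max_w, int max_h, bool *needs_work)
{
    *needs_work = false;

    /* Early-out: a disabled CRTC is valid, and nothing needs programming. */
    if (!state->enable)
        return 0;

    /* Error: the requested size is outside what the hardware can do. */
    if (state->hdisplay <= 0 || state->vdisplay <= 0 ||
        state->hdisplay > max_w || state->vdisplay > max_h)
        return -EINVAL;

    /* Success: the driver should continue with its own checks. */
    *needs_work = true;
    return 0;
}
```

A driver's check function would return immediately on a non-zero result, and return 0 without further work when `needs_work` is false.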
08:35emersion: agd5f, hwentlan: hi, would it be possible to get this patch merged? https://lists.freedesktop.org/archives/amd-gfx/2020-March/047825.html
08:38kusma: How do I debug issues with marge-bot being broken on the inside?
08:38airlied: kusma: you wait for anholt :-P
08:39kusma: Then I'm tempted to bypass marge bot, but then I might get yelled at :P
08:43daniels: hang on
08:44daniels: let me look at her
08:45daniels: kusma: she has nothing to merge; can you please do whatever it was that made her break again, so I can watch the logs
08:46daniels: ah, got it
08:46kusma: Yeah, just assigning https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4914 to marge
08:50daniels: https://gitlab.freedesktop.org/snippets/999 - only one snippet to go to 1000
08:51kusma: daniels: who cares about base 10 magic? :P
08:52kusma: Only 25 snippets to go to 1024!
08:52emersion: i like how marge is referred to as if she was a regular human
08:59MrCooper: daniels kusma: seems to be something about that particular MR, Marge has merged others up to 1h ago
08:59kusma: MrCooper: yeah, indeed.
09:01MrCooper: no idea what it could be though :(
09:05daniels: gitlab hasn't changed at all, but for this particular branch, it's returning '#<API::Entities::Branch:0x00007f6a9c756170>' instead of JSON ...
09:06daniels: i think it's the '.txt' suffix throwing it
09:12daniels: this bug literally only happens if you have a branch which ends in '.txt'
09:12daniels: '.txt.foo' is fine, '.html' is fine
09:13kusma: daniels: haha, should I resubmit to test that hypothesis? :)
09:34daniels: kusma: sure, if you just use a different branch name, it works
09:36arora: Hey tlwoerner, jekstrand I was wondering if there was a mismatch with my skills and the project requirements? If so, what skill areas can I work on to be better qualified next year?
09:41kusma: daniels: OK, done
10:10daniels: kusma: thanks
10:11kusma: daniels: and yeah, surprise, surprise, that worked. Was this a marge-bot or gitlab bug, you think?
10:12daniels: kusma: definitely a gitlab bug, cf. the snippet
10:13daniels: when you ask the GitLab API to describe a branch name, it returns a JSON blob. unless the branch name is suffixed '.txt', in which case the HTTP response is the string '#<API::Entities::Branch:0xdeadbeef>', which is how Ruby prints objects by default
10:15kusma: This is kinda what's great and at the same time horrible about high-level languages; they tend to never crash properly, so you more often get a bizarre path of "huh, how could this even happen" kind of behavior to go through ;)
10:16kusma: ref: watman.
10:17danvet: kusma, ime linux kernel also has a mad ability to keep limping along forever, despite massive corruption and races
10:17danvet: and the eventual death is pretty good wtf
10:19kusma: danvet: Right. Yeah, there's often cases where you want that, but maybe you want to "fork out" something that is allowed to die properly from time to time. Forking probably means something slightly different in kernel land here ;)
10:20daniels: implicit coercion and magic behaviour is _always_ helpful
10:20kusma: daniels: ALWAYS!
10:22kusma: This is kinda the reason why it's not always bad that we have fifteen gazillion programming languages out there, specialized for different tasks. It's just sad that we also get stuff like fifteen gazillion UTF-8 decoders that are slightly different in how they deal with incorrect encodings at the same time.
14:27agd5f: emersion, Nick merged it yesterday: https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-next&id=e133020f92b9397eaad83ff1dada6d9786edcbd0
14:27emersion: agd5f: oh, thanks a bunch!
14:59alyssa: robclark (et al): what do you do for perf counters on the userspace side?
14:59alyssa: I know there are efforts to pipe GPU counters into common interfaces, but in the meantime..
15:00robclark: "common interfaces" == amd perfcounter extension..
15:00alyssa: ...Right, yes.
15:01alyssa: I guess first question is, how important is counters-over-time for driver-side work? (as opposed to just accumulating samples for a frame and dumping)
15:03robclark: umm, well per draw is useful.. if you can't read counters from cmdstream, then you are kinda limited in what you can do
15:03alyssa: (counters for us are systemlevel right now)
15:04robclark: I have fdperf which does free-running counter reading.. which is sometimes useful for getting an overall view (ie. how busy is gpu, is it bottlenecked on CP, etc)
15:05robclark: but if you can read counters from cmdstream then tools like framemetrics/frameretrace are useful to drill down into things at a draw level
15:06alyssa: curses UI, I see :)
15:24pcercuei: drm_mode_config.prefer_shadow_fbdev looks pretty broken... It doesn't support double/triple buffering
15:31ajax: do i remember correctly that some gpus have zlib engines now, or has lockdown loosened my grip on reality?
15:43imirkin: ajax: i do seem to recall some sort of compression accelerator on some older nvidia gpu's
15:44jekstrand: arora: Not really as far as I know
15:45imirkin: ajax: https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nvkm/engine/ce/gf100.c#L51
15:48imirkin: i don't think anyone's really played with it much. it was available on some but not all fermi chips.
15:50imirkin: (the higher-end ones ... GF100, GF104, GF110, GF114)
17:24tlwoerner: arora: hey! i'm glad to hear you're interested in applying again next year! :-)
17:27tlwoerner: arora: one thing that we liked about the person who was chosen this year was that they had a number of code patches submitted and applied to not only a graphics project, but the linux kernel
17:27tlwoerner: arora: someone who is already active writing code anywhere in the free software community is going to be a huge bonus to a gsoc application
17:28jekstrand: The kernel isn't a requirement but we do look for an open-source contribution history.
17:28tlwoerner: arora: also, this person already had a blog and had posted a number of their thoughts learning about the various projects to which they contributed
17:29tlwoerner: oh yes, thanks for clarifying, i'm not saying any particular project (e.g. kernel) is a requirement, just that they were already showing they could contribute code to free software, anywhere
17:30tlwoerner: arora: so the combination of past contributions and a blog made that application very strong
17:31tlwoerner: arora: i look forward to your application next year, and hope that we hear from you between now and then! :-D
17:37alyssa: tlwoerner: oh hi :)
17:38tlwoerner: alyssa: hey! things good?
17:38alyssa: frosting pans full time :)
17:38imirkin: mmmm frosting
17:40jekstrand:usually puts cake in the pan before frosting it.
17:40imirkin: times are tough
17:40jekstrand: But, hey, if you want to skip the middle-man, who am I to say differently?
17:40imirkin: also ... usually?
17:41imirkin: iow you *have* just frosted the pan
17:41alyssa: jekstrand: cake is a lot of calories, skipping to the frosting is healthier
17:42jekstrand: imirkin: I can neither confirm nor deny whether I may have frosted just a pan at some point in the past.
17:52imirkin: i feel like there's a simpsons where they pull some prank that involves frosting pans...
17:54HdkR: "Simpsons did it! Simpsons did it!"
17:54imirkin: replace the filling of donuts with mayonnaise?
17:55imirkin: that was the sea people episode, right? good one...
17:56arora: tlwoerner: thanks a lot for the detailed feedback!
17:57arora: I will surely be sticking around and try my best to contribute :D
17:59tlwoerner: arora: excellent!
18:26robclark: hmm, for gitlab notification emails.. if you subscribe to label, are the notifications in addition to the configured notification levels (ie. watch/participate/on-mention/etc)? It seems like I'm not seeing issue/mr notifications for some of the labels I've subscribed to
18:26jekstrand: I think it notifies you when that label is added to the MR
18:26jekstrand: It doesn't auto-subscribe you to the MR
18:27robclark: hmm.. ok.. maybe that explains it.. it also wasn't quite what I was hoping for..
18:59pcercuei: my HW doesn't give me vblank interrupts when the two planes (primary + overlay) are disabled
18:59pcercuei: it triggers warnings in drm_atomic_helper_wait_for_vblanks()
19:00pcercuei: I expected the CRTC to be disabled when all attached planes were disabled, this is not the case?
19:05imirkin: pcercuei: iirc that's an awkward case
19:05imirkin: i don't think many drivers support that well ... i think they tend to turn the crtc off or something. not sure.
19:40HdkR: `Asynchronous wait on fence 0000:00:02.0:compton:17d84c timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])` That's a first. A hang with my Intel iGPU
19:41HdkR: Sadly it's a new Ice Lake laptop so I didn't have ssh-server installed
19:41jekstrand: HdkR: You clearly haven't been doing the right things on your laptop. :P
19:42imirkin: jekstrand: where "right things" are defined as intel driver development?
19:44jekstrand: imirkin: Hanging the GPU doesn't require driver development. :-P
19:44imirkin: but it certainly helps!
19:44jekstrand: Oh, it's truly the best way!
19:45imirkin: also making declarations like "the kernel driver is unhangable" is a great way to ... obtain counterexamples :)
19:46airlied:has seen some recovered hangs where gnome-shell gets stuck flipping between two frames
19:54pcercuei: Is it possible to attach a "scaling mode" property to a plane?
19:54pcercuei: the doc says it's for connectors, so I wonder
19:55pcercuei: I guess I'd need to handle it in my driver instead
19:56imirkin: pcercuei: planes can have scaling
19:56imirkin: that's the difference between fb size and crtc size
19:56pcercuei: imirkin: yes. My plane does scale
19:57imirkin: i mean in drm :)
19:57pcercuei: my DRM plane does scale :)
19:57imirkin: but then there's also the question of what to do for like an LCD screen
19:57imirkin: that has a native size
19:57imirkin: and your application is generating a less-than-full-size thing
19:57pcercuei: but I have a sysfs property (not drm_property) to choose the scaling mode: keep aspect ratio, or respect the aspect ratio of the source
19:58imirkin: some drivers provide the option to (a) center, (b) scale, (c) scale while maintaining aspect
19:58imirkin: but that's done on a per-crtc level
19:58imirkin: since presumably this happens at the very end
19:58imirkin: and/or because that's how some of the other hw works :)
19:58pcercuei: well that's how my plane works
19:59imirkin: the drm-level scaling is where the application supplies the target crtc size
19:59imirkin: the "scaling property" is for when the application just provides the fb and that's it
20:00pcercuei: my sysfs property is used to override the application's chosen crtc_w / crtc_h
20:00pcercuei: depending on whether or not you want to maintain the aspect ratio
20:00vsyrjala: can you have more than one plane on the same crtc?
20:01pcercuei: vsyrjala: yes
20:01vsyrjala: can all of them scale all the time?
20:01pcercuei: only one can scale, the other (overlay) cannot
20:01pcercuei: in my case, that is
20:02vsyrjala: then emulating scaling mode via plane scaling is not for you. it needs to scale everything coming out of the crtc
20:02imirkin: interesting. normally it's the overlay that supports scaling
20:02imirkin: perhaps rename them and all is well? :)
20:03pcercuei: vsyrjala: I do want to change the scaling mode *before* it's composed by the CRTC
20:04vsyrjala: that's plane scaling and there's no scaling mode for that. the application specifies the coordinates exactly
20:05pcercuei: that'd be an enormous amount of work
20:05vsyrjala: what? doing what the application asked for?
20:06pcercuei: "the application" is SDL1 rendering to fbdev
20:07vsyrjala: sounds like one of those "doctor it hurts..." cases
20:11pcercuei: it requests a given resolution, and knows nothing about what's done behind the scenes
20:12imirkin: so if you have a 1024x768 panel
20:12imirkin: and you request a 640x480 mode
20:12imirkin: and then behind your back, the driver actually drives at 1024x768 and fills in all the details
20:12imirkin: then that's what the scaling property is meant to do
20:13imirkin: this is done at the crtc level
20:13pcercuei: exactly, except that I do it at the plane level
20:13imirkin: do it however you like, but the property is on the crtc
20:14pcercuei: the doc seems to say that it's on the connector actually
20:14imirkin: that's probably right.
20:14imirkin: trust the doc.
20:14imirkin: (can you have crtc properties? probably not)
20:14pcercuei: of course
20:14imirkin: since you want it on e.g. the LVDS/eDP connector, and not HDMI, and there's no tight binding between crtc and connector
20:15pcercuei: but again, that's not where it happens
20:16imirkin: i get that your hw is odd, but that's how it happens in the drm model
20:16pcercuei: my plane does scaling, the CRTC composes the picture, then the connector can scale again
20:17imirkin: i think you have all the information necessary.
20:18imirkin: plane scaling is not controlled by properties.
20:18pcercuei: I understand that
20:18pcercuei: but that's not optimal
20:19pcercuei: especially with fbdev emulation
20:19pcercuei: anyway. I'll just send my patchset and see what's the feedback on it
20:19imirkin: just make fbdev be the size the application wants
20:19imirkin: and then use the connector-level scaling
20:20imirkin: (or don't use fbdev, but i'm guessing you already thought of that one)
20:20pcercuei: I have no connector-level scaling
20:20imirkin: if there's no overlay, is there a difference?
20:21imirkin: and if there is, fail the modeset
20:21imirkin: or you can add a bunch of stuff to debugfs that only applies to your driver anyways
20:21imirkin: any sysfs additions are unlikely to be welcomed
20:22pcercuei: let's say your application requests 256x256
20:22imirkin: is that a valid mode?
20:22pcercuei: the CRTC is the LCD's size, so 320x240
20:23imirkin: i.e. do you support a 256x256 mode?
20:23imirkin: the CRTC in reality, or as far as the kms client is concerned?
20:23pcercuei: so there are two ways to scale it to 320x240. Either fullscreen, or fullscreen but keep the aspect ratio
20:24imirkin: or center it
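The three options mentioned above (stretch to fullscreen, scale while keeping the aspect ratio, or center) boil down to computing a destination rectangle. A small C sketch with hypothetical names (`scale_mode`, `struct rect`, `fit_rect`), using integer math only and assuming all sizes are positive:

```c
#include <stdint.h>

enum scale_mode { SCALE_STRETCH, SCALE_ASPECT, SCALE_CENTER };

struct rect { int32_t x, y, w, h; };

/* Compute where a src_w x src_h image lands on a dst_w x dst_h screen. */
static struct rect fit_rect(int32_t src_w, int32_t src_h,
                            int32_t dst_w, int32_t dst_h,
                            enum scale_mode mode)
{
    struct rect r = { 0, 0, dst_w, dst_h };   /* SCALE_STRETCH: fill the screen */

    if (mode == SCALE_ASPECT) {
        if (src_w * dst_h > src_h * dst_w)
            r.h = src_h * dst_w / src_w;      /* source is wider: width-limited */
        else
            r.w = src_w * dst_h / src_h;      /* source is taller: height-limited */
    } else if (mode == SCALE_CENTER) {
        r.w = src_w;                          /* no scaling at all */
        r.h = src_h;
    }

    /* Center the result; a negative offset means the image overhangs the screen. */
    r.x = (dst_w - r.w) / 2;
    r.y = (dst_h - r.h) / 2;
    return r;
}
```

For the 256x256 request on a 320x240 panel, the aspect-preserving mode gives a 240x240 rectangle at x = 40, while stretching fills the whole 320x240.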
20:24pcercuei: the CRTC is the size of the attached screen
20:25vsyrjala: emulating scaling mode via plane scaling will fail when you have more than one plane if you can't scale them all. you'd need to check for that and return an error
20:25pcercuei: why would it fail? The second plane has a fixed mode which corresponds to the size of the screen
20:26imirkin: vsyrjala: he basically just wants to get fbdev to pass different parameters to kms
20:26pcercuei: I only care about scaling mode on the primary plane, so I'm not trying to emulate scaling mode the way you understand it
20:27vsyrjala: plane input and output size is specified by userspace. you can't just ignore it
20:27imirkin: or in this case, fbdev
20:27pcercuei: I can't ignore the input size
20:27pcercuei: I can ignore the output size
20:29pcercuei: why not?
20:30pcercuei: I don't ignore the output size, I tweak it
20:31imirkin: sounds like you don't want a drm driver
20:31imirkin: but instead an fbdev driver
20:31vsyrjala: it's not allowed. you promised the app to put the planes at specific coordinates. if you don't then the app doesn't get the output it expects
20:31imirkin: (of course those aren't allowed anymore... fbdev is deprecated)
20:31pcercuei: imirkin: yes. I'm between a rock and a hard place
20:32imirkin: you're going to have a bunch of custom hacks for your application
20:32pcercuei: vsyrjala: I didn't promise anything ;)
20:32pcercuei: but again, this is for fbdev emulation
20:32imirkin: so you'll have to have your own tree
20:32vsyrjala: you did when you exposed the kms uapi
20:32imirkin: so do whatever you like
20:32pcercuei: no, that's not the plan
20:33vsyrjala: one could do this tweaking in the fb_helper
20:33imirkin: but if you want an upstream kms driver, then you have to implement the kms api the way it's written or convince people that it needs to be defined differently
20:33vsyrjala: but making that nice and generic and clean code is a bit of work probably
20:33imirkin: doing things behind the application's back is pretty much against the kms design though
20:33imirkin: adding options to the fbdev integration layer ... meh
20:34imirkin: (where the kms application is fbdev, basically)
20:35anholt: jekstrand: huh, bitset didn't work out to be a win on drawoverhead. https://gitlab.freedesktop.org/anholt/mesa/-/commit/613ff758762c9655a59d99b72804c37883e4efdb (look at ringbuffer_sp.c, not ringbuffer.c)
20:36pcercuei: vsyrjala: even for KMS apps, I do have to tweak the crtc_h/w but only slightly: crtc_w &= ~1
20:36pcercuei: otherwise the hardware locks up
20:37imirkin: pcercuei: i believe normally you just reject modesets which run afoul of such rules
20:37vsyrjala: pcercuei: you need to return an error when the user asks the impossible
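The two behaviours being contrasted here can be put side by side in C. Both functions are hypothetical illustrations, not DRM API: the first silently rounds the width down to an even value (the `crtc_w &= ~1` tweak), the second rejects an odd width with an error so that userspace can choose its own fallback.

```c
#include <errno.h>
#include <stdint.h>

/* The tweak: silently round the requested width down to an even value. */
static uint32_t sketch_round_down_even(uint32_t crtc_w)
{
    return crtc_w & ~1u;
}

/* The alternative suggested in the discussion: refuse what the hardware
 * cannot do and let the caller decide how to fall back. */
static int sketch_check_width(uint32_t crtc_w)
{
    if (crtc_w & 1u)
        return -EINVAL;   /* odd widths would lock up this hypothetical hardware */
    return 0;
}
```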
20:38pcercuei: yeah, well. Then what? the userspace think "crtc_w++ and try again"?
20:38vsyrjala: usually userspace has a fallback as in "just use one fullscreen plane and compose everything with gpu/software/whatever"
20:38vsyrjala: if not it's broken anyway
20:39jekstrand: anholt: Interesting...
20:39jekstrand: anholt: Did it work out to be a wash or worse?
20:39anholt: -0.635795% +/- 0.27689% throughput
20:39pcercuei: and then the scaling etc. have to be done in software?
20:39pcercuei: that sounds terrible
20:40vsyrjala: less terrible than userspace getting essentially random output
20:41jekstrand: anholt: :-(
20:41imirkin: anholt: perhaps there are bitset optimizations that are done for x86 but not arm?
20:41pcercuei: sorry, but that's BS
20:41anholt: 160ish handles in the app
20:42pcercuei: who cares if my image on screen is scaled to 320x240 instead of 320x239? Especially if it allows it to render at 60fps and not 12fps
20:42imirkin: (either at the source level, by using intrinsics, or just isa being helpful)
20:43imirkin: pcercuei: discovery of the "right" parameters to the application has been an ongoing topic
20:44vsyrjala: if the image is composed with multiple planes then it may be crucial to get pixel perfect output. otherwise all kinds of artifacts may be visible
20:44vsyrjala: the kernel can't make that determination and hence isn't allowed to overrule userspace
20:45jekstrand: anholt: I have observed in the past that bitsets can have weird perf. For instance, uint64_t bitsets are way slower on x86_64 for some reason.
20:45pcercuei: I understand your point of view
20:46pcercuei: My point of view is, this device runs fullscreen games and emulators. It is absolutely not acceptable to lose performance because of something like that
20:47pcercuei: fbdev had a "check_var" ioctl that allowed the kernel to tweak a mode to acceptable values
20:49vsyrjala: fbdev never had to deal with multiple planes/atomic updates/all that good stuff
20:50Lyude: danvet/vsyrjala - new vbl work + nv_crc patches on the ML btw, feel free to take a look when you get a chance
20:52anholt: jekstrand: no obvious disaster in the asm, not that I'm great at arm64 asm
20:52jekstrand: anholt: I still find that a bit surprising
20:53jekstrand: anholt: Have you looked at it in perf? Maybe all the cost moved to sparse_array?
20:54jekstrand: anholt: With ANV we have sub-allocation helping us so the number of objects shouldn't be high compared to the number of times they're added to the batch.
20:56danvet: Lyude, I'd include your benchmark summary mail in the commit message for the drm vblank worker patch
20:56Lyude: danvet: sure thing, btw - did that look sensible to you?
20:57danvet: it's rather late on Fri
20:57Lyude: ah, understandable :P
20:57danvet: I don't think you want technical opinions from me :-)
20:57danvet: but looks cool
20:59danvet: ok, replied with one yolo r-b
21:02danvet: Lyude, btw did you test the other sources?
21:02danvet: iirc there was a fumble in the generic drm crc uapi
21:02danvet: and source selection wasn't actually forwarded
21:04danvet: oh the patch landed
21:04danvet: but there might be other lols
21:05Lyude: danvet: the branch I tested against was actually kind of old, but I don't really think we ever used any source other than the default
21:05Lyude: well, not that old
21:05danvet: yeah igt just goes with auto
21:05Lyude: as long as auto was working it should be fine then
21:45anholt: jekstrand: with the bitset, there's no sparse array (we still keep an array of BO pointers, along with the set)
21:45jekstrand: Oh, so it's really just a replacement for your hash table
21:45jekstrand: Very weird that it's slower then
21:46jekstrand: Like, unbelievably weird
21:48anholt: driver shas match expected, a fresh sample reproduces the perf hit.
21:49anholt: anyway, got other ideas to work on our overhead (particularly in the "too many separate allocations" vein)
21:56anholt: I'm now at the third obvious CPU reduction patch producing reduced perf in this testcase, so I suspect I need something more CPU bound.
22:47robclark: anholt: if you have a setup where you can do FD_MESA_DEBUG=nogmem with gfxbench gl_driver2, that will give you a CPU bound workload that is interesting
22:48anholt: is gl_driver2 a set of many testcases getting at different areas of driver overhead?
22:49robclark: it's a single benchmark.. with a gazillion tiny draws, and moderate mix of state changes between.. also a thing we actually have to care about
22:49anholt: working with open source software is much nicer, and lets me see how a change impacts driver overhead in multiple ways.
22:49robclark: mostly uniform updates, but there is a mix of other state changes... second most common is probably texture state
22:50robclark: sure, ofc.. just pointing that out since it is CPU limited and at the end of the day it matters because for some reason people care about gfxbench
22:50anholt: (for example, the big win I found is for changing a little bit of a large cb0 uniform buffer, or not changing it at all but changing program state, while hurting small uniform counts a little bit)
22:51robclark: gl_driver2 does frequent uniform updates, updating a small subset of the uniform state (iirc just updating a single uniform)
22:51anholt: that's one of the subcases of drawoverhead, with different uniform counts
22:52robclark: although because of how SDS works, I'm not sure how we can sanely do partial uniform uploads.. there is the idea of appending more CP_LOAD_STATE to the end of a SDS stateobj, which might work out
22:53anholt: I'm not doing anything on that front, it would be absolute horror to plumb through
22:53robclark: (but other than that, later stateobj replace rather than amend prior stateobj's.. and stateobj's can be skipped due to binning)
22:53robclark: yeah, I agree
22:53robclark: it's why I didn't try it yet :-P
22:55robclark: hmm, not sure why I didn't get an email about that one
22:55anholt: opened just a minute ago
22:58robclark:wishes gitlab were a bit more clever about showing diff's of stacked MR's
23:02Lord: i'm not sure here is the right place to talk about this but i try. On gentoo with a kernel 5.6.2 with latest mesa on a radeon r580, when i play some heavy games at 4k, after a while, the monitor shut down and the input stop responding. The computer continues to ping back so it's not a total freeze. I experience it in many different games (the witness, alien isolation, tomb raider). If
23:03Lord: I play at 1080p i mostly never experience this behavior.
23:03Lord: could this be attributed to mesa or amdgpu ?
23:04imirkin: ultimately amdgpu, but mesa's not helping
23:04anholt: Lord: that's basically going to be your kernel display driver.
23:05Lord: i should try installing openssh and doing a "dmesg -w" in it while triggering this behavior
23:11Lyude: anyone have an edid with interlaced modes?
23:16Lyude: oh-looks like my tv has some, I'm all set
23:21alyssa: Down to essentially no deqp regressions but mysterious app regression
23:21alyssa: isn't fp16 fun :o
23:24alyssa: (glmark2-es2 -bterrain, in this case)
23:29robclark: hmm, terrain looks a bit diff in es2/fp16.. not sure offhand if that is us or them, don't really have a reference to compare to..
23:29alyssa: robclark: more... bright, i guess?
23:30alyssa: less dark green and more blinding yellow?
23:30alyssa: So maybe that's two glmark bugs >:
23:31robclark: possibly.. or a mesa bug..
23:31alyssa: could be
23:31anholt: I see only minor pixel differences between i965 and freedreno on terrain
23:32robclark: need to use glmark2-es2
23:32alyssa: making `lVector` highp on L109 of ` terrain.frag` makes the bug go away
23:32anholt: sigh, yep
23:47alyssa: I guess next is fixing my RA so I can actually extract benefit from fp16 :p