00:35 raket: DrNefario: I sometimes use ULMB on my monitor, my trick is to start linux with the nvidia driver and reboot and it still works with nouveau... :)
00:36 DrNefario: wait, is ULMB part of the issue?
00:43 raket: No it's not.. i'm just saying ULMB works with nouveau after a reboot after enabling it on the monitor when the nvidia driver is loaded ...
01:36 DrNefario: ULMB seems to be forced off no matter what due to the refresh rate being set too low
01:36 raket: DrNefario: < DrNefario> wait, is ULMB part of the issue? raket: No it's not.. i'm just saying ULMB works with nouveau after a reboot after enabling it on the monitor when the nvidia driver is loaded ...
01:37 DrNefario: my monitor is probably cursed tbh
01:43 DrNefario: well I think I'll give up on it for now. if someone starts working on the bug, I'll be happy to provide more debug info in the issue. https://gitlab.freedesktop.org/drm/nouveau/-/issues/106
09:24 karolherbst: imirkin: are we running out of VRAM here? https://gitlab.freedesktop.org/drm/nouveau/-/issues/110
13:04 imirkin_: karolherbst: running out of vram is usually a submit failure
13:04 imirkin_: like pushbuf_validate: -12
13:04 imirkin_: or something like that
13:04 karolherbst: yeah...
13:04 karolherbst: hence me wondering a little on what's happening here
13:04 imirkin_: VRAM_LIMIT is we hit some arbitrary limit on something
13:04 imirkin_: i wonder if we're configuring the dma object wrong somehow? i dunno
13:05 karolherbst: channel -1 is also a bit weird
13:05 imirkin_: yea
13:05 imirkin_: MCP89 was always a bit weird
13:05 imirkin_: iirc i always suspected that it had some sort of fifo-related problem
13:06 imirkin_: like ... more so than the rest of the nv50 series
13:06 imirkin_: which also has a separate fifo problem
13:09 karolherbst: mhh
13:19 imirkin_: iirc VRAM_LIMIT is "you tried to go out of bounds on the vram dma object"
13:20 imirkin_: which is complete crap, since we set the bounds to be 0xffffffff i think
13:23 imirkin_: it's saying that it hits this while trying to read the pushbuf
13:23 imirkin_: at least that's how i interpret that error message
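The "out of bounds on the vram dma object" reading above can be pictured with a minimal sketch. It assumes a DMA object is simply a window with an inclusive upper limit. The names `sketch_dmaobj` and `sketch_dma_in_bounds` are hypothetical and are not nouveau or hardware definitions; the real check happens in hardware and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a DMA object window; not a nouveau structure. */
struct sketch_dmaobj {
    uint64_t limit; /* last valid offset inside the window, inclusive */
};

/* Returns true when the access [offset, offset + len) stays inside the window. */
static bool sketch_dma_in_bounds(const struct sketch_dmaobj *obj,
                                 uint64_t offset, uint64_t len)
{
    if (len == 0)
        return true;            /* an empty access touches nothing */
    if (offset > obj->limit)
        return false;           /* starts past the end of the window */
    /* Compare lengths instead of adding, so the sum cannot wrap around. */
    return len - 1 <= obj->limit - offset;
}
```

With `limit` set to 0xffffffff, as mentioned in the discussion, any access that fits in 32 bits passes this model's check, which is presumably why the reported error looks suspicious.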
13:23 imirkin_: note that on MCP89 the vram is stolen, but it's "real vram" as far as the chip is concerned
13:24 imirkin_: we had some issues on MCP77/79 that RSpliet fixed iirc relating to the stolen vram setup
13:24 karolherbst: mhh yeah
13:24 imirkin_: it's wholly possible we screw something up on MCP89 as well
13:25 karolherbst: I am just surprised we only see it like.. now
13:25 karolherbst: and not like the past 10 years
13:26 karolherbst: but I guess firefox just becomes very big and is doing a lot of stuff :)
13:31 imirkin_: also people are getting kicked off the ddx
13:31 imirkin_: more GL for all!
13:31 imirkin_: also people are more likely to use nouveau since blob support esp for older GPUs is going to be more lacking
13:31 karolherbst: truer
13:31 karolherbst: *true
13:32 karolherbst: I mean.. nothing against using GL, we just don't have enough people to fix all the bugs :(
13:33 imirkin_: nothing against GL, it just doesn't work
13:33 imirkin_: ;)
13:33 imirkin_: my recommendation would actually be that distros stop shipping nouveau_dri.so
13:33 karolherbst: well we can't have vulkan on those older GPUs
13:33 imirkin_: (by default)
13:34 karolherbst: uhm...
13:34 karolherbst: no?
13:34 karolherbst: ever used llvmpipe on 4k?
13:34 imirkin_: not sure how that's my problem.
13:34 karolherbst: yeah well, it's the distros problem getting "laggy desktop" bugs
13:35 imirkin_: nouveau_dri.so is not ready for production use. the fact that llvmpipe isn't a great solution to everyone deciding to sue GL isn't my problem.
13:35 imirkin_: use*
13:35 imirkin_: that doesn't make nouveau_dri.so any more ready for production use.
13:35 karolherbst: yeah well.. convince people to not use GL :)
13:35 imirkin_: not my problem.
13:36 imirkin_: (and i've tried, even people here aren't interested. i'm sure people will be even less interested in the wider community)
13:36 karolherbst: yeah, because the solution is not being grumpy and tell everybody how wrong they are but to get people interested in that project
13:37 imirkin_: let me try ...
13:37 imirkin_: "hey, come work on this project where the manufacturer provides no documentation, and there's no hope of ever getting more than 10% of perf on newer-than-10yr old hw"
13:37 imirkin_: the crowds are converging already!
13:39 karolherbst: yeah. I am aware of that problem
13:39 karolherbst: and all I can say is that "we are working on it"
13:39 imirkin_: but that's a lie
13:39 karolherbst: it's not
13:40 imirkin_: wtvr
13:40 imirkin_: prove me wrong.
13:40 karolherbst: I wish I would be allow to talk more about it, but I can't
13:40 karolherbst: *allowed
13:40 imirkin_: i've been hearing that line for 5+ years
13:40 karolherbst: yeah, I am aware
13:41 imirkin_: when it happens, great, maybe people will be interested, fix things up, etc
13:41 karolherbst: it's as frustrating for us as for everybody else
13:41 imirkin_: in the meanwhile, nouveau_dri.so is not ready for production use
13:41 karolherbst: yeah.. I guess so
13:41 imirkin_: and should not be shipped as part of any distro by default
13:42 imirkin_: the solution to llvmpipe being slow on 4k is "don't use llvmpipe on 4k"
13:42 karolherbst: but RH uses it in production and at least RH funds some of the work, which I think is fair
13:42 karolherbst: and they are free to ship it
13:42 imirkin_: everyone is free to ship it
13:42 karolherbst: but they also pay the price for it
13:42 imirkin_: i didn't say it was illegal or they weren't able to
13:42 imirkin_: maybe. i feel like i pay the price for it.
13:42 imirkin_: in tons of wasted time
13:43 karolherbst: right, that's common in community driven projects
13:43 karolherbst: and it's frustrating that like close to all distributions demand bug fixes
13:43 karolherbst: but rarely fix it themselves
13:43 karolherbst: and hopefully that changes somewhat in the future as well
13:44 karolherbst: I also heard about some group of people (trying to be as vague as possible) who would be interested in contributing, but let's see how that turns out
13:44 orbea: tbf there aren't many package maintainers up to the task of fixing nouveau bugs
13:44 imirkin_: so don't ship it.
13:44 imirkin_: or don't install it by default
13:45 karolherbst: if fedora or RHEL ships it, that's fine, I get poked about bugs the first, and I get paid for working on it anyway :p
13:45 karolherbst: or well skeggsb_ or Lyude or somebody
13:45 karolherbst: but there is at least a path from a support perspective
13:45 karolherbst: for other distributions? yeah well.. open a bug upstream and hope they have time to fix it
13:46 karolherbst: I mean.. distributions can ship it, it just annoys us if people come back every other week and ping on bugs because theirs is the most important one anyway
13:46 karolherbst: I think there is mainly a lack of what users can expect they deman more than we can deliver
13:46 karolherbst: *demand
13:46 imirkin_: i'll write up some wiki pages.
13:47 karolherbst: and it's fine if it still gets shipped, it should just be clear that this is "best effort" and "your bug might never get fixed"
13:48 karolherbst: uhh wiki
13:48 karolherbst: this one needs cleaning up a lot anyway
13:48 imirkin_: yes.
13:48 imirkin_: i'm going to delete 99% of it
13:48 imirkin_: the info was stale on it when i did a cleanup of it back in 2014 or so.
13:48 imirkin_: by now it's beyond ancient
13:49 imirkin_: and is not usefully serving the community
13:50 karolherbst: probably for the best
13:51 omegatron: if I wanted to contribute in the "near" future, by hunting and fixing bugs in nouveau (because of my own personal interest in performant hardware), I assume I would actually need the respective hardware in the machine to test the software? so, I would be limited for now to my available cards ( gt 240, gtx 650 ti, gtx 760 and gtx 1060 ) or otherwise would have to buy new (or used) cards .. !?
13:51 omegatron: or are those cards already so old, no one cares for them?
13:51 karolherbst: fixing bugs is already good enough
13:52 imirkin_: omegatron: as long as you care about them, who cares what other people care?
13:52 omegatron: my only "fear" in this matter would be, to fry my hardware .. that wouldn't be good (because no replacement parts available on the market)
13:53 imirkin_: omegatron: that's always a danger. however i'm not aware of any nouveau developer ever frying hardware through software they had written
13:53 omegatron: good to know
13:53 imirkin_: i believe mupuf fried a gpu or two in his oven, but ... i feel like one can anticipate such risks going into that activity
13:54 orbea: why would you stick a GPU into an oven? (just curious)
13:54 imirkin_: orbea: i think he was reflow-soldering them
13:54 orbea: ah
13:54 imirkin_: this is the generation of GPUs that had bad solder joints
13:54 imirkin_: so they were dying already anyways
13:54 karolherbst: he also tried a hair dryer once
13:55 karolherbst: but apparently they are not hot enough either
13:55 imirkin_: the hair drier was to figure out the temperature sensors i think
13:55 imirkin_: so they were plugged in and working while he did that
13:55 imirkin_: and didn't fry them doing that...
13:55 imirkin_: omegatron: you could be the first though ;)
13:56 karolherbst: but yeah...
13:56 karolherbst: personally I'd be fine with a "you ship it, you support it" statement
13:56 karolherbst: we can't support it anyway
13:56 karolherbst: sonds sad, but it's reality
13:56 imirkin_: karolherbst: well, i'll write something, if you hate it you can update it or whatever
13:56 karolherbst: *sound
13:56 imirkin_: this isn't the new york times in terms of distribution, so even if there's info you hate for a few days, i doubt too many eyes will be on it ;)
13:57 karolherbst: I'd probably just rephrase it to make it sound nicer :p
13:57 imirkin_: nicer than me?! inconceivable!
13:57 karolherbst: imirkin_: uhm.. I know the stats on the wiki in terms of readers
13:57 imirkin_: "mr nice" is my middle name
13:57 karolherbst: there are a lot
13:57 karolherbst: you'd be surprised
13:57 imirkin_: yes, Mr Googlebot seems to like those pages a lot
13:58 karolherbst: ehh no
13:58 karolherbst: I meant stats from google :)
13:58 karolherbst: apparently we have around 5k clicks a month
13:58 imirkin_: not surprising
13:58 karolherbst: from google
13:58 imirkin_: should put up some ads for AMD :)
13:59 karolherbst: lol
13:59 karolherbst: top query 3rd place "nvidia architecture names" :D
13:59 imirkin_: that's all me
13:59 karolherbst: maybe :D
14:00 karolherbst: but google knows unique users
14:00 imirkin_: not anymore, but it def used to be ;)
14:00 imirkin_: alright. time to get some work done.
14:00 karolherbst: that's what I am thinking about for 5 hours already
14:01 imirkin_: lol
14:01 imirkin_: i just minimize the hexchat window. (was about to do it, but saw your comment.)
14:01 karolherbst: :D
14:15 mupuf: orbea, imirkin_: indeed, one board had its GPU drop off the PCB :D
14:17 mupuf: But the most important thing is, I actively TRIED to break a GPU and failed :o I was blasting a hair drier on a GPU with its fan disabled, and ran benchmarks. The result was that the HW was limiting the power effectively using what I called the FSRM (a crude downclocking)
14:17 mupuf: on maxwell 2+, one cannot modify the registers controlling this... so I would not fear frying a GPU
14:18 mupuf: well, that is to say if you trust that the board has been competently designed and the power delivery is not gonna die on you
14:19 omegatron: sounds promising
15:21 RSpliet: mupuf: meanwhile -> https://www.pcgamer.com/amazon-new-world-killing-rtx-3090-gpus/
15:22 mupuf: RSpliet: yep :D That's what I had in mind :D
15:22 RSpliet: But yes, in a normal setting it's incredibly hard to fry a GPU. I've had a GeforceFX running for 1.5 years before I found out that the fan fell off.
15:22 mupuf: ROFL
15:24 RSpliet: And that didn't involve nouveau ;-)
15:25 karolherbst: I mean I am not surprised that an application is able to fry GPUs
15:26 karolherbst: I bet that running furmark on that GPU would fry it as well
15:26 karolherbst: :P
15:26 RSpliet: I am, the GPU should have plenty of security measures to clock back or shut down before it dies
15:26 RSpliet: They've had such measures since forever
15:27 RSpliet: And they're crude, requiring zero software involvement
15:27 karolherbst: depends
15:28 karolherbst: the FSRM can only do so much
15:28 RSpliet: Sure, and as voltages got lower it is able to do less too
15:28 karolherbst: the thing is.. with furmark the driver actually has to halve the clocks anyhow :D
15:29 RSpliet: But there's also just a hard shutdown switch on a temperature trip point AFAIK
15:29 karolherbst: but yeah...
15:29 karolherbst: yeah
15:29 karolherbst: there should
15:29 karolherbst: be
15:29 karolherbst: but
15:29 karolherbst: that's configured by the VBIOS
15:29 karolherbst: or rather devinit
15:29 karolherbst: the GPU by itself doesn't come with defaults
15:30 karolherbst: and who knows what the PMU is doing nowadays
15:30 RSpliet: Yeah, and I wonder whether these days, with their 2m² chips, they may need multiple sensors for different regions of the chip
15:47 karolherbst: ups...
15:48 karolherbst: in pushbuf_kref we ref a bo with refcnt 0 :)
15:48 karolherbst: imirkin_: we might want to assert on that?
15:53 imirkin_: karolherbst: it can happen "naturally"
15:53 karolherbst: well
15:53 karolherbst: it shouldn't :D
15:53 imirkin_: two threads
15:53 imirkin_: one that is ref'ing
15:53 imirkin_: one that is unref'ing
15:53 karolherbst: yeah.. not the case here
15:53 karolherbst: I am not talking about bo_ref
15:53 imirkin_: oh. pushbuf_ref ... yeah, that's not great
15:54 karolherbst: yeah...
15:54 karolherbst: need to figure out why that happens though
15:55 karolherbst: annoying thing is, it gets unrefed by bo_ref a couple of times until the process just crashes for odd reasons :)
15:55 karolherbst: so bo_del gets called a lot
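The idea of asserting when something takes a reference on a buffer object whose count has already dropped to 0 can be sketched as follows. This is a minimal illustration using C11 atomics. `sketch_bo`, `sketch_bo_try_ref` and `sketch_bo_unref` are hypothetical names, not the libdrm or Mesa API, and the real pushbuf code may track references differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer object; not libdrm's nouveau_bo. */
struct sketch_bo {
    atomic_int refcnt;
};

/* Take a reference only while the object is still alive.
 * Returns false if the count already reached 0, i.e. the object is
 * being (or has been) deleted and must not be revived. */
static bool sketch_bo_try_ref(struct sketch_bo *bo)
{
    int old = atomic_load(&bo->refcnt);
    while (old > 0) {
        /* On failure, 'old' is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&bo->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference. Returns true when the caller dropped the last one
 * and is responsible for freeing the object. */
static bool sketch_bo_unref(struct sketch_bo *bo)
{
    int old = atomic_fetch_sub(&bo->refcnt, 1);
    assert(old > 0); /* catches an unref on an already-dead object */
    return old == 1;
}
```

A caller that expects the object to be alive could wrap the first function as `assert(sketch_bo_try_ref(bo))`, which would flag the "ref with refcnt 0" case early instead of letting repeated deletes crash the process later.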
15:55 karolherbst: btw, it's the text bo after resizing.. so I guess my MT fixes still have an issue there somewhere
15:57 imirkin_: hmmm
15:57 imirkin_: so after resizing
15:57 imirkin_: i explicitly ref the text bo
15:57 imirkin_: but it could be that i don't hold a reference to it
15:57 imirkin_: and it gets deleted anyways?
15:57 imirkin_: i thought the pushbuf would hold a reference...
15:57 karolherbst: no, I think this part is fine
15:58 karolherbst: it's probably some weird interaction with other contexts, _but_ ... I don't know how it comes to it :D
15:59 karolherbst: let me check something...
16:03 karolherbst: why is gdbserver so CPU hungry.. *sigh*
16:13 karolherbst: okay.. so it seems to only happen if the application indeed uses multiple contexts
16:13 karolherbst: so my mt patches aren't that broken