07:50mbuf: I tried the Asus GeForce GTX 1050Ti on my desktop (Parabola GNU/Linux-libre x86_64), and X does not start. I am using the nouveau driver.
07:52mbuf: Also, I tested with the following kernel command line options 1. modeset=0, 2. nouveau.accel=1, 3. nouveau.config=NvMSI=0, but, the TTY just stops with "[OK] Started LVM2 metadata daemon"
07:53mbuf: With "nouveau.modeset=0", I am able to open console, but, X still does not load. The X works fine with this desktop with the GeForce GT 730 that I currently use.
07:53mbuf: Finally, I tried "drm.debug=14 log_buf_len=16M" in the Grub kernel command line entry, but, I did not see any log messages, just the system boot TTY messages. Any help in getting X up and running with nouveau will be useful. Thanks!
08:36linkmauve: karolherbst, re SHA-1 collisions, assuming no malicious user, the probability of an unwanted collision is so extremely low that you shouldn’t have to worry about it, in the generic case about one in 2⁸⁰ (reduced to 2⁶³ with a conscious attack).
08:36karolherbst: linkmauve: right.. but millions of users and millions of shaders :p
08:36linkmauve: But millions of billions of billions of hashes?
08:36karolherbst: it's not that unlikely
08:37linkmauve: You have vastly different orders of magnitude here.
08:38karolherbst: you have to remove 2^40 from the 2^63 because of the vast amount of hash operations
08:38linkmauve: And besides caches between different users don’t affect each other.
08:38karolherbst: so you end up at 2^23 roughly
08:38linkmauve: karolherbst, first, it’s not 2⁶³ unless there is an active attack against your shader cache, it’s 2⁸⁰.
08:39karolherbst: then 2^40 :p
08:39linkmauve: I’m not sure I understand your math.
08:40karolherbst: well, it's about the likelihood of _any_ user hitting it
08:40karolherbst: not _one_ user
08:40karolherbst: so you have to assume we have 2^40 tries per "try" so to speak
08:41karolherbst: but that kind of depends on how many hash operations we have per "time frame"
08:41karolherbst: if we consider a single target/user then sure, your 2^80 is valid
08:41karolherbst: but we have 2^20 users I guess
08:41karolherbst: and each can be hit by that
08:42karolherbst: in the end it's not about everybody having a collision
08:42karolherbst: it's about any user having a collision
08:42linkmauve: Are there even 2⁴⁰ shaders having been written?
08:42karolherbst: that was users * shaders
08:43linkmauve: I’d assume most users will compile the same shaders as other users.
08:43karolherbst: yeah.. I know
08:43karolherbst: it was oversimplified
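The estimate being argued over above is essentially the birthday bound. As a rough sketch (not something either speaker wrote), the usual approximation p ≈ 1 − exp(−n(n−1)/2^(b+1)) gives the chance of at least one accidental collision among n uniformly random b-bit hashes, with b = 160 for SHA-1. The function names are hypothetical, the figures of 2^20 users and 2^20 shaders per user are the loose guesses from the discussion, and the model assumes no attacker.

```python
import math


def collision_probability(num_hashes: int, hash_bits: int) -> float:
    """Birthday-bound approximation for at least one accidental collision.

    p ~= 1 - exp(-n * (n - 1) / 2^(bits + 1)), assuming uniformly random hashes.
    """
    exponent = -num_hashes * (num_hashes - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is far below 1e-16.
    return -math.expm1(exponent)


def any_user_probability(users: int, per_user_probability: float) -> float:
    """Chance that at least one of `users` independent caches sees a collision."""
    # Equivalent to 1 - (1 - p)^users, computed without rounding (1 - p) to 1.0.
    return -math.expm1(users * math.log1p(-per_user_probability))


SHA1_BITS = 160
USERS = 2 ** 20             # guess from the discussion
SHADERS_PER_USER = 2 ** 20  # guess from the discussion

# Separate caches: a collision only matters inside one user's cache.
per_user = collision_probability(SHADERS_PER_USER, SHA1_BITS)
any_user = any_user_probability(USERS, per_user)

# Pessimistic case: all users * shaders hashes land in one shared pool.
pooled = collision_probability(USERS * SHADERS_PER_USER, SHA1_BITS)

print(f"one user:       {per_user:.3e}")
print(f"any user:       {any_user:.3e}")
print(f"one shared pool: {pooled:.3e}")
```

Under these assumptions the exponents add up differently from subtracting 2^40 from 2^80: the collision probability grows with the square of the number of hashes in one pool, and only linearly with the number of independent pools.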
08:43karolherbst: but you also have webgl where shaders are generated on the fly, etc...
08:43karolherbst: driver versions generating different hashes again
08:43linkmauve: More than on desktop GL?
08:43karolherbst: mesa does check for collisions anyway
08:44karolherbst: so it's already in there
08:44karolherbst: linkmauve: browsers parse the glsl code and generate proper glsl at runtime
08:44karolherbst: with bound checks and stuff
08:44karolherbst: but yeah.. probably not as much
08:44karolherbst: but I don't think a million shaders is a lot
08:45linkmauve: I wouldn’t consider it useful to check for collisions tbh, given the infinitesimal probabilities it’s much more likely you’ll get some memory corruption than a hash collision.
08:45karolherbst: I alone have 40k shaders here and that's just... 30 games?
08:46karolherbst: linkmauve: but what _if_ it happens? :p
08:46karolherbst: but I said it before: checking is cheap
08:46linkmauve: How do you protect against a memory corruption?
08:46karolherbst: you won't gain anything for not doing it
08:46linkmauve: What do you do _when_ it happens?
08:47linkmauve: People who care run ECC memory, people who don’t get silent corruption, random crashes, etc., and deal with it.
08:47karolherbst: right... but still, there is no perf gain for not checking :p
08:47linkmauve: It’s many many orders of magnitude more likely.
08:48linkmauve: But there is no gain for checking either.
08:48karolherbst: not quite sure. Even non ECC memory is quite reliable these days
08:48linkmauve: So why do you do it?
08:49karolherbst: because checking for hash collisions just is the proper way of dealing with hashes :p
08:49linkmauve: karolherbst, last time I heard someone run ECC memory, it reported 2-3 errors *a day*.
08:49karolherbst: it's an easily preventable bug
08:49linkmauve: That was in June.
08:50karolherbst: sounds like broken RAM to me
08:51karolherbst: google did some studies and they didn't find that many
08:51karolherbst: and I am quite sure they have a proper way of knowing
08:52HdkR: I have 128GB of ECC here in my server and zero ECC logs
08:52karolherbst: kind of depends on how much money you spend on RAM and stuff
08:54karolherbst: in the end it highly depends on the DIMM
08:54karolherbst: one is totally broken, the others are fine
08:55karolherbst: anyway, if there is an easy way to prevent bugs, I opt in for preventing those bugs
08:55karolherbst: just piling up and ignoring causes for bugs doesn't help :p
08:55karolherbst: and with checking for hash collisions you remove one cause at least
08:56karolherbst: and you can be pretty sure a bug wasn't caused by a hash collision if everything else was ruled out
08:59karolherbst: anyway, in the end none of us is knowledgeable about hashes in this case and my initial question was more like: did it happen? :p
08:59karolherbst: but it's pointless to discuss as we check already
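To make the "checking is cheap" point concrete, here is a minimal sketch of a hash-keyed cache that verifies the full key on every hit. It is not Mesa's actual implementation; the class name, method names, and the injectable `hash_fn` parameter are hypothetical and exist only so that a collision can be forced for demonstration.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple


def sha1_digest(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


class CheckedCache:
    """Cache keyed by a hash of the source, with a full-key check on lookup."""

    def __init__(self, hash_fn: Callable[[bytes], bytes] = sha1_digest) -> None:
        self._hash_fn = hash_fn
        # digest -> (original source, compiled result)
        self._entries: Dict[bytes, Tuple[bytes, bytes]] = {}

    def put(self, source: bytes, compiled: bytes) -> None:
        self._entries[self._hash_fn(source)] = (source, compiled)

    def get(self, source: bytes) -> Optional[bytes]:
        entry = self._entries.get(self._hash_fn(source))
        if entry is None:
            return None
        stored_source, compiled = entry
        # The collision check: one byte comparison per hit.
        if stored_source != source:
            return None  # treat a colliding entry as a cache miss
        return compiled


# Deliberately weak hash (first byte only) to force a collision.
weak = CheckedCache(hash_fn=lambda data: data[:1])
weak.put(b"void main() { a(); }", b"binary-A")
print(weak.get(b"void main() { a(); }"))  # hit
print(weak.get(b"void main() { b(); }"))  # same weak hash, different source
```

The cost of the check is storing the key next to the value and comparing it on a hit, which is small next to compiling a shader.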
09:00HdkR: I set up ccache with xxhash a few days ago from these discussions :D
09:01karolherbst: I don't use ccache at all after running into several bugs with it
09:01karolherbst: maybe it's better now.. who knows
09:02HdkR: I've had the cache corrupt a few times, but as long as I remember that being a first debug step then it isn't too bad
09:02karolherbst: but in the past it always caused more pain than it solved
09:02karolherbst: HdkR: yeah.. it's just annoying as building doesn't take long usually
09:02karolherbst: the kernel maybe
09:02HdkR: I switch between an assert build and release build a lot on a slow ass ARM device, so it ends up being a bit nice there
09:02karolherbst: but I don't want to risk building the kernel with ccache...
09:02HdkR: On the 2990WX it chews through the world
09:03karolherbst: HdkR: cross compile or set up a build server :p
09:03HdkR: Preprocessor becomes a bottleneck then, which kind of sucks :/
09:03karolherbst:is crosscompiling ppc64le with docker now
09:03karolherbst: HdkR: preprocessor is kind of cheap though compared to the actual compiling
09:04karolherbst: and it's still better :p
09:04karolherbst: the pain about crosscompiling is nfs or however you get the binaries to your target system
09:04HdkR: I have autofs mounting a bunch of NFS folders, so it isn't too bad there
09:05HdkR: Cross compiling might be interesting...
09:06HdkR: but I could also just deal with the heartburn of ccache corrupting if I fault the device :D
09:08HdkR: Oh right, I had ccache issues when the version string didn't change but compiled output did. That was a fun thing
09:09HdkR: Most people aren't doing compiler dev work
09:09karolherbst: ohh.. sounds painful
09:09karolherbst: like shader caches in mesa not updating either :p
09:11HdkR: caches can be hell
09:15karolherbst: it all sucks and preventing bugs is the key to success :p
09:23HdkR: Fewer bugs means better products
14:50catalinp86: I'm running Kubuntu 20.04 and my computer randomly crashes
14:50catalinp86: I found the following relevant error in the syslog:
14:50catalinp86: Catalin kernel: [131647.456973] nouveau 0000:03:00.0: Xorg: nv50cal_space: -16
14:51catalinp86: as far as I understand this means that the GPU is out of space and it's a problem with the nouveau driver (that came preinstalled) with Kubuntu 20.04
14:51catalinp86: any ideas what could make it stop randomly crashing
14:52catalinp86: brb 2 secs
14:53catalinp86: so, yeah,... .any ideas?
14:54catalinp86: forgot to mention... the message "Catalin kernel: [131647.456973] nouveau 0000:03:00.0: Xorg: nv50cal_space: -16" is spammed tens of times for 2 minutes... like a hundred times...
14:54catalinp86: only the value between the square brackets changes
14:54catalinp86: i'll wait for a reply
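When a message repeats with only the bracketed timestamp changing, it can help to count the repeats and note the first and last timestamps before reporting the problem. The following is a small sketch under assumptions: the regular expression only targets lines shaped like the one quoted above, the function name is hypothetical, and the second sample line uses a made-up timestamp.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Matches e.g. "Catalin kernel: [131647.456973] nouveau 0000:03:00.0: ..."
KERNEL_LINE = re.compile(r"kernel: \[\s*(?P<ts>\d+\.\d+)\] (?P<msg>.*)")


def summarize_kernel_messages(
    lines: Iterable[str],
) -> Dict[str, Tuple[int, float, float]]:
    """Map each distinct message to (count, first timestamp, last timestamp)."""
    counts: Counter = Counter()
    span: Dict[str, Tuple[float, float]] = {}
    for line in lines:
        match = KERNEL_LINE.search(line)
        if match is None:
            continue  # not a kernel line in the expected format
        msg = match.group("msg").rstrip()
        ts = float(match.group("ts"))
        counts[msg] += 1
        first, _ = span.get(msg, (ts, ts))
        span[msg] = (first, ts)
    return {msg: (n, *span[msg]) for msg, n in counts.items()}


sample = [
    "Catalin kernel: [131647.456973] nouveau 0000:03:00.0: Xorg: nv50cal_space: -16",
    # Made-up second occurrence for illustration only.
    "Catalin kernel: [131648.000000] nouveau 0000:03:00.0: Xorg: nv50cal_space: -16",
    "some unrelated line",
]
summary = summarize_kernel_messages(sample)
for msg, (count, first, last) in summary.items():
    print(f"{count}x over {last - first:.1f}s: {msg}")
```

On a real system the input could be the lines of the syslog file instead of the `sample` list.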
15:04AndrewR: catalinp86, how much VRAM does your card have? I just got a 1Gb version of GT240 in addition to 384Mb of G92 ..and now heavy benchmarks run better (Unigine heaven, 3dmark2005 via wine). Maybe modern KDE is also heavy on vram usage .....
15:10catalinp86: Andrew, there's 2GB
15:10catalinp86: nouveau 0000:03:00.0: DRM: VRAM: 2048 MiB
15:12AndrewR: catalinp86, then sorry, no other simple idea what can cause this ... /reboots
17:01AndrewR: hi all (again). I hung my second card, but sent an email to the nouveau list with vdpau-related traces and logs
17:21AndrewR: recompiling mesagit with #define NOUVEAU_VP3_DEBUG_FENCE 1 (in src/gallium/drivers/nouveau/nouveau_vp3_video.h) Any additional checks I should enable? Also, wouldn't those asserts be better as debug messages/printks? I mean in https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nv50/nv98_video.c - switch statement at line 196 and down ....
17:25AndrewR: I mean, pipe driver should never feed nouveau with invalid values there, right? But at least one assert was firing for me recently (not those, one at nv98_decoder_decode_bitstream ) ..so something was not decoded correctly.. and is there a possibility such an assert will leave some GPU internals in an even more inconsistent state ?
17:57AndrewR: reboot time, I guess (i think there were some scripts for re-setting secondary GPU, but I don't have them, and too dumb for recreating them ...)
18:04AndrewR: more asserts! cin: ../src/gallium/drivers/nouveau/nouveau_vp3_video_vp.c:368: uint32_t nouveau_vp3_fill_picparm_h264_vp(struct nouveau_vp3_decoder *, const struct pipe_h264_picture_desc *, struct nouveau_vp3_video_buffer **, unsigned int *, char *): Assertion `dec->refs[refs[j]->valid_ref].vidbuf == refs[j]' failed. Got this one while trying CinelerraGG with vdpau acceleration ;}
18:09AndrewR: strange, it doesn't hang so far :}
18:23AndrewR: so, with NOUVEAU_VP3_DEBUG_FENCE 1 it decodes slower a bit (238 vs 226 seconds on 3:36 h264 1080p, 60 fps video), but no hang so far ...time to try FIVE mplayers at once .....
18:24AndrewR: it survived!
19:36AndrewR: so, interesting ..I removed one assert in /src/gallium/drivers/nouveau/nouveau_vp3_video_vp.c at line 368, and now Cinelerra loads h264 files w/o asserting, and even plays them. but still I run into this disp: ERROR 1 [PUSHBUFFER_ERR] 02  chid 0 mthd 0000 data 00000400 / DRM: core notifier timeout - and thus most likely will reboot machine soon
22:16AndrewR: so, it was VP4, not VP3 .. I mis-remembered those :/ But I also got a new assertion :} gst-play-1.0 + mpeg2 file via VA-API = gst-play-1.0: ../src/gallium/drivers/nouveau/nv50/nv98_video.c:56: void nv98_decoder_decode_bitstream(struct pipe_video_codec *, struct pipe_video_buffer *, struct pipe_picture_desc *, unsigned int, const void *const *, const unsigned int *): Assertion `ret == 2' failed.
22:36AndrewR: and also this accumulating Xorg load under dri3 apparently comes from nouveau_drv.so itself (according to operf/opreport from root user on X's pid). Time to rebuild with debug symbols ... but not now, another task for tomorrow.