02:38darksurf: Hey, can I get some assistance in here. I'm trying to learn how nouveau works.
02:39darksurf: I'm a software tester for Sabayon Linux and we've considered dropping the legacy nvidia-drivers for nvidia cards GeForce 500 and lower and moving them to nouveau. But after some testing with both drivers, nouveau seems to only get 6-10fps on a GTX 460, 560, and 580.
02:40imirkin: darksurf: pastebin output of 'glxinfo'
02:40darksurf: after testing the same exact cards on the same install with nvidia-drivers 390.XX I get 50+fps
02:40darksurf: sure, thanks for the assistance man :)
02:40imirkin: fermi-era gpu's don't get reclocking
02:40imirkin: but ... depending on which one it is, you should get a middle- or low-end clock speed on boot
02:41imirkin: as you can imagine, clock speed greatly affects, well, performance
02:41imirkin: so having 10fps in nouveau and 50fps with blob drivers isn't surprising.
02:41imirkin: (faster memory = faster rendering? who knew)
02:43imirkin: (fermi-era = GT 4xx/5xx)
02:43imirkin: yeah, that all looks fine
02:43darksurf: Oh, that would make sense :(
02:44darksurf: its definitely low end clock speeds
02:44imirkin: you can check with cat /sys/kernel/debug/dri/0/pstate
02:44imirkin: the last line which says "AC: ... " lists the current clock speeds
02:44darksurf: I'm afraid that debugging directory doesn't exist.
02:45darksurf: most likely not enabled in kernel
02:45imirkin: you must be root to access it
02:45imirkin: but also must be enabled in kernel, yea
02:45darksurf: spot on man! you were right. just a perms issue
02:45darksurf: tester /home/darksurf # cat /sys/kernel/debug/dri/0/pstate
02:45darksurf: 03: core 50 MHz memory 135 MHz
02:45darksurf: 07: core 405 MHz memory 324 MHz
02:45darksurf: 0f: core 810 MHz memory 2004 MHz
02:45darksurf: AC: core 50 MHz memory 135 MHz
02:46darksurf: so yeah :(
02:46imirkin: so this particular card boots into the lowest perf level
02:46imirkin: and things go faster when RAM is clocked at 2ghz than when it's at 135mhz
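[editor's note: the pstate paste above can be read mechanically; a purely illustrative Python sketch of parsing it follows. The real file lives at /sys/kernel/debug/dri/0/pstate and needs root plus debugfs; real output can carry extra fields (e.g. "DC:" clocks) that this toy parser ignores.]

```python
import re

# Illustrative parser for nouveau's pstate debugfs output, as pasted above.
# Numbered lines ("03:", "07:", "0f:") are available perf levels; the "AC:"
# line reports the clocks the card is actually running at right now.
def parse_pstate(text):
    levels, current = {}, None
    for line in text.splitlines():
        m = re.match(r"(\w+): core (\d+) MHz memory (\d+) MHz", line.strip())
        if not m:
            continue
        key, clocks = m.group(1), (int(m.group(2)), int(m.group(3)))
        if key == "AC":
            current = clocks      # live clocks
        else:
            levels[key] = clocks  # (core MHz, memory MHz) per perf level
    return levels, current

sample = """03: core 50 MHz memory 135 MHz
07: core 405 MHz memory 324 MHz
0f: core 810 MHz memory 2004 MHz
AC: core 50 MHz memory 135 MHz"""

levels, current = parse_pstate(sample)
print(levels, current)  # card sits at the lowest level despite 0f existing
```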
02:47darksurf: this is a GTX560. took it out of a junk bin for testing purposes.
02:47karolherbst: that's an evil card
02:47imirkin: iirc the higher-end fermi boards boot into the lowest perf level
02:47imirkin: while the lower-end ones boot into a middle perf level
02:48imirkin: so you actually get better perf with nouveau with the lower-end fermi boards
02:48karolherbst: but this low perf state is like seriously low :D
02:48darksurf: evidently people still use the GTX500 series.. Got a couple complaints when I mentioned possibly deprecating it.
02:48imirkin: darksurf: i have a NV34 plugged in =]
02:48karolherbst: I _think_ reclocking the core to 405 MHz at least should work with the current code we have though
02:49imirkin: (last supported by the nvidia 170.x series iirc?)
02:49darksurf: I'm not willing to leave users out in the cold, but if I was able to find a way to get nouveau as a proper alternative, I was willing to go that route.
02:49karolherbst: sadly performance wise that's not an option so far with nouveau :/
02:49imirkin: karolherbst: yeah, i think core reclocking is easy
02:49imirkin: memory ... not so much
02:49imirkin: darksurf: depends what users need
02:49imirkin: if they need a working desktop, nouveau tends to be quite sufficient
02:49karolherbst: imirkin: maybe I should get myself some fermi cards and continue working on that or something...
02:50imirkin: karolherbst: yeah, i bet you could get RH to pony up a few hundred E for that
02:50imirkin: or just look for the local junk bin :)
02:50karolherbst: yeah... should have thought about this earlier, because now would be the right time to just buy some :)
02:50karolherbst: ohh maybe I can do so tomorrow
02:51karolherbst: tomorrow would be essentially the last day if I want them next week :)
02:51darksurf: well, if they're using a card that old, I didn't think gaming was an option, not to mention it was a power-hungry demon :( 180W for the 650 Ti. Might as well get a 1050 Ti: 3x the performance at 75W
02:51karolherbst: darksurf: well, with nouveau you usually want to have a tesla or kepler/maxwell card anyway
02:52darksurf: but to be fair, the 560 Ti's performance was somewhere on par with the Ryzen R3 2200G APU graphics... even if it drank stupid amounts of power.
02:52gnarface: no the 560 Ti is still perfectly suitable for most games if it weren't for nvidia deprecating support and the commercial publishers spending money to remove support at the same time.
02:52karolherbst: gnarface: mhhh, not really
02:52gnarface: karolherbst: really. i'm a gamer.
02:52karolherbst: the 560 is quite a slow card
02:52gnarface: it's fine, most games these days don't max out even my 650
02:52karolherbst: I have a 970m which isn't even enough for most
02:52darksurf: You're talking low-quality 720p gaming
02:52karolherbst: gnarface: that's simply not true
02:53gnarface: karolherbst: i'm assuming your "most" is actually a subset of current games, mostly just AAA titles. you're factually inaccurate. i'm sorry but i'm not gonna just let you say that.
02:53imirkin: darksurf: the GTX 480 was a true power hog. the rest of the fermi series pales in comparison...
02:54darksurf: yikes. I told one guy he'd probably save enough power on his electric bill to pay off a 1050 Ti, which goes for 50 pounds on Amazon.
02:54karolherbst: gnarface: answering by quantity isn't the way to go here. there are many games one might want to play which the 560 is simply not enough for
02:54karolherbst: sure, there will be always be low spec games
02:54gnarface: karolherbst: yea but it's factually not MOST GAMES
02:54karolherbst: but what _if_ the user wants to play something high spec?
02:54imirkin: the GTX 480 was the very first DX11 card
02:54gnarface: *most games* are low-spec games, in that sense
02:54darksurf: OMG DX11 is that old?!
02:54imirkin: "Minimum System Power Requirement (W): 600"
02:55darksurf: It doesn't feel like that long ago :(
02:55imirkin: and the G80 was the first DX10 gpu (2006)
02:55imirkin: time flies when you're having fun
02:55karolherbst: gnarface: that kind of makes your argument useless, because by definition, the later we go in time, the more games become relatively low spec
02:56karolherbst: anyway, it's not something which actually helps a user make a proper decision
02:56karolherbst: also, what I meant wasn't always about AAA titles
02:57darksurf: So, basically nouveau is not a real replacement for "legacy" NV cards I take it?
02:57karolherbst: also, AAA titles sometimes just don't care about perf and run stupidly slow on even modern cards
02:57karolherbst: darksurf: depends on the card
02:57imirkin: darksurf: all depends what one is looking for
02:57imirkin: i just need something that displays things
02:57imirkin: and maybe the odd gaming run
02:57imirkin: nouveau's more than sufficient
02:57gnarface: karolherbst: my argument is just that stating "most games" the way you stated it shows a lack of a realistic understanding of what is actually available just on Steam
02:58gnarface: (let alone the rest of the market)
02:58darksurf: only reason I say "legacy" is NV calls anything older than Geforce 600 legacy and they require legacy drivers.
02:58darksurf: 600 and higher series are still supported by latest LTS 410.XX drivers
02:59darksurf: I really dislike proprietary blob drivers :(
02:59gnarface: karolherbst: it's fine if you admit to hyperbole, and it's fine if you claim that one or two AAA titles nullify my argument because of your personal tastes for games, but that's a far cry from just flat out saying most games can't be played with only 1GB of video ram. that's just not true by any stretch of the imagination
02:59gnarface: there aren't THAT many AAA titles out there
02:59imirkin: the DX10 gpu's that played games just fine in 2009 continue to play those same games just fine in 2019 :)
02:59imirkin: and those games were and continue to be plenty of fun
02:59gnarface: there are a ton of indie titles that many may discount, but they're still commercially published so they still count
03:00darksurf: I'm an AMD fanboy and even went through the wringer back when AMD couldn't write a driver to cut its way out of a wet paper bag :( Glad it's all open source now.
03:00imirkin: yeah, i recommend AMD to anyone who'll listen
03:00imirkin: (and to lots of people who won't)
03:01darksurf: rocking a Radeon VII in my box right now :D AMDGPU FTW
03:01karolherbst: gnarface: we'd both have to admit to hyperbole
03:01darksurf: 5.0 kernel + mesa 19 = Linux Freesync
03:02gnarface: karolherbst: sure, i could qualify my statement by saying that 30fps is fine for a lot of games, and some people will maintain that if it can't push 60fps minimum, it doesn't count as "supported" - but now that's a question of semantics, and whether you count actually being able to run the game as "playing" it
03:02darksurf: My old HD7850 is sitting in my server. I was hoping to use it for transcoding but plex....... why only nvidia?
03:03karolherbst: darksurf: because there are only shitty APIs for encoding on linux
03:03karolherbst: or none
03:04karolherbst: what APIs should plex use on AMD?
03:04darksurf: ?? vaapi?
03:04darksurf: vaapi works on intel and amd
03:04karolherbst: is encoding even hooked up for AMD?
03:04darksurf: AMDGPU supports UVD and VCE
03:04karolherbst: well, then write patches
03:05darksurf: lol, I wish I could code like that.
03:05karolherbst: although I think plex supports vaapi somehow
03:05darksurf: I'm really a sys-admin/net admin
03:06karolherbst: darksurf: that's no excuse :p
03:06darksurf: I can script, I don't write good code. I've written minor patches for stuff, but nothing low level like C
03:06karolherbst: but yeah, I totally forgot about vaapi being able to encode... was only thinking about vdpau
03:06gnarface: isn't there an opengl to vaapi wrapper that should work for just about everything ("work" here meaning run, not necessarily reach some arbitrary threshold of performance improvement)
03:06karolherbst: darksurf: before I started working on nouveau I only did Java :p
03:07karolherbst: gnarface: it's about encoding
03:07karolherbst: not decoding
03:07gnarface: oh that wrapper only works for decode?
03:07gnarface: i guess that makes sense
03:07karolherbst: it essentially wraps vaapi to gl/vdpau
03:07darksurf: actually AMDGPU supports both vaapi and vdpau
03:08karolherbst: yeah.. nouveau should do as well somehow? don't know what the vaapi situation looks like with nouveau. Most likely very sad
03:08karolherbst: or maybe it works.. dunno
03:08karolherbst: video decoding is in a sad state with nouveau anyhow
03:08gnarface: isn't it just about the reclocking issues though?
03:09karolherbst: first off you need firmware
03:09karolherbst: you have to extract them from the binary driver
03:09karolherbst: so nobody is able to ship those files
03:09karolherbst: have to run a script locally
03:09gnarface: oh, for all of them, not just the old ones, i see.
03:09karolherbst: second, we don't even support it on anything later than kepler1
03:09darksurf: don't they have to be signed by nvidia, not just extracted from the blobs?
03:09gnarface: right, i guess that's why i thought only the old ones needed the firmware
03:09karolherbst: that's a different issue
03:10karolherbst: we just never fully reverse engineered how those engines work for decoding video streams
03:10karolherbst: so we never wrote any firmware ourself
03:10darksurf: bummer, that's a lot of work too, I can imagine.
03:10karolherbst: maybe they even need to be signed to get beyond copy protection and all that crap
03:11darksurf: I don't know why nvidia feels supporting nouveau would hurt them...
03:11karolherbst: darksurf: yeah... for encoding it actually makes sense, but that's a totally different kind of beast to RE as well
03:11karolherbst: decoding is fine for most CPUs as long as you stay within fullhd
03:12karolherbst: and even for 4k it kind of works on higher end CPUs
03:12darksurf: It's really helped AMD a ton! Valve, redhat, and swaths of the open source community have supercharged their drivers.
03:12karolherbst: well nvidias driver isn't/wasn't in the sad state the AMD one was
03:12darksurf: LOL true
03:12darksurf: fglrx was a nightmare.
03:13karolherbst: I think fglrx didn't even relate to their win driver
03:13karolherbst: I don't know for sure
03:13darksurf: I also remember the non-existant support for my ATI Rage laptop cards back in the day.
03:13karolherbst: but for nvidia it's all the same code for every platform
03:13karolherbst: more or less
03:13darksurf: Oh, it shared code. That's one reason it was so bad.
03:14darksurf: they wrote as much OS agnostic code as they could when they got serious, but windows was still the priority.
03:14karolherbst: well, nvidia has a huge compute market
03:14karolherbst: and compute is done on linux
03:15darksurf: Oh, I'm aware of that. I work in a datacenter. Teslas everywhere.
03:16darksurf: AMD is making an interesting comeback this year though in both CPU and GPU business markets. Epyc looks very impressive and the Instinct MI50 cards look promising.
03:16karolherbst: doesn't change the core issue
03:16karolherbst: AMD is a classic hw company, nvidia... is not
03:17darksurf: nothing like non open opengl calls :(
03:17karolherbst: especially for the shaders, AMD still solves a lot of stuff inside the hw, nvidia gave up on that
03:18darksurf: stupid gameworks and its sabotaging methods.
03:18karolherbst: not saying that nvidia doesn't focus on hardware, but they have no issue solving a lot of stuff in software too
03:19karolherbst: everything you expect from a CPU you have to do inside your compiler on nvidia, like instruction reordering, scheduling, etc...
03:19karolherbst: AMD hw still seems to do it
03:19karolherbst: scheduling meaning you have to stall instructions and declare read/write barriers yourself
03:19karolherbst: and stuff
03:20karolherbst: or even if a value of a register is reused, you have to tell the hardware
03:20karolherbst: hardware doesn't know it
03:20darksurf: wow, that's interesting overhead :(
03:21karolherbst: darksurf: yeah,.... on maxwell getting the sched stuff more or less right gave us a 2x perf improvement
03:21HdkR: It's just software overhead
03:21darksurf: that's huge!
03:21karolherbst: well, the alternative was to fall back to the longest possible stall count
03:22karolherbst: back then kepler, even at equal clocks, was faster due to that
03:22karolherbst: anyway, ROCm is a much better data point to see AMD's change, maybe that will be a success, maybe not
03:23karolherbst: ROCm is a pain in the ass to package though
03:25darksurf: Yeah, but it's opensource ;)
03:26karolherbst: it's not a community project
03:26karolherbst: try to add patches for adding support for nvidia gpus
03:26karolherbst: see how well that would go
03:27darksurf: I dunno. I bet they wouldn't care. You have to realize, they donate standards to the VESA foundation all the time.
03:27karolherbst: actually, if I would have enough time, I'd even try that out just to see the reaction
03:27darksurf: Freesync was donated. "Dockport" tech was also donated to USB-C.
03:28airlied: they already have a standards body for "ROCm" it's just nobody cares
03:28darksurf: they give away a lot of tech to make industry standards.
03:28airlied: it's pretty much the anyone but nvidia and intel stds body
03:28darksurf: OMG airlied, you're in here?
03:28karolherbst: airlied: :D because nvidia or intel didn't care or because they are on the "not invited" list?
03:29airlied: made no sense why it didn't use Khronos, except it's amd :-)
03:29airlied: karolherbst: bit of both
03:29airlied: http://www.hsafoundation.com/members/ - guess how many of them have contributed anything :-P
03:30airlied: off by one :-P
03:30karolherbst: ahh, who is the third? :p
03:30imirkin: other way
03:30airlied: but yeah I'd love to troll by porting ROCm to nouveau
03:31imirkin: what is ROCm precisely?
03:31karolherbst: airlied: actually, we could wire up clover maybe? :D
03:32airlied: imirkin: a userspace compute stack that doesn't look anything like someone renamed chunks of cuda
03:32imirkin: i see :)
03:32karolherbst: well, it's llvm based, so it kind of fits with clover :p
03:33airlied: well afaik it only uses AMD GCN binary as its "IR"
03:33karolherbst: I guess the kernel APIs stuff is a bit more challenging
03:33karolherbst: airlied: okay, sure, but that's something which could be changed I guess
03:33karolherbst: although yeah, the final binaries would only contain GCN code I'd assume
03:33karolherbst: but... we could just add the llvm ir as well or something
03:33karolherbst: still painful
03:34karolherbst: or just be sane and embed spir-v
03:34karolherbst: anyway, AMD did a sad attempt to promote ROCm at XDC :/
03:35imirkin: is there anything wrong with it at the surface?
03:35imirkin: i.e. is it a proper useful API, just no one uses it?
03:35karolherbst: imirkin: well, just another vendor lockin
03:35imirkin: is there any non-demo-app usage of ROCm anywhere at all?
03:36karolherbst: it does implement CL I think
03:36imirkin: that doesn't count
03:36karolherbst: and more or less cuda
03:36imirkin: i mean the CUDA-ish API of it
03:36airlied: yeah it's vendor lockin with a drop of source code
03:37airlied: but you don't really write ROCm, you write CUDA/HIP or OpenCL on top
03:37airlied: and the binary you get out then runs on the ROCm runtime
03:37airlied: which is like the CUDA runtime, a bunch of APIs to get the gpu to execute stuff and manage memory
03:40karolherbst: imirkin: so my assumption is: none
03:40karolherbst: it's mainly a sad attempt to try to attack CUDA by just being able to consume the CUDA stuff
03:41karolherbst: airlied: how is the tooling around ROCm by the way?
03:41karolherbst: never looked that much into it
03:41karolherbst: like, do we get a proper gdb with it?
03:41airlied: karolherbst: not sure about the quality, there's a bunch of rocm specific stuff, don't think there's a useful gdb
03:41karolherbst: ahh, sad...
03:42karolherbst: the cuda-gdb thing is super useful; tried it once and it completely convinced me that it's a good thing to have
03:42gnarface: am i correct to assume that we're not bringing up porting cuda to nouveau because that has been deemed insane or impossible?
03:42karolherbst: kind of sad nvidia doesn't support it for CL :/
03:42HdkR: Nvidia supports CL? :P
03:42karolherbst: HdkR: not with gdb
03:43HdkR: The joke is that they are still stuck in CL 1.2 land with no idea if that will ever change
03:43karolherbst: HdkR: there is support for CLC 2.0 though
03:43karolherbst: well "support"
03:43karolherbst: you can use CLC 2.0
03:43HdkR: Wasn't that a driver drop over a year ago that wasn't ever touched again?
03:44airlied: gnarface: just pointless working on a standard you don't have access to, and that is so vendor specific
03:44karolherbst: and what would be the benefit of cuda on nouveau?
03:44HdkR: Also zero documentation on the ELF format for taking the blob emitted from a compiler for running :P
03:44karolherbst: HdkR: not an issue
03:44karolherbst: PTX is the bigger one
03:44karolherbst: need to write a PTX lexer/parser first :p
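[editor's note: as an illustration of the "PTX lexer/parser" step karolherbst mentions, here is a toy first pass over a single PTX statement. This is a sketch only — real PTX also has @-predicates, .directives, labels, vector operands, and immediates, none of which this handles.]

```python
# Toy split of one PTX statement into opcode parts and operands.
# E.g. "add.s32 %r3, %r1, %r2;" -> (['add', 's32'], ['%r3', '%r1', '%r2'])
def lex_ptx_insn(stmt):
    stmt = stmt.strip().rstrip(";")
    opcode, _, rest = stmt.partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest else []
    return opcode.split("."), operands   # opcode split on its type suffixes

opcode, ops = lex_ptx_insn("add.s32 %r3, %r1, %r2;")
print(opcode, ops)  # ['add', 's32'] ['%r3', '%r1', '%r2']
```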
03:45gnarface: karolherbst: only a couple things even use it, a few games, and steam in-home streaming... there is admittedly not a huge need
03:45karolherbst: for some reason there is llvm to ptx but no ptx to llvm
03:45karolherbst: wondering why llvm devs never demanded that as a req for upstreaming it
03:45karolherbst: well, nvidia wouldn't have added it anyway
03:46karolherbst: for all nvidia cares, PTX is a one way road
03:48karolherbst: airlied: although I think nothing in CUDA would really be a huge problem for other GPU vendors to implement, to be honest. and PTX is high level enough
03:48gnarface: don't coin mining rigs also use cuda?
03:48karolherbst: gnarface: because CL sucks
03:48karolherbst: well with the nvidia driver
03:48karolherbst: compared to cuda
03:48airlied: karolherbst: yeah but then you just give control over your future to nvidia
03:48airlied: which isn't really a path to success
03:48karolherbst: not saying it's a good idea ;)
03:49karolherbst: gnarface: right... which still leaves us with the useful workloads ;)
03:49karolherbst: there isn't really that much stuff you really need CUDA for
03:49karolherbst: CL would do as well if you'd care enough, just that CL is a bit broken in a few areas
03:50gnarface: true, nothing i've seen it used for so far hasn't been obviated by modern hardware
03:51karolherbst: airlied: ohh btw, did you figure out what you had to change in regards to your sycl issue?
03:51karolherbst: or did no llvm pass actually help?
03:51gnarface: as far as i know, the first big game it debut in was Rage, which now runs faster on my (relatively newer but still archaic) hardware without it than with it
03:51karolherbst: gnarface: for physx, right?
03:52gnarface: uh, yea i think that was what it was
03:52karolherbst: mhh, there was this hack to just get a second slower nvidia GPU and offload physx on that
03:52gnarface: hah that would be awesome!
03:52karolherbst: also on linux there was never really hardware support for it from the start
03:52karolherbst: at some point they've added it
03:52karolherbst: but games never updated
03:52karolherbst: so physx on linux was always software
03:53karolherbst: gnarface: well, some people did it on windows with great success
03:53karolherbst: put GPUs like a 640 as the physx one or something
03:53airlied: karolherbst: got sidetracked into packaging tasks, but I couldn't find a nice way to remove the memcpus yet
03:53airlied: memcpys and other C++ crap
03:53karolherbst: yeah... that C++ stuff in sycl looks really annoying
03:53karolherbst: all that templating
03:55karolherbst: especially because every vendor reimplements it
03:55gnarface: the physX thing is really extra annoying now because games programmed with it rarely even look the same without it... there's no option to just accept a performance hit to get the same visual effects on your screen. they literally vendor lock game content with it. i hate that.
03:55karolherbst: airlied: I'd assume there is some API definition for the C++ stuff?
03:55airlied: yeah but I'll probably just push ahead and see what I can make execute next week
03:55airlied: karolherbst: the SYCL spec
03:55karolherbst: mhhh, mhhh
03:55HdkR: PhysX is more of a physics library than anything. So removing that would require reimplementing everything there
03:55airlied: karolherbst: then worry about making the produced code prettier if at all possible
03:55karolherbst: maybe we should just write our own implementation which doesn't do many of the insane things?
03:56karolherbst: although maybe the spec is written in a way where that isn't possible
03:56gnarface: if there were a hack to get physX working in the linux version of borderlands2 without recompiling borderlands2, people would buy it. i wonder if 2k games would sue about that though?
03:56airlied: karolherbst: there is also triSYCL, but really templating is the only way
03:56karolherbst: gnarface: well, physx is a compute API
03:56airlied: the whole idea is to make it look and feel like normal C++ and even compile on a nomal compiler and run on a CPU
03:56karolherbst: but right, we have compute shaders for that now
03:56airlied: then if you want a GPU you just use the special compiler and runtime
03:57airlied: I can't imagine that abstraction working without templates etc
03:57HdkR: The combined host and guest code in a single file is something that people really like
03:57gnarface: karolherbst: does that mean there would be no possible way to write some sort of wrapper or replacement library like dxvk?
03:57HdkR: I can understand the appeal :)
03:57karolherbst: airlied: uff, sad
03:57karolherbst: HdkR: yeah... but in CUDA you call functions
03:57karolherbst: in sycl... you have that weirdo templating stuff
03:58HdkR: Yea, haven't looked at how sycl does it. CUDA just adds <<< >>> around compute invocations
03:59HdkR: Fairly clean :D
03:59airlied: HdkR: https://pastebin.com/raw/6rGPqdP4
03:59airlied: cgh.parallel_for is the gpu code
04:00HdkR: wow....uh, verbose
04:00karolherbst: airlied: do you need a special sycl compiler for that btw?
04:00karolherbst: or just gcc/clang and the magic is hidden inside some libsycl.so or something?
04:01karolherbst: HdkR: that Accessor thing is a template class of course ;) you don't want to see the llvm ir/spir-v generated out of that
04:02karolherbst: at some point airlied started wondering why there is a cast to char* inside the spir-v
04:02karolherbst: uhm, uchar
04:02karolherbst: but essentially
04:03airlied: karolherbst: to execute on cpu no special compiler
04:03airlied: on gpu special compiler
04:04karolherbst: I would be interested in whether that produces sane code in the end on intel hw
04:05karolherbst: I mean that thing shouldn't be more than 3 instructions on the GPU
04:05karolherbst: maybe 4 if you do weirdo int64 stuff
04:08airlied: yeah I'll have to go and build their backend at some point when I've got nothing better to do :-P
10:55RSpliet: karolherbst: ptx is an assembly-like language, lexing should be fairly trivial and it's already SSA. Don't think converting PTX into NIR should be that problematic, just... someone needs to throw hours at it :-P
10:56HdkR: Someone caring is hard :P
10:56RSpliet: But... do CUDA binaries ship PTX in its textual form, or in a binary format?
10:57RSpliet: HdkR: Lots of people care, not many people have the resources to do something about it.
10:59HdkR: Interesting. I figured people would care more about an ecosystem shift rather than compatibility for a vendor specific API
11:05RSpliet: Depends on who you talk to. Ecosystem shift is mainly pursued by OSS enthusiasts and hobbyists alike. Most people I speak to "just want their GPU code working", and are under the impression that CUDA as a developer model is more productive than say OpenCL
11:07RSpliet: My comment was more generic however. People care about lots of problems within and outside software, hence the favourite pastime of pretty much anyone is complaining. Making a difference is harder, and it takes some real effort to move it far enough up the priority list to be able to make time for it.
11:09HdkR: That's why I understand the appeal of a cuda-esque API that isn't vendor specific
11:10HdkR: Time consuming problem to resolve though
11:13RSpliet: API is one side. Tooling is the other, often overlooked, problem. Devs need to understand code behaviour and performance, and graphical IDEs/performance analysis suites play a big role in that.
11:14RSpliet: The Khronos approach should be to have that go hand in hand. Have public APIs in the standard that query performance metrics. That's not easy with an API targeting (SIMD) CPUs, (SIMT) GPUs and (custom pipeline) FPGAs though; the metrics to look at differ.
11:16RSpliet: SYCL appears to be a step in the CUDA direction API-wise
11:17RSpliet: For now I think it'd be most productive for mesa to focus on implementing existing APIs, rather than try to take a trailblazing role.
11:21HdkR: I believe in that as well
11:21HdkR: I don't want to imply otherwise :P
12:39karolherbst: RSpliet: it's not SSA
12:40karolherbst: why do you think it is?
13:18imirkin: skeggsb: do your latest changes enable vulkan-style memory allocation?
13:19imirkin: (on pre-pascal GPUs)
13:21imirkin: hm, looks like not, but it could be added
13:22imirkin: basically add a gem create that takes an explicit address
14:21RSpliet: karolherbst: oh gosh, I could swear I looked at PTX code where register numbers were only counting upwards.... but this PLANG presentation I just found very adamantly states it isn't. My bad! Converting non-SSA registers to SSA is something we do in various code generators anyway though, shouldn't be the big blocking issue.
14:22RSpliet: Or is it like very subtly non-SSA? As in, it assumes infinite single-write registers, but has already resolved phi-nodes or something odd like that?
14:26karolherbst: RSpliet: ptx doesn't even have phi nodes
14:27karolherbst: you even have branches into random labels
14:27karolherbst: but control flow in ptx is a little weird
14:28RSpliet: karolherbst: with "resolved phi nodes" I mean they have been removed and all producers of the value in a phi node have been assigned the same "register". But it looks like they encourage you to do register allocation in PTX and limit the number of "registers". Despite, presumably, the ptx-to-binary pass performing register allocation after a bunch of optimisations...
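[editor's note: the register-to-SSA conversion RSpliet describes can be sketched for the easy case of straight-line code, where no phi nodes are needed. A toy illustration — a real PTX-to-NIR pass would have to handle branches and insert phi nodes at join points:]

```python
# Toy SSA construction for straight-line code. Instructions are
# (dest, [srcs]); each redefinition of a register gets a fresh version
# number, and every read refers to the latest version of its source.
def to_ssa(insns):
    version, out = {}, []
    for dest, srcs in insns:
        new_srcs = [f"{s}.{version.get(s, 0)}" for s in srcs]
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}.{version[dest]}", new_srcs))
    return out

prog = [("r1", ["a", "b"]),   # r1 = a + b
        ("r1", ["r1", "c"]),  # r1 = r1 * c  -- r1 reused, so not SSA
        ("r2", ["r1"])]       # r2 = r1
print(to_ssa(prog))
```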
14:28karolherbst: imirkin: his patches basically make use of pascals fault recovery feature
14:28karolherbst: RSpliet: you don't have to care about register amounts in PTX
14:28karolherbst: PTX isn't assembly
14:29karolherbst: PTX is really a high level language which isn't all that high level in the end
14:29karolherbst: comes with a full compiler though
14:31RSpliet: I guess it depends on your definition of assembly. PTX instructions don't one-on-one map to machine instructions. But its means of control flow are ones you'd only expect in assembly... and maybe very old spawns of Basic :-P
14:31imirkin_: karolherbst: the fault recovery thing is a part of it
14:31imirkin_: but that's not the part i was talking about
14:31imirkin_: i was talking about the "managed" vmm option
14:31RSpliet: Oh it does have curly bracket grouping. I guess that's slightly more fancy than control flow in assembly
14:32karolherbst: imirkin_: mhh, right.. that's more of the HMM stuff where you essentially have a shared SVM on both devices, or do you mean some of the previous commits?
14:32karolherbst: RSpliet: even without that PTX isn't assembly
14:32imirkin_: i mean the features that were needed to implement SVM
14:32karolherbst: imirkin_: right, so HMM
14:32RSpliet: I guess it depends on your definition of assembly.
14:32imirkin_: specifically that guy
14:33imirkin_: which has nothing to do with SVM or HMM
14:33imirkin_: (except that SVM requires something like that in order to exist. but it could be used for other things too.)
14:34pmoreau: imirkin: Going through Nouveau’s RA to get a better grasp of what is going, for reviewing https://github.com/karolherbst/mesa/commit/9de3e3e46c7ba36fdd161fd78dcaed25d2e061dc, so I might ask you some questions. O:-)
14:35karolherbst: imirkin_: well, it looks like it can be used for vulkan indeed
14:36karolherbst: RSpliet: if you need more than an assembler it's not assembly ;)
14:44imirkin_: karolherbst: well, there's no way to make use of it, at least, so definitely not *all* that's needed :)
14:45imirkin_: pmoreau: that's Connor's patch, right?
14:45imirkin_: reading his description so far, seems to be spot on.
14:47karolherbst: yeah.. without that patch we would have issues doing a merge(merge(r0, r1), r2d) thing for example
14:47karolherbst: I think...
14:47pmoreau: Good to know; the description makes sense to me as well.
14:49karolherbst: maybe I could dig up some examples
14:51imirkin_: pmoreau: at least the things in the description appear to be 100% accurate. i haven't thought about whether his approach to solving it makes sense, but the problem statement is correct.
15:08imirkin_: pmoreau: i'd encourage you to write a SHORT piglit which encounters this issue
15:08imirkin_: shouldn't be too difficult
15:08imirkin_: (by which i mean shader_test, of course)
15:09pmoreau: I should do that, good idea
16:33RSpliet: karolherbst: Well... I can't help the urge of getting all academic right now, but: if the assembly targets a binary format for a virtual machine rather than a real machine, is it still assembly? Like... Jasmin is an assembler, even though you also need a virtual machine for execution ;-)
16:34karolherbst: RSpliet: well, it's still just an assembler you use to convert the text input into machine code
16:34karolherbst: or bytecode
16:36karolherbst: also, what's important for an assembly language is that everything maps 1:1 onto some machine instruction, be it for a virtual machine or real hardware
16:36karolherbst: with PTX you don't have that
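karolherbst's 1:1 criterion can be illustrated with a toy assembler: each mnemonic corresponds to exactly one machine instruction, so assembling is little more than a table lookup plus operand handling. The opcode numbers and mnemonics below are made up for illustration, not any real ISA.

```python
# Toy illustration of the 1:1 criterion for an assembly language:
# one source line -> exactly one (opcode, operands) instruction.
# Opcode values are invented for this sketch.

OPCODES = {"mov": 0x1, "add": 0x2, "st": 0x3}

def assemble(lines):
    """Assemble text lines into (opcode, operands) pairs, one per line."""
    program = []
    for line in lines:
        mnemonic, *operands = line.split()
        program.append((OPCODES[mnemonic], tuple(operands)))
    return program

prog = assemble(["mov r0 r1", "add r0 r0 r2", "st r0 r3"])
assert len(prog) == 3          # three lines in, three instructions out
assert prog[0] == (0x1, ("r0", "r1"))
```

PTX fails this test because a single PTX instruction may expand to several hardware instructions (or none), which is karolherbst's point.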
16:36RSpliet: It's a grey area, but the follow-up question is whether you can get by with running PTX through an assembler that targets say the "GPGPUSim" ehh... "virtual machine".
16:36karolherbst: you can't
16:36RSpliet: Why not?
16:37karolherbst: because there is no ISA where PTX instructions map 1:1 to the ISA
16:37RSpliet: GPGPUSim works directly on PTX.
16:40RSpliet: Also, it's a misconception that an assembler performs 1:1 translation. There are definitely cases where the translation is 1:n (like an assembly "ret" instruction mapping to "addi sp,4; j [sp-4]" or something weird like that). Not to mention machines translate machine instructions (macro-ops) 1:n into micro-ops.
16:43RSpliet: Anyway, it's all semantics, it doesn't make a difference to how easy or difficult it is to translate PTX to say NIR :-D
16:49karolherbst: not really, right
16:55pmoreau: Grr, GL_ARB_gpu_shader_int64 requires OpenGL 4.0. Thankfully ARB_gpu_shader_fp64 only requires OpenGL 3.2. :-)
16:59HdkR: Interesting. I wonder why it requires 4.0
17:00HdkR: Oh. fp64 is core in 4.0 I guess is why?
17:02pmoreau: No idea
17:02HdkR: I'm going to assume that until proven otherwise :P
17:03pmoreau: Sadly Nouveau doesn’t expose ARB_gpu_shader_fp64 on Tesla, so I’m screwed anyway. :-/
17:03RSpliet: HdkR: I presume historic reasons? From my naive understanding, I don't think many games really need a lot of integer arithmetic in the first place. If I have to guess, OpenGL 4.0 was announced roughly around the time CUDA started taking off and NVIDIA got pointers longer than 32 bits - so they started needing 64-bit arithmetic stuff. Bright minds within NVIDIA probably went "If the driver/HW supports it, why not expose it as a GL extension?"
17:03pmoreau: HdkR: Could be from one of the other extensions listed as dependencies? https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_gpu_shader_int64.txt (float is not part of those.)
17:04RSpliet: Hah, AMD announced it. I bet the story is pretty similar, but with OpenCL instead of CUDA :-P
17:05HdkR: Sure. float64 is core in 4.0 so you don't need to mention it. They could have said 3.2 + float64 for the conversions added. :P
17:06pmoreau: Anyone running Nouveau that is advertising fp64 support? Would appreciate if you could give https://hastebin.com/ejuzosujin.cs a run and give me back the output with NV50_PROG_DEBUG=1.
17:06RSpliet: pmoreau: that would take compiling mesa with debugging support? :-P
17:07pmoreau: RSpliet: You no longer have such a thing on your computer? :o
17:07RSpliet: pmoreau: haven't rebuilt in years... probably don't want to rely on my results for that.
17:08HdkR: RSpliet: I'm not disputing why it became an extension. Just why the minimum required GL version is 4.0 when it could have been GL 3.x + extensions. I can only assume reasonings for that :P
17:08pmoreau: I wouldn’t mind if you could just run it with your system Mesa.
17:10RSpliet: pmoreau: are you deliberately comparing "selector" against an integer immediate value?
17:10RSpliet: Oh... wait
17:10RSpliet: selector != selection
17:11pmoreau: Yeah, variable names could be better
17:11RSpliet: So could my eyes
17:11pmoreau: I can’t do much on that front though
17:12pmoreau: Besides lending you my glasses, but they might not suit you (and I still need them :-D)
17:17imirkin_: pmoreau: i have some patches for enabling fp64 on G200
17:17imirkin_: they were incomplete though
17:17imirkin_: need fallbacks for a bunch of things
17:19imirkin_: pmoreau: i think in your example, it'll use a pair of SLCT's
17:20imirkin_: or a pair of SELP's
17:21pmoreau: television: Eh, that’s unexpected. :o Does it reliably trigger that warning?
17:22pmoreau: imirkin_: :-/ I can probably swap the if-statement by a for-loop and get around that.
17:22imirkin_: pmoreau: i'm not sure i understand what you're trying to check for
17:23imirkin_: you'd get a pair of phi's
17:23imirkin_: iirc we (almost) never end up with 64-bit phi's
17:23pmoreau: A phi-node around a 64-bit value
17:23imirkin_: the tgsi frontend splits everything up
17:24imirkin_: and then later on, we try to merge things back together, but that's post-ssa
17:24imirkin_: by then the phi nodes are there, and they don't get merged into one
17:24imirkin_: (not truly post-ssa, but post the initial pass to make things ssa)
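The splitting imirkin_ describes can be sketched in Python: a 64-bit value is handled as lo/hi 32-bit halves, so a single 64-bit select becomes a pair of 32-bit selects (the SLCT/SELP pair mentioned above), and in SSA form each half would get its own phi. Function names here are illustrative, not codegen internals.

```python
# Sketch: a 64-bit select lowered to two 32-bit selects, mirroring how
# the frontend splits 64-bit values into lo/hi halves.

def split64(v):
    """Split a 64-bit value into (lo32, hi32)."""
    return v & 0xFFFFFFFF, (v >> 32) & 0xFFFFFFFF

def merge64(lo, hi):
    """Reassemble a 64-bit value from its 32-bit halves."""
    return (hi << 32) | lo

def select64(cond, a, b):
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    lo = a_lo if cond else b_lo   # first 32-bit SLCT/SELP (or phi)
    hi = a_hi if cond else b_hi   # second 32-bit SLCT/SELP (or phi)
    return merge64(lo, hi)

assert select64(True, 0x123456789ABCDEF0, 0) == 0x123456789ABCDEF0
assert select64(False, 0x123456789ABCDEF0, 7) == 7
```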
17:25imirkin_: you can write the tgsi and feed it into nouveau_compiler btw
17:25imirkin_: to see what the compiler will do with it
17:25pmoreau: I’ll need to learn TGSI first
17:42imirkin_: well it's easy to generate
17:42imirkin_: just write the glsl you want
17:42imirkin_: then do ST_DEBUG=tgsi MESA_EXTENSION_OVERRIDE=whatever-you-need
17:42imirkin_: that should crash the driver
17:42imirkin_: but not before it gives you the tgsi it would have sent down
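Putting imirkin_'s recipe together as a command sketch. The shader filename is a placeholder, any GL program that compiles your shader will do in place of glxgears, and the nouveau_compiler chipset flag syntax is from memory, so check its usage output.

```shell
# Dump the TGSI the gallium state tracker would send to the driver;
# the driver may crash afterwards, but the TGSI is printed first.
ST_DEBUG=tgsi MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader_fp64 glxgears 2> dump.log

# Save the TGSI section to shader.tgsi, then see what codegen does
# with it (flag syntax from memory; -a selects the chipset):
nouveau_compiler -a 50 shader.tgsi
```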
18:38television: pmoreau: no, first time I've seen it
18:39television: also before that warning, last night, i had strange issues with display output
18:39imirkin_: television: i've seen that before
18:39imirkin_: reported to skeggsb
18:39imirkin_: he says he's never been able to reproduce it
18:39imirkin_: i can't reproduce it at will either
18:39imirkin_: and it's unclear why it happens
18:39imirkin_: so ... that's the current situation
18:39television: imirkin_: this is after sleep/wake btw
18:40television: a separate issue, before that warning, and before that sleep/wake cycle, i plugged my laptop to a VGA, LCD monitor
18:40television: and the monitor displayed scrambled pixels
18:40television: looked like multicolored random noise
18:41imirkin_: probably got fed an uninitialized buffer
18:41television: i have pics of it if you'd like to see
18:41television: but the stranger thing is
18:41imirkin_: aka "application issue"
18:41television: when i move my cursor away from that monitor
18:41television: it goes black
18:41imirkin_: it = what?
18:42television: monitor that had random colored pixels.
18:42television: when i moved the cursor back to that same monitor
18:42television: i could see the cursor fine
18:42* imirkin_ blames the application
18:42television: but behind it was still noise
18:42television: i messed with resolution settings
18:43television: i set it to 640x480
18:43television: and back to 1080
18:43television: guess what
18:43television: now my INTERNAL monitor has the noise
18:43television: and the external works fine
18:44imirkin_: yeah. whatever application is supplying framebuffers is messing up
18:44television: i tried read-edid and it said no EDID things found on i2c
18:44imirkin_: not a lot nouveau kms can do about being passed in bad framebuffers
18:44television: so it tried VBE and resulted in 128 bytes of FF FF
18:46television: i unplugged/replugged the vga cable several times in rapid succession
18:46television: both displays blinked
18:46television: then they both worked fine
18:47television: also remember how i had that strange 5 flashes to black between logging in and those flashes before video started and stopped? imirkin_
18:47television: today i have two flashes to black before every login
18:48television: before messing with the external monitor it was five
18:48television: now its two
18:50television: [65603.418762] nouveau 0000:01:00.0: devinit: 0x00005dfa: script needs OR link
18:50television: what does this mean?
18:50imirkin_: it's trying to execute a vbios script
18:50imirkin_: which uses an op that's meant to be executed in a different context
18:50imirkin_: or ... something
19:46pmoreau: Eh eh eh, just had to go via the CL part of piglit, and there we go, 64-bit support on Tesla. (And it goes via the NIR frontend which doesn’t split things.)
19:55Lyude: karolherbst: poke, any idea where we actually do the _PR3 ACPI call for suspend/resume?
19:55karolherbst: Lyude: in the acpi_pci code somewhere
19:55Lyude: trying to answer some of the questions here: https://lkml.org/lkml/2019/2/14/1270
19:55karolherbst: I never find it myself either
19:55karolherbst: but it's either in the pci or the acpi subsystem
19:55Lyude: karolherbst: perfect, should make this review a bit easier then as well :)
19:56karolherbst: and inside a file with a name containing acpi and pci :)
19:56karolherbst: I think
19:58imirkin_: Lyude: _PR3 is handled by the pci core afaik
19:58imirkin_: _DSM is handled by the drivers
19:59imirkin_: er, s/handled/called/g
19:59imirkin_: the acpi code then twiddles magic bits which magically do magic
21:02pmoreau: karolherbst: Regarding Tesla, one thing I had forgotten is that “by default”, `st u32 # g[$r0+0x4] $r1` gets actually emitted as `st u32 # g[$r0] $r1` (the offset is dropped).
21:02karolherbst: that sounds annoying
21:03karolherbst: I guess it's just missing in the emitter?
21:03pmoreau: I had hacked in an explicit `add u32 $r0 $r0 0x4` automatically on my old branch, but there might be a better way of doing it. I haven’t checked the TGSI path.
21:03karolherbst: well, the TGSI can't do it
21:03karolherbst: no compute shaders
21:03karolherbst: and most likely no ssbo support either
21:04karolherbst: I assume it's either missing from the emitter, some short encoding funky stuff going on or we have to lower offsets :)
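pmoreau's workaround (the explicit `add u32 $r0 $r0 0x4`) amounts to a lowering pass: since the store encoding apparently can't carry the immediate offset, fold the offset into the address with an add, then emit an offset-free store. A minimal sketch, with instructions modeled as tuples and the temp-register naming purely hypothetical:

```python
# Sketch of lowering  st u32 g[$r0+0x4] $r1  into
#   add $t0 $r0 0x4
#   st  g[$t0] $r1
# Instructions are ("st", addr_reg, offset, src_reg) tuples here.

def lower_store_offsets(instrs):
    out = []
    tmp = 0
    for op in instrs:
        if op[0] == "st" and op[2] != 0:
            _, addr, offset, src = op
            tmp_reg = f"$t{tmp}"                       # hypothetical temp reg
            tmp += 1
            out.append(("add", tmp_reg, addr, offset))  # fold offset into address
            out.append(("st", tmp_reg, 0, src))         # store now offset-free
        else:
            out.append(op)
    return out

lowered = lower_store_offsets([("st", "$r0", 0x4, "$r1")])
assert lowered == [("add", "$t0", "$r0", 0x4), ("st", "$t0", 0, "$r1")]
```

A real pass would of course reuse a free register (or the RA) instead of inventing temps, which is presumably the "better way of doing it" pmoreau mentions.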
21:04imirkin_: iirc the encoding just can't support it
21:06karolherbst: ohh right, because we already use 5 bits for the g file index as well :/
21:06karolherbst: and that addressing stuff is also quite funky
21:06imirkin_: but it's been a looong time since i looked
21:06karolherbst: tesla has those weirdo address registers?
21:07pmoreau: The a things? I always ignored those.
21:08karolherbst: instead of %r0
21:08imirkin_: the address registers are for constbuf addressing
21:08imirkin_: not for g addressing
21:08karolherbst: ohh, I see
21:09imirkin_: (and for input-space addressing in GS ... iirc that's all kinds of weird)
21:09karolherbst: anything special about those? like only 16-bit sized?
21:09imirkin_: with a sticky bit :)
21:09imirkin_: i.e. if you store a value > 64K, the sticky bit gets set, and then doing any kind of math on them will end up with the stuck value
21:10imirkin_: or ... something
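imirkin_ is explicitly unsure of the details here, but the behavior he describes (a 16-bit register whose overflow bit is sticky, so later arithmetic keeps producing the stuck value) can be modeled as a toy. This is purely illustrative of the described semantics, not hardware documentation; the stuck value being 0xFFFF is an assumption.

```python
# Toy model of the described sticky-bit behavior of Tesla's address
# registers: writing a value > 64K sets a sticky overflow bit, and
# subsequent math stays stuck. The exact stuck value is an assumption.

class AddrReg:
    def __init__(self):
        self.value = 0
        self.sticky = False

    def store(self, v):
        if v > 0xFFFF:
            self.sticky = True       # overflow sets the sticky bit
            self.value = 0xFFFF      # assumed stuck value
        else:
            self.value = v

    def add(self, v):
        self.store(self.value + v)   # arithmetic goes through the same check
        return self.value

a = AddrReg()
a.store(0x10000)                     # > 64K -> sticky bit set
assert a.sticky and a.value == 0xFFFF
assert a.add(1) == 0xFFFF            # math now yields the stuck value
```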