02:38imirkin: pmoreau: you have logic to lower all 64-bit integer ops to 32-bit based things right?
02:38imirkin: pmoreau: if so, you might want to send that series out
02:38imirkin: and/or make it available for testing alongside with ARB_gpu_shader_int64
09:16pmoreau: imirkin: Well, not all 64-bit integer ops, only the MUL/MAD ones, of which I have a new version of the patch. And some stuff for CVT, but I haven’t gone back to that one to really test all the different paths.
09:16pmoreau: imirkin: And a patch for folding U64/S64 constants.
09:17pmoreau: I’ll go through them again this week-end and (re-)submit them.
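For readers unfamiliar with the idea, lowering a 64-bit integer multiply to 32-bit operations can be sketched as below. This is only an illustration of the arithmetic, not pmoreau's patch; the function name `mul64_via_32` and the `MASK32` constant are made up for this example.

```python
MASK32 = 0xFFFFFFFF

def mul64_via_32(a, b):
    """Multiply two 64-bit values using only 32-bit halves (result mod 2**64)."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Full 32x32 -> 64 product of the low halves; hardware would typically
    # produce this as a "mul low" plus a "mul high" instruction.
    lo_full = a_lo * b_lo
    lo = lo_full & MASK32
    carry = lo_full >> 32

    # Only the low 32 bits of the cross terms can land in the high word;
    # a_hi * b_hi is shifted out entirely and is never computed.
    hi = (carry + a_lo * b_hi + a_hi * b_lo) & MASK32
    return (hi << 32) | lo
```

The high-by-high product never contributes to a 64-bit result, which is why the lowering needs only three 32-bit multiplies plus adds.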
09:17pmoreau: imirkin: Have fun with the GM107! :-)
09:23GaivsIvlivs: pmoreau what's going on with the GM107?
09:23pmoreau: GaivsIvlivs: "imirkin | [01:43:41] alright. this weekend is the weekend of making xf86-video-nouveau work on GM107"
09:24GaivsIvlivs: I had it working on my system, until something with this latest round of updates made GNOME quit and need a login again
09:25pmoreau: But using the modesetting DDX instead of the nouveau one, right?
09:26GaivsIvlivs: nope just nouveau
09:30pmoreau: You should be getting an "Unknown chipset NV117", because it only supports up to NV10X.
09:31pmoreau: Oh wait, maybe you have a version prior to its removal
09:31GaivsIvlivs: It reads as GM107
09:32pmoreau: Yes, but it is also known as NV117, as that is the ID read from "register" 0 on the hardware: https://nouveau.freedesktop.org/wiki/CodeNames/#NV110
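As a rough illustration of how a chipset ID such as 0x117 can be pulled out of the value read at register 0: the sketch below assumes the mask and shift used for NV10 and later chips, and the sample register value is hypothetical, not taken from real hardware or from this log.

```python
def chipset_from_boot0(boot0):
    """Extract the chipset ID from a boot register value.

    The mask and shift are an assumption for NV10+ style layouts.
    """
    return (boot0 & 0x1FF00000) >> 20

def codename(chipset):
    """Format the ID the way the 'NV117' style names are written."""
    return "NV%X" % chipset

# Hypothetical register value whose chipset field is 0x117.
example_boot0 = 0x117000A2
name = codename(chipset_from_boot0(example_boot0))
```

With the sample value, `name` comes out as "NV117", the identifier the DDX complains about.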
09:33pmoreau: Anyway, support for it was removed in https://cgit.freedesktop.org/nouveau/xf86-video-nouveau/commit/src?id=3e2e0faa2ee1cce9c1bb5c7ad80d0592460f3edc
09:33pmoreau: As it wasn’t great, and it looks like Ilia is going to try to fix that.
09:34GaivsIvlivs: Excellent, I eagerly await further support for my card
11:11karolherbst: I am updating my branches to 4.8 now
11:12karolherbst: or at least only that reclocking branch
11:29karolherbst: I think CodeEmitterGK110::emitFMAD misses a NEG in the short imm case
11:30karolherbst: or let me think
11:34karolherbst: no clue, don't want to touch that thing
12:25karolherbst: it seems like nv50 is a little broken
12:26karolherbst: shaders/warzone2100/1.shader_test crashes
12:26karolherbst: ran as 0xac
12:27karolherbst: imirkin: do you mind checking if this is a regression? it would take a week on mine :/
12:29karolherbst: quite a lot of shaders fail
12:30hakzsam: karolherbst: I would prefer to wait for the features freeze before pushing new things (should be today I guess)
12:30hakzsam: but I will try to have a look at your stuff over the weekend :)
12:30karolherbst: I don't have commit access anyway :p
12:30karolherbst: and second
12:30karolherbst: it isn't a regression of my stuff
12:30karolherbst: master fails a lot
12:31hakzsam: I didn't say that your stuff will introduce regression
12:31karolherbst: ahh, k
12:31karolherbst: I just wanted to run my patches through the tesla stuff and noticed this
12:31hakzsam: nv50 might have been broken since mesa 12
12:31karolherbst: yeah, might be
12:31hakzsam: I will run a bunch of piglit after the features freeze anyways :)
12:33karolherbst: maybe ac859d68f474694f9cb1de007997c936d735a48c is fine
12:33karolherbst: that would be good
12:33karolherbst: but I highly doubt it
12:33hakzsam: doubtful yes
12:34karolherbst: it is messy to apply a patch while bisecting :/
12:34karolherbst: but I could test at least releases
12:36karolherbst: uhhh, that patch only touches nvc0
12:37karolherbst: where is the nv50 version of that?
12:37hakzsam: I didn't implement it for nv50
12:37hakzsam: I actually only care about nvc0 :)
12:37karolherbst: then I wonder why I could use it for 0xac
12:38hakzsam: well, time to look at F1 2015 again
12:38karolherbst: soo maybe it crashes cause that thing isn't checked
12:38hakzsam: that game is a real pain
12:38karolherbst: seems like it
12:38karolherbst: a lot of fancy features are just plain painful?
12:39hakzsam: yeah, and the trace is a monster trace, 1.5M GL calls
12:39hakzsam: 175K lines of glsl
12:39hakzsam: fun :)
12:39karolherbst: well, happens
12:39karolherbst: I saw worse
12:39karolherbst: and I am not even joking :D
12:39hakzsam: you can always see worse, but it's crazy, trust me :)
12:40karolherbst: I guess the fancy feature part messes it up
12:43karolherbst: over 1M calls per frame is crazy anyway...
12:45karolherbst: no idea how bad the port of civ 5 is, but a 2010 game doing over 500k gl calls is also pretty intense...
12:46karolherbst: but the os x version was out on release date, so I guess they didn't change much for linux
12:46karolherbst: or we just hit all the bad paths
12:46tobijk: karolherbst: no use anyway, its unplayable in late games :D
12:46karolherbst: tobijk: exactly the issue
12:46karolherbst: it burns your cpu like nothing and the gpu is bored
12:46tobijk: <- 3m round change times, and that is not graphics related
12:47karolherbst: tobijk: every little building in the cities is rendered on its own ;)
12:48karolherbst: would be nice to find way to reduce the CPU overhead
12:48karolherbst: I have a trace of that game with many calls
12:51tobijk: not sure how you'd make it consume less cpu, i guess our paths (besides compiling shaders) are not that bad
12:51karolherbst: you can always improve things
12:52tobijk: right, but i doubt that's worth it for now, there are other "low hanging fruits"
12:53karolherbst: yeah, but also for reducing cpu overhead
13:32karolherbst: it seems like chances are high we get the cts stuff for free
13:33karolherbst: although we shouldn't get our hopes up until we really get it
13:33karolherbst: https://www.khronos.org/members/ip-framework might be interesting
13:44tobijk: karolherbst: would be nice though :)
15:44iterati: hi. I see that the kepler reclock v5 branch from https://github.com/karolherbst/nouveau was removed. Which branch supports kepler reclocking on 4.8 ?
16:11pmoreau: iterati: There is a v6 branch, but I don't think it has been rebased on 4.8 yet.
16:23iterati: pmoreau: thanks. Anything else with working reclocking for gtx 650 on 4.8 ?
16:28karolherbst: pmoreau: I actually did it today
16:30karolherbst: this sounds... interesting
16:34pmoreau: iterati: See Karol's message above
16:45karolherbst: hakzsam, imirkin: if no new issue comes up, I guess my cse patch can be pushed after the branching? I don't really feel like resending the same patch again ;)
16:54imirkin: karolherbst: what was the issue on nv50? want to check it out before i unplug it
16:54karolherbst: I am not quite sure, it could be that I messed up. shader-db/shaders/warzone2100/1.shader_test crashed here
16:54karolherbst: but then hakzsam told me I can't fake the chipset for nv50
16:54karolherbst: so maybe it is just me
16:54imirkin: yeah, you can't
16:55imirkin: i mean you can - it just won't work properly
16:55karolherbst: I noticed
16:57imirkin: seems fine - http://hastebin.com/raw/nufetugevo
16:58karolherbst: okay, then it was just me
16:58karolherbst: sadly my tesla machine is crappy slow :/
17:08karolherbst: mad(a, a, imm0) => mul(a, 2*imm0) ?
17:08karolherbst: or did I miss anything?
17:08karolherbst: ohh wait, in my shader I did
17:08imirkin: a*a + imm0
17:08imirkin: not quite the same
17:08karolherbst: ohh right
17:08karolherbst: wrong order
17:09karolherbst: silly me
17:09imirkin: math is hard :p
17:09karolherbst: it sure is
17:09karolherbst: especially if you have like 3600 equations :D
17:10karolherbst: mhh but still, I had something like that: mad(a, a, mul(b, imm0))
17:11karolherbst: mhh, looks like nothing indeed
17:12imirkin: the only interesting thing with fmul is that it can have a post-multiplier (or divider)
17:12karolherbst: so a*b / 2?
17:12imirkin: or * 2
17:12imirkin: up to 8 i think
17:13imirkin: i.e. 2, 4, 8
17:13imirkin: we try to pick up those optimizations
17:13imirkin: but they're not always possible without algebraic manipulation, so we'd miss them
17:13imirkin: we also miss out on cases like a * 2 * b * 2
17:14imirkin: whereby it really ought to become a * b * 4
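The `a * 2 * b * 2` case can be sketched as a small folding step: collect the immediate factors of a multiply chain and, if their product is a power of two the post-multiplier can express, keep only the real operands. The set of allowed factors below follows the "2, 4, 8" (and matching dividers) recalled in the conversation and is an assumption, as is the helper name `fold_post_factor`; this is not the nv50_ir implementation.

```python
# Post-multiplier/divider values assumed from the discussion (up to 8).
ALLOWED_POST_FACTORS = {0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0}

def fold_post_factor(factors):
    """Split a multiply chain into (operands, post_factor).

    `factors` mixes register names (str) and immediates (float).
    Returns None when the combined immediate cannot be expressed
    as a post-multiplier.
    """
    operands = [f for f in factors if isinstance(f, str)]
    constant = 1.0
    for f in factors:
        if not isinstance(f, str):
            constant *= f  # exact for powers of two
    if constant in ALLOWED_POST_FACTORS:
        return operands, constant
    return None

folded = fold_post_factor(["a", 2.0, "b", 2.0])
```

Here `folded` is `(["a", "b"], 4.0)`, i.e. a single `a * b` with a post-multiplier of 4, while a chain such as `a * 3.0` is left alone.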
17:14karolherbst: mhh true
17:14karolherbst: but somehow I never saw this
17:15imirkin: and we never do any tree rebalancing
17:15karolherbst: mhh, maybe now I have something
17:15imirkin: i.e. the difference between (a + b) + (c + d) vs (a + (b + (c + d)))
17:15imirkin: the former case can be dual-dispatched, for example
17:15imirkin: but it uses more registers. nothing is perfect in this world ;)
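The difference between the two shapes can be made concrete by building both trees and comparing their dependency depth. This is a toy model using nested tuples, with helper names invented for the example; it ignores register pressure and the fact that floating-point addition is not strictly associative.

```python
def chain(operands):
    """Build (a + (b + (c + d))): every add depends on the previous one."""
    expr = operands[-1]
    for op in reversed(operands[:-1]):
        expr = ("add", op, expr)
    return expr

def balanced(operands):
    """Build ((a + b) + (c + d)): the two inner adds are independent."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return ("add", balanced(operands[:mid]), balanced(operands[mid:]))

def depth(expr):
    """Length of the longest dependency chain of adds."""
    if isinstance(expr, str):
        return 0
    return 1 + max(depth(expr[1]), depth(expr[2]))

ops = ["a", "b", "c", "d"]
chain_depth = depth(chain(ops))
balanced_depth = depth(balanced(ops))
```

For four operands the chain has depth 3 and the balanced tree depth 2; the shorter critical path is what makes dual dispatch possible, at the cost of keeping more intermediate values live.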
17:16karolherbst: mad(mul(a, 0.5), b, c) => add(mul_2(a, b) , c)
17:17karolherbst: the question is now: what would be the benefit of this...
17:18karolherbst: or would an add + mul div 2 be cheaper than a mad+mul?
17:18karolherbst: well assuming the post divider is pretty much for free
17:18imirkin: i think that's a reasonable assumption
17:21karolherbst: but this could lead to more instructions whenever the previous mul can't be eliminated
17:21hakzsam: karolherbst: yes, I will take care of that after the branching
17:21imirkin: karolherbst: life's a bitch
17:22karolherbst: hakzsam: awesome :) thanks
17:25karolherbst: imirkin: any other fancy things we don't use yet?
17:26imirkin: zcull ;)
17:27karolherbst: I know :D
17:27imirkin: i know you know
17:28karolherbst: but I meant more like from the ISA
17:29karolherbst: I still have that one LiveOnlyTex thing, but it is broken and I don't see where, have to start from scratch there
17:36karolherbst: pretty special case though
17:39karolherbst: imirkin: any idea how to do that in a non-ugly way?
17:40imirkin: so that's the algebraic manipulation stuff i mentioned we were missing
17:40imirkin: you can look at what intel does... i think their thing is also relatively primitive
17:40imirkin: this stuff can be tricky.
17:41karolherbst: mhh right
17:44karolherbst: that actually happens quite often in this shader here...
17:45imirkin: the problem is generally known as "rebalancing" i believe
17:45imirkin: although it's obviously more subtle than plain tree rebalancing
17:45karolherbst: yeah, I can figure why
17:58karolherbst: imirkin: I have an idea for an experiment: a pass which marks sources with abs or neg+abs if the sign is already known, and see what benefits we could get from this
17:59imirkin: not sure i understand
17:59imirkin: (or rather, i'm sure i don't understand...)
17:59karolherbst: k, like this
17:59imirkin: oh, like a value range propagation pass?
17:59karolherbst: if you have max %r1 %r2 1.0, you know that %r1 is positive
17:59imirkin: value range propagation (aka vrp) is quite useful
18:00karolherbst: then you could turn a max %r3 %r1 %r2 into max %r3 abs %r1 %r2
18:00karolherbst: just as an experiment though
18:00imirkin: not just for abs stuff
18:00imirkin: for cmp things as well
18:00karolherbst: I already worked on something like that, but just in a stupid way
18:00karolherbst: it was just an example
18:01karolherbst: it would be really useful if we could just do if (src1 >= src2) in our passes, but I have no good idea how to do it right...
18:02karolherbst: especially because src1 >= src2 and src2 >= src1 could both be false, because it simply doesn't know yet
18:03imirkin: i've seen this a lot
18:03karolherbst: imirkin: does vrp work in a way that you mark every instruction result with a min/max value (or maybe even a list of value ranges) and do checks against those?
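One way to picture the min/max marking described here is a tiny interval type with a three-valued comparison, so that "not known yet" is distinct from "false". This is a minimal sketch under that assumption, not how any existing Mesa pass works; `Range`, `known_ge` and `abs_is_noop` are hypothetical names, and NaN handling is ignored.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    """Closed interval of values a result may take; defaults to 'unknown'."""
    lo: float = -math.inf
    hi: float = math.inf

def range_max(a, b):
    """Range of max(x, y) for x in a, y in b."""
    return Range(max(a.lo, b.lo), max(a.hi, b.hi))

def known_ge(a, b):
    """True if a >= b always holds, False if it never holds, None if unknown."""
    if a.lo >= b.hi:
        return True
    if a.hi < b.lo:
        return False
    return None

def abs_is_noop(r):
    """abs can be added to (or dropped from) a source known to be non-negative."""
    return r.lo >= 0.0

unknown = Range()
r1 = range_max(unknown, Range(1.0, 1.0))  # models: max %r1 %r2 1.0
```

After the `max` against 1.0 the range of `r1` is [1, +inf), so `abs_is_noop(r1)` holds, while `known_ge(unknown, r1)` returns `None` rather than `False`, which addresses the worry that both `src1 >= src2` and `src2 >= src1` would otherwise look false.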
18:04imirkin: so the sequence i've seen
18:04imirkin: is like
18:04imirkin: set $r0 $r1 < $r2
18:04imirkin: or whatever
18:04imirkin: er hm, we should handle this already
18:04imirkin: but basically under some circumstances
18:04imirkin: it'll do the compare again
18:04imirkin: against 0
18:04imirkin: which is really the value itself all over again
18:05karolherbst: yeah I think I saw situations like this too
18:05karolherbst: especially sequences like those: https://gist.github.com/karolherbst/b1cb4522c8697cff6ef8ab63308ed88d :/
18:05karolherbst: so many immediates
18:06karolherbst: 90% of the pixmark shader looks like this
18:06karolherbst: and then a ton of cos/sins ...
18:07karolherbst: I doubt it has anything significant besides mul,add,sin,cos,sqrt,max,min
18:08karolherbst: a few loops, but meh
18:08karolherbst: mhh, I think I would want to implement something like this, but it may get messy really fast
20:16docmax: is it possible to switch from nvidia to nouveau without reboot?
20:17karolherbst: docmax: depends
20:17docmax: on what?
20:18karolherbst: it is possible without many issues if you can turn off the nvidia card
20:18karolherbst: otherwise you depend on a bit of luck
20:18docmax: i do a rmmod nvidia which works
20:18karolherbst: sure, but the gpu stays on
20:18karolherbst: is it a normal desktop gpu?
20:18docmax: nvidia gtx 970
20:18imirkin: depending on your gpu, nouveau may not be able to properly reinitialize the display unit after nvidia has had a go at it
20:19karolherbst: imirkin: I assume that a force POST may mess things up too much, too?
20:19docmax: when doing modprobe nouveau the screen goes black
20:19karolherbst: if you can't turn off your gpu, you have to be lucky
20:19docmax: how do i know gpu is off?
20:20karolherbst: you can't on a desktop
20:20karolherbst: well, usually you can't
20:20karolherbst: don't think there is any desktop gpu which would support it
20:20karolherbst: you can unplug the gpu and plug it in again
20:20karolherbst: PCIe _should_ support it
20:20imirkin: man, i just update X + freetype ... it's gonna take time to get used to the new fonts
20:20tobijk: lol :D
20:20docmax: if modprobe nouveau goes black doesnt it mean gpu was off?
20:20karolherbst: docmax: I mean off in like no power
20:21imirkin: docmax: screen != gpu
20:21imirkin: last i checked, which admittedly was a while ago, nvidia did something which caused nouveau to no longer be able to update the displayed image.
20:22imirkin: perhaps the failure mode is different now. either way, it's not a "regularly" supported action
20:22karolherbst: although most MBs don't implement the pcie hotplugging thing right, so you may even damage your gpu
20:22imirkin: it can work in some, esp laptop, setups
20:22karolherbst: well, usually it should work with all laptop setups where the gpu is turned off
20:22tobijk: mh spirv is in need of proper var names: var->var->data.mode = nir_mode;
20:23imirkin: what's wrong with all vars being named var? :)
20:23tobijk: if you know all the code by heart, nothing ;-)
20:24karolherbst: although I don't believe that a variable contains a variable anyway
20:24karolherbst: it could though
20:24karolherbst: ohh no, it can't
20:24karolherbst: silly me
20:25karolherbst: nir_variable *var;
20:25karolherbst: there you go
20:26imirkin: unfortunately you kinda have to know the history of the code to really make sense of it
20:26karolherbst: but I also don't get why the spirv thing should have _any_ references to nir...
20:26karolherbst: even if it might make sense for some
20:26karolherbst: technically it is wrong
21:13imirkin: oh, well the anv code to process spirv is 100% tied to nir
21:13mooch3: karolherbst, yeaaaaah, on my machine, if you unplug a gpu and plug it back in, the thing turns off
21:13imirkin: its function is to convert from spirv to nir, so ... that kinda makes sense :)
21:15mooch3: i know this because my radeon has to sit unscrewed in the socket due to the fact that it's not tall enough to meet the screw bracket
21:18karolherbst: mooch3: no shit...
21:19mooch: no i mean
21:19mooch: the whole MACHINE
21:19karolherbst: ahhh .D
21:19mooch: the entire machine turns off
21:19karolherbst: I see
21:19karolherbst: yeah, silly mbs
21:19mwk: that sounds brutal
21:19karolherbst: you usually need a server mb for that
21:19karolherbst: mwk: lazy bios devs
21:19karolherbst: 95% of all uefis are garbage
21:20mooch: mwk: you said you had nv1 emulation code right? if so, can i see it? i'm thinking about implementing nv1 into the 86box emulator
21:20karolherbst: and then the pcie controllers are shit too
21:20mwk: mooch: hwtest/nv01_pgraph.cc
21:20mwk: that's pretty much all I have
21:20mooch: oh that
21:21karolherbst: mooch: sorry for that then, I just thought you were being sarcastic or something like that :p
21:21mooch: i'd definitely need more than that
21:21mwk: I also know a few things about PFB/PFIFO/PDAC/PRM/PDMA
21:21mwk: but nothing that resembles an emulator
21:21mooch: tbh we're going to need a lot of RE work done on NV1 to emulate it
21:21mwk: tbh I think we're closer than for nv3
21:22mwk: the display engine is dead simple to emulate, for one...
21:22mooch: eh, at least nv3 is svga
21:22mooch: so you can emulate the vesa modes pretty easy
21:22mwk: svga == not dead simple :)
21:23mooch: yeah, but there's already code and docs for it
21:23mooch: i have the vesa modes mostly working on nv3 and nv4
21:36mjg59: mooch: I admire your dedication
21:41mooch: mjg59, dedication? pffft. this is just me being stupidly stubborn
21:42mwk: it's not that different...
21:43mwk: should I try to RE nv1's cliprects, or sleep...
21:44imirkin: mwk: do you have any idea what the deal is with VP_A vs VP_B on nvc0 (and maybe nv50 had that bs too?)
21:44mwk: uh, no
21:44mwk: NFI about that thing
21:44mwk: and Tesla had nothing like it
21:48mjg59: mwk: Was it you trying to get text mode switching working on NV1?
21:49mjg59: I think I'd tried to blank that out
21:50mjg59: How far had you got?
21:50mwk: I succeeded, sort of
21:50mjg59: The idea of a working kms driver for nv1 is still just utterly hilarious
21:50mwk: I think my specimen has a defective DAC that screws up the palette in 4bpp mode
21:51mjg59: That would explain the weirdness you were seeing
21:51mjg59: You weren't able to test it under DOS?
21:51mwk: my specimen also has a defective BIOS that hangs the machine when POSTed
21:51mwk: I think it belonged to some other hacker before, who flashed it and fucked it up
21:52mwk: I mean, the code is just... broken
21:52mwk: in the place where the entry point should be, there's a near return instruction
21:52mwk: in a context that is far-called
21:53mwk: and the following bytes look like someone just patched a jump to a ret
21:53mwk: gods only know what else happened to that card
21:54mwk: so... I managed to POST the card by putting it as secondary and writing a libpciaccess program to POST it
21:54mwk: based on a part of the BIOS that looked like the init script
21:55mwk: which ALSO has been patched and cannot be parsed by the BIOS' own init script parser
21:55mjg59: The only NV1 on Ebay is 500 Euros
21:55mwk: it's a collector's item
21:55mjg59: And it's even marked as not working
21:55mwk: that's better than I expected, last time I checked there were no NV1s on ebay, period
21:56mwk: got a link?
21:56mwk: I managed to use all parts of the card already, I think
21:57mwk: the only problems I ran into are the fucked BIOS and broken 4bpp
21:57mwk: well, I haven't tried MIDI synth yet, or audio capture
21:57mjg59: Sounds like you should just merge it
21:57mjg59: Support for the Saturn joypad port would be wonderful
21:57mwk: that's quite easy, actually
21:58mwk: I don't have an actual saturn pad, but the whole thing is just bitbanging through some DAC registers, and I verified that stuff with a multimeter
21:58mooch: mjg59, i think my feats of emulation are even more impressive given the fact that i don't even have a riva card
21:58mooch: wow mwk
21:59mwk: mjg59: merge what?
21:59mooch: so i could just look at your findings, and a saturn emulator, and get the correct results???
21:59mwk: I didn't write a driver...
21:59mwk: mooch: ... for saturn gamepad support on NV1, yes
22:00mwk: but that's not particularly interesting...
22:01mjg59: mwk: Boo
22:01mwk: also, my NV1 is an early revision
22:01mjg59: mwk: All I want is to be able to get an arbitrary resolution kmscon on nv1. And an nv1. And a machine with PCI slots.
22:01mwk: from what I've seen in the windows driver, newer DACs have some proper saturn support
22:02mwk: as in, they actually support talking some weirdo protocols
22:02mwk: but mine has only the bitbang thing
22:02mwk: mjg59: by "arbitrary resolution" you mean "as big as you can fit in 2MB", right? :)
22:03mwk: sorry, this nv1 is taken, till death do us part
22:03imirkin: mjg59: don't most desktop mobos still have a PCI slot?
22:05imirkin: hakzsam: you've been looking at blob-generated maxwell code... have you noticed if they do 128-bit const loads or not?
22:06mwk: mjg59: also, there's lots of other fun to be had with an NV1
22:06mwk: an ALSA driver, for one :)
22:06mwk: that'd be one fun driver
22:06mwk: ALSA + KMS + input device
22:08mwk: too bad there's no chance of supporting DRI..
22:09mjg59: mwk: That's quite a lot in 2bpp
22:09mjg59: imirkin: Ha I uh the only desktop I own is a Mac Pro
22:10mwk: mjg59: it would, if the thing supported 2bpp
22:10mjg59: Which has zero slots
22:10mjg59: mwk: Is it all one PCI device?
22:10mwk: you can have 4bpp, 8bpp, 16bpp, or 32bpp
22:10mwk: yes, single function
22:10mjg59: Well, even 4bpp is pretty high res
22:10mwk: and you can't have hw accel with 4bpp
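The "as big as you can fit in 2MB" remark is simple arithmetic over the pixel depths listed above. The sketch below assumes the whole 2 MiB is available for a single tightly packed framebuffer, which the log suggests is optimistic (audio structures also live in VRAM); whether the card can actually scan out a given mode is not something this checks. The helper names are invented for the example.

```python
VRAM_BYTES = 2 * 1024 * 1024      # 2 MiB, per the conversation
SUPPORTED_BPP = (4, 8, 16, 32)    # depths listed by mwk; no 2bpp

def framebuffer_bytes(width, height, bpp):
    """Bytes needed for one tightly packed buffer (no pitch alignment)."""
    return (width * height * bpp + 7) // 8

def fits(width, height, bpp, vram=VRAM_BYTES):
    """Does a single buffer of this mode fit in VRAM at a supported depth?"""
    return bpp in SUPPORTED_BPP and framebuffer_bytes(width, height, bpp) <= vram

size_1600x1200_4bpp = framebuffer_bytes(1600, 1200, 4)
```

At 4bpp a 1600x1200 buffer needs 960000 bytes and fits comfortably; the same resolution at 16bpp needs 3840000 bytes and does not.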
22:11mwk: I wonder how high can the pixel clock go...
22:11mjg59: Yeah that was going to be my next question
22:11mwk: I can set the PLL quite high
22:12mwk: the question is what will happen...
22:12mjg59: Does it have any similarities with nv3, or would it be logical as an entirely separate driver?
22:12mwk: oh fuck no
22:12mwk: the 2d engine is similar
22:12mwk: other than that, everything is different
22:14mwk: the audio and graph parts are tied in many funny ways, btw
22:14mwk: same DMA engine
22:14mjg59: That doesn't surprise me
22:14mjg59: I think people used to think of DMA engines as magical expensive things
22:14mwk: and you need to allocate structs in VRAM to describe your sounds
22:14mjg59: Ha that's wonderful
22:15mwk: for that matter, the DOS driver puts MIDI fonts in VRAM and plays them from there
22:17mjg59: mwk: omfg
22:17mwk: mjg59: actually, there are three DMA engines on that thing
22:17mwk: one for graphics, one for audio, and one for DOS craziness
22:17mjg59: mwk: This is like the kind of custom chip you found on systems in the 80s
22:17mjg59: …or, I guess, console hardware
22:17mwk: but they have a little problem
22:18mjg59: Which kind of makes sense
22:18mwk: they share a "reset" button
22:18mwk: so if you fuck up graphics DMA, you have to blow up audio too
22:19mwk: I haven't figured out what exactly the 3rd DMA engine is for, yet
22:19mwk: AFAICT it's not connected to anywhere in the card
22:20mwk: and it's only supposed to be used by the DOS driver's sound blaster emulation to read/write memory over 1MB mark
22:20mwk: because, you know, protected mode is expensive
22:20mwk: it's better to use a PCI device for that purpose
22:20mjg59: Is the DMA engine limited to 24 bit addresses?
22:20mwk: it's fully 33-bit
22:20mjg59: Better than some hardware, then
22:20mwk: and I wish that was a typo.
22:21mwk: I mean, I guess the high bit is clipped away at some point before the PCI bus
22:21mwk: but the DMA engine clearly counts in 33-bit registers
22:22mjg59: Yeah ok it's 9 whole bits better than some hardware
22:25mwk: mjg59: it's even better, it supports demand paging in the DMA engine
22:26pmoreau: Damned, completely forgot to add the v2 to the mail subject… --"
22:31mwk: mjg59: but the most hilarious part of that GPU has to be the double-buffer mode
22:31mjg59: mwk: That's wonderful
22:31mwk: to turn double buffering on or off, you have to flip a bit in memory controller, which makes contents of the whole memory basically invalid
22:31mwk: ... including the structs describing audio playback
22:32mjg59: mwk: Hahaha
22:32mjg59: mwk: Please write all of this up somewhere
23:13pmoreau: imirkin_: Thanks for the review! I’ll take care of those tomorrow and send out at least a patch for adding 64-bit integer constant folding (for almost all of them). CVT might need some extra time.