02:15dardevelin: imirkin, hello are you there ?
02:15dardevelin: imirkin, I have a friend with a patch for what seems to be a known issue involving nvidia.
02:17dardevelin: imirkin, last time you were the best at helping guide this, so I think you can probably help check this out
02:17dardevelin: imirkin <-> WizardGed
02:34mupuf: Finally making some progress on this freaking fan calibration table :)
02:35mupuf: seems to have found something that works for all the nvc0+ in our vbios repo
02:36mupuf: but it fails miserably on nvaX
02:55mupuf: maybe we'll finally get rid of all these "my fan is louder than my leaf blower" type of bug reports
04:07imirkin: mupuf: people will find something else to complain about... like how come they have to go out and buy a leafblower now instead of using the gpu
04:27imirkin: dardevelin: WizardGed: it works best if you ask actual questions rather than sit quietly.
04:28imirkin: i look at scrollbacks and will get back to you in due time
04:28dardevelin: imirkin, we are here, I was mostly pointing you guys to each other so that you could do it
04:28dardevelin: WizardGed, knows the situation better, and could better present the case
04:29WizardGed: I have a couple of patches that don't seem to be in upstream linux for the tegra x1: 3 links to follow with diffs and patches
04:29WizardGed: https://patchwork.ozlabs.org/patch/744215/ https://patchwork.ozlabs.org/patch/782871/ https://patchwork.ozlabs.org/patch/782875/
04:31WizardGed: was wondering if I'm a doofus that can't find them
04:32WizardGed: or if they haven't been submitted, in which case I'd like to get them submitted and am stuck on who to poke
04:32WizardGed: or if I have to
04:33imirkin: probably the latter
04:35imirkin: as for who to ping... check what the MAINTAINERS file says
04:37imirkin: tagr's a good bet (Thierry Reding)
04:37imirkin: try asking this in #tegra
04:37imirkin: where they might even know the specific status of those patches
04:38WizardGed: thank you I didn't even know #tegra was a thing
04:52friedbrn: Hey, just wondering what the deal is with this signed firmware on the newer GM20x GPUs, looks like it's more than just protecting their code from competition now
04:52imirkin: they never were, nor are they trying now
04:52imirkin: it's not encrypted; just signed.
04:54friedbrn: I'm not technical enough to understand this signed issue, all I know is that it says on the nouveau site that there will be no more reclocking support, that there is little hope for it now
04:54friedbrn: so a closed door
04:54imirkin: pretty much
04:55imirkin: (a) to use nouveau firmware (in secure mode), we'd have to get nvidia to sign our firmware. that ain't happening.
04:55friedbrn: I thought it was to protect their tech from the competition, but now they are just being mother funkers
04:55imirkin: (b) in order to develop the firmware in the first place, there's a lot of trial and error, so development for GPUs with different ram types (like pascal) is not going to be a thing
04:56imirkin: now... there's no hard reason to have to use nouveau firmware. there's some crazies out there that claim they care, but my position is that i don't.
04:57imirkin: however nvidia would have to distribute the firmware with e.g. linux-firmware in order for that approach to be viable (using nvidia's firmware)
04:57imirkin: so far they have not done that.
04:57imirkin: nor do i expect that to ever happen.
04:57imirkin: so basically... buy intel or amd.
04:58friedbrn: yep, I only buy free hardware
04:58friedbrn: or hardware that is made free
04:59imirkin: no hardware out there ships with the ASIC designs (at least not in the consumer GPU space)
04:59imirkin: but intel and amd public docs are half-decent, and each company has full-time staff with access to additional docs who perform the majority of the work on the open-source drivers
05:02friedbrn: ok, so for gaming it's the older nvidia, or amd
05:03imirkin: i suspect most people would not say that nouveau is appropriate for gaming on any gpu
05:03imirkin: depends on what one's looking for, of course
05:04friedbrn: same thing happened to libreboot, the newer laptops have signed firmware
05:04friedbrn: or might apply to coreboot as well
05:04friedbrn: I don't do any gaming
05:05imirkin:sees no problems with signed firmware
05:05imirkin: the problem is with people not distributing the signed firmware.
05:05friedbrn: I'm trying to be a smart guy, do parallel programming stuff
05:06friedbrn: death to nvidia
05:08friedbrn: I can understand protecting their code from competition but this is too much
05:18friedbrn: well, this rules anyway, this development
05:38friedbrn: yep, this rules on high, die nvidia die
05:38friedbrn: oh well, this is all I'll be needing
05:38friedbrn: I mean, sorry for those of you hooked on the latest games
05:39friedbrn: well, what can I say, all the appreciation in the world
05:39friedbrn: maybe someday I will be able to do something with the free code
11:38mupuf: karolherbst: hey, could you ask travis to use either xenial or zesty as a distro?
11:38mupuf: that should fix the build issue
11:57whqc: imirkin: any news on https://bugs.freedesktop.org/show_bug.cgi?id=102349 ?
11:57whqc: we are two weeks further now. would be nice to see something there to start getting those issues fixed
13:07karolherbst: mupuf: yeah, I'll try
13:12karolherbst: mupuf: well no, it doesn't fix it
13:12karolherbst: ohh wait, it used 14.04
13:13mupuf: yes, use at least xenial
13:13mupuf: zesty would be better
13:13karolherbst: well it shouldn't fail for silly reasons to begin with
13:13mupuf: cmake's interface changed
13:14mupuf: starting from cmake 3.5
13:14karolherbst: then we should require a newer version
13:14mupuf: and we rely on that
13:14mupuf: yep, we should :)
13:14karolherbst: cmake changes behaviour according to the minimum version specified as well
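[A minimal sketch of the fix being discussed: declaring the cmake floor in the project's CMakeLists.txt so Trusty's old cmake fails early instead of mid-configure. The exact floor (3.5) comes from mupuf's comment; the project line is illustrative.]

```cmake
# Fail up front on cmake older than 3.5, and opt into 3.5-era
# policy defaults for everything below this line.
cmake_minimum_required(VERSION 3.5)
project(envytools C)
```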
13:17karolherbst: mupuf: so yeah... there is nothing newer than trusty
13:18karolherbst: "As of July, 18th 2017, we’re switching the default Linux distribution on Travis CI from Ubuntu Precise 12.04 LTS to Ubuntu Trusty 14.04 LTS. " :D
13:18karolherbst: any other questions?
13:19karolherbst: wait a second
13:19mupuf: karolherbst: well, maybe we want to add a workaround for travis then
13:19karolherbst: but we can get a newer cmake somehow
13:19karolherbst: the osx image has cmake 3.9.4
13:22mupuf: sounds good then, we could compile on osx :D
13:22karolherbst: no libpciaccess
13:24karolherbst: mupuf: https://github.com/travis-ci/travis-ci/issues/7437 ;)
13:25mupuf: but yeah, go for it
13:26mupuf: they are hilarious, seriously. They probably run virtual machines with containers inside for running the tests anyway
13:26mupuf: so how is that much work to maintain multiple distros? :D
13:27karolherbst: mupuf: :D
13:27mupuf: or they are just swamped in infrastructure work to keep the jobs running. But seriously, trusty is not going to be supported for too much longer
13:27mupuf: it is > 3 yo
13:27karolherbst: mupuf: well usually you have some container clusters
13:27karolherbst: you don't need virtual machines for this unless you only host virtual machines
13:28mupuf: yeah, they could have chroots too
13:29karolherbst: well, who knows _how_ they build those containers
13:29mupuf: anyway, let's go back to this modelling effort
13:29mupuf: I hate these off by ones
13:29mupuf: 0: Check freq=59812, duty_max=244, duty_offset=2963, fan speed=0: INVALID DUTY (real=16, computed=15) :s
13:30karolherbst: mupuf: https://github.com/travis-ci/travis-ci/issues/5821 ;)
13:30karolherbst: for reference
13:30mupuf: karolherbst: yeah, that's the one I read
13:31karolherbst: mupuf: are you 100% sure you round the same as nvidia? From what I know they tend to always round up with stuff like that
13:31karolherbst: so even if the "computed" duty is 15.001 they use 16
13:31mupuf: well, that's what a fuzzer is supposed to allow me to know :p
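[The round-up behaviour karolherbst describes (a computed duty of 15.001 becoming 16) is plain ceiling division, which can be done without any floats. A sketch of the idea, not the confirmed nvidia rule:]

```python
def ceil_div(a, b):
    # Ceiling of a/b in pure integer arithmetic: floor-divide the
    # negated numerator, then negate the result back.
    return -(-a // b)
```

[If the blob really rounds up, using ceil_div in the model would absorb this class of off-by-one; if it floors, the two differ by exactly 1 on non-multiples.]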
13:32karolherbst: reminds me of my voltage work, but I think you have more parameters?
13:33karolherbst: or more parameters you have to change in the vbios?
13:34mupuf: I have 3
13:34mupuf: for now, but I will get 6 at some point
13:35karolherbst: yeah, I had 1...
13:37karolherbst: mupuf: the fuck, their "solution" to this problem: just use proper docker images. ... the fuck
13:37mupuf: yep :D
13:37mupuf: and can we do this for free?
13:37mupuf: just use osx, seriously
13:37mupuf: and only care about clang
13:37karolherbst: everybody can push to the public docker repository...
13:37karolherbst: or not?
13:37mupuf: oh, if there are already images, then use that
13:53karolherbst: I should get me a hipster CI job: "write working travis-ci docker CI scripts on first try"
13:53karolherbst: mupuf: https://travis-ci.org/karolherbst/envytools/builds/297551820
13:54mupuf: karolherbst: well done!
13:55karolherbst: not sure if the build fails if something went wrong though
13:56karolherbst: mhh "Warning: nvapy won't be built because of un-met dependencies (python3 and cython)"
13:56karolherbst: do I need to install cython3?
14:48imirkin: karolherbst: ping re running that piglit on blob
14:48karolherbst: imirkin: yeah, it is on my todo list for today :)
14:49imirkin: hehe ok
14:49imirkin: i sent a v2 of it: https://patchwork.freedesktop.org/patch/186449/
14:49karolherbst: yeah, already seen that
14:50imirkin: both with and without -compat would be nice.
15:13tobijk: imirkin: so if you want piglit results on the blob, do you now have recent ones for kepler? :D (i want to compare them with my pascal results)
15:13imirkin: tobijk: i just want the result for that one piglit
15:14tobijk: meh :/
15:14imirkin: which isn't really a test in the first place, it's just the only way i know how to write GL apps :)
15:14imirkin: [and there are convenient helpers like piglit_build_simple_program() and so on]
15:15mupuf: Stats: pass=621/1478 (42.02%), fail=857/1478 (57.98%)
15:16mupuf: I guess I still have more work to do on this fan calibration model :D
15:16mupuf: but not so bad!
15:16imirkin: tobijk: looks like last time i ran piglit on kepler was: 585552 Jul 7 2015 nv108-2015-07-06-ilia.xz
15:16karolherbst: mupuf: nice
15:16karolherbst: mupuf: well, depends on how big the error is
15:16tobijk: imirkin: yeah thats a bit old
15:16mupuf: karolherbst: yeah, I need to quantify that. A lot of them are likely to be off-by-ones
15:17imirkin: i have someone's gm204 run from april 2016
15:17mupuf: which I will likely accept as they are not the end of the world
15:17karolherbst: mupuf: yeah, we can ignore all where we get the right result by rounding up
15:17imirkin: mupuf: keep in mind ... no floating point
15:17mupuf: imirkin: yes! There are no floating points in my current algo :)
15:18imirkin: ok good
15:18imirkin: also... round numbers get used a lot more often than random ones
15:18imirkin: can you show your current algo?
15:19mupuf: imirkin: https://github.com/envytools/envytools/commit/e2fccb8f306c60c6e7d30e2bcc78073ea724fb07#diff-9b8faddf90d0a947a6cb6243bb8f1411R290
15:19imirkin: i thought you said no float
15:20imirkin: 13.5e5 = float
15:20karolherbst: I thought that at first too, but there is no float ;)
15:20karolherbst: imirkin: ;) mupuf was just lazy and "saved" some zeros
15:20imirkin: which means that first *div division is a float one
15:20imirkin: and then converted to int
15:20mupuf: fair point, I'll change that
15:21imirkin: also, how sure are you about that number?
15:21imirkin: e.g. are you sure it's not 0x150000 ?
15:21mupuf: imirkin: this is crystal clock / 2
15:21mupuf: I can't get that in nvbios
15:22feaneron: i have a nvidia 920m (kepler apparently). tried to change the power mode by "# echo 0f > /sys/kernel/debug/dri/1/pstate", and it worked the first time. after a reboot, though, when i try to do that again, the shell hangs really hard (blocks even rebooting).
15:22mupuf: but all gpus use a 27 MHz crystal nowadays (AKA, at least starting on geforce 8+)
15:22karolherbst: feaneron: make sure the GPU isn't suspended
15:22imirkin: mupuf: well, the number you use is 0x149970. just saying... round numbers are more frequent ;)
15:22feaneron: now, 'pstate' is showing "DC: core 0 MHz memory 0 MHz"
15:22feaneron: ok, let me check (not sure how...)
15:22imirkin: feaneron: that means it's powered off
15:22karolherbst: feaneron: upstream nouveau doesn't like to reclock while the GPU is off
15:23karolherbst: feaneron: well, clocks being 0 means the gpu is off ;)
15:23tobijk: feaneron: make sure the card is not sleeping while reclocking, e.g run glxgears and then reclock
15:23mupuf: imirkin: :) I have 0 div-related issues
15:23mupuf: so far, at least
15:23feaneron: yeah. i thought either that, or it's hardware blocked (is that even possible?)
15:23karolherbst: feaneron: the driver is blocked
15:23karolherbst: waiting for the hardware to do something...
15:24imirkin: mupuf: also are you sure you're not getting overflow?
15:24imirkin: you made all those things int16
15:25karolherbst: mupuf: uhm... are you sure about the div coming out of the second call?
15:25karolherbst: ohh wait, no. it is fine
15:25mupuf: I am not testing the C model just yet. I use one in python. But since I just moved away from having synchronous testing, I should probably not waste time with python for verifying the model anymore.
15:26tobijk: karolherbst: you had luck with mainline and the vblank bug btw? (havent read the logs)
15:26mupuf: yeah, let's go back to the C model
15:26karolherbst: tobijk: as it seems on master it just disallows me to rmmod
15:27tobijk: yet suspend is working now?
15:28feaneron: "$ DRI_PRIME=1 glxgears" apparently worked, but pstate still shows that the gpu is off... "$ DRI_PRIME=1 glxinfo" shows the correct card and architecture
15:28feaneron: does it matter i'm on wayland? those tools are x11 centric, but xwayland should have taken care of that
15:30feaneron:wants to learn how to do that from linux since booting on windows seems to wake the gpu correctly
15:30tobijk: feaneron: could be the problem, you could try with an XServer session
15:31feaneron: sure. brb
15:35feaneron: ok. in a x11 session, running glxgears woke the gpu up indeed
15:38feaneron: perhaps i should've increased fan speed before changing the power mode of the gpu?
15:45feaneron: oh boy. sorry if someone pinged me. that command halted my whole system :(
15:49feaneron: AHA. there must be a program running on top of the nvidia gpu to make that command run!
15:49feaneron: sorry for the noise
15:49imirkin: and note that when the GPU powers off, the setting is lost, i think
15:50ejwf: hi imirkin, nice to see you here. Have you took a look into the mesa nv4x problem?
15:50feaneron: imirkin: that's right
15:50imirkin: ejwf: nope
15:51feaneron: do i have to manually set the fan speed to prevent overheating?
15:51imirkin: feaneron: laptop, right?
15:51ejwf: imirkin: any plans on that?
15:51imirkin: feaneron: fan is *generally* controlled by the EC (embedded controller)
15:57imirkin: or ... what are you comparing?
15:57mupuf: right now, I am dumping the duty at fixed points: 0, 20, 40, 60, 80, 100%
15:58imirkin: does the blob also do it 0..100?
15:58imirkin: i see.
15:58feaneron: ejwf: i really avoid using windows :) also, looks like nouveau doesn't fully use the gpu yet, so the chances of overheating are somewhat low
15:58mupuf: I have hundreds of lines like this in my db now: 50573,2763,-69,20,26,3
15:58imirkin: what are those numbers?
15:58mupuf: they mean: freq, duty_max, duty_offset, speed, div, duty
15:59imirkin: and that's from blob, yeah?
15:59mupuf: that give me this ground truth DB
15:59imirkin: if you'd like assistance, send me that full data
15:59mupuf: and then I simply open that and read line per line and check if I can match the numbers or not
16:00mupuf: thanks :)
16:00mupuf: I am now writing the standalone C program that will compute and compare
16:00imirkin: a lot easier to change stuff
16:00imirkin: and it's not like speed matters - you have a tiny amount of data
16:00ejwf: feaneron: then try lm-sensors
16:01mupuf: sure... but then I get issues like rounding and non-16-bit arithmetic
16:01mupuf: so... not sure it helps
16:01imirkin: python has integers :p
16:01imirkin: and numpy can help with stuff too
16:01imirkin: anyways, do whatever you're comfortable with
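[mupuf's 16-bit-arithmetic worry is also handleable in plain python: wrap each intermediate result to int16 explicitly. A minimal sketch (the helper name is made up):]

```python
def i16(x):
    # Wrap x to two's-complement 16-bit, mimicking C int16_t overflow
    # so the python model matches what the C model would compute.
    x &= 0xffff
    return x - 0x10000 if x & 0x8000 else x
```

[Applied to every add/mul in the model, this reproduces int16 wraparound, e.g. i16(30000 + 10000) gives -25536 where unbounded python ints would give 40000.]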
16:02mupuf: imirkin: https://pastebin.com/raw/P5jkXm7L <-- that's my python code right now
16:03imirkin: div = int(1350000 / freq)
16:03imirkin: so like stuff like that
16:03imirkin: shouldn't be necessary
16:04imirkin: dunno, maybe python3 broke stuff
16:04karolherbst: python3 defaults to / producing floats
16:04karolherbst: even if both arguments are ints
16:04imirkin: yet another reason to not touch python3.
16:04karolherbst: use // if you want an int division
16:05mupuf: karolherbst: thanks for the info
16:05imirkin: in py2:
16:05imirkin: >>> 1/2
16:05karolherbst: in py3: 1//2
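[The py2/py3 difference in one place, using the 1350000 crystal constant from mupuf's pasted model (py3 semantics):]

```python
freq = 59812                 # example PWM frequency from the log
div_true = 1350000 / freq    # py3 '/' is true division -> float
div_floor = 1350000 // freq  # '//' floors -> int, like C integer division
# int(1350000 / freq) happens to agree here, but going through a float
# invites precision surprises; '//' avoids the detour entirely.
```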
16:05karolherbst: dunno. I think if I had designed python from the start I would have chosen the py3 behaviour
16:06karolherbst: rounding should be explicit
16:06imirkin: apparently guido agrees with you
16:06karolherbst: well a programming language is there to help programmers, not to shorten the code
16:07imirkin: yeah. this isn't helping.
16:07karolherbst: depends on what your assumptions are
16:07imirkin: my assumption is... dividing an int by an int gives me int division.
16:07karolherbst: coming from a C world yeah, it kind of makes sense
16:07karolherbst: but python behaves more like a dynamic typed language anyhow
16:07imirkin: coming from any world that has integers.
16:07karolherbst: so type changes can happen kind of randomly anyhow
16:08karolherbst: it is even worse for languages like SmallTalk and ObjC, where the object itself changes its type after a method call ;)
16:08mupuf: karolherbst: that's the thing I do not like about python. I wish I could force a type on demand
16:08karolherbst: not the variable
16:08karolherbst: mupuf: don't use a dynamically typed language then ;)
16:09mupuf: karolherbst: I may not want to type everything, but when I want to, I should be able to
16:09mupuf: and for interfaces, you WANT to type shit
16:10pmoreau: mupuf: You can give type hints for functions, not sure about variables though https://docs.python.org/3/library/typing.html
16:11karolherbst: mupuf: well true, but nobody until now figured out a perfect way which fits everybody
16:13imirkin: mupuf: just use python (aka python2) and move on with life.
16:17mupuf: imirkin: this will not solve my signedness problem
16:33karolherbst: mupuf: http://www.mypy-lang.org/
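[For what pmoreau and karolherbst are pointing at: annotations are ignored at runtime, but running mypy over the file catches a float sneaking into an int-only model. A small sketch (function name invented):]

```python
def compute_div(freq: int, base: int = 1350000) -> int:
    # mypy would reject returning 'base / freq' here:
    # '/' yields float, which doesn't match the declared -> int.
    return base // freq
```

[CPython itself never enforces the hints; the check only happens when mypy is run over the source.]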
16:44karolherbst: imirkin: on nvidia both give the same result
16:44karolherbst: and few output
16:45karolherbst: so nvidia does "0*inf = 0" for compat and core
17:16imirkin: karolherbst: ok, thanks for confirming.
17:17imirkin: karolherbst: this is on GK106 btw?
17:17imirkin: mupuf: numpy
17:17imirkin: mupuf: has types like uint32 and int32 which do the things you expect
17:17karolherbst: imirkin: if you want I can check on gm107 as well
17:17imirkin: (and probably int16/uint16)
17:17imirkin: karolherbst: nah, i'm sure that's the same. tesla would be good to check tho
17:18karolherbst: imirkin: ask again on Monday, I _might_ be able to test on tesla
17:18imirkin: mupuf: you still got that nva3 plugged in?
17:18karolherbst: imirkin: tesla as in gtXXX right?
17:18imirkin: karolherbst: as in G80 - GTxxx
17:19karolherbst: imirkin: I have some 8600 GS cards here, I could check in the office on those
17:19imirkin: i have a G80 trace here that indicates that the 0*inf=0 behavior is used
17:19imirkin: but i'd like to confirm in both core and compat
17:19mupuf: imirkin: nope, not right now
17:19imirkin: although... if nvidia does this for kepler, it probably also did it for tesla
17:19mupuf: but I could plug that
17:20imirkin: mupuf: well, your call. it's not super important.
17:20karolherbst: mupuf: do you have a detailed list of GPUs you have? I might be interested in some of them if you have duplicates.
17:20mupuf: hmm, the vbios repo is a good start, but no, I do not have a clear inventory
17:20mupuf: maybe I could start a google doc for that
17:21karolherbst: mupuf: I should visit you, then I force you to do that :p
17:21mupuf: better, I would rather have them plugged and ready for people to use
17:22karolherbst: mupuf: well, I most likely won't need reator anymore in the near future, depending on what cards I have access to
17:22mupuf: sounds good :)
17:22mupuf: more control for you!
17:22karolherbst: currently I have direct access to a pascal/maxwell1/kepler1/and those old teslas
17:23mupuf: the problem with randomizing the parameters for the fan is that they are often loud, and sometimes obnoxiously high-pitched!
17:24karolherbst: mupuf: maybe it is a good thing I never got involved into that fan stuff, because then I won't annoy others at the office as well :D
17:24karolherbst: now that I think about that
17:53imirkin: oh. that's why i never looked into bioshock. i don't have a copy.
17:53imirkin: o well.
17:53karolherbst: imirkin: what is wrong with bioshock?
17:54imirkin: the perf seems to be *especially* bad
17:54karolherbst: imirkin: most likely the shader cores were capped
17:54karolherbst: I mean we don't use the boost clocks by default
17:54karolherbst: the NvBoost config
17:55imirkin: i mean relative to blob, it's much worse than other games
17:55imirkin: nvboost means it's off by like ... 10%?
17:55karolherbst: for me it is 705 vs 862 MHz
17:55imirkin: that's with =1 or =2?
17:56karolherbst: 862 on 2
17:56karolherbst: on average it should be around 15%
17:56karolherbst: but yeah, the result still looks bad
17:56imirkin: anyways, feels like it's extra-bad
17:56karolherbst: For me I think it was much faster? dunno
17:56karolherbst: the benchmark is crappy anyhow
17:57karolherbst: I got like 40% difference between the worst and the best run
17:57karolherbst: we have a lot of stuttering though
17:57karolherbst: ohh yeah, that explains
17:57imirkin: anyways, i really should look at getting a non-shit gpu so that i can look at perf things
17:57imirkin: coz it's a bit of a joke to do it on my GK208 with DDR3
17:58karolherbst: imirkin: I think with bioshock we are slow when the game loads scene data or something like that
17:58karolherbst: as long as you are kind of in the same area, perf is pretty good
17:58karolherbst: but moving too much causes sudden stutters
17:58karolherbst: wondering what that's about
17:59imirkin: what's like the shittiest GPU i could get that's still supposedly "decent perf"?
17:59imirkin: GTX 660?
17:59mupuf: yeah, the X60 series are usually OK performance-wise
18:00karolherbst: imirkin: 650 Ti should be fine as well
18:00karolherbst: my GPU is around the perf of a 650 Ti
18:00imirkin: man, these things are huge
18:01imirkin: 650ti is more compact
18:04mupuf: imirkin: make sure it has GDDR5, that's a good indication that the performance won't be the absolute worst ;)
18:04imirkin: i only have one GPU with GDDR5 - the GT215 :)
18:04imirkin: which naturally doesn't support reclocking
18:17pmoreau: imirkin: Really? Shouldn’t all Tesla past G94 be reclockable? Or just not the ones with GDDR5?
18:18imirkin: pmoreau: yep :)
18:18pmoreau: Ah, too bad :-(
18:20mupuf: pmoreau: the tesla gpus are not properly reclockable
18:20mupuf: you may or may not be lucky
18:20RSpliet: pmoreau: also, GT21x with DDR2 are "reclockable", but I observed display corruption in the highest perflvl
18:20imirkin: mupuf: no, they are
18:20mupuf: imirkin: the nvaX series is OK (except GDDR5)
18:21mupuf: but before that, no guarantees
18:21RSpliet: mupuf: (GDDR3) G98 and G200 should be fine as well
18:21pmoreau: RSpliet: OK
18:21RSpliet: MCP7x reclocks well
18:21imirkin: RSpliet: what about G94/G96?
18:22RSpliet: The GDDR3 ones I own change DRAM freqs fine, but there's some work that I wanted to pick up that was mandatory for G92 which I suspect will benefit all other cards too
18:24mupuf: RSpliet: but I guess you do not have all the necessary GPUs too
18:24mupuf: maybe we could do something about this next time we see
18:24RSpliet: Yeah... don't know when that will be
18:25mupuf: I guess there is no urgency anyway
18:25RSpliet: PhDs eat up a remarkable amount of time too... I'm generally keeping myself entertained quite well
18:26mupuf: yep :D They do have this tendency!
18:28mupuf: gosh I hate these off by ones!
18:32mlankhorst: I imagine it's easier than learning finnish though :)
18:34mupuf: mlankhorst: I see similarities, when you make a mistake, it could be from anywhere ;)
18:34mupuf: and speaking about it, I should probably do my exercises :s
18:35karolherbst: mupuf: going into a pub and speaking finnish to random people ;)
18:35karolherbst: it is getting easier the later the night is :P
18:35RSpliet: that's what I always do after a couple pints of whiskey
18:36imirkin: RSpliet: go into a pub and speak finnish to random people?
18:36imirkin: you might come off as eccentric if you do that in the uk...
18:36RSpliet: that's what mupuf always tries, but he talks so much they never let him finnish
18:36karolherbst: my eye...
18:37RSpliet: imirkin: well, I don't know what language you speak after a bottle of whiskey, but I bet it bears little resemblance to anything Anglo-Saxon
18:37imirkin: yeah, a couple pints of whiskey, might start sounding finnish
18:49mlankhorst: learn swedish, get drunk, talk danish :)
18:51pmoreau: I should try that
18:56mupuf: mlankhorst: ah ah
18:56mupuf:thinks pmoreau and mlankhorst should try their swedish on each other ;D
19:02mupuf: well, seems like I was not crazy before. The behaviour of the duty cycle when I vary the duty_offset parameter with the fan speed set to 0 is still discontinuous
19:05imirkin: perhaps there's another value you're missing?
19:06mupuf: imirkin: perhaps, but I am only changing one parameter at a time here :s
19:08imirkin: hmmm ok
19:09imirkin: but i mean, perhaps the gpu has another output which makes this all "continuous"?
19:12mupuf: imirkin: http://fs.mupuf.org/pwm_offset.png
19:12mupuf: this is for fan_speed=0%
19:12mupuf: and duty cycle, is what you need to write to the HW
19:13imirkin: feels like it's a bitfield and is being misinterpreted. or there's a bit missing. or something.
19:13mupuf: there is no HW involved otherwise here. I am trying to find the function that maps fan_speed to the duty cycle
19:13imirkin: (it = pwm offset)
19:13pmoreau: mupuf: Before or after drinking? :-)
19:14imirkin: mupuf: anyways, if you give me the raw data, i can play around with it
19:14mupuf: imirkin: yes... except... the frequency linearly influences the period of this oscillation
19:14imirkin: otherwise i doubt i can contribute much
19:14mupuf: I will package this up
19:15imirkin: what's "duty cycle"?
19:15imirkin: [i know what it is generically, i mean specifically here.]
19:16mupuf: how many cycles the output should be ON
19:16mupuf: and DIV == period in cycle
19:16imirkin: that's the generic explanation
19:16imirkin: i mean specifically here... where is that value coming from?
19:18mupuf: not sure what else I can say. It really is just that. But maybe this will help: https://pastebin.com/raw/U4Um1syh
19:19imirkin: ret = sscanf(line, "%hu,%hu,%hi,%hu,%hu,%hu\n", &e.pwm_freq, &e.duty_max,
19:19imirkin: &e.duty_offset, &speed, &div, &duty);
19:19imirkin: where does that "duty" value come from?
19:19mupuf: this one comes from the blob, read from register e118
19:19mupuf: div == register e114
19:20imirkin: so it's literally nv_rd32(e118) ?
19:20imirkin: k cool
19:20imirkin: that's what i meant ;)
19:20mupuf: that's the only two regs that matter
19:20imirkin: but there's no (strong) reason that it MUST be the "duty" value. could be a bitfield or whatever.
19:21imirkin: anyways, i'll have a look when you send me that CSV
19:21mupuf: it is still acquiring data
19:21mupuf: the image I sent you was from last year
19:21mupuf: and the format was crap and unusable
19:22mupuf: each data point takes ~1s to make, that makes it quite annoying
19:22mupuf: but that also means I can collect data in the background
19:22imirkin: how many data points do you have?
19:23mupuf: 1 frequency, 400 points
19:23mupuf: pwm frequency*
19:23imirkin: should be more than enough
19:23mupuf: ah ah, well, if you want, I can give that to you right now
19:26mupuf: imirkin: http://fs.mupuf.org/nvidia/fan_calib/
19:26mupuf: 6b11_sweep_new.csv is when changing only the pwm_offset (offset 6b11 in my vbios)
19:26mupuf: and ground_truth_db is random datapoints
19:27imirkin: i just want the csv right?
19:27mupuf: yeah, probably
19:28mupuf: I honestly do not think that the DUTY register is a bitfield
19:28mupuf: it would make very little sense and would mean that they change the chipset based on which fan is connected
19:29mupuf: (as duty functions just as expected on most GPUs)
19:32imirkin: it's mostly grouped in 6's
19:32imirkin: which does not lend a lot of credibility to the bitfield thing
19:32mupuf: imirkin: I uploaded a new one, with the frequency set to 2048Hz
19:32mupuf: and ... they are grouped in 3's ;)
19:33imirkin: but kinda 6's
19:34mupuf: if you try another frequency, you'll see it change... And the pwm_max had no impact on the value when the fan speed is 0%
19:35imirkin: yeah wait, why are all these speed values 0?
19:35mupuf: because this is what I am trying to reverse engineer here. What is the impact of the pwm_offset?
19:36mupuf: and when speed is set to 0%, only the frequency and the pwm_offset influence the duty cycle
19:36imirkin: right yeah
19:37mupuf: any other value depends on this base... which makes it hard
19:39mupuf: I spent so long on trying to figure out this parameter... I think I will just give up and ask nvidia
19:39mupuf: but at least I can show everything I gave
19:41mupuf: imirkin: FYI, this was the best model I found: http://fs.mupuf.org/nvidia/fan_calib/pwm_offset_model.py
19:41imirkin: yeah, that's not logical though
19:41imirkin: better to ask yourself ... why on earth does it even look like that
19:41mupuf: as you say
19:42mupuf: well, I wish I knew
19:42imirkin: and i think understanding that will present a more logical answer
19:42mupuf: So, this table is trying to say what range of the duty cycle you should use to represent 0 -> 100%
19:43imirkin: i.e. there's no reason for this periodicity without further understanding
19:43mupuf: the first parameter is how high it can get
19:43mupuf: and it makes sense
19:43mupuf: and this is the pwm_max parameter
19:43imirkin: unless you're varying more than just the pwm offset
19:44imirkin: duty_max,duty_offset are in the vbios right?
19:44mupuf: the second one is an offset on the x axis, that allows shifting the values in both directions
19:44imirkin: adjacent 2-byte things?
19:44mupuf: yes, and so is the frequency
19:44mupuf: and yes, the 3 of them are next to each other
19:44imirkin: and the fact that you call them that is just for convenience
19:45imirkin: i.e. a totally different interp is theoretically possible right?
19:45mupuf: although the frequency is ignored and read from the thermal table instead, as it was before this table got introduced
19:46mupuf: imirkin: well, given that nvidia always interpolates linearly between the low and high points, it makes sense to think that you are simply mapping a linear function into an affine one
19:46mupuf: which only requires two parameters
19:46mupuf: and to some extent, this is what they have... except discontinuous
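[mupuf's working hypothesis — a linear 0-100% fan speed mapped affinely onto the duty range — would look like this in integer form. This is a sketch of the hypothesis under test, not the verified algorithm; the discontinuities discussed above are exactly what it fails to reproduce:]

```python
def duty_for_speed(speed, duty_offset, duty_max):
    # Affine map: speed 0% -> duty_offset, speed 100% -> duty_max,
    # kept in integer arithmetic like the C model.
    return duty_offset + speed * (duty_max - duty_offset) // 100
```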
19:47imirkin: have you tried varying duty_max while keeping duty_offset constant (and speed = 0)
19:47imirkin: ideally that should have no effect on the output
19:47imirkin: (assuming your interp is correct)
19:47mupuf: and this is what I saw
19:47mupuf: let me check again
19:48imirkin: ok. what about futzing with some of those other unks?
19:48mupuf: no impact has been found :s
19:48imirkin: basically, other than changing duty_offset, with speed = 0, can you cause anything to change?
19:48mupuf: I suspect they are for the other modes
19:48imirkin: also, i wonder if speed = 0 is a valid thing to even look at
19:48imirkin: iirc most things limit down to like 30
19:49imirkin: so perhaps it's a weird artifact of some other computation
19:49mupuf: well, I found that there is always a linear interpolation between the 0 and 100% points
19:49mupuf: so... I need to find these two points
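The behaviour mupuf describes, a fan speed request of 0 to 100% interpolated linearly between two calibration points recovered from the vbios table, might look like the sketch below. All names and the exact parameter meanings are hypothetical; figuring out what the table fields really encode is precisely what is being reverse-engineered here.

```python
def fan_duty(speed_pct, duty_min, duty_max):
    """Map a requested fan speed (0-100%) onto a PWM duty-cycle range.

    duty_min/duty_max stand in for the 0% and 100% calibration points
    mupuf is trying to recover from the vbios table; the interpolation
    between them is linear, matching what was observed on hardware.
    """
    # clamp the request to the valid range first
    speed_pct = max(0, min(100, speed_pct))
    return duty_min + (duty_max - duty_min) * speed_pct / 100.0
```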
19:50imirkin: do you see the weird alternating behavior at speed = 100?
20:47imirkin: RSpliet: was your scheduling stuff pre- or post-RA?
20:47imirkin: RSpliet: i'm like 99% ready to write the dumbest instruction scheduler known to man
20:48imirkin: because i'm sick of these stupid issues
20:49imirkin: so basically i want to just move all loads closest to their first use
20:49imirkin: and nothing else
20:49imirkin: i realize this isn't perfect, but it's a whole lot better than loads being super-far from their first use and taking up a reg
20:49imirkin: so like x = load(); other op; use(x);
20:50imirkin: becomes other op; x = load(); use(x)
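The "dumbest scheduler" imirkin proposes, sink each load down to just before its first use and do nothing else, could be prototyped along these lines. The tuple IR of `(dest, op, srcs)` is invented purely for illustration; nouveau's codegen operates on a very different representation.

```python
def sink_loads(insns):
    """Move every load immediately before the first use of its result.

    insns is a list of (dest, op, srcs) tuples in program order; the
    relative order of non-load instructions is preserved.
    """
    result = [i for i in insns if i[1] != "load"]
    loads = [i for i in insns if i[1] == "load"]
    for ld in loads:
        dest = ld[0]
        # find the first instruction that reads this load's destination
        for idx, insn in enumerate(result):
            if dest in insn[2]:
                result.insert(idx, ld)
                break
        else:
            result.append(ld)  # dead load: leave it at the end
    return result
```

So `x = load(); other op; use(x)` becomes `other op; x = load(); use(x)`, exactly the transformation described above.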
20:50RSpliet: Hmm... but too close and you'll stall up to 400 cycles
20:50imirkin: and yeah ... latency bla bla bla
20:50karolherbst: imirkin: I am all for it
20:50imirkin: but the alternative is the current situation
20:50imirkin: which is
20:50imirkin: 75 loads
20:50imirkin: which starts spilling/etc
20:50imirkin: and then those loads get used
20:52karolherbst: imirkin: what do you think has a bigger perf impact: reordering those loads or disabling merging of 32bit loads to bigger ones?
20:52RSpliet: Could you check whether my branch makes improvements first? It should do something slightly more clever - try and enforce a distance of idk 400 or 600. There's some params to play with there
20:52imirkin: RSpliet: pointer?
20:52imirkin: karolherbst: reordering the loads
20:53RSpliet: imirkin: https://github.com/RSpliet/mesa/tree/insn_sched
20:53imirkin: RSpliet: ok. i haven't looked yet, but please don't be offended if i end up going with the simple thing.
20:53karolherbst: imirkin: mhh maybe we should stop merging those loads then, because this works around a serious bug within the spilling code
20:53imirkin: basically if your thing is too clever it may prove too much to review :)
20:53RSpliet: imirkin: wouldn't be offended
20:53karolherbst: don't know if there is another bug related to spilling such stuff
20:53RSpliet: It's all quite ad-hoc
20:54imirkin: the thing i'm proposing is trivial to implement and understand
20:54imirkin: karolherbst: the spilling issues are unrelated.
20:54karolherbst: my solution is even more trivial ;)
20:54imirkin: karolherbst: your solution is for an unrelated problem.
20:54karolherbst: imirkin: ahh, then why reordering?
20:54RSpliet: but it does have the "keep like 400 cycles distance" (or w/e I have in the ArsePull Cost Model(tm) at this time of day)
20:54karolherbst: imirkin: to improve perf where we have bad orders?
20:54imirkin: RSpliet: yeah, i definitely don't like the idea of sticking them in next to each other
20:54karolherbst: or does it have other impacts?
20:55imirkin: RSpliet: but the alternative is more complicated.
20:55imirkin: karolherbst: no. to decrease reg usage when we have loads that are far away from uses.
20:55karolherbst: yeah, okay, that's what I meant
20:55imirkin: which, in extreme cases, can lead to spilling which is doubly-idiotic
20:55imirkin: esp since we don't have rematerialization
20:55imirkin: so you end up storing a 1 in lmem and then loading it from lmem
20:56karolherbst: I see
20:56imirkin: imagine you load up values 1..255 into regs
20:56imirkin: (moral of the story: don't do that.)
20:57imirkin: but this is what ends up happening
20:57imirkin: due to no fault of the shader writer
20:57RSpliet: imirkin: got a shader that does that? Sounds like a fun shader to play with, 'cos I'm pretty sure my branch still is shite in that case if the loads are all in the same BB
20:58RSpliet: (on account of "highest cost first", line 227)
21:04imirkin: i'll find it
21:04imirkin: some piglit obviously
21:04imirkin: or deqp test
21:04imirkin: but basically it's like store(vec4(1,2,3,4)); store(vec4(5,6,7,8)) etc
21:05imirkin: and somehow the immediate loads end up on top
21:05imirkin: and all the stores on bottom
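The register-pressure blowup imirkin describes can be quantified with a toy liveness counter over straight-line code (the tuple format is invented for the example): with every immediate load hoisted above the stores, all the values are live at once, while the interleaved order keeps pressure at one.

```python
def max_live(insns):
    """Return peak simultaneously-live value count for a straight-line
    block of (dest, op, srcs) tuples; a value dies at its last use."""
    last_use = {}
    for idx, (dest, op, srcs) in enumerate(insns):
        for s in srcs:
            last_use[s] = idx
    live, peak = set(), 0
    for idx, (dest, op, srcs) in enumerate(insns):
        if dest:
            live.add(dest)
        peak = max(peak, len(live))
        for s in srcs:
            if last_use.get(s) == idx:
                live.discard(s)  # last use: value is dead from here on
    return peak
```

With 64 immediates hoisted to the top, the same counter reports 64 live registers at the peak, which is the pressure (and eventual spilling) being complained about.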
21:05RSpliet: Oh yeah, the immediates need to be pushed wa-ha-hay down afaik
21:05imirkin: right. but they have to be in registers
21:05imirkin: coz that's how store works :)
21:06RSpliet: sure! Think my "highest potential liveness reduction" heuristic should take some care of that
21:08RSpliet: because they're low-cost and increase liveness by 1. Same doesn't hold for a string of ld's though, because they're high cost and that trumps the liveness-effect currently
21:09RSpliet: not clever enough to realise that 75 lds in a row isn't particularly useful for masking latency (bet there aren't that many MSHR equivalents)
21:09imirkin: might have only been 64 loads :)
21:09imirkin: but still ... i'd rather have the 64 regs
21:10RSpliet: any idea whether NVIDIA is clever with that shader?
21:10imirkin: [no idea]
21:10imirkin: i can't seem to locate it
21:11imirkin: i think it was in dEQP somewhere (and it was failing)
21:26imirkin: RSpliet: but a simple example is https://hastebin.com/hihuyuteza.cs -- see how with my latest patch to use the LOAD_CONSTBUF stuff, those const loads end up on top instead of near the exports
21:26imirkin: RSpliet: same basic idea
21:26imirkin: RSpliet: i.e. this is the result now: https://hastebin.com/ebanuqinin.bash
21:27imirkin: just because of some changes in the code regarding precisely where the load ops are getting inserted
21:28RSpliet: as long as reg usage stays below 32 that looks desirable, but I got the general idea yes :-)
21:28imirkin: constbuf loads are not *that* slow
21:29RSpliet: is it a separate SRAM, or "merely" a cached read-only buffer (so no coherence traffic)?
21:30RSpliet: * RSpliet regrets setting his pupils such tedious questions... because it makes for tedious marking
21:30imirkin: i think it's pre-cached
21:31imirkin: i.e. it all gets loaded into L2 or something
21:31imirkin: not sure though
21:31imirkin: there's definitely a staging buffer for all the UBO stuff
21:31imirkin: outside of the VRAM-backed stuff
21:31imirkin: that also enables parallel draws to work
21:32imirkin: (with different uniform values for each draw)
22:02pmoreau: RSpliet: Ah ah! Good luck with the marking! I’m almost done with mine. O:-)
22:08RSpliet: Thanks. Just finished... but I realise I'm a timezone behind you
22:10pmoreau: I’ll be finishing tomorrow anyway, as I don’t have access to the other half of the exam.