02:15 dardevelin: imirkin, hello are you there ?
02:15 dardevelin: imirkin, I have a friend with a patch for what seems to be a known issue involving nvidia.
02:17 dardevelin: imirkin, last time you were the best at helping guide this, so I think you can probably help check this out
02:17 dardevelin: imirkin <-> WizardGed
02:34 mupuf: Finally making some progress on this freaking fan calibration table :)
02:35 mupuf: seems to have found something that works for all the nvc0+ in our vbios repo
02:36 mupuf: but it fails miserably on nvaX
02:44 jayhost1: Nice!
02:55 mupuf: maybe we'll finally get rid of all these "my fan is louder than my leaf blower" type of bug reports
04:07 imirkin: mupuf: people will find something else to complain about... like how come they have to go out and buy a leafblower now instead of using the gpu
04:27 imirkin: dardevelin: WizardGed: it works best if you ask actual questions rather than sit quietly.
04:28 imirkin: i look at scrollbacks and will get back to you in due time
04:28 dardevelin: imirkin, we are here, I was mostly pointing you guys to each other so that you could do it
04:28 dardevelin: WizardGed knows the situation better, and can present the case better
04:29 WizardGed: I have a couple patches that don't seem to be in upstream linux for the tegra x1: 3 links to follow with diffs and patches
04:29 WizardGed: https://patchwork.ozlabs.org/patch/744215/ https://patchwork.ozlabs.org/patch/782871/ https://patchwork.ozlabs.org/patch/782875/
04:30 imirkin: ok
04:31 WizardGed: was wondering if I'm a doofus that can't find them
04:32 WizardGed: or if they haven't been submitted, in which case I'd like to get them submitted and am stuck on who to poke
04:32 WizardGed: or if I have to
04:33 imirkin: probably the latter
04:35 imirkin: as for who to ping... check what the MAINTAINERS file says
04:36 imirkin: https://hastebin.com/oreyopuroq.pl
04:37 imirkin: tagr's a good bet (Thierry Reding)
04:37 imirkin: try asking this in #tegra
04:37 imirkin: where they might even know the specific status of those patches
04:38 WizardGed: thank you I didn't even know #tegra was a thing
04:52 friedbr[41]n: Hey, just wondering what the deal is with this signed firmware on the newer GM20x GPUs; looks like it's more than just protecting their code from competition now
04:52 imirkin: they never were, nor are they trying now
04:52 imirkin: it's not encrypted; just signed.
04:54 friedbr[41]n: I'm not technical enough to understand this signed issue, all I know is that it says on the nouveau site that there will be no more reclocking support, that there is little hope for it now
04:54 friedbr[41]n: so a closed door
04:54 imirkin: pretty much
04:55 imirkin: (a) to use nouveau firmware (in secure mode), we'd have to get nvidia to sign our firmware. that ain't happening.
04:55 friedbr[41]n: I thought it was to protect their tech from the competition, but now they are just being mother funkers
04:55 imirkin: (b) in order to develop the firmware in the first place, there's a lot of trial and error, so development for GPUs with different ram types (like pascal) is not going to be a thing
04:56 imirkin: now... there's no hard reason to have to use nouveau firmware. there's some crazies out there that claim they care, but my position is that i don't.
04:57 imirkin: however nvidia would have to distribute the firmware with e.g. linux-firmware in order for that approach to be viable (using nvidia's firmware)
04:57 imirkin: so far they have not done that.
04:57 imirkin: nor do i expect that to ever happen.
04:57 imirkin: so basically... buy intel or amd.
04:58 friedbr[41]n: yep, I only buy free hardware
04:58 friedbr[41]n: or hardware that is made free
04:59 imirkin: no hardware out there ships with the ASIC designs (at least not in the consumer GPU space)
04:59 imirkin: but intel and amd public docs are half-decent, and each company has full-time staff with access to additional docs who perform the majority of the work on the open-source drivers
05:02 friedbr[41]n: ok, so for gaming it's the older nvidia, or amd
05:03 imirkin: i suspect most people would not say that nouveau is appropriate for gaming on any gpu
05:03 imirkin: depends on what one's looking for, of course
05:04 friedbr[41]n: same thing happened to libreboot, the newer laptops have signed firmware
05:04 friedbr[41]n: or might apply to coreboot as well
05:04 friedbr[41]n: I don't do any gaming
05:05 imirkin:sees no problems with signed firmware
05:05 imirkin: the problem is with people not distributing the signed firmware.
05:05 friedbr[41]n: I'm trying to be a smart guy, do parallel programming stuff
05:06 friedbr[41]n: death to nvidia
05:08 friedbr[41]n: I can understand protecting their code from competition, but this is too much
05:18 friedbr[41]n: well, this rules anyway, this development
05:38 friedbr[41]n: yep, this rules on high, die nvidia die
05:38 friedbr[41]n: oh well, this is all I'll be needing
05:38 friedbr[41]n: I mean, sorry for those of you hooked on the latest games
05:39 friedbr[41]n: well, what can I say, all the appreciation in the world
05:39 friedbr[41]n: maybe someday I will be able to do something with the free code
11:38 mupuf: karolherbst: hey, could you ask travis to use either xenial or zesty as a distro?
11:38 mupuf: that should fix the build issue
11:57 whqc: imirkin: any news on https://bugs.freedesktop.org/show_bug.cgi?id=102349 ?
11:57 whqc: we are two weeks further now. would be nice to see something there to start getting those issues fixed
13:07 karolherbst: mupuf: yeah, I'll try
13:12 karolherbst: mupuf: well no, it doesn't fix it
13:12 karolherbst: ohh wait, it used 14.04
13:13 mupuf: yes, use at least xenial
13:13 mupuf: zesty would be better
13:13 karolherbst: well it shouldn't fail for silly reasons to begin with
13:13 mupuf: cmake\
13:13 mupuf: cmake's interface changed
13:14 mupuf: starting from cmake 3.5
13:14 karolherbst: then we should require a newer version
13:14 mupuf: and we rely on that
13:14 mupuf: yep, we should :)
13:14 karolherbst: ;)
13:14 karolherbst: cmake changes behaviour according to the minimum version specified as well
13:17 karolherbst: mupuf: so yeah... there is nothing newer than trusty
13:18 mupuf: loool
13:18 mupuf: faiiiiiilll!!!
13:18 karolherbst: "As of July, 18th 2017, we’re switching the default Linux distribution on Travis CI from Ubuntu Precise 12.04 LTS to Ubuntu Trusty 14.04 LTS. " :D
13:18 karolherbst: any other questions?
13:19 karolherbst: wait a second
13:19 mupuf: karolherbst: well, maybe we want to add a workaround for travis then
13:19 karolherbst: but we can get a newer cmake somehow
13:19 karolherbst: the osx image has cmake 3.9.4
13:22 mupuf: sounds good then, we could compile on osx :D
13:22 karolherbst: nope
13:22 karolherbst: no libpciaccess
13:23 karolherbst: *sigh*
13:24 karolherbst: mupuf: https://github.com/travis-ci/travis-ci/issues/7437 ;)
13:25 mupuf: hilarious
13:25 mupuf: but yeah, go for it
13:26 mupuf: seriously though, they probably run virtual machines with containers inside for running the tests anyway
13:26 mupuf: so how is that much work to maintain multiple distros? :D
13:27 karolherbst: mupuf: :D
13:27 mupuf: or they are just swamped in infrastructure work to keep the jobs running. But seriously, trusty is not going to be supported for too much longer
13:27 mupuf: it is > 3 yo
13:27 karolherbst: mupuf: well usually you have some container clusters
13:27 karolherbst: you don't need virtual machines for this unless you only host virtual machines
13:28 mupuf: yeah, they could have chroots too
13:29 karolherbst: well, who knows _how_ they build those containers
13:29 mupuf: anyway, let's go back to this modelling effort
13:29 mupuf: I hate these off by ones
13:29 mupuf: 0: Check freq=59812, duty_max=244, duty_offset=2963, fan speed=0: INVALID DUTY (real=16, computed=15) :s
13:30 karolherbst: mupuf: https://github.com/travis-ci/travis-ci/issues/5821 ;)
13:30 karolherbst: for reference
13:30 mupuf: karolherbst: yeah, that's the one I read
13:31 karolherbst: mupuf: are you 100% sure you round the same as nvidia? From what I know they tend to always round up with stuff like that
13:31 karolherbst: so even if the computed duty is 15.001 they use 16
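If nvidia does round the computed duty upward, plain integer ceiling division would reproduce an observed 16 where truncation gives 15, with no floating point involved. A small illustrative sketch (the values are made up, not taken from the calibration DB):

```python
# Truncating vs ceiling integer division, floats never involved.
def div_floor(n: int, d: int) -> int:
    return n // d

def div_ceil(n: int, d: int) -> int:
    # adding d-1 before dividing rounds any nonzero remainder upward
    return (n + d - 1) // d

assert div_floor(31, 2) == 15  # what a truncating model computes
assert div_ceil(31, 2) == 16   # what round-up hardware would report
assert div_ceil(30, 2) == 15   # exact divisions are unaffected
```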
13:31 mupuf: well, that's what a fuzzer is supposed to allow me to know :p
13:31 karolherbst: :D
13:32 karolherbst: reminds me of my voltage work, but I think you have more parameters?
13:33 karolherbst: or more parameters you have to change in the vbios?
13:34 mupuf: I have 3
13:34 mupuf: for now,but I will get 6 at some point
13:35 karolherbst: yeah, I had 1...
13:37 karolherbst: mupuf: the fuck, their "solution" to this problem: just use proper docker images. ... the fuck
13:37 mupuf: yep :D
13:37 mupuf: and can we do this for free?
13:37 karolherbst: sure
13:37 mupuf: just use osx, seriously
13:37 mupuf: and only care about clang
13:37 karolherbst: everybody can push to the public docker repository...
13:37 karolherbst: or not?
13:37 mupuf: oh, if there are already images, then use that
13:52 karolherbst: lol
13:53 karolherbst: I should get me a hipster CI job: "write working travis-ci docker CI scripts on first try"
13:53 karolherbst: mupuf: https://travis-ci.org/karolherbst/envytools/builds/297551820
13:54 mupuf: karolherbst: well done!
13:55 karolherbst: not sure if the build fails if something went wrong though
13:56 karolherbst: mhh "Warning: nvapy won't be built because of un-met dependencies (python3 and cython)"
13:56 karolherbst: do I need to install cython3?
13:56 karolherbst: sure...
14:48 imirkin: karolherbst: ping re running that piglit on blob
14:48 karolherbst: imirkin: yeah, it is on my todo list for today :)
14:49 imirkin: hehe ok
14:49 imirkin: i sent a v2 of it: https://patchwork.freedesktop.org/patch/186449/
14:49 karolherbst: yeah, already seen that
14:50 imirkin: both with and without -compat would be nice.
15:13 tobijk: imirkin: so if you want piglit results on the blob, do you now have recent ones for kepler? :D (i want to compare them with my pascal results)
15:13 imirkin: tobijk: i just want the result for that one piglit
15:14 tobijk: meh :/
15:14 imirkin: which isn't really a test in the first place, it's just the only way i know how to write GL apps :)
15:14 imirkin: [and there are convenient helpers like piglit_build_simple_program() and so on]
15:15 tobijk: :)
15:15 mupuf: Stats: pass=621/1478 (42.02%), fail=857/1478 (57.98%)
15:16 mupuf: I guess I still have more work to do on this fan calibration model :D
15:16 mupuf: but not so bad!
15:16 imirkin: tobijk: looks like last time i ran piglit on kepler was: 585552 Jul 7 2015 nv108-2015-07-06-ilia.xz
15:16 karolherbst: mupuf: nice
15:16 karolherbst: mupuf: well, depends on how big the error is
15:16 tobijk: imirkin: yeah thats a bit old
15:16 mupuf: karolherbst: yeah, I need to quantify that. A lot of them are likely to be off-by-ones
15:17 imirkin: i have someone's gm204 run from april 2016
15:17 mupuf: which I will likely accept as they are not the end of the world
15:17 karolherbst: mupuf: yeah, we can ignore all where we get the right result by rounding up
15:17 imirkin: mupuf: keep in mind ... no floating point
15:17 mupuf: imirkin: yes! There are no floating points in my current algo :)
15:18 imirkin: ok good
15:18 imirkin: also... round numbers get used a lot more often than random ones
15:18 imirkin: can you show your current algo?
15:19 mupuf: imirkin: https://github.com/envytools/envytools/commit/e2fccb8f306c60c6e7d30e2bcc78073ea724fb07#diff-9b8faddf90d0a947a6cb6243bb8f1411R290
15:19 imirkin: i thought you said no float
15:20 imirkin: 13.5e5 = float
15:20 karolherbst: I thought it as first too, but ther is no float ;)
15:20 karolherbst: imirkin: ;) mupuf was just lazy and "saved" some zeros
15:20 imirkin: which means that first *div division is a float one
15:20 imirkin: and then converted to int
15:20 mupuf: fair point, I'll change that
15:21 imirkin: also, how sure are you about that number?
15:21 imirkin: e.g. are you sure it's not 0x150000 ?
15:21 mupuf: imirkin: this is crystal clock / 2
15:21 mupuf: I can't get that in nvbios
15:22 feaneron: i have a nvidia 920m (kepler, apparently). tried to change the power mode with "# echo 0f > /sys/kernel/debug/dri/1/pstate", and it worked the first time. after a reboot, though, when i try to do that again, the shell hangs really hard (blocks even rebooting).
15:22 mupuf: but all gpus use a 27 MHz crystal nowadays (AKA, at least starting on geforce 8+)
15:22 karolherbst: feaneron: make sure the GPU isn't suspended
15:22 imirkin: mupuf: well, the number you use is 0x149970. just saying... round numbers are more frequent ;)
15:22 feaneron: now, 'pstate' is showing "DC: core 0 MHz memory 0 MHz"
15:22 feaneron: ok, let me check (not sure how...)
15:22 imirkin: feaneron: that means it's powered off
15:22 karolherbst: feaneron: upstream nouveau doesn't like to reclock while the GPU is off
15:23 karolherbst: feaneron: well, clocks being 0 means the gpu is off ;)
15:23 tobijk: feaneron: make sure the card is not sleeping while reclocking, e.g run glxgears and then reclock
15:23 mupuf: imirkin: :) I have 0 div-related issues
15:23 mupuf: so far, at least
15:23 imirkin: ok
15:23 feaneron: yeah. i thought either that, or it's hardware blocked (is that even possible?)
15:23 karolherbst: feaneron: the driver is blocked
15:23 karolherbst: waiting for the hardware to do something...
15:24 imirkin: mupuf: also are you sure you're not getting overflow?
15:24 imirkin: you made all those things int16
15:25 karolherbst: mupuf: uhm... are you sure about the div coming out of the second call?
15:25 karolherbst: ohh wait, no. it is fine
15:25 mupuf: I am not testing the C model just yet. I use one in python. But since I just moved away from having synchronous testing, I should probably not waste time with python for verifying the model anymore.
15:26 tobijk: karolherbst: you had luck with mainline and the vblank bug btw? (havent read the logs)
15:26 mupuf: yeah, let's go back to the C model
15:26 karolherbst: tobijk: seems like on master it just won't let me rmmod
15:26 tobijk: ow
15:27 tobijk: yet suspend is working now?
15:28 feaneron: "$ DRI_PRIME=1 glxgears" apparently worked, but pstate still shows that the gpu is off... "$ DRI_PRIME=1 glxinfo" shows the correct card and architecture
15:28 feaneron: does it matter i'm on wayland? those tools are x11 centric, but xwayland should have taken care of that
15:30 feaneron:wants to learn how to do that from linux since booting on windows seems to wake the gpu correctly
15:30 tobijk: feaneron: could be the problem, you could try with an XServer session
15:31 feaneron: sure. brb
15:35 feaneron: ok. in a x11 session, running glxgears woke the gpu up indeed
15:36 feaneron: thanks!
15:38 feaneron: perhaps i should've increased fan speed before changing the power mode of the gpu?
15:45 feaneron: oh boy. sorry if someone pinged me. that command halted my whole system :(
15:49 feaneron: AHA. there must be a program running on top of the nvidia gpu to make that command run!
15:49 feaneron: sorry for the noise
15:49 imirkin: and note that when the GPU powers off, the setting is lost, i think
15:50 ejwf: hi imirkin, nice to see you here. Have you taken a look at the mesa nv4x problem?
15:50 feaneron: imirkin: that's right
15:50 imirkin: ejwf: nope
15:51 feaneron: do i have to manually set the fan speed to prevent overheating?
15:51 imirkin: feaneron: laptop, right?
15:51 feaneron: yes
15:51 ejwf: imirkin: any plans on that?
15:51 imirkin: feaneron: fan is *generally* controlled by the EC (embedded controller)
15:57 imirkin: or ... what are you comparing?
15:57 mupuf: right now, I am dumping the duty at fixed points: 0, 20, 40, 60, 80, 100%
15:58 imirkin: does the blob also do it 0..100?
15:58 mupuf: yes
15:58 imirkin: i see.
15:58 feaneron: ejwf: i really avoid using windows :) also, looks like nouveau doesn't fully use the gpu yet, so the chances of overheating are somewhat low
15:58 mupuf: I have hundreds of lines like this in my db now: 50573,2763,-69,20,26,3
15:58 imirkin: what are those numbers?
15:58 mupuf: they mean: freq, duty_max, duty_offset, speed, div, duty
15:59 imirkin: k
15:59 imirkin: and that's from blob, yeah?
15:59 mupuf: yes
15:59 mupuf: that give me this ground truth DB
15:59 imirkin: gotcha
15:59 imirkin: if you'd like assistance, send me that full data
15:59 mupuf: and then I simply open that and read line per line and check if I can match the numbers or not
16:00 mupuf: thanks :)
16:00 mupuf: I am now writing the standalone C program that will compute and compare
16:00 imirkin:recommends python
16:00 imirkin: a lot easier to change stuff
16:00 imirkin: and it's not like speed matters - you have a tiny amount of data
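mupuf's line-by-line matching loop is indeed short in Python. A hedged sketch (the `model` callback is hypothetical and stands in for whatever candidate formula is under test; the line format is the freq,duty_max,duty_offset,speed,div,duty one quoted above):

```python
import csv

def check(path, model):
    """Count ground-truth DB lines a candidate model fails to match.

    Each line has the form freq,duty_max,duty_offset,speed,div,duty
    (all integers, duty_offset possibly negative). `model` is a
    hypothetical callback returning the predicted (div, duty).
    """
    mismatches = 0
    with open(path, newline="") as f:
        for row in csv.reader(f):
            freq, duty_max, duty_offset, speed, div, duty = map(int, row)
            if model(freq, duty_max, duty_offset, speed) != (div, duty):
                mismatches += 1
    return mismatches
```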
16:00 ejwf: feaneron: then try lm-sensors
16:01 mupuf: sure... but then I get issues like rounding and non-16-bit arithmetic
16:01 mupuf: so... not sure it helps
16:01 imirkin: rounding?
16:01 imirkin: python has integers :p
16:01 imirkin: and numpy can help with stuff too
16:01 imirkin: anyways, do whatever you're comfortable with
16:02 mupuf: imirkin: https://pastebin.com/raw/P5jkXm7L <-- that's my python code right now
16:03 imirkin: div = int(1350000 / freq)
16:03 imirkin: so like stuff like that
16:03 imirkin: shouldn't be necessary
16:04 imirkin: dunno, maybe python3 broke stuff
16:04 karolherbst: yeah
16:04 karolherbst: python3 defaults to / producing floats
16:04 imirkin: wow.
16:04 karolherbst: even if both arguments are ints
16:04 imirkin: yet another reason to not touch python3.
16:04 karolherbst: use // if you want an int division
16:05 mupuf: karolherbst: thanks for the info
16:05 imirkin: in py2:
16:05 imirkin: >>> 1/2
16:05 imirkin: 0
16:05 karolherbst: in py3: 1//2
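For reference, the Python 3 semantics under discussion. Note that `//` floors toward negative infinity, which matters for signed values like duty_offset, since C's integer division truncates toward zero instead:

```python
# Python 3: / is true division, // is floor division
assert 1 / 2 == 0.5
assert isinstance(1 / 2, float)  # float even for two int operands
assert 1 // 2 == 0
assert isinstance(1 // 2, int)

# // floors toward negative infinity; C truncates toward zero,
# so results differ for negative operands:
assert -1 // 2 == -1  # in C, -1 / 2 == 0
```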
16:05 karolherbst: dunno. I think if I had designed python from the start I would have chosen the py3 behaviour
16:06 karolherbst: rounding should be explicit
16:06 imirkin: apparently guido agrees with you
16:06 karolherbst: well a programming language is there to help programmers, not to shorten the code
16:07 imirkin: yeah. this isn't helping.
16:07 karolherbst: depends on what your assumptions are
16:07 imirkin: my assumption is... dividing an int by an int gives me int division.
16:07 karolherbst: coming from a C world yeah, it kind of makes sense
16:07 karolherbst: but python behaves more like a dynamically typed language anyhow
16:07 imirkin: coming from any world that has integers.
16:07 karolherbst: so type changes can happen kind of randomly anyhow
16:08 karolherbst: it is even worse for languages like Smalltalk and ObjC, where the object itself changes its type after a method call ;)
16:08 mupuf: karolherbst: that's the thing I do not like about python. I wish I could force a type on demand
16:08 karolherbst: not the variable
16:08 karolherbst: mupuf: don't use a dynamically typed language then ;)
16:09 mupuf: karolherbst: I may not want to type everything, but when I want to, I should be able to
16:09 mupuf: and for interfaces, you WANT to type shit
16:10 pmoreau: mupuf: You can give type hints for functions, not sure about variables though https://docs.python.org/3/library/typing.html
16:11 karolherbst: mupuf: well true, but nobody until now figured out a perfect way which fits everybody
16:13 imirkin: mupuf: just use python (aka python2) and move on with life.
16:17 mupuf: imirkin: this will not solve my signedness problem
16:33 karolherbst: mupuf: http://www.mypy-lang.org/
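PEP 484 annotations plus a checker like mypy give exactly that kind of opt-in typing. A small sketch (`compute_div` is hypothetical, using the 1350000 constant from the divider discussion above; annotations are ignored at runtime):

```python
# Annotations are checked statically (e.g. by mypy), not enforced at runtime.
def compute_div(pwm_freq: int) -> int:
    # 1350000 (13.5e5) is the constant from mupuf's model above
    return 1350000 // pwm_freq

# mypy would reject compute_div("2048") as a type error,
# while plain CPython would only fail when the call executes.
assert compute_div(2048) == 659
```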
16:44 karolherbst: imirkin: on nvidia both give the same result
16:44 karolherbst: and few output
16:45 karolherbst: so nvidia does "0*inf = 0" for compat and core
17:16 imirkin: karolherbst: ok, thanks for confirming.
17:17 imirkin: karolherbst: this is on GK106 btw?
17:17 karolherbst: yes
17:17 imirkin: mupuf: numpy
17:17 imirkin: mupuf: has types like uint32 and int32 which do the things you expect
17:17 karolherbst: imirkin: if you want I can check onf gm107 as well
17:17 imirkin: (and probably int16/uint16)
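A quick illustration of why numpy's fixed-width types help when modelling 16-bit hardware arithmetic (illustrative sketch, not from the fan model itself):

```python
import numpy as np

# numpy fixed-width integers wrap around like their C counterparts;
# plain Python ints are arbitrary precision and never overflow
a = np.array([32767], dtype=np.int16)
b = a + 1                   # result stays int16 and wraps silently
assert b.dtype == np.int16
assert int(b[0]) == -32768  # same as int16_t overflow in C
assert 32767 + 1 == 32768   # a plain Python int just grows
```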
17:17 imirkin: karolherbst: nah, i'm sure that's the same. tesla would be good to check tho
17:18 karolherbst: imirkin: ask again on Monday, I _might_ be able to test on tesla
17:18 imirkin: mupuf: you still got that nva3 plugged in?
17:18 karolherbst: imirkin: tesla as in gtXXX right?
17:18 imirkin: karolherbst: as in G80 - GTxxx
17:19 karolherbst: imirkin: I have some 8600 GS cards here, I could check in the office on those
17:19 imirkin: i have a G80 trace here that indicates that the 0*inf=0 behavior is used
17:19 imirkin: but i'd like to confirm in both core and compat
17:19 mupuf: imirkin: nope, not right now
17:19 imirkin: although... if nvidia does this for kepler, it probably also did it for tesla
17:19 mupuf: but I could plug that
17:20 imirkin: mupuf: well, your call. it's not super important.
17:20 karolherbst: mupuf: do you have a detailed list of GPUs you have? I might be interested in some of them if you have duplicates.
17:20 mupuf: hmm, the vbios repo is a good start, but no, I do not have a clear inventory
17:20 mupuf: maybe I could start a google doc for that
17:21 karolherbst: mupuf: I should visit you, then I force you to do that :p
17:21 mupuf: hehe
17:21 mupuf: better, I would rather have them plugged and ready for people to use
17:22 karolherbst: mupuf: well, I most likely won't need reator anymore in the near future, depending on what cards I have access to
17:22 mupuf: sounds good :)
17:22 mupuf: more control for you!
17:22 karolherbst: currently I have direct access to a pascal/maxwell1/kepler1/and those old teslas
17:23 mupuf: the problem with randomizing the parameters for the fan is that they are often loud, and sometimes obnoxiously high-pitched!
17:24 karolherbst: mupuf: maybe it is a good thing I never got involved in that fan stuff, because then I won't annoy others at the office as well :D
17:24 karolherbst: now that I think about that
17:35 mupuf: hehe
17:53 imirkin: oh. that's why i never looked into bioshock. i don't have a copy.
17:53 imirkin: o well.
17:53 karolherbst: imirkin: what is wrong with bioshock?
17:54 imirkin: the perf seems to be *especially* bad
17:54 karolherbst: imirkin: most likely the shader cores were capped
17:54 karolherbst: I mean we don't use the boost clocks by default
17:54 imirkin: huh?
17:54 karolherbst: the NvBoost config
17:55 imirkin: i mean relative to blob, it's much worse than other games
17:55 imirkin: nvboost means it's off by like ... 10%?
17:55 karolherbst: 20%
17:55 karolherbst: for me it is 705 vs 862 MHz
17:55 imirkin: that's with =1 or =2?
17:56 karolherbst: 862 on 2
17:56 karolherbst: on average it should be around 15%
17:56 karolherbst: but yeah, the result still looks bad
17:56 imirkin: anyways, feels like it's extra-bad
17:56 karolherbst: For me I think it was much faster? dunno
17:56 karolherbst: the benchmark is crappy anyhow
17:57 karolherbst: I got like 40% difference between the worst and the best run
17:57 karolherbst: we have a lot of stuttering though
17:57 karolherbst: ohh yeah, that explains
17:57 karolherbst: right...
17:57 imirkin: anyways, i really should look at getting a non-shit gpu so that i can look at perf things
17:57 imirkin: coz it's a bit of a joke to do it on my GK208 with DDR3
17:58 karolherbst: imirkin: I think with bioshock we are slow when the game loads scene data or something like that
17:58 karolherbst: as long as you are kind of in the same area, perf is pretty good
17:58 karolherbst: but moving too much causes sudden stutters
17:58 karolherbst: wondering what that's about
17:59 imirkin: what's like the shittiest GPU i could get that's still supposedly "decent perf"?
17:59 imirkin: GTX 660?
17:59 mupuf: yeah, the X60 series are usually OK performance-wise
18:00 karolherbst: imirkin: 650 Ti should be fine as well
18:00 karolherbst: my GPU is around the perf of a 650 Ti
18:00 imirkin: man, these things are huge
18:00 karolherbst: yeah
18:01 imirkin: 650ti is more compact
18:04 mupuf: imirkin: make sure it has GDDR5, that's a good indication that the performance won't be the absolute worst ;)
18:04 imirkin: i only have one GPU with GDDR5 - the GT215 :)
18:04 imirkin: which naturally doesn't support reclocking
18:17 pmoreau: imirkin: Really? Shouldn’t all Tesla past G94 be reclockable? Or just not the ones with GDDR5?
18:18 imirkin: pmoreau: yep :)
18:18 pmoreau: Ah, too bad :-(
18:20 mupuf: pmoreau: the tesla gpus are not properly reclockable
18:20 mupuf: you may or may not be lucky
18:20 RSpliet: pmoreau: also, GT21x with DDR2 are "reclockable", but I observed display corruption in the highest perflvl
18:20 imirkin: mupuf: no, they are
18:20 mupuf: imirkin: the nvaX series is OK (except GDDR5)
18:21 mupuf: but before that, no guarantees
18:21 RSpliet: mupuf: (GDDR3) G98 and G200 should be fine as well
18:21 pmoreau: RSpliet: OK
18:21 RSpliet: MCP7x reclocks well
18:21 imirkin: RSpliet: what about G94/G96?
18:22 RSpliet: The GDDR3 ones I own change DRAM freqs fine, but there's some work that I wanted to pick up that was mandatory for G92 which I suspect will benefit all other cards too
18:24 mupuf: RSpliet: but I guess you do not have all the necessary GPUs too
18:24 mupuf: maybe we could do something about this next time we see each other
18:24 RSpliet: Yeah... don't know when that will be
18:25 mupuf: I guess there is no urgency anyway
18:25 RSpliet: PhDs eat up a remarkable amount of time too... I'm generally keeping myself entertained quite well
18:26 mupuf: yep :D They do have this tendency!
18:28 mupuf: gosh I hate these off by ones!
18:32 mlankhorst: I imagine it's easier than learning finnish though :)
18:34 mupuf: mlankhorst: I see similarities, when you make a mistake, it could be from anywhere ;)
18:34 mupuf: and speaking about it, I should probably do my exercises :s
18:35 karolherbst: mupuf: going into a pub and speaking finnish to random people ;)
18:35 karolherbst: it is getting easier the later the night is :P
18:35 RSpliet: that's what I always do after a couple pints of whiskey
18:36 RSpliet: or
18:36 imirkin: RSpliet: go into a pub and speak finnish to random people?
18:36 imirkin: you might come off as eccentric if you do that in the uk...
18:36 RSpliet: that's what mupuf always tries, but he talks so much they never let him finnish
18:36 karolherbst: my eye...
18:37 RSpliet: imirkin: well, I don't know what language you speak after a bottle of whiskey, but I bet it bears little resemblance to anything Anglo-Saxon
18:37 imirkin: hehe
18:37 imirkin: yeah, a couple pints of whiskey, might start sounding finnish
18:49 mlankhorst: learn swedish, get drunk, talk danish :)
18:51 pmoreau: :-D
18:51 pmoreau: I should try that
18:56 mupuf: mlankhorst: ah ah
18:56 mupuf:thinks pmoreau and mlankhorst should try their swedish on each other ;D
19:02 mupuf: well, seems like I was not crazy before. The behaviour of the duty cycle when I vary the duty_offset parameter with the fan speed set to 0 is still discontinuous
19:05 imirkin: perhaps there's another value you're missing?
19:06 mupuf: imirkin: perhaps, but I am only changing one parameter at a time here :s
19:08 imirkin: hmmm ok
19:09 imirkin: but i mean, perhaps the gpu has another output which makes this all "continuous"?
19:12 mupuf: imirkin: http://fs.mupuf.org/pwm_offset.png
19:12 mupuf: this is for fan_speed=0%
19:12 mupuf: and duty cycle, is what you need to write to the HW
19:13 imirkin: feels like it's a bitfield and is being misinterpreted. or there's a bit missing. or something.
19:13 mupuf: there is no HW involved otherwise here. I am trying to find the function that maps fan_speed to the duty cycle
19:13 imirkin: (it = pwm offset)
19:13 pmoreau: mupuf: Before or after drinking? :-)
19:14 imirkin: mupuf: anyways, if you give me the raw data, i can play around with it
19:14 mupuf: imirkin: yes... except... the frequency linearly influences the period of this oscillation
19:14 imirkin: otherwise i doubt i can contribute much
19:14 mupuf: yep
19:14 mupuf: I will package this up
19:15 imirkin: what's "duty cycle"?
19:15 imirkin: [i know what it is generically, i mean specifically here.]
19:16 mupuf: how many cycles the output should be ON
19:16 mupuf: and DIV == period in cycle
19:16 imirkin: that's the generic explanation
19:16 imirkin: i mean specifically here... where is that value coming from?
19:18 mupuf: not sure what else I can say. It really is just that. But maybe this will help: https://pastebin.com/raw/U4Um1syh
19:19 imirkin: ret = sscanf(line, "%hu,%hu,%hi,%hu,%hu,%hu\n", &e.pwm_freq, &e.duty_max,
19:19 imirkin: &e.duty_offset, &speed, &div, &duty);
19:19 imirkin: where does that "duty" value come from?
19:19 mupuf: this one comes from the blob, read from register e118
19:19 mupuf: div == register e114
19:20 imirkin: so it's literally nv_rd32(e118) ?
19:20 mupuf: yes
19:20 imirkin: k cool
19:20 imirkin: that's what i meant ;)
19:20 mupuf: :)
19:20 mupuf: that's the only two regs that matter
19:20 imirkin: but there's no (strong) reason that it MUST be the "duty" value. could be a bitfield or whatever.
19:21 imirkin: anyways, i'll have a look when you send me that CSV
19:21 mupuf: it is still acquiring data
19:21 mupuf: the image I sent you was from last year
19:21 mupuf: and the format was crap and unusable
19:22 mupuf: each data point takes ~1s to make, that makes it quite annoying
19:22 mupuf: but that also means I can collect data in the background
19:22 imirkin: how many data points do you have?
19:23 mupuf: 1 frequency, 400 points
19:23 mupuf: pwm frequency*
19:23 imirkin: should be more than enough
19:23 mupuf: ah ah, well, if you want, I can give that to you right now
19:26 mupuf: imirkin: http://fs.mupuf.org/nvidia/fan_calib/
19:26 mupuf: 6b11_sweep_new.csv is when changing only the pwm_offset (offset 6b11 in my vbios)
19:26 mupuf: and ground_truth_db is random datapoints
19:27 imirkin: i just want the csv right?
19:27 mupuf: yeah, probably
19:28 mupuf: I honestly do not think that the DUTY register is a bitfield
19:28 mupuf: it would make very little sense and would mean that they change the chipset based on which fan is connected
19:29 mupuf: (as duty functions just as expected on most GPUs)
19:31 imirkin: interesting
19:32 imirkin: it's mostly grouped in 6's
19:32 imirkin: which does not lend a lot of credibility to the bitfield thing
19:32 mupuf: imirkin: I uploaded a new one, with the frequency set to 2048Hz
19:32 mupuf: and ... they are grouped in 3's ;)
19:33 imirkin: yeah
19:33 imirkin: but kinda 6's
19:34 mupuf: if you try another frequency, you'll see it change... and the pwm_max had no impact on the value when the fan speed is 0%
19:35 imirkin: yeah wait, why are all these speed values 0?
19:35 mupuf: because this is what I am trying to reverse engineer here. What is the impact of the pwm_offset?
19:36 mupuf: and when speed is set to 0%, only the frequency and the pwm_offset influence the duty cycle
19:36 imirkin: right yeah
19:37 mupuf: any other value depends on this base... which makes it hard
19:39 mupuf: I spent so long on trying to figure out this parameter... I think I will just give up and ask nvidia
19:39 mupuf: but at least I can show everything I have
19:41 mupuf: imirkin: FYI, this was the best model I found: http://fs.mupuf.org/nvidia/fan_calib/pwm_offset_model.py
19:41 imirkin: yeah, that's not logical though
19:41 imirkin: better to ask yourself ... why on earth does it even look like that
19:41 mupuf: as you say
19:42 mupuf: well, I wish I knew
19:42 imirkin: and i think understanding that will present a more logical answer
19:42 mupuf: So, this table is trying to say what range of the duty cycle you should use to represent 0 -> 100%
19:43 imirkin: i.e. there's no reason for this periodicity without further understanding
19:43 mupuf: the first parameter is how high it can get
19:43 mupuf: and it makes sense
19:43 mupuf: and this is the pwm_max parameter
19:43 imirkin: unless you're varying more than just the pwm offset
19:44 imirkin: duty_max,duty_offset are in the vbios right?
19:44 mupuf: the second one is an offset on the x axis, that allows shifting the values in both directions
19:44 imirkin: adjacent 2-byte things?
19:44 mupuf: yes, and so is the frequency
19:44 mupuf: and yes, the 3 of them are next to each other
19:44 imirkin: and the fact that you call them that is just for convenience
19:45 imirkin: i.e. a totally different interp is theoretically possible right?
19:45 mupuf: although the frequency is ignored and read from the thermal table instead, as it was before this table got introduced
19:46 mupuf: imirkin: well, given that nvidia always interpolates linearly between the low and high points, it makes sense to think that you are simply mapping a linear function into an affine one
19:46 mupuf: which only requires two parameters
19:46 mupuf: and to some extent, this is what they have... except discontinuous
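[editor's note] mupuf's working hypothesis above (the linear 0-100% fan speed mapped into an affine duty-register range, anchored by the two adjacent VBIOS parameters) can be sketched as follows. The function name and the percent-based interface are illustrative assumptions, not actual nouveau code, and the sketch deliberately ignores the discontinuities observed on real hardware:

```python
def speed_to_duty(speed_pct, duty_offset, duty_max):
    """Hypothetical model from the discussion above: duty_offset is
    the duty register value at 0% fan speed, duty_max the value at
    100%, and NVIDIA interpolates linearly between the two points.
    Real hardware was observed to behave discontinuously, which this
    affine sketch does not capture."""
    return duty_offset + (duty_max - duty_offset) * speed_pct / 100.0
```

Under this model, varying duty_max while speed = 0 would indeed have no effect on the output, which matches what mupuf reports seeing.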
19:47 imirkin: have you tried varying duty_max while keeping duty_offset constant (and speed = 0)
19:47 imirkin: ideally that should have no effect on the output
19:47 imirkin: (assuming your interp is correct)
19:47 mupuf: yes
19:47 mupuf: and this is what I saw
19:47 mupuf: let me check again
19:48 imirkin: ok. what about futzing with some of those other unks?
19:48 mupuf: no impact has been found :s
19:48 imirkin: basically, other than changing duty_offset, with speed = 0, can you cause anything to change?
19:48 mupuf: I suspect they are for the other modes
19:48 imirkin: also, i wonder if speed = 0 is a valid thing to even look at
19:48 imirkin: iirc most things limit down to like 30
19:49 imirkin: so perhaps it's a weird artifact of some other computation
19:49 mupuf: well, I found that there is always a linear interpolation between the 0 and 100% points
19:49 mupuf: so... I need to find these two points
19:50 imirkin: do you see the weird alternating behavior at speed = 100?
20:47 imirkin: RSpliet: was your scheduling stuff pre- or post-RA?
20:47 imirkin: RSpliet: i'm like 99% ready to write the dumbest instruction scheduler known to man
20:48 imirkin: because i'm sick of these stupid issues
20:48 RSpliet: Pre-RA
20:49 imirkin: so basically i want to just move all loads closest to their first use
20:49 imirkin: and nothing else
20:49 imirkin: i realize this isn't perfect, but it's a whole lot better than loads being super-far from their first use and taking up a reg
20:49 RSpliet: closest?
20:49 imirkin: so like x = load(); other op; use(x);
20:50 imirkin: becomes other op; x = load(); use(x)
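[editor's note] the "dumbest instruction scheduler" transformation imirkin describes can be sketched on a toy IR of (name, defs, uses) tuples; this is an assumption-laden illustration, not nv50_ir code:

```python
def sink_loads(block):
    """Toy sketch of the proposed pass: within one basic block, hold
    back each load and re-emit it just before the first instruction
    that consumes its result, shrinking the live range (and hence
    register pressure).  Instructions are (name, defs, uses) tuples;
    anything whose name starts with 'load' is sunk.  Latency hiding
    is deliberately ignored, as discussed below."""
    result, pending = [], []
    for insn in block:
        name, defs, uses = insn
        # flush pending loads whose results this instruction consumes,
        # preserving their original relative order
        ready = [ld for ld in pending if set(ld[1]) & set(uses)]
        pending = [ld for ld in pending if ld not in ready]
        result.extend(ready)
        if name.startswith("load"):
            pending.append(insn)
        else:
            result.append(insn)
    result.extend(pending)  # never-used loads are dead anyway
    return result
```

For example, `[load x; add; use x]` becomes `[add; load x; use x]`, which is exactly the `x = load(); other op; use(x)` → `other op; x = load(); use(x)` rewrite above.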
20:50 RSpliet: Hmm... but too close and you'll stall up to 400 cycles
20:50 imirkin: and yeah ... latency bla bla bla
20:50 karolherbst: imirkin: I am all for it
20:50 imirkin: but the alternative is the current situation
20:50 imirkin: which is
20:50 imirkin: 75 loads
20:50 imirkin: which starts spilling/etc
20:50 imirkin: and then those loads get used
20:52 karolherbst: imirkin: what do you think has a bigger perf impact: reordering those loads or disabling merging of 32bit loads to bigger ones?
20:52 RSpliet: Could you check whether my branch makes improvements first? It should do something slightly more clever - try and enforce a distance of idk 400 or 600. There's some params to play with there
20:52 imirkin: RSpliet: pointer?
20:52 imirkin: karolherbst: reordering the loads
20:53 RSpliet: imirkin: https://github.com/RSpliet/mesa/tree/insn_sched
20:53 imirkin: RSpliet: ok. i haven't looked yet, but please don't be offended if i end up going with the simple thing.
20:53 karolherbst: imirkin: mhh maybe we should stop merging those loads then, because this works around a serious bug within the spilling code
20:53 imirkin: basically if your thing is too clever it may prove too much to review :)
20:53 RSpliet: imirkin: wouldn't be offended
20:53 karolherbst: don't know if there is another bug related to spilling such stuff
20:53 RSpliet: It's all quite ad-hoc
20:54 imirkin: the thing i'm proposing is trivial to implement and understand
20:54 imirkin: karolherbst: the spilling issues are unrelated.
20:54 karolherbst: my solution is even more trivial ;)
20:54 imirkin: karolherbst: your solution is for an unrelated problem.
20:54 karolherbst: imirkin: ahh, then why reordering?
20:54 RSpliet: but it does have the "keep like 400 cycles distance" (or w/e I have in the ArsePull Cost Model(tm) at this time of day)
20:54 karolherbst: imirkin: to improve perf where we have bad orders?
20:54 imirkin: RSpliet: yeah, i definitely don't like the idea of sticking them in next to each other
20:54 karolherbst: or does it have other impacts?
20:55 imirkin: RSpliet: but the alternative is more complicated.
20:55 imirkin: karolherbst: no. to decrease reg usage when we have loads that are far away from uses.
20:55 karolherbst: yeah, okay, that's what I meant
20:55 imirkin: which, in extreme cases, can lead to spilling which is doubly-idiotic
20:55 imirkin: esp since we don't have rematerialization
20:55 imirkin: so you end up storing a 1 in lmem and then loading it from lmem
20:56 karolherbst: ....
20:56 karolherbst: I see
20:56 imirkin: imagine you load up values 1..255 into regs
20:56 imirkin: etc
20:56 imirkin: (moral of the story: don't do that.)
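[editor's note] the rematerialization point can be sketched with a toy decision function; the op names and tuple shapes are hypothetical, and this is a model of the idea, not the Mesa spiller:

```python
def resolve_pressure(insn, remat_ops=("mov_imm", "load_const")):
    """Sketch: when a value must leave the register file, a compiler
    with rematerialization re-emits its cheap defining instruction at
    each use instead of spilling.  Without it (the situation imirkin
    describes), even a constant 1 round-trips through local memory:
    st lmem[off], r0 ... ld r0, lmem[off]."""
    name, dst, _src = insn
    if name in remat_ops:
        # recompute at use: just clone the defining instruction
        return [("remat", insn)]
    # genuine spill: store now, reload before the next use
    return [("st_lmem", dst), ("ld_lmem", dst)]
```

An immediate is rebuilt for free, while a real computation still pays the lmem round trip.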
20:56 karolherbst: yikes
20:57 imirkin: but this is what ends up happening
20:57 imirkin: due to no fault of the shader writer
20:57 karolherbst: okay
20:57 RSpliet: imirkin: got a shader that does that? Sounds like a fun shader to play with, 'cos I'm pretty sure my branch still is shite in that case if the loads are all in the same BB
20:58 RSpliet: (on account of "highest cost first", line 227)
21:04 imirkin: mmmmmm
21:04 imirkin: i'll find it
21:04 imirkin: some piglit obviously
21:04 imirkin: or deqp test
21:04 imirkin: but basically it's like store(vec4(1,2,3,4)); store(vec4(5,6,7,8)) etc
21:05 imirkin: and somehow the immediate loads end up on top
21:05 imirkin: and all the stores on bottom
21:05 RSpliet: Oh yeah, the immediates need to be pushed wa-ha-hay down afaik
21:05 imirkin: right. but they have to be in registers
21:05 imirkin: coz that's how store works :)
21:06 RSpliet: sure! Think my "highest potential liveness reduction" heuristic should take some care of that
21:08 RSpliet: because they're low-cost and increase liveness by 1. Same doesn't hold for a string of ld's though, because they're high cost and that trumps the liveness-effect currently
21:09 RSpliet: not clever enough to realise that 75 lds in a row isn't particularly useful for masking latency (bet there aren't that many MSHR equivalents)
21:09 imirkin: might have only been 64 loads :)
21:09 imirkin: but still ... i'd rather have the 64 regs
21:10 RSpliet: any idea whether NVIDIA is clever with that shader?
21:10 imirkin: no
21:10 imirkin: [no idea]
21:10 imirkin: i can't seem to locate it
21:11 imirkin: i think it was in dEQP somewhere (and it was failing)
21:26 imirkin: RSpliet: but a simple example is https://hastebin.com/hihuyuteza.cs -- see how with my latest patch to use the LOAD_CONSTBUF stuff, those const loads end up on top instead of near the exports
21:26 imirkin: RSpliet: same basic idea
21:26 imirkin: RSpliet: i.e. this is the result now: https://hastebin.com/ebanuqinin.bash
21:27 imirkin: just because of some changes in the code regarding precisely where the load ops are getting inserted
21:28 RSpliet: as long as reg usage stays below 32 that looks desirable, but I got the general idea yes :-)
21:28 imirkin: constbuf loads are not *that* slow
21:29 RSpliet: is it a separate SRAM, or "merely" a cached read-only buffer (so no coherence traffic)?
21:30 RSpliet: regrets setting his pupils such tedious questions... because it makes for tedious marking
21:30 imirkin: mmm
21:30 imirkin: i think it's pre-cached
21:31 imirkin: i.e. it all gets loaded into L2 or something
21:31 imirkin: not sure though
21:31 imirkin: there's definitely a staging buffer for all the UBO stuff
21:31 imirkin: outside of the VRAM-backed stuff
21:31 imirkin: that also enables parallel draws to work
21:32 imirkin: (with different uniform values for each draw)
22:02 pmoreau: RSpliet: Ah ah! Good luck with the marking! I’m almost done with mine. O:-)
22:08 RSpliet: Thanks. Just finished... but I realise I'm a timezone behind you
22:10 pmoreau: I’ll be finishing tomorrow anyway, as I don’t have access to the other half of the exam.