01:57karolherbst: Tom^: wanna test some dynamic reclocking stuff?
01:58karolherbst: or anybody else?
01:58karolherbst: should be rather stable now
02:08karolherbst: mupuf: so I think all PMU related issues are fixed now :) the only thing missing for dynamic reclocking is now the power budget stuff :p
02:09pmoreau: karolherbst: Detail! :-)
02:09karolherbst: :D
02:09karolherbst: pmoreau: though my dyn reclocking stuff is only for gpus with a pmu
02:09karolherbst: so gt215+
02:09karolherbst: works for me :D
02:09pmoreau: Grrrr…
02:10pmoreau: I'll cancel your room for FOSDEM! :-p
02:10karolherbst: :D
02:10karolherbst: pmoreau: well the PMU only triggers the reclocking
02:10karolherbst: so for pre gt215 cards we only need a different trigger mechanism
02:10karolherbst: pmoreau: that's the pmu code: https://github.com/karolherbst/nouveau/commit/effcdb2a1ab980bff62139ccd473320eeb85f023#diff-5e5cb4582f6faff078d1cad6144b248a
02:10pmoreau: I'll be able to test your patches once I set up my own Reator.
02:11pmoreau: (Which I should really start looking into!)
02:11karolherbst: pmoreau: what is the pmu thing before gt215 by the way?
02:11karolherbst: or does the code have to run on the host?
02:11karolherbst: ohhh wait
02:11karolherbst: I also use the pmu counters :/
02:11pmoreau: I have no idea. sorry :-)
02:12karolherbst: mhhh
02:12karolherbst: now that I think of it, my code makes little sense for those old gpus anyway
02:12karolherbst: because cstates weren't a think then
02:12karolherbst: *thing
02:13pmoreau: Dunno whether my laptop has any cstates
02:13karolherbst: don't think so
02:16RSpliet: cstates aren't even a thing on GT21x
02:16karolherbst: RSpliet: sure? :/
02:17karolherbst: because I think I saw some
02:18RSpliet: find me a counterexample and I'll believe you :-P
02:18RSpliet: I'm quite sure about that yes
02:19karolherbst: k, didn't find any
02:29mupuf: karolherbst: what was the issue with the IRQ then?
02:29karolherbst: mupuf: 1. https://github.com/karolherbst/nouveau/commit/c62c59910255b8ffaaa7c8945c6156be3af145c6
02:29karolherbst: mupuf: 2. https://github.com/karolherbst/nouveau/commit/d9f1c7ec2e46c9059c8376e0c3155d998ec0185f
02:30karolherbst: both pretty much bs like issues :D
02:31karolherbst: well the second was important when the kernel schedules processes a bit too freely
02:32karolherbst: but the pmu acking those interrupts was a good one
02:35mupuf: :o
02:36mupuf: good catch, for sure!
02:45karolherbst: any complaints about these 3 commits? https://github.com/karolherbst/envytools/commits/nvbios_unk50
02:45karolherbst: I want to push them so that I don't have to manage my local installation :D
02:48mupuf: unk5c is ok
02:48mupuf: hiding base clock entries is ok too
02:49mupuf: actually, please give a better name for both the 50 and 5c
02:49karolherbst: ohh right
02:50mupuf: and please state what t0 and t1 are for unk50
02:50mupuf: then you can push
02:50mupuf: oh, and get rid of the WIP too
02:50karolherbst: no idea, I just know they are temperatures
02:51karolherbst: and that t0 is lower than t1
02:52karolherbst: to be honest, I have no clue what the unk50 table is about, there is some more stuff in that, but besides that, no idea
02:53karolherbst: the entries are quite big though
02:59karolherbst: mupuf: UNK TEMPERATURE and FAN MANAGEMENT?
03:00karolherbst: though management somehow sounds wrong
03:00mupuf: karolherbst: why divide it by 32 then?
03:01mupuf: Fan trip points?
03:01mupuf: fan policy?
03:01karolherbst: mupuf: because after dividing it by 32 I get useful data: 82.53 and 95.00
03:02mupuf: I see
03:02mupuf: keep the unk50 then
03:02mupuf: and rename the other one to fan management
03:02karolherbst: k
03:02mupuf: FAN_MGMT
03:06karolherbst: https://github.com/karolherbst/envytools/commits/nvbios_unk50
03:09karolherbst: mupuf: and the unk5c also has those divides by 32, so I guess the numbers are multiplied by 32 for precision, like they also increased precision with the pwm for volting
03:09karolherbst: mupuf: now I am curious: maybe there is another temperature sensor somewhere with this increased precision, too?
03:10mupuf: doubt that, there is a .5°C resolution already
03:10mupuf: and it is really noisy
03:10mupuf: and you can always read straight from the ADC and use the parameters yourself if you need the most accurate reading
03:11mupuf: possible
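The divide-by-32 decoding karolherbst describes for the unk50 and unk5c tables can be sketched as below. This is only a sketch, assuming the raw VBIOS field is an unsigned fixed-point value with 5 fractional bits; the helper name is hypothetical, and the raw values 2641 and 3040 are back-computed from the 82.53 and 95.00 quoted in the log, not read from a real VBIOS.

```c
#include <stdint.h>

/* Hypothetical helper: decode a temperature stored as fixed point with
 * 5 fractional bits, i.e. the raw value is degrees Celsius times 32.
 * ASSUMPTION: the field is unsigned and 16 bits wide. */
static double temp_from_fixed5(uint16_t raw)
{
    return raw / 32.0;
}
```

Under these assumptions a raw 3040 decodes to 95.0 and a raw 2641 to 82.53125, which rounds to the 82.53 seen above.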
03:12karolherbst: so any complaints left about the commits?
03:26dodomorandi: Hello everyone. I am having lots of troubles with a damn nVidia card... Can I ask you some help about MMIO tracing?
03:29karolherbst: ....
03:29karolherbst: now I wanted to answer :D
03:30karolherbst: dodomorandi: what kind of problems do you have?
03:33dodomorandi: I have an old onboard nvidia, and the proprietary driver seems to have troubles with composition. Nouveau, unfortunately, does not seem to like the card, and the screen flickers a lot. I was trying to trace mmios to provide some info, but i only get a black screen
03:34karolherbst: mhhh
03:34karolherbst: what card is that?
03:34dodomorandi: Geforce 7025/ nForce 630a
03:35dodomorandi: I have got two identical pc, and both have the same issue
03:37dodomorandi: The problem is that i do not know if the mmio tracing has been successful or not
03:37karolherbst: dodomorandi: well you should get a _big_ file
03:38dodomorandi: If i try to send a nop to the tracer, i got a "busy resource" error
03:38karolherbst: yeah
03:38karolherbst: you have to stop the cat command first
03:38karolherbst: then nop the tracer
03:38dodomorandi: Oh, ok
03:38karolherbst: but while tracing you should be able to start X and anything else on it
03:38karolherbst: otherwise the trace will be useless
03:38karolherbst: most likely
03:39dodomorandi: From logs, it seems that nvidia driver is stuck after\during 2d hardware acceleration
03:40dodomorandi: However, i killed the cat, but the "echo nop" is stuck. Damn
03:41dodomorandi: Any idea of some parameter i could try to pass to nvidia module to start X when tracing?
03:43karolherbst: dodomorandi: nope
03:43karolherbst: but I bet there is something not right related to how nouveau drives the monitors on your gpu
03:43karolherbst: dodomorandi: try to reduce the resolution and see if the flickering disappears with a lower one
03:44karolherbst: it isn't like the gpu itself supports a high one anyway
03:44dodomorandi: I tried, and yes, it stop flickering
03:44dodomorandi: But i have to stay at a very low resolution
03:44karolherbst: what is the highest one you can use without flickering?
03:44dodomorandi: I will check in a second
03:54dodomorandi: It works at 1440×900, flickers at 1680×1050
03:54karolherbst: mhhh okay
03:55dodomorandi: No, wait. It flickers a bit at this reso
03:55dodomorandi: Much less, but still a bit
03:57dodomorandi: At 1280×720 it seems to work flawlessly, but it is not 16:10
03:57karolherbst: dodomorandi: k
03:57karolherbst: 1440x900 is pretty high already though
03:58karolherbst: so there is just a small thing left somewhere
03:58karolherbst: who is the expert with such old gpus by the way? :D
03:58karolherbst: imirkin: I guess the pixel clock is just not fast enough
03:59dodomorandi: Mmhh... Do you think i could try to patch and recompile nouveau and see if it works better?
04:00karolherbst: dodomorandi: ever tried to reclock that card?
04:00karolherbst: but I have _no_clue_ if reclocking works at all
04:00dodomorandi: Nope. How can i do?
04:00karolherbst: on that gpu
04:02karolherbst: dodomorandi: boot with nouveau.pstate=1
04:03dodomorandi: I will give it a try
04:05dodomorandi: Same issue
04:05karolherbst: dodomorandi: now you have to reclock first ;)
04:06karolherbst: dodomorandi: is there a pstate file inside /sys/class/drm/card0/device
04:06dodomorandi: Yes
04:07karolherbst: can you cat it?
04:07dodomorandi: Sure. It says
04:08dodomorandi: 20: core 425 MHz shader 425 MHz memory 0 MHz
04:08dodomorandi: And another line with everything set to 0
04:08karolherbst: mhhhh
04:08karolherbst: then try to echo 20 > pstate
04:08karolherbst: you have to do that as root by the way
04:09dodomorandi: Sure
04:09dodomorandi: Mh
04:09karolherbst: but if everything is set to 0, then something is odd already
04:09dodomorandi: The second line changed
04:09karolherbst: does it change anything related to the flickering? like now you can use a higher resolution?
04:10dodomorandi: The core is now set to 29 Mhz, shader and memory still to 0
04:10dodomorandi: The flickering is still there
04:11karolherbst: yeah sadly I don't know much about those gpus
04:13dodomorandi: Hey, you are trying to help me, NVidia doesn't care... I am appreciating that! :D
04:13karolherbst: :D
04:13karolherbst: the solution will be trivial though
04:14RSpliet: dodomorandi: I think that GPU doesn't have dedicated memory, hence there's no way to increase memory bandwidth
04:15RSpliet: is that a single monitor set-up?
04:15dodomorandi: Yes
04:16karolherbst: RSpliet: I strongly assume that the pixel clock is just not set fast enough or something stupid like that
04:16karolherbst: becaue 1440x900 works
04:16karolherbst: *because
04:16RSpliet: hmm... I wonder how NVIDIA gets the required bandwidth then; it would be good if we understood the line buffer better
04:17dodomorandi: Do you have any idea about the driver craziness when i try to trace mmios?
04:17karolherbst: RSpliet: is get_tmds_link_bandwidth the right function for this?
04:17RSpliet: dodomorandi: mmiotrace is just horrendously slow
04:17karolherbst: but I would assume that for his gpu it should already return 165000
04:17RSpliet: karolherbst: well, that should be fine
04:18RSpliet: no the line buffer is an unknown component, I recall NVIDIA calling that thing the NISO poller
04:18pmoreau: That rings a tiny bell
04:19RSpliet: I assume that when configured correctly, it can increase bandwidth by doing bigger transfers from memory to the scanout logic
04:19karolherbst: ahh so back to traces then
04:19pmoreau: https://github.com/envytools/envytools/commit/f20d89ddbb565adb5fcdd54e27a09c27059a6d7a
04:19RSpliet: results in bigger bursts, in turn higher DRAM utilisation efficiency
04:20karolherbst: ahhh
04:20karolherbst: dodomorandi: you may want to install envytools then
04:20karolherbst: we could play around with that stuff and maybe we get it working somehow
04:20dodomorandi: Yup
04:20RSpliet: pmoreau: hmm, yes, that might be applicable to the other embedded GPUs too
04:20pmoreau: But those regs are only valid on MCP77 and MCP79
04:21karolherbst: pmoreau: how sure?
04:21RSpliet: pmoreau: did you test them on an NV68? :-P
04:21RSpliet: (NV63, nV67)
04:21pmoreau: Maybe the NVIDIA dev didn't tell the whole truth, but he only talked of them being valid on MCP77 and MCP79
04:21pmoreau: Let me find the email back
04:22RSpliet: oh, well, we're talking 10+yo GPUs, I wouldn't remember the details of what I designed 10 years ago :-D
04:22karolherbst: :D
04:22dodomorandi: :D
04:22karolherbst: we will know in a few seconds anyway
04:23RSpliet: pmoreau: I also believe that this is not the whole story of the line buffer
04:23karolherbst: dodomorandi: after you installed envytools as root: nvapeek 0x100c00 0x80
04:23pmoreau: http://lists.freedesktop.org/archives/nouveau/2014-November/019266.html
04:24RSpliet: there's some configurable registers in FB that get touched when changing resolutions
04:24pmoreau: and http://lists.freedesktop.org/archives/nouveau/2014-December/019319.html
04:24karolherbst: well
04:24karolherbst: he doesn't say they don't exist on older gpus
04:24karolherbst: he just states why they exist on those mcp gpus
04:25pmoreau: Right
04:25pmoreau: It's been some time since I read those emails
04:25RSpliet: recalling stuff from a year ago is tricky, let alone 10+ years :-P
04:25pmoreau: :p
04:26dodomorandi: Does it matter the resolution when I run nvapeek?
04:26karolherbst: no
04:26dodomorandi: (Compiling)
04:27karolherbst: dodomorandi: I guess a "nvapeek 0x100c14 0x14" would be enough though
04:27karolherbst: maybe do that even while you run the blob at your highest resolution
04:27karolherbst: then we will already know what we have to poke into that
04:28RSpliet: resolution doesn't matter for those regs
04:28dodomorandi: Nvapeek 0x100c00 0x80 returned "..."
04:28pmoreau: IIRC, those regs were not present on MCP89 (I think I checked on one of those cards), but it doesn't imply they don't exist on previous cards.
04:28karolherbst: dodomorandi: try that with the nvidia driver
04:28RSpliet: dodomorandi: nvascan 0x100c18
04:29karolherbst: ohh right
04:29karolherbst: nvascan :)
04:31RSpliet: you can do that with nouveau
04:31dodomorandi: Oh, ok
04:32dodomorandi: Is it normal that it did not return anything? Always "..."
04:33RSpliet: that means the registers do not exist
04:33karolherbst: well shouldn't nvascan not return ...?
04:33karolherbst: or ohh wait
04:33karolherbst: right
04:33karolherbst: my mistake
04:33karolherbst: no, really
04:33karolherbst: it shouldn't
04:34karolherbst: or it should? :/
04:34karolherbst: meh
04:34karolherbst: I should sleep more
04:34karolherbst: okay, other plan then
04:36pmoreau: Maybe the regs are placed somewhere else
04:36RSpliet: tracing time
04:36pmoreau: Yep
04:36pmoreau: I can have a look at the trace tonight
04:38dodomorandi: Anything i can try to help you?
04:38karolherbst: dodomorandi: do a proper mmiotrace ;)
04:40dodomorandi: Heh, the problem is that the nvidia driver does not seem to want to help me... The trace i did is 1.9k, i do not think that is sufficient, is it?
04:41karolherbst: dodomorandi: did you enable the mmiotracer before loading the nvidia module?
04:43dodomorandi: Sending mmiotrace to current_tracer? Sure
04:43dodomorandi: To be sure, i tried to blacklist nvidia module at boot, enabling mmiotrace, loading nvidia and then xinit with a sleep 10
04:44dodomorandi: I also passed maxcpus=1 to kernel
04:44karolherbst: ahh k
04:44karolherbst: no, it is normal that the display just stays black, cause you don't start any applications by default, do you?
04:46dodomorandi: Nope :( it should start xterm if i do not pass anything, and with "sleep 10" it should just terminate after some time. But it is stuck
04:46karolherbst: ohh k
04:47dodomorandi: However, i do not have idea if the trace contain useful information or not
04:48dodomorandi: I imagine that 1.9k is not so much data for a mmio trace...
04:48karolherbst: no
04:48karolherbst: it should be more like 30MB
04:48karolherbst: dodomorandi: what distribution are you using?
04:48dodomorandi: Arch
04:49karolherbst: which login manager?
04:49karolherbst: because you could also just do systemctl start your_login_manager_here
04:49dodomorandi: Normally gdm, but i tried to use just xinit for the trace
04:49karolherbst: well then systemctl start gdm
04:50dodomorandi: I'll try
04:50karolherbst: and check the last lines of dmesg after loading nvidia
04:50karolherbst: something weird could have happened
04:51pmoreau: dodomorandi: When you blacklisted nvidia, did you boot into multi-user or graphical mode?
04:52pmoreau: And when launching X with modprobing, it can take some time. I would say ~30sec-1min on my laptop IIRC.
04:54dodomorandi: Multiuser console mode
04:54dodomorandi: I waited more than 5 minutes
04:57dodomorandi: Argh, sorry guys, but it seems that the update to the kernel i just made made a mess with nvidia... Downgrading... O.o
05:04dodomorandi: -.- kernel dump
05:06dodomorandi: Mmmhh... I will try again with only one core active
05:09dodomorandi: Ok, now the nvidia module crashes
05:11karolherbst: I bet the same issue I have
05:11pmoreau: Starting mmiotrace should automatically pause other cores
05:11dodomorandi: Unfortunately, i have to go. Guys, thank you so much! I will come back in a few days, maybe when nvidia makes its 304xx driver compatible with linux 4.3... Maybe i will be luckier...
05:11karolherbst: dodomorandi: issue like that? https://gist.github.com/karolherbst/f69e2a7b9c372e049525
05:12karolherbst: "mmiotrace: unexpected secondary hit for"
05:13dodomorandi: The backtrace is a bit different
05:13karolherbst: but you got the "mmiotrace: unexpected secondary hit for" message?
05:13pmoreau: I would have expected 304xx to be compatible with 4.3, 340xx is at least.
05:14dodomorandi: Yes, there is a unexpected secondary hit
05:15dodomorandi: However, i am having a mtrr related issue with the new kernel while loading the module
05:15dodomorandi: Now i dont remember what it said, in detail
05:17dodomorandi: Ah, if it can be useful, after the mmiotracer error, there is a unable to handle kernel paging request
05:17karolherbst: yeah, it is the same issue I also get
05:17karolherbst: no idea why though
05:19dodomorandi: Now i have to run :( i will keep in touch, trying to run a trace on this damn proprietary bloatware
05:19dodomorandi: Thank you again, guys :)
05:23pmoreau: Probably the same issue as what I experienced last time I tried to trace the blob. :-/
05:41karolherbst: pmoreau: I think the issue is that there is just a paging request while handling those mmio stuff
06:12tacchinotacchi: is there a reason the code has so few comments? asd at least the last time i've seen it
06:14RSpliet: tacchinotacchi: I think developers will agree that it's always difficult to judge the level of documentation required for their code. Function and variable names should be self-explanatory, but generally the writer of those has a better understanding of the terminology than the novice reader
06:15RSpliet: luckily, we're a very approachable bunch (albeit slightly grumpy, it's the dark months of the year in the northern hemisphere)
06:15tacchinotacchi: well, there was a function yesterday i have no idea what it is
06:15tacchinotacchi: trying to find
06:16tacchinotacchi: what do you mean approachable?
06:16RSpliet: always in for pointing you in the right direction if documentation does exist, or to answer questions if it doesn't
06:16tacchinotacchi: ram_wr32
06:16tacchinotacchi: here it is
06:17tacchinotacchi: i think it's got something to do with writing 32 bits to something, but to what?
06:17RSpliet: to the "ram" subsystem. Most of what you see will use nvkm_wr32, which is mapped to a 32-bit write to the MMIO area
06:18RSpliet: ram_wr32 instead adds a write operation to a script, and this script gets uploaded later for execution by a dedicated "falcon" core
06:18tacchinotacchi: i sense you're trying to help but i'm helpless
06:19tacchinotacchi: is this falcon core in every architecture?
06:19RSpliet: this falcon core is part of modern NVIDIA GPUs (from GT21x on)
06:20tacchinotacchi: so it is the core that manages clocks
06:20tacchinotacchi: through scripts sent by the driver
06:20RSpliet: takes care of performance-related tasks, so in our case monitoring performance counters, executing the scripts I mentioned earlier used for changing memory clocks, temperature monitoring
06:21RSpliet: that's it, I think. We refer to this core as the "PMU"
06:21tacchinotacchi: yesterday they sent me this http://envytools.readthedocs.org/en/latest/nvrm/pmu/index.html
06:21RSpliet: (there's other falcon cores available too)
06:21tacchinotacchi: i wonder how did whoever wrote this figure out in the first place
06:22tacchinotacchi: like how did he figure out the bits sent to the card were opcode
06:22RSpliet: what you're looking at is the script language that NVIDIA uses
06:22RSpliet: (not the same as implemented in nouveau)
06:22tacchinotacchi: what? nvidia actually shared this info?
06:23RSpliet: no
06:23RSpliet: we started by identifying the register/data pairs that were obvious
06:23RSpliet: then working our way up to identify opcodes (luckily everything is 32-bit aligned, so pretty obvious from trace)
06:24RSpliet: followed by a bit of binary disassembly of the firmware implementing this scripting language to understand a lot of the missing opcodes
06:26tacchinotacchi: can you give me an example of "obvious" opcode?
06:28RSpliet: the format is simple, one 32-bit word containing the opcode and the number of parameters, followed by each parameter as 32-bit words. We *know* such a script should write eg. the MR value in 0x1002c0, so identifying a data-value pair for that is simple
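The format RSpliet describes (one 32-bit header word carrying the opcode and the parameter count, followed by the parameters as 32-bit words) can be walked roughly as sketched below. The bit layout of the header is NOT stated in the discussion: placing the opcode in the low 16 bits and the count in the high 16 bits is purely an assumption for illustration, as is treating parameters as (register, value) pairs. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: opcode in bits 0-15, parameter count in bits 16-31.
 * The real header layout is not given in the log. */
#define HDR_OPCODE(w)  ((uint32_t)(w) & 0xffffu)
#define HDR_NPARAMS(w) ((uint32_t)(w) >> 16)

/* Scan a script for the first (register, value) pair that targets `reg`.
 * Returns 1 and stores the value in *val if found, 0 if the register is
 * never written or the script is truncated. */
static int script_find_write(const uint32_t *script, size_t nwords,
                             uint32_t reg, uint32_t *val)
{
    size_t pos = 0;

    while (pos < nwords) {
        uint32_t n = HDR_NPARAMS(script[pos]);

        if (n > nwords - pos - 1)
            return 0; /* header claims more words than remain */

        /* ASSUMPTION: parameters come as register/value pairs. */
        for (uint32_t i = 0; i + 1 < n; i += 2) {
            if (script[pos + 1 + i] == reg) {
                *val = script[pos + 2 + i];
                return 1;
            }
        }
        pos += 1 + (size_t)n; /* skip header plus parameters */
    }
    return 0;
}
```

With a known target such as the MR register at 0x1002c0, a search like this is one way to spot candidate register/value pairs in a captured script before the opcodes themselves are understood.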
06:29pmoreau: IIRC, Nouveau didn't start from scratch, but from the (now dead) open-source X driver that NVIDIA had some time ago. Though it only supported 2d accel and no card newer than Fermi?
06:29RSpliet: reverse engineering - like debugging - is a skill that either takes a lot of time to develop or a brain the size of a planet. Don't feel bad for not getting in two days what took years and years of effort
06:30RSpliet: pmoreau: I think it was even more limited, but even understanding and deobfuscating that code took forever ;-)
06:30tacchinotacchi: i feel like i'm causing inconvenience
06:31tacchinotacchi: what's MR? lel
06:31pmoreau: RSpliet: Right, but that gives a (tiny) starting point
06:31tacchinotacchi: i'd rather have some written documentation instead of wasting your time
06:31RSpliet: tacchinotacchi: MR is documented in the RAM specifications and in envytools
06:32RSpliet: RAM docs are a better source of information, as replicating documentation would've been a waste of time
06:32karolherbst: tacchinotacchi: personally I think most of the nouveau code is quite well understandable, except the memory code :D
06:32tacchinotacchi: i have envytools docs here
06:32RSpliet: karolherbst: that's mostly because... nobody really understands what it does, apart from mimicking the blob :-P
06:32tacchinotacchi: can i find RAM docs on the nouveau homepage?
06:32karolherbst: mhh never looked into them except for the falcon stuff
06:33tacchinotacchi: it seems like nobody ever "looked at them"
06:33RSpliet: tacchinotacchi: no, but google some DDR3 spec sheets for instance
06:33tacchinotacchi: aw so the RAM you're talking about is random access memory
06:33tacchinotacchi: i thought it was some nvidia specific stuff
06:33tacchinotacchi: sry
06:34tacchinotacchi: anyway all people i ever heard from here said "i never looked at reclocking code"
06:34karolherbst: you know what, I will make a trace of my fermi card while reclocking :)
06:35tacchinotacchi: i made some traces with overclocking on my fermi laptop
06:35tacchinotacchi: the only thing i could do with it was send to you though
06:36RSpliet: tacchinotacchi: I wrote most of the RAM reclocking code for pre-kepler, so can't claim that
06:37RSpliet: and that stuff is ill-documented, because a lot of the docs really don't exist. It's just mimicking the behaviour of the blob as well as possible
06:38tacchinotacchi: and does it work?
06:38RSpliet: for various definitions of work
06:38karolherbst: ....
06:38RSpliet: I have a pile of GPUs in the range of G94-GT218 that can successfully change their clock speed using this, yes
06:39karolherbst: I really don't get this stupid stuff :/ like one or two months ago I had no problem creating mmiotraces
06:39karolherbst: but now...
06:39RSpliet: The kepler code has recently had an important fix from karolherbst, so most of those cards work now
06:39RSpliet: only Fermi hasn't had a lot of love yet
06:39RSpliet: (love takes time, unfortunately)
06:39karolherbst: RSpliet: do we want to rush this and have it ready in two weeks? :D
06:40RSpliet: karolherbst: I don't have time to rush it and finish in two weeks
06:40karolherbst: mhhh well I think I do
06:40karolherbst: may be the last month I get that much time
06:40RSpliet: well, you know where to find my repository, have fun :-P
06:42tacchinotacchi: karolherbst: talking about fermi?
06:42tacchinotacchi: ew can't find any docs explaining MR value
06:42tacchinotacchi: can't find any docs on ddr3 actually
06:44RSpliet: https://www.google.co.uk/search?q=DDR3+device+operation+timing first hit
06:44RSpliet: and probably second hit as well
06:45tacchinotacchi: my best guess in the search box was "ddr3 specs sheet"
06:45tacchinotacchi: might be also that i use duckduckgo instead of google
06:47RSpliet: anyway, there's the documentation for nvkm/subdev/fb/sddr3.c , there's similar specifications for the other RAM types
06:50karolherbst: RSpliet: the memory stuff will be in the pmu script or is there something else?
06:50karolherbst: and the PBFB stuff I assume
06:52tacchinotacchi: am i supposed to know how electricity works?
06:53tacchinotacchi: because i don't
06:57RSpliet: karolherbst: "memory stuff" is a bit of a fuzzy term, could you clarify?
06:57karolherbst: well memory reclocking regs
06:58karolherbst: I thought it is inside the pmu scripts + those *FB areas
06:58RSpliet: yes, a memory reclock is fully contained within a PMU script
06:58karolherbst: ahh k
06:58RSpliet: so clock, timing, MR and all the other unknown FB registers
06:58karolherbst: okay
06:58karolherbst: so if I have the script I have everything I need?
06:58karolherbst: well "need"
06:58RSpliet: pretty much, NVIDIA does a few writes just outside the script
06:59RSpliet: we merged them in
06:59karolherbst: but I could take the script, execute it on my gpu and it should just work
06:59RSpliet: almost, for that reason
06:59RSpliet: but that script only works if the NVIDIA firmware is loaded
07:00karolherbst: yeah I know
07:00karolherbst: I just meant the reg writing parts of it
07:01karolherbst: but the script seems to be smaller than the one I saw on my kepler
07:01RSpliet: the scripts are generated "on the fly", so registers that have already been configured correctly are untouched
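The "generated on the fly" behaviour, where registers that already hold the right value are left out of the script, can be sketched as below. This is not nouveau's `ram_wr32` nor NVIDIA's generator; the struct, the function name, and the fixed capacity are all hypothetical, and the `current` argument stands in for reading the live register.

```c
#include <stddef.h>
#include <stdint.h>

#define SCRIPT_MAX 64 /* hypothetical capacity, in register writes */

/* Hypothetical script under construction: a list of queued writes. */
struct script {
    uint32_t reg[SCRIPT_MAX];
    uint32_t val[SCRIPT_MAX];
    size_t len;
};

/* Queue a write only when the register does not already hold the wanted
 * value. Returns 1 if a write was queued, 0 if it was skipped, and -1 if
 * the script is full. */
static int script_wr32(struct script *s, uint32_t reg,
                       uint32_t current, uint32_t wanted)
{
    if (current == wanted)
        return 0; /* already configured correctly: leave untouched */
    if (s->len >= SCRIPT_MAX)
        return -1;

    s->reg[s->len] = reg;
    s->val[s->len] = wanted;
    s->len++;
    return 1;
}
```

A generator built this way produces scripts of different lengths depending on the starting state, which would be consistent with a Fermi script looking shorter than one captured on a Kepler card.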
07:02karolherbst: ohh okay
07:02tacchinotacchi: now i can't even understand c
07:02karolherbst: RSpliet: what are those reg_last and val_last thingies?
07:02karolherbst: tacchinotacchi: time to learn it then :p
07:02karolherbst: no programming language is hard to learn anyway
07:02tacchinotacchi: static const struct ramxlat ramddr3_wr[] = {...
07:02RSpliet: karolherbst: consider them registers. It keeps track of the last touched reg/value for masking purposes etc.
07:02tacchinotacchi: what's with ramxlat
07:03karolherbst: tacchinotacchi: name of the type
07:03RSpliet: tacchinotacchi: get a decent IDE that indexes those symbols, you'll click right through to its definition
07:03karolherbst: :D
07:03RSpliet: or do as I do and use a non-decent IDE called eclipse
07:03karolherbst: :O
07:03tacchinotacchi: this is strange
07:03karolherbst: qt-creator is quite nice though
07:04tacchinotacchi: oh i'm so stupid i was reading it as c++
07:04RSpliet: anyway, too much distraction for me right now, bbl
07:04tacchinotacchi: i thought it was the definition of the type, not of an actual variable
07:04tacchinotacchi: bb
07:05tacchinotacchi: thanks for the hussle
07:05tacchinotacchi: hassle*
07:10karolherbst: hehe, the SEQ script is missing stuff :/
07:20tacchinotacchi: qt creator literally did the same reading mistake i did
07:20karolherbst: somebody broke the SEQ parser :/
07:20tacchinotacchi: the guy who wrote this gave the same name to struct ramxlat and function int ramxlat
07:23tacchinotacchi: apparently it was RSpliet :D
07:24karolherbst: yes, he was :D
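On the shared name that confused both tacchinotacchi and the IDE: in C, struct tags live in a separate name space from functions and variables, so a `struct` and a function may legally carry the same identifier. The sketch below shows the pattern with hypothetical names and a hypothetical sentinel-terminated table; it is not a copy of the nouveau code.

```c
/* Struct tag `xlat`: one entry mapping an id to an encoded value. */
struct xlat {
    int id;
    int enc;
};

/* Function `xlat`: same identifier, different name space, so no clash.
 * Looks up `id` in a table terminated by an entry with a negative id
 * (ASSUMPTION: sentinel-terminated table, chosen for illustration).
 * Returns the encoded value, or -1 if the id is not present. */
static int xlat(const struct xlat *table, int id)
{
    while (table->id >= 0) {
        if (table->id == id)
            return table->enc;
        table++;
    }
    return -1;
}
```

A line such as `static const struct xlat some_table[] = { ... };` therefore defines a variable of the struct type; it is the `struct` keyword that tells the compiler which of the two `xlat` names is meant.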
07:24karolherbst: RSpliet: this broke the SEQ parser ...
07:25karolherbst: RSpliet: https://gist.github.com/karolherbst/2ca68aae24dc381c9986
07:30tacchinotacchi: is NV_MEM_TYPE_STOLEN shared plain RAM?
07:31karolherbst: I would assume this
07:34tacchinotacchi: quite surprised this syntax for array initialization is accepted
07:34tacchinotacchi: static const char *name[] = {
07:35tacchinotacchi: [NV_MEM_TYPE_UNKNOWN] = "unknown",
07:35karolherbst: why shouldn't it?
07:35karolherbst: wtf is wrong with printf...
07:36tacchinotacchi: i knew *name[] = {"foo", "tie"};
07:36tacchinotacchi: with foo being index 0 and tie 1
07:36tacchinotacchi: i didn't know you could specify indexes like that
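The syntax that surprised tacchinotacchi is the C99 designated initializer, which lets each array element be set by explicit index in any order. A minimal sketch, using a hypothetical enum rather than nouveau's real `NV_MEM_TYPE_*` list:

```c
/* Hypothetical memory type enum, standing in for the driver's own. */
enum mem_type {
    MEM_TYPE_UNKNOWN,
    MEM_TYPE_STOLEN,
    MEM_TYPE_DDR3,
    MEM_TYPE_COUNT
};

/* Each entry names the index it initializes, so the order of the lines
 * does not matter and the table stays correct if the enum is reordered.
 * Indices that are not mentioned are initialized to NULL. */
static const char *mem_type_name[MEM_TYPE_COUNT] = {
    [MEM_TYPE_UNKNOWN] = "unknown",
    [MEM_TYPE_DDR3]    = "DDR3",
    [MEM_TYPE_STOLEN]  = "stolen system memory",
};
```

The positional form `{"foo", "tie"}` remains valid; the designated form is simply more robust when the indices come from an enum.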
07:45karolherbst: .... nooooo
07:46karolherbst: I guess nobody checks printf return codes right?
07:48tacchinotacchi: i would check the specification for printf
07:50tacchinotacchi: i wonder what kind of error it could give
07:50tacchinotacchi: that said, i don't think anyone would ever check for it, but you should ask someone else
07:50karolherbst: ohh it returns the string length :/
07:51tacchinotacchi: lel
07:51Tom^: karolherbst: yes.
07:51tacchinotacchi: negative numbers for errors
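The return convention being discussed applies to the whole `printf` family: the number of characters produced on success, a negative value on error. The sketch below uses `snprintf` instead of `printf` so the result can be inspected; the helper name and the output format are hypothetical and are not what the envytools SEQ parser prints.

```c
#include <stdio.h>

/* Format a register write into buf and check the return value.
 * snprintf returns the number of characters that would have been written
 * (excluding the terminating NUL), or a negative value on error.
 * Returns that length on success, -1 on error or truncation. */
static int format_reg_write(char *buf, size_t size,
                            unsigned reg, unsigned val)
{
    int n = snprintf(buf, size, "R[0x%06x] = 0x%08x", reg, val);

    if (n < 0)
        return -1; /* output or encoding error */
    if ((size_t)n >= size)
        return -1; /* output did not fit and was truncated */
    return n;
}
```

Checking the result this way at least distinguishes "the call failed" from "the call succeeded but the text went somewhere unexpected", which is the question karolherbst is trying to answer.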
07:51karolherbst: Tom^: you need some commits for that
07:51Tom^: isnt my branch still around? apply it there :P
07:52karolherbst: Tom^: nah, it doesn't make sense anymore to have it around :p
07:52Tom^: ;_;
07:52karolherbst: Tom^: are you on 4.4?
07:52Tom^: nah
07:53karolherbst: Tom^: fetch and cherry-pick those commits: c62c59910255b8ffaaa7c8945c6156be3af145c6, d9f1c7ec2e46c9059c8376e0c3155d998ec0185f and effcdb2a1ab980bff62139ccd473320eeb85f023
07:53Tom^: ok
07:54karolherbst: now I think not the seq parser is broken, but printf...
07:54Tom^: i guess volt isnt fixed yet
07:58karolherbst: you still need your branch
07:58karolherbst: just apply those commits on top of it
07:58tacchinotacchi: m sorry
07:59tacchinotacchi: is there a difference between ddr3 and gddr3?
07:59karolherbst: yes
08:00tacchinotacchi: 'bout the initialization syntax i was surprised by before
08:00tacchinotacchi: guides don't seem to know about it
08:00tacchinotacchi: neither does the IDE
08:13tacchinotacchi: bb
08:17RSpliet: karolherbst, tacchinotacchi: nope, the ramxlat name comes from one of the other developers. Some wise words of wisdom though:
08:17RSpliet: 1) don't blame people personally prematurely, it might piss them off
08:17RSpliet: 2) don't blame people personally for issues other than maybe regressions, there might be a very good reason for what is written (which includes legacy and new insights)
08:17RSpliet: 3) don't assume that things are a big issue just because your IDE doesn't like it.
08:17RSpliet: 4) if you don't like code-style, be proactive and propose patches. Looking backwards needlessly diminishes people's morale
08:18karolherbst: RSpliet: sorry
08:20karolherbst: RSpliet: I just did this: https://gist.github.com/karolherbst/1dc30613bcfb8834bad2 and the message doesn't get printed and bunch of others too
08:20RSpliet: It's all right, just trying to maintain a healthy relationship between people in #nouveau. CompSci's are people too ;-)
08:20karolherbst: any ideas though?
08:20karolherbst: yeah I know
08:20karolherbst: I also know that I am sometimes a bit too fast with my conclusions (and with sometimes I really mean like 95% of all cases)
08:21karolherbst: still, when I let a "" write into seq_out, it works
08:21karolherbst: so it's not your fault, but since your commit not all regs are printed ;)
08:21karolherbst: I bet there is some stupid pointer stuff going on or something ugly
08:21RSpliet: which commit broke things?
08:22karolherbst: https://github.com/karolherbst/envytools/commit/ddcd223f74b58418dc2d4ac3eeac906b29d0633b
08:22karolherbst: I have no clue why though
08:23karolherbst: I think I will bisect it though
08:23RSpliet: can you double-check with demmio -c, to strip out the colours
08:23RSpliet: less sometimes does silly things when escape chars are parsed
08:23karolherbst: yeah I already did
08:23karolherbst: the colors aren't at fault here
08:23karolherbst: and I don't use less
08:23karolherbst: I just pipe it out
08:23RSpliet: ok fair enough
08:24karolherbst: printf returns the right length though
08:24karolherbst: mhhh
08:24karolherbst: maybe I shall pollute the line
08:24RSpliet: I'm a bit busy right now, I hope to have some time later tonight to go fishing
08:24karolherbst: k
08:24karolherbst: I try to take care of the issue
08:25RSpliet: btw, the https://gist.github.com/karolherbst/2ca68aae24dc381c9986 has pound-symbols behind every line, those were introduced in the patch. Instead of old, it seems like you just removed the last parameter?
08:26karolherbst: yeah
08:26karolherbst: I just removed the name thing and added ""
08:26RSpliet: either way, it could fail if rnndb doesn't have a name for the register
08:26karolherbst: but why?
08:26karolherbst: I already checked gdb
08:26karolherbst: the string contains the stuff in my patch
08:26karolherbst: and the printf also doesn't print anything
08:26RSpliet: I believe I tested it to verify it returns an empty string, so doesn't make much sense to me
08:26RSpliet: printf is routed to a different channel I guess
08:27karolherbst: seq_out calls printf
08:28RSpliet: I guessed wrong :-D
08:28karolherbst: this just sounds like a stupid glibc bug
08:28karolherbst: or another stupid bug
08:28karolherbst: something stupid anyway
08:28RSpliet: route your prints to stderr for debugging purposes
08:28karolherbst: good idea
08:28RSpliet: they might not get interleaved nicely, but it's a start
09:08karolherbst: Tom^: did you already test that stuff? :p
09:08Tom^: nope :p
09:08Tom^: sauna is hot in uh 15 min, then im afk for ~40min. maybe after that
09:09karolherbst: :D
09:59karolherbst: RSpliet: what should info->mspll be?
10:00karolherbst: gf100_clk_info.mspll
10:00karolherbst: because I get some compile errors with your changes
10:15Tom^: karolherbst: so i had some deep thinking done in the sauna, should i remove the nvaboost option im setting , will these patches actually "boost" or just jump between pstates.
10:15Tom^: karolherbst: and if its boosting will it drop just like blob around 80C ?
10:18Tom^: karolherbst: and if i get 4.4 will these patches apply on your 4.4 branch. :p
10:23karolherbst: Tom^: nvboost just decides which max clock can be used
10:26Tom^: right so these patches will simply jump between the pstates then depending on load.
10:27karolherbst: and cstates, yes
10:28Tom^: so yea, these patches will apply on your 4.4 branch?
10:28Tom^: master_4.4 that is
10:29Tom^: oh nvm, that one doesn't have all the goodies from stable_reclock_kepler yet it seems.
10:30karolherbst: right
10:31Tom^: karolherbst: d9f1c7ec2e46c9059c8376e0c3155d998ec0185f didnt apply so yea :P
10:32karolherbst: right
10:32karolherbst: then something is missing
10:32Tom^: or im not doing this right
10:32karolherbst: pick 6302fc929e62aef4f997db153231df69c1b04c1a before
10:33karolherbst: which is also kind of important for that to work :D
10:37karolherbst: Tom^: does it work now?
10:39Tom^: nope
10:40Tom^: https://github.com/gulafaran/nouveau/tree/tom my branch is back ! =D but yea. they all report some conflict on it
10:40karolherbst: :D
10:41karolherbst: meh, then wait
10:42karolherbst: mhhh ohhh
10:42karolherbst: wait
10:42karolherbst: are you still on 4.3?
10:42Tom^: well yes, didnt you ask this like an hour ago =D
10:42Tom^: i can get 4.4 too if that makes things easier
10:42karolherbst: mhh because your tom branch seems odd :/
10:43Tom^: my tom branch is from your stable_reclock_kepler
10:43Tom^: so dont blame me
10:43karolherbst: ohhh
10:44karolherbst: then you have a 4.4 based branch now :p
10:44Tom^: im fetching mainline then, :P
10:45karolherbst: isn't there a 4.4 package already?
10:45karolherbst: because making it 4.3 compatible is piece of cake
10:45Tom^: not in arch
10:45Tom^: il just compile it anyways
10:45Tom^: takes at most 15 minutes
10:47karolherbst: because I am done now :p
10:47karolherbst: and in 10 seconds it would be also 4.3 compatible
10:48Tom^: mk :p
10:49tacchinotacchi: makepkg -s :v
10:49tacchinotacchi: i think 25 minutes for me
10:49karolherbst: tacchinotacchi: done
10:49karolherbst: ..
10:49karolherbst: Tom^: done
10:49karolherbst: :D
10:49tacchinotacchi: lel
10:49tacchinotacchi: i'm thinking i should change distro
10:50Tom^: get arch
10:50karolherbst: well I need like 5 minutes to compile my kernel
10:50tacchinotacchi: i have arch
10:50Tom^: karolherbst: fine il refork you then :P
10:50Tom^: because learning rebasing and what not is not for today.
10:50tacchinotacchi: that's why it takes me 25 mins to compile kernel
10:50karolherbst: do you actually configure the kernel on arch?
10:50tacchinotacchi: no
10:50Tom^: uhm why would it go faster on any other distro :/
10:50Tom^: i do
10:50karolherbst: but my kernel is pretty small
10:50tacchinotacchi: i use default set with ck
10:51tacchinotacchi: do you actually go through disabling ALL modules everytime?
10:51karolherbst: ?
10:51karolherbst: no?
10:51karolherbst: you can just take your old config?
10:52karolherbst: but my kernel is like 7.5MB big, lz4 compressed
10:52tacchinotacchi: ok then i just don't feel like configuring the kernel
10:52tacchinotacchi: :D
10:52karolherbst: but I also have most of the stuff builtin
10:52karolherbst: and nearly no modules
10:52tacchinotacchi: well mine is 4,1
10:53tacchinotacchi: do you mean just vmlinuz?
10:53karolherbst: tacchinotacchi: I guess everything is a module?
10:53Tom^: karolherbst: did you make it 4.3 or is it 4.4?
10:53karolherbst: Tom^: your tom branch is 4.3
10:53Tom^: im aborting this kernel compilation then ;)
10:53tacchinotacchi: yes probably all drivers are modules
10:53karolherbst: tacchinotacchi: well vmlinux is 21MB here, but I disabled Os
10:53karolherbst: and practically everything is built in and no module
10:54tacchinotacchi: disabled Os?
10:54karolherbst: well you can optimize for size
10:54karolherbst: but who wants that actually
10:54karolherbst: my entire modules directory uses like 16 MB
10:54Tom^: bootup speeds.
10:54imirkin: karolherbst: people with cpu's that don't have infinite caches
10:54Tom^: =D
10:54tacchinotacchi: pff boot speed
10:54karolherbst: imirkin: :O
10:55karolherbst: imirkin: but then you also don't compile your own kernel
10:55tacchinotacchi: how much time does it take even a slow hdd to load a kernel image
10:55imirkin: karolherbst: huh?
10:55karolherbst: tacchinotacchi: <5 seconds
10:55tacchinotacchi: Plasma instead is a killer, takes 40 seconds at least from my boot
10:55Tom^: "Startup finished in 8.586s (firmware) + 988ms (loader) + 1.551s (kernel) + 1.725s (userspace) = 12.852s"
10:55karolherbst: :D
10:55tacchinotacchi: the kernel image is not even something fragmented
10:56tacchinotacchi: it's contiguous
10:56tacchinotacchi: reading is super-fast
10:56karolherbst: crappy HDD: Startup finished in 3.227s (firmware) + 3.530s (loader) + 8.368s (kernel) + 31.895s (userspace) = 47.021s
10:56karolherbst: but something is weird with my loader, seriously
10:56tacchinotacchi: Startup finished in 4.184s (firmware) + 5.704s (loader) + 4.527s (kernel) + 22.611s (userspace) = 37.027s
10:56tacchinotacchi: this one didn't take plasma boot in consideration though
10:56karolherbst: yeah and I have mysql and samba starting
10:57karolherbst: :D
10:57Tom^: boys, ssd.
10:57Tom^: ;)
10:57karolherbst: nah, ssds are dead for me
10:57karolherbst: I broke three already
10:58tacchinotacchi: fast hdd
10:58karolherbst: and I have like 5 working HDDs, each of which is older than my 3 broken ssds together
10:58Tom^: O_o i still got my corsair f60 from like 5 years ago. according to smartdata ive written around 10TB to it.
10:58karolherbst: 10TB is like nothing :D
10:58tacchinotacchi: programmer = many writes uh?
10:58MichaelLong: karolherbst, you are "using it wrong" :P
10:58karolherbst: one of my hdd has like 11025 hours power on :D
10:59tacchinotacchi: doesn't sound like much to me though
11:01karolherbst: well 11000 hours is already a big deal for "non-professional" ones though
11:02karolherbst: though I bet 50k would be impressive
11:03Tom^: Power_On_Hours -O--CK 040 040 000 - 52985
11:03Tom^: 320gb wd black.
11:03Tom^: ;)
11:04karolherbst: :D
11:04karolherbst: yeah those blacks
11:04karolherbst: awesome
11:04karolherbst: I think I only have a blue
11:04Tom^: got two of them with nearly identical poweron time. never had an issue with them
11:04karolherbst: yeah WD is awesome
11:05karlmag: I've only bought NAS ones lately (except ssds, that is)
11:05karolherbst: Tom^: ohh no, mine is an AV-25 one :/
11:06karolherbst: the 1TB one that is, my 500GB is a blue and my 320GB one is a seagate, which is crappy because it is a slim one
11:07tacchinotacchi: i need help on this even though it's the introduction
11:07tacchinotacchi: http://envytools.readthedocs.org/en/latest/conventions.html
11:07tacchinotacchi: (on the ram specs completely clueless)
11:08imirkin: hm, i'm only at ~49k hours
11:08Tom^: karolherbst: you broke pstatectl
11:08tacchinotacchi: "2-input operation 0x4 (0b0100) is ~v1 & v2"
11:08imirkin: a bunch of 2T drives
11:09Tom^: karolherbst: its in debugfs which is only readable by root, and libnotify cant show as root because of missing user dbus :P
11:09imirkin: tacchinotacchi: what's the question?
11:09tacchinotacchi: how exactly did it get "~v1 & v2" from 0x4
11:09tacchinotacchi: and how can 0x4 be an operation
11:09karlmag:wonders how smart it would be to use a ssd for compile disk...
11:10tacchinotacchi: karlmag: lel
11:10orbea: ysy, I compiled a 4.4.0 kernel and now I can clock up to 0d :)
11:10orbea: *yay
11:10imirkin: tacchinotacchi: experimentally
11:10imirkin: tacchinotacchi: there are ops aka "logic op", which take a number, which indicates which op they are. this is the table of those ops.
11:11Tom^: karolherbst: yea its working otherwise, besides this annoying flicker that sort of has to be fixed. dynamic reclocking sort of makes it flicker quite often ;)
11:11mwk: tacchinotacchi: 0x4 is 0b0100 in binary
11:12mwk: and you're thinking of a 2-input binary function
11:12mwk: there are 4 2-bit combinations: 00, 01, 10, 11
11:12mwk: so, there are 4 possible inputs to such a function
11:12Tom^: karolherbst: im not going above 1072 core clock tho
11:12mwk: the functions are simply numbered by encoding outputs for all possible inputs
11:13mwk: bit 0 of function number is output for 0b00 input, bit 1 of function is output for 0b01, bit 2 is 0b10, bit 3 is 0b11
11:13mwk: now, consider 0x4 == 0b0100
11:13mwk: bit 0 is 0, so function(0, 0) == 0
11:13mwk: bit 1 is 0, so function(1, 0) == 0
11:14mwk: bit 2 is 0, so function(0, 1) == 1
11:14mwk: er, bit 2 is 1
11:14mwk: bit 3 is 0, so function(1, 1) == 0
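A minimal Python sketch of the bit numbering mwk walks through above, assuming bit `v1 + 2*v2` of the code holds the output for inputs `(v1, v2)`. The helper name `truth_table` is made up for illustration and is not taken from envytools.

```python
def truth_table(code, n_inputs=2):
    """Decode an operation code into a list of (inputs, output) rows.

    Assumption (taken from the explanation above): bit i of the code is the
    output for the input combination whose binary encoding is i, with the
    first input stored in the lowest bit.
    """
    rows = []
    for index in range(1 << n_inputs):
        inputs = tuple((index >> k) & 1 for k in range(n_inputs))
        rows.append((inputs, (code >> index) & 1))
    return rows


for (v1, v2), out in truth_table(0x4):
    # Print the decoded output next to ~v1 & v2 reduced to a single bit.
    print(f"function({v1}, {v2}) == {out}   (~v1 & v2 == {(~v1 & v2) & 1})")
```

Only the row for `v1 = 0, v2 = 1` comes out as 1, which is why 0x4 is listed as `~v1 & v2`.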
11:14karolherbst: Tom^: you won't need it anymore anyway :p
11:14Tom^: karolherbst: nor do i seem to drop down from 0d
11:14tacchinotacchi: no i don't get the part of 0b0100 in this :\
11:14tacchinotacchi: two inputs
11:14imirkin: tacchinotacchi: 0bXXXX is a way to denote binary
11:15tacchinotacchi: you change all bits to v1, then do an AND with v2
11:15karolherbst: Tom^: k
11:15tacchinotacchi: why is 0x4 any important in this other than being the opcode of the operation
11:15karolherbst: Tom^: dmesg
11:15imirkin: tacchinotacchi: it defines the op
11:15mwk: tacchinotacchi: because it's also the truth table of the operation
11:15imirkin: tacchinotacchi: 0x4 is effectively the truth table of the boolean op
11:15mwk: you know what a truth table is, right?
11:16imirkin: (if mwk and i said it at the same time, it _must_ be right ;) )
11:16karolherbst: Tom^: and I think I forgot something
11:16tacchinotacchi: yes you're right
11:16Tom^: karolherbst: ignore the temp, i did nvaforcetemp to see if its the gpu fan vibrating. https://gist.github.com/anonymous/81304abd5ab931ae27b2
11:16karolherbst: ahh no, should be fine
11:16tacchinotacchi: did the programmers just make that up or the gpu knows what operation to do based on the truth table?
11:17mwk: tacchinotacchi: the GPU just uses the operation code as a truth table
11:17karolherbst: Tom^: boot with nouveau.debug=clk=debug
11:17Tom^: mk
11:17tacchinotacchi: so it doesn't care of what operations brought you to that truth table
11:17mwk: I've listed the 2-input operations for an example, but GPU also uses 3-input and 4-input functions
11:18mwk: yes
11:18mwk: if two operations have the same truth table, they're the same operation
11:18mwk: like ~(a | b) and ~a & ~b
11:18mwk: same thing
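To illustrate the point that two expressions with the same truth table are the same operation, here is a small Python sketch that goes the other way: it builds the code from a boolean function. `op_code` is a hypothetical helper, and it assumes the same bit order as above (first input in the lowest bit of the index).

```python
def op_code(func, n_inputs=2):
    """Build an operation code by evaluating func on every input combination."""
    code = 0
    for index in range(1 << n_inputs):
        inputs = [(index >> k) & 1 for k in range(n_inputs)]
        # Keep only the lowest bit, since ~ on Python ints yields negatives.
        code |= (func(*inputs) & 1) << index
    return code


nor_code = op_code(lambda a, b: ~(a | b))
de_morgan_code = op_code(lambda a, b: ~a & ~b)
print(hex(nor_code), hex(de_morgan_code))
```

Both expressions produce the same code, so from the hardware's point of view there is only one operation.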
11:19Tom^: karolherbst: https://gist.github.com/anonymous/199456170ca617d9c2c3
11:20mwk: tacchinotacchi: basically, the GPU computes the operation exactly as I specified it in the big pseudocode
11:20mwk: think of the operation code as a truth table
11:20mwk: 0x4 == 0b0100
11:20mwk: better yet, let's make it an array of bits, starting from the right: [0, 0, 1, 0]
11:21karolherbst: Tom^: well if you don't have any load it should be at the lowest clock
11:21mwk: it constructs an index into this array from the input operands, and just picks the cell corresponding to the combination
11:21karolherbst: Tom^: cat current_load in the debugfs directory
11:21Tom^: karolherbst: mk il try again then
11:21mwk: eg. inputs 0 and 1 are encoded as 0b10 (again from the right), which is 2... and entry 2 of table [0, 0, 1, 0] is 1
11:22mwk: so, the operation code is really a LUT
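The lookup described above can be sketched in Python as follows. The function names are hypothetical, and applying the table independently at every bit position of whole words is an assumption made for illustration, based on the ops being described as bitwise; it is not a claim about how any particular hardware unit is wired.

```python
def lut_lookup(code, *inputs):
    """Pick one output bit: build an index from the input bits, then read it."""
    index = 0
    for position, bit in enumerate(inputs):
        index |= (bit & 1) << position  # first input lands in the lowest bit
    return (code >> index) & 1


def apply_bitwise(code, width, *operands):
    """Apply the same lookup at each bit position of multi-bit operands."""
    result = 0
    for bit in range(width):
        bits = [(operand >> bit) & 1 for operand in operands]
        result |= lut_lookup(code, *bits) << bit
    return result


print(lut_lookup(0x4, 0, 1))                        # entry 2 of the table
print(bin(apply_bitwise(0x4, 4, 0b0101, 0b0011)))   # same as (~v1 & v2) on 4 bits
```

Inputs 0 and 1 give index 2, and entry 2 of the 0x4 table is 1, matching the example in the conversation.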
11:22karolherbst: Tom^: I mean I could have done something odd though
11:22Tom^: karolherbst: uhm how do i parse current load?
11:22tacchinotacchi: no i really don't get it this way
11:23Tom^: karolherbst: https://gist.github.com/anonymous/05e33c22c4d3359c16fa
11:23mwk: which, when performed in hardware, is simply a multiplexer... fast
11:23tacchinotacchi: how does the gpu know which one to alternate (which input to do 1100 for, which 1010)
11:24mwk: what do you mean by "do 1100"?
11:24karolherbst: Tom^: 1/255
11:24tacchinotacchi: you have column for input one
11:24tacchinotacchi: the rows are, from top to bottom, 1100
11:24tacchinotacchi: for input two 1010
11:25karolherbst: Tom^: k, then nvapeek 0x10a500 0x80 on the blob
11:25karolherbst: actually I never saw the values from the blob for those regs
11:25tacchinotacchi: and then you make the operation and see the four values as a result
11:25mwk: tacchinotacchi: it always goes 010101010101010101.... for the first
11:25mwk: 0011001100110011 second
11:25mwk: 000011110000111100001111... third
11:25mwk: and so on
11:25tacchinotacchi: thanks
11:25Tom^: karolherbst: core 0 , mem 94 but im still at 0d on 548mhz and 6999 mem
11:26karolherbst: yeah cause of the mem load :/
11:26karolherbst: I need to know those counter configuration from the blob driver
11:26karolherbst: memory is like real hard to get right
11:29Tom^: karolherbst: https://gist.github.com/anonymous/92b6c8cc8d25d7535885
11:30orbea: and im still getting video hiccups :\
11:31mwk: tacchinotacchi: btw, this means that eg. 0xaaaa is a 4-input function that just returns the first input
11:31mwk: 0xcccc is return second input, 0xf0f0 is third, 0xff00 fourth
11:31karolherbst: Tom^: those memory stuff, all right
11:32mwk: and a useful way of obtaining the function codes is such: suppose you want the code for v1 & v2
11:32mwk: then you just compute 0xaaaa & 0xcccc == 0x8888, the code for v1 & v2 as a 4-input function
11:32karolherbst: Tom^: update tom branch and recompile
11:33mwk: likewise, the code for a XOR (v1 ^ v2) is 0xaaaa ^ 0xcccc == 0x6666
11:33tacchinotacchi: why a and c
11:33tacchinotacchi: look like hexadecimal to me
11:33tacchinotacchi: but it wouldn't make any sense
11:33mwk: a proper 4-input AND (v1 & v2 & v3 & v4) is: 0xaaaa & 0xcccc & 0xf0f0 & 0xff00 == 0x8000
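The column-mask trick can be checked with a few lines of Python. The constant names are made up for this sketch; the values are the 4-input codes quoted in the conversation.

```python
# Each constant is the truth-table column of one input, read as a 16-bit code.
V1, V2, V3, V4 = 0xAAAA, 0xCCCC, 0xF0F0, 0xFF00
MASK = 0xFFFF  # keep results to 16 bits, needed after ~ on Python ints

and2 = V1 & V2                    # v1 & v2 as a 4-input function
xor2 = V1 ^ V2                    # v1 ^ v2 as a 4-input function
and4 = V1 & V2 & V3 & V4          # AND of all four inputs
not_v1_and_v2 = ~V1 & V2 & MASK   # the 2-input code 0x4, repeated four times

print(hex(and2), hex(xor2), hex(and4), hex(not_v1_and_v2))
```

Writing the desired expression directly on the column constants yields the operation code without drawing the truth table by hand.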
11:34mwk: 0xa is 0b1010, ie. alternating bits
11:34mwk: 0xc is 0b1100, alternating every second bit
11:34mwk: again, think of a truth table
11:34tacchinotacchi: facepalm
11:34tacchinotacchi: you are very patient
11:35mwk: hmm
11:36mwk: I'd really love an example right now...
11:36tacchinotacchi: why always 4 hex digits ? shouldn't they be dependent on the number of inputs?
11:36tacchinotacchi: like for 2 input = 4 possibilities = 1 hex digit
11:37mwk: that's correct
11:37mwk: 2-input functions have 1 hex digit, 3-input have 2 hex digits, 4-input have 4
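The digit counts follow from the table size: n inputs give 2^n table entries, one bit each, and a hex digit holds 4 bits. A small Python sketch, with `code_width` as a hypothetical helper name:

```python
def code_width(n_inputs):
    """Return (bits, hex_digits) needed for an n-input operation code."""
    bits = 1 << n_inputs            # one output bit per input combination
    hex_digits = max(1, bits // 4)  # 4 bits per hex digit, at least one digit
    return bits, hex_digits


for n in (2, 3, 4):
    print(n, code_width(n))
```

For two or more inputs the bit count is a multiple of 4, so the code always fits a whole number of hex digits.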
11:38mwk: and I haven't seen any bigger size so far
11:38Tom^: karolherbst: ugh i need a new fan for my gpu
11:39tacchinotacchi: intuition suggests since 16 is power of two, it should always be possible with an integer number of hex digits
11:39tacchinotacchi: but i'm not sure
11:40mwk: tacchinotacchi: here's a 4-input example: https://gist.github.com/koriakin/d221225bab17e7f8fa03
11:40mwk: so, there's a truth table
11:40mwk: the first column of the truth table (input 0) is always 0, 1, 0, 1, 0, 1
11:40mwk: second is always 001100110011...
11:40mwk: third is 0000111100001111 and so on
11:41tacchinotacchi: thanks again
11:41mwk: to get the operation code for whatever operation you want, you need to write such a truth table
11:41mwk: then read the column of your operation
11:41mwk: and write it down from the right
11:41tacchinotacchi: well i'd guess it's not like that for every opcode
11:42mwk: in the example, v0 & v1 & v2 & v3 is 0, 0, 0, 0, ...., 0, 0, 1
11:42tacchinotacchi: what about add when you have carry?
11:42tacchinotacchi: maybe boolean ones
11:42mwk: which is 0b1000000000000000 == 0x8000
11:42mwk: add? add is not a bitwise operation, you can't encode it like that
11:43tacchinotacchi: yes that's what i was saying
11:43tacchinotacchi: but needed someone to be sure
11:44mwk: I'm not sure which use of the bitwise ops you have in mind
11:44mwk: the encoding is used in at least 4 places that I know of
11:44mwk: are you working on the perf counters, raster ops, one of the ISAs with bitwise op operation?
11:45tacchinotacchi: not working on anything, i'm reading the documentation to learn
11:45mwk: then you started in a scary place :p
11:45tacchinotacchi: but as you can see when it comes to low level i'm already lost in the introduction
11:46mwk: that page just contains a few common parts I've extracted from perf counter / vp1 ISA / raster output docs
11:46mwk: I had nowhere to stuff that
11:46tacchinotacchi: is it the same for bitwise operations when you do regular assembly for the cpu
11:46tacchinotacchi: ?
11:46mwk: depends on the CPU
11:47mwk: some of them do use a similar encoding
11:47karolherbst: Tom^: why do you need a new fan?
11:47tacchinotacchi: all x86 should do the same with same opcodes
11:47tacchinotacchi: same with all arms, with amd64 etc.
11:47mwk: but mostly they conserve opcode space by only exposing a handful of these operations
11:48mwk: and CPUs usually don't have 3-input bitwise ops, too rare to be useful
11:48imirkin: or any 3-input ops for that matter
11:48mwk: fused multiply-add is 3-input...
11:48RSpliet: karolherbst: Microsoft PLL(c)
11:48imirkin: how many CPUs have that?
11:48mwk: compare-and-swap too
11:48RSpliet: or seriously, "memory source PLL"
11:49mwk: imirkin: modern x86 does
11:49RSpliet: aga, the one that goes into the mpll
11:49RSpliet: *aka
11:49tacchinotacchi: when the gpu does? is it atomic?
11:49imirkin: mwk: right :) so... not a lot
11:49karolherbst: RSpliet: k, I just got a lot of compile errors I have to dig through, I bet most of them are caused by a crappy 4.4 porting or something
11:49mwk: touche
11:49imirkin: or rather, it's a very recent development
11:49tacchinotacchi: i mean, gpus have 3 input ops in their lookup tables or they simulate that with multiple operations?
11:49RSpliet: karolherbst: no, I just never did a compile-check
11:50karolherbst: :D
11:50karolherbst: k
11:50imirkin: tacchinotacchi: gpu's have 3-input ops... have for a while
11:50RSpliet: like formatting for documents, that's usually one of the last things I do
11:50mwk: tacchinotacchi: depends on the place in the GPU
11:50karolherbst: RSpliet: I bet you won't find much time to clean it up? At least not the next days?
11:50mwk: generally, none of the on-gpu processors have 3-input bitwise ops... they tend to be too useless
11:50tacchinotacchi: well i hope the driver guys don't have to deal with that
11:51tacchinotacchi: i mean knowing whether a gpu has 3 input bitwise ops or simulates them
11:51mwk: the only use of 3-input ops that I know of is in the 2d raster output, which is ancient stuff
11:51RSpliet: karolherbst: maybe I'll see if I can do a pass tonight if you like
11:51karolherbst: would be awesome, then I have something to do for the next weeks
11:51tacchinotacchi: i'll leave you in peace for a while
11:51tacchinotacchi: bb
11:52karolherbst: Tom^: so any changes yet?
11:52RSpliet: karolherbst: in particular, if you could make a start on DDR3 reclocking for Fermi that'd be swell :-P
11:52karolherbst: I have a ddr3 fermi card here, yes :p
11:53tacchinotacchi: oh i happen to have one
11:53tacchinotacchi: lel
11:53RSpliet: karolherbst: the memory script I have is derived from a single GDDR5 card, so I think you can do some serious work there
11:54tacchinotacchi: if you wait a few years and my laptop doesn't die you might see an imp of mine of the core reclocking lol
11:54karolherbst: RSpliet: k, I would like to have that one and check against my DDR3 script
11:54karolherbst: then
11:54karolherbst: RSpliet: by the way, do you have one upclock and one downclock one?
11:55RSpliet: script? no, one script for every action
11:55karolherbst: k
11:55RSpliet: see GT215 for how that is done the right way
11:55karolherbst: I am already thinking about redoing the entire thing for fermi, because it just didn't fit with my script :/
11:56RSpliet: I wouldn't go that far
11:56karolherbst: RSpliet: I thought you meant a SEQ script from the trace
11:56RSpliet: yeah, one seq script for "reclock" whether it's up or down
12:04karolherbst: ä #
12:08Tom^: karolherbst: yea now im always on 07 :p
12:09Tom^: karolherbst: core: 253 , mem: 36, video: 0, pcie: 40 , and on 07, i did however see it flicker so i suspect it went up a pstate but down again.
12:10karolherbst: Tom^: yeah well, depends on what you do
12:10karolherbst: if you vsync it should be not too high clocked, because it only has to calculate 60 fps anyway
12:13Tom^: karolherbst: 5 glxspheres, the fifth one ran at like 3 mpixel :p
12:13Tom^: karolherbst: and after a while it all froze https://gist.github.com/0797e021cbc2a99ddab4
12:13karolherbst: Tom^: well you could also check the pstate file how high it clocks
12:13Tom^: i was, 07
12:14karolherbst: yeah there seems to be a bug in the request handling code :/
12:15karolherbst: Tom^: could you boot with nouveau.debug=clk=trace the next time?
12:15Tom^: sure, i however have to go to bed :/
12:16Yoshimo: karol, what are you working on these days?
12:17karolherbst: reclocking stuff
12:19Tom^: karolherbst: clk=trace doesnt seem to change dmesg output
12:20karolherbst: it should :O
12:20Tom^: hehe having just one glxspheres open kinda makes it constantly flicker because it jumps between some cstate
12:21karolherbst: but dmesg should be filled with all kind of stuff
12:21Tom^: oh wait i typoed the module
12:23Tom^: karolherbst: https://gist.github.com/anonymous/f29193049b805184e7be
12:24karolherbst: better
12:24Tom^: il leave it at that, time for bed.
12:25karolherbst: yeah I did something wrong :D
12:56RSpliet: wow, this is an actual thing, isn't it: https://pbs.twimg.com/media/CYV07YpWYAA67Be.png ?
13:31tacchinotacchi: RSpliet: maybe it's only for intel graphics?
14:03mwk: RSpliet: nice one
14:03mwk: with a lock... fitting
14:37RSpliet: karolherbst: I pushed a mountain of compile fixes, but bear in mind
14:37RSpliet: - half of what you find in the GDDR5 script is wrong, old, stale or needs to be verified; use with caution and don't try to blindly run it
14:38RSpliet: - I most likely fucked up the voltage GPIO naming and conditions
14:38RSpliet: - none of it is tested on an actual machine; random broken-ness is plausible
14:40RSpliet: ... oh, hm. well, if anyone sees him again, tell him to check the IRC logs
14:41hakzsam: okay :à
14:41hakzsam: *:)
15:49chillfan: hm so i'm using the stable_reclocking_kepler branch.. just updated it and the kernel, missing the 'pstate' interface from sys
15:49chillfan: has it maybe moved?
15:51chillfan: ah, i have in dmesg: [ 118.259195] nouveau: unknown parameter 'pstate' ignored
15:54chillfan: brb
15:54hakzsam: chillfan, yeah, this has been recently removed
15:54hakzsam: https://github.com/karolherbst/nouveau/commit/bf55b04043389bb648236e6be7d643f60bc4dcb6
15:54imirkin_: chillfan: the file is in debugfs now, no need for the param
15:54hakzsam: btw, the wiki should be updated
15:54hakzsam: http://nouveau.freedesktop.org/wiki/KernelModuleParameters/
15:55imirkin_: yeah, that page needs some serious updates since the 4.3 rewrite where ben changed all the names... again =/
15:56chillfan_: hm still missing it, tried rebuilding
15:57imirkin_: chillfan_: it's gone. you don't need it anymore.
15:57chillfan_: oh, so erm, what's the right way to check pstate etc?
15:57imirkin_: it's in debugfs now
15:57imirkin_: /sys/kernel/debug/dri/0/pstate
15:58chillfan_: ah okay, i see it thanks
15:59chillfan: brb then, have a bug to report anyway :)
16:11chillfan: ok, it's basically a screen corruption problem, the issue happens either with or without compton enabled using vsync
16:12chillfan: will find somewhere to post the desktop recording
16:13chillfan: http://expirebox.com/download/f24cbfea91788a91f476b2c7b07fb42c.html .. graphics card is gtx 780ti, latest nouveau stable_reclocking_stable branch
16:13chillfan: kernel 4.4
16:13chillfan: from karols branch
16:18chillfan: also occurred before update, just updated there to make sure it wasn't already fixed
16:19chillfan: brb
17:53mwk: well that was fun.
17:55mwk:managed to get a bit-perfect software implementation of G80's rcp instruction after lots of poking
17:56mwk: lovely squaring algorithm and likewise lovely fudge factor in the rounding step
17:56imirkin_: that sounds like it was fun
17:57imirkin_: i wonder if that's how it actually works
17:57mwk: it is
17:57imirkin_: or if you missed some simpler explanation
17:57mwk: there's an actual paper from nv describing how it works
17:57mwk: I just had to figure out the details
17:57imirkin_: ah :)
17:57mwk: http://pctuning.tyden.cz/ilustrace3/soucek/g80/paper-164.pdf
17:58mwk: they do mention an ugly fudge factor for rounding, but the square thing took me by surprise
17:59mwk: imagine a 17x17 squarer, where you discard the lower 17 bits [because you're operating on fractions]
17:59mwk: they mention hacking off lower input bits in the paper
18:00mwk: what they really do... take all partial products in the squarer that contribute less than 0.5 to the output, and discard them
18:01mwk: it works like that: https://github.com/envytools/envytools/blob/master/nvhw/sfu.c#L41
18:01mwk: imirkin_: it must be working exactly this way, changing any smallest detail results in lots of test failures
18:02mwk: fudge factor 0x47e7 is exactly correct, both 0x47e6 and 0x47e8 have dozens of failures in [1, 2) range
18:03imirkin_: i believe you :)
18:05mwk: now I just need the other 9 SFU functions
18:06mwk: I wonder if they ever changed the algorithms, I only checked on a G200 so far
19:26orbea: I got this trying to play a video with mpv, any ideas? http://dpaste.com/13MZ2CT
19:28orbea: hmm, it only happens when the video is in a .rar (Which mpv normally can play fine)
19:29orbea: and another video in a .rar works fine
19:30orbea: and I cant repeat it...
19:46imirkin: this scientific experiment isn't going the way i planned...
19:46imirkin: i might have to take matters into my own hands
19:47imirkin: my theory was that by disabling opt in st_glsl_to_tgsi i could speed up compiles
19:47imirkin: but... it's not panning out
19:49airlied: imirkin: didn't we discover disabling it caused bugs before?
19:50imirkin: airlied: yeah, but i figured out why
19:50imirkin: airlied: so i left in the critical bit
19:50imirkin: which is the last part of the dead code elimination
19:50imirkin: which just does a once-over the instructions and deletes totally dead ones
19:51airlied: maybe the overhead of converting tgsi->nv50 ir is > than the overhead of removing dead stuff
19:52imirkin: airlied: well, i haven't benchmarked it with *just* the dce thing
19:52imirkin: only larger blocks
19:52imirkin: and i found a bug in the GlobalCSE thing which took up most of the time i was going to futz with the various conversion logic
19:53imirkin: i did recover a lot of the perf loss by disabling some extra additional stuff... i'm also wondering if i can nuke the common optimizations
19:54imirkin: and i'm going to make LocalCSE not be O(n^2) anymore, maybe that'll help
19:57imirkin: airlied: there's also the various glsl opt that gets done which could be removed... i have things to play with :)
20:01imirkin: although one interesting tidbit is that taking overall timing stats, nouveau is 20% faster at compiling than i965
20:01imirkin: obviously a bunch of that is shared logic
20:58Guest11139: hm,i want to use nouveau,but it is not working,it is my xorg.log http://codepad.org/jw7z5b8f ,please help me .thanks I looked up a lot of information, but did not solve the problem,if u need more,please tell me ,thanks
21:02Guest11139: dmesg |grep nouveau http://codepad.org/vTDxpwFF
21:04imirkin: Guest11139: do you have any screens connected to the nvidia card?
21:06Guest11139: no
21:06imirkin: Guest11139: that's probably why X doesn't start on its own... what are you trying to achieve?
21:08Guest11139: i have two cards ,i use intel ..but nouveau is not work
21:15imirkin: Guest11139: i still have no clue what you're trying to achieve
23:15tiny001: 1
23:18tiny001: i have two videocards(intel & nvidia),intel working normal,nvidia (use nouveau)is not working ,it is my xorg.log http://codepad.org/AF5N0bs3 ,if u need more ,please tell me ,thanks
23:22towo^work: if you can't disable the igp in your firmware, you can't use the nvidia chip in that way
23:23towo^work: if you want the nvidia chip, you can use dri offloading for rendering programs on the nvidia chip
23:24tiny001: it is my kernel configure http://codepad.org/50jSNsA5
23:25imirkin: tiny001: you still haven't said what it is you're trying to do and what isn't working
23:25imirkin: try to be specific
23:27towo^work: tiny001, the kernel doesn't have much to do with your situation
23:28tiny001: ok,,thanks,,i am a chinese,my english is poor,but i will try my best to specific
23:29tiny001: I don't know where to set the wrong,
23:30towo^work: you use a xorg.conf snippet, where you set nouveau as driver
23:30imirkin: tiny001: what is your goal? why are you trying to force the nouveau ddx to load in your xorg config? what are you hoping will happen?
23:30towo^work: this can't work in optimus setups, if you can't disable the IGP
23:31imirkin: tiny001: perhaps you might benefit from looking at http://nouveau.freedesktop.org/wiki/Optimus/
23:31imirkin: there's an automatic translation feature, no idea how well it works
23:32tiny001: if xorg.conf setting videocards=nouveau,not use intel ,it is not working
23:33tiny001: i hope just one nouveau ,it is to working
23:33imirkin: what would you expect would happen? you're telling it to use a chip which doesn't have any displays connected
23:34imirkin: as far as i can tell, everything's working as expected
23:34imirkin: read the wiki page i linked to.
23:35tiny001: i read it yet,but I can't fully understand
23:36imirkin: try using the translate widget that's built into it... it uses google translate
23:48tiny001: 1
23:50tiny001: I do not want to switch between the two, as long as one(just nouveau ) can use
23:53tiny001: i know that page can use google translate,but I don't know where to set the right after I finish reading it.