USB3FPGA: (1) More memory (2) Logic analyzer app (sump.org)

    • USB3FPGA: (1) More memory (2) Logic analyzer app (sump.org)

      Hello world,

      I have inherited an unfinished application where the USB3FPGA may be a good fit, subject to positive answers to a couple of questions, but I am new to FPGAs and VHDL so am looking to increase confidence before suggesting USB3FPGA to Management. This had been a "start from scratch" project but due to resource changes we are now looking at our options to buy rather than build in order to save time and decrease risk.

      The application isn't trivial but seems quite simple in FPGA terms, conceptually similar to a slightly cut-down version of the Spartan-based logic analyzer at sump.org [1], which I discovered very recently (long after this project began). One difference is that the data to be recorded in any given location is based on the data already recorded earlier in the run, i.e. each cycle to be recorded needs a read-modify-write (an increment) rather than just a write. Another difference is that there is some preprocessing to ignore certain incoming cycles. The clock will be an external one at around 40MHz. If you were to imagine this as a "memory access profiling" machine (to help identify hotspots) you would be surprisingly close.
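
      As a purely behavioural illustration of the recording scheme described above (not FPGA code), the following Python sketch models the filter followed by the per-address increment. The function name `profile`, the `keep` predicate and the sample trace are hypothetical; the real preprocessing rule is not specified in this thread.

```python
from collections import Counter

def profile(cycles, keep):
    """Count how often each address is seen, skipping filtered cycles.

    cycles: iterable of (address, status) pairs, one per observed bus cycle.
    keep:   predicate on the status bits; stands in for the preprocessing step.
    """
    counts = Counter()
    for address, status in cycles:
        if not keep(status):
            continue           # preprocessing: this cycle is ignored
        counts[address] += 1   # read-modify-write: read, increment, write back
    return counts

# Hypothetical trace: status bit 0 set marks a cycle worth recording.
trace = [(0x10, 1), (0x20, 0), (0x10, 1), (0x30, 1), (0x10, 0)]
hotspots = profile(trace, keep=lambda status: status & 1)
```

      In hardware, the `counts[address] += 1` line is the part that costs a read and a write on the SRAM within one bus cycle.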

      So, the two questions:

      Q1: The USB3FPGA uses a Cypress CY7C1061 SRAM @ 10ns which (as a mostly-software person) I think should be up to the job of RMW @ 40MHz, and indeed was the RAM picked earlier when we were intending this as a start-from-scratch project. Am I safe to assume RMW @ 40MHz? Is this easy to do? I've used the Xilinx tools and a DCM to get a four-phase clock derived from our external one, but haven't yet found a way to tell ISE how to drive the bus phases needed for the read and then the write during one incoming clock cycle. The sump.org analyser uses a very different SRAM with separate ports for input and output, so it offers nothing to be inspired by, and I haven't yet found any relevant Xilinx or similar app notes or code snippets.

      Q2: Many SRAM connections seem to come out to the expansion connector, but are shared with other IO. We would want to add extra SRAMs on a daughtercard (maybe 3 extra, ideally up to 7) and still be able to have 30 or so state inputs to the analyser. At first glance this would seem to be possible, but have I understood correctly that all the necessary SRAM connections are available and that we'd still have around 30 inputs for recording?

      Many thanks in advance,
      JohnW

      [1] sump.org/projects/analyzer/
    • Hello John,

      FPGA designs are synchronous, meaning that the FPGA's internal memory elements and control signals switch on reference clock edges. The external asynchronous SRAM is a fast 10 ns part, that's right, but it still has setup and hold timing requirements that must be met (see the SRAM datasheet).
      Example using the USB3FPGA 48 MHz system clock directly:
      1 clock cycle for address and data setup, plus 1 clock cycle for the write strobe or output enable strobe => SysClk/2 = 24 MHz access rate => with the 16 bit data bus, 48 MBytes per sec.
      One additional clock cycle may be needed for data bus turnaround when changing read/write direction, to avoid multiple drivers on the bus.

      To make full use of the fast SRAM access time and optimize data throughput, you could double the system clock (using a DCM) locally in your memory controller module and then transfer data to the original system clock domain using asynchronous FIFOs.
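
      The arithmetic in this reply can be checked with a few lines of Python. This is only a sketch of the two-cycle (or three-cycle, with turnaround) access scheme described above; the helper names are hypothetical, and the 40 MHz figure is the bus rate from the original question.

```python
def access_rate(clock_hz, cycles_per_access):
    """SRAM accesses per second when each access takes a whole number of clocks."""
    return clock_hz // cycles_per_access

def throughput_bytes(clock_hz, cycles_per_access, bus_bits=16):
    """Bytes per second moved over a data bus of the given width."""
    return access_rate(clock_hz, cycles_per_access) * bus_bits // 8

SYS_CLK = 48_000_000    # USB3FPGA system clock
BUS_RATE = 40_000_000   # observed bus cycles per second (from the question)

plain = throughput_bytes(SYS_CLK, 2)            # setup + strobe
with_turnaround = throughput_bytes(SYS_CLK, 3)  # setup + strobe + turnaround
doubled = access_rate(2 * SYS_CLK, 2)           # clock doubled by a DCM
```

      Under these assumptions the 48 MHz clock gives 24 million accesses per second, which is below the 40 MHz bus rate. The doubled clock gives 48 million, which would cover one write per bus cycle but not a read plus a write per bus cycle.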

      The other point is that bandwidth does not depend on memory speed alone. The memory controller and data processing logic normally add latency and so reduce data throughput.

      By the way, a USB3FPGA successor board is in development. It will use a Spartan-6 with an embedded memory controller and megabytes of dynamic RAM.

      Best regards
      SF
      CESYS development engineer / FPGA design
    • Thanks for that.

      I'm still interested to know whether we can add extra CY7C1061s to the USB3FPGA, and whether we still have ~30 usable input pins if we do that.

      DRAM didn't look like an attractive option for this particular application, though it may be of interest later for further projects; details below.

      Perhaps it would be helpful if I were to simplify slightly.

      We don't need the RMW on memory (or the related arithmetic in the FPGA) till V2 of this project, so I'll stick to V1 for now.

      The short-term requirement for V1 is that for each incoming address on the bus being observed, a value needs to be written to the analyser memory. Bus cycles are happening at 40MHz. The SRAM address to be used is the same as the incoming bus address; the value to be written depends on some of the incoming bus status bits. The incoming bus address is unpredictable (it depends on program flow in the box being analysed), which afaik makes DRAM inapplicable because of the high latency to get to a new DRAM page; SRAM does not have this unpredictable latency.

      In V1 there is no dependence on previously written data. So at first sight it would seem that in each 40MHz cycle we have to do one complete SRAM write, with no intervening reads and no SRAM bus turnarounds: hopefully just the setup phase, then the write+hold phase, and then round again, as per the CY7C1061 datasheet. We don't necessarily even care about latencies or processing in the FPGA, so long as each write completes in time for the next one to arrive.

      When the recording run is complete, the intention is that the card moves to read mode, where again we go setup/read/setup/read etc. General applications want a proper memory controller, but maybe this is so simple that we don't need one as such, if we use a DCM so we've got a four-phase 40MHz clock? I happily admit this may be less than ideal practice, but occasionally there are reasons for exceptions.

      My perhaps over-simplified thinking for a write was based on the CY7C1061 timing diagrams and tables:
      Phase 0: assert SRAM address
      Phase 1: assert SRAM WE/CE/BHE/BLE and pre-calculated data
      Phase 2: (no change)
      Phase 3: deassert CE, tristate data
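
      For reference, the ideal time budget implied by this four-phase scheme can be worked out as follows. This is a sketch only: it assumes four equal phases of one 40MHz cycle, ignores FPGA output delays, skew and board propagation, and uses hypothetical names. The required minimum values must be taken from the CY7C1061 datasheet; none are assumed here.

```python
def write_budget(bus_hz=40_000_000, phases=4):
    """Ideal timing budget (in ns) for the phase 0..3 write sequence."""
    period_ns = 1e9 / bus_hz
    phase_ns = period_ns / phases
    return {
        "period_ns": period_ns,
        "phase_ns": phase_ns,
        # Address is driven in phase 0, strobes are asserted in phase 1.
        "address_setup_ns": 1 * phase_ns,
        # WE/CE/BHE/BLE stay asserted through phases 1 and 2.
        "write_pulse_ns": 2 * phase_ns,
        # Data is driven together with the strobes, so it is valid equally long.
        "data_valid_ns": 2 * phase_ns,
        # Phase 3: strobes released before the next address appears in phase 0.
        "recovery_ns": 1 * phase_ns,
    }

def margin_ns(available_ns, required_ns):
    """Positive result means the budget exceeds the datasheet minimum."""
    return available_ns - required_ns

budget = write_budget()
```

      Note that in this scheme the data bus is tristated in the same phase in which CE is deasserted, so the data hold time after the end of the write is nominally zero. That figure in particular is worth checking against the datasheet.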

      However, although I've configured the DCM to do a four-phase clock and I can see its outputs as expected on a scope (!), my limited VHDL skills don't yet reach to translating the trivial description above into synthesisable VHDL to drive the SRAM sequencing.

      Otherwise presumably I'll have to find a proper memory controller from somewhere (the book "FPGA Prototyping in VHDL" nearly has one, but no one around here has the book), and I haven't yet found a Xilinx app note or similar.

      Thanks again
      JohnW.
    • Hello John,

      You can find a somewhat reduced schematic diagram of the USB3FPGA, complete as far as the SRAM connections are concerned, in the CESYS download area. Here is the direct link:
      cesys.com/resources/ce028_schematics.pdf .

      On the VG96 expansion connector, 9 data I/O and 13 address lines of the onboard CY7C1061 SRAM are available. To connect an external SRAM to the same "bus" (selecting the active device by asserting the corresponding #CE signal), some more connections on VG96 are needed: 7 data I/O, 7 address lines and the 5 control signals #BHE, #BLE, #WE, #OE and #CE. As all signals must be controllable by the FPGA, only I/O pins can be used. That makes a total of 19 I/O for the first external SRAM and 1 extra I/O (the #CE signal) for every additional SRAM device, as long as all SRAM buses are shared.

      As all auxiliary I/O on VG96 are connected to the onboard SRAM, a total of 44 I/O and 15 input pins remain available when the SRAM is in use. Connecting one external SRAM as described above leaves 15 input and 25 I/O pins for other purposes.
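
      Taking the pin counts above at face value, the budget for several external SRAMs can be tallied as in the sketch below. It assumes all external devices share the address, data and control lines and differ only in their #CE signal, and that free I/O pins may also serve as analyser inputs. The function name is hypothetical.

```python
IO_FREE = 44         # I/O pins left on VG96 while the onboard SRAM is in use
INPUT_ONLY = 15      # input-only pins, unaffected by the SRAM wiring
FIRST_SRAM_IO = 19   # 7 data + 7 address + 5 control for the first device
EXTRA_SRAM_IO = 1    # one more #CE per additional device on the shared bus

def pins_left(external_srams):
    """Return (free I/O pins, input-only pins) for a number of external SRAMs."""
    if external_srams < 0:
        raise ValueError("number of SRAMs cannot be negative")
    used = 0
    if external_srams > 0:
        used = FIRST_SRAM_IO + EXTRA_SRAM_IO * (external_srams - 1)
    if used > IO_FREE:
        raise ValueError("not enough I/O pins for that many SRAMs")
    return IO_FREE - used, INPUT_ONLY

three_extra = pins_left(3)
seven_extra = pins_left(7)
```

      With 3 external SRAMs this gives 23 I/O plus 15 inputs (38 pins), and with 7 it gives 19 I/O plus 15 inputs (34 pins). On pin count alone, both leave room for about 30 recording inputs; the timing caveats of a heavily shared bus still apply.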

      Please keep in mind that the SRAM connections on the VG96 expansion connector are not intended to be an external SRAM bus. Rather, they make FPGA I/Os available to people who do not use the SRAM but need some extra I/O pins. You can still use these connections together with extra I/O to drive external SRAM devices, but adding more and more devices to the same data/address buses may introduce timing issues.

      Please also keep in mind that USB3FPGA is one of our last boards not compatible with our UDK framework. As we intend to use UDK as the standard for all our newer boards, software support will not be as good as, for example, for the forthcoming Spartan-6 board. Perhaps USBV4F is what you need: Virtex-4™ XC4VLX25-10, 8 MByte SRAM, 206 I/O signals on the expansion connector (various user-definable I/O standards), USB 2.0 and UDK software with examples. If you need a larger number of boards (about 20), it may be best for us to do a special adaptation of one of our UDK-compatible FPGA boards that meets all your needs, for example more SRAM. Please feel free to request a quote at cesys.com .

      Best regards,
      Michael
      Michael Hufnagel
      [Dipl.-Ing. (Univ.) Elektrotechnik]

      CESYS Gesellschaft für angewandte Mikroelektronik mbH
      Hardware- Entwicklung und Validierung
    • Hello and thank you Michael.

      That sounds reasonably promising for the extra memory. This is a short-term, low-volume need, initially for a proof of concept. It is accepted that SRAM on the expansion bus is less than ideal, but we are not at this stage looking for a production solution. We are looking for something that is a reasonable fit without having to do much hardware engineering (a simple daughtercard with some SRAMs is probably OK; a complete board from scratch would not be OK at the moment). We wouldn't be worrying about the lack of UDK compatibility on the USB3FPGA, but my bosses would worry about the price of the USBV4F ;)

      Where this leads, assuming a successful proof of concept, remains to be seen. Our perhaps rather odd requirement for truly random memory access means that (afaik) we need SRAM rather than DRAM. Fortunately we don't need DRAM-style quantities of memory (maybe 16MB organised as longwords?), but obviously this much SRAM conflicts with management's inevitable requirement for low price. Things we don't need (yet) are lots of high-performance IO and compute power. Probably an unusual combination? The potential follow-on volumes are currently unclear but will likely be in double figures rather than more (this is an in-house tool rather than something to sell on), so something based on an existing design may be a good idea all round.

      Thoughts/pointers re working sample VHDL code for SRAM access still most welcome.

      Many thanks,
      John